Issue
Users may encounter a failure in Query Pipelines at the Machine Learning (ML) stage, specifically when using Named Entity Recognition (NER) or other custom models. The pipeline stops at the ML stage, and subsequent stages are not executed.
The following error is observed in the Query Pipeline logs:
ERROR [com.lucidworks.apollo.pipeline.query.stages.ml.MLQueryStage] - Failed to generate prediction in Machine Learning Stage [Stage_Name] due to Model execution error: UNKNOWN: Application error processing RPC
Diagnosis
To diagnose this issue, review the logs for the ml-model-service and the specific Seldon deployment pods.
Check the seldon-container-engine logs for readiness failures:
{"error":"dial tcp 127.0.0.1:9500: connect: connection refused", "level":"error", "logger":"SeldonRestApi", "msg":"Ready check failed"}
Verify the environment variables for the ml-model-service deployment using kubectl:
kubectl get deployment ml-model-service -o yaml | grep -A2 "JAVA_TOOL_OPTIONS"
If the output does not contain the -Dcom.google.protobuf.use_unsafe_pre22_gencode=true flag, the service will fail to process gRPC calls due to a Protobuf runtime exception.
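Both diagnosis checks can be wrapped in small shell functions so they report a result through their exit status instead of requiring a visual scan. This is a minimal sketch assuming a POSIX shell with grep; the function names find_ready_check_failures and has_protobuf_flag are hypothetical, and the log pattern is taken from the sample line above, so real log formatting may differ.

```shell
# Hypothetical helpers for the two diagnosis checks (not part of Fusion or Seldon).
PROTOBUF_FLAG='-Dcom.google.protobuf.use_unsafe_pre22_gencode=true'

# Reads seldon-container-engine log lines on stdin and prints those whose
# readiness check failed with "connection refused".
# Exit status 0 if at least one such line was found, 1 otherwise.
find_ready_check_failures() {
  grep -E '"msg": *"Ready check failed"' | grep -F 'connection refused'
}

# Reads text on stdin (for example the deployment YAML) and returns
# exit status 0 if the Protobuf flag appears in it, 1 otherwise.
has_protobuf_flag() {
  grep -qF -- "$PROTOBUF_FLAG"
}
```

For example, kubectl logs [pod_name] -c seldon-container-engine | find_ready_check_failures lists the failed readiness checks for one model pod, and kubectl get deployment ml-model-service -o yaml | has_protobuf_flag || echo "Protobuf flag missing" flags the JVM options problem.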
Environment
Managed Fusion 5.9.16
Kubernetes (K8s)
Seldon Core
Cause
The issue is typically caused by two distinct factors:
Missing Protobuf Flag: In Fusion version 5.9.16, a specific JVM flag is required for the Protobuf runtime to handle gRPC calls correctly. If the JAVA_TOOL_OPTIONS environment variable is overwritten during deployment (instead of being augmented), this flag is dropped, leading to an UnsupportedOperationException on gRPC calls.
Image Dependencies: Models may fail to start if the Docker image is missing the setuptools package, which provides the pkg_resources module required by the Seldon wrapper.
Resolution
Step 1: Patch ml-model-service JVM Options
Update the ml-model-service deployment to include the mandatory Protobuf flag.
Modify JAVA_TOOL_OPTIONS so that it includes the following flag, appending it to any existing options rather than replacing them:
-Dcom.google.protobuf.use_unsafe_pre22_gencode=true
This can be applied via kubectl edit deployment ml-model-service or by updating the Helm values and redeploying the service.
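The append-rather-than-overwrite change can also be scripted with kubectl set env. The sketch below is one possible approach, not a Lucidworks-provided procedure: the function name patch_ml_model_service is hypothetical, and it assumes JAVA_TOOL_OPTIONS is set as a literal value (not valueFrom) on the first container of the ml-model-service deployment.

```shell
# Hypothetical helper: add the Protobuf flag to ml-model-service without
# discarding the options that are already set. Extra arguments (for
# example -n NAMESPACE) are passed through to kubectl.
# Assumes JAVA_TOOL_OPTIONS is a literal value on the first container.
PROTOBUF_FLAG='-Dcom.google.protobuf.use_unsafe_pre22_gencode=true'

patch_ml_model_service() {
  # Read the current JAVA_TOOL_OPTIONS value from the pod template.
  current=$(kubectl get deployment ml-model-service "$@" \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="JAVA_TOOL_OPTIONS")].value}') || return 1

  # Nothing to do if the flag is already present as a separate option.
  case " $current " in
    *" $PROTOBUF_FLAG "*)
      echo "JAVA_TOOL_OPTIONS already contains $PROTOBUF_FLAG"
      return 0
      ;;
  esac

  # Append to the existing options (adding a space only if there were any).
  # Changing the pod template makes the Deployment roll out new pods.
  kubectl set env deployment/ml-model-service "$@" \
    "JAVA_TOOL_OPTIONS=${current:+$current }$PROTOBUF_FLAG"
}
```

Usage: patch_ml_model_service -n [namespace]. A later deployment that sets JAVA_TOOL_OPTIONS again can drop the flag in the same way described under Cause, so the Helm values should carry the flag as well.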
Step 2: Update Model Dockerfile (Optional)
Ensure the custom model image includes the packages the Seldon wrapper needs at runtime. Update the Dockerfile to install setuptools before other Python dependencies.
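To check whether an existing image is affected before editing the Dockerfile, one option is to try importing pkg_resources inside it. This is a sketch under assumptions: the function name image_has_pkg_resources is hypothetical, the image is assumed to have a python executable on its PATH, and the entrypoint is overridden so the model server itself does not start.

```shell
# Hypothetical helper: exit status 0 if the given image can import
# pkg_resources (provided by setuptools), non-zero otherwise.
# Assumes `python` is on the image's PATH.
image_has_pkg_resources() {
  docker run --rm --entrypoint python "$1" -c 'import pkg_resources' >/dev/null 2>&1
}
```

Usage: image_has_pkg_resources [image_name]:[tag] || echo "pkg_resources missing". The same check can be rerun after the rebuild to confirm the fix before pushing.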
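Add the following line to the Dockerfile:
RUN pip install --no-cache-dir "setuptools>=65.0.0"
The quotes are required: without them, the shell that runs the RUN instruction parses > as an output redirection, so the version constraint is silently dropped and pip's output is written to a file named =65.0.0.
Rebuild and push the image:
docker build -t [image_name]:[tag] .
docker push [image_name]:[tag]
Step 3: Verify and Redeploy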
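Delete the existing ML pods so that they are recreated with the updated image:
kubectl delete pod -l seldon-deployment-id=[model_id]
Recreated pods pick up a rebuilt image only if the deployment references the new tag or its imagePullPolicy is Always; with an unchanged tag and IfNotPresent, a node that already has the old image cached reuses it.
Validate that the model is generating predictions by testing the Query Pipeline in the Query Workbench.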
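The delete-and-wait part of this step can be scripted so that the Query Workbench test only starts once the replacement pods are Ready. This is a minimal sketch: restart_model_pods is a hypothetical name, and the default timeout of 300s is an arbitrary choice. It relies on kubectl delete waiting for the old pods to be removed (its default behavior), so the subsequent kubectl wait targets the replacement pods.

```shell
# Hypothetical helper: delete the pods of one Seldon deployment and wait
# until the recreated pods report the Ready condition.
# $1 = model_id (value of the seldon-deployment-id label)
# $2 = optional timeout for kubectl wait (default 300s)
restart_model_pods() {
  model_id=$1
  timeout=${2:-300s}
  # kubectl delete returns after the old pods are gone (--wait defaults to true).
  kubectl delete pod -l "seldon-deployment-id=$model_id" || return 1
  kubectl wait --for=condition=Ready pod -l "seldon-deployment-id=$model_id" --timeout="$timeout"
}
```

Usage: restart_model_pods [model_id] 600s. If the wait times out, the seldon-container-engine logs described under Diagnosis are the first place to look.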