Issue
A user reported an issue when transitioning from the Forked Tika Parser (deprecated in Fusion 5.10) to its recommended replacement, the Asynchronous Tika Parser.
In synchronous Tika parsing, indexing and parsing are performed concurrently. This can result in slow indexing for a large number of documents, as the parser and indexer must share resources.
Asynchronous Tika parsing, on the other hand, performs parsing in the background. This allows Fusion to continue indexing documents while the parser is processing others, resulting in improved indexing performance for large numbers of documents.
Environment
Fusion 5.10 and later in Azure
Diagnosis
Normally, the asynchronous parser is deployed as part of a Fusion deployment. You can verify this with kubectl:
kubectl get pods | grep async
The output should show the pod running in Kubernetes, similar to this:
<YOUR_NAMESPACE>-async-parsing-0 2/2 Running
If this kubectl command does not return a pod, you may need to make some changes to your Kubernetes configuration.
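Because grep prints nothing when there is no match, a missing pod is easy to overlook. The check can be made explicit with a small filter; this is only a sketch, and `async_parsing_running` is a hypothetical helper name, not part of Fusion or kubectl:

```shell
# Reads `kubectl get pods` output on stdin and succeeds only if a pod whose
# name contains "async-parsing" is listed with STATUS "Running".
# Assumes the default column layout (NAME READY STATUS ...), i.e. no -A flag.
async_parsing_running() {
  awk '$1 ~ /async-parsing/ && $3 == "Running" { found = 1 } END { exit !found }'
}

# Against a live cluster (set NAMESPACE to your Fusion namespace):
# kubectl get pods -n "$NAMESPACE" | async_parsing_running \
#   && echo "async-parsing is running" \
#   || echo "async-parsing pod not found or not running"
```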
Resolution
The first thing to determine is whether you are using the default Fusion charts from the Lucidworks Helm repository or if you are hosting the charts internally.
If the charts are hosted internally, you need to pull and extract version 5.10 or later so that the new services (async-parsing specifically) are deployed properly.
If the pod does not exist, the charts did not deploy that service.
Additionally, take a look at the cluster events, looking specifically for scheduling failures or anything else related to the async-parsing pods.
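One way to narrow the event list is to keep only warnings that mention async-parsing. The following is a sketch rather than a prescribed procedure; `filter_async_warnings` is a hypothetical helper name and the namespace is a placeholder:

```shell
# Keeps only Warning events that mention async-parsing.
# Reads `kubectl get events` output on stdin.
filter_async_warnings() {
  grep 'Warning' | grep 'async-parsing'
}

# Against a live cluster (set NAMESPACE to your Fusion namespace):
# kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp | filter_async_warnings
```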
After deploying with the new Helm chart, the async-parsing pod may request more storage than you are able to provide. The error message may look something like this:
25s Warning FailedCreate statefulset/fusion-sandbox-async-parsing create Claim async-parsing-data-claim-fusion-sandbox-async-parsing-0 for Pod fusion-sandbox-async-parsing-0 in StatefulSet fusion-sandbox-async-parsing failed error: persistentvolumeclaims "async-parsing-data-claim-fusion-sandbox-async-parsing-0" is forbidden: exceeded quota: limit-storage, requested: requests.storage=100Gi, used: requests.storage=843Gi, limited: requests.storage=918049259520
25s Warning FailedCreate statefulset/fusion-sandbox-async-parsing create Pod fusion-sandbox-async-parsing-0 in StatefulSet fusion-sandbox-async-parsing failed error: failed to create PVC async-parsing-data-claim-fusion-sandbox-async-parsing-0: persistentvolumeclaims "async-parsing-data-claim-fusion-sandbox-async-parsing-0" is forbidden: exceeded quota: limit-storage, requested: requests.storage=100Gi, used: requests.storage=843Gi, limited: requests.storage=918049259520
If that is the case, you'll need to add a section to fusion_values.yaml similar to this:
async-parsing:
  volume:
    storage: 10Gi
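The quota figures in the event message show why a smaller request helps. The limit of 918049259520 bytes is 855Gi, and 843Gi is already in use, so only 12Gi remains: too little for the 100Gi request, but enough for 10Gi. The arithmetic can be sketched as follows, where `quota_headroom_gib` is a hypothetical helper name and the numbers are the ones from the message above:

```shell
# Prints the storage still available under a quota, in GiB.
# $1 = quota limit in bytes, $2 = storage already requested in GiB
quota_headroom_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 - $2 ))
}

quota_headroom_gib 918049259520 843   # prints 12

# The live quota (named limit-storage in the message) can be inspected with:
# kubectl describe resourcequota limit-storage -n "$NAMESPACE"
```

Your own quota and usage will differ, so choose a storage value that fits the headroom in your cluster.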
There is one other issue that might need to be addressed. When applying that configuration with the YAML file you may see this error reported during deployment:
Error: UPGRADE FAILED: cannot patch "fusion-sandbox-async-parsing" with kind StatefulSet: StatefulSet.apps "fusion-sandbox-async-parsing" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
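This happens because the storage request is part of the StatefulSet's volume claim template, which cannot be changed in place. If you get this error message, take these steps:
Delete the StatefulSet: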
kubectl delete statefulset <statefulset-name>
Delete the PVC:
kubectl delete pvc -l <label-key>=<label-value>
Recreate the StatefulSet by applying your YAML file:
kubectl apply -f <statefulset-config-file.yaml>
Then repeat the check in the Diagnosis section to confirm that the async-parsing pod is running.
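If you are unsure which PVC or label to use in the delete step, the failed-claim message above gives the naming pattern: the claim template name, then the StatefulSet name, then the pod ordinal. A small sketch, where `pvc_name` is a hypothetical helper and the claim template name `async-parsing-data-claim` is taken from that message:

```shell
# Builds the PVC name for a StatefulSet pod.
# $1 = StatefulSet name, $2 = pod ordinal (defaults to 0)
pvc_name() {
  printf 'async-parsing-data-claim-%s-%s\n' "$1" "${2:-0}"
}

pvc_name fusion-sandbox-async-parsing
# prints async-parsing-data-claim-fusion-sandbox-async-parsing-0

# To see the labels on that PVC before deleting it (live cluster):
# kubectl get pvc "$(pvc_name fusion-sandbox-async-parsing)" -n "$NAMESPACE" --show-labels
```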