A user has reported an issue when transitioning from the Forked Tika Parser, which is deprecated in Fusion 5.10, to the recommended Asynchronous Tika Parser.
In synchronous Tika parsing, indexing and parsing are performed concurrently. This can result in slow indexing for a large number of documents, as the parser and indexer must share resources.
Asynchronous Tika parsing, on the other hand, performs parsing in the background. This allows Fusion to continue indexing documents while the parser is processing others, resulting in improved indexing performance for large numbers of documents.
Fusion 5.10 and later in Azure
Normally the asynchronous parser deploys as part of a Fusion deployment and you can verify this by using kubectl:
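As a minimal sketch of that check, assuming Fusion runs in a namespace called `fusion` (a placeholder, so substitute your own), a small helper can list the pods and keep only those whose names mention async-parsing. The function name is hypothetical and only used for illustration:

```shell
# Assumption: Fusion is deployed in the "fusion" namespace; override NAMESPACE if not.
NAMESPACE="${NAMESPACE:-fusion}"

# Hypothetical helper: list the pods in the namespace and keep only
# the lines whose pod name contains "async-parsing".
find_async_parsing_pods() {
  kubectl get pods --namespace "$NAMESPACE" | grep async-parsing
}
```

Running `find_async_parsing_pods` prints any matching pods. If nothing is printed, no async-parsing pod was found in that namespace.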
The command should show the pod running in Kubernetes, with output that looks something like this:
If this kubectl command does not return a pod, you may need to make some changes to your Kubernetes configuration.
The first thing to determine is whether you are using the default Fusion charts from the Lucidworks Helm repository or if you are hosting the charts internally.
If the charts are hosted internally, then you need to pull and extract the 5.10 or later version of the charts so that the new services (async-parsing specifically) are deployed properly.
The fact that the pod does not exist indicates that the charts did not deploy that service.
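One way to see which chart version is actually deployed is to list the Helm releases in the Fusion namespace. This is a sketch only: the namespace `fusion` is an assumed placeholder, and the function name is hypothetical.

```shell
# Assumption: Fusion is deployed in the "fusion" namespace; override NAMESPACE if not.
NAMESPACE="${NAMESPACE:-fusion}"

# Hypothetical helper: list the Helm releases in the namespace.
# The CHART column of the output shows the chart name and version in use.
show_fusion_chart_version() {
  helm list --namespace "$NAMESPACE"
}
```

If the CHART column shows a version earlier than 5.10, the release was deployed from older charts that do not include the async-parsing service.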
Additionally, take a look at the cluster events, looking specifically for scheduling failures or anything related to the async-parsing pods.
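A rough sketch of that event check follows. It assumes the namespace is `fusion` (a placeholder) and that the relevant events mention either `async-parsing` or the `FailedScheduling` reason. The function name is hypothetical.

```shell
# Assumption: Fusion is deployed in the "fusion" namespace; override NAMESPACE if not.
NAMESPACE="${NAMESPACE:-fusion}"

# Hypothetical helper: list events in the namespace sorted by time, then keep
# only the lines that mention async-parsing or a scheduling failure.
recent_async_parsing_events() {
  kubectl get events --namespace "$NAMESPACE" --sort-by=.lastTimestamp \
    | grep -i -E 'async-parsing|FailedScheduling'
}
```

Running `recent_async_parsing_events` with no output means no matching events were recorded in that namespace. Note that Kubernetes only retains events for a limited time.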
After deploying with the new Helm chart, the async-parsing pod may request more storage than your cluster is able to provide. The error message may look something like this:
If that is the case, you'll need to add a section to the fusion_values.yaml similar to this:
There is one other issue that might need to be addressed. When applying that configuration with the YAML file you may see this error reported during deployment:
If you get this error message, take these steps:
Delete the stateful set:
kubectl delete statefulset <statefulset-name>
Delete the PVC:
kubectl delete pvc -l <label-key>=<label-value>
Recreate the stateful set by applying your YAML file:
kubectl apply -f <statefulset-config-file.yaml>
Now follow the steps above to check that the async-parsing pods are running.
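As an optional extra check after recreating the stateful set, the sketch below waits for the rollout to finish and then lists the PVCs so you can confirm that a new claim was created and bound. The function name is hypothetical, the stateful set name and namespace are supplied by you, and the five-minute timeout is an arbitrary assumption.

```shell
# Hypothetical helper.
#   $1: stateful set name (the same <statefulset-name> used in the steps above)
#   $2: namespace where Fusion is deployed
verify_statefulset() {
  # Wait (up to 5 minutes, an arbitrary choice) for the stateful set to become
  # ready, then list the PVCs in the namespace to check their status.
  kubectl rollout status "statefulset/$1" --namespace "$2" --timeout=5m &&
    kubectl get pvc --namespace "$2"
}
```

For example, `verify_statefulset <statefulset-name> <namespace>` returns once the pods are ready or the timeout expires. A PVC that remains in the Pending state suggests the storage request still cannot be satisfied.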