Introduction
The JDBC connector fetches documents from a relational database via SQL queries. Under the hood, this connector implements the Solr DataImportHandler (DIH) plugin. This document focuses mainly on Fusion 5.9.0, but the guidance applies to Fusion 5.4.0 and later.
Proper configuration of the JDBC datasource is crucial to avoid errors during its execution.
Key considerations for configuring the JDBC connector include:
- Using the appropriate JDBC driver.
- Ensuring correct column names.
- Applying the appropriate date format.
- Providing the correct JDBC request/connection string (see the examples after this list).
- Managing database timeouts.
- Allocating resources effectively.
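As an illustration of the connection-string point above, these are the standard JDBC URL formats for a few common databases (the hostnames, ports, and database/service names are placeholders):
MySQL - jdbc:mysql://dbhost:3306/mydb
Oracle (thin driver) - jdbc:oracle:thin:@//dbhost:1521/servicename
Microsoft SQL Server - jdbc:sqlserver://dbhost:1433;databaseName=mydb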
Therefore, the configuration of the JDBC connector plays a significant role in its performance.
Additional documentation worth consulting when configuring and troubleshooting JDBC V2 connector datasource jobs:
JDBC-Sql-V2-connector
JDBC-V2-connector-configuration-reference
Troubleshoot-a-jdbc-datasource
This document focuses on common errors and troubleshooting scenarios that can cause a JDBC datasource job to fail, complementing the comprehensive documentation above.
Scenario 1: JDBC V2 - non-Oracle crawl throws errors
"com.lucidworks.connector.plugins.jdbc.exception.JdbcRuntimeException:
Cannot resolve Oracle CLOB data."
Logs from the connector-backend and JDBC connector plugin pods show the following error:
ERROR [fetch-input-receiver.opentext_gallery-95:com.lucidworks.connector.plugins.jdbc.fetcher.processor.PageProcessor@76] -
Error while iterating results com.lucidworks.connector.plugins.jdbc.exception.JdbcRuntimeException:
Cannot resolve Oracle CLOB data.
Caused by: java.lang.ClassNotFoundException: oracle.jdbc.internal.OracleClob
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
Explanation:
The JDBC V2 connector (2.3.0) introduced support for CLOB objects, but only for Oracle CLOB objects; against other databases, the plugin fails with the ClassNotFoundException shown above. Connector version 2.4.0 extends CLOB support to non-Oracle databases, such as IBM CLOB objects, so upgrading the connector resolves this error.
(Reference: https://doc.lucidworks.com/fusion-connectors/5.8/4/release-notes)
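If upgrading the connector is not immediately possible, one possible workaround (not from the release notes; the table and column names below are hypothetical) is to cast CLOB columns to plain character data in the SELECT so the driver never returns a CLOB object:
-- IBM DB2 example: cast the CLOB column to VARCHAR so the driver returns a plain string
-- (truncates content beyond 32000 characters)
SELECT id, title, CAST(body AS VARCHAR(32000)) AS body FROM documents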
Scenario 2: Timeout errors with some remote JDBC v2 datasources
The JDBC V2 connector successfully indexes millions of documents but encounters timeout errors toward completion. Fusion displays error messages like:
errorMessage: "The following components failed: [class com.lucidworks.connectors.service.components.job.processor.DefaultDataProcessor : Job terminated due to no plugin activity within 600 seconds]"
errorMessage: "The following components failed: [class com.lucidworks.connectors.service.components.job.processor.DefaultDataProcessor : Job terminated due to no plugin activity within 5000 seconds]"
"cannot obtain status from the job backend in 60000 ms"
When these errors occur, review the logs of the following pods:
- Connector
- Connector-backend
- Connector plugin
- Kafka
Kafka:
In the Kafka logs, look specifically for occurrences of "java.net.UnknownHostException". If found, restart each Kafka pod before initiating the subsequent crawl.
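A minimal sketch of restarting the Kafka pods with kubectl; the pod and namespace names are placeholders, and this assumes Kafka runs as a StatefulSet so deleted pods are recreated automatically:
# list the Kafka pods in the Fusion namespace
kubectl get pods -n <fusion-namespace> | grep kafka
# delete each Kafka pod; the StatefulSet controller recreates it
kubectl delete pod <kafka-pod-name> -n <fusion-namespace>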
Verifying the resource limits:
Ensure that the pods listed above have the required resource configurations. Start by checking the describe output of each involved pod.
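For example, with kubectl (pod and namespace names are placeholders):
# show resource settings and the most recent termination state for a pod
kubectl describe pod <pod-name> -n <fusion-namespace>
# a container killed for exceeding its memory limit reports
# Last State: Terminated, Reason: OOMKilled in the describe output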
If any errors like OOMKilled occur, begin by adjusting the resources. Take the following steps:
- Increase resource limits and requests.
- Check the required CPU/memory and adjust accordingly.
Example resource configuration per pod type (a Helm values sketch follows the table):
Pod (replicas) | JVM heap | CPU limit | Memory limit | CPU request | Memory request
fusion-indexing (4 pods) | -Xms8g -Xmx8g | 4 | 16G | 4 | 12G
connectors-backend (2 pods) | -Xms12g -Xmx12g | 6 | 24G | 4 | 12G
connectors (2 pods) | -Xms6g -Xmx6g | 4 | 24G | 2 | 12G
connector-plugin (2 pods) | -Xms6g -Xmx6g | 10 | 15G | 6 | 4G
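As a sketch of how one of these entries might be expressed in a Helm values override; the top-level key and the JVM-options key name depend on your Fusion chart version, so treat both as assumptions to verify before applying:
# hypothetical values override for the connector-plugin pods
connector-plugin:
  javaToolOptions: "-Xms6g -Xmx6g"   # key name is an assumption; check your chart's values
  resources:
    limits:
      cpu: 10
      memory: 15G
    requests:
      cpu: 6
      memory: 4G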
Verifying the datasource configuration:
The following datasource settings helped in this scenario (a configuration sketch follows this list):
- Reduce the connection pool size (Max connections) to 10.
- Reduce the batch size to 100.
- Increase the indexing timeout to 300000s.
- Under Core Properties (Advanced), set Fetch threads to 5.
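As a rough sketch of where these settings live in the datasource configuration; the property names below are hypothetical, so confirm the actual keys in the JDBC-V2-connector-configuration-reference linked above:
maxConnections: 10        # hypothetical key - connection pool size (Max connections)
batchSize: 100            # hypothetical key - rows fetched per batch
indexingTimeout: 300000   # hypothetical key - indexing timeout
fetchThreads: 5           # hypothetical key - Core Properties (Advanced) fetch threads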
After making the changes, clear the Job state collection for the datasource before rerunning:
1. Go to Collections Manager → Collection name.
2. Find the job state collection for the datasource: <datasource_name>_job_state.
3. Click on Settings → Clear collection.
If the crawl still fails, clear the job_state collection again and rerun the crawl without clearing the datasource.
Note that only the job_state collection should be cleared; this removes the crawldb data for that run. Clearing the datasource itself, by contrast, triggers a fresh crawl.
Scenario 3: Plugin can not handle the request
INTERNAL: Plugin can not handle the request
Timeout on blocking read for 60000000000 NANOSECONDS}]
(60000000000 nanoseconds is 60 seconds, i.e., the plugin's blocking read timed out after one minute.)
Verifying the query:
The SQL field is defined as "a SQL SELECT statement to choose the records to be retrieved. For paginated queries, use the special variables ${limit} and ${offset}."
For detailed instructions, please refer to the jdbc-sql-V2-connector documentation. For Oracle queries, the syntax is explained in the SQL Beginner's Guide (an Oracle pagination sketch also follows the examples below).
When dealing with a large dataset, such as approximately 4 million records, it becomes necessary to divide the task into 3-4 smaller jobs. For instance, breaking the query down into chunks of 100,000 records keeps each run small enough to complete reliably (see the sketch below).
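A minimal sketch of such chunking, using MySQL syntax; the table name, key column, and range boundaries are hypothetical, and each datasource job would cover a different key range:
-- Job 1 of several: crawl only rows whose ids fall in the first range,
-- paginated within the range via the connector's ${limit}/${offset} variables
SELECT * FROM test_table
WHERE id BETWEEN 1 AND 100000
ORDER BY id
LIMIT ${limit} OFFSET ${offset}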
Please note that pagination will not function correctly if the query lacks the ${limit} and ${offset} variables described in the field definition above; the specific syntax is driver dependent.
Examples:
MySQL - SELECT * FROM test_table LIMIT ${limit} OFFSET ${offset}
Microsoft SQL Server - SELECT * FROM test_table ORDER BY primary_key OFFSET ${offset} ROWS FETCH NEXT ${limit} ROWS ONLY
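Since the examples above cover MySQL and SQL Server but not Oracle, here is a hedged Oracle sketch, assuming Oracle 12c or later (earlier versions need a ROWNUM subquery instead); the table and key names are placeholders:
-- Oracle 12c+ row-limiting clause; requires a deterministic ORDER BY
SELECT * FROM test_table
ORDER BY primary_key
OFFSET ${offset} ROWS FETCH NEXT ${limit} ROWS ONLY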