Issue:
The JDBC connector fails with one of the following errors:
com.lucidworks.connector.plugins.jdbc.exception.JdbcRuntimeException:
Cannot resolve Oracle CLOB data.
OR
Timeout errors with some remote JDBC V2 datasources
OR
Plugin can not handle the request : Timeout on blocking read for
60000000000 NANOSECONDS
Environment:
Fusion 5.9.0 and above (the JDBC V2 connector itself is compatible with Fusion 5.4.0 and later)
Resolution:
Scenario 1: A JDBC V2 crawl of a non-Oracle database throws the error
"com.lucidworks.connector.plugins.jdbc.exception.JdbcRuntimeException:
Cannot resolve Oracle CLOB data."
Logs from the connector-backend and JDBC connector plugin pods show the following error:
ERROR [fetch-input-receiver.opentext_gallery-95:com.lucidworks.connector.plugins.jdbc.fetcher.processor.PageProcessor@76] -
Error while iterating results com.lucidworks.connector.plugins.jdbc.exception.JdbcRuntimeException:
Cannot resolve Oracle CLOB data.
Caused by: java.lang.ClassNotFoundException: oracle.jdbc.internal.OracleClob
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
Explanation:
The JDBC V2 connector 2.3.0 introduced support for CLOB objects, but only for Oracle CLOB objects; when a non-Oracle driver returns a CLOB, the oracle.jdbc.internal.OracleClob class cannot be found and the error above is thrown. Connector version 2.4.0 extends CLOB support to non-Oracle databases, such as IBM DB2 CLOB objects, so upgrading the connector to 2.4.0 or later resolves this error.
(Reference: https://doc.lucidworks.com/fusion-connectors/5.8/4/release-notes)
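If upgrading the connector is not immediately possible, a common workaround (not from the release notes; offered here only as a suggestion) is to convert the CLOB to a plain string inside the SELECT statement itself, so the driver returns an ordinary VARCHAR instead of a CLOB object. A minimal sketch, assuming a hypothetical IBM DB2 table docs with a CLOB column body:

    -- Hypothetical workaround: cast the CLOB to VARCHAR in the query so the
    -- connector never receives a CLOB object. Table/column names (docs, id,
    -- title, body) are illustrative only.
    -- Note: CAST(... AS VARCHAR(4000)) truncates content beyond 4000 characters.
    SELECT
        id,
        title,
        CAST(body AS VARCHAR(4000)) AS body_text
    FROM docs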
Scenario 2: Timeout errors with some remote JDBC V2 datasources
The JDBC V2 connector successfully indexes millions of documents but encounters timeout errors towards the end of the run. Fusion displays error messages like:
errorMessage: "The following components failed: [class com.lucidworks.connectors.service.components.job.processor.DefaultDataProcessor :
Job terminated due to no plugin activity within 600 seconds]"
errorMessage: "The following components failed: [class com.lucidworks.connectors.service.components.job.processor.DefaultDataProcessor :
Job terminated due to no plugin activity within 5000 seconds]"
"cannot obtain status from the job backend in 60000 ms"
When this happens, check the logs of the following pods:
- Connector
- Connector-backend
- Connector plugin
- Kafka pods
Kafka:
In the Kafka pod logs, look specifically for occurrences of "java.net.UnknownHostException". If found, restart each Kafka pod before initiating the next crawl.
Verifying the resource limits:
Ensure that the pods listed above have the required resource configurations. Start by checking the output of kubectl describe pod for each of the involved pods.
If any errors such as OOMKilled occur, begin by adjusting the resources. Take the following steps:
- Increase resource limits and requests.
- Check the required CPU/memory and adjust accordingly; the configuration below has worked as a reference point.
fusion-indexing (4 pods), JVM options: -Xms8g -Xmx8g
  resources:
    limits:
      cpu: 4
      memory: 16G
    requests:
      cpu: 4
      memory: 12G

connectors-backend (2 pods), JVM options: -Xms12g -Xmx12g
  resources:
    limits:
      cpu: 6
      memory: 24G
    requests:
      cpu: 4
      memory: 12G

connectors (2 pods), JVM options: -Xms6g -Xmx6g
  resources:
    limits:
      cpu: 4
      memory: 24G
    requests:
      cpu: 2
      memory: 12G

connector-plugin (2 pods), JVM options: -Xms6g -Xmx6g
  resources:
    limits:
      cpu: 10
      memory: 15G
    requests:
      cpu: 6
      memory: 4G
Verifying the Datasource configuration:
- Reduce the connection pool size (Max connections) to 10.
- Reduce the batch size to 100.
- Increase the indexing timeout to 300000s.
- Under Core Properties (Advanced), set Fetch threads to 5.
After making the changes, clear the job state collection for the datasource before rerunning:
1. Go to Collections Manager → Collection name.
2. Find the job state collection for the datasource: <datasource_name>_job_state.
3. Click Settings → Clear collection.
If the crawl still fails, clear the job_state collection again and rerun the crawl without clearing the datasource.
Note that only the job_state collection should be cleared; this clears the crawldb data of the run. Clearing the datasource itself would trigger a fresh crawl.
Scenario 3: Plugin can not handle the request
The job fails with an error like the following (60000000000 nanoseconds is 60 seconds):
INTERNAL: Plugin can not handle the request
Timeout on blocking read for 60000000000 NANOSECONDS}]
Verifying the Query:
The query field is a SQL SELECT statement that chooses the records to be retrieved.
For paginated queries, use the special variables ${limit} and ${offset}.
For detailed instructions, please refer to the jdbc-sql-V2-connector documentation.
For Oracle queries, the syntax is explained in the SQL Beginner's Guide.
When dealing with a large dataset, such as approximately 4 million records, divide the crawl into 3-4 smaller jobs. For instance, break the query down into chunks (e.g., 100,000 records per chunk); the process works reliably when the amount of data each job handles is limited. A sketch of this approach is shown below.
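A minimal sketch of this chunking approach, assuming a hypothetical table test_table with a numeric primary key id (the range boundaries are illustrative, and the exact OFFSET/FETCH syntax is driver dependent):

    -- Job 1: first key range, still paginated via the connector variables
    SELECT * FROM test_table
    WHERE id >= 0 AND id < 100000
    ORDER BY id
    OFFSET ${offset} ROWS FETCH NEXT ${limit} ROWS ONLY

    -- Job 2: next key range (repeat for the remaining ranges)
    SELECT * FROM test_table
    WHERE id >= 100000 AND id < 200000
    ORDER BY id
    OFFSET ${offset} ROWS FETCH NEXT ${limit} ROWS ONLY

Each range becomes its own datasource job, so no single run has to page through the full dataset.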
Please note that pagination will not function correctly if the query lacks the ${limit} and ${offset} variables, as described in the field definition:
A SQL SELECT statement to choose the records to be retrieved.
For paginated queries, use the special variables ${limit} and ${offset}.
The specific syntax will be driver dependent.
Examples:
MySQL - SELECT * FROM test_table LIMIT ${limit} OFFSET ${offset}
Microsoft SQL Server - SELECT * FROM test_table ORDER BY primary_key OFFSET ${offset} ROWS FETCH NEXT ${limit} ROWS ONLY
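Since Oracle queries come up often in this scenario, the equivalent Oracle pagination may also help. A sketch, assuming a table test_table with a primary_key column: Oracle 12c and later support the ANSI-style OFFSET/FETCH syntax, while older versions require the classic ROWNUM subquery pattern.

    -- Oracle 12c and later
    SELECT * FROM test_table ORDER BY primary_key
    OFFSET ${offset} ROWS FETCH NEXT ${limit} ROWS ONLY

    -- Oracle 11g and earlier (ROWNUM-based pagination)
    SELECT * FROM (
        SELECT inner_q.*, ROWNUM rnum FROM (
            SELECT * FROM test_table ORDER BY primary_key
        ) inner_q
        WHERE ROWNUM <= ${offset} + ${limit}
    )
    WHERE rnum > ${offset}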