Goal
Ensure reliable and safe deletion of a crawlDB used by Web Connector V1 with on-disk storage in a multi-node Fusion 4 deployment.
Environment
Fusion 4.2.6 (self-hosted)
Web Connector V1 using on-disk crawlDB
Multi-node cluster deployment
Guide
Use the REST API to delete crawlDB
Fusion 4 provides an endpoint to delete the crawlDB associated with a connector data source. To ensure deletion across all nodes, avoid using localhost in the API call.
Correct API format:
curl -X DELETE -u USERNAME:PASSWORD 'http://YOURFUSIONHOST:8764/api/apollo/connectors/datasources/YOUR_DATASOURCE_NAME/db'
Important:
Always replace localhost with the actual hostname or IP address of the Fusion node.
This API call can be executed from any node, but the hostname must resolve correctly to the Fusion service.
The API does not return a list of nodes where the deletion occurred, and the operation is not fully deterministic in multi-node environments, so a successful response does not by itself confirm that every node's crawlDB was removed.
Ensure that the connector data source is not running when issuing this command.
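The points above can be sketched as a small shell wrapper that refuses loopback hosts before issuing the DELETE. This is only an illustration: the function names and the FUSION_USER / FUSION_PASS environment variables are hypothetical, and the URL layout is taken from the curl command shown earlier.

```shell
#!/usr/bin/env bash
# Sketch only: function names and FUSION_USER / FUSION_PASS are hypothetical.

# Print the crawlDB DELETE URL for a host and data source.
# Fails if either argument is empty or the host is a loopback address.
build_crawldb_url() {
  local host="$1" datasource="$2"
  if [ -z "$host" ] || [ -z "$datasource" ]; then
    echo "usage: build_crawldb_url HOST DATASOURCE" >&2
    return 2
  fi
  case "$host" in
    localhost|127.*|::1)
      echo "refusing loopback host '$host'; use the node's real hostname or IP" >&2
      return 1
      ;;
  esac
  printf 'http://%s:8764/api/apollo/connectors/datasources/%s/db\n' "$host" "$datasource"
}

# Issue the DELETE. Stop the data source before calling this.
delete_crawldb() {
  local url
  url="$(build_crawldb_url "$1" "$2")" || return
  # --fail makes curl exit non-zero when the server returns an HTTP error status.
  curl --fail -X DELETE -u "${FUSION_USER}:${FUSION_PASS}" "$url"
}
```

Example call, with a placeholder hostname and data source name: delete_crawldb fusion-node1.example.com my_web_datasource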
Alternative: Delete crawlDB directly from the file system
If REST deletion is unreliable, you can manually delete the crawlDB file structure from each node.
Path to delete:
<fusion-base-path>/data/connectors/connectors-classic/crawldb/lucid.web/YOUR_DATASOURCE_NAME
To ensure safe deletion:
Stop the connector job before removing files.
Repeat the deletion manually on all nodes in the cluster.
There is no system-wide propagation when deleting files manually; each node must be handled individually.
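A minimal sketch of the per-node cleanup follows. It assumes SSH access to each node and the same Fusion base path on all of them. The function names, the DRY_RUN variable, and the node names and base path in the usage line are hypothetical. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: function names, DRY_RUN, and SSH access to the nodes are assumptions.

# Print the on-disk crawlDB directory for a data source.
# Rejects empty or path-like names so rm -rf can never hit a parent directory.
crawldb_path() {
  local base="$1" datasource="$2"
  if [ -z "$base" ] || [ -z "$datasource" ]; then
    echo "usage: crawldb_path FUSION_BASE DATASOURCE" >&2
    return 2
  fi
  case "$datasource" in
    */*|.|..)
      echo "invalid data source name '$datasource'" >&2
      return 2
      ;;
  esac
  printf '%s/data/connectors/connectors-classic/crawldb/lucid.web/%s\n' "${base%/}" "$datasource"
}

# Remove the crawlDB directory on every listed node, one node at a time.
# Stop the connector job first. Set DRY_RUN=0 to actually delete.
remove_crawldb_on_nodes() {
  local target node
  target="$(crawldb_path "$1" "$2")" || return
  shift 2
  for node in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh $node rm -rf -- '$target'"
    else
      ssh "$node" "rm -rf -- '$target'"
    fi
  done
}
```

Example dry run, with a placeholder base path and node names: remove_crawldb_on_nodes /opt/fusion/4.2.6 my_web_datasource fusion-node1 fusion-node2 fusion-node3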
What happens when using “Clear Datasource” in the UI?
When the "Clear Datasource" option is selected from the Fusion UI, two internal operations are triggered:
A Solr delete-by-query call is made to remove indexed documents associated with the data source.
The same REST API mentioned above is called to delete the crawlDB:
curl -X DELETE -u USERNAME:PASSWORD 'http://YOURFUSIONHOST:8764/api/apollo/connectors/datasources/YOUR_DATASOURCE_NAME/db'
Can the crawlDB be permanently deactivated?
No. The crawlDB is required for several core Web Connector features, including incremental crawling and dead URI detection. Deactivation is not supported.
Additional notes
Note: The key to safe and reliable crawlDB removal in Fusion 4 is targeting nodes consistently and avoiding localhost in the API call. Manual deletion is safe if the connector job is stopped beforehand and the files are removed on every node.