Using a CloudSolrClient in a JavaScript Stage

I need to say at the outset, and for the record: using your own CloudSolrClient in a custom JavaScript stage is NOT considered best practice. Whenever possible, use the Solr Indexer stage instead of a custom implementation. However, sometimes you need to do things "your way," and spinning up a CloudSolrClient in a custom stage may be necessary to reach your goal.

To that end, here is a sample implementation:

 

function (doc) {
        var client = org.apache.http.client.HttpClient;
        var cloudServer = org.apache.solr.client.solrj.impl.CloudSolrClient;
        var DefaultHttpClient = org.apache.http.impl.client.DefaultHttpClient;
        var ClientConnectionManager = org.apache.http.conn.ClientConnectionManager;
        var PoolingClientConnectionManager = org.apache.http.impl.conn.PoolingClientConnectionManager;
        var CloudSolrClient = org.apache.solr.client.solrj.impl.CloudSolrClient;
        var cm = org.apache.http.impl.conn.PoolingClientConnectionManager;
        var String = java.lang.String;
        var pdoc = com.lucidworks.apollo.common.pipeline.PipelineDocument;

        var ZOOKEEPER_URL = new String("localhost:9983");// change this to point to your zookeeper instance
        var DEFAULT_COLLECTION = new String("NAME-OF-TARGET-COLLECTION");// change this to the name of your collection
        var server = ZOOKEEPER_URL;
        var collection = DEFAULT_COLLECTION;
        var docList = java.util.ArrayList;
        var inputDoc = org.apache.solr.common.SolrInputDocument;
        var pingResp = org.apache.solr.client.solrj.response.SolrPingResponse;
        var res = org.apache.solr.client.solrj.response.UpdateResponse;
        var SolrInputDocument = org.apache.solr.common.SolrInputDocument;
        var UUID = java.util.UUID;


        try {
          
            cm = new PoolingClientConnectionManager();
            client = new DefaultHttpClient(cm);
            cloudServer = new CloudSolrClient(server, client);
            cloudServer.setDefaultCollection(collection);
            logger.info("CLOUD SERVER INIT OK...");
            docList = new java.util.ArrayList();
            pingResp = cloudServer.ping();
            logger.info(pingResp);
 
            for each(pdoc in doc) {
                inputDoc = new SolrInputDocument();
                inputDoc.addField("id", UUID.randomUUID().toString());
                inputDoc.addField("q_txt", pdoc.getFirstFieldValue("extracted_text"));
                docList.add(inputDoc);
            }

            logger.info(" DO SUBMIT OF " + docList.size() + " DOCUMENTS TO SOLR **** ");
            cloudServer.add(docList);
            res = cloudServer.commit();
            logger.info(res);


        } catch (ex) {
            logger.error(ex);
        }

        return doc;
}

This is a sample implementation. Your final implementation may vary.

Breaking it down:

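First, we reference the Java classes we'll be using. Note that client, cloudServer, cm, and pdoc start out pointing at a class and are reassigned to actual instances later in the function: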

        var client = org.apache.http.client.HttpClient;
        var cloudServer = org.apache.solr.client.solrj.impl.CloudSolrClient;
        var DefaultHttpClient = org.apache.http.impl.client.DefaultHttpClient;
        var ClientConnectionManager = org.apache.http.conn.ClientConnectionManager;
        var PoolingClientConnectionManager = org.apache.http.impl.conn.PoolingClientConnectionManager;
        var CloudSolrClient = org.apache.solr.client.solrj.impl.CloudSolrClient;
        var cm = org.apache.http.impl.conn.PoolingClientConnectionManager;
        var String = java.lang.String;
        var pdoc = com.lucidworks.apollo.common.pipeline.PipelineDocument;

Next, we declare our local variables:

        var ZOOKEEPER_URL = new String("localhost:9983");// change this to point to your zookeeper instance
        var DEFAULT_COLLECTION = new String("NAME-OF-TARGET-COLLECTION");// change this to the name of your collection
        var server = ZOOKEEPER_URL;
        var collection = DEFAULT_COLLECTION;
        var docList = java.util.ArrayList;
        var inputDoc = org.apache.solr.common.SolrInputDocument;
        var pingResp = org.apache.solr.client.solrj.response.SolrPingResponse;
        var res = org.apache.solr.client.solrj.response.UpdateResponse;
        var SolrInputDocument = org.apache.solr.common.SolrInputDocument;
        var UUID = java.util.UUID;

Note that we need to set the values for ZOOKEEPER_URL and DEFAULT_COLLECTION. ZOOKEEPER_URL is the host and port where your ZooKeeper instance is running (a host:port connect string such as localhost:9983, not an HTTP URL). DEFAULT_COLLECTION is the name of the collection you wish to update.
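Since a typo in the ZooKeeper address tends to surface only as a connection failure at runtime, it can help to sanity-check the value before building the client. Here is a minimal sketch of such a check. The function name parseZkConnectString is my own invention and is not part of Fusion or SolrJ, and it assumes the usual "host:port[,host:port...][/chroot]" form with plain host names or IPv4 addresses:

```javascript
// Hypothetical helper (not part of Fusion or SolrJ): sanity-check a ZooKeeper
// connect string of the form "host:port[,host:port...][/chroot]".
function parseZkConnectString(zk) {
    var value = "" + zk; // coerce, in case a java.lang.String is passed in
    if (value.length === 0) {
        throw new Error("ZooKeeper connect string is empty");
    }
    // Everything from the first "/" onward is the optional chroot path.
    var slash = value.indexOf("/");
    var hostPart = slash === -1 ? value : value.substring(0, slash);
    var chroot = slash === -1 ? "" : value.substring(slash);
    var hosts = hostPart.split(",").map(function (entry) {
        var match = /^([^:\s]+):(\d+)$/.exec(entry.trim());
        if (!match) {
            throw new Error("Expected host:port but got '" + entry + "'");
        }
        return { host: match[1], port: parseInt(match[2], 10) };
    });
    return { hosts: hosts, chroot: chroot };
}
```

Calling parseZkConnectString("localhost:9983") returns one host entry with port 9983 and an empty chroot, while a value such as "http://localhost:9983" throws, which catches the common mistake of pasting in a Solr URL.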

From there, we get into the heavy lifting. First, we instantiate our client:

            cm = new PoolingClientConnectionManager();
            client = new DefaultHttpClient(cm);
            cloudServer = new CloudSolrClient(server, client);
            cloudServer.setDefaultCollection(collection);

To make sure we've connected to the server okay, we do a quick ping and log the response. This is optional, of course.

            pingResp = cloudServer.ping();
            logger.info(pingResp);

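Next, we do our document processing. Each Solr document gets a random UUID as its id, and the first value of the extracted_text field is copied into q_txt. I'm using the UUID class to generate the id here, but your implementation may vary.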

                inputDoc = new SolrInputDocument();
                inputDoc.addField("id", UUID.randomUUID().toString());
                inputDoc.addField("q_txt", pdoc.getFirstFieldValue("extracted_text"));
                docList.add(inputDoc);
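If you end up copying more than one field, the addField lines can be driven from a small mapping instead of being repeated. The following is only a sketch: copyMappedFields and FIELD_MAP are hypothetical names of my own, and the function assumes only that the source object has getFirstFieldValue and the target has addField, as the objects in the sample above do.

```javascript
// Hypothetical mapping: Solr field name -> pipeline document field name.
var FIELD_MAP = { q_txt: "extracted_text" };

// Copy the first value of each mapped source field onto the target document,
// skipping fields that are missing on the source.
function copyMappedFields(source, target, fieldMap) {
    Object.keys(fieldMap).forEach(function (solrField) {
        var value = source.getFirstFieldValue(fieldMap[solrField]);
        if (value !== null && value !== undefined) {
            target.addField(solrField, value);
        }
    });
    return target;
}
```

Inside the loop this would be called as copyMappedFields(pdoc, inputDoc, FIELD_MAP) after the id has been set. Skipping missing values is a design choice on my part; you may prefer to log them instead.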

As we process each document, we add it to an ArrayList of SolrInputDocuments. Once we're done processing, we add the list to our cloud client and commit:

            cloudServer.add(docList);
            res = cloudServer.commit();
            logger.info(res);
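One thing the sample does not do is release the client when it is finished, and it builds a new one every time the stage function runs. A hedged sketch of one way to handle that is below. The names indexAndClose and createClient are hypothetical; createClient stands in for the new CloudSolrClient(server, client) call above, and the sketch assumes the SolrJ version bundled with your Fusion release exposes close() on the client, which you should confirm before relying on it.

```javascript
// Sketch: always release the client, even when indexing fails.
// `createClient` is a hypothetical factory standing in for the
// `new CloudSolrClient(server, client)` call used in the stage.
function indexAndClose(createClient, docList, log) {
    var cloudServer = null;
    try {
        cloudServer = createClient();
        cloudServer.add(docList);
        return cloudServer.commit();
    } catch (ex) {
        log("indexing failed: " + ex);
        return null;
    } finally {
        if (cloudServer !== null) {
            // Assumption: the SolrJ version in use provides close().
            cloudServer.close();
        }
    }
}
```

The finally block runs on both the success and the failure path, so the connection is released either way. In the stage you would pass something like function (msg) { logger.error(msg); } as the log argument.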

Again, this is not considered best practice and should only be used when you absolutely cannot make things work with the Solr Indexer stage.

Verified on Fusion 2.4.3
