How can I diagnose/identify problems with the Kerberos plugin setup in Solr?
Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography. Please follow the official documentation to set up SolrCloud with the Kerberos authentication plugin: https://lucene.apache.org/solr/guide/7_4/kerberos-authentication-plugin.html
Although the steps are straightforward, problems in a Kerberos plugin setup in Solr are not obvious to diagnose, so keep the following key points in mind to avoid hiccups:
1. Give the ‘keytab’ files the correct permissions so that the solr and zookeeper users can read and access them on the Solr and ZooKeeper nodes, respectively.
chmod 440 […].keytab
2. Make sure the ‘client’ principal and the ‘server’ principal in the Solr and ZooKeeper JAAS files, respectively, are the same as mentioned in the documentation. Failing to do so will cause serious issues in inter-shard and Solr-ZooKeeper communication.
3. Inter-server communication between the KDC (Key Distribution Center), the Solr node machines and the ZooKeeper node machines should be set up correctly. Communication via private IPs between the servers should be in place for the client to get authenticated.
4. Kerberos needs both TCP and UDP communication between the servers; 88 is the default port (for both TCP and UDP) for Kerberos network traffic.
5. For Kerberos-specific issues and errors, refer to https://github.com/steveloughran/kerberos_and_hadoop/blob/master/sections/errors.md. It is a useful guide.
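As a rough aid for points 1 and 4 above, the following Python sketch checks a keytab file's permission bits and whether a host accepts TCP connections on port 88. The function names, the default timeout, and whatever path or host name you pass in are illustrative assumptions, not part of Solr or Kerberos tooling. It does not cover UDP, which a simple connection attempt cannot confirm.

```python
import os
import socket
import stat


def keytab_mode_ok(path, expected=0o440):
    """Return True if the permission bits of `path` equal `expected`.

    The default 0o440 mirrors the `chmod 440` shown in point 1.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == expected


def tcp_port_open(host, port=88, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Port 88 is the Kerberos default mentioned in point 4. Only TCP is tested here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, calling keytab_mode_ok() with the path to your keytab on each Solr and ZooKeeper node, and tcp_port_open() with the private IP of your KDC from each of those nodes, gives a quick first indication of whether permissions or network reachability are the likely culprit.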
A number of Kerberos improvements have been introduced since Solr 6.3:
- Kerberos delegation support, leading to smoother inter-shard and Solr-ZooKeeper communication. Not every request between the servers / nodes needs to be authenticated by the KDC.
- TGT caching was improved for setups with multiple Solr nodes on the same machine.
- The Solr Admin UI (which works with Angular JS 2.0) now works when Kerberos is enabled.
- Various Kerberos plugin fixes are already included in the latest versions: SOLR-9518, SOLR-10282, SOLR-9520.
In future releases, the explanations and examples above may no longer apply.