If you want to run a high-performance Solr instance on AWS, keep the following in mind when picking instance types:
- Keep your Solr cluster and your ZooKeeper ensemble in the same region and availability zone. Sharing an availability zone is required for low-latency communication among Solr instances and between Solr and ZooKeeper. The AWS docs explain what regions and availability zones are: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
- Put your Solr cluster and your ZooKeeper ensemble in the same placement group, which gives you low latency and high network throughput. Unfortunately, not all instance types are available if you want to use this feature. The AWS documentation explains this: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html
- Solr benefits from having a lot of memory so that the index can be memory mapped, so keep this in mind when choosing an instance for Solr. Note that you should keep the JVM heap size small and leave the remaining memory to the OS so that it can memory map your index.
- ZooKeeper (ZK) is not that resource hungry, so it can run on smaller instances than your Solr instances.
- When you need high indexing rates, or when your index is larger than your system's memory, opt for an EBS General Purpose (SSD) volume or an EBS Provisioned IOPS (SSD) volume. You can find more information here: http://aws.amazon.com/ebs/details/ . The faster the disk, the better the performance.
- You should also check out the Solr Scale Toolkit project, which helps you deploy and manage your Solr and ZK services on AWS. The project is at https://github.com/LucidWorks/solr-scale-tk/ and the blog post introducing it is at http://lucidworks.com/blog/introducing-the-solr-scale-toolkit/
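The memory advice above (small heap, the rest left to the OS for memory mapping) can be turned into a rough sizing check. This is only a sketch: the function names and the 1 GB OS reserve are assumptions made for illustration, not Solr or AWS settings, and real sizing should be validated against your own workload.

```python
def page_cache_headroom_gb(instance_ram_gb, jvm_heap_gb, index_size_gb,
                           os_reserve_gb=1.0):
    """Memory left over after the JVM heap, an OS reserve, and the index.

    os_reserve_gb is an assumed allowance for the OS and other processes.
    A negative result means the whole index cannot stay memory mapped.
    """
    available_for_cache = instance_ram_gb - jvm_heap_gb - os_reserve_gb
    return available_for_cache - index_size_gb


def index_fits_in_page_cache(instance_ram_gb, jvm_heap_gb, index_size_gb,
                             os_reserve_gb=1.0):
    """True if the index fits in the memory left to the OS."""
    return page_cache_headroom_gb(
        instance_ram_gb, jvm_heap_gb, index_size_gb, os_reserve_gb) >= 0


if __name__ == "__main__":
    # Hypothetical candidates: (RAM in GB, heap in GB) for a 40 GB index.
    for ram, heap in [(32, 16), (64, 8)]:
        fits = index_fits_in_page_cache(ram, heap, index_size_gb=40)
        print(f"{ram} GB RAM, {heap} GB heap -> index fits: {fits}")
```

If the index does not fit, the disk advice above (faster EBS volumes) matters more, since more reads will go to storage.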
If you feel you need to run the infrastructure in multiple AWS Availability Zones (AZs), here are some other things to keep in mind:
- We do NOT recommend doing this.
- Using multiple AZs increases the risk of issues, and decreases performance, because of network communication latency.
- As of this writing, Amazon's public pricing policy involves charges for network traffic between AZs. The SolrCloud/ZK infrastructure will generate quite a bit of this traffic, and its cost will be difficult to predict.
- If you are trying to get improved availability from multiple AZs, keep in mind:
  - You'll need to use at least 3 AZs, with a ZooKeeper ensemble of at least 3 nodes and at least 1 ZK node in each AZ. This ensures that you still have a ZK quorum if one AZ goes away.
  - SolrCloud is good at spreading multiple replicas of a single shard across multiple nodes to increase availability, but it isn't aware of AZ boundaries, so it might place all the replicas within a single AZ. To prevent this, you'll need to specify node locations whenever you create collections or perform similar configuration changes.
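The two availability points above can be checked against a planned layout before deploying. The sketch below assumes you keep your own mappings of node name to AZ and of shard to replica nodes; the function names and all node, shard, and AZ names are hypothetical, and nothing here calls Solr, ZooKeeper, or AWS.

```python
from collections import Counter


def zk_survives_any_single_az_loss(zk_node_az):
    """True if a ZK majority remains after losing any one AZ.

    zk_node_az maps each ZooKeeper node name to the AZ it runs in.
    """
    total = len(zk_node_az)
    if total == 0:
        return False
    majority = total // 2 + 1
    nodes_per_az = Counter(zk_node_az.values())
    return all(total - lost >= majority for lost in nodes_per_az.values())


def shards_confined_to_one_az(replica_nodes, node_az):
    """Return the shards whose replicas all sit in a single AZ.

    replica_nodes maps a shard name to the nodes hosting its replicas;
    node_az maps each Solr node name to its AZ.
    """
    return sorted(
        shard for shard, nodes in replica_nodes.items()
        if len({node_az[node] for node in nodes}) < 2
    )


if __name__ == "__main__":
    # Hypothetical layout: three ZK nodes, one per AZ.
    zk = {"zk1": "az-a", "zk2": "az-b", "zk3": "az-c"}
    print(zk_survives_any_single_az_loss(zk))

    # Hypothetical Solr layout where shard1 ended up entirely in az-a.
    solr_az = {"solr1": "az-a", "solr2": "az-a", "solr3": "az-b"}
    replicas = {"shard1": ["solr1", "solr2"], "shard2": ["solr1", "solr3"]}
    print(shards_confined_to_one_az(replicas, solr_az))
```

Any shard the second function reports is one whose replicas would all disappear together if that AZ went away, which is the case that explicit node placement is meant to prevent.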