Bigdata, HDP Search, Solr, SolrCloud

SolrCloud vs HDPSearch…

Let us start by clearing up some confusion around SolrCloud and HDPSearch.

First, what is SolrCloud:-
Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. This is called SolrCloud. It provides distributed indexing and search, and supports the following features:

  • Central configuration for the entire cluster.
  • Automatic load balancing and fail-over for queries.
  • ZooKeeper integration for cluster coordination and configuration.

Let's also clear up some confusion between Solr and SolrCloud (ZooKeeper coordination, Solr with HDFS, HA mode):-
Solr and SolrCloud are not separate things. Solr is the application, while SolrCloud is a mode of running Solr. The alternative to running Solr in SolrCloud mode is running it in standalone mode (master/slave).

SolrCloud mode offers index replication, failover, load balancing, and distributed queries with the help of ZooKeeper and other specialized features in Solr. In standalone mode, Solr still offers index replication and distributed queries, but these activities are not coordinated by ZooKeeper and have to be managed manually. Failover and load balancing also need to be configured and managed manually in standalone mode.
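To make "managed manually" concrete, here is a minimal sketch of the kind of client-side load balancing and failover you end up writing yourself in standalone mode. The class name, the node URLs and the `fake_send` transport are all hypothetical and only for illustration. A real client would send HTTP requests instead.

```python
from typing import Callable, List


class RoundRobinFailoverClient:
    """Toy client-side load balancer for standalone Solr nodes (hypothetical helper)."""

    def __init__(self, nodes: List[str], send: Callable[[str, str], str]):
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = list(nodes)
        # send(node_url, query) returns a response or raises ConnectionError.
        self.send = send
        self._next = 0  # index of the node the next query starts with

    def query(self, q: str) -> str:
        # Try every node once, starting at the round-robin position.
        for attempt in range(len(self.nodes)):
            index = (self._next + attempt) % len(self.nodes)
            try:
                result = self.send(self.nodes[index], q)
            except ConnectionError:
                continue  # failover: move on to the next node
            # Load balancing: the next query starts at the following node.
            self._next = (index + 1) % len(self.nodes)
            return result
        raise ConnectionError("all nodes failed")


def fake_send(node: str, q: str) -> str:
    # Stand-in transport (assumption): pretend the first slave is down.
    if node == "http://slave1:8983/solr":
        raise ConnectionError(node)
    return f"{node} answered {q}"


client = RoundRobinFailoverClient(
    ["http://slave1:8983/solr", "http://slave2:8983/solr"], fake_send
)
print(client.query("*:*"))
```

In SolrCloud mode this bookkeeping is not your problem, because the cluster tracks which nodes are alive through ZooKeeper.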

When using Solr in SolrCloud mode, every index update is routed to the shard that owns the document and then distributed to all replicas of that shard. SolrCloud offers flexible distributed search and indexing, without a master node to allocate nodes, shards, and replicas. Instead, Solr uses ZooKeeper to manage these locations, depending on configuration files and schemas. Queries and updates can be sent to any server. Solr will use the information in the ZooKeeper database to figure out which servers need to handle the request.
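The routing idea can be sketched in a few lines. This is only a toy model, assuming a simple CRC32 hash and a hand-written `cluster_state` dictionary. Solr's real router uses its own hashing scheme and reads the cluster state from ZooKeeper. The function names and node names are hypothetical.

```python
import zlib
from typing import Dict, List


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard for a document id (toy hash, not Solr's actual router)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def route_update(doc_id: str, cluster_state: Dict[int, List[str]]) -> List[str]:
    """Return the nodes that must apply an update for this document.

    cluster_state maps shard number (0..n-1) to the nodes hosting that shard.
    In SolrCloud this information lives in ZooKeeper; here it is a plain dict.
    """
    shard = shard_for(doc_id, len(cluster_state))
    return cluster_state[shard]


# Hypothetical cluster: two shards, each with two replicas.
cluster_state = {
    0: ["node1:8983", "node2:8983"],
    1: ["node3:8983", "node4:8983"],
}

print(route_update("doc-42", cluster_state))
```

Whichever server receives the request can do this lookup, which is why the client does not need to know where a document lives.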

Solr standalone mode, and when to use it:-

For some use cases, such as heavyweight indexing, standalone mode is preferable. Another advantage of standalone mode is that separate nodes can be used for indexing and for queries.

What is the difference between SolrCloud and HDPSearch:-

Using HDPSearch with Solr in SolrCloud mode (ZooKeeper coordination, Solr with HDFS, HA mode) means we are using the bundle provided by HDP. That in turn means we are bound to the Solr version that HDP ships as compatible with our HDP cluster.
If we are managing our stack from Ambari and Solr is required to serve our application data, then HDPSearch makes much more sense.

What am I losing out on by using HDP Search over Solr:-

HDP Search ships a fixed version of Solr together with Banana 1.6. At the time of writing this blog, the current version of Solr is 6.6.0. Solr has also added beautiful features in the Solr 5.8.0 release, which you will definitely miss if your application makes heavy use of Solr. Examples are a single query across multiple collections and many facet queries.
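As an illustration of those two features, here is a small sketch that builds a select URL which searches several collections at once and asks for several facet queries. It relies on the standard `collection`, `facet` and `facet.query` request parameters. The host, the collection names and the helper `build_select_url` are assumptions made up for this example.

```python
from typing import List
from urllib.parse import urlencode


def build_select_url(base_url: str, collections: List[str], q: str,
                     facet_queries: List[str]) -> str:
    """Build a Solr select URL spanning several collections (hypothetical helper)."""
    if not collections:
        raise ValueError("at least one collection is required")
    params = [
        ("q", q),
        # Comma-separated list of collections to search in one request.
        ("collection", ",".join(collections)),
        ("wt", "json"),
    ]
    if facet_queries:
        params.append(("facet", "true"))
        # facet.query may be repeated, once per facet query.
        params.extend(("facet.query", fq) for fq in facet_queries)
    # The request is sent to the first collection's select handler.
    return f"{base_url}/{collections[0]}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",
    ["books", "films"],
    "title:solr",
    ["price:[0 TO 10]", "price:[10 TO *]"],
)
print(url)
```

Check the documentation of the Solr version bundled with your HDP Search release before relying on any of these parameters.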

The same fruit comes in other flavors too:-
Yes, Cloudera offers a similar bundle called Cloudera Search, and other vendors have their own.

I did not get a chance to use Cloudera Search, but if you are using it, please add a comment or mail me.

