AWS and GCE are both great! A more powerful load balancing configuration puts GCE over the top…

I work with Hadoop, and a question I often hear, sometimes from management, is “Why do we need Hadoop in the cloud?” To answer it, I keep a few key points ready: The cloud is your data center, so there is no need to deal with reliability and scaling issues. Pay for what you need. Deploy in minutes. Cloud storage enables economic flexibility, scale, and rich features. Size clusters independently of storage needs, while prices continue to decrease. Geo-redundancy allows for business continuity and disaster recovery planning. Next they ask me for a detailed comparison, to find out the difference between GCP […]

Hadoop, HDFS

HDFS is really not designed for many small files!!!

A few of my friends who are new to Hadoop frequently ask what a good file size is for Hadoop and how to decide on it. Obviously, files should not be small; the file size should be in line with the block size. HDFS is really not designed for many small files. For each file, the client has to talk to the namenode, which gives it the location(s) of the block(s) of the file, and then the client streams the data from the datanode. Now, in the best case, the client does this once, and then finds that it is the machine with […]
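To make the small-files point concrete, here is a rough sketch that compares the same amount of data stored as many small files versus a few block-sized files. The 128 MiB block size is an assumption (check `dfs.blocksize` on your own cluster), and `blocks_for_file` and `summarize` are hypothetical helper names used only for illustration. The sketch counts files and blocks, since the client has to ask the namenode about every file and the namenode has to track every file and block.

```python
# Assumed HDFS block size of 128 MiB; your cluster's dfs.blocksize may differ.
BLOCK_SIZE = 128 * 1024 * 1024


def blocks_for_file(file_size, block_size=BLOCK_SIZE):
    """Number of blocks one file occupies (integer ceiling division)."""
    return (file_size + block_size - 1) // block_size


def summarize(file_sizes, block_size=BLOCK_SIZE):
    """Count the files and blocks the namenode would have to track."""
    files = len(file_sizes)
    blocks = sum(blocks_for_file(size, block_size) for size in file_sizes)
    return {"files": files, "blocks": blocks, "namenode_objects": files + blocks}


MIB = 1024 * 1024
total = 10 * 1024 * MIB  # 10 GiB of data in both layouts

small = [MIB] * (total // MIB)                # 10240 files of 1 MiB each
large = [BLOCK_SIZE] * (total // BLOCK_SIZE)  # 80 files of one block each

print(summarize(small))
print(summarize(large))
```

Under these assumptions the data volume is identical, but the small-file layout needs 10240 namenode lookups instead of 80, and leaves the namenode with far more file and block entries to keep track of.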