Hadoop moves the code to the data; Storm moves the data to the code. This […]
Tag: hadoop
Approach to execute Machine Learning project, “Halt the Hate”…
Disclaimer: The analysis done in this project touches on a sensitive issue in India. So […]
Fundamentals of Apache Spark…
You can view my other articles on Spark RDD at the links below… Apache Spark RDD […]
Advertisement attributes or Ad Attributes…An Idea!!!
Some time ago I was working on an idea called Ad Attributes, or Advertisement Attributes. I’d […]
SolrCloud vs HDPSearch…
Let us start by clearing up some confusion about SolrCloud and HDPSearch. First […]
Multiple WAL in Apache HBase 1.3 and performance enhancements!!!
Apache HBase 1.3.0 was released mid-January 2017 and ships with support for date-based tiered compaction […]
Install and smoketest R and RHadoop on Hortonworks Data Platform (HDP25-CentOS7)
Before going to the installation steps, I’d like to give a short introduction to RHadoop. What […]
JRuby code to purge data on HBase over Hive table…
Problem to solve: how to delete, update, or query values stored in binary format in an HBase column-family column. Hive […]
Python and Python bites
Python and Python bites “lambda” Hi everyone, this article shows you one powerful function […]
Past and Future of Apache Kylin!!!
Short Description: With the arrival of Apache Kylin (Chinese: Kirin), problems based on Hadoop can be solved. Article […]
Heterogeneous Storage in HDFS (Part-1)…
An introduction to heterogeneous storage types and the flexible configuration of heterogeneous storage! Heterogeneous Storage […]
A Step-by-Step Guide to HDFS Data Protection Solution for Your Organization on Cloudera CDH
An enterprise-ready encryption solution should provide the following: Comprehensive encryption offering wherever it resides, […]
Performance utilities in Hive
Before taking you through the details of the utilities provided by Hive, let me explain a few components […]
Best Practices for Hive Authorization when using connector to HiveServer2
Recently we have been working with Presto and configuring the Hive connector for it. […]
HPL/SQL Makes SQL-on-Hadoop More Dynamic
Think about the old days when we solved many business problems using Dynamic SQL, exception […]
Coding Tips and Best Practice in Hive and Oozie…
Many times during code review I have found some common mistakes made by developers. Here […]
HDFS is really not designed for many small files!!!
A few of my friends who are new to Hadoop frequently ask what a good file size is […]
HBase Replication and comparison with popular online backup programs…
Short Description: HBase Replication: the HBase Replication solution can address cluster security, data security, read […]
Introduction to Spark
Introduction to Apache Spark: Spark, as a unified stack and computational engine, is responsible for […]
Kafka: A detailed introduction
I’ll cover Kafka in detail with an introduction to programmability and will try to cover almost […]
The ACID properties and the CAP theorem are two concepts in data management for distributed systems.
Started working on HBase again!! Thought, why not refresh a few concepts before proceeding to the actual […]
Data Analysis Approach to a successful outcome
I have done data analysis for one of my projects using the approach below, and hopefully […]