Apache Spark, HBase

Multiple WALs in Apache HBase 1.3 and performance enhancements

Apache HBase 1.3.0 was released in mid-January 2017 and ships with support for date-based tiered compaction, along with improvements in several areas, including the write-ahead log (WAL) and a new RPC scheduler. The release includes almost 1,700 resolved issues in total. Below are some highlights of the enhancements made in HBase 1.3.0: The date-based tiered compaction support shipped in HBase 1.3.0 is beneficial where data is infrequently deleted or updated and recent data is scanned more often than older data. Record time-to-live (TTL) can be easily enforced with this new compaction strategy. Improved multiple WAL support in Apache HBase […]

HBase, HBaseFsck

HBase Administration using HBaseFsck (hbck) and other tools…

HBaseFsck (hbck) is a tool for checking region consistency and table integrity problems and for repairing a corrupted HBase. Some inconsistencies can be transient (e.g. the cluster is starting up or a region is splitting), so a single report is not always conclusive. Operationally, you may want to run hbck at regular intervals and set up an alert (e.g. via Nagios) if it repeatedly reports inconsistencies. A run of hbck reports a list of inconsistencies along with a brief description of the regions and tables affected. The simple commands to run hbck are: hbase hbck or hbase hbck -details If you […]

Hadoop, HBase, Hive

JRuby code to purge data from a Hive-over-HBase table…

Problem to solve: how to delete, update, or query values stored in binary format in a column of an HBase column family. For a Hive table over HBase, where we can't use the standard API and are unable to apply filters on binary values, you can use the solution below for programmability. Find the JRuby source code on GitHub at github.com/mkjmkumar/JRuby_HBase_API. This program is written in JRuby to purge data using the HBase shell; it deletes the required data by applying a filter on a given binary column. So you have already heard about the many advantages of storing data in HBase (especially in binary block format) and creating a Hive table on top of it to query your data. I am not going to explain the use case for this, why […]

Database, HBase, Tephra

Tephra is an open-source project that adds complete transaction support to Apache HBase…

Transaction support in HBase? Yes, a wide range of use cases require transaction support. Firstly, we want the client to have great insight into, and fine-grained control of, what the transaction system can do. Having full control on the client side not only allows you to make the best decisions when optimizing for specific use cases, but also makes integration with third-party systems simpler. Secondly, when different types of components in your application share data and update it in multiple data stores in many different ways (Hadoop applications), it is important for the transaction system to support you. Thirdly, […]