HBase Administration using HBaseFsck (hbck) and other tools…

HBaseFsck (hbck) is a tool for checking region consistency and table integrity problems and for repairing a corrupted HBase. Some inconsistencies are transient (e.g. the cluster is starting up or a region is splitting), so a single run can be misleading. Operationally you may want to run hbck at regular intervals and set up an alert (e.g. via Nagios) if it repeatedly reports inconsistencies. A run of hbck reports a list of inconsistencies along with a brief description of the regions and tables affected.

The simplest commands to run hbck are:

hbase hbck
hbase hbck -details
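To make the "alert only on repeated inconsistencies" idea concrete, here is a minimal sketch of a Nagios-style check. It assumes that hbck prints a line containing "Status: OK" when it finds nothing wrong; the names check_hbck, HBCK_STATE_FILE and HBCK_THRESHOLD are made up for this example and are not part of HBase.

```shell
# Nagios-style wrapper around "hbase hbck" (sketch, not an official plugin).
# Assumption: hbck prints "Status: OK" when no inconsistencies are found.
# HBCK_STATE_FILE and HBCK_THRESHOLD are hypothetical names for this example.
HBCK_STATE_FILE="${HBCK_STATE_FILE:-/tmp/hbck_failures.count}"
HBCK_THRESHOLD="${HBCK_THRESHOLD:-3}"

check_hbck() {
  local failures=0
  # Read the number of consecutive inconsistent runs seen so far.
  [ -f "$HBCK_STATE_FILE" ] && failures=$(cat "$HBCK_STATE_FILE")
  failures=${failures:-0}

  if hbase hbck 2>&1 | grep -q "Status: OK"; then
    echo 0 > "$HBCK_STATE_FILE"
    echo "OK - hbck reports no inconsistencies"
    return 0
  fi

  failures=$((failures + 1))
  echo "$failures" > "$HBCK_STATE_FILE"
  if [ "$failures" -ge "$HBCK_THRESHOLD" ]; then
    echo "CRITICAL - hbck inconsistent for $failures consecutive runs"
    return 2
  fi
  echo "WARNING - hbck inconsistent ($failures/$HBCK_THRESHOLD), may be transient"
  return 1
}
```

The return values follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL), so a cron job or NRPE command can call check_hbck and exit with its return code.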

If you just want to know if some tables are corrupted, you can limit hbck to identify inconsistencies in only specific tables.

hbase hbck HOLIDEX_EVNT peoples_mob
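If you have many tables, it can be handy to check them one at a time and list only the ones that need attention. The helper below is a rough sketch: inconsistent_tables is a hypothetical name, and it again assumes that hbck prints "Status: OK" for a clean table.

```shell
# Print the names of the given tables for which hbck does not report "Status: OK".
# inconsistent_tables is a made-up helper name for this example.
inconsistent_tables() {
  local table
  for table in "$@"; do
    # Run hbck limited to one table and look for the clean status line.
    if ! hbase hbck "$table" 2>&1 | grep -q "Status: OK"; then
      echo "$table"
    fi
  done
}
```

For example, inconsistent_tables HOLIDEX_EVNT peoples_mob prints one table name per line for every table that hbck did not report as clean.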

Region consistency problems are the lower-risk repairs. To repair regions that are not assigned (unassigned), assigned to the wrong place (incorrectly assigned) or assigned more than once (multiply assigned):

hbase hbck -fixAssignments

To remove META rows whose regions have no data in HDFS, and to add META rows for regions that have data in HDFS but no record in the META table:

hbase hbck -fixMeta

There are also a few classes of table integrity problems that are low-risk repairs, such as holes in the region chain in HDFS. Because repairing holes is a common operation, -repairHoles is provided as a shortcut for -fixAssignments -fixMeta -fixHdfsHoles:

hbase hbck -repairHoles

If there is a hole in the rowkey range, that is, the rowkeys of two adjacent regions are not contiguous, -fixHdfsHoles creates a new empty region in HDFS to fill it. The new region then has to be added to META and assigned with -fixMeta and -fixAssignments, so the three options are generally used together:

hbase hbck -fixAssignments -fixMeta -fixHdfsHoles

The following operations are very dangerous because they modify the underlying file system. Use them with caution!

First use hbck -details to review the reported problems in detail, and only then perform the following operations. If necessary, stop your applications first; otherwise data operations that run while the command executes may cause exceptions.

hbase hbck -fixHdfsOrphans
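Since these repairs change files in HDFS, one way to follow the "look at -details first" advice is to wrap it in a small helper that saves the report and asks for confirmation. This is only a sketch: hbck_fix_with_report and HBCK_REPORT_DIR are hypothetical names, not something hbck provides.

```shell
# Save an "hbck -details" report, then run hbck with the given fix options
# only after the operator confirms. Hypothetical helper for illustration.
hbck_fix_with_report() {
  local dir="${HBCK_REPORT_DIR:-/tmp}"
  local report="$dir/hbck-details-$(date +%Y%m%d-%H%M%S).txt"

  # Keep a copy of the detailed report from before anything is modified.
  hbase hbck -details > "$report" 2>&1
  echo "Detailed report saved to $report"

  # Refuse to continue unless the operator types "yes".
  printf 'Run "hbase hbck %s"? Type yes to continue: ' "$*"
  local answer
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "Aborted, nothing was changed."
    return 1
  fi
  hbase hbck "$@"
}
```

For example, hbck_fix_with_report -fixHdfsOrphans writes the report, waits for you to read it, and runs the repair only if you answer yes.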

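The -fixHdfsOrphans option handles region directories in the file system that have no metadata file (.regioninfo). It adopts such a directory into the HBase table and creates a .regioninfo file for it, so that the region can be assigned to a regionserver. The next command repairs overlapping regions: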

hbase hbck -fixHdfsOverlaps

When regions overlap, merging them directly into one large region can trigger a lot of split and compaction operations. The following options control how large the merged region can become:

-maxMerge <n> the maximum number of overlapping regions to merge into one large region.

-sidelineBigOverlaps if more than maxMerge regions overlap, sideline the regions that overlap with the most other regions instead of merging them.

-maxOverlapsToSideline <n> when sidelining overlapping regions, sideline at most n regions per group.
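As an illustration of how these options fit together, the sketch below only prints an hbck command line with explicit limits so you can review it before running it. overlap_fix_command is a made-up helper, and the defaults 5 and 2 are illustrative values; check hbase hbck -h on your version for the real defaults.

```shell
# Print an hbck command line for repairing overlaps with explicit limits.
# overlap_fix_command is a hypothetical helper; 5 and 2 are example values.
overlap_fix_command() {
  local max_merge="${1:-5}"
  local max_sideline="${2:-2}"
  # Reject anything that is not a non-negative integer.
  case "$max_merge$max_sideline" in
    *[!0-9]*) echo "limits must be non-negative integers" >&2; return 1 ;;
  esac
  echo "hbase hbck -fixHdfsOverlaps -maxMerge $max_merge -sidelineBigOverlaps -maxOverlapsToSideline $max_sideline"
}
```

Calling overlap_fix_command 3 1 prints a command that merges at most 3 overlapping regions and sidelines at most 1 region per group.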

The options above can be combined in a single run:

hbase hbck -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps

HBase needs a version file (hbase.version) in its data directory in order to start. If this file is missing, you can use the following command to create a new one, but make sure that the version of hbck is the same as the version of the HBase cluster:

hbase hbck -fixVersionFile

If the ROOT and META tables are damaged so badly that HBase will not start, you can use this command to rebuild them:

hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair

When a region splits, the parent region is removed automatically afterwards. Sometimes, however, a daughter region splits again before its parent region has been cleaned up. This leaves lingering offline parent regions that exist in the META table and in HDFS but are not deployed, and HBase cannot remove them. In this case you can use the following command to reset such regions in the META table to online and not split. After that you can repair them with the repair commands described earlier:

hbase hbck -fixSplitParents

How to manually merge regions:

After a cluster has been running for a while it may accumulate some small regions. Check for these regions regularly and merge them with their adjacent regions to reduce the total number of regions in the system and the administrative overhead. Turn the balancer off before the operation and turn it back on once the operation has completed.

To merge regions, use the steps below:

1. Find the encoded names of the regions that need to be merged

2. Enter the HBase shell

3. Run merge_region 'region1', 'region2'
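The steps above, together with the note about the balancer, can be sketched as a small generator that prints the HBase shell commands for one merge. merge_regions_script is a hypothetical name; balance_switch and merge_region are HBase shell commands, and the arguments must be encoded region names.

```shell
# Print HBase shell commands that merge two regions with the balancer
# switched off around the merge. Hypothetical helper for illustration.
merge_regions_script() {
  if [ "$#" -ne 2 ]; then
    echo "usage: merge_regions_script ENCODED_REGION_1 ENCODED_REGION_2" >&2
    return 1
  fi
  cat <<EOF
balance_switch false
merge_region '$1', '$2'
balance_switch true
EOF
}
```

You can review the output and then pipe it into the shell, for example merge_regions_script region1 region2 | hbase shell. The merge request may still be in progress when the shell returns, so you may prefer to re-enable the balancer by hand after confirming that the merge has finished.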

How to copy a table:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name='HOLIDEX_EVNT1' 'HOLIDEX_EVNT'

How to export/import an HBase table:

hbase org.apache.hadoop.hbase.mapreduce.Export HOLIDEX_EVNT /home/hbase/text.txt
hbase org.apache.hadoop.hbase.mapreduce.Import HOLIDEX_EVNT /home/hbase/text.txt

You can also copy the corresponding files directly in HDFS:

1. First copy the files in HDFS, for example:

hadoop distcp hdfs://srcnamenode:9000/hbase/testtable/ hdfs://distnamenode:9000/hbase/testtable/

2. Then execute a JRuby program with hbase org.jruby.Main to load the data into HBase and generate the META information, and restart HBase.

Keep on updating!!
