hive

Short Description:

Naming conventions help to ease programmer and architect to understand whats inside going on in a business.

Article

I have worked with almost 20 to 25 applications. Whenever i start working first i have to understand each applications naming convention and i keep thinking why we all not follow single naming convention. As Hadoop is evolving rapidly therefore would like to share my naming convention so that may be if you come to my project will feel comfortable and so as I if you follow too.

Database Names:

If application serve to technology then database name would be like

<APPID>_<TECHNOLOGY>_TBLS

<APPID>_<TECHNOLOGY>_VIEW

If application serve to vendor then database name would be like

<APPID>_<VENDORNAME>_TBLS

<APPID>_<VENDORNAME>_VIEW

If database application further required to divide by module then database name would be like

<APPLID>_<MODULE>_TBLS

<APPLID>_<MODULE>_VIEW

Fact Table Names:

TFXXX_<FREQUENCY>_<AGRT>

Note: AGGRT is will not be there for the table stores lowest granularity table. It will be added only to aggregate data table.

XXX: Range from 001 to 999 (We can set number according to our requirement)

FREQUENCY:

  • HOURLY (range from 201 to 399)
  • DAILY (range from 401 to 599)

 

External Table Names:

TEXXX_<FREQUENCY>

Dim Table Names:

TDXXX_<DIM_TYPE_NAME>

XXX: Range from 001 to 999

Lookup\Config tables

TLXXX_<REF>

XXX: Range from 001 to 999

Control tables:

 

TCXXX_<TABLENAME>

XXX: Range from 001 to 999

Temporary Tables:

 

TMP_<JOBNAME>_<Name>

Note: (This should be used for the tables which is created and dropped by job while it’s executing)

 

PRM_<JOBNAME>_<Name>

Note: (This should be used for the tables which are used to insert and drop data while it’s executing)

View Names:

VFXXX_<FREQUENCY>_<AGRT>

Note: AGGRT is will not be there for the table stores lowest granularity table. It will be added only to aggregate data table.

 

XXX: Range from 001 to 999

FREQUENCY:

  • HOURLY
  • DAILY etc

Column Names:

 

  • Should not start with number
  • Should not have any special chars except “_”
  • Start with a Capital letter
  • Few downstream databases have column limitation is 128 characters.

Stored Procs or HQL Query:

PSXXX_[<FREQUENCY>|<CALC>|<AGRT>|<DownStream>]

Example: PS001_ENGINEERING_HOURLY

XXX: Range from 001 to 999

Macro:

MCXXX_<MODULENAME>

XXX: Range from 001 to 999

UDF(Hadoop):

UDFXXX_<MODULENAME>

XXX: Range from 001 to 999

 

Index:

Index Names TFXXX _ PRI _ IDX#_<NUSI/USI>

IDX = constant for primary index

# = secondary index sequential numeric number(1, 2, 3, 4, …)

PRI – primary index (used to distribute data across amps and then for access performance

NUSI– non unique secondary index used for access performance

USI – unique secondary index used for access performance

Next Article i’ll share more naming convention on Oozie, file naming and Data types…

10 thoughts on “Hive Naming conventions and database naming…

  1. Εxcellent way of Ԁeѕcribing, and pleasant paragraph to take
    information regarding my presentation subjeсt, which i am going
    to present in ɑcademy.

  2. ѡhoah this weblog is fantastic i love reading your articles.
    Keep up the good work! Yοᥙ aⅼready know, lоts of indivіduals are hunting around for this information, you could aіd them greatly.

  3. Hеy this is kinda of off toρic but I was wanting to know if blogs use
    WYSIWYG eⅾitors or if you hаve to manually
    code with HᎢML. I’m starting a blog soon but have no coding
    expᥱrtise so I wanted to get advice from someone with experience.
    Any help would be grеatly appreciated!

  4. It’s amazіng to pay a quick visіt this site and reading the views of all mаtes about this post, while
    I am ɑlso keen of getting knowⅼedge.

Leave a Reply

Your email address will not be published. Required fields are marked *