Hive Naming conventions and database naming…

Short Description: Naming conventions make it easier for programmers and architects to understand what is going on inside a business. Article: I have worked with almost 20 to 25 applications. Whenever I start working on one, I first have to understand that application's naming convention, and I keep wondering why we don't all follow a single naming convention. As …

The ACID properties and the CAP theorem are two concepts in data management for distributed systems.

I started working on HBase again! I thought, why not refresh a few concepts before proceeding to the actual work? The important things that come to mind when we work with NoSQL in a distributed environment are sharding and partitions. Let’s dive into the ACID properties of databases and the CAP theorem for distributed systems. The ACID properties and the CAP theorem are two …

Encourage you to switch to Jupyter Lab…

Notebooks are great for prototyping, longer pipelines or processes. If you are a user of PyCharm or Jupyter Notebook and an exploratory data scientist, I would encourage you to switch to Jupyter Lab. For Jupyter Lab installation steps, go here. Below are some of the advantages that I see in using Jupyter Lab over Jupyter Notebook …

Why and when we need Machine Learning…

I’ve been working in data management and data quality for several years. When I ask people what their data management processes are, they often simply reply, “Well, we have some of our data stored in a database and other data stored on file shares with proper permissions.” This isn’t data management; it’s data storage. If you or your organization don’t have good, clean data, you are most definitely not ready for machine learning. Data management should be your first step before diving into any other data project.