
Think back to the days when we solved many business problems using dynamic SQL, exception handling, flow of control, and iteration. When I worked on a couple of migration projects, I found several business rules that needed to be transformed into Hive-compatible code (some of them very complex and nearly impossible to convert directly).

The solution is HPL/SQL (formerly PL/HQL), a language translation and execution layer developed by Dmitry Tolpeko (http://www.hplsql.org/).
Why HPL/SQL?

The role of Hadoop in data warehousing is huge. But to implement comprehensive ETL, reporting, analytics, and data mining processes, you not only need distributed processing engines such as MapReduce, Spark, or Tez; you also need a way to express comprehensive business rules.
HPL/SQL Makes SQL-on-Hadoop More Dynamic

One of the key features of HPL/SQL is that it allows you to make SQL much more dynamic. You can use advanced expressions, various built-in functions, and conditions to generate SQL on the fly based on user configuration, the results of previous queries, data from files or non-Hadoop data sources, and so on.
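As a minimal sketch of generating SQL on the fly, the following HPL/SQL snippet reads a table name from a configuration table and builds the query at run time (the `etl_config` table, its columns, and the job id are hypothetical names for illustration):

```sql
-- Hypothetical example: the target table name comes from a config table,
-- so the final query is assembled and executed dynamically.
DECLARE tab STRING;
DECLARE cnt INT;

SELECT target_table INTO tab FROM etl_config WHERE job_id = 1;

-- Build the SQL string on the fly and run it
EXECUTE 'SELECT COUNT(*) FROM ' || tab INTO cnt;

IF cnt = 0 THEN
  PRINT 'Table ' || tab || ' is empty, skipping load';
ELSE
  PRINT 'Processing ' || cnt || ' rows from ' || tab;
END IF;
```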
Hadoop Quick Start

HPL/SQL offers the fastest way to start working with Hadoop. Later you can re-design and implement advanced data processing workflows using Spark, Tez, Storm, Flink and other frameworks, but right now you can use your current skills and existing code to run your business logic on Hadoop.
Procedural SQL on Hadoop, NoSQL and RDBMS

HPL/SQL is an open source tool (Apache License 2.0) that implements procedural SQL language for Apache Hive, SparkSQL, Impala as well as any other SQL-on-Hadoop implementation, any NoSQL and any RDBMS.

HPL/SQL is a hybrid and heterogeneous language that understands the syntax and semantics of almost every existing procedural SQL dialect, and you can use it with any database: for example, running existing Oracle PL/SQL code on Apache Hive and Microsoft SQL Server, or running Transact-SQL on Oracle, Cloudera Impala, or Amazon Redshift.

The HPL/SQL language is compatible to a large extent with Oracle PL/SQL, ANSI/ISO SQL/PSM (e.g., IBM DB2, MySQL, Teradata), Teradata BTEQ, PostgreSQL PL/pgSQL (Netezza), and Transact-SQL (Microsoft SQL Server and Sybase), which allows you to leverage existing SQL/DWH skills and a familiar approach to implement data warehouse solutions on Hadoop. It also facilitates migration of existing business logic to Hadoop.
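To illustrate that compatibility, here is a small Oracle PL/SQL-style block of the kind HPL/SQL aims to run against Hive with little or no change (the `sales` table and its columns are hypothetical, and the exact subset of PL/SQL supported should be checked against the HPL/SQL reference):

```sql
-- PL/SQL-style block: DECLARE/BEGIN/EXCEPTION structure,
-- SELECT INTO, and DBMS_OUTPUT are familiar Oracle idioms.
DECLARE
  v_total DECIMAL(18,2);
BEGIN
  SELECT SUM(amount) INTO v_total
  FROM sales
  WHERE sale_date = CURRENT_DATE;

  IF v_total IS NULL THEN
    DBMS_OUTPUT.PUT_LINE('No sales today');
  ELSE
    DBMS_OUTPUT.PUT_LINE('Total: ' || v_total);
  END IF;
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('Query failed');
END;
```

The point is not the specific query but that the procedural shell around it (variables, conditions, exception handling) survives the move to Hadoop.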

HPL/SQL is an efficient way to implement ETL processes in Hadoop.
A use case solved using HPL/SQL

We have a use case that needs cursors: duplicate transactions that must be eliminated in Hive.
For example, Transaction A may have 4 duplicates. We open a cursor over Transaction A and its 4 duplicate rows.
We then compare row 1 to row 2 using complex logic; say row 1 is the winner, so we compare row 1 to row 3; say row 3 now wins, so we compare row 3 to row 4, with row 3 again the winner. We keep row 3 of Transaction A, then fetch the next set of duplicate rows for Transaction B and determine which row to keep there.
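The pairwise "winner" logic above can be sketched with an HPL/SQL cursor. This is only an outline under assumed names: the `transactions` table, its columns, and the `is_better()` comparison function are hypothetical stand-ins for the real complex logic:

```sql
-- Sketch: walk the duplicate rows of one transaction with a cursor,
-- carrying forward the current "winner" row id.
DECLARE winner_id BIGINT;
DECLARE cur_id BIGINT;

DECLARE cur CURSOR FOR
  SELECT row_id FROM transactions
  WHERE txn_key = 'A'
  ORDER BY row_id;

OPEN cur;
FETCH cur INTO winner_id;            -- the first row starts as the winner
FETCH cur INTO cur_id;               -- fetch its first challenger
WHILE cur%FOUND LOOP
  IF is_better(cur_id, winner_id) THEN  -- hypothetical comparison function
    winner_id := cur_id;                -- challenger becomes the new winner
  END IF;
  FETCH cur INTO cur_id;
END LOOP;
CLOSE cur;

PRINT 'Keeping row ' || winner_id;   -- e.g., row 3 of Transaction A
```

In the real job the outer loop would repeat this per transaction key (B, C, ...), and the losing rows would be deleted or filtered out in a final insert-overwrite.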
