Comprehensive Guide to Hive SQL Syntax and Operations
This article provides a detailed walkthrough of Hive SQL, categorizing its features and syntax for practical use. Hive SQL is segmented into the following categories:
-
DDL Statements:
- Operations on databases, including their creation, modification, viewing, and deletion.
- Table operations such as manipulating internal/external tables, partitions, and bucketed tables.
-
DQL Statements:
- Queries for single tables or multi-table joins.
- Hive functions including aggregate, conditional, date, and string-based operations.
- Advanced features like lateral views, transforms, and window analytics.
-
Miscellaneous:
- Techniques for data manipulation using load, export, and import.
- Converting rows to columns and vice versa.
Examples of specific use cases, syntax, and Hive SQL commands are provided in detail:
DDL Operations: Database and Table Management
- For database creation:
CREATE DATABASE IF NOT EXISTS my_database;
- Altering a database:
ALTER DATABASE my_database SET DBPROPERTIES ('created_at' = '2023-01-01');
- Table creation example:
CREATE TABLE students (id INT, name STRING);
- Table structure inspection:
DESCRIBE FORMATTED students;
Query Syntax (DQL)
Hive supports single-table queries aswell as table joins for complex queriees. Some examples:
- Single table query:
SELECT * FROM students WHERE id > 10;
- Joins:
SELECT a.name, b.course FROM students a JOIN courses b ON a.id = b.student_id;
Hive Functions:
- Aggregate Functions:
COUNT,SUM,AVG,MIN,MAX.
- Conditional Function:
CASE WHEN,IF,COALESCE.
- Date Functions:
- Manipulating and formatting dates (
unix_timestamp,to_date,date_sub).
- Manipulating and formatting dates (
- String Functions:
- Methods for manipulating text properties (
substr,concat,trim).
- Methods for manipulating text properties (
Advanced Operations: Lateral Views and JSON Parsing
Hive enables row-to-column transformation using lateral views:
SELECT exploded_item FROM my_table LATERAL VIEW EXPLODE(my_array_column) exploded_table AS exploded_item;
It also provides tools to extract specific keys from JSON strings:
SELECT get_json_object(json_column, '$.key') FROM my_table;
Window Functions:
Window functions allow calculations across partitions:
SELECT course_id, ROW_NUMBER() OVER (PARTITION BY category ORDER BY course_name) AS rank FROM course_table;
Other examples include RANK(), LEAD(), and LAG().
Grouping Sets and Rollups:
For aggregated reporting:
SELECT month, COUNT(*) AS user_count FROM log_data GROUP BY ROLLUP(month, day);
This enables bucket-level analysis.
This article is a practical referance for Hive SQL users, covering common use cases and advanced methodology for managing big data effectively.
For consistent updates and insights, follow the blog [Five Minutes Learning Big Data].