Fading Coder

An Old Coder’s Final Dance

You are here: Home > Tech > Content

Comprehensive Guide to Hive SQL Syntax and Operations

Tech 7

This article provides a detailed walkthrough of Hive SQL, categorizing its features and syntax for practical use. Hive SQL is segmented into the following categories:

  1. DDL Statements:

    • Operations on databases, including their creation, modification, viewing, and deletion.
    • Table operations such as manipulating internal/external tables, partitions, and bucketed tables.
  2. DQL Statements:

    • Queries for single tables or multi-table joins.
    • Hive functions including aggregate, conditional, date, and string-based operations.
    • Advanced features like lateral views, transforms, and window analytics.
  3. Miscellaneous:

    • Techniques for data manipulation using load, export, and import.
    • Converting rows to columns and vice versa.

Examples of specific use cases, syntax, and Hive SQL commands are provided in detail:

DDL Operations: Database and Table Management

  • For database creation:
CREATE DATABASE IF NOT EXISTS my_database;
  • Altering a database:
ALTER DATABASE my_database SET DBPROPERTIES ('created_at' = '2023-01-01');
  • Table creation example:
CREATE TABLE students (id INT, name STRING);
  • Table structure inspection:
DESCRIBE FORMATTED students;

Query Syntax (DQL)

Hive supports single-table queries aswell as table joins for complex queriees. Some examples:

  • Single table query:
SELECT * FROM students WHERE id > 10;
  • Joins:
SELECT a.name, b.course FROM students a JOIN courses b ON a.id = b.student_id;

Hive Functions:

  1. Aggregate Functions:
    • COUNT, SUM, AVG, MIN, MAX.
  2. Conditional Function:
    • CASE WHEN, IF, COALESCE.
  3. Date Functions:
    • Manipulating and formatting dates (unix_timestamp, to_date, date_sub).
  4. String Functions:
    • Methods for manipulating text properties (substr, concat, trim).

Advanced Operations: Lateral Views and JSON Parsing

Hive enables row-to-column transformation using lateral views:

SELECT exploded_item FROM my_table LATERAL VIEW EXPLODE(my_array_column) exploded_table AS exploded_item;

It also provides tools to extract specific keys from JSON strings:

SELECT get_json_object(json_column, '$.key') FROM my_table;

Window Functions:

Window functions allow calculations across partitions:

SELECT course_id, ROW_NUMBER() OVER (PARTITION BY category ORDER BY course_name) AS rank FROM course_table;

Other examples include RANK(), LEAD(), and LAG().

Grouping Sets and Rollups:

For aggregated reporting:

SELECT month, COUNT(*) AS user_count FROM log_data GROUP BY ROLLUP(month, day);

This enables bucket-level analysis.

This article is a practical referance for Hive SQL users, covering common use cases and advanced methodology for managing big data effectively.

For consistent updates and insights, follow the blog [Five Minutes Learning Big Data].

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.