Home > Tech > Content

PostgreSQL Aggregate Functions and HAVING Clause Implementation

Tech May 15 12

This article examines the implementation of aggregate functions and the HAVING clause in PostgreSQL from a source code perspective. We'll explore how PostgreSQL processes SQL queries containing aggregate operations through its execution pipeline.

The PostgreSQL query processing pipeline consists of five main stages:

flowchart TD main["**postgres** backend PostgresMain()"] parse["**PARSE**: pg_parse_query()"] analyze["**ANALYZE:** parse_analyze_fixedparams()"] rewrite["**REWRITE**: pg_rewrite_query()"] utilities["**DDL PROCESSOR**: ProcessUtility()"] optimize["**PLAN/OPTIMIZE**: pg_plan_queries()"] execute["**DML EXECUTOR**: ExecutePlan()"] main --> parse parse --> analyze analyze --> rewrite analyze --> utilities rewrite --> optimize optimize --> execute

To illustrate the concepts, we'll use the following sample table and query:

CREATE TABLE sales_data(product_id int, quantity int, price numeric);
INSERT INTO sales_data SELECT i % 100, random() * 10, random() * 100 
FROM generate_series(1, 5000) g(i);

SELECT product_id, SUM(quantity * price) 
FROM sales_data 
GROUP BY product_id;

Parser Stage

During the parsing phase, PostgreSQL converts the SQL text into an internal parse tree structure. The raw parse tree for our example query would look like:

(
   {RAWSTMT 
   :stmt 
      {SELECTSTMT
      :targetList (
         {RESTARGET 
         :val 
            {COLUMNREF
            :fields ("product_id")
            }
         }
         {RESTARGET
         :val 
            {FUNCCALL 
            :funcname ("sum")
            :args (
               {FUNCCALL 
               :funcname ("*")
               :args (
                  {COLUMNREF 
                  :fields ("quantity")
                  }
                  {COLUMNREF 
                  :fields ("price")
                  }
               )
            }
            :funcformat 0 
            }
         }
      )
      :fromClause (
         {RANGEVAR
         :relname sales_data 
         :inh true 
         :relpersistence p
         }
      )
      :groupClause (
         {COLUMNREF 
         :fields ("product_id")
         }
      )
    }
   }
)

Analyser Stage

The analyser transforms the raw parse tree into a more structured query tree, performing semantic analysis and validation:

{QUERY 
:commandType 1 
:hasAggs true 
:rtable (
  {RANGETBLENTRY 
  :eref 
     {ALIAS 
     :aliasname sales_data 
     :colnames ("product_id" "quantity" "price")
     }
  :rtekind 0 
  :relid 16401 
  :relkind r 
  }
)
:targetList (
  {TARGETENTRY 
  :expr 
     {VAR 
     :varno 1 
     :varattno 1 
     :vartype 23 
     }
  :resno 1 
  :resname product_id 
  :ressortgroupref 1 
  :resorigtbl 16401 
  :resorigcol 1 
  }
  {TARGETENTRY 
  :expr 
     {AGGREF 
     :aggfnoid 2147 
     :aggtype 20 
     :args (
        {TARGETENTRY 
        :expr 
           {FUNCCALL 
           :funcname ("*")
           :args (
              {TARGETENTRY 
              :expr 
                 {VAR 
                 :varno 1 
                 :varattno 2 
                 :vartype 23 
                 }
              :resno 1 
              }
              {TARGETENTRY 
              :expr 
                 {VAR 
                 :varno 1 
                 :varattno 3 
                 :vartype 701 
                 }
              :resno 2 
              }
           )
           }
        :resno 1 
        }
     )
     :aggkind n 
     }
  :resno 2 
  :resname sum 
  }
)
:groupClause (
  {SORTGROUPCLAUSE 
  :tleSortGroupRef 1 
  :eqop 96 
  :sortop 97 
  :hashable true
  }
)
}

Rewriter Stage

The rewriter applies query transformations and optimizations, producing the final query tree:

(
   {QUERY 
   :commandType 1 
   :canSetTag true 
   :hasAggs true 
   :rtable (
      {RANGETBLENTRY 
      :eref 
         {ALIAS 
         :aliasname sales_data 
         :colnames ("product_id" "quantity" "price")
         }
      :rtekind 0 
      :relid 16385 
      :relkind r 
      :rellockmode 1 
      :perminfoindex 1 
      :inh true 
      :inFromCl true 
      }
   )
   :rteperminfos (
      {RTEPERMISSIONINFO 
      :relid 16385 
      :inh true 
      :requiredPerms 2 
      :selectedCols (b 8 9 10)
      :insertedCols (b)
      :updatedCols (b)
      }
   )
   :jointree 
      {FROMEXPR 
      :fromlist (
         {RANGETBLREF 
         :rtindex 1
         }
      )
      }
   :targetList (
      {TARGETENTRY 
      :expr 
         {VAR 
         :varno 1 
         :varattno 1 
         :vartype 23 
         :varnullingrels (b)
         :varlevelsup 0 
         :varnosyn 1 
         :varattnosyn 1 
         }
      :resno 1 
      :resname product_id 
      :ressortgroupref 1 
      :resorigtbl 16385 
      :resorigcol 1 
      }
      {TARGETENTRY 
      :expr 
         {AGGREF 
         :aggfnoid 2147 
         :aggtype 20 
         :aggargtypes (o 23, o 701)
         :args (
            {TARGETENTRY 
            :expr 
               {FUNCCALL 
               :funcname ("*")
               :args (
                  {TARGETENTRY 
                  :expr 
                     {VAR 
                     :varno 1 
                     :varattno 2 
                     :vartype 23 
                     :varnullingrels (b)
                     :varnosyn 1 
                     :varattnosyn 2 
                     }
                  :resno 1 
                  }
                  {TARGETENTRY 
                  :expr 
                     {VAR 
                     :varno 1 
                     :varattno 3 
                     :vartype 701 
                     :varnullingrels (b)
                     :varnosyn 1 
                     :varattnosyn 3 
                     }
                  :resno 2 
                  }
               )
               }
            :resno 1 
            }
         )
         :aggkind n 
         }
      :resno 2 
      :resname sum 
      }
   )
   :groupClause (
      {SORTGROUPCLAUSE 
      :tleSortGroupRef 1 
      :eqop 96 
      :sortop 97 
      :nulls_first false 
      :hashable true
      }
   )
   }
)

Planner/Optimizer Stage

The planner generates an execution plan based on the query tree:

(
   {PLANNEDSTMT 
   :commandType 1 
   :canSetTag true 
   :planTree 
      {AGG 
      :plan.targetlist (
         {TARGETENTRY 
         :expr 
            {VAR 
            :varno -2 
            :varattno 1 
            :vartype 23 
            }
         :resno 1 
         :resname product_id 
         :ressortgroupref 1 
         :resorigtbl 16385 
         }
         {TARGETENTRY 
         :expr 
            {AGGREF 
            :aggfnoid 2147 
            :aggtype 20 
            :aggtranstype 20 
            :aggargtypes (o 23, o 701)
            :args (
               {TARGETENTRY 
               :expr 
                  {FUNCCALL 
                  :funcname ("*")
                  :args (
                     {TARGETENTRY 
                     :expr 
                        {VAR 
                        :varno -2 
                        :varattno 2 
                        :vartype 23 
                        :varnosyn 1 
                        :varattnosyn 2 
                        }
                     :resno 1 
                     }
                     {TARGETENTRY 
                     :expr 
                        {VAR 
                        :varno -2 
                        :varattno 3 
                        :vartype 701 
                        :varnosyn 1 
                        :varattnosyn 3 
                        }
                     :resno 2 
                     }
                  )
                  }
               :resno 1 
               }
            )
            :aggkind n 
            }
         :resno 2 
         :resname sum 
         }
      )
      :plan.lefttree 
         {SEQSCAN 
         :scan.plan.targetlist (
            {TARGETENTRY 
            :expr 
               {VAR 
               :varno 1 
               :varattno 1 
               :vartype 23 
               :varnosyn 1 
               :varattnosyn 1 
               }
            :resno 1 
            :ressortgroupref 1 
            }
            {TARGETENTRY 
            :expr 
               {VAR 
               :varno 1 
               :varattno 2 
               :vartype 23 
               :varnosyn 1 
               :varattnosyn 2 
               }
            :resno 2 
            }
            {TARGETENTRY 
            :expr 
               {VAR 
               :varno 1 
               :varattno 3 
               :vartype 701 
               :varnosyn 1 
               :varattnosyn 3 
               }
            :resno 3 
            }
         )
         :scan.scanrelid 1
         }
      :aggstrategy 2 
      :numCols 1 
      :grpColIdx ( 1)
      :grpOperators ( 96)
      :grpCollations ( 0)
      :numGroups 100 
      :aggParams (b)
      }
   :rtable (
      {RANGETBLENTRY 
      :eref 
         {ALIAS 
         :aliasname sales_data 
         :colnames ("product_id" "quantity" "price")
         }
      :rtekind 0 
      :relid 16385 
      :relkind r 
      :rellockmode 1 
      :perminfoindex 1 
      :inFromCl true 
      }
   )
   :permInfos (
      {RTEPERMISSIONINFO 
      :relid 16385 
      :inh true 
      :requiredPerms 2 
      :selectedCols (b 8 9 10)
      :insertedCols (b)
      :updatedCols (b)
      }
   )
   :relationOids (o 16385)
   }
)

Executor Stage

The executor is responsible for actually executing the plan. For aggregate operations, PostgreSQL uses the execAgg function to handle the Agg node.

Aggregate Strategies

PostgreSQL employs several strategies for handling aggregates:

AGG_HASHED: Builds a hash table using group key columns as hash keys. This allows fast aggregation for arbitrary groupings but requires additional memory.
AGG_MIXED: Combines hashed and sorted aggregation, switching between them as needed to balance memory usage and performance.
AGG_PLAIN: Performs naive aggregation by scanning input tuples repeatedly, updating aggregates for each group encountered.
AGG_SORTED: First sorts input data on group key columns, then computes aggregates within each consecutive group.

Data Structures

Several key data structures support aggregation operations:

AggStatePerAggData: Contains state for each aggregate function (sum, count, etc.)
AggStatePerTransData: Holds transition state for distributive functions
AggStatePerGroupData: Stores intermediate data per grouping set
AggStatePerPhaseData: Manages state between aggregation phases
AggStatePerHashData: Handles hash table operations

Aggregate Execution Flow

The agg_fill_hash_table function is central to hash-based aggrgeation. It fetches tuples and applies aggregate functions to each group.

Tuple Fetching

SeqNext is the access method that retrieves data using table_scan_getnextslot. PostgreSQL uses TableHashSlot to hold tuple data efficiently.

Minimal Tuples

A minimal tuple in PostgreSQL is a compact representation containing only essential data. Key characteristics include:

Stores only user column data with null flags
Excludes control information like xmin, cmax
Data not aligned on byte boundaries
Smaller header than standard heap tuples
Requires TupleDesc for interpretation

Minimal tuples are used in:

Hash table tuples for hashed aggregation
Intermediate storage for grouping operations
Compact tuple representation in memory
Network transmission to reduce ovehread

System Catalogs

Aggregate functions are defined in system catalogs:

pg_proc: Contains function definitions
pg_aggregate: Stores aggregate-specific metadata

For example, the COUNT function is defined as:

name	value
aggfnoid	pg_catalog.count
aggkind	n
aggtransfn	int8inc
aggcombinefn	int8pl
aggtranstype	20
agginitval	0

Execution Steps

The evaluation steps for our example query would be:

EEOP_OUTER_FETCHSOME
EEOP_OUTER_VAR
EEOP_AGG_STRICT_INPUT_CHECK_ARGS
EEOP_AGG_PLAIN_TRANS_STRICT_BYVAL
EEOP_DONE

In the EEOP_AGG_PLAIN_TRANS_STRICT_BYVAL step, the transition function logic is executed, storing intermediate values in pergroup->transValue.

Result Retrieval

When retrieving results from the hash table:

EEOP_OUTER_FETCHSOME
EEOP_ASSIGN_OUTER_VAR
EEOP_AGGREF
EEOP_ASSIGN_TMP
EEOP_DONE

This sequence extracts the final aggregated values from the hash table and prepares them for output.

Back to List

Prev: Efficient Problem Solutions for Programming Contests

Next: Essential JavaScript Fundamentals for React Development

Fading Coder

PostgreSQL Aggregate Functions and HAVING Clause Implementation

Parser Stage

Analyser Stage

Rewriter Stage

Planner/Optimizer Stage

Executor Stage

Aggregate Strategies

Data Structures

Aggregate Execution Flow

Tuple Fetching

Minimal Tuples

System Catalogs

Execution Steps

Result Retrieval

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a Comment

Copyright © fadingcoder.top

Fading Coder

PostgreSQL Aggregate Functions and HAVING Clause Implementation

Parser Stage

Analyser Stage

Rewriter Stage

Planner/Optimizer Stage

Executor Stage

Aggregate Strategies

Data Structures

Aggregate Execution Flow

Tuple Fetching

Minimal Tuples

System Catalogs

Execution Steps

Result Retrieval

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a CommentCancel Reply

Copyright © fadingcoder.top

Leave a Comment