Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

PostgreSQL Aggregate Functions and HAVING Clause Implementation

Tech May 15 1

This article examines the implementation of aggregate functions and the HAVING clause in PostgreSQL from a source code perspective. We'll explore how PostgreSQL processes SQL queries containing aggregate operations through its execution pipeline.

The PostgreSQL query processing pipeline consists of five main stages:

flowchart TD main["**postgres** backend PostgresMain()"] parse["**PARSE**: pg_parse_query()"] analyze["**ANALYZE:** parse_analyze_fixedparams()"] rewrite["**REWRITE**: pg_rewrite_query()"] utilities["**DDL PROCESSOR**: ProcessUtility()"] optimize["**PLAN/OPTIMIZE**: pg_plan_queries()"] execute["**DML EXECUTOR**: ExecutePlan()"] main --> parse parse --> analyze analyze --> rewrite analyze --> utilities rewrite --> optimize optimize --> execute

To illustrate the concepts, we'll use the following sample table and query:

CREATE TABLE sales_data(product_id int, quantity int, price numeric);
INSERT INTO sales_data SELECT i % 100, random() * 10, random() * 100 
FROM generate_series(1, 5000) g(i);

SELECT product_id, SUM(quantity * price) 
FROM sales_data 
GROUP BY product_id;

Parser Stage

During the parsing phase, PostgreSQL converts the SQL text into an internal parse tree structure. The raw parse tree for our example query would look like:

(
   {RAWSTMT 
   :stmt 
      {SELECTSTMT
      :targetList (
         {RESTARGET 
         :val 
            {COLUMNREF
            :fields ("product_id")
            }
         }
         {RESTARGET
         :val 
            {FUNCCALL 
            :funcname ("sum")
            :args (
               {FUNCCALL 
               :funcname ("*")
               :args (
                  {COLUMNREF 
                  :fields ("quantity")
                  }
                  {COLUMNREF 
                  :fields ("price")
                  }
               )
            }
            :funcformat 0 
            }
         }
      )
      :fromClause (
         {RANGEVAR
         :relname sales_data 
         :inh true 
         :relpersistence p
         }
      )
      :groupClause (
         {COLUMNREF 
         :fields ("product_id")
         }
      )
    }
   }
)

Analyser Stage

The analyser transforms the raw parse tree into a more structured query tree, performing semantic analysis and validation:

{QUERY 
:commandType 1 
:hasAggs true 
:rtable (
  {RANGETBLENTRY 
  :eref 
     {ALIAS 
     :aliasname sales_data 
     :colnames ("product_id" "quantity" "price")
     }
  :rtekind 0 
  :relid 16401 
  :relkind r 
  }
)
:targetList (
  {TARGETENTRY 
  :expr 
     {VAR 
     :varno 1 
     :varattno 1 
     :vartype 23 
     }
  :resno 1 
  :resname product_id 
  :ressortgroupref 1 
  :resorigtbl 16401 
  :resorigcol 1 
  }
  {TARGETENTRY 
  :expr 
     {AGGREF 
     :aggfnoid 2147 
     :aggtype 20 
     :args (
        {TARGETENTRY 
        :expr 
           {FUNCCALL 
           :funcname ("*")
           :args (
              {TARGETENTRY 
              :expr 
                 {VAR 
                 :varno 1 
                 :varattno 2 
                 :vartype 23 
                 }
              :resno 1 
              }
              {TARGETENTRY 
              :expr 
                 {VAR 
                 :varno 1 
                 :varattno 3 
                 :vartype 701 
                 }
              :resno 2 
              }
           )
           }
        :resno 1 
        }
     )
     :aggkind n 
     }
  :resno 2 
  :resname sum 
  }
)
:groupClause (
  {SORTGROUPCLAUSE 
  :tleSortGroupRef 1 
  :eqop 96 
  :sortop 97 
  :hashable true
  }
)
}

Rewriter Stage

The rewriter applies query transformations and optimizations, producing the final query tree:

(
   {QUERY 
   :commandType 1 
   :canSetTag true 
   :hasAggs true 
   :rtable (
      {RANGETBLENTRY 
      :eref 
         {ALIAS 
         :aliasname sales_data 
         :colnames ("product_id" "quantity" "price")
         }
      :rtekind 0 
      :relid 16385 
      :relkind r 
      :rellockmode 1 
      :perminfoindex 1 
      :inh true 
      :inFromCl true 
      }
   )
   :rteperminfos (
      {RTEPERMISSIONINFO 
      :relid 16385 
      :inh true 
      :requiredPerms 2 
      :selectedCols (b 8 9 10)
      :insertedCols (b)
      :updatedCols (b)
      }
   )
   :jointree 
      {FROMEXPR 
      :fromlist (
         {RANGETBLREF 
         :rtindex 1
         }
      )
      }
   :targetList (
      {TARGETENTRY 
      :expr 
         {VAR 
         :varno 1 
         :varattno 1 
         :vartype 23 
         :varnullingrels (b)
         :varlevelsup 0 
         :varnosyn 1 
         :varattnosyn 1 
         }
      :resno 1 
      :resname product_id 
      :ressortgroupref 1 
      :resorigtbl 16385 
      :resorigcol 1 
      }
      {TARGETENTRY 
      :expr 
         {AGGREF 
         :aggfnoid 2147 
         :aggtype 20 
         :aggargtypes (o 23, o 701)
         :args (
            {TARGETENTRY 
            :expr 
               {FUNCCALL 
               :funcname ("*")
               :args (
                  {TARGETENTRY 
                  :expr 
                     {VAR 
                     :varno 1 
                     :varattno 2 
                     :vartype 23 
                     :varnullingrels (b)
                     :varnosyn 1 
                     :varattnosyn 2 
                     }
                  :resno 1 
                  }
                  {TARGETENTRY 
                  :expr 
                     {VAR 
                     :varno 1 
                     :varattno 3 
                     :vartype 701 
                     :varnullingrels (b)
                     :varnosyn 1 
                     :varattnosyn 3 
                     }
                  :resno 2 
                  }
               )
               }
            :resno 1 
            }
         )
         :aggkind n 
         }
      :resno 2 
      :resname sum 
      }
   )
   :groupClause (
      {SORTGROUPCLAUSE 
      :tleSortGroupRef 1 
      :eqop 96 
      :sortop 97 
      :nulls_first false 
      :hashable true
      }
   )
   }
)

Planner/Optimizer Stage

The planner generates an execution plan based on the query tree:

(
   {PLANNEDSTMT 
   :commandType 1 
   :canSetTag true 
   :planTree 
      {AGG 
      :plan.targetlist (
         {TARGETENTRY 
         :expr 
            {VAR 
            :varno -2 
            :varattno 1 
            :vartype 23 
            }
         :resno 1 
         :resname product_id 
         :ressortgroupref 1 
         :resorigtbl 16385 
         }
         {TARGETENTRY 
         :expr 
            {AGGREF 
            :aggfnoid 2147 
            :aggtype 20 
            :aggtranstype 20 
            :aggargtypes (o 23, o 701)
            :args (
               {TARGETENTRY 
               :expr 
                  {FUNCCALL 
                  :funcname ("*")
                  :args (
                     {TARGETENTRY 
                     :expr 
                        {VAR 
                        :varno -2 
                        :varattno 2 
                        :vartype 23 
                        :varnosyn 1 
                        :varattnosyn 2 
                        }
                     :resno 1 
                     }
                     {TARGETENTRY 
                     :expr 
                        {VAR 
                        :varno -2 
                        :varattno 3 
                        :vartype 701 
                        :varnosyn 1 
                        :varattnosyn 3 
                        }
                     :resno 2 
                     }
                  )
                  }
               :resno 1 
               }
            )
            :aggkind n 
            }
         :resno 2 
         :resname sum 
         }
      )
      :plan.lefttree 
         {SEQSCAN 
         :scan.plan.targetlist (
            {TARGETENTRY 
            :expr 
               {VAR 
               :varno 1 
               :varattno 1 
               :vartype 23 
               :varnosyn 1 
               :varattnosyn 1 
               }
            :resno 1 
            :ressortgroupref 1 
            }
            {TARGETENTRY 
            :expr 
               {VAR 
               :varno 1 
               :varattno 2 
               :vartype 23 
               :varnosyn 1 
               :varattnosyn 2 
               }
            :resno 2 
            }
            {TARGETENTRY 
            :expr 
               {VAR 
               :varno 1 
               :varattno 3 
               :vartype 701 
               :varnosyn 1 
               :varattnosyn 3 
               }
            :resno 3 
            }
         )
         :scan.scanrelid 1
         }
      :aggstrategy 2 
      :numCols 1 
      :grpColIdx ( 1)
      :grpOperators ( 96)
      :grpCollations ( 0)
      :numGroups 100 
      :aggParams (b)
      }
   :rtable (
      {RANGETBLENTRY 
      :eref 
         {ALIAS 
         :aliasname sales_data 
         :colnames ("product_id" "quantity" "price")
         }
      :rtekind 0 
      :relid 16385 
      :relkind r 
      :rellockmode 1 
      :perminfoindex 1 
      :inFromCl true 
      }
   )
   :permInfos (
      {RTEPERMISSIONINFO 
      :relid 16385 
      :inh true 
      :requiredPerms 2 
      :selectedCols (b 8 9 10)
      :insertedCols (b)
      :updatedCols (b)
      }
   )
   :relationOids (o 16385)
   }
)

Executor Stage

The executor is responsible for actually executing the plan. For aggregate operations, PostgreSQL uses the execAgg function to handle the Agg node.

Aggregate Strategies

PostgreSQL employs several strategies for handling aggregates:

  • AGG_HASHED: Builds a hash table using group key columns as hash keys. This allows fast aggregation for arbitrary groupings but requires additional memory.
  • AGG_MIXED: Combines hashed and sorted aggregation, switching between them as needed to balance memory usage and performance.
  • AGG_PLAIN: Performs naive aggregation by scanning input tuples repeatedly, updating aggregates for each group encountered.
  • AGG_SORTED: First sorts input data on group key columns, then computes aggregates within each consecutive group.

Data Structures

Several key data structures support agggregation operations:

  • AggStatePerAggData: Contains state for each aggregate function (sum, count, etc.)
  • AggStatePerTransData: Holds transition state for distributive functions
  • AggStatePerGroupData: Stores intermediate data per grouping set
  • AggStatePerPhaseData: Manages state between aggregation phases
  • AggStatePerHashData: Handles hash table operations

Aggregate Execution Flow

The agg_fill_hash_table function is central to hash-based aggregation. It fetches tuples and applies aggregate functions to each group.

Tuple Fetching

SeqNext is the access method that retrieves data using table_scan_getnextslot. PostgreSQL uses TableHashSlot to hold tuple data efficiently.

Minimal Tuples

A minimal tuple in PostgreSQL is a compact representation containing only essential data. Key characteristics include:

  • Stores only user column data with null flags
  • Excludes control information like xmin, cmax
  • Data not aligned on byte boundaries
  • Smaller header than standard heap tuples
  • Requires TupleDesc for interpretation

Minimal tuples are used in:

  • Hash table tuples for hashed aggregation
  • Intermediate storage for grouping operations
  • Compact tuple representation in memory
  • Network transmission to reduce overhead
System Catalogs

Aggregate functions are defined in system catalogs:

  • pg_proc: Contains function definitions
  • pg_aggregate: Stores aggregate-specific metadata

For example, the COUNT function is defined as:

namevalueaggfnoidpg_catalog.countaggkindnaggtransfnint8incaggcombinefnint8plaggtranstype20agginitval0

Execution Steps

The evaluation steps for our example query would be:

EEOP_OUTER_FETCHSOME
EEOP_OUTER_VAR
EEOP_AGG_STRICT_INPUT_CHECK_ARGS
EEOP_AGG_PLAIN_TRANS_STRICT_BYVAL
EEOP_DONE

In the EEOP_AGG_PLAIN_TRANS_STRICT_BYVAL step, the transition function logic is executed, storing intermediate values in pergroup->transValue.

Result Retrieval

When retrieving results from the hash table:

EEOP_OUTER_FETCHSOME
EEOP_ASSIGN_OUTER_VAR
EEOP_AGGREF
EEOP_ASSIGN_TMP
EEOP_DONE

This sequence extracts the final aggregated values from the hash table and prepares them for output.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.