Understanding Heaps: Implementation, Heap Sort, and Top-K Algorithms
Introduction to Heaps
A heap is a specialized tree-based data structure that satisfies strict ordering properties. Conceptually, it represents a complete binary tree, but physically it is stored sequentially within a one-dimensional array. In a min-heap, every parent node's value is less than or equal to its children's values. Conversely, a max-heap ensures every parent is greater than or equal to its children. Consequently, the root node always contains the minimum value in a min-heap and the maximum value in a max-heap.
Core Properties
- Parent-Child Inequality: For any node
i, the relationshipheap[parent(i)] <= heap[i](min-heap) orheap[parent(i)] >= heap[i](max-heap) must always hold. - Structural Completeness: The underlying binary tree is always complete, meaning all levels are fully populated except possibly the last, which fills from left to right.
Heap Implementation (Min-Heap)
Because heaps rely on contiguous memory storage for efficient indexing, dynamic arrays are the standard physical representation.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
typedef int ElementType;
typedef struct {
ElementType* elements;
size_t count;
size_t capacity;
} MinHeap;
Primary Operations
Maintaining the heap invariant requires two directional traversal algorithms that restore order when a node violates the structural rules.
1. Downward Sifting (Heapify Down)
This algorithm is applied when a node's value disrupts the heap order, but its subtrees are already valid. Starting from a given parent index, it repeatedly compares the node with its children, swapping with the smaller child until the invariant is restored or the leaves are reached.
Array indexing follows the rule: for a parent at i, children reside at 2i + 1 and 2i + 2.
static void exchange(ElementType* a, ElementType* b) {
assert(a && b);
ElementType temp = *a;
*a = *b;
*b = temp;
}
void sift_down(ElementType* arr, size_t boundary, size_t start_idx) {
size_t cursor = start_idx;
while (true) {
size_t left_idx = 2 * cursor + 1;
size_t right_idx = left_idx + 1;
if (left_idx >= boundary) {
break; // Reached leaf level
}
// Identify the smaller child
size_t target = left_idx;
if (right_idx < boundary && arr[right_idx] < arr[left_idx]) {
target = right_idx;
}
if (arr[target] < arr[cursor]) {
exchange(&arr[cursor], &arr[target]);
cursor = target;
} else {
break; // Heap property satisfied
}
}
}
Time Complexity: O(log N)
2. Upward Sifting (Bubble Up)
Upward sifting is primarily used after appending a new element to the array. The element moves toward the root as long as it is smaller than its parent.
The parent index is derived via (child - 1) / 2. This formula works for both left and right children due to integer division truncation.
void sift_up(ElementType* arr, size_t child_idx) {
while (child_idx > 0) {
size_t parent_idx = (child_idx - 1) / 2;
if (arr[child_idx] < arr[parent_idx]) {
exchange(&arr[child_idx], &arr[parent_idx]);
child_idx = parent_idx;
} else {
break;
}
}
}
Time Complexity: O(log N)
3. Heap Construction
Bottom-Up Approach (Optimal)
By starting at the last non-leaf node and moving backward to the root, we can transform an unsorted array into a heap using downward sifting. Since leaf nodes inherently satisfy the heap condition, skipping them reduces redundant operations.
void construct_heap(ElementType* data, size_t length) {
if (length < 2) return;
// Last non-leaf node is at (length / 2) - 1
for (int i = (length / 2) - 1; i >= 0; i--) {
sift_down(data, length, (size_t)i);
}
}
Time Complexity: O(N)
Top-Down Approach
Iterates through the array from left to right, applying upward sifting at each step. While simpler conceptually, its computationally heavier.
void build_incremental(ElementType* data, size_t length) {
for (size_t i = 1; i < length; i++) {
sift_up(data, i);
}
}
Time Complexity: O(N log N)
4. Initialization
Efficiently loads raw data and applies the O(N) construction strategy.
void heap_setup(MinHeap* h, const ElementType* src, size_t n) {
assert(h && n >= 0);
if (n == 0) {
h->elements = NULL;
h->count = h->capacity = 0;
return;
}
assert(src);
h->elements = (ElementType*)malloc(sizeof(ElementType) * n);
if (!h->elements) {
fprintf(stderr, "Allocation failed\n");
exit(EXIT_FAILURE);
}
memcpy(h->elements, src, sizeof(ElementType) * n);
h->capacity = n;
h->count = n;
construct_heap(h->elements, h->count);
}
5. Insertion
Appends data to the end and restores order via upward sifting, leveraging the existing heap structure for O(log N) efficiency.
static void resize_buffer(MinHeap* h) {
size_t next_cap = (h->capacity == 0) ? 8 : h->capacity * 2;
ElementType* resized = (ElementType*)realloc(h->elements, sizeof(ElementType) * next_cap);
if (!resized) {
fprintf(stderr, "Reallocation failed\n");
exit(EXIT_FAILURE);
}
h->elements = resized;
h->capacity = next_cap;
}
void heap_insert(MinHeap* h, ElementType val) {
assert(h);
if (h->count == h->capacity) resize_buffer(h);
h->elements[h->count] = val;
sift_up(h->elements, h->count);
h->count++;
}
6. Root Removal
Removes the minimum element efficiently by swapping it with the last node, decrementing the size, and triggering a downward sift from the root.
ElementType heap_extract(MinHeap* h) {
assert(h && h->count > 0);
ElementType min_val = h->elements[0];
h->count--;
h->elements[0] = h->elements[h->count];
sift_down(h->elements, h->count, 0);
return min_val;
}
Utility Functions
void display_heap(const MinHeap* h) {
assert(h);
for (size_t i = 0; i < h->count; i++) {
printf("%d ", h->elements[i]);
}
printf("\n");
}
void heap_release(MinHeap* h) {
assert(h);
free(h->elements);
h->elements = NULL;
h->count = h->capacity = 0;
}
ElementType heap_peek(const MinHeap* h) {
assert(h && h->count > 0);
return h->elements[0];
}
size_t heap_length(const MinHeap* h) {
assert(h);
return h->count;
}
int heap_is_empty(const MinHeap* h) {
assert(h);
return h->count == 0;
}
Algorithm Applications
Heap Sort
Heap sort achieves in-place sorting without requiring auxiliary arrays. To sort in ascending order:
- Build a max-heap from the input data.
- Swap the root (maximum) with the last element.
- Reduce the considered heap size by one and apply downward sifting on the new root.
- Repeat until the heap size reaches 1.
Note: The following snippet demonstrates the iterative partition reduction mechanism.
void sort_via_heap(MinHeap* h) {
assert(h);
size_t limit = h->count - 1;
while (limit > 0) {
exchange(&h->elements[0], &h->elements[limit]);
// In practice, use max-heap sift_down here for ascending sort
sift_down(h->elements, limit, 0);
limit--;
}
}
Overall Complexity: O(N log N)
The Top-K Problem
The Top-K problem requires extracting the K largest (or smallest) items from a massive dataset N, where N >> K. Fully sorting the dataset is inefficient and memory-prohibitive.
Optimal Strategy: Maintain a min-heap of size K.
- Initialize the heap with the first
Kelements. - Iterate through the remaining
N-Kelements. If a new element exceeds the heap's root, replace the root and restore the heap property. - The heap retains the
Klargest elements encountered. This approach limits memory usage toO(K)and runs inO(N log K)time.
#include <time.h>
void create_mock_data(const char* path, size_t records) {
FILE* fp = fopen(path, "w");
if (!fp) exit(EXIT_FAILURE);
srand((unsigned)time(NULL));
while (records-- > 0) {
fprintf(fp, "%d\n", rand() % 100000);
}
fclose(fp);
}
void extract_top_k(const char* path, size_t k) {
FILE* stream = fopen(path, "r");
if (!stream) exit(EXIT_FAILURE);
ElementType* buffer = (ElementType*)malloc(sizeof(ElementType) * k);
if (!buffer) exit(EXIT_FAILURE);
// Read initial K values
for (size_t i = 0; i < k; i++) {
fscanf(stream, "%d", &buffer[i]);
}
// Transform to min-heap
construct_heap(buffer, k);
// Process remainder of stream
int candidate;
while (fscanf(stream, "%d", &candidate) == 1) {
if (candidate > buffer[0]) {
buffer[0] = candidate;
sift_down(buffer, k, 0);
}
}
printf("Top %zu values:\n", k);
for (size_t i = 0; i < k; i++) printf("%d ", buffer[i]);
printf("\n");
free(buffer);
fclose(stream);
}
For datasets fitting in RAM, file operations can be replaced with direct array iteration.
Summary
Heaps are foundational structures for priority queues and efficient data retrieval. Mastering downward and upward adjustment techniques, along with optimal construction strategies, enables developers to solve complex sorting and streaming selection problems with predictable time and space complexity.