In-Place Heap Sorting Strategies and Top-K Element Extraction
Binary Heap Sorting Fundamentals
Heap-based ordering relies on the binary tree property to organize sequences efficiently. The algorithm operates in two distinct phases: initial heap construction followed by iterative extraction. The chosen heap variant determines the final sequence direction: constructing a maximum-heap yields an ascending arrangement, whereas a minimum-heap produces a descending order.
Auxiliary Structrue Approach
A straightforward implemantation allocates seperate memory for an independent heap object. Each input element is pushed sequentially, followed by repeated popping to overwrite the original buffer. This method operates in $O(N \log N)$ time but demands extra memory allocation and data duplication, making it unsuitable for memory-constrained environments.
Optimized In-Place Rearrangement
Eliminating auxiliary storage requires operating directly on the target array. To achieve ascending order, the routine first transforms the raw data into a maximum-heap. The root node (containing the largest value) is swapped with the final element, effectively locking the maximum value into its correct sorted position. The active heap boundary shrinks, and the displaced root undergoes a downward restructuring process to restore heap invariants. Repeating this cycle until the boundary collapses sorts the entire array in $O(N \log N)$ time with $O(1)$ auxiliary space.
#include <stdio.h>
#include <stdlib.h>
static void exchange_elements(int* p_left, int* p_right) {
int temporary = *p_left;
*p_left = *p_right;
*p_right = temporary;
}
static void percolate_down(int* buffer, int heap_limit, int pivot_idx) {
while (1) {
int target = pivot_idx;
int left = 2 * pivot_idx + 1;
int right = 2 * pivot_idx + 2;
if (left < heap_limit && buffer[left] > buffer[target]) {
target = left;
}
if (right < heap_limit && buffer[right] > buffer[target]) {
target = right;
}
if (target == pivot_idx) {
break;
}
exchange_elements(&buffer[pivot_idx], &buffer[target]);
pivot_idx = target;
}
}
void arrange_ascending(int* sequence, int length) {
// Build maximum-heap from bottom up
for (int cursor = length / 2 - 1; cursor >= 0; --cursor) {
percolate_down(sequence, length, cursor);
}
// Extract and restructure
for (int boundary = length - 1; boundary > 0; --boundary) {
exchange_elements(&sequence[0], &sequence[boundary]);
percolate_down(sequence, boundary, 0);
}
}
int main() {
int dataset[] = {12, 5, 8, 19, 3, 7, 14, 6, 9};
int total = sizeof(dataset) / sizeof(dataset[0]);
arrange_ascending(dataset, total);
for (int idx = 0; idx < total; ++idx) {
printf("%d ", dataset[idx]);
}
return 0;
}
Top-K Element Selection from Massive Streams
Identifying the largest K items from a dataset containing N entries becomes computationally expensive when N is extremely large. Loading the entire collection into memory and performing a complete sort is impractical. A fixed-capacity minimum-heap provides an optimal streaming solution.
The procedure begins by populating a min-heap of size K with the initial batch of elements. As subsequent values arrive from the data stream, each candidate is evaluated against the heap's root, which represents the current K-th largest threshold. If a new value surpasses the root, it overwrites it, and the structure is immediately re-balanced using a downward sift. This guarantees that the heap continuously retains the largest K items encountered. Once the stream ends, the heap holds the exact top-K subset.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define SAMPLE_COUNT 1000000
#define VALUE_RANGE 10000000
static void populate_test_data(const char* target_path) {
srand((unsigned int)time(NULL));
FILE* output = fopen(target_path, "w");
if (!output) {
perror("Failed to create data file");
exit(EXIT_FAILURE);
}
for (int i = 0; i < SAMPLE_COUNT; ++i) {
fprintf(output, "%d\n", rand() % VALUE_RANGE);
}
fclose(output);
}
static void restore_min_property(int* heap, int capacity, int start_node) {
int current = start_node;
while (1) {
int smallest = current;
int left_child = 2 * current + 1;
int right_child = 2 * current + 2;
if (left_child < capacity && heap[left_child] < heap[smallest]) {
smallest = left_child;
}
if (right_child < capacity && heap[right_child] < heap[smallest]) {
smallest = right_child;
}
if (smallest == current) {
break;
}
int swap_val = heap[current];
heap[current] = heap[smallest];
heap[smallest] = swap_val;
current = smallest;
}
}
void fetch_top_k(const char* source_path, int k) {
FILE* input = fopen(source_path, "r");
if (!input) {
perror("Failed to open source file");
exit(EXIT_FAILURE);
}
int* candidate_pool = (int*)malloc(k * sizeof(int));
if (!candidate_pool) {
perror("Memory allocation failed");
exit(EXIT_FAILURE);
}
// Initialize pool with first K records
for (int pos = 0; pos < k; ++pos) {
if (fscanf(input, "%d", &candidate_pool[pos]) != 1) break;
}
// Transform pool into min-heap
for (int pivot = k / 2 - 1; pivot >= 0; --pivot) {
restore_min_property(candidate_pool, k, pivot);
}
// Process remaining stream
int incoming_value;
while (fscanf(input, "%d", &incoming_value) == 1) {
if (incoming_value > candidate_pool[0]) {
candidate_pool[0] = incoming_value;
restore_min_property(candidate_pool, k, 0);
}
}
fclose(input);
// Display extracted results
for (int idx = 0; idx < k; ++idx) {
printf("%d ", candidate_pool[idx]);
}
printf("\n");
free(candidate_pool);
}
int main() {
const char* file_ref = "stream_data.txt";
populate_test_data(file_ref);
fetch_top_k(file_ref, 7);
return 0;
}