Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

In-Place Heap Sorting Strategies and Top-K Element Extraction

Tech 1

Binary Heap Sorting Fundamentals

Heap-based ordering relies on the binary tree property to organize sequences efficiently. The algorithm operates in two distinct phases: initial heap construction followed by iterative extraction. The chosen heap variant determines the final sequence direction: constructing a maximum-heap yields an ascending arrangement, whereas a minimum-heap produces a descending order.

Auxiliary Structrue Approach

A straightforward implemantation allocates seperate memory for an independent heap object. Each input element is pushed sequentially, followed by repeated popping to overwrite the original buffer. This method operates in $O(N \log N)$ time but demands extra memory allocation and data duplication, making it unsuitable for memory-constrained environments.

Optimized In-Place Rearrangement

Eliminating auxiliary storage requires operating directly on the target array. To achieve ascending order, the routine first transforms the raw data into a maximum-heap. The root node (containing the largest value) is swapped with the final element, effectively locking the maximum value into its correct sorted position. The active heap boundary shrinks, and the displaced root undergoes a downward restructuring process to restore heap invariants. Repeating this cycle until the boundary collapses sorts the entire array in $O(N \log N)$ time with $O(1)$ auxiliary space.

#include <stdio.h>
#include <stdlib.h>

static void exchange_elements(int* p_left, int* p_right) {
    int temporary = *p_left;
    *p_left = *p_right;
    *p_right = temporary;
}

static void percolate_down(int* buffer, int heap_limit, int pivot_idx) {
    while (1) {
        int target = pivot_idx;
        int left = 2 * pivot_idx + 1;
        int right = 2 * pivot_idx + 2;

        if (left < heap_limit && buffer[left] > buffer[target]) {
            target = left;
        }
        if (right < heap_limit && buffer[right] > buffer[target]) {
            target = right;
        }
        if (target == pivot_idx) {
            break;
        }
        exchange_elements(&buffer[pivot_idx], &buffer[target]);
        pivot_idx = target;
    }
}

void arrange_ascending(int* sequence, int length) {
    // Build maximum-heap from bottom up
    for (int cursor = length / 2 - 1; cursor >= 0; --cursor) {
        percolate_down(sequence, length, cursor);
    }

    // Extract and restructure
    for (int boundary = length - 1; boundary > 0; --boundary) {
        exchange_elements(&sequence[0], &sequence[boundary]);
        percolate_down(sequence, boundary, 0);
    }
}

int main() {
    int dataset[] = {12, 5, 8, 19, 3, 7, 14, 6, 9};
    int total = sizeof(dataset) / sizeof(dataset[0]);

    arrange_ascending(dataset, total);

    for (int idx = 0; idx < total; ++idx) {
        printf("%d ", dataset[idx]);
    }
    return 0;
}

Top-K Element Selection from Massive Streams

Identifying the largest K items from a dataset containing N entries becomes computationally expensive when N is extremely large. Loading the entire collection into memory and performing a complete sort is impractical. A fixed-capacity minimum-heap provides an optimal streaming solution.

The procedure begins by populating a min-heap of size K with the initial batch of elements. As subsequent values arrive from the data stream, each candidate is evaluated against the heap's root, which represents the current K-th largest threshold. If a new value surpasses the root, it overwrites it, and the structure is immediately re-balanced using a downward sift. This guarantees that the heap continuously retains the largest K items encountered. Once the stream ends, the heap holds the exact top-K subset.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLE_COUNT 1000000
#define VALUE_RANGE  10000000

static void populate_test_data(const char* target_path) {
    srand((unsigned int)time(NULL));
    FILE* output = fopen(target_path, "w");
    if (!output) {
        perror("Failed to create data file");
        exit(EXIT_FAILURE);
    }

    for (int i = 0; i < SAMPLE_COUNT; ++i) {
        fprintf(output, "%d\n", rand() % VALUE_RANGE);
    }
    fclose(output);
}

static void restore_min_property(int* heap, int capacity, int start_node) {
    int current = start_node;
    while (1) {
        int smallest = current;
        int left_child = 2 * current + 1;
        int right_child = 2 * current + 2;

        if (left_child < capacity && heap[left_child] < heap[smallest]) {
            smallest = left_child;
        }
        if (right_child < capacity && heap[right_child] < heap[smallest]) {
            smallest = right_child;
        }
        if (smallest == current) {
            break;
        }
        int swap_val = heap[current];
        heap[current] = heap[smallest];
        heap[smallest] = swap_val;
        current = smallest;
    }
}

void fetch_top_k(const char* source_path, int k) {
    FILE* input = fopen(source_path, "r");
    if (!input) {
        perror("Failed to open source file");
        exit(EXIT_FAILURE);
    }

    int* candidate_pool = (int*)malloc(k * sizeof(int));
    if (!candidate_pool) {
        perror("Memory allocation failed");
        exit(EXIT_FAILURE);
    }

    // Initialize pool with first K records
    for (int pos = 0; pos < k; ++pos) {
        if (fscanf(input, "%d", &candidate_pool[pos]) != 1) break;
    }

    // Transform pool into min-heap
    for (int pivot = k / 2 - 1; pivot >= 0; --pivot) {
        restore_min_property(candidate_pool, k, pivot);
    }

    // Process remaining stream
    int incoming_value;
    while (fscanf(input, "%d", &incoming_value) == 1) {
        if (incoming_value > candidate_pool[0]) {
            candidate_pool[0] = incoming_value;
            restore_min_property(candidate_pool, k, 0);
        }
    }
    fclose(input);

    // Display extracted results
    for (int idx = 0; idx < k; ++idx) {
        printf("%d ", candidate_pool[idx]);
    }
    printf("\n");
    free(candidate_pool);
}

int main() {
    const char* file_ref = "stream_data.txt";
    populate_test_data(file_ref);
    fetch_top_k(file_ref, 7);
    return 0;
}

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.