Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Sorting Algorithms and Their Implementation

Tech 1

Sorting Fundamentals

Sorting rearranges elements in a list so that they follow a specific order based on their keys.

Evaluation Metrics

  • Time Complexity: Number of operations required.
  • Space Complexity: Memory usage during execution.
  • Stability: Maintains relative order of equal elements after sorting.

An algorithm is stable if two records R_i and R_j with identical keys maintain their original sequence when R_i precedes R_j before sorting.

Classsification

  • Internal Sorting: All data fits into memory; focus on minimizing time and space complexity.
  • External Sorting: Data exceeds memory capacity; emphasis on reducing disk I/O operations.

For comparison-based algorithms, at least (\log_2(n!)) comparisons are necessary due to the number of permutations involved.

Internal Sorting Techniques

Insertion Sort Variants

Straight Insertion Sort

This method builds the sorted array one element at a time by inserting each new item into its correct position within an already-sorted sublist.

void straightInsertionSort(int arr[], int size) {
    for (int i = 1; i < size; ++i) {
        if (arr[i] < arr[i - 1]) {
            int key = arr[i];
            int j = i - 1;
            while (j >= 0 && arr[j] > key) {
                arr[j + 1] = arr[j];
                --j;
            }
            arr[j + 1] = key;
        }
    }
}

Using a sentinel simplifies boundary checks:

void insertionWithSentinel(int arr[], int size) {
    for (int i = 2; i <= size; ++i) {
        if (arr[i] < arr[i - 1]) {
            arr[0] = arr[i];
            int j = i - 1;
            while (arr[0] < arr[j]) {
                arr[j + 1] = arr[j];
                --j;
            }
            arr[j + 1] = arr[0];
        }
    }
}

Performance characteristics:

  • Space: (O(1))
  • Best-case Time: (O(n)) (already sorted)
  • Worst/Average-case Time: (O(n^2))
  • Stable

Suitable for both sequential and linked storage structures.

Binary Insertion Sort

Optimizes search phase using binary search but maintains same movement cost.

void binaryInsertionSort(int arr[], int size) {
    for (int i = 2; i <= size; ++i) {
        arr[0] = arr[i];
        int low = 1, high = i - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            if (arr[mid] > arr[0])
                high = mid - 1;
            else
                low = mid + 1;
        }
        for (int j = i - 1; j >= high + 1; --j)
            arr[j + 1] = arr[j];
        arr[high + 1] = arr[0];
    }
}

Each pass requires (O(\log m)) comparisons where (m) is current subsequence length. Total remains (O(n^2)).

Only applicable to sequential storage.

Shell Sort

Divides input into interleaved subsequences which are individually sorted via insertion sort, progressively reducing gap size until (d = 1).

void shellSort(int arr[], int size) {
    for (int gap = size / 2; gap >= 1; gap /= 2) {
        for (int i = gap + 1; i <= size; ++i) {
            if (arr[i] < arr[i - gap]) {
                arr[0] = arr[i];
                int j = i - gap;
                while (j > 0 && arr[0] < arr[j]) {
                    arr[j + gap] = arr[j];
                    j -= gap;
                }
                arr[j + gap] = arr[0];
            }
        }
    }
}

Time depends heavily on chosen increment sequence. Worst case is (O(n^2)), best can reach (O(n^{1.3})). Unstable and limited to sequential access.

Exchange-Based Methods

Bubble Sort

Adjacent pairs are compared from start to end, swapping whenever out-of-order. Each iteration places at least one element correctly.

void bubbleSort(int arr[], int size) {
    for (int i = 0; i < size - 1; ++i) {
        bool swapped = false;
        for (int j = size - 1; j > i; --j) {
            if (arr[j - 1] > arr[j]) {
                int temp = arr[j - 1];
                arr[j - 1] = arr[j];
                arr[j] = temp;
                swapped = true;
            }
        }
        if (!swapped) break;
    }
}

Metrics:

  • Space: (O(1))
  • Best-case Time: (O(n))
  • Worst/Average-case Time: (O(n^2))
  • Stable

Works well with arrays and lists alike.

Quick Sort

Employs divide-and-conquer strategy. Selects a pivot value, partitions array around it such that smaller values precede larger ones, then recursively sorts partitions.

int partition(int arr[], int left, int right) {
    int pivotValue = arr[left];
    while (left < right) {
        while (left < right && arr[right] >= pivotValue)
            --right;
        arr[left] = arr[right];
        while (left < right && arr[left] <= pivotValue)
            ++left;
        arr[right] = arr[left];
    }
    arr[left] = pivotValue;
    return left;
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pivotIndex = partition(arr, low, high);
        quickSort(arr, low, pivotIndex - 1);
        quickSort(arr, pivotIndex + 1, high);
    }
}

Analysis:

  • Average Time: (O(n \log n))
  • Worst-case Time: (O(n^2))
  • Space: (O(\log n)) average, up to (O(n)) worst-case recursion depth
  • Not stable

Limited to contiguous memory layouts.

Selection Strategies

Simple Selection Sort

In each round, identifies the smallest unsorted element and appends it to the ordered section.

void selectionSort(int arr[], int size) {
    for (int i = 0; i < size - 1; ++i) {
        int minIdx = i;
        for (int j = i + 1; j < size; ++j) {
            if (arr[j] < arr[minIdx])
                minIdx = j;
        }
        if (minIdx != i) {
            int temp = arr[i];
            arr[i] = arr[minIdx];
            arr[minIdx] = temp;
        }
    }
}

Features:

  • Time: (O(n^2))
  • Space: (O(1))
  • Unstable due to potential reordering of equal items
  • Supports both array and linked list representations

Heap Sort

Utilizes heap properties to efficiently extract extreme values. Begins by building a max-heap, repeatedly extracting root and rebuilding heap.

void buildMaxHeap(int arr[], int len) {
    for (int i = len / 2; i > 0; --i)
        adjustHeap(arr, i, len);
}

void adjustHeap(int arr[], int index, int len) {
    arr[0] = arr[index];
    for (int child = 2 * index; child <= len; child *= 2) {
        if (child < len && arr[child] < arr[child + 1])
            ++child;
        if (arr[0] >= arr[child])
            break;
        arr[index] = arr[child];
        index = child;
    }
    arr[index] = arr[0];
}

void heapSort(int arr[], int len) {
    buildMaxHeap(arr, len);
    for (int i = len; i > 1; --i) {
        int temp = arr[1];
        arr[1] = arr[i];
        arr[i] = temp;
        adjustHeap(arr, 1, i - 1);
    }
}

Efficiency profile:

  • Time: (O(n \log n))
  • Space: (O(1))
  • Unstable
  • Sequential-only

Supports efficient insertions (bubble-up) and deletions (sink-down).

Merging and Non-comparison Approaches

Merge Sort

Combines multiple pre-sorted sequences into one. For two-way merge, compare heads of both inputs and move lesser item forward.

int* auxiliaryBuffer;

void merge(int arr[], int low, int mid, int high) {
    for (int k = low; k <= high; ++k)
        auxiliaryBuffer[k] = arr[k];
    
    int i = low, j = mid + 1, k = low;
    while (i <= mid && j <= high) {
        if (auxiliaryBuffer[i] <= auxiliaryBuffer[j])
            arr[k++] = auxiliaryBuffer[i++];
        else
            arr[k++] = auxiliaryBuffer[j++];
    }
    while (i <= mid)
        arr[k++] = auxiliaryBuffer[i++];
    while (j <= high)
        arr[k++] = auxiliaryBuffer[j++];
}

void mergeSort(int arr[], int low, int high) {
    if (low < high) {
        int mid = (low + high) / 2;
        mergeSort(arr, low, mid);
        mergeSort(arr, mid + 1, high);
        merge(arr, low, mid, high);
    }
}

Characteristics:

  • Time: (O(n \log n))
  • Space: (O(n))
  • Stable
  • Compatible with arrays and linked lists

Radix Sort

Orders elements digit-by-digit starting from least significant digit. Requires queue-like buckets per radix digit.

Typically implemented using linked structures:

typedef struct Node {
    int data;
    struct Node* next;
} Node;

typedef struct Queue {
    Node* front;
    Node* rear;
} Queue;

Process involves distributing elements across queues based on digits followed by collecting them back in order.

Performence metrics:

  • Space: (O(r)) where (r) denotes base/radix
  • Time: (O(d(n + r))) independent of initial arrangement
  • Stable
  • Suitable for integers with small ranges

Cannot handle floating point types directly.

Counting Sort

Counts occurrences of distinct elements and uses cumulative counts to determine positions.

void countingSort(int input[], int output[], int countArray[], int length, int maxValue) {
    for (int i = 0; i <= maxValue; ++i)
        countArray[i] = 0;
    
    for (int i = 0; i < length; ++i)
        ++countArray[input[i]];
    
    for (int i = 1; i <= maxValue; ++i)
        countArray[i] += countArray[i - 1];
    
    for (int i = length - 1; i >= 0; --i) {
        output[countArray[input[i]] - 1] = input[i];
        --countArray[input[i]];
    }
}

Analysis:

  • Space: (O(n + k)) where (k) represents range of input values
  • Time: (O(n + k))
  • Stable
  • Efficient only when (k) is not significantly larger than (n)

External Sorting Mechanisms

Handles datasets too large for internal memory through file buffering techniques involving reading/writing blocks.

Core principle revolves around merge sorting phases:

  1. Generate initial sorted runs via internal sorting of buffered chunks.
  2. Perform successive merges combining pairs of runs into longer ordered segments.

Example: With 32 total blocks and 3 passes, total I/O would be (322 + 323 = 128) operations.

Major bottleneck lies in excessive disk interactions.

Multiway Merging Enhancements

Increases number of simultaneous input streams during merging stage to reduce overall passes needed.

Four-way merge example reduces I/O overhead:

Total reads/writes become (32 + 32*2 = 96) operations instead of more under standard two-way approach.

Trade-offs involve increased buffer allocation demands and higher internal comparison costs during selection stages.

To minimize run count:( r = \lceil N/L \rceil ), where (N) is record count and (L) working area size.

Balanced multiway merge ensures uniform distribution of merged segments per round.

Tournament Trees for Optimization

Reduces comparisons during multi-stream selections using loser trees which track losing entries upward toward root node.

Construction requires (k - 1) initial comparisons among leaves; subsequent selections take just (\lceil \log_2 k \rceil) steps.

Replacement Selection Technique

Improves run generation efficiency by selecting minimal candidates dynamically rather then statically batching fixed-size groups.

Steps include:

  1. Load first batch into work area.
  2. Extract minimum element.
  3. Output selected item.
  4. Read replacement entry if available.
  5. Choose next minimum greater than previous output.
  6. Repeat until no valid replacements remain.
  7. Finalize segment and repeat process.

Goal is producing fewer yet longer runs for improved downstream performance.

Optimal Merge Tree Construction

Similar to Huffman coding principles, constructs weighted path-length minimized tree given varying run lengths.

Total I/O equates to twice computed WPL value representing sum product of leaf weights times depths.

When constructing k-ary optimal trees, dummy zero-length placeholders may be added to ensure proper structural balance satisfying modulo conditions.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.