Sorting Algorithms and Their Implementation
Sorting Fundamentals
Sorting rearranges elements in a list so that they follow a specific order based on their keys.
Evaluation Metrics
- Time Complexity: Number of operations required.
- Space Complexity: Memory usage during execution.
- Stability: Maintains relative order of equal elements after sorting.
A sort is stable if records (R_i) and (R_j) with equal keys keep their original order: whenever (R_i) precedes (R_j) before sorting, (R_i) still precedes (R_j) afterwards. For example, a stable sort of (2_a, 1, 2_b) yields (1, 2_a, 2_b).
Classification
- Internal Sorting: All data fits into memory; focus on minimizing time and space complexity.
- External Sorting: Data exceeds memory capacity; emphasis on reducing disk I/O operations.
For comparison-based algorithms, at least (\log_2(n!)) comparisons are necessary due to the number of permutations involved.
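To see why: a comparison sort must distinguish all (n!) permutations of the input, so its decision tree has at least (n!) leaves and height at least (\log_2(n!)), which by Stirling's approximation is (\Theta(n \log n)).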
Internal Sorting Techniques
Insertion Sort Variants
Straight Insertion Sort
This method builds the sorted array one element at a time by inserting each new item into its correct position within an already-sorted sublist.
void straightInsertionSort(int arr[], int size) {
    for (int i = 1; i < size; ++i) {
        if (arr[i] < arr[i - 1]) {            /* only work when out of order */
            int key = arr[i];
            int j = i - 1;
            while (j >= 0 && arr[j] > key) {  /* shift larger elements right */
                arr[j + 1] = arr[j];
                --j;
            }
            arr[j + 1] = key;                 /* drop key into the gap */
        }
    }
}
Using a sentinel simplifies boundary checks:
/* Elements occupy arr[1..size]; arr[0] doubles as sentinel and temporary slot. */
void insertionWithSentinel(int arr[], int size) {
    for (int i = 2; i <= size; ++i) {
        if (arr[i] < arr[i - 1]) {
            arr[0] = arr[i];              /* sentinel guarantees the loop stops */
            int j = i - 1;
            while (arr[0] < arr[j]) {     /* no explicit j >= 1 bound needed */
                arr[j + 1] = arr[j];
                --j;
            }
            arr[j + 1] = arr[0];
        }
    }
}
Performance characteristics:
- Space: (O(1))
- Best-case Time: (O(n)) (already sorted)
- Worst/Average-case Time: (O(n^2))
- Stable
Suitable for both sequential and linked storage structures.
Binary Insertion Sort
Optimizes the search phase with binary search, though the element movement cost is unchanged.
/* Elements occupy arr[1..size]; arr[0] serves as a temporary slot. */
void binaryInsertionSort(int arr[], int size) {
    for (int i = 2; i <= size; ++i) {
        arr[0] = arr[i];
        int low = 1, high = i - 1;
        while (low <= high) {             /* binary search for the insertion point */
            int mid = (low + high) / 2;
            if (arr[mid] > arr[0])
                high = mid - 1;
            else
                low = mid + 1;            /* equal keys go right, preserving stability */
        }
        for (int j = i - 1; j >= high + 1; --j)
            arr[j + 1] = arr[j];          /* shifting still costs O(n) per pass */
        arr[high + 1] = arr[0];
    }
}
Each insertion then needs only (O(\log m)) comparisons, where (m) is the length of the sorted prefix, but shifting still dominates, so total time remains (O(n^2)).
Only applicable to sequential storage.
Shell Sort
Divides the input into interleaved subsequences that are individually insertion-sorted, progressively shrinking the gap until it reaches 1.
/* Elements occupy arr[1..size]; arr[0] is a temporary slot, not a sentinel. */
void shellSort(int arr[], int size) {
    for (int gap = size / 2; gap >= 1; gap /= 2) {     /* shrink the increment */
        for (int i = gap + 1; i <= size; ++i) {
            if (arr[i] < arr[i - gap]) {
                arr[0] = arr[i];
                int j = i - gap;
                while (j > 0 && arr[0] < arr[j]) {     /* insertion sort with stride gap */
                    arr[j + gap] = arr[j];
                    j -= gap;
                }
                arr[j + gap] = arr[0];
            }
        }
    }
}
Running time depends heavily on the increment sequence: with Shell's original halving increments the worst case is (O(n^2)), while better sequences bring performance near (O(n^{1.3})). Unstable and limited to sequential access.
Exchange-Based Methods
Bubble Sort
Adjacent pairs are compared across the unsorted portion, swapping whenever they are out of order; in the variant below, each pass bubbles the smallest remaining element to the front.
#include <stdbool.h>

void bubbleSort(int arr[], int size) {
    for (int i = 0; i < size - 1; ++i) {
        bool swapped = false;
        for (int j = size - 1; j > i; --j) {  /* bubble the minimum toward the front */
            if (arr[j - 1] > arr[j]) {
                int temp = arr[j - 1];
                arr[j - 1] = arr[j];
                arr[j] = temp;
                swapped = true;
            }
        }
        if (!swapped) break;                   /* no swaps: already sorted */
    }
}
Metrics:
- Space: (O(1))
- Best-case Time: (O(n))
- Worst/Average-case Time: (O(n^2))
- Stable
Works well with arrays and lists alike.
Quick Sort
Employs divide-and-conquer strategy. Selects a pivot value, partitions array around it such that smaller values precede larger ones, then recursively sorts partitions.
int partition(int arr[], int left, int right) {
    int pivotValue = arr[left];                /* first element as pivot; its slot becomes a hole */
    while (left < right) {
        while (left < right && arr[right] >= pivotValue)
            --right;
        arr[left] = arr[right];                /* move a smaller element into the hole */
        while (left < right && arr[left] <= pivotValue)
            ++left;
        arr[right] = arr[left];                /* move a larger element into the hole */
    }
    arr[left] = pivotValue;                    /* pivot lands in its final position */
    return left;
}
void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pivotIndex = partition(arr, low, high);
        quickSort(arr, low, pivotIndex - 1);   /* recursively sort both sides */
        quickSort(arr, pivotIndex + 1, high);
    }
}
Analysis:
- Average Time: (O(n \log n))
- Worst-case Time: (O(n^2))
- Space: (O(\log n)) average, up to (O(n)) worst-case recursion depth
- Not stable
Limited to contiguous memory layouts.
Selection Strategies
Simple Selection Sort
In each round, identifies the smallest unsorted element and appends it to the ordered section.
void selectionSort(int arr[], int size) {
    for (int i = 0; i < size - 1; ++i) {
        int minIdx = i;
        for (int j = i + 1; j < size; ++j) {  /* locate the smallest unsorted element */
            if (arr[j] < arr[minIdx])
                minIdx = j;
        }
        if (minIdx != i) {                     /* at most one swap per pass */
            int temp = arr[i];
            arr[i] = arr[minIdx];
            arr[minIdx] = temp;
        }
    }
}
Features:
- Time: (O(n^2))
- Space: (O(1))
- Unstable due to potential reordering of equal items
- Supports both array and linked list representations
Heap Sort
Utilizes heap properties to efficiently extract extreme values. Begins by building a max-heap, then repeatedly swaps the root with the last unsorted element and re-heapifies the shrunk prefix.
/* Elements occupy arr[1..len]; arr[0] serves as a temporary slot. */
void adjustHeap(int arr[], int index, int len) {
    arr[0] = arr[index];                      /* value being sifted down */
    for (int child = 2 * index; child <= len; child *= 2) {
        if (child < len && arr[child] < arr[child + 1])
            ++child;                          /* step to the larger child */
        if (arr[0] >= arr[child])
            break;                            /* heap property holds; stop */
        arr[index] = arr[child];              /* promote the child and descend */
        index = child;
    }
    arr[index] = arr[0];
}
void buildMaxHeap(int arr[], int len) {       /* defined after adjustHeap so the call resolves */
    for (int i = len / 2; i > 0; --i)         /* sift down every internal node */
        adjustHeap(arr, i, len);
}
void heapSort(int arr[], int len) {
    buildMaxHeap(arr, len);
    for (int i = len; i > 1; --i) {
        int temp = arr[1];                    /* move the current maximum to its final slot */
        arr[1] = arr[i];
        arr[i] = temp;
        adjustHeap(arr, 1, i - 1);            /* restore the heap on the shrunk prefix */
    }
}
Efficiency profile:
- Time: (O(n \log n))
- Space: (O(1))
- Unstable
- Sequential-only
The underlying heap also supports efficient insertion (sift-up) and deletion (sift-down), each in (O(\log n)) time.
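As an illustration of the sift-up operation, here is a minimal insertion sketch for the same 1-indexed max-heap; heapInsert and the len pointer convention are assumptions for this example, not part of the sort above:
#include <stdlib.h>

/* Insert value into a 1-indexed max-heap of *len elements (illustrative sketch). */
void heapInsert(int arr[], int* len, int value) {
    arr[++*len] = value;                      /* place the new key at the first free leaf */
    int i = *len;
    while (i > 1 && arr[i / 2] < arr[i]) {    /* bubble up past smaller parents */
        int temp = arr[i / 2];
        arr[i / 2] = arr[i];
        arr[i] = temp;
        i /= 2;
    }
}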
Merging and Non-comparison Approaches
Merge Sort
Combines multiple pre-sorted sequences into one. For two-way merge, compare heads of both inputs and move lesser item forward.
int* auxiliaryBuffer;  /* must be allocated to the array length before sorting */
void merge(int arr[], int low, int mid, int high) {
    for (int k = low; k <= high; ++k)          /* copy both runs into the buffer */
        auxiliaryBuffer[k] = arr[k];
    int i = low, j = mid + 1, k = low;
    while (i <= mid && j <= high) {            /* take the smaller head; <= keeps it stable */
        if (auxiliaryBuffer[i] <= auxiliaryBuffer[j])
            arr[k++] = auxiliaryBuffer[i++];
        else
            arr[k++] = auxiliaryBuffer[j++];
    }
    while (i <= mid)                           /* copy whichever run has leftovers */
        arr[k++] = auxiliaryBuffer[i++];
    while (j <= high)
        arr[k++] = auxiliaryBuffer[j++];
}
void mergeSort(int arr[], int low, int high) {
    if (low < high) {
        int mid = (low + high) / 2;            /* split, sort halves, then merge */
        mergeSort(arr, low, mid);
        mergeSort(arr, mid + 1, high);
        merge(arr, low, mid, high);
    }
}
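A hypothetical call site, showing the allocation the global buffer requires (sizes here are illustrative):
#include <stdlib.h>

int main(void) {
    int data[] = {5, 2, 4, 6, 1, 3};
    auxiliaryBuffer = malloc(sizeof data);     /* one buffer slot per element */
    mergeSort(data, 0, 5);                     /* sorts data[0..5] in place */
    free(auxiliaryBuffer);
    return 0;
}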
Characteristics:
- Time: (O(n \log n))
- Space: (O(n))
- Stable
- Compatible with arrays and linked lists
Radix Sort
Orders elements digit-by-digit starting from least significant digit. Requires queue-like buckets per radix digit.
Typically implemented using linked structures:
typedef struct Node {
    int data;
    struct Node* next;
} Node;
typedef struct Queue {
    Node* front;
    Node* rear;
} Queue;
The process distributes elements across the queues by the current digit, then collects them back in bucket order, as the sketch below shows.
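A minimal sketch of one distribute/collect cycle per digit, reusing the Node and Queue types above; enqueue, radixSort, and the base-10 / maxDigits parameters are illustrative assumptions:
#include <stddef.h>

void enqueue(Queue* q, Node* node) {
    node->next = NULL;
    if (q->rear) q->rear->next = node;        /* FIFO append keeps the sort stable */
    else q->front = node;
    q->rear = node;
}

/* Sort a linked list of non-negative ints, least significant digit first. */
Node* radixSort(Node* head, int maxDigits) {
    Queue buckets[10];
    int divisor = 1;
    for (int d = 0; d < maxDigits; ++d, divisor *= 10) {
        for (int b = 0; b < 10; ++b)          /* empty all buckets */
            buckets[b].front = buckets[b].rear = NULL;
        while (head) {                        /* distribute by the current digit */
            Node* node = head;
            head = head->next;
            enqueue(&buckets[(node->data / divisor) % 10], node);
        }
        Node* tail = NULL;                    /* collect buckets 0..9 back into one list */
        for (int b = 0; b < 10; ++b) {
            if (!buckets[b].front) continue;
            if (tail) tail->next = buckets[b].front;
            else head = buckets[b].front;
            tail = buckets[b].rear;
        }
    }
    return head;
}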
Performance metrics:
- Space: (O(r)) where (r) denotes base/radix
- Time: (O(d(n + r))) independent of initial arrangement
- Stable
- Suitable for integer keys with a bounded number of digits
Cannot handle floating point types directly.
Counting Sort
Counts occurrences of distinct elements and uses cumulative counts to determine positions.
void countingSort(int input[], int output[], int countArray[], int length, int maxValue) {
    for (int i = 0; i <= maxValue; ++i)
        countArray[i] = 0;
    for (int i = 0; i < length; ++i)
        ++countArray[input[i]];               /* tally each key */
    for (int i = 1; i <= maxValue; ++i)
        countArray[i] += countArray[i - 1];   /* prefix sums give final positions */
    for (int i = length - 1; i >= 0; --i) {   /* backwards traversal keeps it stable */
        output[countArray[input[i]] - 1] = input[i];
        --countArray[input[i]];
    }
}
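A hypothetical call site; note the count array needs (maxValue + 1) slots:
int main(void) {
    int data[] = {3, 1, 4, 1, 5};
    int sorted[5];
    int counts[6];                            /* maxValue = 5, so indices 0..5 */
    countingSort(data, sorted, counts, 5, 5);
    /* sorted now holds {1, 1, 3, 4, 5} */
    return 0;
}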
Analysis:
- Space: (O(n + k)) where (k) represents range of input values
- Time: (O(n + k))
- Stable
- Efficient only when (k) is not significantly larger than (n)
External Sorting Mechanisms
Handles datasets too large for main memory by reading and writing the file in buffered blocks.
Core principle revolves around merge sorting phases:
- Generate initial sorted runs via internal sorting of buffered chunks.
- Perform successive merges combining pairs of runs into longer ordered segments.
Example: a file of 32 blocks split into 8 initial runs needs (\lceil \log_2 8 \rceil = 3) two-way merge passes; each pass reads and writes all 32 blocks, so merging alone costs (3 \times 64 = 192) block I/O operations.
The major bottleneck is this volume of disk I/O.
Multiway Merging Enhancements
Increasing the number (k) of simultaneous input streams reduces the number of merge passes to (\lceil \log_k r \rceil) for (r) initial runs.
Continuing the example above, a four-way merge of the same 8 runs needs only (\lceil \log_4 8 \rceil = 2) passes, cutting merge I/O to (2 \times 64 = 128) block operations.
Trade-offs involve increased buffer allocation demands and higher internal comparison costs during selection stages.
Internal sorting of full buffers produces (r = \lceil N/L \rceil) initial runs, where (N) is the record count and (L) the working-area capacity; producing fewer, longer runs therefore reduces the pass count.
Balanced multiway merge ensures uniform distribution of merged segments per round.
Tournament Trees for Optimization
Loser trees reduce the comparisons needed to select the minimum among (k) streams: each internal node stores the loser of the match between its subtrees, while the winner propagates toward the root.
Construction requires (k - 1) initial comparisons among leaves; subsequent selections take just (\lceil \log_2 k \rceil) steps.
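A minimal C sketch of a loser tree over K streams, assuming integer keys and a MINKEY sentinel; the layout (losers in tree[1..K-1], overall winner's index in tree[0]) follows the classic formulation, but the names and K = 4 are illustrative:
#include <limits.h>

#define K 4                 /* number of input streams (illustrative) */
#define MINKEY INT_MIN      /* sentinel that beats every real key */

int leaves[K + 1];          /* leaves[0..K-1]: current key of each stream; leaves[K] = MINKEY */
int tree[K];                /* tree[1..K-1]: loser of each match; tree[0]: index of overall winner */

/* Replay matches from leaf s up to the root after leaves[s] changes. */
void adjust(int s) {
    for (int t = (s + K) / 2; t > 0; t /= 2) {  /* t: internal match node above leaf s */
        if (leaves[tree[t]] < leaves[s]) {      /* stored contender wins the match */
            int tmp = s;                        /* record s as the loser here... */
            s = tree[t];                        /* ...and carry the winner upward */
            tree[t] = tmp;
        }
    }
    tree[0] = s;                                /* s is now the overall winner */
}

/* Build the tree with K - 1 effective comparisons. */
void buildLoserTree(void) {
    leaves[K] = MINKEY;
    for (int i = 0; i < K; ++i)
        tree[i] = K;                            /* every match initially "won" by the sentinel */
    for (int i = K - 1; i >= 0; --i)
        adjust(i);
}
To emit the next merged element, replace leaves[tree[0]] with that stream's next key (or a plus-infinity sentinel when the stream is exhausted) and call adjust(tree[0]).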
Replacement Selection Technique
Improves run generation by dynamically selecting the minimal eligible candidate rather than statically sorting fixed-size batches.
Steps include (a code sketch follows the list):
- Load first batch into work area.
- Extract minimum element.
- Output selected item.
- Read replacement entry if available.
- Choose the next minimum that is no smaller than the previous output.
- Repeat until no valid replacements remain.
- Finalize segment and repeat process.
Goal is producing fewer yet longer runs for improved downstream performance.
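A minimal in-memory sketch of the procedure, with a tiny working area and runs printed to stdout; WORKAREA, replacementSelection, and the linear minimum scan are illustrative assumptions (a real implementation would use a loser tree and write runs to disk):
#include <stdio.h>
#include <limits.h>

#define WORKAREA 3   /* working-area capacity (illustrative) */

/* Split input[0..n-1] into runs via replacement selection; one run per output line. */
void replacementSelection(const int input[], int n) {
    int work[WORKAREA];
    int inRun[WORKAREA];           /* 1 if the slot may still join the current run */
    int next = 0, filled = 0;

    while (filled < WORKAREA && next < n) {   /* load the first batch */
        work[filled] = input[next++];
        inRun[filled++] = 1;
    }

    int lastOut = INT_MIN;
    while (filled > 0) {
        int sel = -1;                          /* smallest eligible key */
        for (int i = 0; i < filled; ++i)
            if (inRun[i] && (sel < 0 || work[i] < work[sel])) sel = i;

        if (sel < 0) {                         /* nothing eligible: close the run */
            printf("\n");
            lastOut = INT_MIN;
            for (int i = 0; i < filled; ++i) inRun[i] = 1;
            continue;
        }

        printf("%d ", work[sel]);
        lastOut = work[sel];

        if (next < n) {                        /* replace the output record */
            work[sel] = input[next++];
            inRun[sel] = (work[sel] >= lastOut);  /* too-small keys wait for the next run */
        } else {                               /* input exhausted: shrink the work area */
            work[sel] = work[--filled];
            inRun[sel] = inRun[filled];
        }
    }
    printf("\n");
}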
Optimal Merge Tree Construction
Analogous to Huffman coding, this constructs a merge tree of minimal weighted path length (WPL) from runs of varying lengths.
Total merge I/O equals twice the WPL, i.e. (2 \sum_i w_i d_i), where (w_i) is the length of run (i) in blocks and (d_i) its depth in the tree.
When building a k-ary optimal tree, dummy runs of length zero may be added so that ((n - 1) \bmod (k - 1) = 0), ensuring every internal node has exactly (k) children.
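For example, merging runs of lengths 2, 3, 6, and 9 two-way in Huffman order combines 2 and 3 (cost 5), then 5 and 6 (cost 11), then 11 and 9 (cost 20). The WPL is (2 \cdot 3 + 3 \cdot 3 + 6 \cdot 2 + 9 \cdot 1 = 36), matching the records moved ((5 + 11 + 20)), so total I/O is (2 \times 36 = 72) block operations.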