Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Computing Unique Minimal Prefixes for a Set of Words

Tech 2

Givan a colection of words, the objective is to detremine the shortest unique prefix for each word. A prefix is a substring starting from the first character. For instance, the word "carbon" has prefixes: "c", "ca", "car", "carb", "carbo", and "carbon". An exact match takes precedence over a prefix match; if a word is exactly "car", its shortest unique prefix is "car", not a shorter prefix that might also start other words.

Input consists of 2 to 1000 lines, each containing a lowercase word with length between 1 and 20. Output must have the same number of lines, each displaying the original word followed by its unambiguous shortest prefix.

Example Input:

carbohydrate
cart
carburetor
caramel
caribou
carbonic
cartilage
carbon
carriage
carton
car
carbonate

Example Output:

carbohydrate carboh
cart cart
carburetor carbu
caramel cara
caribou cari
carbonic carboni
cartilage carti
carbon carbon
carriage carr
carton carto
car car
carbonate carbona

A Trie (prefix tree) data structure is effective for this problem. Each node in the trie records the count of words passing through it. By traversing the trie for each word, we can stop at the first character where the node count is 1, indicating uniqueness.

Implementation using a Trie:

#include <stdio.h>
#include <string.h>

#define MAX_NODES 2500
#define ALPHABET 26

int node_count[MAX_NODES];
int trie[MAX_NODES][ALPHABET];
int node_index = 0;
int word_count = 0;
char words[1005][25];

void add_to_trie(char *word) {
    int current = 0;
    int idx, length = strlen(word);
    for (int i = 0; i < length; i++) {
        idx = word[i] - 'a';
        if (trie[current][idx] == 0) {
            trie[current][idx] = ++node_index;
        }
        current = trie[current][idx];
        node_count[current]++;
    }
}

void find_prefix(char *word) {
    int current = 0;
    int idx, length = strlen(word);
    printf("%s ", word);
    for (int i = 0; i < length; i++) {
        printf("%c", word[i]);
        idx = word[i] - 'a';
        if (node_count[trie[current][idx]] == 1) {
            break;
        }
        current = trie[current][idx];
    }
}

int main() {
    while (scanf("%s", words[word_count]) != EOF) {
        add_to_trie(words[word_count]);
        word_count++;
    }
    for (int i = 0; i < word_count; i++) {
        find_prefix(words[i]);
        if (i != word_count - 1) printf("\n");
    }
    return 0;
}

The algorithm proecsses input until EOF. Each word is inserted into the trie, incrementing counters for each node traversed. To find the shortest unique prefix, we traverse the trie again for the word, printing characters until we reach a node with a count of 1, which signifies no other word shares this prefix path.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.