Computing Unique Minimal Prefixes for a Set of Words
Givan a colection of words, the objective is to detremine the shortest unique prefix for each word. A prefix is a substring starting from the first character. For instance, the word "carbon" has prefixes: "c", "ca", "car", "carb", "carbo", and "carbon". An exact match takes precedence over a prefix match; if a word is exactly "car", its shortest unique prefix is "car", not a shorter prefix that might also start other words.
Input consists of 2 to 1000 lines, each containing a lowercase word with length between 1 and 20. Output must have the same number of lines, each displaying the original word followed by its unambiguous shortest prefix.
Example Input:
carbohydrate
cart
carburetor
caramel
caribou
carbonic
cartilage
carbon
carriage
carton
car
carbonate
Example Output:
carbohydrate carboh
cart cart
carburetor carbu
caramel cara
caribou cari
carbonic carboni
cartilage carti
carbon carbon
carriage carr
carton carto
car car
carbonate carbona
A Trie (prefix tree) data structure is effective for this problem. Each node in the trie records the count of words passing through it. By traversing the trie for each word, we can stop at the first character where the node count is 1, indicating uniqueness.
Implementation using a Trie:
#include <stdio.h>
#include <string.h>
#define MAX_NODES 2500
#define ALPHABET 26
int node_count[MAX_NODES];
int trie[MAX_NODES][ALPHABET];
int node_index = 0;
int word_count = 0;
char words[1005][25];
void add_to_trie(char *word) {
int current = 0;
int idx, length = strlen(word);
for (int i = 0; i < length; i++) {
idx = word[i] - 'a';
if (trie[current][idx] == 0) {
trie[current][idx] = ++node_index;
}
current = trie[current][idx];
node_count[current]++;
}
}
void find_prefix(char *word) {
int current = 0;
int idx, length = strlen(word);
printf("%s ", word);
for (int i = 0; i < length; i++) {
printf("%c", word[i]);
idx = word[i] - 'a';
if (node_count[trie[current][idx]] == 1) {
break;
}
current = trie[current][idx];
}
}
int main() {
while (scanf("%s", words[word_count]) != EOF) {
add_to_trie(words[word_count]);
word_count++;
}
for (int i = 0; i < word_count; i++) {
find_prefix(words[i]);
if (i != word_count - 1) printf("\n");
}
return 0;
}
The algorithm proecsses input until EOF. Each word is inserted into the trie, incrementing counters for each node traversed. To find the shortest unique prefix, we traverse the trie again for the word, printing characters until we reach a node with a count of 1, which signifies no other word shares this prefix path.