Implementing a Trie-Based Sensitive Word Filter in Java
1. Required Dependencies
To implement the sensitive word filter efficiently, we utilize the Apache Commons Lang library for character handling and Lombok for logging. Spring's utility class is also used for string validation.
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.CharUtils;
import org.springframework.util.StringUtils;
import java.util.HashMap;
import java.util.Map;
2. Initializing the Trie Structure
Before processing text, we must populate the prefix tree (Trie) with prohibited terms. The insertKeyword method iterates through each character of a sensitive word, creating nodes as necessary.
private void insertKeyword(String word) {
TrieNode currentNode = rootNode;
for (int i = 0; i < word.length(); i++) {
char character = word.charAt(i);
TrieNode childNode = currentNode.getSubNodes().get(character);
if (childNode == null) {
childNode = new TrieNode();
currentNode.getSubNodes().put(character, childNode);
}
currentNode = childNode;
// Mark the end of a sensitive word
if (i == word.length() - 1) {
currentNode.setKeywordEnd(true);
}
}
}
3. Detecting Sensitive Words in Text
The containsSensitiveWord method traverses the input string using a sliding window approach (pointers start and scan). It ignores special symbols defined by the isIgnorableChar helper method to prevent bypass attempts using separators.
public boolean containsSensitiveWord(String content) {
if (StringUtils.isEmpty(content)) {
return false;
}
TrieNode currentNode = rootNode;
int start = 0;
int scan = 0;
while (start < content.length()) {
char character = content.charAt(scan);
// Skip non-alphanumeric symbols (e.g., #, *, -)
if (isIgnorableChar(character)) {
// If at the root, move the start pointer forward
if (currentNode == rootNode) {
start++;
}
// Always move the scan pointer forward
scan++;
continue;
}
// Check if the current character path exists in the Trie
currentNode = currentNode.getSubNodes().get(character);
if (currentNode == null) {
// No match found, reset and move start pointer
scan = ++start;
currentNode = rootNode;
} else if (currentNode.isKeywordEnd()) {
// Found a sensitive word
return true;
} else {
// Partial match found, continue scanning
scan++;
}
}
return false;
}
/**
* Determines if a character should be ignored during matching (e.g., punctuation).
*/
private boolean isIgnorableChar(char c) {
return !CharUtils.isAsciiAlphanumeric(c);
}