Fading Coder

One Final Commit for the Last Sprint

Home > Notes > Content

Implementing a Trie-Based Sensitive Word Filter in Java

Notes 2

1. Required Dependencies

To implement the sensitive word filter efficiently, we utilize the Apache Commons Lang library for character handling and Lombok for logging. Spring's utility class is also used for string validation.

import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.CharUtils;
import org.springframework.util.StringUtils;

import java.util.HashMap;
import java.util.Map;

2. Initializing the Trie Structure

Before processing text, we must populate the prefix tree (Trie) with prohibited terms. The insertKeyword method iterates through each character of a sensitive word, creating nodes as necessary.

private void insertKeyword(String word) {
    TrieNode currentNode = rootNode;
    for (int i = 0; i < word.length(); i++) {
        char character = word.charAt(i);
        TrieNode childNode = currentNode.getSubNodes().get(character);
        
        if (childNode == null) {
            childNode = new TrieNode();
            currentNode.getSubNodes().put(character, childNode);
        }
        
        currentNode = childNode;
        
        // Mark the end of a sensitive word
        if (i == word.length() - 1) {
            currentNode.setKeywordEnd(true);
        }
    }
}

3. Detecting Sensitive Words in Text

The containsSensitiveWord method traverses the input string using a sliding window approach (pointers start and scan). It ignores special symbols defined by the isIgnorableChar helper method to prevent bypass attempts using separators.

public boolean containsSensitiveWord(String content) {
    if (StringUtils.isEmpty(content)) {
        return false;
    }

    TrieNode currentNode = rootNode;
    int start = 0;
    int scan = 0;

    while (start < content.length()) {
        char character = content.charAt(scan);

        // Skip non-alphanumeric symbols (e.g., #, *, -)
        if (isIgnorableChar(character)) {
            // If at the root, move the start pointer forward
            if (currentNode == rootNode) {
                start++;
            }
            // Always move the scan pointer forward
            scan++;
            continue;
        }

        // Check if the current character path exists in the Trie
        currentNode = currentNode.getSubNodes().get(character);

        if (currentNode == null) {
            // No match found, reset and move start pointer
            scan = ++start;
            currentNode = rootNode;
        } else if (currentNode.isKeywordEnd()) {
            // Found a sensitive word
            return true;
        } else {
            // Partial match found, continue scanning
            scan++;
        }
    }

    return false;
}

/**
 * Determines if a character should be ignored during matching (e.g., punctuation).
 */
private boolean isIgnorableChar(char c) {
    return !CharUtils.isAsciiAlphanumeric(c);
}

Related Articles

Designing Alertmanager Templates for Prometheus Notifications

How to craft Alertmanager templates to format alert messages, improving clarity and presentation. Alertmanager uses Go’s text/template engine with additional helper functions. Alerting rules referenc...

Deploying a Maven Web Application to Tomcat 9 Using the Tomcat Manager

Tomcat 9 does not provide a dedicated Maven plugin. The Tomcat Manager interface, however, is backward-compatible, so the Tomcat 7 Maven Plugin can be used to deploy to Tomcat 9. This guide shows two...

Skipping Errors in MySQL Asynchronous Replication

When a replica halts because the SQL thread encounters an error, you can resume replication by skipping the problematic event(s). Two common approaches are available. Methods to Skip Errors 1) Skip a...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.