Fading Coder

Practical Methods for Classifying Chinese Characters, Latin Letters, and Digits in Strings

The charCodeAt() method returns the 16-bit UTF-16 code unit of the character at a given index in a string. To classify a character, check which of the following ranges its code unit falls into:

  1. Uppercase Latin letters (A-Z): 65 to 90
  2. Lowercase Latin letters (a-z): 97 to 122
  3. Arabic numerals (0-9): 48 to 57
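  4. CJK Unified Ideographs, which hold the commonly used Chinese characters, both simplified and traditional (rarer ideographs in the CJK extension blocks fall outside this range): 19968 (U+4E00) to 40869 (U+9FA5)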

The code snippet below prints each character in a sample string alongside its corresponding code unit:

var sampleText:String = "greeting! 世界你好! 42!";
for (var index:int = 0; index < sampleText.length; index++) {
    trace(sampleText.charAt(index), " | ", sampleText.charCodeAt(index));
}
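
The ranges listed earlier can be folded into one small helper. The sketch below is illustrative only: the name classifyCodeUnit and the category labels are assumptions, not part of any ActionScript API. It is written without an access modifier so it can sit in a frame script; inside a class, add public or private as needed.

```actionscript
// Hypothetical helper: maps a UTF-16 code unit to a category label.
// The function name and the label strings are illustrative assumptions.
function classifyCodeUnit(code:Number):String {
    if (code >= 65 && code <= 90)         return "uppercase"; // A-Z
    if (code >= 97 && code <= 122)        return "lowercase"; // a-z
    if (code >= 48 && code <= 57)         return "digit";     // 0-9
    if (code >= 0x4E00 && code <= 0x9FA5) return "cjk";       // U+4E00-U+9FA5
    return "other";                                           // spaces, punctuation, ...
}

var classifySample:String = "aZ7世!";
for (var k:int = 0; k < classifySample.length; k++) {
    // Prints: a lowercase, Z uppercase, 7 digit, 世 cjk, ! other (one pair per line)
    trace(classifySample.charAt(k), classifyCodeUnit(classifySample.charCodeAt(k)));
}
```

Returning a label instead of a Boolean keeps the range checks in one place, so a validator can then be written as "reject anything labelled other".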

Character and Code Unit Conversion

Convert character to Unicode code unit

Syntax: stringVariable.charCodeAt(positionIndex)

var testChar:String = "B";
trace(testChar.charCodeAt(0)); // Output: 66

Convert Unicode code unit to character

Syntax: String.fromCharCode(codeUnitValue)

var convertedChar:String = String.fromCharCode(122);
trace(convertedChar); // Output: z
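
Because the two calls are inverses of each other, a character can be turned into a number, adjusted arithmetically, and turned back. The following is a minimal sketch of that round trip; the variable names are chosen for illustration only.

```actionscript
// Round trip: character -> code unit -> character.
var originalChar:String = "中";
var unitValue:Number = originalChar.charCodeAt(0);       // 20013 (U+4E2D)
var restoredChar:String = String.fromCharCode(unitValue);
trace(unitValue, restoredChar == originalChar);          // 20013 true

// fromCharCode accepts several code units and concatenates them.
trace(String.fromCharCode(72, 105));                     // Hi

// In ASCII, lowercase letters sit 32 positions after their uppercase forms.
trace(String.fromCharCode("b".charCodeAt(0) - 32));      // B
```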

Method 1: Per-Character Iteration Validation

This approach walks through the input string one character at a time and returns false as soon as a character falls outside the allowed ranges:

/**
 * Validates if all characters in input are Chinese, Latin letters, or digits
 * @param inputStr String to validate
 * @return True if all characters meet requirements, false otherwise
 */
public function checkValidCharSet(inputStr:String):Boolean {
    for (var pos:int = 0; pos < inputStr.length; pos++) {
        var charCode:Number = inputStr.charCodeAt(pos);
        // Declared with var because each flag is recomputed on every iteration
        var isChinese:Boolean = charCode >= 19968 && charCode <= 40869;        // U+4E00 to U+9FA5
        var isDigit:Boolean = charCode >= 48 && charCode <= 57;                // 0-9
        var isUppercaseLetter:Boolean = charCode >= 65 && charCode <= 90;      // A-Z
        var isLowercaseLetter:Boolean = charCode >= 97 && charCode <= 122;     // a-z
        
        if (!isChinese && !isDigit && !isUppercaseLetter && !isLowercaseLetter) {
            return false;
        }
    }
    return true;
}
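
When the caller needs to report which character was rejected, the same loop can return a position instead of a Boolean. This is a sketch under that assumption; findFirstInvalidIndex is a hypothetical name and is not part of the original method. It is written without an access modifier so it can run in a frame script.

```actionscript
// Hypothetical variant of the per-character check: returns the index of the
// first disallowed character, or -1 when every character is allowed.
function findFirstInvalidIndex(inputStr:String):int {
    for (var pos:int = 0; pos < inputStr.length; pos++) {
        var code:Number = inputStr.charCodeAt(pos);
        var allowed:Boolean =
            (code >= 0x4E00 && code <= 0x9FA5) ||  // CJK ideographs U+4E00-U+9FA5
            (code >= 48 && code <= 57) ||          // 0-9
            (code >= 65 && code <= 90) ||          // A-Z
            (code >= 97 && code <= 122);           // a-z
        if (!allowed) {
            return pos;
        }
    }
    return -1;
}

trace(findFirstInvalidIndex("abc123世界"));   // -1
trace(findFirstInvalidIndex("hello world"));  // 5 (the space)
trace(findFirstInvalidIndex("你好!"));         // 2 (the exclamation mark)
```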

Method 2: Regular Expression Validation

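This method uses one regular expression to validate the whole string in a single call. The ^ and $ anchors force the character class to cover the entire string, and the + quantifier requires at least one character, so an empty string fails this check even though Method 1 returns true for it: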

/**
 * Validates if all characters in input are Chinese, Latin letters, or digits
 * @param inputStr String to validate
 * @return True if all characters meet requirements, false otherwise
 */
public function validateAlnumChinese(inputStr:String):Boolean {
    var validationRegex:RegExp = /^[a-zA-Z0-9\u4e00-\u9fa5]+$/;
    return validationRegex.test(inputStr);
}
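
The same character class can also be negated to strip disallowed characters rather than reject the string. The sketch below keeps both patterns in variables so they are built once and reused; the names allowedOnly and disallowedChars are assumptions made for illustration.

```actionscript
// Whole-string check. No g flag, so test() keeps no state between calls.
var allowedOnly:RegExp = /^[a-zA-Z0-9\u4e00-\u9fa5]+$/;
// Negated class with the g flag, used to remove every disallowed character.
var disallowedChars:RegExp = /[^a-zA-Z0-9\u4e00-\u9fa5]/g;

trace(allowedOnly.test("abc123世界"));    // true
trace(allowedOnly.test("hello world"));   // false (contains a space)
trace(allowedOnly.test(""));              // false (+ needs at least one character)

// Removes the spaces and exclamation marks, keeping letters, digits, and ideographs.
trace("greeting! 世界你好! 42!".replace(disallowedChars, ""));  // greeting世界你好42
```

Stripping is the more forgiving choice for free-text fields, while the anchored test suits identifiers that must be rejected outright when malformed.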
