
Practical Methods for Classifying Chinese Characters, Latin Letters, and Digits in Strings

Notes Apr 18 9

The charCodeAt() method returns the 16-bit Unicode code unit (an integer from 0 to 65535) of the character at a given index in a string. A character can then be classified by checking which of the following ranges its code unit falls into:

Uppercase Latin letters (A-Z): 65 to 90
Lowercase Latin letters (a-z): 97 to 122
Arabic numerals (0-9): 48 to 57
CJK Unified Ideographs (commonly used Chinese characters, both simplified and traditional): 19968 (U+4E00) to 40869 (U+9FA5)

The code snippet below prints each character in a sample string alongside its corresponding code unit:

var sampleText:String = "greeting! 世界你好! 42!";
for (var index:int = 0; index < sampleText.length; index++) {
    trace(sampleText.charAt(index), " | ", sampleText.charCodeAt(index));
}
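The ranges listed above can also be turned into a small classifier. The following is a minimal sketch, not part of the original notes: the function name classifyCodeUnit and its category labels are hypothetical, and the "chinese" label simply means the code unit lies in the U+4E00 to U+9FA5 range.

```actionscript
// Hypothetical helper: maps one UTF-16 code unit to a category label.
function classifyCodeUnit(code:Number):String {
    if (code >= 65 && code <= 90) return "uppercase";        // A-Z
    if (code >= 97 && code <= 122) return "lowercase";       // a-z
    if (code >= 48 && code <= 57) return "digit";            // 0-9
    if (code >= 0x4E00 && code <= 0x9FA5) return "chinese";  // U+4E00 to U+9FA5
    return "other";
}

var mixed:String = "aZ7世!";
for (var i:int = 0; i < mixed.length; i++) {
    trace(mixed.charAt(i), "->", classifyCodeUnit(mixed.charCodeAt(i)));
}
// a -> lowercase
// Z -> uppercase
// 7 -> digit
// 世 -> chinese
// ! -> other
```

Anything outside the four ranges, including punctuation and whitespace, falls through to "other".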

Character and Code Unit Conversion

Convert character to Unicode code unit

Syntax: stringVariable.charCodeAt(positionIndex)

var testChar:String = "B";
trace(testChar.charCodeAt(0)); // Output: 66

Convert Unicode code unit to character

Syntax: String.fromCharCode(codeUnitValue)

var convertedChar:String = String.fromCharCode(122);
trace(convertedChar); // Output: z
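The two conversions are inverses of each other, which a short round trip can illustrate. This is a sketch with hypothetical variable names (word, units, rebuilt). It relies on String.fromCharCode() accepting several code units in one call.

```actionscript
var word:String = "你好";

// Collect the code unit of every character in the string.
var units:Array = [];
for (var k:int = 0; k < word.length; k++) {
    units.push(word.charCodeAt(k));
}
trace(units); // 20320,22909

// Rebuild the string from the same code units.
var rebuilt:String = String.fromCharCode(20320, 22909);
trace(rebuilt == word); // true
```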

Method 1: Per-Character Iteration Validation

This approach iterates over every character in the input string and checks whether it falls within one of the allowed Unicode ranges:

/**
 * Validates if all characters in input are Chinese, Latin letters, or digits
 * @param inputStr String to validate
 * @return True if all characters meet requirements (including an empty string), false otherwise
 */
public function checkValidCharSet(inputStr:String):Boolean {
    for (var pos:int = 0; pos < inputStr.length; pos++) {
        var charCode:Number = inputStr.charCodeAt(pos);
        // Local variables are function-scoped in ActionScript 3, so var is
        // used here instead of re-declaring const on every iteration.
        var isChinese:Boolean = charCode >= 19968 && charCode <= 40869;       // U+4E00 to U+9FA5
        var isDigit:Boolean = charCode >= 48 && charCode <= 57;               // 0-9
        var isUppercaseLetter:Boolean = charCode >= 65 && charCode <= 90;     // A-Z
        var isLowercaseLetter:Boolean = charCode >= 97 && charCode <= 122;    // a-z

        // Reject the string as soon as one character is outside every range.
        if (!isChinese && !isDigit && !isUppercaseLetter && !isLowercaseLetter) {
            return false;
        }
    }
    return true;
}
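When a validation fails, it is often useful to know where. The following sketch is a variation on the loop above rather than part of the original method: the helper name findFirstInvalidIndex is hypothetical, and it returns the position of the first disallowed character, or -1 when the string passes.

```actionscript
// Hypothetical helper: index of the first disallowed character, or -1 if none.
function findFirstInvalidIndex(inputStr:String):int {
    for (var pos:int = 0; pos < inputStr.length; pos++) {
        var code:Number = inputStr.charCodeAt(pos);
        var allowed:Boolean =
            (code >= 0x4E00 && code <= 0x9FA5) ||  // Chinese range used above
            (code >= 48 && code <= 57) ||          // 0-9
            (code >= 65 && code <= 90) ||          // A-Z
            (code >= 97 && code <= 122);           // a-z
        if (!allowed) {
            return pos;
        }
    }
    return -1;
}

trace(findFirstInvalidIndex("Hello世界42")); // -1
trace(findFirstInvalidIndex("Hello, 世界")); // 5 (the comma)
trace(findFirstInvalidIndex(""));            // -1 (the loop body never runs)
```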

Method 2: Regular Expression Validation

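This method uses a regular expression anchored at both ends (^ and $) to validate the whole string in a single operation. Because the + quantifier requires at least one character, an empty string fails this check, whereas the loop in Method 1 returns true for it: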

/**
 * Validates if all characters in input are Chinese, Latin letters, or digits
 * @param inputStr String to validate
 * @return True if the string is non-empty and all characters meet requirements, false otherwise
 */
public function validateAlnumChinese(inputStr:String):Boolean {
    var validationRegex:RegExp = /^[a-zA-Z0-9\u4e00-\u9fa5]+$/;
    return validationRegex.test(inputStr);
}
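The same character class can be exercised directly, and negating it gives a way to strip disallowed characters instead of rejecting the whole string. This is a sketch only: the variable names alnumChinese and disallowed are hypothetical, and the filtering step is an addition that the original method does not perform.

```actionscript
// Same pattern as in validateAlnumChinese, built once and reused.
var alnumChinese:RegExp = /^[a-zA-Z0-9\u4e00-\u9fa5]+$/;

trace(alnumChinese.test("Hello世界42")); // true
trace(alnumChinese.test("Hello, 世界")); // false (comma and space are not allowed)
trace(alnumChinese.test(""));            // false (+ needs at least one character)

// Negated class with the g flag: matches every disallowed character.
var disallowed:RegExp = /[^a-zA-Z0-9\u4e00-\u9fa5]/g;
trace("Hello, 世界!".replace(disallowed, "")); // Hello世界
```

Changing + to * in the first pattern would make it accept an empty string, matching the behavior of the loop in Method 1.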

Copyright © fadingcoder.top