Practical Methods for Classifying Chinese Characters, Latin Letters, and Digits in Strings
The charCodeAt() method returns the 16-bit UTF-16 code unit at a given index in a string. A character can be classified by checking which of the following ranges its code unit falls into:
- Uppercase Latin letters (A-Z): 65 to 90
- Lowercase Latin letters (a-z): 97 to 122
- Arabic numerals (0-9): 48 to 57
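- CJK Unified Ideographs, covering both simplified and traditional Chinese: 19968 (U+4E00) to 40869 (U+9FA5). This is the block as defined in Unicode 1.1. Later versions added ideographs up to U+9FFF, and ideographs in the extension blocks above U+FFFF occupy two code units (a surrogate pair), so neither group is matched by this range.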
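As a rough sketch of how these ranges become a classifier, the TypeScript below maps each code unit to a category and tallies a string. TypeScript is used only because its charCodeAt behaves like ActionScript 3.0's here; the names CharCategory, classifyCodeUnit, and countCategories are hypothetical, introduced for illustration.

```typescript
// Categories mirror the code unit ranges listed above.
type CharCategory = "upper" | "lower" | "digit" | "cjk" | "other";

// Classify one UTF-16 code unit by numeric range.
function classifyCodeUnit(code: number): CharCategory {
  if (code >= 65 && code <= 90) return "upper"; // A-Z
  if (code >= 97 && code <= 122) return "lower"; // a-z
  if (code >= 48 && code <= 57) return "digit"; // 0-9
  if (code >= 19968 && code <= 40869) return "cjk"; // U+4E00 to U+9FA5
  return "other";
}

// Tally how many code units of a string fall into each category.
function countCategories(text: string): Record<CharCategory, number> {
  const counts: Record<CharCategory, number> = {
    upper: 0, lower: 0, digit: 0, cjk: 0, other: 0,
  };
  for (let i = 0; i < text.length; i++) {
    counts[classifyCodeUnit(text.charCodeAt(i))]++;
  }
  return counts;
}

// "Hello 世界 42": upper 1, lower 4, digit 2, cjk 2, other 2 (the two spaces)
console.log(countCategories("Hello 世界 42"));
```

Anything outside the four ranges, including punctuation and spaces, lands in "other".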
The code snippet below prints each character in a sample string alongside its corresponding code unit:
var sampleText:String = "greeting! 世界你好! 42!";
for (var index:int = 0; index < sampleText.length; index++) {
    trace(sampleText.charAt(index), " | ", sampleText.charCodeAt(index));
}
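The same loop can be sketched in TypeScript with a hypothetical helper, listCodeUnits, that collects (character, code unit) pairs instead of tracing them. The second part illustrates why the loop works on code units rather than characters: U+20000, a CJK Extension B ideograph, is stored as a surrogate pair, so it yields two entries (55360 and 56320), neither of which lies in the 19968 to 40869 range.

```typescript
// Hypothetical helper: one [character, code unit] pair per UTF-16 code unit,
// like the charAt/charCodeAt loop above.
function listCodeUnits(text: string): Array<[string, number]> {
  const pairs: Array<[string, number]> = [];
  for (let i = 0; i < text.length; i++) {
    pairs.push([text.charAt(i), text.charCodeAt(i)]);
  }
  return pairs;
}

for (const [ch, unit] of listCodeUnits("Hi 好")) {
  console.log(ch, "|", unit); // the last line is "好 | 22909"
}

// U+20000 written as its UTF-16 surrogate pair: one character, two code units.
const extB = "\uD840\uDC00";
console.log(extB.length); // 2
console.log(listCodeUnits(extB).map(([, unit]) => unit)); // 55360, 56320
```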
Character and Code Unit Conversion
Convert character to Unicode code unit
Syntax: stringVariable.charCodeAt(positionIndex)
var testChar:String = "B";
trace(testChar.charCodeAt(0)); // Output: 66
Convert Unicode code unit to character
Syntax: String.fromCharCode(codeUnitValue)
var convertedChar:String = String.fromCharCode(122);
trace(convertedChar); // Output: z
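The two conversions are inverses for a single code unit, and fromCharCode also accepts several arguments. The TypeScript sketch below checks both, then uses code unit arithmetic in a hypothetical asciiToLower helper: the A-Z and a-z ranges listed earlier are exactly 32 apart.

```typescript
// charCodeAt and String.fromCharCode invert each other for one code unit.
const letter = "B";
const letterCode = letter.charCodeAt(0); // 66
console.log(String.fromCharCode(letterCode) === letter); // true

// fromCharCode accepts several code units and joins them into one string.
console.log(String.fromCharCode(104, 105)); // hi

// Hypothetical helper: lowercase one ASCII letter, relying on the fact that
// A-Z (65-90) and a-z (97-122) are exactly 32 apart; other input is returned as-is.
function asciiToLower(ch: string): string {
  const c = ch.charCodeAt(0);
  return c >= 65 && c <= 90 ? String.fromCharCode(c + 32) : ch;
}
console.log(asciiToLower("B"), asciiToLower("z")); // b z
```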
Method 1: Per-Character Iteration Validation
This approach visits every character in the input string and checks whether its code unit falls within one of the allowed ranges:
/**
 * Checks whether every character in the input is a CJK ideograph (U+4E00 to U+9FA5), a Latin letter, or a digit
 * @param inputStr String to validate
 * @return True if all characters meet the requirements (also true for an empty string), false otherwise
 */
public function checkValidCharSet(inputStr:String):Boolean {
    for (var pos:int = 0; pos < inputStr.length; pos++) {
        var charCode:Number = inputStr.charCodeAt(pos);
        // Test the code unit against each allowed range
        var isChinese:Boolean = charCode >= 19968 && charCode <= 40869;
        var isDigit:Boolean = charCode >= 48 && charCode <= 57;
        var isUppercaseLetter:Boolean = charCode >= 65 && charCode <= 90;
        var isLowercaseLetter:Boolean = charCode >= 97 && charCode <= 122;
        // Reject as soon as one code unit is outside every range
        if (!isChinese && !isDigit && !isUppercaseLetter && !isLowercaseLetter) {
            return false;
        }
    }
    return true;
}
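If a caller needs to know where validation failed, the loop can return the index of the offending code unit instead of false. The TypeScript below is a hypothetical variant (findFirstInvalid is not part of the original code) that uses the same four ranges; a string passes Method 1 exactly when this function returns -1.

```typescript
// Hypothetical variant of checkValidCharSet: return the index of the first
// code unit outside all four ranges, or -1 if every code unit is allowed.
function findFirstInvalid(inputStr: string): number {
  for (let pos = 0; pos < inputStr.length; pos++) {
    const charCode = inputStr.charCodeAt(pos);
    const allowed =
      (charCode >= 19968 && charCode <= 40869) || // CJK, U+4E00 to U+9FA5
      (charCode >= 48 && charCode <= 57) ||       // 0-9
      (charCode >= 65 && charCode <= 90) ||       // A-Z
      (charCode >= 97 && charCode <= 122);        // a-z
    if (!allowed) {
      return pos;
    }
  }
  return -1; // also -1 for "", since there is nothing to reject
}

console.log(findFirstInvalid("abc123世界")); // -1
console.log(findFirstInvalid("abc 123"));   // 3 (the space)
```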
Method 2: Regular Expression Validation
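This method validates the whole string in a single call with an anchored regular expression. The ^ and $ anchors require every character to match the character class, and \u4e00-\u9fa5 is the same CJK range used in Method 1. Because of the + quantifier, an empty string fails this check, whereas Method 1 returns true for it: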
/**
* Validates if all characters in input are Chinese, Latin letters, or digits
* @param inputStr String to validate
* @return True if all characters meet requirements, false otherwise
*/
public function validateAlnumChinese(inputStr:String):Boolean {
    var validationRegex:RegExp = /^[a-zA-Z0-9\u4e00-\u9fa5]+$/;
    return validationRegex.test(inputStr);
}
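One possible refinement, sketched in TypeScript, hoists the pattern into a module-level constant so it is built once, and adds a negated character class for the opposite job of stripping disallowed code units rather than rejecting the input. The constant names and the stripInvalid helper are hypothetical; the character class itself is the one from validateAlnumChinese.

```typescript
// Built once and reused; same character class as validateAlnumChinese.
const VALID_ALL = /^[a-zA-Z0-9\u4e00-\u9fa5]+$/;
// Negated class: any single code unit that is NOT allowed (g flag for replace).
const INVALID_UNIT = /[^a-zA-Z0-9\u4e00-\u9fa5]/g;

function validateAlnumChinese(inputStr: string): boolean {
  // VALID_ALL has no g flag, so test() keeps no lastIndex state between calls.
  return VALID_ALL.test(inputStr);
}

// Hypothetical helper: remove disallowed code units instead of rejecting input.
function stripInvalid(inputStr: string): string {
  return inputStr.replace(INVALID_UNIT, "");
}

console.log(validateAlnumChinese("abc123世界")); // true
console.log(validateAlnumChinese(""));           // false: + needs at least one character
console.log(stripInvalid("greeting! 世界 42!")); // greeting世界42
```

A global regex passed to replace() has its lastIndex reset at the start of each call, so sharing INVALID_UNIT across calls is safe in this usage.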