Encoding Java Strings into GBK Format
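A Java String is a sequence of UTF-16 code units. Since Java 9 the JVM may store Latin-1-only strings more compactly internally, but the String API still presents text as UTF-16. Legacy systems and regional formats often expect a different encoding. One example is GBK, an extension of GB2312 that adds Traditional Chinese characters and other symbols. Exchanging text with such systems requires an explicit charset conversion. The String.getBytes(Charset) method encodes the string into a byte sequence using the specified charset, and the String(byte[], Charset) constructor decodes such bytes back into a String.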
```java
import java.nio.charset.Charset;

Charset gbk = Charset.forName("GBK");                            // GBK has no StandardCharsets constant
String message = "Welcome, 世界";
byte[] gbkEncodedBytes = message.getBytes(gbk);                  // UTF-16 text -> GBK bytes
String reconstructedMessage = new String(gbkEncodedBytes, gbk);  // GBK bytes -> String
```
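This converts the string's UTF-16 characters into a byte array encoded according to GBK, which can then be decoded back into an equal Java String with the same charset. The round trip is lossless only for characters that GBK can represent: getBytes(Charset) silently replaces any other character with the charset's default replacement bytes.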
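For experimentation, the snippet can be expanded into a small standalone program. This is a sketch; the class name GbkRoundTrip and the emoji sample are illustrative choices, not part of the original example. GBK is not one of the charsets every Java platform is required to support (those are the ones in StandardCharsets), so Charset.forName("GBK") throws UnsupportedCharsetException on a runtime that lacks it. Full JDK distributions typically include it.

```java
import java.nio.charset.Charset;

public class GbkRoundTrip {
    // GBK is not a StandardCharsets constant, so it is looked up by name.
    // Charset.forName throws UnsupportedCharsetException if the runtime lacks GBK.
    static final Charset GBK = Charset.forName("GBK");

    // Encode a String (UTF-16 code units) into GBK bytes.
    static byte[] encode(String text) {
        return text.getBytes(GBK);
    }

    // Decode GBK bytes back into a String.
    static String decode(byte[] bytes) {
        return new String(bytes, GBK);
    }

    public static void main(String[] args) {
        String message = "Welcome, 世界";
        byte[] gbkEncodedBytes = encode(message);
        String reconstructedMessage = decode(gbkEncodedBytes);

        // ASCII characters take 1 GBK byte each; the Chinese characters take 2 bytes each.
        System.out.println(message.length() + " chars -> " + gbkEncodedBytes.length + " GBK bytes");
        System.out.println("Round trip intact: " + reconstructedMessage.equals(message));

        // An emoji has no GBK mapping: getBytes substitutes a replacement instead of failing.
        String emoji = "ok \uD83D\uDE00";
        System.out.println("GBK can encode emoji: " + GBK.newEncoder().canEncode(emoji));
        System.out.println("Emoji round trip intact: " + decode(encode(emoji)).equals(emoji));
    }
}
```

Running main reports 11 characters becoming 13 GBK bytes (one byte for each of the nine ASCII characters, two for each Chinese character) and shows that the emoji sample does not survive the round trip.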
```mermaid
stateDiagram-v2
[*] --> InMemoryRepresentation
InMemoryRepresentation --> UTF16
UTF16 --> ByteExtraction
ByteExtraction --> GBKByteArray
GBKByteArray --> [*]
```
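The diagram folds the UTF16 to GBKByteArray transition into a single arrow. A small sketch can make that step concrete by printing each code point's Unicode value next to the GBK bytes it becomes. The class name GbkByteDump, the helper gbkBytesPerCodePoint, and the sample text are hypothetical, chosen only for illustration.

```java
import java.nio.charset.Charset;

public class GbkByteDump {
    static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: number of GBK bytes produced for each code point of the input.
    static int[] gbkBytesPerCodePoint(String text) {
        return text.codePoints()
                .map(cp -> new String(Character.toChars(cp)).getBytes(GBK).length)
                .toArray();
    }

    public static void main(String[] args) {
        String message = "Hi 世界";
        message.codePoints().forEach(cp -> {
            // Encode one code point at a time so its GBK bytes can be shown separately.
            byte[] encoded = new String(Character.toChars(cp)).getBytes(GBK);
            StringBuilder hex = new StringBuilder();
            for (byte b : encoded) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            // Left: Unicode code point (what the String holds). Right: its GBK byte(s).
            System.out.printf("U+%04X -> %s%n", cp, hex.toString().trim());
        });
    }
}
```

ASCII code points map to a single GBK byte with the same value, while each Chinese character maps to a two-byte sequence.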
```mermaid
sequenceDiagram
participant App as Application
participant Str as Java String (UTF-16)
participant ByteArr as Byte Array (GBK)
App->>Str: Provide text
Str->>ByteArr: getBytes(Charset.forName("GBK"))
ByteArr->>App: Return GBK encoded bytes
App->>ByteArr: Construct new String
ByteArr->>Str: Decode bytes back to UTF-16
```
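Neither String.getBytes(Charset) nor new String(byte[], Charset) throws on bad input. Both substitute replacement characters, which can hide data loss when exchanging bytes with a legacy system. When such failures should be loud, the lower-level CharsetEncoder and CharsetDecoder APIs can be configured with CodingErrorAction.REPORT. The following is a minimal sketch; StrictGbk and its methods are hypothetical helpers, not JDK API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class StrictGbk {
    static final Charset GBK = Charset.forName("GBK");

    // Encode, throwing instead of substituting replacement bytes for unmappable characters.
    // REPORT is already the default for newEncoder(); it is set explicitly for clarity.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        ByteBuffer buf = GBK.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] out = new byte[buf.remaining()]; // copy only the bytes actually written
        buf.get(out);
        return out;
    }

    // Decode, throwing instead of substituting U+FFFD for invalid GBK byte sequences.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return GBK.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(decodeStrict(encodeStrict("Welcome, 世界")));

        try {
            // A lone GBK lead byte with no trailing byte is incomplete (malformed) input.
            decodeStrict(new byte[] {(byte) 0x81});
        } catch (CharacterCodingException e) {
            System.out.println("Decoding rejected: " + e);
        }

        try {
            encodeStrict("\uD83D\uDE00"); // emoji: no GBK mapping
        } catch (CharacterCodingException e) {
            System.out.println("Encoding rejected: " + e);
        }
    }
}
```

The exceptions thrown are MalformedInputException or UnmappableCharacterException, both subclasses of CharacterCodingException, so callers can catch the common supertype.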