Understanding MySQL VARCHAR Length Units: Character-Based Storage Explained
Character vs Byte: The Fundamental Difference
A character represents a human-readable symbol—the abstract unit of written language. A Chinese character like '中', an English letter like 'a', and a digit like '1' each count as one character.
A byte is the basic unit of digital storage in computers—an 8-bit binary sequence (such as 0x4F or 0xA3).
The critical distinction: different character encodings consume different amounts of storage space per character.
VARCHAR Length Definition in MySQL
MySQL's VARCHAR(n) declaration specifies the maximum number of characters, not bytes. This means you can store up to n characters regardless of whether each character occupies 1, 2, or 3 bytes under your chosen encoding.
UTF-8 Encoding Behavior
CREATE TABLE user_accounts (
user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
username VARCHAR(10) NOT NULL DEFAULT '',
PRIMARY KEY (user_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Inserting mixed content:
INSERT INTO user_accounts (username) VALUES
('Alice'),
('李明'),
('HelloWorld'),
('数据库教程');
Query results using length functions:
SELECT
user_id,
username,
LENGTH(username) AS byte_count,
CHAR_LENGTH(username) AS char_count
FROM user_accounts;
| user_id | username | byte_count | char_count |
|---|---|---|---|
| 1 | Alice | 5 | 5 |
| 2 | 李明 | 6 | 2 |
| 3 | HelloWorld | 10 | 10 |
| 4 | 数据库教程 | 12 | 4 |
Key observations for UTF-8:
- English letters and ASCII characters: 1 byte per character
- Chinese characters: 3 bytes per character
LENGTH()measures bytes;CHAR_LENGTH()measures characters
GBK Encoding Behavior
CREATE TABLE product_catalog (
product_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
product_name VARCHAR(10) CHARACTER SET gbk COLLATE gbk_bin NOT NULL,
PRIMARY KEY (product_id)
) ENGINE=InnoDB DEFAULT CHARSET=gbk;
INSERT INTO product_catalog (product_name) VALUES
('手机'),
('Phone'),
('平板电脑');
SELECT
product_id,
product_name,
LENGTH(product_name) AS byte_count,
CHAR_LENGTH(product_name) AS char_count
FROM product_catalog;
| product_id | product_name | byte_count | char_count |
|---|---|---|---|
| 1 | 手机 | 4 | 2 |
| 2 | Phone | 5 | 5 |
| 3 | 平板电脑 | 8 | 4 |
Key observations for GBK:
- English letters and ASCII characters: 1 byte per character
- Chinese characters: 2 bytes per character
Maximum VARCHAR Lengths by Encoding
The row size limit in MySQL is 65,535 bytes (shared across all columns). This constraint directly impacts maximum character capacity:
| Encoding | Bytes per Character | Maximum Characters |
|---|---|---|
| latin1 | 1 | 65,533 |
| gbk | 2 | 32,766 |
| utf8 | 3 | 21,845 |
| utf8mb4 | 4 | 16,383 |
Note: The actual limit is 65,532 bytes (not 65,535) because VARCHAR columns require 2 bytes to store the length prefix.
Version Differences in Storage Rules
MySQL 4.x and earlier:
VARCHAR(20)meant 20 bytes- UTF-8 text with 3-byte characters: only 6 characters fit in 20 bytes
MySQL 5.x and later:
VARCHAR(20)means 20 characters- Maximum row size: 65,532 bytes (after accounting for overhead)
CHAR vs VARCHAR: Storage Mechanisms
CHAR is a fixed-length type:
- Always occupies exactly n bytes per value
- Trailing spaces shorter than n are padded with spaces
- Spaces are trimmed upon retrieval
- Suitable for short, fixed-length data (UUIDs, state codes, phone prefixes)
VARCHAR is a variable-length type:
- Stores only the actual bytes needed plus 1-2 bytes for length metadata
- No padding; stores data as-is
- Ideal for varible-length content (names, descriptions, addresses)
-- CHAR storage example
CHAR(10) storing 'ABC' → 'ABC ' (10 bytes, spaces padded)
-- VARCHAR storage example
VARCHAR(10) storing 'ABC' → 'ABC' (4 bytes: 1 length byte + 3 content bytes)
MySQL Row Storage Rules
- If all columns have fixed lengths, every row has the same size
- If any column has variable length, rows may have different sizes
- MySQL may automatically convert fixed-length columns to variable-length when row size becomes variable—except CHARR columns under 4 characters, which remain fixed-length by design
These storage characteristics affect index performance and page utilization in InnoDB and MyISAM engines.