C Language Fundamentals: Data Types
Data Types
Every piece of data in C has a type, and the compiler must know that type before it can operate on the data. A type describes the characteristics shared by similar data: once the type of a value is known, its size, its range, and the operations allowed on it are determined.
The basic data types are: character (char), integer (int), and floating point (float). More complex types are built upon these.
Character Type
The character type represents a single character, declared using the char keyword.
char c = 'B';
This example declares the variable c as a character type and assigns it the value 'B'.
In C, character constants must be enclosed in single quotes.
Internally, the character type uses one byte (8 bits) for storage. C treats it as an integer, so the character type is essentially an 8-bit integer. Each character corresponds to an integer (determined by ASCII code), such as 'B' corresponding to integer 66.
The default range for character types varies across systems. Some systems use -128 to 127, while others use 0 to 255. Both ranges cover the ASCII character range from 0 to 127.
As long as the value lies within the character type's range, integers and characters are interchangeable, and either can be assigned to a character variable.
char c = 66;
// equivalent to
char c = 'B';
In this example, the variable c is a character type, and it is assigned the integer 66. This has the same effect as assigning the character 'B'.
Because characters are integers, character variables can take part in arithmetic.
char a = 'B'; // equivalent to char a = 66;
char b = 'C'; // equivalent to char b = 67;
printf("%d\n", a + b); // outputs 133
In this example, the character variables a and b are added, treated as two integers. The %d placeholder indicates output as a decimal number, resulting in 133.
A single quote itself is also a character. To represent this character constant, it must be escaped with a backslash.
char t = '\'';
In this example, the variable t is a single quote character. Since character constants must be enclosed in single quotes, the internal single quote must be escaped with a backslash.
This escaping syntax is mainly used to represent some non-printable control characters defined by ASCII, which are also values of the character type.
- \a: Alert. Makes the terminal sound a beep, flash, or both.
- \b: Backspace. Moves the cursor back one character without deleting it.
- \f: Form feed. Moves the cursor to the next page. On modern systems it behaves much like \v.
- \n: Newline.
- \r: Carriage return. Moves the cursor to the beginning of the current line.
- \t: Horizontal tab. Moves the cursor to the next horizontal tab stop, usually a multiple of 8 columns.
- \v: Vertical tab. Moves the cursor to the next vertical tab stop, typically the same column on the next line.
- \0: Null character, representing no content. Its integer value is 0, which is not the same as the digit character '0' (ASCII 48).
Escape notation can also use octal and hexadecimal representations for a character.
- \nnn: Octal representation of a character, where nnn is one to three octal digits.
- \xnn: Hexadecimal representation of a character, where nn is a hexadecimal value.
char x = 'B';
char x = 66;
char x = '\102'; // octal
char x = '\x42'; // hexadecimal
The four ways shown above are equivalent.
Integer Type
Overview
The integer type represents whole numbers over a larger range than char and is declared using the int keyword.
int a;
This example declares an integer variable a.
The size of the int type varies across different computers. Commonly, it uses 4 bytes (32 bits) to store an int value, but it could also use 2 bytes (16 bits) or 8 bytes (64 bits). The integer ranges they can represent are as follows:
- 16-bit: -32,768 to 32,767.
- 32-bit: -2,147,483,648 to 2,147,483,647.
- 64-bit: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
signed, unsigned
C uses the signed keyword to indicate a type that includes negative values, while the unsigned keyword indicates a type that does not include negative values and only represents zero and positive integers.
For the int type, the default is signed, meaning int is equivalent to signed int. Since this is the default, the signed keyword is usually omitted, but including it is not incorrect.
signed int a;
// equivalent to
int a;
The int type can also be unsigned, representing only non-negative integers. In this case, the unsigned keyword must be used to declare the variable.
unsigned int a;
Declaring an integer variable as unsigned roughly doubles the maximum value that can be represented in the same amount of memory. For example, the maximum value of a 16-bit signed int is 32,767, whereas that of a 16-bit unsigned int is 65,535.
The int in unsigned int can be omitted, so the variable declaration can also be written as follows.
unsigned a;
The char type can also be set to signed or unsigned.
signed char c; // range from -128 to 127
unsigned char c; // range from 0 to 255
Note that C leaves it to the implementation whether plain char is signed or unsigned. This means char is not necessarily equivalent to signed char; it may behave like either signed char or unsigned char. This differs from int, which is always equivalent to signed int.
Subtypes of Integers
If the int type uses 4 or 8 bytes to represent an integer, it may waste space for small integers. On the other hand, some scenarios require larger integers than int is guaranteed to hold. To address both cases, C provides three subtypes of the int type, which allow the range of an integer variable to be specified more precisely and make the intent of the code clearer.
- short int (abbreviated as short): occupies no more space than int, typically 2 bytes (range -32,768 to 32,767).
- long int (abbreviated as long): occupies no less space than int, at least 4 bytes.
- long long int (abbreviated as long long): occupies no less space than long, at least 8 bytes.
short int a;
long int b;
long long int c;
The code above declares variables of three different integer subtypes.
By default, short, long, and long long are signed (i.e., the signed keyword is omitted). They can also be declared as unsigned to double the maximum value they can represent.
unsigned short int a;
unsigned long int b;
unsigned long long int c;
C allows omission of int, so the variable declarations can also be written as follows.
short a;
unsigned short a;
long b;
unsigned long b;
long long c;
unsigned long long c;
Different computers use different byte lengths for these data types. When a 32-bit integer is needed, use the long type instead of int to ensure at least 4 bytes. When a 64-bit integer is required, use the long long type to ensure at least 8 bytes. Conversely, when only a 16-bit integer is needed, use the short type; when an 8-bit integer is needed, use the char type (written as signed char or unsigned char if the sign matters).
Limits of Integer Types
Sometimes, it is necessary to check the maximum and minimum values of different integer types on the current system. The C header file limits.h provides these values.