Character, String, and Memory Functions in C
Character Classification Functions
C provides a comprehensive set of functions for character classification through the <ctype.h> header. These functions determine the category of a given character.
Classification Functions Reference
| Function | Condition for True Return |
|---|---|
| iscntrl | Control characters |
| isspace | Whitespace: space ' ', form feed '\f', newline '\n', carriage return '\r', tab '\t', or vertical tab '\v' |
| isdigit | Decimal digits 0 through 9 |
| isxdigit | Hexadecimal digits: 0-9, a-f, A-F |
| islower | Lowercase letters a through z |
| isupper | Uppercase letters A through Z |
| isalpha | Alphabetic letters a-z or A-Z |
| isalnum | Alphanumeric characters (letters and digits) |
| isgraph | Graphic characters (printable, non-space) |
| isprint | Printable characters (including space) |
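All of these functions follow the same pattern: they accept an int argument (typically a char converted to unsigned char and then promoted to int) and return a non-zero value if the character matches the classification, or zero otherwise. The argument must be representable as an unsigned char or be equal to EOF; passing a negative char value is undefined behavior, which is why the examples in this section cast to unsigned char first.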
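As a minimal sketch of how these predicates are usually called, the helper below tallies digits, letters, and whitespace in a string. The names class_counts and count_classes are hypothetical and exist only for this illustration; the results assume the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical tally type used only for this illustration. */
struct class_counts {
    size_t digits;
    size_t letters;
    size_t spaces;
};

/* Classify every character of a null-terminated string. */
struct class_counts count_classes(const char *text) {
    assert(text != NULL);
    struct class_counts counts = {0, 0, 0};
    for (size_t i = 0; text[i] != '\0'; i++) {
        /* Cast first: a negative char value must not reach <ctype.h>. */
        unsigned char ch = (unsigned char)text[i];
        if (isdigit(ch)) {
            counts.digits++;
        } else if (isalpha(ch)) {
            counts.letters++;
        } else if (isspace(ch)) {
            counts.spaces++;
        }
    }
    return counts;
}
```

Because the predicates only promise "non-zero" for a match, the sketch tests them directly in the condition rather than comparing the result against 1.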
Usage Example: Case Conversion
The following example converts all lowercase letters in a string to uppercase while leaving other characters unchanged:
#include <stdio.h>
#include <ctype.h>
int main(void) {
char message[] = "Test String.\n";
size_t idx = 0;
while (message[idx] != '\0') {
if (islower((unsigned char)message[idx])) {
message[idx] = (char)toupper((unsigned char)message[idx]);
}
idx++;
}
printf("%s", message);
return 0;
}
Another common pattern involves checking for uppercase and converting to lowercase:
#include <stdio.h>
#include <ctype.h>
int main(void) {
char text[] = "Test String.\n";
for (size_t i = 0; text[i] != '\0'; i++) {
char ch = text[i];
if (isupper((unsigned char)ch)) {
ch = (char)tolower((unsigned char)ch);
}
putchar(ch);
}
return 0;
}
Character Conversion Functions
C provides two standard character conversion functions:
int tolower(int c); // Converts uppercase to lowercase
int toupper(int c); // Converts lowercase to uppercase
These functions return the converted character if applicable, or the original character if no conversion is needed.
Note: Some compilers provide non-standard extensions like strlwr() (defined in <string.h> on certain systems) for string-wide conversion, though these are not portable.
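Where a string-wide conversion is needed, a portable stand-in for strlwr() can be sketched with tolower() alone. The name to_lower_in_place is hypothetical; the function modifies its argument, so it must be given a writable array rather than a string literal.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical portable replacement for the non-standard strlwr().
   Converts the string in place and returns its argument. */
char *to_lower_in_place(char *text) {
    assert(text != NULL);
    for (char *p = text; *p != '\0'; p++) {
        /* tolower() returns other characters unchanged, so no isupper() test is needed. */
        *p = (char)tolower((unsigned char)*p);
    }
    return text;
}
```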
String Length Functions
strlen()
Prototype:
size_t strlen(const char *str);
The strlen() function calculates the length of a null-terminated string by counting characters until it encounters the null terminator \0.
Key Considerations:
- Null Terminator Required: The input string must be null-terminated. Passing a character array that was initialized with individual characters and no terminating \0 causes undefined behavior (typically reading until a null byte happens to be found in memory).
- Unsigned Return Type: The return type size_t is an unsigned integer. This leads to subtle bugs when comparing lengths:
#include <stdio.h>
#include <string.h>
int main(void) {
// This prints ">" because strlen returns size_t (unsigned)
// 3 - 6 = -3, but as unsigned, this becomes a large positive number
if (strlen("abc") - strlen("abcdef") > 0) {
printf(">\n");
} else {
printf("<=\n");
}
return 0;
}
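One way to avoid this pitfall is to compare the two lengths directly instead of subtracting them. The sketch below wraps that comparison in a hypothetical helper named is_longer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a is strictly longer than b, else 0.
   Comparing the two size_t values directly avoids unsigned wraparound. */
int is_longer(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    return strlen(a) > strlen(b);
}
```

If a signed difference is genuinely needed, each length can be converted to a signed type before subtracting, provided the lengths are known to fit in that type.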
Implementations of strlen()
Method 1: Counter Approach
size_t string_length_count(const char *text) {
size_t count = 0;
while (text[count] != '\0') {
count++;
}
return count;
}
Method 2: Recursive Approach
size_t string_length_recursive(const char *text) {
if (*text == '\0') {
return 0;
}
return 1 + string_length_recursive(text + 1);
}
Method 3: Pointer Arithmetic
size_t string_length_pointer(const char *text) {
const char *start = text;
while (*text) {
text++;
}
return (size_t)(text - start);
}
Unbounded String Functions
These functions operate on entire strings without explicit length limits, relying on null terminators.
strcpy()
Prototype:
char *strcpy(char *destination, const char *source);
Copies the source string (including the null terminator) to the destination buffer.
Requirements:
- Source must be null-terminated
- Destination must be large enough to hold the source
- Destination must be modifiable (not a string literal or constant memory)
Implementation:
#include <assert.h>
#include <stddef.h>
char *string_copy(char *dest, const char *src) {
assert(dest != NULL && src != NULL);
char *result = dest;
while ((*dest++ = *src++) != '\0') {
; // Copy including null terminator
}
return result;
}
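The size requirement on the destination can be checked before calling strcpy(). The following is a minimal sketch, assuming the caller knows the capacity of the destination buffer; copy_if_fits and its dest_size parameter are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies src into dest only if it fits.
   dest_size is the total capacity of dest in bytes.
   Returns 1 on success, 0 if src plus its terminator would not fit. */
int copy_if_fits(char *dest, size_t dest_size, const char *src) {
    assert(dest != NULL && src != NULL);
    if (strlen(src) + 1 > dest_size) {
        return 0;  /* Would overflow, so dest is left untouched. */
    }
    strcpy(dest, src);
    return 1;
}
```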
strcat()
Prototype:
char *strcat(char *destination, const char *source);
Appends the source string to the destination string, starting at the destination's null terminator. The resulting string is null-terminated.
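Critical Limitation: The source and destination must not overlap, so you cannot safely concatenate a string to itself. In a straightforward implementation such as the one shown here, the first copied character overwrites the null terminator that the copy loop is looking for, so the loop never finds the end of the source and runs past the end of the buffer.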
Implementation:
#include <assert.h>
char *string_concat(char *dest, const char *src) {
assert(dest && src);
char *start = dest;
// Find end of destination
while (*dest != '\0') {
dest++;
}
// Copy source to end of destination
while ((*dest++ = *src++) != '\0') {
;
}
return start;
}
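When a string really does need to be appended to itself, one option is to measure it first and then copy a fixed number of bytes, so the copy no longer depends on finding a terminator. This is a sketch under the assumption that the buffer capacity is known; append_to_self is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: doubles the string in buf ("ab" becomes "abab").
   Returns 1 on success, 0 if the result would not fit in buf_size bytes. */
int append_to_self(char *buf, size_t buf_size) {
    assert(buf != NULL);
    if (buf_size == 0) {
        return 0;
    }
    size_t len = strlen(buf);          /* Measure before writing anything. */
    if (len > (buf_size - 1) / 2) {
        return 0;                      /* 2 * len + 1 bytes would not fit. */
    }
    /* [0, len) and [len, 2 * len) do not overlap, so memcpy is allowed. */
    memcpy(buf + len, buf, len);
    buf[2 * len] = '\0';
    return 1;
}
```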
strcmp()
Prototype:
int strcmp(const char *str1, const char *str2);
Compares two strings lexicographically. Returns:
- < 0 if str1 is less than str2
- 0 if the strings are equal
- > 0 if str1 is greater than str2
Implementation:
#include <assert.h>
int string_compare(const char *s1, const char *s2) {
assert(s1 && s2);
while (*s1 == *s2) {
if (*s1 == '\0') {
return 0; // Equal strings
}
s1++;
s2++;
}
return (unsigned char)*s1 - (unsigned char)*s2;
}
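Because only the sign of the result is meaningful, strcmp() fits naturally as a qsort() comparator. The sketch below sorts an array of string pointers; sort_strings and compare_string_ptrs are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, so each argument
   is really a pointer to a (const char *). */
static int compare_string_ptrs(const void *a, const void *b) {
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);
}

/* Hypothetical helper: sorts an array of string pointers in ascending order. */
void sort_strings(const char **items, size_t count) {
    assert(items != NULL);
    qsort(items, count, sizeof items[0], compare_string_ptrs);
}
```

The ordering follows the unsigned byte values of the characters, so in ASCII all uppercase letters sort before lowercase ones.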
Bounded String Functions
These variants accept a maximum length parameter, which helps prevent buffer overflows as long as the limit reflects the real size of the destination.
strncpy()
Prototype:
char *strncpy(char *destination, const char *source, size_t num);
Writes exactly num characters to the destination. If the source is shorter than num, the remainder is padded with null bytes. If the source is num characters or longer, only the first num characters are copied and the result is not null-terminated!
Implementation:
#include <assert.h>
#include <stddef.h>
char *bounded_copy(char *dest, const char *src, size_t num) {
assert(dest && src);
char *start = dest;
while (num > 0 && *src != '\0') {
*dest++ = *src++;
num--;
}
// Pad remaining space with nulls if source was shorter than num
while (num > 0) {
*dest++ = '\0';
num--;
}
return start;
}
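A common way to cope with the missing terminator is to copy one character fewer than the buffer holds and write the terminator by hand. The wrapper below is a minimal sketch of that idiom; copy_truncated is a hypothetical name, and silently truncating may not be acceptable in every program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies at most dest_size - 1 characters
   and always terminates the result. Returns dest. */
char *copy_truncated(char *dest, size_t dest_size, const char *src) {
    assert(dest != NULL && src != NULL && dest_size > 0);
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';  /* strncpy does not do this when src is too long. */
    return dest;
}
```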
strncat()
Prototype:
char *strncat(char *destination, const char *source, size_t num);
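Appends at most num characters from source, then writes a null terminator, so the result is always null-terminated. The destination therefore needs room for strlen(destination) + num + 1 bytes; num limits how much is appended, not the total size of the buffer.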
Implementation:
#include <assert.h>
#include <stddef.h>
char *bounded_concat(char *dest, const char *src, size_t num) {
assert(dest && src);
char *start = dest;
// Move to end of destination
while (*dest != '\0') {
dest++;
}
// Copy up to num characters
while (num > 0 && *src != '\0') {
*dest++ = *src++;
num--;
}
// Always null-terminate
*dest = '\0';
return start;
}
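Since num counts appended characters rather than buffer size, the caller usually has to compute the remaining space. The following sketch shows one way to do that, assuming the buffer capacity is known; append_truncated is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: appends as much of src as fits in dest,
   whose total capacity is dest_size bytes. Returns dest. */
char *append_truncated(char *dest, size_t dest_size, const char *src) {
    assert(dest != NULL && src != NULL);
    size_t used = strlen(dest);
    assert(used < dest_size);  /* dest must already hold a terminated string. */
    /* Room left for characters, reserving one byte for the terminator. */
    strncat(dest, src, dest_size - used - 1);
    return dest;
}
```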
strncmp()
Prototype:
int strncmp(const char *str1, const char *str2, size_t num);
Compares at most num characters of two strings.
Implementation:
#include <assert.h>
#include <stddef.h>
int bounded_compare(const char *s1, const char *s2, size_t num) {
assert(s1 && s2);
if (num == 0) return 0;
while (num > 0 && *s1 == *s2) {
if (*s1 == '\0' || num == 1) {
return 0;
}
s1++;
s2++;
num--;
}
return (unsigned char)*s1 - (unsigned char)*s2;
}
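A frequent use of strncmp() is testing whether a string starts with a given prefix. The helper below is a small sketch of that idiom; has_prefix is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if text begins with prefix, else 0.
   An empty prefix matches every string. */
int has_prefix(const char *text, const char *prefix) {
    assert(text != NULL && prefix != NULL);
    return strncmp(text, prefix, strlen(prefix)) == 0;
}
```

If text is shorter than prefix, the comparison stops at the null terminator of text, so nothing past the end of text is read.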
String Search Functions
strstr()
Prototype:
char *strstr(const char *haystack, const char *needle);
Finds the first occurrence of the substring needle within haystack. Returns a pointer to the beginning of the found substring, or NULL if not found.
Brute Force Implementation:
#include <assert.h>
#include <stddef.h>
const char *find_substring(const char *text, const char *pattern) {
assert(text && pattern);
if (*pattern == '\0') return text; // Empty pattern matches at start
const char *current = text;
while (*current != '\0') {
const char *t = current;
const char *p = pattern;
while (*p != '\0' && *t == *p) {
t++;
p++;
}
if (*p == '\0') {
return current; // Match found
}
current++;
}
return NULL;
}
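Note: For production use with large strings, consider the Knuth-Morris-Pratt (KMP) algorithm, which runs in time proportional to the combined length of text and pattern instead of their product in the worst case.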
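For comparison with the brute-force version, here is a compact sketch of a KMP search. The name kmp_find is hypothetical, and the sketch makes one simplifying assumption that should be stated loudly: if the failure table cannot be allocated, it returns NULL, which the caller cannot distinguish from "not found".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical KMP search: returns a pointer to the first match of
   pattern in text, or NULL if there is none (or allocation fails). */
const char *kmp_find(const char *text, const char *pattern) {
    assert(text != NULL && pattern != NULL);
    size_t m = strlen(pattern);
    if (m == 0) {
        return text;  /* Empty pattern matches at the start. */
    }

    /* fail[i] = length of the longest proper prefix of pattern[0..i]
       that is also a suffix of pattern[0..i]. */
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL) {
        return NULL;  /* Assumption: treat allocation failure as "not found". */
    }
    fail[0] = 0;
    size_t k = 0;
    for (size_t i = 1; i < m; i++) {
        while (k > 0 && pattern[i] != pattern[k]) {
            k = fail[k - 1];
        }
        if (pattern[i] == pattern[k]) {
            k++;
        }
        fail[i] = k;
    }

    /* Scan the text once; k is the number of pattern characters matched so far. */
    const char *found = NULL;
    k = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        while (k > 0 && text[i] != pattern[k]) {
            k = fail[k - 1];
        }
        if (text[i] == pattern[k]) {
            k++;
        }
        if (k == m) {
            found = text + i + 1 - m;
            break;
        }
    }
    free(fail);
    return found;
}
```

The text index never moves backward; on a mismatch only k is reduced using the failure table, which is where the improvement over the brute-force scan comes from.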
strtok()
Prototype:
char *strtok(char *str, const char *delimiters);
Tokenizes a string by splitting it at specified delimiter characters. This function modifies the input string by inserting null terminators.
Usage Pattern:
#include <stdio.h>
#include <string.h>
int main(void) {
char data[] = "user@example.com";
const char *separators = "@.";
// Create a copy to preserve original
char buffer[50];
strcpy(buffer, data);
// Tokenize using for loop idiom
char *token;
for (token = strtok(buffer, separators);
token != NULL;
token = strtok(NULL, separators)) {
printf("Token: %s\n", token);
}
return 0;
}
Key Behavior:
- First call: Pass the string to tokenize
- Subsequent calls: Pass NULL to continue with the same string
- Returns NULL when no more tokens exist
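The call sequence above can be wrapped in a function that collects the token pointers into an array. This is a minimal sketch with hypothetical names (split_tokens, out, max_tokens). Note that strtok() keeps its position in hidden static state, so two tokenizations cannot be interleaved with it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits text in place and stores up to max_tokens
   token pointers in out. Returns the number of tokens stored.
   text is modified, so it must be a writable array, not a string literal. */
size_t split_tokens(char *text, const char *delimiters,
                    char **out, size_t max_tokens) {
    assert(text != NULL && delimiters != NULL && out != NULL);
    size_t count = 0;
    char *token = strtok(text, delimiters);   /* First call: pass the string. */
    while (token != NULL && count < max_tokens) {
        out[count++] = token;
        token = strtok(NULL, delimiters);     /* Later calls: pass NULL. */
    }
    return count;
}
```

The stored pointers point into the original buffer, so they stay valid only as long as that buffer does.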
Error Reporting Functions
strerror()
Prototype:
char *strerror(int errnum);
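Returns a pointer to a string describing the error code passed in errnum. It is commonly used with errno (declared in <errno.h>), which many library functions and system calls set when they fail. The returned string must not be modified by the caller.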
Practical Example:
#include <stdio.h>
#include <string.h>
#include <errno.h>
int main(void) {
FILE *file = fopen("nonexistent.txt", "r");
if (file == NULL) {
printf("Error opening file: %s\n", strerror(errno));
return 1;
}
// Process file...
fclose(file);
return 0;
}
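strerror() can also be called with an explicit error code, which is useful when errno has been saved earlier or when building a message in a buffer. The sketch below assumes a hypothetical helper named format_error and uses snprintf() to bound the output.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "context: message" into buf.
   Returns the number of characters snprintf would have written. */
int format_error(char *buf, size_t buf_size, const char *context, int errnum) {
    assert(buf != NULL && context != NULL);
    return snprintf(buf, buf_size, "%s: %s", context, strerror(errnum));
}
```

A typical call saves errno right after the failing function, for example `int saved = errno;`, and passes saved to the helper, because later library calls may overwrite errno. The exact message text is implementation-defined.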
Memory Manipulation Functions
These functions operate on raw bytes (void*) rather than strings, making them suitable for any data type.
memcpy()
Prototype:
void *memcpy(void *destination, const void *source, size_t num);
Copies num bytes from source to destination. Behavior is undefined if memory regions overlap.
Implementation:
#include <assert.h>
#include <stddef.h>
void *memory_copy(void *dest, const void *src, size_t num) {
assert(dest && src);
unsigned char *d = dest;
const unsigned char *s = src;
while (num--) {
*d++ = *s++;
}
return dest;
}
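A point that is easy to miss is that num is a byte count, not an element count. The sketch below copies an array of int values with the standard memcpy(); copy_ints is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies count ints from src to dest.
   The two arrays must not overlap. */
void copy_ints(int *dest, const int *src, size_t count) {
    assert(dest != NULL && src != NULL);
    /* The size argument is in bytes, so scale by the element size. */
    memcpy(dest, src, count * sizeof *src);
}
```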
memmove()
Prototype:
void *memmove(void *destination, const void *source, size_t num);
Similar to memcpy(), but safely handles overlapping memory regions by choosing the copy direction (forward or backward) based on address comparison.
Implementation:
#include <assert.h>
#include <stddef.h>
void *memory_move(void *dest, const void *src, size_t num) {
assert(dest && src);
unsigned char *d = dest;
const unsigned char *s = src;
if (d < s) {
// Copy forward
while (num--) {
*d++ = *s++;
}
} else {
// Copy backward to avoid overlap corruption
d += num;
s += num;
while (num--) {
*--d = *--s;
}
}
return dest;
}
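Overlapping copies come up naturally when shifting elements inside a single array. As an illustration, the sketch below inserts a value by moving the tail one slot to the right, which is the case where the destination lies above the source. insert_at and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inserts value at index in an int array that
   currently holds count elements and has room for capacity elements.
   Returns the new element count. */
size_t insert_at(int *items, size_t count, size_t capacity,
                 size_t index, int value) {
    assert(items != NULL && index <= count && count < capacity);
    /* Source and destination overlap, so memmove is required here. */
    memmove(&items[index + 1], &items[index],
            (count - index) * sizeof items[0]);
    items[index] = value;
    return count + 1;
}
```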
memset()
Prototype:
void *memset(void *ptr, int value, size_t num);
Fills the first num bytes of memory pointed to by ptr with the constant byte value.
Important Caveat: When initializing integer arrays, remember that memset sets bytes, not integers. Calling memset(arr, 1, sizeof(arr)) on int arr[10] does not set each element to 1; it sets each byte to 1, so with a 4-byte int every element becomes 0x01010101 (16843009 in decimal).
#include <string.h>
int main(void) {
int values[10];
// Correct: Zero out memory
memset(values, 0, sizeof(values));
// Incorrect for setting to 1:
// memset(values, 1, sizeof(values)); // Each element becomes 0x01010101
return 0;
}
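To set every element to a value other than zero, a plain loop is the straightforward alternative. The sketch below uses a hypothetical helper named fill_ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: assigns value to each of the count elements.
   Unlike memset, this works on whole ints rather than single bytes. */
void fill_ints(int *items, size_t count, int value) {
    assert(items != NULL);
    for (size_t i = 0; i < count; i++) {
        items[i] = value;
    }
}
```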
memcmp()
Prototype:
int memcmp(const void *ptr1, const void *ptr2, size_t num);
Compares the first num bytes of two memory regions. Returns values similar to strcmp().
#include <stdio.h>
#include <string.h>
int main(void) {
int a[] = {1, 2, 3};
int b[] = {1, 3, 2};
// Compare first 12 bytes (3 integers on typical systems)
int result = memcmp(a, b, sizeof(int) * 3);
if (result < 0) {
printf("a < b\n");
} else if (result > 0) {
printf("a > b\n");
} else {
printf("a == b\n");
}
return 0;
}
Note: On little-endian systems (like x86), byte-wise comparison of multi-byte integers may yield different results than integer comparison due to byte order.
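When the numeric order of the elements matters, comparing them one by one avoids the byte-order issue. The following is a sketch with a hypothetical helper named compare_int_arrays that mirrors the sign convention of memcmp().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compares two int arrays element by element.
   Returns a negative value, zero, or a positive value, like memcmp,
   but based on integer values rather than raw bytes. */
int compare_int_arrays(const int *a, const int *b, size_t count) {
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < count; i++) {
        if (a[i] < b[i]) {
            return -1;
        }
        if (a[i] > b[i]) {
            return 1;
        }
    }
    return 0;
}
```

For example, with the single-element arrays {256} and {1} this helper reports the first as greater, whereas memcmp() on a little-endian machine sees the low byte 0x00 of 256 before the low byte 0x01 of 1 and reports the opposite.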