Fading Coder

Character, String, and Memory Functions in C

Character Classification Functions

C provides a comprehensive set of functions for character classification through the <ctype.h> header. These functions determine the category of a given character.

Classification Functions Reference

Function Condition for True Return
iscntrl Control characters
isspace Whitespace: space ' ', form feed '\f', newline '\n', carriage return '\r', tab '\t', or vertical tab '\v'
isdigit Decimal digits 0 through 9
isxdigit Hexadecimal digits: 0-9, a-f, A-F
islower Lowercase letters a through z
isupper Uppercase letters A through Z
isalpha Alphabetic letters a-z or A-Z
isalnum Alphanumeric characters (letters and digits)
isgraph Graphic characters (printable, non-space)
isprint Printable characters (including space)

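All these functions follow a similar pattern: they accept an int argument (a char value converted to unsigned char and then promoted to int, or EOF) and return a non-zero value if the character matches the classification, or zero otherwise. Passing a negative char value other than EOF is undefined behavior, which is why the examples below cast to unsigned char.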
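As a small illustration of that calling pattern, here is a minimal sketch that counts decimal digits in a string. The name count_digits is a hypothetical helper chosen for this example, not a standard function.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the decimal digits in a null-terminated string. */
size_t count_digits(const char *text) {
    size_t digits = 0;

    for (size_t i = 0; text[i] != '\0'; i++) {
        /* Cast to unsigned char first: a negative char value passed to a
           <ctype.h> function is undefined behavior. */
        if (isdigit((unsigned char)text[i])) {
            digits++;
        }
    }

    return digits;
}
```

For example, count_digits("a1b22c") would return 3. Any other classification function from the table can be swapped in for isdigit.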

Usage Example: Case Conversion

The following example converts all lowercase letters in a string to uppercase while leaving other characters unchanged:

#include <stdio.h>
#include <ctype.h>

int main(void) {
    char message[] = "Test String.\n";
    size_t idx = 0;
    
    while (message[idx] != '\0') {
        if (islower((unsigned char)message[idx])) {
            message[idx] = (char)toupper((unsigned char)message[idx]);
        }
        idx++;
    }
    
    printf("%s", message);
    return 0;
}

Another common pattern involves checking for uppercase and converting to lowercase:

#include <stdio.h>
#include <ctype.h>

int main(void) {
    char text[] = "Test String.\n";
    
    for (size_t i = 0; text[i] != '\0'; i++) {
        char ch = text[i];
        if (isupper((unsigned char)ch)) {
            ch = (char)tolower((unsigned char)ch);
        }
        putchar(ch);
    }
    return 0;
}

Character Conversion Functions

C provides two standard character conversion functions:

int tolower(int c);  // Converts uppercase to lowercase
int toupper(int c);  // Converts lowercase to uppercase

These functions return the converted character if applicable, or the original character if no conversion is needed.

Note: Some compilers provide non-standard extensions like strlwr() (defined in <string.h> on certain systems) for string-wide conversion, though these are not portable.
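Where strlwr() is unavailable, a portable equivalent can be built on tolower(). The sketch below assumes a modifiable, null-terminated buffer; lowercase_in_place is a hypothetical name used only for this example.

```c
#include <ctype.h>

/* Hypothetical portable stand-in for strlwr(): lowercases a string in place
   and returns the same pointer. */
char *lowercase_in_place(char *text) {
    for (char *p = text; *p != '\0'; p++) {
        /* tolower() takes and returns int; cast on the way in and out. */
        *p = (char)tolower((unsigned char)*p);
    }
    return text;
}
```

The result depends on the current locale; in the default "C" locale only A through Z are affected.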

String Length Functions

strlen()

Prototype:

size_t strlen(const char *str);

The strlen() function calculates the length of a null-terminated string by counting characters until it encounters the null terminator \0.

Key Considerations:

  1. Null Terminator Required: The input string must be null-terminated. Initializing a character array with individual characters without a terminating \0 will cause undefined behavior (typically reading until a random null byte is found in memory).

  2. Unsigned Return Type: The return type size_t is an unsigned integer. This leads to subtle bugs when comparing lengths:

#include <stdio.h>
#include <string.h>

int main(void) {
    // This prints ">" because strlen returns size_t (unsigned)
    // 3 - 6 = -3, but as unsigned, this becomes a large positive number
    if (strlen("abc") - strlen("abcdef") > 0) {
        printf(">\n");
    } else {
        printf("<=\n");
    }
    return 0;
}
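One way to sidestep the wraparound shown above is to compare the two lengths directly rather than subtracting them. A minimal sketch, with is_longer as a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if a is strictly longer than b, 0 otherwise. */
int is_longer(const char *a, const char *b) {
    /* Comparing the two size_t values directly involves no subtraction,
       so no unsigned wraparound can occur. */
    return strlen(a) > strlen(b);
}
```

With this version, is_longer("abc", "abcdef") yields 0, as one would expect.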

Implementations of strlen()

Method 1: Counter Approach

#include <stddef.h>

size_t string_length_count(const char *text) {
    size_t count = 0;
    while (text[count] != '\0') {
        count++;
    }
    return count;
}

Method 2: Recursive Approach

#include <stddef.h>

size_t string_length_recursive(const char *text) {
    if (*text == '\0') {
        return 0;
    }
    return 1 + string_length_recursive(text + 1);
}

Method 3: Pointer Arithmetic

#include <stddef.h>

size_t string_length_pointer(const char *text) {
    const char *start = text;
    while (*text) {
        text++;
    }
    return (size_t)(text - start);
}

Unbounded String Functions

These functions operate on entire strings without explicit length limits, relying on null terminators.

strcpy()

Prototype:

char *strcpy(char *destination, const char *source);

Copies the source string (including the null terminator) to the destination buffer.

Requirements:

  • Source must be null-terminated
  • Destination must be large enough to hold the source
  • Destination must be modifiable (not a string literal or constant memory)

Implementation:

#include <assert.h>
#include <stddef.h>

char *string_copy(char *dest, const char *src) {
    assert(dest != NULL && src != NULL);
    char *result = dest;
    
    while ((*dest++ = *src++) != '\0') {
        ; // Copy including null terminator
    }
    
    return result;
}
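Because strcpy() itself never checks the destination size, callers usually have to do so. The following sketch shows one possible guard; copy_if_fits and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies src into dest only when it fits, terminator
   included. Returns 0 on success, -1 if dest is too small (dest untouched). */
int copy_if_fits(char *dest, size_t dest_size, const char *src) {
    size_t needed = strlen(src) + 1;  /* +1 for the null terminator */

    if (needed > dest_size) {
        return -1;
    }

    strcpy(dest, src);
    return 0;
}
```

A caller would typically pass sizeof buffer as dest_size when the destination is a local array.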

strcat()

Prototype:

char *strcat(char *destination, const char *source);

Appends the source string to the destination string, starting at the destination's null terminator. The resulting string is null-terminated.

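Critical Limitation: You cannot safely concatenate a string to itself. The source and destination overlap, so the behavior is undefined; in a simple implementation the source's null terminator is overwritten by the first copied character, and the copy loop then runs past the end of the buffer.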

Implementation:

#include <assert.h>

char *string_concat(char *dest, const char *src) {
    assert(dest && src);
    char *start = dest;
    
    // Find end of destination
    while (*dest != '\0') {
        dest++;
    }
    
    // Copy source to end of destination
    while ((*dest++ = *src++) != '\0') {
        ;
    }
    
    return start;
}
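If a string really does need to be appended to itself, one option is to measure it first and copy a fixed number of bytes. This is a sketch under the assumption that the caller knows the buffer capacity; double_string is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turns "ab" into "abab" inside the same buffer.
   Returns 0 on success, -1 if the buffer cannot hold the result. */
int double_string(char *buffer, size_t buffer_size) {
    size_t len = strlen(buffer);

    if (2 * len + 1 > buffer_size) {
        return -1;
    }

    /* The ranges [0, len) and [len, 2*len) do not overlap, so memcpy is fine. */
    memcpy(buffer + len, buffer, len);
    buffer[2 * len] = '\0';
    return 0;
}
```

The length is fixed before any byte is written, so the copy does not depend on a terminator that is being overwritten.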

strcmp()

Prototype:

int strcmp(const char *str1, const char *str2);

Compares two strings lexicographically. Returns:

  • < 0 if str1 is less than str2
  • 0 if strings are equal
  • > 0 if str1 is greater than str2

Implementation:

#include <assert.h>

int string_compare(const char *s1, const char *s2) {
    assert(s1 && s2);
    
    while (*s1 == *s2) {
        if (*s1 == '\0') {
            return 0; // Equal strings
        }
        s1++;
        s2++;
    }
    
    return (unsigned char)*s1 - (unsigned char)*s2;
}
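A common use of strcmp()'s three-way result is as a sort comparator. The sketch below sorts an array of string pointers with the standard qsort(); sort_words and compare_strings are hypothetical names for this example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() hands the comparator pointers to the array elements, and each
   element here is itself a const char *. */
static int compare_strings(const void *a, const void *b) {
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sorts an array of string pointers in ascending order. */
void sort_words(const char **words, size_t count) {
    qsort(words, count, sizeof words[0], compare_strings);
}
```

Only the sign of the strcmp() result matters to qsort(), so the exact magnitude returned by a given library is irrelevant.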

Bounded String Functions

These variants accept a maximum length parameter, which helps prevent buffer overflows when the limit is derived from the destination size.

strncpy()

Prototype:

char *strncpy(char *destination, const char *source, size_t num);

Copies at most num characters from source and always writes exactly num characters to destination. If the source is shorter than num, the remainder is padded with null bytes. If the source is num characters or longer, the result is not null-terminated!

Implementation:

#include <assert.h>
#include <stddef.h>

char *bounded_copy(char *dest, const char *src, size_t num) {
    assert(dest && src);
    char *start = dest;
    
    while (num > 0 && *src != '\0') {
        *dest++ = *src++;
        num--;
    }
    
    // Pad remaining space with nulls if source was shorter than num
    while (num > 0) {
        *dest++ = '\0';
        num--;
    }
    
    return start;
}
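Since strncpy() may leave the destination unterminated, a frequent idiom is to copy one byte less than the buffer size and write the terminator by hand. A minimal sketch, with copy_truncated as a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: truncating copy that always null-terminates dest.
   Does nothing when dest_size is 0. */
void copy_truncated(char *dest, size_t dest_size, const char *src) {
    if (dest_size == 0) {
        return;
    }

    /* Leave the last byte for the terminator. */
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';
}
```

With a 4-byte buffer, copying "abcdef" this way stores "abc". Silent truncation may or may not be acceptable, depending on the application.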

strncat()

Prototype:

char *strncat(char *destination, const char *source, size_t num);

Appends at most num characters from source (plus a null terminator). Always null-terminates the result.

Implementation:

#include <assert.h>
#include <stddef.h>

char *bounded_concat(char *dest, const char *src, size_t num) {
    assert(dest && src);
    char *start = dest;
    
    // Move to end of destination
    while (*dest != '\0') {
        dest++;
    }
    
    // Copy up to num characters
    while (num > 0 && *src != '\0') {
        *dest++ = *src++;
        num--;
    }
    
    // Always null-terminate
    *dest = '\0';
    return start;
}
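Note that num limits the characters taken from the source, not the total size of the destination. The sketch below derives num from the space that remains; append_truncated is a hypothetical name, and it assumes dest already holds a terminated string within dest_size bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends as much of src as fits in dest. */
void append_truncated(char *dest, size_t dest_size, const char *src) {
    size_t used = strlen(dest);

    /* No room left for even one more character plus the terminator. */
    if (used + 1 >= dest_size) {
        return;
    }

    /* strncat writes up to num characters and then a terminator, so
       reserve one byte for that terminator. */
    strncat(dest, src, dest_size - used - 1);
}
```

For an 8-byte buffer holding "abc", appending "defghij" leaves "abcdefg".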

strncmp()

Prototype:

int strncmp(const char *str1, const char *str2, size_t num);

Compares at most num characters of two strings.

Implementation:

#include <assert.h>
#include <stddef.h>

int bounded_compare(const char *s1, const char *s2, size_t num) {
    assert(s1 && s2);
    
    if (num == 0) return 0;
    
    while (num > 0 && *s1 == *s2) {
        if (*s1 == '\0' || num == 1) {
            return 0;
        }
        s1++;
        s2++;
        num--;
    }
    
    return (unsigned char)*s1 - (unsigned char)*s2;
}
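A typical application of strncmp() is a prefix test. The following sketch uses a hypothetical helper named has_prefix:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if text begins with prefix, 0 otherwise. */
int has_prefix(const char *text, const char *prefix) {
    /* Comparing only strlen(prefix) characters ignores the rest of text.
       If text is shorter, its terminator differs from the prefix character
       at that position, so the comparison fails as intended. */
    return strncmp(text, prefix, strlen(prefix)) == 0;
}
```

An empty prefix matches every string, because strncmp() with num equal to 0 returns 0.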

String Search Functions

strstr()

Prototype:

char *strstr(const char *haystack, const char *needle);

Finds the first occurrence of the substring needle within haystack. Returns a pointer to the beginning of the found substring, or NULL if not found.

Brute Force Implementation:

#include <assert.h>
#include <stddef.h>

const char *find_substring(const char *text, const char *pattern) {
    assert(text && pattern);
    
    if (*pattern == '\0') return text; // Empty pattern matches at start
    
    const char *current = text;
    
    while (*current != '\0') {
        const char *t = current;
        const char *p = pattern;
        
        while (*p != '\0' && *t == *p) {
            t++;
            p++;
        }
        
        if (*p == '\0') {
            return current; // Match found
        }
        
        current++;
    }
    
    return NULL;
}

Note: For production use with large strings, consider the KMP algorithm for better performance.
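Because strstr() returns a pointer into the haystack, the search can be resumed from just past each match. The sketch below counts non-overlapping occurrences with the standard strstr(); count_occurrences is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts non-overlapping occurrences of pattern in text. */
size_t count_occurrences(const char *text, const char *pattern) {
    size_t count = 0;
    size_t step = strlen(pattern);

    /* An empty pattern matches everywhere; return 0 to avoid looping forever. */
    if (step == 0) {
        return 0;
    }

    for (const char *hit = strstr(text, pattern);
         hit != NULL;
         hit = strstr(hit + step, pattern)) {
        count++;
    }

    return count;
}
```

Advancing by one character instead of step would count overlapping matches as well.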

strtok()

Prototype:

char *strtok(char *str, const char *delimiters);

Tokenizes a string by splitting it at specified delimiter characters. This function modifies the input string by overwriting delimiters with null terminators, and it keeps its position in internal static state, so it is not reentrant.

Usage Pattern:

#include <stdio.h>
#include <string.h>

int main(void) {
    char data[] = "user@example.com";
    const char *separators = "@.";
    
    // Create a copy to preserve original
    char buffer[50];
    strcpy(buffer, data);
    
    // Tokenize using for loop idiom
    char *token;
    for (token = strtok(buffer, separators); 
         token != NULL; 
         token = strtok(NULL, separators)) {
        printf("Token: %s\n", token);
    }
    
    return 0;
}

Key Behavior:

  • First call: Pass the string to tokenize
  • Subsequent calls: Pass NULL to continue with the same string
  • Returns NULL when no more tokens exist
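The call sequence above can be wrapped so that the tokens are collected into an array. This is a sketch only: split_tokens and its parameters are hypothetical, and the returned pointers refer into the caller's buffer, which strtok() modifies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buffer in place and stores up to max_tokens
   token pointers in tokens. Returns the number of tokens stored. */
size_t split_tokens(char *buffer, const char *delims,
                    char **tokens, size_t max_tokens) {
    size_t count = 0;
    char *tok = strtok(buffer, delims);   /* first call: pass the string */

    while (tok != NULL && count < max_tokens) {
        tokens[count++] = tok;
        tok = strtok(NULL, delims);       /* later calls: pass NULL */
    }

    return count;
}
```

After splitting "user@example.com" on "@.", the buffer itself reads as "user" because the '@' has been replaced by a null terminator.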

Error Reporting Functions

strerror()

Prototype:

char *strerror(int errnum);

Returns a pointer to a string describing the error code passed in errnum. Commonly used with errno, which system calls and many library functions set on failure. The returned string must not be modified by the caller.

Practical Example:

#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void) {
    FILE *file = fopen("nonexistent.txt", "r");
    
    if (file == NULL) {
        printf("Error opening file: %s\n", strerror(errno));
        return 1;
    }
    
    // Process file...
    fclose(file);
    return 0;
}
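Since later library calls may change errno, it is often worth saving its value right after the failing call. The sketch below assumes a POSIX-like system where fopen() sets errno on failure (ISO C does not require that); try_open is a hypothetical name.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 0 if path can be opened for reading,
   -1 otherwise, reporting the reason on stderr. */
int try_open(const char *path) {
    FILE *file = fopen(path, "r");

    if (file == NULL) {
        int saved = errno;  /* capture before any other call can change it */
        fprintf(stderr, "Cannot open %s: %s\n", path, strerror(saved));
        return -1;
    }

    fclose(file);
    return 0;
}
```

The exact message text comes from the C library and the current locale, so it should not be parsed or compared against fixed strings.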

Memory Manipulation Functions

These functions operate on raw bytes (void*) rather than strings, making them suitable for any data type.

memcpy()

Prototype:

void *memcpy(void *destination, const void *source, size_t num);

Copies num bytes from source to destination. Behavior is undefined if memory regions overlap.

Implementation:

#include <assert.h>
#include <stddef.h>

void *memory_copy(void *dest, const void *src, size_t num) {
    assert(dest && src);
    unsigned char *d = dest;
    const unsigned char *s = src;
    
    while (num--) {
        *d++ = *s++;
    }
    
    return dest;
}
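Since memcpy() works on bytes, the count is usually computed with sizeof. A minimal sketch copying an array of structs follows; struct point and copy_points are hypothetical names for this example, and the two arrays are assumed not to overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example type. */
struct point {
    int x;
    int y;
};

/* Hypothetical helper: copies count elements from src to dest. */
void copy_points(struct point *dest, const struct point *src, size_t count) {
    /* The byte count is the element count times the element size. */
    memcpy(dest, src, count * sizeof *src);
}
```

Passing the element count instead of the byte count is a common mistake; writing the multiplication next to the call keeps it visible.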

memmove()

Prototype:

void *memmove(void *destination, const void *source, size_t num);

Similar to memcpy(), but safely handles overlapping memory regions by choosing the copy direction (forward or backward) based on address comparison.

Implementation:

#include <assert.h>
#include <stddef.h>

void *memory_move(void *dest, const void *src, size_t num) {
    assert(dest && src);
    unsigned char *d = dest;
    const unsigned char *s = src;
    
    if (d < s) {
        // Copy forward
        while (num--) {
            *d++ = *s++;
        }
    } else {
        // Copy backward to avoid overlap corruption
        d += num;
        s += num;
        while (num--) {
            *--d = *--s;
        }
    }
    
    return dest;
}
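Shifting elements within a single array is the classic overlapping case. The sketch below removes one element with the standard memmove(); remove_at is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes values[index] from an array of count ints by
   shifting the tail down one slot. Returns the new element count. */
size_t remove_at(int *values, size_t count, size_t index) {
    if (index >= count) {
        return count;  /* nothing to remove */
    }

    /* Source and destination overlap, so memmove is required here. */
    memmove(&values[index], &values[index + 1],
            (count - index - 1) * sizeof values[0]);

    return count - 1;
}
```

Removing index 1 from {10, 20, 30, 40} leaves 10, 30, 40 in the first three slots.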

memset()

Prototype:

void *memset(void *ptr, int value, size_t num);

Fills the first num bytes of memory pointed to by ptr with the constant byte value.

Important Caveat: When initializing integer arrays, remember that memset sets bytes, not integers. Setting int arr[10] with memset(arr, 1, sizeof(arr)) does not set each element to 1, but sets each byte to 1, resulting in 0x01010101 (16843009 in decimal) when int is 4 bytes wide.

#include <string.h>

int main(void) {
    int values[10];
    // Correct: Zero out memory
    memset(values, 0, sizeof(values));
    
    // Incorrect for setting to 1:
    // memset(values, 1, sizeof(values)); // Each element becomes 0x01010101
    return 0;
}
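To set every element to a value other than zero, a plain loop is the straightforward alternative. A minimal sketch, with fill_ints as a hypothetical name:

```c
#include <stddef.h>

/* Hypothetical helper: assigns value to each of the count elements. */
void fill_ints(int *values, size_t count, int value) {
    for (size_t i = 0; i < count; i++) {
        values[i] = value;  /* element-wise assignment, not byte-wise */
    }
}
```

Calling fill_ints(values, 10, 1) gives ten elements equal to 1, which memset() cannot do for multi-byte types.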

memcmp()

Prototype:

int memcmp(const void *ptr1, const void *ptr2, size_t num);

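Compares the first num bytes of two memory regions, treating each byte as an unsigned char. Returns a negative value, zero, or a positive value, following the same convention as strcmp().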

#include <stdio.h>
#include <string.h>

int main(void) {
    int a[] = {1, 2, 3};
    int b[] = {1, 3, 2};
    
    // Compare first 12 bytes (3 integers on typical systems)
    int result = memcmp(a, b, sizeof(int) * 3);
    
    if (result < 0) {
        printf("a < b\n");
    } else if (result > 0) {
        printf("a > b\n");
    } else {
        printf("a == b\n");
    }
    
    return 0;
}

Note: On little-endian systems (like x86), byte-wise comparison of multi-byte integers may yield different results than integer comparison due to byte order.
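When the ordering of the integer values matters, comparing element by element avoids the byte-order issue. The sketch below uses a hypothetical helper named compare_ints:

```c
#include <stddef.h>

/* Hypothetical helper: compares two int arrays by value, element by element.
   Returns -1, 0, or 1, in the spirit of memcmp(). */
int compare_ints(const int *a, const int *b, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (a[i] < b[i]) {
            return -1;
        }
        if (a[i] > b[i]) {
            return 1;
        }
    }
    return 0;
}
```

For instance, with 4-byte little-endian integers the value 1 is stored as bytes 01 00 00 00 and 256 as 00 01 00 00, so memcmp() reports the first as greater, while compare_ints() orders them by value. memcmp() remains a reasonable choice for pure equality checks on arrays of integers.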
