Home > Tech > Content

A Quick Guide to C++ Hash Tables and Maps

Tech May 10 2

When solving competitive programming problems, duplicate detectino is a frequent requirement. The C++ standard library provides map and unordered_map for this purpose, while the policy-based data structures libray offers cc_hash_table and gp_hash_table as alternatives.mapThe map container is implemented as a red-black tree. Both insertion and lookup operatinos have a time complexity of O(log n).cpp map<int,bool> mp; int count; cin>>count; for(int i=0;i<count;i++) { int val; cin>>val; mp[val]=true; } cin>>count; for(int i=0;i<count;i++) { int val; cin>>val; if(mp.count(val)) cout<<"found\n"; else cout<<"not found\n"; } Input5 1 2 4 5 6 5 1 2 3 4 5Outputfound found not found found foundThe count() method returns the number of elements with a given key, which is always either 1 or 0. This is useful for checking existence. An alternative approach is using find()!=end().Common Pitfall with mapReplacing count() with operator[] can produce correct-looking output in some cases, but it often leads to severe performance degradation. Consider these two submissions demonstrating this issue:Using count()Using operator[]To understand why, let's examine the map contents after each lookup:cpp map<int,bool> mp; int count; cin>>count; for(int i=0;i<count;i++) { int val; cin>>val; mp[val]=true; } cin>>count; for(int i=0;i<count;i++) { int val; cin>>val; if(mp.count(val)) cout<<"found\n"; else cout<<"not found\n"; cout<<"Current state:\n"; for(auto& p:mp) cout<<p.first<<' '<<p.second<<'\n'; cout<<'\n'; } Input3 1 2 3 3 1 4 5Output with count()``` found Current state: 1 1 2 1 3 1

not found Current state: 1 1 2 1 3 1

not found Current state: 1 1 2 1 3 1 Using count() works as expected. Now let's see what happens with operator[]:Output with operator[]found Current state: 1 1 2 1 3 1

not found Current state: 1 1 2 1 3 1 4 0

not found Current state: 1 1 2 1 3 1 4 0 5 0 Unexpected entries with value=0 appear! The reason is that accessing a non-existent key with operator[] automatically inserts a new element with that key and a default value of 0. This transforms the time complexity from O(m log n) to O(m log (n+m)), causing TLE in time-sensitive problems.This is particularly dangerous because the code may produce correct answers while being too slow, leading to frustration during debugging sessions. For problems where accessing elements doesn't require existence checks, you must verify the element exists first. Using operator[] without existence checks can also cause WA when zero-values have semantic meaning.Iterating Over mapEach element in a map is a pair, with first representing the key and second representing the value.cpp map<string,int> mp; for(pair<string,int> entry:mp) { cout<<entry.first<<' '; cout<<entry.second<<'\n'; } Or more concisely with auto:cpp map<string,int> mp; for(auto& entry:mp) { cout<<entry.first<<' '; cout<<entry.second<<'\n'; } Iterator-based traversal is possible but verbose.unordered_mapunordered_map has an interface similar to map but uses hashing internally. Insertions and lookups achieve average O(1) time complexity. In worst-case scenarios involving deliberately crafted hash collisions, performance degrades to O(n).Iteration syntax is identical to map.Policy-Based Data Structures (pb_ds)The pb_ds library, sometimes called "tablet" in Chinese programming communities, provides implementations of tries, heaps, balanced trees, and hash tables. This section focuses on the hash table variants: cc_hash_table and gp_hash_table.Include the required headers and namespace:cpp #include<ext/pb_ds/assoc_container.hpp> using namespace __gnu_pbds; /* Note: Some member names may conflict with standard library names. In such cases, avoid the using directive and use __gnu_pbds::prefix instead. */ The primary difference between cc_hash_table and gp_hash_table lies in their collision resolution strategies.gp_hash_table employs open addressing with probing, while cc_hash_table uses chaining. In practice, gp_hash_table often demonstrates superior performance. Testing on random data confirmed this observation.Like unordered_map, these containers are unordered with amortized O(1) insertion and lookup.Important: Neither cc_hash_table nor gp_hash_table provides a count() method. Use find(x)!=end() for existence checks.cpp #include<ext/pb_ds/assoc_container.hpp> using namespace __gnu_pbds;

gp_hash_table<int,int> gpTable; cc_hash_table<int,int> ccTable;

void process() { int qty; cin>>qty; for(int i=1;i<=qty;i++) { if(i%2==1) gpTable[i]=1; else ccTable[i]=1; } cin>>qty; for(int i=1;i<=qty;i++) { int key; cin>>key;

    if(gpTable.find(key)!=gpTable.end()) 
        cout<<"found in gp\n";
    else 
        cout<<"not in gp\n";

    if(ccTable.find(key)!=ccTable.end()) 
        cout<<"found in cc\n";
    else 
        cout<<"not in cc\n";
}

} Similar to std::map, using operator[] on non-existent keys automatically inserts them with default values.Iteration syntax matches that of std::map.Performance BenchmarkHere is a test comparing performance on random data:cpp #include <bits/stdc++.h> #include <ext/pb_ds/assoc_container.hpp>

using namespace std; using namespace __gnu_pbds;

const int NUM_ELEMENTS = 200000;

int main() { mt19937 rng(chrono::steady_clock::now().time_since_epoch().count()); uniform_int_distribution dist(INT_MIN, INT_MAX);

vector<int> data(NUM_ELEMENTS);
for (int i = 0; i < NUM_ELEMENTS; i++) 
    data[i] = dist(rng);

// gp_hash_table benchmark
{
    gp_hash_table<int,int> table;
    clock_t start = clock();
    for (int i = 0; i < NUM_ELEMENTS; i++) 
        table[data[i]] = i;
    long long sum = 0;
    for (int i = 0; i < NUM_ELEMENTS; i++) 
        sum += table[data[i]];
    clock_t end = clock();
    cout << "gp_hash_table: " 
         << (end - start) * 1.0 / CLOCKS_PER_SEC 
         << "s (sum=" << sum << ")\n";
}

// cc_hash_table benchmark
{
    cc_hash_table<int,int> table;
    clock_t start = clock();
    for (int i = 0; i < NUM_ELEMENTS; i++) 
        table[data[i]] = i;
    long long sum = 0;
    for (int i = 0; i < NUM_ELEMENTS; i++) 
        sum += table[data[i]];
    clock_t end = clock();
    cout << "cc_hash_table: " 
         << (end - start) * 1.0 / CLOCKS_PER_SEC 
         << "s (sum=" << sum << ")\n";
}

// unordered_map benchmark
{
    unordered_map<int,int> table;
    clock_t start = clock();
    for (int i = 0; i < NUM_ELEMENTS; i++) 
        table[data[i]] = i;
    long long sum = 0;
    for (int i = 0; i < NUM_ELEMENTS; i++) 
        sum += table[data[i]];
    clock_t end = clock();
    cout << "unordered_map: " 
         << (end - start) * 1.0 / CLOCKS_PER_SEC 
         << "s (sum=" << sum << ")\n";
}
return 0;

}

Tags: C++hash-table unordered-map Map pb-ds

Back to List

Prev: Implementing Message Queues with Redis List: Consumer Thread Models

Next: Understanding Divergent String Sort Orders in PostgreSQL

Fading Coder

A Quick Guide to C++ Hash Tables and Maps

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a Comment

Copyright © fadingcoder.top

Fading Coder

A Quick Guide to C++ Hash Tables and Maps

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a CommentCancel Reply

Copyright © fadingcoder.top

Leave a Comment