A Quick Guide to C++ Hash Tables and Maps
When solving competitive programming problems, duplicate detectino is a frequent requirement. The C++ standard library provides map and unordered_map for this purpose, while the policy-based data structures libray offers cc_hash_table and gp_hash_table as alternatives.mapThe map container is implemented as a red-black tree. Both insertion and lookup operatinos have a time complexity of O(log n).cpp map<int,bool> mp; int count; cin>>count; for(int i=0;i<count;i++) { int val; cin>>val; mp[val]=true; } cin>>count; for(int i=0;i<count;i++) { int val; cin>>val; if(mp.count(val)) cout<<"found\n"; else cout<<"not found\n"; } Input5 1 2 4 5 6 5 1 2 3 4 5Outputfound found not found found foundThe count() method returns the number of elements with a given key, which is always either 1 or 0. This is useful for checking existence. An alternative approach is using find()!=end().Common Pitfall with mapReplacing count() with operator[] can produce correct-looking output in some cases, but it often leads to severe performance degradation. Consider these two submissions demonstrating this issue:Using count()Using operator[]To understand why, let's examine the map contents after each lookup:cpp map<int,bool> mp; int count; cin>>count; for(int i=0;i<count;i++) { int val; cin>>val; mp[val]=true; } cin>>count; for(int i=0;i<count;i++) { int val; cin>>val; if(mp.count(val)) cout<<"found\n"; else cout<<"not found\n"; cout<<"Current state:\n"; for(auto& p:mp) cout<<p.first<<' '<<p.second<<'\n'; cout<<'\n'; } Input3 1 2 3 3 1 4 5Output with count()``` found Current state: 1 1 2 1 3 1
not found Current state: 1 1 2 1 3 1
not found Current state: 1 1 2 1 3 1 Using count() works as expected. Now let's see what happens with operator[]:Output with operator[]found Current state: 1 1 2 1 3 1
not found Current state: 1 1 2 1 3 1 4 0
not found Current state: 1 1 2 1 3 1 4 0 5 0 Unexpected entries with value=0 appear! The reason is that accessing a non-existent key with operator[] automatically inserts a new element with that key and a default value of 0. This transforms the time complexity from O(m log n) to O(m log (n+m)), causing TLE in time-sensitive problems.This is particularly dangerous because the code may produce correct answers while being too slow, leading to frustration during debugging sessions. For problems where accessing elements doesn't require existence checks, you must verify the element exists first. Using operator[] without existence checks can also cause WA when zero-values have semantic meaning.Iterating Over mapEach element in a map is a pair, with first representing the key and second representing the value.cpp map<string,int> mp; for(pair<string,int> entry:mp) { cout<<entry.first<<' '; cout<<entry.second<<'\n'; } Or more concisely with auto:cpp map<string,int> mp; for(auto& entry:mp) { cout<<entry.first<<' '; cout<<entry.second<<'\n'; } Iterator-based traversal is possible but verbose.unordered_mapunordered_map has an interface similar to map but uses hashing internally. Insertions and lookups achieve average O(1) time complexity. In worst-case scenarios involving deliberately crafted hash collisions, performance degrades to O(n).Iteration syntax is identical to map.Policy-Based Data Structures (pb_ds)The pb_ds library, sometimes called "tablet" in Chinese programming communities, provides implementations of tries, heaps, balanced trees, and hash tables. This section focuses on the hash table variants: cc_hash_table and gp_hash_table.Include the required headers and namespace:cpp #include<ext/pb_ds/assoc_container.hpp> using namespace __gnu_pbds; /* Note: Some member names may conflict with standard library names. In such cases, avoid the using directive and use __gnu_pbds::prefix instead. */ The primary difference between cc_hash_table and gp_hash_table lies in their collision resolution strategies.gp_hash_table employs open addressing with probing, while cc_hash_table uses chaining. In practice, gp_hash_table often demonstrates superior performance. Testing on random data confirmed this observation.Like unordered_map, these containers are unordered with amortized O(1) insertion and lookup.Important: Neither cc_hash_table nor gp_hash_table provides a count() method. Use find(x)!=end() for existence checks.cpp #include<ext/pb_ds/assoc_container.hpp> using namespace __gnu_pbds;
gp_hash_table<int,int> gpTable; cc_hash_table<int,int> ccTable;
void process() { int qty; cin>>qty; for(int i=1;i<=qty;i++) { if(i%2==1) gpTable[i]=1; else ccTable[i]=1; } cin>>qty; for(int i=1;i<=qty;i++) { int key; cin>>key;
if(gpTable.find(key)!=gpTable.end())
cout<<"found in gp\n";
else
cout<<"not in gp\n";
if(ccTable.find(key)!=ccTable.end())
cout<<"found in cc\n";
else
cout<<"not in cc\n";
}
} Similar to std::map, using operator[] on non-existent keys automatically inserts them with default values.Iteration syntax matches that of std::map.Performance BenchmarkHere is a test comparing performance on random data:cpp #include <bits/stdc++.h> #include <ext/pb_ds/assoc_container.hpp>
using namespace std; using namespace __gnu_pbds;
const int NUM_ELEMENTS = 200000;
int main() { mt19937 rng(chrono::steady_clock::now().time_since_epoch().count()); uniform_int_distribution dist(INT_MIN, INT_MAX);
vector<int> data(NUM_ELEMENTS);
for (int i = 0; i < NUM_ELEMENTS; i++)
data[i] = dist(rng);
// gp_hash_table benchmark
{
gp_hash_table<int,int> table;
clock_t start = clock();
for (int i = 0; i < NUM_ELEMENTS; i++)
table[data[i]] = i;
long long sum = 0;
for (int i = 0; i < NUM_ELEMENTS; i++)
sum += table[data[i]];
clock_t end = clock();
cout << "gp_hash_table: "
<< (end - start) * 1.0 / CLOCKS_PER_SEC
<< "s (sum=" << sum << ")\n";
}
// cc_hash_table benchmark
{
cc_hash_table<int,int> table;
clock_t start = clock();
for (int i = 0; i < NUM_ELEMENTS; i++)
table[data[i]] = i;
long long sum = 0;
for (int i = 0; i < NUM_ELEMENTS; i++)
sum += table[data[i]];
clock_t end = clock();
cout << "cc_hash_table: "
<< (end - start) * 1.0 / CLOCKS_PER_SEC
<< "s (sum=" << sum << ")\n";
}
// unordered_map benchmark
{
unordered_map<int,int> table;
clock_t start = clock();
for (int i = 0; i < NUM_ELEMENTS; i++)
table[data[i]] = i;
long long sum = 0;
for (int i = 0; i < NUM_ELEMENTS; i++)
sum += table[data[i]];
clock_t end = clock();
cout << "unordered_map: "
<< (end - start) * 1.0 / CLOCKS_PER_SEC
<< "s (sum=" << sum << ")\n";
}
return 0;
}