Xor Filter Applications in Encrypted Equi-Join Queries
Mechanics of Xor Filters
Xor Filters function as space-optimized probabilistic structures tailored for rapid set membership tests. They utilize multiple hash functions to map elements onto a bit array, employing bitwise XOR operations to compress storage and accelerate retrieval.
Construction Workflow
- Hash Function Selection: Several hash functions (e.g.,
hash_0,hash_1,hash_2) are chosen to project each element onto multiple indices within an array. The output domain corresponds to the array's bounds. - Fingerprint Derivation: For an element
val, its fingerprint is derived by XORing the outputs of the hash functions:fingerprint(val) = hash_0(val) ⊕ hash_1(val) ⊕ hash_2(val) - Fingerprint Storage: The derived fingerprints are recorded in the array, with each slot accommodating a fixed-length binary value.
- Clustering and Index Mapping: To accelerate queries, elements are partitioned into clusters. A cluster signature is computed as the XOR of all member fingerprints. An auxiliary index table maps elements to their respective cluster identifiers.
Lookup Workflow
- Fingerprint Recalculation: Compute the fingerprint for the target element using the identical hash functions:
fingerprint(val) = hash_0(val) ⊕ hash_1(val) ⊕ hash_2(val) - Cluster Location: Use the index mapping table to pinpoint the target cluster.
- Signature Verification: Compare the recalculated fingerprint against the stored cluster signature. A match implies the element might be present; a mismatch confirms its absence.
Consider a keyset {W, X, Y, Z} with hash functions hash_0, hash_1, hash_2.
Construction Example:
Assume the following hash outputs:
hash_0(W)=101,hash_1(W)=110,hash_2(W)=011hash_0(X)=011,hash_1(X)=100,hash_2(X)=111hash_0(Y)=111,hash_1(Y)=001,hash_2(Y)=101hash_0(Z)=010,hash_1(Z)=111,hash_2(Z)=000
Deriving fingerprints:
fingerprint(W) = 101 ⊕ 110 ⊕ 011 = 000fingerprint(X) = 011 ⊕ 100 ⊕ 111 = 000fingerprint(Y) = 111 ⊕ 001 ⊕ 101 = 011fingerprint(Z) = 010 ⊕ 111 ⊕ 000 = 101
Clustering and signature generation:
- Cluster 0
{W, X}signature:000 ⊕ 000 = 000 - Cluster 1
{Y, Z}signature:011 ⊕ 101 = 110
Signature array: [000, 110]. Index map: W->0, X->0, Y->1, Z->1.
Lookup Example: Checking for W:
- Compute
fingerprint(W) = 000. - Locate cluster via index map: Cluster 0.
- Verify signature:
000 ⊕ 000 = 000. Matches stored signature000.Wis possibly present.
Xor Filters update bit arrays via XOR, ensuring constant-time insertions and lookups with minimal storage overhead. They may yield false positives (indicating an element exists when it does not) but never produce false negatives.
Xor Filter Integration in JXT+ and JXT++
In the JXT+ and JXT++ cryptographic schemes, Xor Filters are adapted to diminish storage requirements and accelerate join operations over encrypted datasets, collaborating with structures like TupleMap (TSet), JoinIndex (XSet), and ChainMap (CSet).
Encrypted Database Setup
During setup, the client encrypts relational tables and join attributes, populating the Xor Filter:
- Key Generation: Cryptographic keys (e.g.,
K_z,K_w) are generated to initialize TupleMap, JoinIndex, ChainMap, and the Xor Filter. - Attribute Encryption: Attribute-value pairs and record identifiers are encrypted using pseudo-random functions (PRFs).
- Filter Population: Each encrypted pair is mapped via multiple hashes into the Xor Filter, updating slots through XOR operations.
Query Execution
- Token Generation: The client formulates search tokens encompassing target tables and join attributes.
- Server-Side Matching: The server retrieves TupleMap entries using the token. The Xor Filter facilitates rapid existence checks for join attributes. If an attribute is absent, the operation terminates early.
- Leakage Mitigation: The filter conceals frequency statistics. Low-entropy attributes that normally leak frequency patterns are masked within the encrypted, XOR-obfuscated bit array, preventing sub-query leakage.
Performance Enhancements
- Storage Optimization: Mapping join attributes and record identifiers into a compressed bit array consumes significantly less space than storing individual encrypted tuples.
- Query Acceleration: Constant-time membership tests streamline multi-table join verifications, rapidly filtering non-matching records.
- Privacy Assurance: Obfuscation prevents frequency analysis, ensuring the server cannot discern underlying attribute values or distributions.
JXT+ Implementation Example
Data Structures
- TupleMap: Maps encrypted attribute values to corresponding record IDs (
Map<String, List<Integer>>). - JoinIndex: A set holding encrypted join attributes (
Set<String>). - ChainMap: Maps encrypted join attributes to encrypted record identifiers (
Map<String, List<String>>).
Source Code
import java.util.*;
public class SecureJoinEngine {
public static String obfuscateValue(String raw) {
return raw + "_obf"; // Simulated encryption
}
public static void main(String[] args) {
List<Map<String, String>> relationAlpha = new ArrayList<>();
relationAlpha.add(Map.of("RecID", "1", "Entity", "Alpha"));
relationAlpha.add(Map.of("RecID", "2", "Entity", "Beta"));
List<Map<String, String>> relationBeta = new ArrayList<>();
relationBeta.add(Map.of("RecID", "10", "Entity", "Alpha"));
relationBeta.add(Map.of("RecID", "11", "Entity", "Gamma"));
Map<String, List<Integer>> tupleMap = new HashMap<>();
Set<String> joinIndex = new HashSet<>();
Map<String, List<String>> chainMap = new HashMap<>();
List<List<Map<String, String>>> datasets = Arrays.asList(relationAlpha, relationBeta);
for (List<Map<String, String>> ds : datasets) {
for (Map<String, String> row : ds) {
String target = row.get("Entity");
String obfTarget = obfuscateValue(target);
String rowId = row.get("RecID");
tupleMap.computeIfAbsent(obfTarget, k -> new ArrayList<>()).add(Integer.parseInt(rowId));
joinIndex.add(obfTarget);
chainMap.computeIfAbsent(obfTarget, k -> new ArrayList<>()).add("obf_id_" + rowId);
}
}
System.out.println("TupleMap: " + tupleMap);
System.out.println("JoinIndex: " + joinIndex);
System.out.println("ChainMap: " + chainMap);
String queryTarget = "Alpha";
String obfQuery = obfuscateValue(queryTarget);
if (tupleMap.containsKey(obfQuery)) {
System.out.println("Matches for " + queryTarget + ": " + tupleMap.get(obfQuery));
}
}
}
Output
TupleMap: {Alpha_obf=[1, 10], Beta_obf=[2], Gamma_obf=[11]}
JoinIndex: [Alpha_obf, Beta_obf, Gamma_obf]
ChainMap: {Alpha_obf=[obf_id_1, obf_id_10], Beta_obf=[obf_id_2], Gamma_obf=[obf_id_11]}
Matches for Alpha: [1, 10]
JXT++ Multi-Table Join Extension
JXT++ broadens the JXT+ framework to support multi-table equi-joins while preserving cryptographic guarantees. It relies on the same core encrypted structures—TupleMap, JoinIndex, and ChainMap—to coordinate secure queries across several relations.
Initialization and Storage
Attributes across all participating tables are encrypted. TupleMap indexes the encrypted attributes to their original record IDs. JoinIndex maintains the collection of encrypted join keys. ChainMap links encrypted join keys to their respective encrypted record identifiers, enabling many-to-many relationship tracking across the database.
Multi-Table Query Process
- Token Issuance: The client dispatches encrypted search tokens defining the join criteria across multiple tables.
- Sequential Lookups: The server evaluates the tokens against the TupleMap to locate matching records, verifies presence in the JoinIndex, and traverses the ChainMap to retrieve linked encrypted record identifiers.
- Join Resolution: By iteratively applying these lookups, the server correlates records from distinct tables based on shared encrypted join keys, assembling the final encrypted result set without accessing plaintext data.
This coordinated traversal across TupleMap, JoinIndex, and ChainMap enables efficient resolution of complex joins. The server exclusively handles obfuscated values, mitigating frequency analysis and ensuring that sensitive relationship graphs remain confidential throughout the join execution.