Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Xor Filter Applications in Encrypted Equi-Join Queries

Tech May 16 1

Mechanics of Xor Filters

Xor Filters function as space-optimized probabilistic structures tailored for rapid set membership tests. They utilize multiple hash functions to map elements onto a bit array, employing bitwise XOR operations to compress storage and accelerate retrieval.

Construction Workflow

  • Hash Function Selection: Several hash functions (e.g., hash_0, hash_1, hash_2) are chosen to project each element onto multiple indices within an array. The output domain corresponds to the array's bounds.
  • Fingerprint Derivation: For an element val, its fingerprint is derived by XORing the outputs of the hash functions:
    fingerprint(val) = hash_0(val) ⊕ hash_1(val) ⊕ hash_2(val)
  • Fingerprint Storage: The derived fingerprints are recorded in the array, with each slot accommodating a fixed-length binary value.
  • Clustering and Index Mapping: To accelerate queries, elements are partitioned into clusters. A cluster signature is computed as the XOR of all member fingerprints. An auxiliary index table maps elements to their respective cluster identifiers.

Lookup Workflow

  • Fingerprint Recalculation: Compute the fingerprint for the target element using the identical hash functions:
    fingerprint(val) = hash_0(val) ⊕ hash_1(val) ⊕ hash_2(val)
  • Cluster Location: Use the index mapping table to pinpoint the target cluster.
  • Signature Verification: Compare the recalculated fingerprint against the stored cluster signature. A match implies the element might be present; a mismatch confirms its absence.

Consider a keyset {W, X, Y, Z} with hash functions hash_0, hash_1, hash_2.

Construction Example:

Assume the following hash outputs:

  • hash_0(W)=101, hash_1(W)=110, hash_2(W)=011
  • hash_0(X)=011, hash_1(X)=100, hash_2(X)=111
  • hash_0(Y)=111, hash_1(Y)=001, hash_2(Y)=101
  • hash_0(Z)=010, hash_1(Z)=111, hash_2(Z)=000

Deriving fingerprints:

  • fingerprint(W) = 101 ⊕ 110 ⊕ 011 = 000
  • fingerprint(X) = 011 ⊕ 100 ⊕ 111 = 000
  • fingerprint(Y) = 111 ⊕ 001 ⊕ 101 = 011
  • fingerprint(Z) = 010 ⊕ 111 ⊕ 000 = 101

Clustering and signature generation:

  • Cluster 0 {W, X} signature: 000 ⊕ 000 = 000
  • Cluster 1 {Y, Z} signature: 011 ⊕ 101 = 110

Signature array: [000, 110]. Index map: W->0, X->0, Y->1, Z->1.

Lookup Example: Checking for W:

  • Compute fingerprint(W) = 000.
  • Locate cluster via index map: Cluster 0.
  • Verify signature: 000 ⊕ 000 = 000. Matches stored signature 000. W is possibly present.

Xor Filters update bit arrays via XOR, ensuring constant-time insertions and lookups with minimal storage overhead. They may yield false positives (indicating an element exists when it does not) but never produce false negatives.

Xor Filter Integration in JXT+ and JXT++

In the JXT+ and JXT++ cryptographic schemes, Xor Filters are adapted to diminish storage requirements and accelerate join operations over encrypted datasets, collaborating with structures like TupleMap (TSet), JoinIndex (XSet), and ChainMap (CSet).

Encrypted Database Setup

During setup, the client encrypts relational tables and join attributes, populating the Xor Filter:

  • Key Generation: Cryptographic keys (e.g., K_z, K_w) are generated to initialize TupleMap, JoinIndex, ChainMap, and the Xor Filter.
  • Attribute Encryption: Attribute-value pairs and record identifiers are encrypted using pseudo-random functions (PRFs).
  • Filter Population: Each encrypted pair is mapped via multiple hashes into the Xor Filter, updating slots through XOR operations.

Query Execution

  • Token Generation: The client formulates search tokens encompassing target tables and join attributes.
  • Server-Side Matching: The server retrieves TupleMap entries using the token. The Xor Filter facilitates rapid existence checks for join attributes. If an attribute is absent, the operation terminates early.
  • Leakage Mitigation: The filter conceals frequency statistics. Low-entropy attributes that normally leak frequency patterns are masked within the encrypted, XOR-obfuscated bit array, preventing sub-query leakage.

Performance Enhancements

  • Storage Optimization: Mapping join attributes and record identifiers into a compressed bit array consumes significantly less space than storing individual encrypted tuples.
  • Query Acceleration: Constant-time membership tests streamline multi-table join verifications, rapidly filtering non-matching records.
  • Privacy Assurance: Obfuscation prevents frequency analysis, ensuring the server cannot discern underlying attribute values or distributions.

JXT+ Implementation Example

Data Structures

  • TupleMap: Maps encrypted attribute values to corresponding record IDs (Map<String, List<Integer>>).
  • JoinIndex: A set holding encrypted join attributes (Set<String>).
  • ChainMap: Maps encrypted join attributes to encrypted record identifiers (Map<String, List<String>>).

Source Code

import java.util.*;

public class SecureJoinEngine {

    public static String obfuscateValue(String raw) {
        return raw + "_obf"; // Simulated encryption
    }

    public static void main(String[] args) {
        List<Map<String, String>> relationAlpha = new ArrayList<>();
        relationAlpha.add(Map.of("RecID", "1", "Entity", "Alpha"));
        relationAlpha.add(Map.of("RecID", "2", "Entity", "Beta"));

        List<Map<String, String>> relationBeta = new ArrayList<>();
        relationBeta.add(Map.of("RecID", "10", "Entity", "Alpha"));
        relationBeta.add(Map.of("RecID", "11", "Entity", "Gamma"));

        Map<String, List<Integer>> tupleMap = new HashMap<>();
        Set<String> joinIndex = new HashSet<>();
        Map<String, List<String>> chainMap = new HashMap<>();

        List<List<Map<String, String>>> datasets = Arrays.asList(relationAlpha, relationBeta);
        for (List<Map<String, String>> ds : datasets) {
            for (Map<String, String> row : ds) {
                String target = row.get("Entity");
                String obfTarget = obfuscateValue(target);
                String rowId = row.get("RecID");

                tupleMap.computeIfAbsent(obfTarget, k -> new ArrayList<>()).add(Integer.parseInt(rowId));
                joinIndex.add(obfTarget);
                chainMap.computeIfAbsent(obfTarget, k -> new ArrayList<>()).add("obf_id_" + rowId);
            }
        }

        System.out.println("TupleMap: " + tupleMap);
        System.out.println("JoinIndex: " + joinIndex);
        System.out.println("ChainMap: " + chainMap);

        String queryTarget = "Alpha";
        String obfQuery = obfuscateValue(queryTarget);
        if (tupleMap.containsKey(obfQuery)) {
            System.out.println("Matches for " + queryTarget + ": " + tupleMap.get(obfQuery));
        }
    }
}

Output

TupleMap: {Alpha_obf=[1, 10], Beta_obf=[2], Gamma_obf=[11]}
JoinIndex: [Alpha_obf, Beta_obf, Gamma_obf]
ChainMap: {Alpha_obf=[obf_id_1, obf_id_10], Beta_obf=[obf_id_2], Gamma_obf=[obf_id_11]}
Matches for Alpha: [1, 10]

JXT++ Multi-Table Join Extension

JXT++ broadens the JXT+ framework to support multi-table equi-joins while preserving cryptographic guarantees. It relies on the same core encrypted structures—TupleMap, JoinIndex, and ChainMap—to coordinate secure queries across several relations.

Initialization and Storage

Attributes across all participating tables are encrypted. TupleMap indexes the encrypted attributes to their original record IDs. JoinIndex maintains the collection of encrypted join keys. ChainMap links encrypted join keys to their respective encrypted record identifiers, enabling many-to-many relationship tracking across the database.

Multi-Table Query Process

  • Token Issuance: The client dispatches encrypted search tokens defining the join criteria across multiple tables.
  • Sequential Lookups: The server evaluates the tokens against the TupleMap to locate matching records, verifies presence in the JoinIndex, and traverses the ChainMap to retrieve linked encrypted record identifiers.
  • Join Resolution: By iteratively applying these lookups, the server correlates records from distinct tables based on shared encrypted join keys, assembling the final encrypted result set without accessing plaintext data.

This coordinated traversal across TupleMap, JoinIndex, and ChainMap enables efficient resolution of complex joins. The server exclusively handles obfuscated values, mitigating frequency analysis and ensuring that sensitive relationship graphs remain confidential throughout the join execution.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.