High-Order Matching & Entity Resolution

Resolve duplicate entities and detect coordinated groups by analyzing the structural "fingerprint" of the graph.

At scale, data is often fragmented. The same real-world entity might appear as five different nodes across different systems. Pometry's high-order matching pipelines use a Multi-Level Similarity approach to bridge the gap where simple string matching fails.


Multi-Level Similarity Analysis

True entity resolution requires balancing metadata accuracy with structural context. Pometry's classification pipeline analyzes entities across four distinct levels.

1. Linguistic & Fuzzy Matching

We use a combination of SequenceMatcher and Levenshtein distance to handle typos in names, addresses, and document IDs.
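As an illustration of how the two measures can be combined, here is a minimal sketch. It uses `SequenceMatcher` from Python's standard `difflib` module and a hand-written Levenshtein distance. The function name `fuzzy_score`, the lowercasing step, and the equal-weight average are assumptions made for this example, not Pometry's actual scoring.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_score(a: str, b: str) -> float:
    """Hypothetical blend: mean of SequenceMatcher ratio and normalized Levenshtein similarity."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0  # two empty fields are treated as identical
    ratio = SequenceMatcher(None, a, b).ratio()
    lev_similarity = 1.0 - levenshtein(a, b) / max(len(a), len(b))
    return (ratio + lev_similarity) / 2.0
```

With this blend, a single dropped letter such as "Jon Smith" versus "John Smith" still scores above 0.9, while strings with no characters in common score 0.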

2. Temporal Distance

For fields like Date of Birth or Incorporation, we apply a decay penalty that grows with the gap between the two dates. Small differences (1-2 days) are often just timezone or data-entry errors and cost little, while large differences (years) trigger a hard rejection.
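The decay-plus-rejection rule can be sketched as follows. The exponential shape, the 30-day half-life, and the 365-day rejection cutoff are assumed values chosen for illustration; the source does not state the actual curve or thresholds.

```python
from datetime import date


def temporal_score(d1: date, d2: date,
                   half_life_days: float = 30.0,
                   reject_after_days: int = 365) -> float:
    """Score in [0, 1]: 1.0 for identical dates, decaying with the gap.

    half_life_days and reject_after_days are hypothetical parameters.
    """
    gap = abs((d1 - d2).days)
    if gap >= reject_after_days:
        return 0.0  # hard rejection: the dates are too far apart to be the same entity
    # Exponential decay: the score halves every `half_life_days` of difference.
    return 0.5 ** (gap / half_life_days)
```

Under these assumed parameters a one-day gap scores about 0.98, a 30-day gap scores 0.5, and any gap of a year or more scores 0.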

3. Structural "Fingerprinting"

The most powerful level is the Neighborhood Profile. We compare the 1-hop connections of two nodes using Jaccard Similarity: the number of neighbors they share divided by the number of distinct neighbors across both. If two "John Smiths" have never met, but they bank with the same branch, use the same lawyer, and transfer money to the same overseas shell company, they are structurally near-identical.
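A minimal sketch of the neighborhood comparison, using plain Python sets rather than any Pometry or Raphtory API. The edge list, node names, and helper names are invented for this example.

```python
def one_hop(edges, node):
    """Return the set of nodes directly connected to `node`, ignoring edge direction."""
    neighbours = set()
    for src, dst in edges:
        if src == node:
            neighbours.add(dst)
        elif dst == node:
            neighbours.add(src)
    neighbours.discard(node)  # ignore self-loops
    return neighbours


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; defined here as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical data: two "John Smith" nodes that never interact directly.
edges = [
    ("john_smith_1", "branch_42"),
    ("john_smith_1", "lawyer_x"),
    ("john_smith_1", "shell_co"),
    ("john_smith_2", "branch_42"),
    ("john_smith_2", "lawyer_x"),
    ("john_smith_2", "shell_co"),
    ("john_smith_2", "gym_7"),
]

similarity = jaccard(one_hop(edges, "john_smith_1"), one_hop(edges, "john_smith_2"))
print(similarity)  # 3 shared neighbours out of 4 distinct ones -> 0.75
```

A score near 1.0 means the two nodes share almost their entire neighborhood, even though there is no edge between them.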


Classification: Theft vs. Conflict

When the resolution pipeline detects a mismatch (e.g., two people claiming the same passport), it classifies the event into one of two strategic buckets.

Identity Theft

Triggered when a minority group of nodes attempts to "link" to a document ID already claimed by an established majority. Pometry flags the minority nodes as identity_thief.

Identity Conflict

Triggered when two equally sized groups claim the same ID, suggesting a data collision or a systemic error that requires human intervention.
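The two rules above can be sketched as a size comparison between the groups claiming one document ID. Only the `identity_thief` label comes from the text. The `identity_conflict` label, the function name, and the handling of more than two groups (every group other than the single largest counts as the minority) are assumptions for this example.

```python
def classify_claim(groups):
    """Classify nodes that claim the same document ID.

    groups: list of sets of node ids, one set per claimant group.
    Returns a dict mapping flagged node id -> label.
    """
    if len(groups) < 2:
        return {}  # a single claimant group is not a mismatch
    ranked = sorted(groups, key=len, reverse=True)
    if len(ranked[0]) == len(ranked[1]):
        # No clear majority: flag everyone for human review (assumed label).
        return {node: "identity_conflict" for group in groups for node in group}
    # A clear majority exists: flag every node outside the largest group.
    return {node: "identity_thief" for group in ranked[1:] for node in group}


# Hypothetical example: three nodes share a passport, one outsider claims it too.
flags = classify_claim([{"n1", "n2", "n3"}, {"n9"}])
print(flags)  # {'n9': 'identity_thief'}
```

Comparing raw group sizes is the simplest reading of "minority" and "equally sized". A production pipeline might also weigh how long each group has held the claim, which the word "established" hints at.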

Raphtory Intelligence Session
USR: Classifier, run a resolution pass on the 'Private Banking' segment.


The Entity Resolution API

Pometry provides an atomic resolve_entities API to materialize these findings back into the graph.