Graph Database

What Is Entity Resolution?

Managing Editor, Neo4j

February 13, 2025

7 min read

Think about the last time you needed to merge contact lists from different systems. Was J. Smith the same person as John Smith? What about their different email addresses and phone numbers? Cleaning up contact information is just the tip of the entity resolution iceberg — a critical capability that helps organizations understand when different data points actually refer to the same real-world entity.

Entity resolution is essential for organizations dealing with large, complex datasets. Let’s explore what entity resolution is, why it matters, and how organizations use it to potentially uncover hidden patterns and relationships in their data.

What is Entity Resolution?

Entity resolution is the process of determining when different data records actually represent the same real-world entity — whether that’s a person, organization, product, or location. It goes beyond simple deduplication by considering context and relationships between data points to make accurate matches even when information is incomplete or inconsistent.

For example, a bank might have a customer who appears as:

“Robert Jones” with email rjones@email.com in their checking system
“Bob Jones” with phone number 555-0123 in their credit card database
“R. Jones” with the same phone number in their investment portal

Entity resolution is a technique for determining whether these records represent the same person by analyzing patterns and relationships in the data, not just exact matches of individual fields. By performing entity resolution, you resolve any entity discrepancies.

Why Entity Resolution Matters

The cost of fragmented data and missed connections can be staggering. Research shows that poor data quality costs organizations an average of $12.9 million annually. Beyond the financial impact, fragmented data leads to:

Missed fraud patterns that could have been detected
Incomplete customer views that lead to poor service
Redundant marketing efforts targeting the same person multiple times
Compliance risks from incomplete know-your-customer (KYC) data
Supply chain blind spots from disconnected supplier information

Successful entity resolution, on the other hand, enables organizations to gain a deeper understanding of their data relationships and make better decisions. Let’s look at some real-world examples.

Entity Resolution in Action: Real-World Use Cases

Healthcare

Entity resolution in healthcare solves a critical challenge: determining when different medical records actually represent the same patient across multiple systems. Consider a typical patient journey:

Consider these common entity-matching challenges in healthcare:

Robert J. Smith” visits his primary care physician (EHR system #1)
“Bob Smith” gets labs done at a diagnostic center (Lab database)
“R.J. Smith” fills prescriptions at a pharmacy (Prescription system)
“Rob Smith” files an insurance claim (Claims database)
“Robert Smith” participates in a clinical trial (Research database)

Without proper entity resolution, Robert (or Bob, R.J., and Rob) appears as five separate patients, fragmenting his medical history across disconnected systems. This fragmentation leads to incomplete views of patient medical history, missed drug interactions or allergies, duplicate tests and procedures, inconsistent treatment plans, and even delayed diagnoses.

By using entity resolution techniques, healthcare providers can match disparate records to a single patient while maintaining HIPAA compliance. The matching process often considers multiple data points, such as demographics (name variations, date of birth, address history), medical identifiers (insurance numbers, patient IDs), treatment context (diagnoses, medications, procedures), and provider relationships (physicians seen, facilities visited).

When healthcare organizations successfully resolve patient entities, they create a complete, unified view of each patient’s journey through the healthcare system. This comprehensive understanding enables better-coordinated care, more accurate diagnoses, and improved treatment outcomes — all based on a holistic picture of the patient’s medical history.

Fraud Detection

Financial institutions use entity resolution to uncover fraud rings by connecting seemingly unrelated accounts and transactions. For example, a major bank might discover that what looks like separate fraudulent transactions are actually connected through shared addresses, phone numbers, or IP addresses — revealing a larger organized fraud operation. For example, Account A and Account B have different names but share a phone number, or Account B and Account C use different phone numbers but share an IP address.

Performing entity resolution with a graph database allows you to aggregate and reconcile data from multiple sources to create comprehensive profiles for individuals, organizations, and other entities. These profiles then become part of your application. For instance, you can unify all transactions, interactions, and relationships associated with a single profile. By resolving these entities, you can more effectively map a network of suspicious activities and connections. Furthermore, by considering the full context of relationships between entities, organizations can identify suspicious patterns more accurately and with fewer false positives.

Master Data Management

Retailers often struggle with maintaining consistent product information across e-commerce sites, inventory systems, and supplier catalogs. As a result, different product records often represent the same item across various systems.

Consider how a single denim jacket exists across a retailer’s digital ecosystem. In their e-commerce platform, it’s listed as “Women’s Blue Denim Jacket – Medium” with product ID WBDJ-M-2024 under Women’s Outerwear at $79.99. The warehouse system tracks the same jacket as “Ladies Jacket DN-BL-M” with SKU LJ238M. Meanwhile, the supplier database refers to it simply as “Supplier ID: WJ-238” with manufacturer code DEN-W-MED.

Without entity resolution, these appear as three separate products, creating inventory discrepancies and inconsistent information across channels. The resolution process matches records by examining product codes, names, and characteristics to create a single source of truth. This unified view both ensures accurate inventory tracking and helps retailers avoid costly mistakes and lost sales opportunities that come from fragmented product data.

Core Entity Resolution Techniques

Several approaches to entity resolution exist, each with its own strengths:

Deterministic Matching

Deterministic matching, or rules-based matching, uses predefined business rules and logic to find exact matches on specific fields and identify matching entities. For example, a rule might require that names match exactly, addresses match within 90%, and phone numbers are identical. While simple to implement and particularly effective for well-structured data with clear matching rules, this approach has significant limitations.

The rigid nature of deterministic rules often leads to missing valid matches due to data quality issues or variations in how information is recorded, making it less suitable for fuzzy matching scenarios. Fuzzy matching scenarios are situations where records should be matched despite having differences or imperfections in their data, such as name variations (“Jon” vs “John”). Fuzzy matching scenarios require more sophisticated matching approaches because they need to account for valid variations in how the same information might be recorded.

Probabilistic Matching

Probabilistic matching (also called probabilistic record linkage) uses statistical models to determine the probability that two records match. It considers the frequency and distribution of values in the data to assign weights to different field comparisons. For example, a match on a rare surname would be weighted more heavily than a match on a common one.

This method assigns probability scores to potential matches based on how closely different attributes align. It’s more flexible than deterministic matching but requires careful tuning of matching thresholds.

Graph-Based Methods

Graph databases excel at entity resolution because they can consider the full context of relationships between entities. You can use entity resolution to merge user profiles, accounts, and other entities that appear different but are actually the same. For example, two people might share a home phone number, address, and employer — indicating that it’s likely they’re the same person, even if the names aren’t an exact match.

Graph-based entity resolution can:

Follow relationship paths to find indirect connections
Consider the strength of multiple relationship types
Identify patterns that indicate matching entities
Identify patterns that indicate redundant and irrelevant entities
Scale to handle billions of potential connections

Entity Resolution Best Practices

There are several key challenges you may face when implementing entity resolution. Use the best practices outlined below to address each challenge.

Data Quality

Poor data quality is the biggest obstacle to successful entity resolution. Organizations should:

Standardize data formats and cleaning procedures
Implement robust data governance
Maintain clear data quality metrics
Regularly audit and update matching rules

Scale and Performance

As datasets grow, the number of potential matches to evaluate grows exponentially. To manage this:

Use efficient blocking strategies to limit comparison scope
Implement parallel processing where possible
Choose technology designed for graph-scale problems
Perform regular performance testing and optimization

Privacy and Compliance

Entity resolution often involves sensitive personal data. Organizations must:

Ensure compliance with relevant privacy regulations
Implement appropriate data security measures
Maintain audit trails of matching decisions
Consider privacy-preserving matching techniques

The Future of Entity Resolution

While traditional entity resolution has proven valuable across healthcare, fraud detection, and retail, emerging technologies are transforming how we connect related entities. Traditional methods struggle with common challenges like name variations (Catherine with ‘K’ versus ‘C’), complex abbreviations (“IBM” versus “International Business Machines”), corporate subsidiaries with vastly different legal names, and cultural naming variations across global systems. The combination of graph technology and vector-based entity resolution represents the next frontier in solving these complex matching challenges.

By combining vector embeddings with graph database technology, organizations can move beyond simple rule-based matching to uncover relationships that would otherwise remain hidden. This approach powers financial services analyzing complex corporate structures, healthcare providers matching international patient records, global retailers reconciling multilingual product catalogs, and government agencies connecting related entities across systems. The real power comes from persisting these resolved entities in a graph database, transforming entity resolution from a one-time matching exercise into a living, evolving data asset.

As new data enters the system, the resolution process continuously improves, learning from patterns and relationships to make increasingly accurate matches. Organizations that embrace these advanced entity resolution techniques are in a better position to uncover previously hidden relationships, reduce false positives, and create sustainable, constantly improving data quality over time. Try entity resolution yourself using the Neo4j sandbox to see how these techniques work in practice.

Ready to transform how you connect and understand your data relationships? Explore how Neo4j’s graph database and analytics capabilities can power your next-generation entity resolution solutions.

Explore Neo4j for Entity Resolution