In this fourth blog in our six-part series on using graph technology to fight money laundering, we discuss how to get started quickly using the Neo4j reference implementation.
Reference Implementation
This section describes how to perform sprints when developing Neo4j applications. A sprint starts with identifying a very small number of “money queries.” The sprint continues with creating a graph data model, and then building out a full-stack solution (from ingestion to visualization) for your first set of money queries.
Building a simple solution with a handful of carefully chosen money queries allows a business to quickly reap the benefits of identifying money laundering.
The following queries can be used as building blocks or combined with other queries to arrive at a money query.
Localized Pattern Match: Suspicious Behavior Money Queries
Small Deposits: Concentration
- Cash transactions are deposited to an account, transferred to a central account, then wired to a bank outside of the U.S.
- The goal is to identify parties and largest aggregate amounts.
- Money queries show the accounts involved and the largest aggregate amounts (n hops, aggregating only when pattern matches).
- Suspicious accounts receive a high number of incoming deposits and then send a few large transactions to one or more high-risk parties.
- This pattern is characterized by consecutive days of deposits with minimal withdrawals in the same period.
- Money queries identify parties and accounts involved and largest aggregate amounts (n hops, aggregating only when pattern matches).
- Customer makes several cash deposits just under $10,000 over an x-day period.
- Customer receives a large wire followed by withdrawal of most of it as cash via multiple ATM transactions within a short period of time.
- This pattern is signaled by a major behavior change. For example, a business customer's cash deposit activity rises from $50,000 a week to $250,000 a week over the course of a month.
- Party A sends funds to Party B, and B then sends funds to Party C, where the amount sent from B to C is greater than or equal to, or within Y% of, the amount sent from A to B.
- Money queries in this scenario look for which nodes in the transactions have the highest incoming amount and few or no outgoing transactions.
- The pattern is revealed in a large aggregated set of deposits by a customer followed by a large ACH transaction or a transfer to another account such as a mortgage.
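One of the patterns above, several cash deposits just under $10,000 over an x-day period, can be sketched in a few lines of plain Python. This is a minimal illustration rather than part of the Neo4j reference implementation: the function name, the $1,000 margin for "just under," the 7-day window and the minimum of three deposits are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import date

REPORTING_THRESHOLD = 10_000  # USD; the "just under $10,000" limit from the text
NEAR_MARGIN = 1_000           # assumption: "just under" means within $1,000
WINDOW_DAYS = 7               # assumption: the "x-day period"
MIN_DEPOSITS = 3              # assumption: what counts as "several"


def flag_structuring(deposits):
    """Return {customer: largest aggregate amount} for customers with at least
    MIN_DEPOSITS near-threshold cash deposits inside a WINDOW_DAYS-day window.

    deposits: iterable of (customer_id, date, amount) tuples.
    """
    near = defaultdict(list)
    for customer, day, amount in deposits:
        if REPORTING_THRESHOLD - NEAR_MARGIN <= amount < REPORTING_THRESHOLD:
            near[customer].append((day, amount))

    flagged = {}
    for customer, rows in near.items():
        rows.sort()
        start, running, best = 0, 0, 0
        for end, (day, amount) in enumerate(rows):
            running += amount
            # Slide the window start forward until it spans < WINDOW_DAYS days.
            while (day - rows[start][0]).days >= WINDOW_DAYS:
                running -= rows[start][1]
                start += 1
            if end - start + 1 >= MIN_DEPOSITS:
                best = max(best, running)
        if best:
            flagged[customer] = best
    return flagged


deposits = [
    ("cust-1", date(2024, 1, 1), 9_500),
    ("cust-1", date(2024, 1, 2), 9_800),
    ("cust-1", date(2024, 1, 4), 9_900),
    ("cust-2", date(2024, 1, 1), 9_700),
    ("cust-2", date(2024, 2, 1), 9_700),
    ("cust-3", date(2024, 1, 3), 2_000),
]
print(flag_structuring(deposits))  # {'cust-1': 29200}
```

In a graph database the same idea would be expressed as a pattern match over deposit relationships, aggregating only when the pattern matches; the sliding window here stands in for that aggregation.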
Localized Pattern Match: Suspicious Structure Money Queries
Structural: Entity Resolution
Shared attributes can be used for entity resolution.
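As a rough illustration of entity resolution through shared attributes, the sketch below inverts party records into an attribute-to-party index and reports pairs of parties that share several values. The attribute names, the sample records and the `min_shared` cutoff are hypothetical and are not taken from the reference model.

```python
from collections import defaultdict
from itertools import combinations


def shared_attribute_pairs(parties, min_shared=2):
    """Return {(party_a, party_b): [shared (attribute, value) pairs]} for
    party pairs that share at least min_shared attribute values.

    parties: {party_id: {attribute_name: value}}
    """
    # Invert the records: (attribute, value) -> parties that hold it.
    holders = defaultdict(set)
    for party, attrs in parties.items():
        for name, value in attrs.items():
            holders[(name, value)].add(party)

    # Every pair of parties holding the same value shares that attribute.
    shared = defaultdict(list)
    for key, group in holders.items():
        for a, b in combinations(sorted(group), 2):
            shared[(a, b)].append(key)

    return {pair: sorted(keys) for pair, keys in shared.items()
            if len(keys) >= min_shared}


parties = {
    "P1": {"phone": "555-0100", "address": "1 Main St", "email": "a@example.com"},
    "P2": {"phone": "555-0100", "address": "1 Main St", "email": "b@example.com"},
    "P3": {"phone": "555-0199", "address": "1 Main St", "email": "c@example.com"},
}
candidates = shared_attribute_pairs(parties)
print(candidates)
```

Here only P1 and P2 are returned, because they share both a phone number and an address, while P3 shares the address alone.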
Structural: Payment Chain
Payment chain between two suspicious parties
Localized Pattern Match Scoring Money Queries
See Keymaker pipelines in the Sample Localized Pattern-Match Scoring section below.
Graph Algorithm Money Queries
Centrality
PageRank, Closeness, Degree, etc.
Centrality scores detect central players (closeness), liaisons (betweenness) and the most relevant parties (PageRank) in a path between a customer and a high-risk endpoint.
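To make the idea of a PageRank score concrete, here is a minimal power-iteration sketch in plain Python. It is an illustration only and not the implementation used by Neo4j's graph algorithms; the damping factor of 0.85, the fixed iteration count and the account names are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Approximate PageRank for a directed graph given as (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing payments is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1 - damping + damping * dangling) / len(nodes)
                    for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical money trail: three accounts feed a collector,
# which forwards funds to a single offshore endpoint.
payments = [
    ("acct-1", "collector"),
    ("acct-2", "collector"),
    ("acct-3", "collector"),
    ("collector", "offshore"),
]
scores = pagerank(payments)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

In this toy graph the offshore endpoint and the collector rank above the feeder accounts, which is the kind of signal a centrality money query looks for.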
Community Detection
Louvain Modularity, Label Propagation, Strongly and Weakly Connected Components, etc.
Community detection algorithms detect common entities and strongly connected components in high-risk rings. These are all based on relationships in the graph.
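Of the community detection algorithms listed, Weakly Connected Components is the simplest to sketch: ignore the direction of each relationship and group every node reachable from every other. The version below is a minimal plain-Python illustration with made-up node names, not the Neo4j implementation.

```python
from collections import defaultdict, deque


def weakly_connected_components(edges):
    """Group nodes that are connected when edge direction is ignored."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Breadth-first traversal collects everything reachable from start.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(members))
    return components


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
components = weakly_connected_components(edges)
print(components)  # [['A', 'B', 'C'], ['D', 'E']]
```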
Link Prediction
Common Neighbors, Preferential Attachment, Adamic Adar, etc.
Link prediction algorithms based on the money trail identify hidden COLLABORATOR relationships. These new relationships further inform the analysis of small deposit accounts that involve layering, velocity, concentration, etc.
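Two of the link prediction scores named above can be written down directly. The sketch below computes Common Neighbors and Adamic Adar for a pair of parties in an undirected view of the money trail; the helper names and the sample graph are hypothetical, and deciding when a score is high enough to create a COLLABORATOR relationship is left open.

```python
import math
from collections import defaultdict


def build_neighbors(edges):
    """Undirected adjacency sets from (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def common_neighbors(neighbors, a, b):
    """Number of nodes adjacent to both a and b."""
    return len(neighbors[a] & neighbors[b])


def adamic_adar(neighbors, a, b):
    """Sum of 1 / log(degree) over shared neighbors; a shared neighbor
    with few other connections counts for more than a busy hub."""
    return sum(1.0 / math.log(len(neighbors[z]))
               for z in neighbors[a] & neighbors[b])


# Hypothetical trail: A and B both transact with X and Y; Y is also busy elsewhere.
edges = [("A", "X"), ("B", "X"), ("A", "Y"), ("B", "Y"), ("Y", "C"), ("Y", "D")]
graph = build_neighbors(edges)
cn_score = common_neighbors(graph, "A", "B")
aa_score = adamic_adar(graph, "A", "B")
print(cn_score, round(aa_score, 3))
```

Every shared neighbor has degree of at least two, so the logarithm in `adamic_adar` is always positive.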
Similarity
Jaccard, Cosine, Overlap, etc.
Similarity algorithms are used for entity resolution. Also, if there is a path between a customer and a high-risk endpoint, similarity algorithms indicate how similar each path is to other paths from that specific customer to those high-risk endpoints. A company fighting money laundering could then create a subgraph of paths (e.g., paths A, B and C) and have weighted relationships representing similarities among the paths.
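The subgraph of paths with weighted relationships can be sketched with Jaccard similarity, treating each path as the set of nodes it passes through. This is one possible reading of the idea, not a prescribed method; the path contents and function names are invented for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def path_similarity_edges(paths):
    """{path_name: [nodes]} -> {(name_1, name_2): Jaccard weight}."""
    return {
        (p, q): jaccard(paths[p], paths[q])
        for p, q in combinations(sorted(paths), 2)
    }


# Hypothetical paths from one customer to the same high-risk endpoint.
paths = {
    "A": ["customer", "acct-1", "shell-1", "high-risk"],
    "B": ["customer", "acct-1", "shell-2", "high-risk"],
    "C": ["customer", "acct-9", "broker", "high-risk"],
}
weights = path_similarity_edges(paths)
for pair, weight in weights.items():
    print(pair, round(weight, 3))
```

Paths A and B share three of five distinct nodes, so their relationship carries a higher weight than either has with path C.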
Pathfinding and Search
Breadth-First Search, All Pairs Shortest Path, etc.
These algorithms identify payment chains and third parties layered between customers or transactions and other endpoints. They are also used as a fundamental step in Weakly and Strongly Connected Components, Closeness and other graph algorithms.
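A breadth-first search over directed transfers is enough to illustrate how a payment chain and its layered third parties are found. The sketch below is plain Python with hypothetical party names; it returns the chain with the fewest hops, or `None` when no chain exists.

```python
from collections import defaultdict, deque


def shortest_payment_chain(transfers, source, target):
    """Fewest-hop chain of parties from source to target, or None."""
    successors = defaultdict(list)
    for sender, receiver in transfers:
        successors[sender].append(receiver)

    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            chain = []
            while node is not None:
                chain.append(node)
                node = previous[node]
            return chain[::-1]
        for nxt in successors[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


transfers = [
    ("customer", "shell-1"),
    ("shell-1", "shell-2"),
    ("shell-2", "high-risk"),
    ("customer", "broker"),
    ("broker", "high-risk"),
]
chain = shortest_payment_chain(transfers, "customer", "high-risk")
print(chain)        # ['customer', 'broker', 'high-risk']
print(chain[1:-1])  # third parties layered in between: ['broker']
```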
Reference Graph Data Model
Neo4j’s AML graph data model, pictured below, is a whiteboard-style reference model for the queries described in this document. The graph model used in the framework demonstrates best practices when working with Neo4j to support pattern matching for analysis.
Only twenty indexes were required to deliver millisecond response times at scale. In contrast, a relational database approach to the same questions and dataset would need an entity-relationship diagram with over 150 tables and 300 to 400 indexes (not including primary keys), and would still deliver much slower query response times, measured in minutes or hours.