Since 2016, disinformation campaigns have steadily eroded Americans’ faith in democracy, and they pose an unprecedented threat to the 2024 U.S. presidential election, according to political scientists and researchers. Bad actors have become particularly adept at leveraging fake social media advertising accounts to overwhelm users with political propaganda.
Fortunately, graph databases like Neo4j are built to fight this kind of disinformation. Neo4j enables social media platforms and election researchers to identify malicious accounts by uncovering key connections between them—connections that would otherwise go undetected.
From One Set of Credentials, Many Shared Accounts
Every social media advertising account is identified by a set of credentials: phone number, email address, website address, etc. In a Neo4j graph database, these accounts and credentials are represented as nodes, which are connected by directional relationships:
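To make that model concrete, here is a minimal sketch using the official Neo4j Python driver. The connection URI, login, and property names (account_id, address, number) are placeholders for illustration, not values from a real deployment:

```python
from neo4j import GraphDatabase

# Hypothetical connection details; replace with your own deployment's values.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# MERGE keeps the load idempotent: re-running the script never duplicates
# an account or credential node, it only adds whatever is missing.
LOAD_ACCOUNT = """
MERGE (a:Account {account_id: $account_id})
MERGE (e:Email {address: $email})
MERGE (p:Phone {number: $phone})
MERGE (a)-[:HAS_EMAIL]->(e)
MERGE (a)-[:HAS_PHONE]->(p)
"""

with driver.session() as session:
    # Two accounts that happen to share one email address but not a phone number.
    session.run(LOAD_ACCOUNT, account_id="A", email="john@example.com", phone="555-0100")
    session.run(LOAD_ACCOUNT, account_id="B", email="john@example.com", phone="555-0199")

driver.close()
```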
It’s not uncommon for multiple accounts to share the same credentials. A business owner, for example, might want to create separate accounts for different product lines but link them all to one email address for convenience. Similarly, family members living in the same household might create individual accounts but use one phone number.
The situation becomes more complex, however, when bad actors create fake accounts to spread disinformation, engage in coordinated inauthentic behavior, or conduct merchandise spamming. In those cases, shared credentials become red flags that require further investigation.
This is where graph databases excel—they allow us to unravel intricate webs of connections between accounts and identify hidden patterns that other database technologies would miss.
Identifying Connections Between Accounts to Unmask Bad Actors
Graph databases are uniquely suited to handle the problem of identity resolution—figuring out whether multiple records or entities in a dataset actually reference the same real-world thing.
In a Neo4j graph, we see exactly how individual accounts relate to credentials, and we can look for discrete areas of the graph (subgraphs) in which all the nodes are connected through their relationships to given credentials, regardless of relationship direction. This kind of subgraph, whose nodes are all reachable from one another when direction is ignored, is known as a weakly connected component, or simply a component.
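One practical way to surface these components is the Weakly Connected Components algorithm in Neo4j's Graph Data Science (GDS) library. The sketch below is a rough illustration that assumes the GDS plugin is installed and reuses the hypothetical Account, Email, and Phone model from the earlier snippet:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Drop any previous projection, then project accounts and credentials into
# an in-memory graph, treating every relationship as UNDIRECTED so that
# reachability ignores relationship direction.
DROP = "CALL gds.graph.drop('credential_graph', false)"
PROJECT = """
CALL gds.graph.project(
  'credential_graph',
  ['Account', 'Email', 'Phone'],
  {
    HAS_EMAIL: {orientation: 'UNDIRECTED'},
    HAS_PHONE: {orientation: 'UNDIRECTED'}
  }
)
"""

# Stream each node's component id; accounts that end up with the same id
# belong to the same weakly connected component.
WCC = """
CALL gds.wcc.stream('credential_graph')
YIELD nodeId, componentId
WITH gds.util.asNode(nodeId) AS n, componentId
WHERE n:Account
RETURN componentId, collect(n.account_id) AS accounts
ORDER BY size(accounts) DESC
"""

with driver.session() as session:
    session.run(DROP)
    session.run(PROJECT)
    for record in session.run(WCC):
        print(record["componentId"], record["accounts"])

driver.close()
```

Projecting the relationships as UNDIRECTED is what lets the algorithm ignore direction, matching the definition of a component above: any accounts that come back with the same componentId are candidates for a single owner.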
The accounts in a component are likely to be managed by the same individual or organization. For example, consider a scenario in which three accounts share the same email address:
Account A (blue) is John’s personal account, account B (green) is John’s business page for his coffee shop, and account C (yellow) is a suspicious account promoting counterfeit merchandise.
Using a graph database, we represent these accounts as nodes and connect each one to its email address, which in this case is the same shared address. By analyzing the subgraph that emerges, we can infer that all three accounts are likely managed by John, and that his activities warrant further investigation given the suspicious nature of Account C.
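As a hedged sketch of that inference step, the query below starts from the suspicious Account C and returns every other account that shares any credential with it. The account ID and property names are illustrative, carried over from the earlier snippets:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Starting from one flagged account, pull every other account that shares
# any credential node with it, along with which credential links them.
SHARED_CREDENTIALS = """
MATCH (flagged:Account {account_id: $account_id})-[:HAS_EMAIL|HAS_PHONE]->(cred)
MATCH (other:Account)-[:HAS_EMAIL|HAS_PHONE]->(cred)
WHERE other <> flagged
RETURN other.account_id AS account,
       labels(cred) AS credential_type,
       coalesce(cred.address, cred.number) AS credential
"""

with driver.session() as session:
    for record in session.run(SHARED_CREDENTIALS, account_id="C"):
        print(record["account"], record["credential_type"], record["credential"])

driver.close()
```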
Distinguishing Between Legitimate and Malicious Connected Accounts
While shared credentials can raise suspicions, it’s crucial to recognize when connected accounts are legitimate. There are many valid reasons for managing multiple accounts. A social media manager, for example, might handle several pages for different clients, all linked to their professional email addresses. Alternatively, a local political party may have support accounts for various candidates.
Graph databases allow us to analyze patterns and anomalies within the weakly connected components to distinguish between legitimate and malicious accounts. One red flag is a sudden surge in new accounts. If someone creates many new accounts with the same credentials in a short period of time, they may be doing so maliciously. This kind of suspicious weakly connected component is on the right:
By analyzing graph patterns within components, we can identify and flag suspicious accounts for additional analysis, while ensuring that legitimate accounts can continue to operate and advertise effectively.
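To illustrate the surge red flag described above, the sketch below counts how many new accounts attached themselves to each email address over the past week. The created_at property, the seven-day window, and the threshold of five are assumptions made for the example, not figures from the article:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Flag email addresses that picked up an unusually large number of newly
# created accounts in a short window. The `created_at` property, the
# seven-day window, and the threshold of five are illustrative assumptions.
SURGE = """
MATCH (e:Email)<-[:HAS_EMAIL]-(a:Account)
WHERE a.created_at >= datetime() - duration('P7D')
WITH e, count(a) AS new_accounts
WHERE new_accounts >= 5
RETURN e.address AS email, new_accounts
ORDER BY new_accounts DESC
"""

with driver.session() as session:
    for record in session.run(SURGE):
        print(record["email"], record["new_accounts"])

driver.close()
```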
Analyzing Deleted Accounts to Determine Their Legitimacy
Another challenge in identity resolution arises when accounts are deleted, either removed by the platform for policy violations or closed by the users themselves. Deleted accounts leave behind a trail of relationships that can provide insights into their legitimacy.
A graph database can retain the historical data and relationships associated with deleted accounts, enabling us to track and analyze them. Closely examining deleted accounts and their associations allows us to make inferences about their legitimacy.
For example, if a deleted account had a history of authentic user engagement and legitimate advertising activities, it is more likely to have been a genuine account. Conversely, suspicious behavior, lack of engagement, and connections to other flagged accounts may indicate that a deleted account was created for malicious purposes.
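One way to keep that trail available is to soft-delete accounts, marking the node instead of removing it. The sketch below assumes a deleted flag, a flagged flag, and an ENGAGED_WITH relationship from users to accounts; all three are illustrative modeling choices rather than built-in Neo4j features:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Rank deleted accounts by how much genuine-looking activity they left behind
# and how many already-flagged accounts they are linked to via credentials.
# `deleted`, `flagged`, and `ENGAGED_WITH` are assumed modeling conventions.
DELETED_ACCOUNT_HISTORY = """
MATCH (a:Account {deleted: true})
OPTIONAL MATCH (a)<-[eng:ENGAGED_WITH]-(:User)
WITH a, count(eng) AS engagements
OPTIONAL MATCH (a)-[:HAS_EMAIL|HAS_PHONE]->(cred)
               <-[:HAS_EMAIL|HAS_PHONE]-(flagged:Account {flagged: true})
RETURN a.account_id AS account,
       engagements,
       count(DISTINCT flagged) AS linked_flagged_accounts
ORDER BY linked_flagged_accounts DESC, engagements ASC
"""

with driver.session() as session:
    for record in session.run(DELETED_ACCOUNT_HISTORY):
        print(record["account"], record["engagements"], record["linked_flagged_accounts"])

driver.close()
```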
By leveraging graph databases to analyze deleted accounts and their relationships, we can gain a more comprehensive understanding of the overall identity resolution puzzle, even when pieces go missing.
Connecting Accounts Without Common Credentials
Things get really interesting when we start to find connections between accounts that don’t share credentials. In the illustration below, we can see that the two green nodes, Account B and Account C, share an email address and phone number. But the graph also reveals something that would be extremely difficult to discern in tabular data: Account A (blue) and Account E (yellow) belong to the same network of accounts, even though they don’t share any credentials.
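A hedged way to surface those indirect links is an undirected, variable-length path query over the credential relationships. Starting from Account A, it returns every account reachable within a few credential hops, whether or not the two accounts share a credential directly; the six-hop cap is an arbitrary limit chosen for the example:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Walk up to six credential hops in either direction from Account A and
# collect every other account reachable that way, even when the two
# accounts share no credential directly.
SAME_NETWORK = """
MATCH (start:Account {account_id: $account_id})-[:HAS_EMAIL|HAS_PHONE*1..6]-(other:Account)
WHERE other <> start
RETURN DISTINCT other.account_id AS account
"""

with driver.session() as session:
    for record in session.run(SAME_NETWORK, account_id="A"):
        print(record["account"])

driver.close()
```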
Ensuring Integrity in Political Advertising With Relationship Analysis
Identity resolution through relationship analysis is critical to maintaining the integrity of social media advertising during presidential elections in the U.S. By leveraging graph databases to identify and examine weakly connected components, we can peel back layers of complexity and distinguish between legitimate and malicious accounts.
Once we’ve mapped out malicious account networks, revealing influence previously hidden in thickets of account credentials, we can begin to rein in malicious activity, from coordinated inauthentic behavior and account masking to merchandise spamming.