In politics, people are often advised to “follow the money” to understand the forces influencing decisions. As engineers, we know we can do that and more by following the data.
Inspired by some innovative work by Dave Fauth, a Washington, DC data analyst, we arranged a workshop around FEC campaign finance data that had been imported into Neo4j.
FEC Campaign Finance Data
Every Sunday, the FEC updates its campaign finance data sets for the current two-year election period plus the five most recent two-year election periods. The data sets include:
- all individuals registered as candidates for President, House, or Senate
- all registered committees engaged in political fundraising
- all individual contributions greater than $200
After exploring some evolutionary import strategies (starting with the most direct, then iterating), we settled on an approach which structured the data to look like this:
[Figure: Campaign Finance Data in a Graph]
Query Challenge
With the data imported, and a basic understanding of the domain model, we then challenged people to write Cypher queries to answer the following questions:
- All presidential candidates for 2012
- Most mythical presidential candidate
- Top 10 Presidential candidates according to number of campaign committees
- Find President Barack Obama
- Look up Obama by his candidate ID
- Find Presidential Candidate Mitt Romney
- Look up Mitt Romney by his candidate ID
- Find the shortest path of funding between Obama and Romney
- List the 10 top individual contributions to Obama
- List the 10 top individual contributions to Romney
Run the RELATED importer like so:
./bin/fec2graph --force --importer=RELATED
Then just start up Neo4j and open a browser to http://localhost:7474 to query away. If you’re new to Cypher, read through this introduction to learn the basics of querying a graph.
Submit your queries to me (andreas@neotechnology.com) by next Thursday and we’ll pick a winner from the correct entries. The prize? A free pass to GraphConnect, of course! Coming this November 5 & 6 in San Francisco, GraphConnect is a fantastic conference devoted to graph databases.
Want a hint?
Alrighty. Let’s take a look at #2. After successfully listing all candidates for the first query, you could page through the listing to look for names that seem…off. Use limit and skip in the return clause to page through the long listing:
start candidate=node:candidates('CAND_ID:*')
where candidate.CAND_OFFICE='{fill this in}' AND candidate.CAND_ELECTION_YR='{this too}'
return candidate.CAND_NAME skip 100 limit 100;
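One caveat when paging: Cypher makes no promise about result order unless you ask for one, so consecutive pages could overlap or skip names. A minimal sketch of fetching the next page with a stable order might look like the following. It reuses the index and property names from the query above, leaves the same placeholders for you to fill in, and the choice to sort by name is just an assumption for illustration.

```cypher
// Sketch only: same index, properties, and placeholders as the query above.
// Sorting by name keeps every page in a fixed order, so moving from
// "skip 100" to "skip 200" continues exactly where the last page ended.
start candidate=node:candidates('CAND_ID:*')
where candidate.CAND_OFFICE='{fill this in}' AND candidate.CAND_ELECTION_YR='{this too}'
return candidate.CAND_NAME
order by candidate.CAND_NAME
skip 200 limit 100;
```

Increase the skip value by the page size (100 here) each time to walk through the full listing.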
Once you spot one of the many candidate names that isn’t real, you can query for it directly:
start candidate=node:candidates(CAND_NAME='CLAUS, SANTA')
return candidate;
Cypher Masters
From our recent workshop, the winners are:
- Matt Tyndal
- Lou Kosak
- Pengchao Wang
Always,
Andreas