OSCON Twitter Graph
As part of Neo4j’s community engagement around OSCON, we wanted to look at the social media activity of the attendees on Twitter. Using the Twitter Search API, which returns a list of tweets matching a supplied search term, we searched for mentions of “OSCON” and built a graph of Users, Tweets, Hashtags and shared Links. We populated the graph model shown above by representing the results as nodes and relationships using Neo4j’s query language, Cypher.

We designed a single Cypher query to import the tweets into the graph model in Neo4j. It takes a single parameter that contains all of the tweets returned from Twitter’s Search API. Using the UNWIND clause, we pivot that collection of tweets into a set of rows, one per tweet, which can then be structured into the graph model outlined in the image.

    UNWIND {tweets} AS t
    MERGE (tweet:Tweet {id:t.id})
    SET tweet.text = t.text,
        tweet.created_at = t.created_at,
        tweet.favorites = t.favorite_count
    MERGE (user:User {screen_name:t.user.screen_name})
    SET user.profile_image_url = t.user.profile_image_url
    MERGE (user)-[:POSTS]->(tweet)
    FOREACH (h IN t.entities.hashtags |
      MERGE (tag:Hashtag {name:LOWER(h.text)})
      MERGE (tag)-[:TAGS]->(tweet)
    )
    … source, mentions, links, retweets, ...
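To make the UNWIND and MERGE behaviour of the import concrete, here is a minimal Python sketch that mimics the same steps against an in-memory dictionary instead of Neo4j. The function name `import_tweets` and the `graph` layout are hypothetical and chosen only for illustration; the tweet field names are the ones the query above reads. It is not the import code we ran, just a way to see why importing the same batch twice does not create duplicates.

```python
def import_tweets(graph, tweets):
    """Mimic the UNWIND/MERGE import against an in-memory graph (illustration only)."""
    nodes = graph.setdefault("nodes", {})   # (label, key) -> properties
    rels = graph.setdefault("rels", set())  # (start node, type, end node)

    for t in tweets:  # UNWIND {tweets} AS t: one "row" per tweet
        tweet = ("Tweet", t["id"])
        # MERGE: create the node only if it is missing; SET: overwrite properties
        nodes.setdefault(tweet, {"id": t["id"]}).update(
            text=t["text"],
            created_at=t["created_at"],
            favorites=t["favorite_count"],
        )

        screen_name = t["user"]["screen_name"]
        user = ("User", screen_name)
        nodes.setdefault(user, {"screen_name": screen_name}).update(
            profile_image_url=t["user"]["profile_image_url"],
        )
        rels.add((user, "POSTS", tweet))  # a set keeps the relationship unique

        for h in t["entities"]["hashtags"]:  # FOREACH over the tweet's hashtags
            tag = ("Hashtag", h["text"].lower())
            nodes.setdefault(tag, {"name": h["text"].lower()})
            rels.add((tag, "TAGS", tweet))


# Hypothetical sample payload, shaped like the fields the query accesses
sample = [{
    "id": 1,
    "text": "Hello #OSCON",
    "created_at": "2014-07-20",
    "favorite_count": 2,
    "user": {"screen_name": "alice",
             "profile_image_url": "http://example.com/a.png"},
    "entities": {"hashtags": [{"text": "OSCON"}]},
}]

graph = {}
import_tweets(graph, sample)
import_tweets(graph, sample)  # second run adds nothing new
```

Because every node is keyed by its label and identifying property, re-running the import only updates properties, which is the behaviour MERGE gives us in the real graph.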
Label | Count
Tweet | 10653
User | 4910
Link | 1153
Hashtag | 742
Source | 175
    MATCH (t:Tweet)-[:USING]->(s:Source)
    RETURN s.name AS Source, count(t) AS Count
    ORDER BY Count DESC
    LIMIT 5
Source | Count
(missing) | 2294
(missing) | 1712
(missing) | 1590
(missing) | 877
(missing) | 668
    MATCH (:Hashtag {name:'python'})-[:TAGS]->(:Tweet)<-[:TAGS]-(h:Hashtag)
    WHERE h.name <> 'oscon'
    RETURN h.name AS Hashtag, COUNT(*) AS Count
    ORDER BY Count DESC
    LIMIT 5
Hashtag | Count
java | 7
opensource | 5
data | 5
golang | 5
nodejs | 5
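The hashtag query above counts, for every tweet tagged with 'python', the other hashtags on that same tweet. A rough Python equivalent, assuming each tweet is given simply as a list of hashtag strings (a hypothetical input shape, as is the function name `co_occurring_tags`), might look like this:

```python
from collections import Counter


def co_occurring_tags(tweet_tags, seed, exclude=("oscon",), limit=5):
    """Count hashtags that appear on the same tweets as `seed`."""
    counts = Counter()
    for tags in tweet_tags:
        names = {tag.lower() for tag in tags}  # hashtags are stored lower-cased
        if seed in names:
            # every other tag on this tweet co-occurs once with the seed tag
            counts.update(names - {seed} - set(exclude))
    return counts.most_common(limit)


# Hypothetical data: one inner list of hashtags per tweet
tweets = [
    ["python", "oscon", "java"],
    ["Python", "java", "golang"],
    ["ruby", "java"],
]
top = co_occurring_tags(tweets, "python")
```

In the graph the same counting falls out of the pattern match, since each matched path through a shared tweet is one co-occurrence.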
    MATCH (u:User {screen_name:"mojavelinux"})-[:POSTS]->(tweet)
          <-[:TAGS]-(tag1:Hashtag)-[:TAGS]->(tweet2)<-[:TAGS]-(tag2:Hashtag)
    WHERE tag1.name <> 'oscon'
      AND tag2.name <> 'oscon'
      AND NOT (u)-[:POSTS]->()<-[:TAGS]-(tag2)
    RETURN tag2.name AS Topics, count(*) AS Count
    ORDER BY count(*) DESC
    LIMIT 5
Topics | Count
graphdb | 30
graphviz | 24
rstats | 21
alchemyjs | 21
cassandra | 21
    MATCH (:Tweet)-[:RETWEETS]->(t:Tweet)
    WITH t, COUNT(*) AS Retweets
    ORDER BY Retweets DESC
    LIMIT 1
    MATCH (u:User)-[:POSTS]->(t)
    RETURN u.screen_name AS User, t.text AS Tweet, Retweets
User | Tweet | Retweets
andypiper | Wise words #oscon https://t.co/f4Jr9hnMcV | 470
Wise words #oscon pic.twitter.com/f4Jr9hnMcV
— Andy Piper (@andypiper) July 20, 2014
To test your own queries on this graph model, check out our GraphGist.
Graph Visualization
The interesting aspect of this tweet graph is that it contains the implicit connections between users via their shared hashtags, mentions and links. This graph differs from the “official” followers graph that Twitter makes explicit. Via the inferred connections we can discover new groups of people or topics we could be interested in, so we wanted to visualize this aspect of our graph on the big screen.

We wrote a tiny Python application that queries Neo4j for connections between people and tags (skipping the tweets in between) and makes the data available to a JavaScript front-end. The query analyzes the last 2000 tweets, follows the paths to tags and mentioned users, and returns at most 1000 tuples of a user connected to a tag or another user, to keep the visualization manageable.

    MATCH (t:Tweet)
    WITH t ORDER BY t.id DESC LIMIT 2000
    MATCH (user:User)-[:POSTS]->(t)<-[:TAGS]-(tag:Hashtag)
    MATCH (t)-[:MENTIONS]->(user2:User)
    UNWIND [tag,user2] AS other
    WITH DISTINCT user, other
    WHERE lower(other.name) <> 'oscon'
    RETURN { from: {id: id(user), label: head(labels(user)), data: user},
             rel: 'CONNECTS',
             to: {id: id(other), label: head(labels(other)), data: other} } AS tuple
    LIMIT 1000
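The shape of the data handed to the front-end can be sketched in plain Python. This is not the application we ran, only an approximation of the query's logic on an in-memory list; the flattened tweet dictionaries (`id`, `user`, `tags`, `mentions`) and the function name `connection_tuples` are assumptions made for the example. Like the query, it only considers tweets that have both a hashtag and a mention, skips the 'oscon' hashtag, and removes duplicate connections.

```python
def connection_tuples(tweets, max_tweets=2000, max_tuples=1000, skip="oscon"):
    """Build user -> tag/user connection tuples from the most recent tweets."""
    # ORDER BY t.id DESC LIMIT 2000: newest tweets first
    recent = sorted(tweets, key=lambda t: t["id"], reverse=True)[:max_tweets]

    seen = set()
    result = []
    for t in recent:
        # both MATCH clauses must succeed: at least one tag and one mention
        if not t["tags"] or not t["mentions"]:
            continue
        others = [("Hashtag", name.lower()) for name in t["tags"]]
        others += [("User", name) for name in t["mentions"]]
        for label, name in others:
            if label == "Hashtag" and name == skip:
                continue  # the conference tag connects everyone, so drop it
            key = (t["user"], label, name)
            if key in seen:  # DISTINCT user, other
                continue
            seen.add(key)
            result.append({
                "from": {"label": "User", "name": t["user"]},
                "rel": "CONNECTS",
                "to": {"label": label, "name": name},
            })
            if len(result) >= max_tuples:  # LIMIT 1000
                return result
    return result


# Hypothetical input: one flattened dictionary per tweet
tweets = [
    {"id": 1, "user": "alice", "tags": ["oscon", "neo4j"], "mentions": ["bob"]},
    {"id": 2, "user": "alice", "tags": ["Neo4j"], "mentions": ["bob"]},
]
tuples = connection_tuples(tweets)
```

Each resulting dictionary has the same `from`, `rel` and `to` keys as the tuples returned by the query, which is all a front-end needs to draw an edge between two nodes.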