Mashups with the Facebook Graph API and Neo4j

May 7, 2010

In case you didn’t notice already, graph databases like Neo4j are hot nowadays. People ask questions, write about them, also in the contexts of NOSQL and RDF. Recently Twitter open sourced their graphdb implementation, targeted at shallow, distributed graphs. And then Facebook revealed their new Graph API using the Open Graph Protocol.

Today, we’re going to show you how easy it is to use the Facebook Graph API to mash up data from Facebook with data in a locally hosted graph database!

It’s movie time!

Let’s say you want to see a movie with one of your friends. Wouldn’t it be neat to have a service that uses the Facebook social graph to collect the movies your friend liked, and combines this with IMDB data to produce a movie suggestion? Turns out that an app like that is pretty straightforward to build with a graph database.

The first step is to connect to Facebook to fetch a list of your friends, so that’s where the app will start out:

Next a list of your friends will show up:

Now, just click one of your friends and a movie suggestion will be generated:

Under the Hood

What we need to do is simply to let our mashup talk to both the Facebook Graph API and the IMDB API. Uh-oh – IMDB doesn’t have a public API that you can throw requests at. Well, that’s simple enough: we’ll just import the data into a local Neo4j graph database and then access it through the Facebook Graph API!

So, let’s see how to solve this. Here’s the basic structure of our app:

MovieNight.js is the mashup itself, embedded in the web page. It uses the Facebook Graph API to get information about the friends of the visitor and the movies that your friends like. SuggestionEngine.js uses the Graph API to talk to a Neo4j database containing movie information (a small example data set from IMDB). The movie suggestion is based on what movies your friend has liked in the past. It simply tries to find other movies starring some actor from the liked ones.

Using the same Graph API to connect to both Facebook and the Neo4j graph database backend makes for convenience: it means that you can use tools written for Facebook for locally hosted data as well – and that’s what we’re doing here. To download the source, go to the download page.

Facebook data

To get your friends from Facebook, just use the regular Facebook Graph API:

FB.api('/me/friends', function(response) {
   friends = response.data;
   // Load friends into UI
   friend_list.empty();
   for ( var i = 0; i < friends.length; i ++ ) {
      add_friend( friends[i] ); // write to UI
   }
});

Getting the movies a friend likes is very similar to getting the friends list:

FB.api("/" + friend.id + "/movies", function(result)
{
 /* handle the response here */
});

For more information, see the Graph API documentation.

Neo4j data

To connect to the Neo4j graph server we had to hack the connect-js library slightly, as it’s hard coded to send requests to facebook.com. What we added is the possibility to add prefixes for different data sources. It still defaults to graph.facebook.com etc., but makes a “fb:” prefix available to make your code easier to read. To hook in a data source, we modify the FB.init() call like this:

FB.init({
   appId  : '', // NOTE: create an appid and add it here
   status : true, cookie : true, xfbml  : true,
   // time to add our IMDB backend to the mix
   external_domains : {
      imdb : 'https://localhost:4567/'
   }
});

Now we’re able to send requests to our own server as well, using code similar to the following:

FB.api("imdb:/path/to/data/in/graph", function(data) {
  // data is available here :)
});
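
To make the prefix idea a bit more concrete, here’s a minimal sketch of how a prefixed path could be resolved to a full URL. This is not the actual patch to connect-js: the function name resolveUrl and the DEFAULT_GRAPH constant are made up for illustration, and only the external_domains map and the “fb:” and “imdb:” prefixes come from the example above.

```javascript
// Assumed default endpoint for unprefixed and "fb:" paths.
var DEFAULT_GRAPH = "https://graph.facebook.com/";

// Hypothetical helper: turns "imdb:/56/acted_in" into a full URL by looking
// the prefix up in the external_domains map passed to FB.init().
function resolveUrl(path, externalDomains) {
  var match = /^([a-z]+):(.*)$/.exec(path);
  var base = DEFAULT_GRAPH;
  var rest = path;
  if (match) {
    var prefix = match[1];
    rest = match[2];
    if (prefix !== "fb") {
      var known = externalDomains &&
        Object.prototype.hasOwnProperty.call(externalDomains, prefix);
      if (!known) {
        throw new Error("Unknown data source prefix: " + prefix);
      }
      base = externalDomains[prefix];
    }
  }
  // Join base and path with exactly one slash between them.
  return base.replace(/\/+$/, "") + "/" + rest.replace(/^\/+/, "");
}

var domains = { imdb: "https://localhost:4567/" };
console.log(resolveUrl("imdb:/56/acted_in", domains)); // https://localhost:4567/56/acted_in
console.log(resolveUrl("/me/friends", domains));       // https://graph.facebook.com/me/friends
```

Failing loudly on an unknown prefix is a design choice of this sketch; silently falling back to Facebook would hide typos in the prefix name.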

So now that we can send requests, what can we do with the Neo4j backend here? Here’s a comprehensive list showing precisely that in some detail (all requests are GET from https://localhost:4567):

Get Actor (or Movie) by Id
Request	Response
`/56`	{ "name": "Bacon, Kevin", "id": 56 }
Extended information about Actor(/Movie)
Request	Response
`/56?metadata=1`	{ "name": "Bacon, Kevin", "id": 56, "metadata": { "connections": "https://localhost:4567/56/acted_in" }, "type": "actor" }
All the Movies an Actor had a Role in
Request	Response
`/56/acted_in`	{ "data": [ { "id": 57, "title": "Woodsman, The (2004)" }, { "id": 59, "title": "Wild Things (1998)" } // tons of movies here ... ] }
Get (Actor or) Movie by Id
Request	Response
`/59`	{ "title": "Wild Things (1998)", "year": "1998", "id": 59 }
Extended information about (Actor/)Movie
Request	Response
`/59?metadata=1`	{ "title": "Wild Things (1998)", "year": "1998", "id": 59, "metadata": { "connections": "https://localhost:4567/59/actors" }, "type": "movie" }
All the Actors that have a Role in this Movie
Request	Response
`/59/actors`	{ "data": [ { "id": 56, "name": "Bacon, Kevin" }, { "id": 528, "name": "Dillon, Matt (I)" } // loads of actors here ... ] }
Search for Actors with “bacon” in their name
Request	Response
`/search?q=bacon&type=actor`	[ { "name": "Bacon, Kevin", "id": 56 }, { "name": "Bacon, Travis", "id": 14242 } // more bacons here ... ]
Search for Movies with “wild things” in their title
Request	Response
`/search?q=wild%20things&type=movie`	[ { "title": "Wild Things (1998)", "year": "1998", "id": 59 }, { "title": "River Wild, The (1994)", "year": "1994", "id": 74 } // more wild movies here ... ]

Ok, but how do we use this stuff then?! Well, that’s what we’re going to look into right away, to see the Facebook Graph API used from JavaScript with a Neo4j/IMDB backend. To get started, here’s how to perform a search:

self.movie_info = function( movie_name, callback ) {
    // The search API uses commas for AND-type searches, spaces become OR, so for
    // the movie names, we switch spaces out for commas.
    movie_name = movie_name.replace(/ /g, ",");
    FB.api("imdb:/search", {type:'movie', q:movie_name }, callback );
};

The request to get the movies an actor has acted in goes like this:

FB.api("imdb:/" + actor.id + "/acted_in", function( result ) {
  for (var i = 0; i < result.data.length; i++)
  {
      var movie = result.data[i];
      // do something with the movie here!
  }
});

To get all actors in a movie, simply use the following request:

FB.api("imdb:/" + movie.id + "/actors", function(result) {
  for (var i = 0; i < result.data.length; i++)
  {
      var actor = result.data[i];
      // do something with the actor here!
  }
});

Actually, these three different requests are all our small suggestion engine needs to fulfill its task. Have a look at SuggestionEngine.js to see the full code.
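
To show how the three requests fit together, here’s a rough sketch of the suggestion flow. It is not the code from SuggestionEngine.js: the names suggest, fakeApi and CANNED are invented for this example, and fakeApi is a stand-in for FB.api that answers synchronously from canned data shaped like the responses listed earlier, so the sketch runs without a server.

```javascript
// Canned responses, keyed by request path (assumed sample data).
var CANNED = {
  "/search": [{ title: "Wild Things (1998)", year: "1998", id: 59 }],
  "/59/actors": { data: [{ id: 56, name: "Bacon, Kevin" },
                         { id: 528, name: "Dillon, Matt (I)" }] },
  "/56/acted_in": { data: [{ id: 57, title: "Woodsman, The (2004)" },
                           { id: 59, title: "Wild Things (1998)" }] },
  "/528/acted_in": { data: [{ id: 59, title: "Wild Things (1998)" }] }
};

// Hypothetical stand-in for FB.api: same argument order, params optional.
function fakeApi(path, params, callback) {
  if (typeof params === "function") { callback = params; }
  callback(CANNED[path.replace(/^imdb:/, "")]);
}

// Suggest other movies starring an actor from the liked movie.
function suggest(api, likedTitle, done) {
  var query = likedTitle.replace(/ /g, ","); // commas mean AND in the search API
  api("imdb:/search", { type: "movie", q: query }, function (hits) {
    if (!hits || hits.length === 0) { return done([]); }
    var liked = hits[0];
    api("imdb:/" + liked.id + "/actors", function (cast) {
      var actors = cast.data;
      var pending = actors.length; // outstanding acted_in requests
      var seen = {};
      var suggestions = [];
      if (pending === 0) { return done(suggestions); }
      actors.forEach(function (actor) {
        api("imdb:/" + actor.id + "/acted_in", function (roles) {
          roles.data.forEach(function (movie) {
            // Skip the liked movie itself and duplicates.
            if (movie.id !== liked.id && !seen[movie.id]) {
              seen[movie.id] = true;
              suggestions.push(movie.title);
            }
          });
          pending -= 1;
          if (pending === 0) { done(suggestions); }
        });
      });
    });
  });
}

suggest(fakeApi, "Wild Things", function (titles) {
  console.log(titles); // [ 'Woodsman, The (2004)' ]
});
```

The pending counter is there because a real FB.api call would answer asynchronously, so the sketch only reports back once every acted_in request has returned.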

How to create a Graph API service on top of Neo4j

Let’s take a closer look at the movie backend now. It’s built using the Neo4j Ruby bindings. In our example data set we have Actors and Movies connected through Roles. Here’s how these look in Ruby code:

class Movie; end

class Role
 include Neo4j::RelationshipMixin
 property :title, :character
end

class Actor
 include Neo4j::NodeMixin
 property :name
 has_n(:acted_in).to(Movie).relationship(Role)
 index :name, :tokenized => true
end

class Movie
 include Neo4j::NodeMixin
 property :title
 property :year
 index :title, :tokenized => true

 # defines a method for traversing incoming acted_in relationships from Actor
 has_n(:actors).from(Actor, :acted_in)
end

The code above is from the backend/model.rb file. On the Neo4j level, this is the kind of structure we’ll have:

By defining indexes on Actor and Movie we can later use the find method on the classes to perform searches.

Our next step is to expose this model over the Graph API, where we’ll use Sinatra and WEBrick to do the heavy lifting. The application is defined in the backend/neo4j_app.rb file – we’ll dive into portions of that code right here. To begin with, how to return data for an Actor or Movie by Id?

get '/:id' do # show a node
 content_type 'text/javascript'
 node = node_by_id(params[:id])
 props = external_props_for(node)
 props.merge! metadata_for(node) if params[:metadata] == "1"
 json = JSON.pretty_generate(props)
 json = callback_wrapper(json, params[:callback])
 json
end

The Sinatra route above uses a few small utility functions, so let’s look into them as well. The first one is very simple, but useful if we want to extend the URIs to allow requesting, for example, /{moviename}/actors and not only numeric IDs.

def node_by_id(id)
 node = Neo4j.load_node(id) if id =~ /^(\d+)$/
 halt 404 if node.nil?
 node
end

The next function returns the properties of a node, while filtering out those that have a name starting with a “_” character. It also adds the node id to the result.

def external_props_for(node)
 ext_props = node.props.delete_if{|key, value| key =~ /^_/}
 ext_props[:id] = node.neo_id
 ext_props
end

Then there’s a function that gathers metadata for a node, including a link to the list of connections to other nodes, and the type of the node.

def metadata_for(node)
 if node.kind_of? Actor
   connections = url_for(node, "acted_in")
 elsif node.kind_of? Movie
   connections = url_for(node, "actors")
 end
 metadata = { :metadata => { :connections => connections }, :type => node.class.name.downcase }
end

There are a couple more utility functions, but we’ll skip them here as they are unrelated to Neo4j.
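
One of the skipped helpers, callback_wrapper, shows up in every route. Its source isn’t included in this post, but judging from the call sites and the text/javascript content type, it presumably wraps the JSON in a callback function (JSONP) when a callback parameter is present. Here’s a sketch of that idea, written in JavaScript rather than Ruby; the name callbackWrapper and the validation rule are assumptions, not the backend’s actual code.

```javascript
// Hypothetical JSONP wrapper: returns the JSON untouched when no callback
// name is given, otherwise wraps it in a call to that function.
function callbackWrapper(json, callback) {
  if (!callback) {
    return json;
  }
  // Only accept dotted identifiers, so the parameter can't inject script.
  var validName = /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/;
  if (!validName.test(callback)) {
    throw new Error("Invalid callback name");
  }
  return callback + "(" + json + ");";
}

console.log(callbackWrapper('{"id":56}'));            // {"id":56}
console.log(callbackWrapper('{"id":56}', "handle"));  // handle({"id":56});
```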

Next up is getting the relationships from an Actor or Movie. The code only cares about valid paths, that is, paths ending in /acted_in or /actors. In other cases, an empty data set is returned. Other than that, it simply delegates the work to the domain classes, by doing node.send(relationship) to get the relationships. Using the send method here is equivalent to calling node.acted_in or node.actors.

get '/:id/:relation' do # show a relationship
 content_type 'text/javascript'
 node = node_by_id(params[:id])
 data = []
 [ :acted_in, :actors ].each do |relationship|
   if params[:relation] == relationship.to_s and node.respond_to? relationship
     data = node.send(relationship)
   end
 end
 data = data.map{|node| node_data(node)}
 json = JSON.pretty_generate({:data => data})
 json = callback_wrapper(json, params[:callback])
 json
end
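
The interesting detail in this route is that the relation name from the URL is never passed to send directly; it is first matched against a fixed list. Here’s the same pattern sketched in JavaScript, with a toy object standing in for a Neo4j node. The names relatedNodes, ALLOWED_RELATIONS and kevin are invented for this illustration.

```javascript
// Only these relationship names may be requested through the URL.
var ALLOWED_RELATIONS = ["acted_in", "actors"];

// Hypothetical helper: returns related nodes for a whitelisted relation,
// or an empty list for anything else.
function relatedNodes(node, relation) {
  var allowed = ALLOWED_RELATIONS.indexOf(relation) !== -1;
  if (!allowed || typeof node[relation] !== "function") {
    return [];
  }
  return node[relation]();
}

// A toy actor object standing in for a Neo4j node.
var kevin = {
  name: "Bacon, Kevin",
  acted_in: function () {
    return [{ id: 59, title: "Wild Things (1998)" }];
  }
};

console.log(relatedNodes(kevin, "acted_in").length); // 1
console.log(relatedNodes(kevin, "actors").length);   // 0, actors have no such relationship
console.log(relatedNodes(kevin, "toString").length); // 0, not whitelisted and never invoked
```

Without the whitelist, a request like /56/toString would call an arbitrary method on the node, which is exactly what you don’t want user input to decide.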

When viewing the relationships, we only want to show the most basic node info, so there’s a utility function to do that as well:

def node_data(node)
 data = { :id => node.neo_id }
 [ :name, :title ].each do |property|
   data.merge!({ property => node[property] }) unless node[property].nil?
 end
 data
end

The searches are basically handled by adding indexes to the model (see the code further above). So what’s left to do in the application is some sanity checks, delegating the search to the model and finally formatting the output properly. Here goes:

get '/search' do
 content_type 'text/javascript'
 q = params[:q]
 type = params[:type]
 halt 400 unless q && type
 result = case type
   when 'actor'
     Actor.find(to_lucene(:name, q))
   when 'movie'
     Movie.find(to_lucene(:title, q))
   else
     []
 end
 json = JSON.pretty_generate(result.map{|node| external_props_for(node)})
 json = callback_wrapper(json, params[:callback])
 json
end
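
The to_lucene helper isn’t shown in this post. Going by the comment in the JavaScript search code (commas mean AND, spaces mean OR), a conversion along the following lines would do the job. This is a JavaScript sketch of the idea under that assumption; the function name toLucene, the escaping rule and the exact query format are guesses, and the real helper in the backend source may work differently.

```javascript
// Escape characters that have a special meaning in Lucene query syntax.
function escapeLucene(term) {
  return term.replace(/([+\-!(){}\[\]^"~*?:\\\/]|&&|\|\|)/g, "\\$1");
}

// Hypothetical conversion: comma-separated groups are ANDed together,
// space-separated terms within a group are ORed.
function toLucene(field, q) {
  return q.split(",")
    .map(function (group) {
      var terms = group.trim().split(/\s+/).filter(Boolean).map(function (t) {
        return field + ":" + escapeLucene(t);
      });
      return terms.length > 1 ? "(" + terms.join(" OR ") + ")" : terms[0];
    })
    .filter(Boolean) // drop empty groups such as "a,,b"
    .join(" AND ");
}

console.log(toLucene("title", "wild,things")); // title:wild AND title:things
console.log(toLucene("name", "kevin bacon"));  // (name:kevin OR name:bacon)
```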

Wrap up

Here are some major takeaways from this post:

Graphs are going mainstream, as evidenced by initiatives like the Facebook Graph API.
It’s often convenient to look at your data in the form of a graph, and with recent support in graph databases like Neo4j, it’s easy to use different data sources in tandem through the Graph API.
Exposing data through the Graph API is simple if you have a graphdb backend.

And once you put your data in a graphdb, you can of course do more advanced graphy things too, like finding shortest paths, routing with A*, modeling of complex domains and whatnot. Just get started!

Example source code

To get the source code of the example, go to the download page.

Credits

Here’s the guys who wrote the code of the example:
