Hasbolt: The New Haskell Neo4j Bolt Driver
Pavel Yakovlev
Director of Computational Biology, BIOCAD
[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]
Intro
Graph databases, and Neo4j in particular, have proven to be a great solution for analyzing closely related data. This is why biotechnology research groups are looking into such technologies, and why we’ve given birth to a new Bolt driver. A Haskell one.
My name is Pavel Yakovlev, and I am a director of the Computational Biology Department at BIOCAD — a leading Russian biotechnology company. Our department is involved in R&D of anti-cancer drugs of several types: small molecules, monoclonal antibodies and gene therapy.
One of the keys to success in our company is the wide use of computational technologies, like docking-based rational drug design, wet-lab automation and data analysis of automated experiments. Most of our computations are very expensive, so we cannot afford runtime errors, and growing data flows require easy parallelism. These factors have led us to functional programming and its features such as strong typing, purity and immutable data structures.
But algorithms on isolated data cannot solve every problem – we need relationships. For example, each drug candidate should have a lot of experimental or predicted ADME(T) parameters and be linked with its target.
Forming data in such a way, we can do predictive analytics, such as which molecular subunits are the most important to achieve the quality profile of an active drug component. So, we looked into graph databases and their ability to solve our tasks.
As my primary education is applied math and physics, I used to write a lot of scientific code in the past, but now I do not write any for our systems. Nevertheless, I began a one-week pet project to dive into graph technology using the Bolt driver and some toy programs (with real biological data) around it.
I started with boltkit, especially driver.py. The guide is great, but it’s for imperative programming languages.
When you use Haskell, you have to think about how to organise the code in a type-driven way. Also, I did not understand the structure concept clearly at first. I implemented a first version, but then I found the Bolt Protocol specification and completely rewrote my implementation.
The Hasbolt Neo4j Bolt Driver
To use Neo4j via Bolt from Haskell, we want to support an API like the one in this example:

myConfiguration :: BoltCfg
myConfiguration = def { user = "neo4j", password = "neo4j", host = "example.com" }

main :: IO ()
main = do
  pipe <- connect myConfiguration
  records <- run pipe $ query "MATCH (n:Person) WHERE n.name CONTAINS \"Tom\" RETURN n"
  let first = head records
  cruise <- first `at` "n" >>= exact :: IO Node
  print cruise
  close pipe

For this to work, we need some building blocks in Hasbolt, which I’ll explain in the following sections. If you’re into Haskell, you will hopefully enjoy this and can also give me some good feedback.
The Hasbolt driver has two main low-level concepts:
- Value: serialization and deserialization of primitive types, strings, lists, maps and structures. It also introduces Neo4j types like node, relationship, unbound relationship and path.
- Connection: network overlay for sending and receiving data.
In Data.Value.Type, we can find the definition of all Bolt-able types:

data Value = N ()                -- Null
           | B Bool              -- Boolean
           | I Int               -- 64-bit integer
           | F Double            -- 64-bit float
           | T Text              -- UTF8 strings
           | L [Value]           -- lists
           | M (Map Text Value)  -- maps with string keys
           | S Structure         -- bolt structures
  deriving (Show, Eq)

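To make the mapping from constructors to bytes more concrete, here is a minimal sketch that packs a few of the simpler cases. The marker bytes follow the PackStream encoding that Bolt uses; the names `SimpleValue` and `packSimple` are hypothetical, and this is not Hasbolt's own implementation, which also has to cover wider integers, floats, maps and structures.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical, reduced version of the Value type.
data SimpleValue
  = SNull
  | SBool Bool
  | SInt Int             -- only the "tiny int" range -16..127 is handled
  | SText String         -- only ASCII strings shorter than 16 characters
  | SList [SimpleValue]  -- only lists shorter than 16 items
  deriving (Show, Eq)

-- Returns Nothing for anything outside the reduced ranges above.
packSimple :: SimpleValue -> Maybe [Word8]
packSimple SNull         = Just [0xC0]
packSimple (SBool False) = Just [0xC2]
packSimple (SBool True)  = Just [0xC3]
packSimple (SInt n)
  | n >= -16 && n <= 127 = Just [fromIntegral n]  -- one two's-complement byte
  | otherwise            = Nothing
packSimple (SText s)
  | length s < 16 && all ((< 128) . ord) s =
      -- marker 0x80 plus the length, followed by the characters
      Just (fromIntegral (0x80 + length s) : map (fromIntegral . ord) s)
  | otherwise = Nothing
packSimple (SList xs)
  | length xs < 16 =
      -- marker 0x90 plus the item count, followed by the packed items
      ((fromIntegral (0x90 + length xs) :) . concat) <$> mapM packSimple xs
  | otherwise = Nothing
```

For instance, `packSimple (SText "Tom")` produces the marker byte for a three-character string followed by the three character codes.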
As we have to unpack lots of values from a single bytestring, I used the StateT monad transformer to keep the current state, i.e. the suffix of the bytestring that has not been unpacked yet:

type UnpackT = StateT ByteString

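As a rough illustration of why the state is the remaining suffix, the sketch below reads two booleans one after another from the same bytestring. The helpers `popByte`, `unpackBool` and `unpackTwoBools` are hypothetical names for this example, and failure is modelled with `Maybe` for simplicity.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, evalStateT, get, put)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Word (Word8)

type UnpackT = StateT ByteString

-- Hypothetical helper: take one byte off the front of the remaining input.
popByte :: UnpackT Maybe Word8
popByte = do
  rest <- get
  case B.uncons rest of
    Nothing      -> lift Nothing      -- ran out of input
    Just (b, bs) -> put bs >> pure b  -- the state is now the shorter suffix

-- Hypothetical helper: read a boolean from its PackStream marker byte.
unpackBool :: UnpackT Maybe Bool
unpackBool = do
  b <- popByte
  case b of
    0xC2 -> pure False
    0xC3 -> pure True
    _    -> lift Nothing

-- The second unpackBool continues where the first one stopped.
unpackTwoBools :: ByteString -> Maybe (Bool, Bool)
unpackTwoBools = evalStateT ((,) <$> unpackBool <*> unpackBool)
```

Neither reader has to know its offset in the input: each one simply consumes what it needs and leaves the rest in the state.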
All these values can be packed to and unpacked from bytestrings using the BoltValue type-class:

class BoltValue a where
  pack    :: a -> ByteString
  unpackT :: Monad m => UnpackT m a
  unpack  :: Monad m => ByteString -> m a
  unpack = evalStateT unpackT

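Here is a minimal sketch of what one instance of such a class could look like, using `Bool` as the simplest case. It is an assumption-laden illustration rather than Hasbolt's code: in particular, the constraint is `MonadFail` instead of `Monad`, because `fail` is no longer a method of `Monad` in current GHC versions.

```haskell
import Control.Monad.Trans.State (StateT, evalStateT, get, put)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B

type UnpackT = StateT ByteString

-- Same shape as the class above, with MonadFail in place of Monad (assumption).
class BoltValue a where
  pack    :: a -> ByteString
  unpackT :: MonadFail m => UnpackT m a
  unpack  :: MonadFail m => ByteString -> m a
  unpack = evalStateT unpackT

instance BoltValue Bool where
  -- PackStream markers for false and true
  pack False = B.singleton 0xC2
  pack True  = B.singleton 0xC3
  unpackT = do
    input <- get
    case B.uncons input of
      Just (0xC2, rest) -> put rest >> pure False
      Just (0xC3, rest) -> put rest >> pure True
      Just (other, _)   -> fail ("not a boolean marker: " ++ show other)
      Nothing           -> fail "unexpected end of input"
```

With this in place, `unpack (pack True) :: Maybe Bool` round-trips the value, while an invalid bytestring yields `Nothing`.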
Of course, all Values already have BoltValue implementations.

Nodes, relationships and paths, as well as protocol requests and responses, are structures. So, we have to convert these types to structures and back. I have a Structable type-class for this purpose:

class Structable a where
  toStructure   :: a -> Structure
  fromStructure :: Monad m => Structure -> m a

I use a monadic context for every unpacking operation since it can fail: the bytestring can be invalid, and a structure can have an unknown signature. Failure is a side effect, so you can use any monad to work with it.
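The following sketch shows the conversion to a structure and back for a reduced node type, and how the caller picks the monad in which failure is reported. The `Value`, `Structure` and `Node` definitions here are simplified stand-ins (a real Bolt node also carries a property map), and `MonadFail` is used instead of `Monad` so that the example compiles with current GHC versions. The signature byte 0x4E is the one Bolt uses for nodes.

```haskell
import Data.Word (Word8)

-- Reduced, hypothetical stand-ins for the driver's types.
data Value = I Int | T String | L [Value]
  deriving (Show, Eq)

data Structure = Structure { signature :: Word8, fields :: [Value] }
  deriving (Show, Eq)

-- Properties are left out to keep the sketch short.
data Node = Node { nodeId :: Int, labels :: [String] }
  deriving (Show, Eq)

class Structable a where
  toStructure   :: a -> Structure
  fromStructure :: MonadFail m => Structure -> m a

instance Structable Node where
  toStructure (Node i ls) = Structure 0x4E [I i, L (map T ls)]
  fromStructure (Structure 0x4E [I i, L ls]) = Node i <$> mapM label ls
    where
      label (T l) = pure l
      label other = fail ("label is not a string: " ++ show other)
  fromStructure (Structure sig _) =
    fail ("cannot read a node from a structure with signature " ++ show sig)
```

Asking for `Maybe Node` turns a bad structure into `Nothing`, asking for `[Node]` turns it into an empty list, and asking for `IO Node` raises an IO error, all with the same `fromStructure`.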
Connection concepts are not so interesting, so I will describe just some important classes. First of all, to create a new connection, you need to fill a configuration record:

data BoltCfg = BoltCfg { magic         :: Word32  -- '6060B017' value
                       , version       :: Word32  -- '00000001' value
                       , userAgent     :: Text    -- Driver user agent
                       , maxChunkSize  :: Word16  -- Maximum chunk size of request
                       , socketTimeout :: Int     -- Driver socket timeout
                       , host          :: String  -- Neo4j server hostname
                       , port          :: Int     -- Neo4j server port
                       , user          :: Text    -- Neo4j user
                       , password      :: Text    -- Neo4j password
                       }

Of course, most of these values will not change in the near future, so BoltCfg implements the Default type-class, which points the configuration to localhost with an empty user and password. As a result, you can fill in only the fields of interest:

myConfiguration :: BoltCfg
myConfiguration = def { user = "neo4j", password = "neo4j", host = "example.com" }

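The pattern itself is easy to reproduce in isolation. In the sketch below, the `Default` class is a local stand-in for the one from the data-default package, `Cfg` is a shortened hypothetical record, and the default chunk size is an assumed value; 7687 is the standard Bolt port.

```haskell
import Data.Word (Word16, Word32)

-- Local stand-in for the Default class from the data-default package.
class Default a where
  def :: a

-- Hypothetical, shortened configuration record.
data Cfg = Cfg
  { magic        :: Word32
  , maxChunkSize :: Word16
  , host         :: String
  , port         :: Int
  , user         :: String
  , password     :: String
  } deriving (Show, Eq)

instance Default Cfg where
  def = Cfg
    { magic        = 0x6060B017
    , maxChunkSize = 65535        -- assumed default, not taken from Hasbolt
    , host         = "localhost"
    , port         = 7687
    , user         = ""
    , password     = ""
    }

-- Record update syntax overrides only the named fields of the default.
myConfiguration :: Cfg
myConfiguration = def { user = "neo4j", password = "neo4j", host = "example.com" }
```

Everything that is not mentioned in the update, such as the port and the magic number, keeps its default value.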
To create a new connection, just pass this configuration to the connect function, like this, in any MonadIO:
pipe <- connect myConfiguration
And to close the connection, run:
close pipe
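The magic, version and maxChunkSize fields of the configuration hint at what happens on the wire between connecting and closing. In Bolt, the client opens with a handshake (the magic preamble followed by four version proposals), and every message is then sent as length-prefixed chunks terminated by a zero-length chunk. The sketch below builds these byte sequences with plain lists; the function names are hypothetical and this is not how Hasbolt itself is written.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word16, Word32, Word8)

-- Big-endian encodings of 32-bit and 16-bit words.
word32BE :: Word32 -> [Word8]
word32BE w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

word16BE :: Word16 -> [Word8]
word16BE w = [fromIntegral (w `shiftR` 8), fromIntegral w]

-- Handshake request: magic preamble, then four version proposals,
-- where unused slots are zero.
handshake :: Word32 -> Word32 -> [Word8]
handshake magic version = concatMap word32BE [magic, version, 0, 0, 0]

-- Split a message body into chunks of at most maxChunk bytes, each
-- prefixed with its 16-bit length, and finish with a zero-length chunk.
chunked :: Word16 -> [Word8] -> [Word8]
chunked maxChunk = go
  where
    size = max 1 (fromIntegral maxChunk) :: Int  -- guard against a zero size
    go [] = [0, 0]
    go bs =
      let (chunk, rest) = splitAt size bs
      in word16BE (fromIntegral (length chunk)) ++ chunk ++ go rest
```

A three-byte body with a maximum chunk size of two therefore goes out as a two-byte chunk, a one-byte chunk and the end marker.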
The other important type is the BoltActionT monad transformer. It lets you chain queries using one connection. So, every query function returns a computation inside this monad:

-- |Runs Cypher query and ignores response
query_ :: MonadIO m => Text -> BoltActionT m ()

-- |Runs Cypher query and returns list of obtained Records
query :: MonadIO m => Text -> BoltActionT m [Record]

-- |Runs Cypher query with parameters and returns list of obtained Records
queryP :: MonadIO m => Text -> Map Text Value -> BoltActionT m [Record]

To run this transformer, use the run function:
run :: MonadIO m => Pipe -> BoltActionT m a -> m a
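One way to picture such a transformer is as a reader over the connection: each query asks the environment for the pipe, so chained queries automatically share it. The sketch below models this with a fake pipe that only records the queries it was given. Treating BoltActionT as `ReaderT Pipe` is an assumption made for illustration, and `connectFake` and `sentQueries` are hypothetical helpers.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical stand-in for a connection: it only records what was sent.
newtype Pipe = Pipe (IORef [String])

-- Assumed shape of the transformer: a reader over the connection.
type BoltActionT = ReaderT Pipe

connectFake :: IO Pipe
connectFake = Pipe <$> newIORef []

-- Every query obtains the shared connection from the environment.
query_ :: MonadIO m => String -> BoltActionT m ()
query_ cypher = do
  Pipe ref <- ask
  liftIO (modifyIORef ref (++ [cypher]))

-- Running the transformer just supplies the connection.
run :: Pipe -> BoltActionT m a -> m a
run pipe action = runReaderT action pipe

-- Inspect what the fake connection has seen so far.
sentQueries :: Pipe -> IO [String]
sentQueries (Pipe ref) = readIORef ref
```

Calling `run pipe (query_ a >> query_ b)` sends both queries over the same pipe, in order, without passing the pipe to each call by hand.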
If you are interested in the response, you can receive a Record. You can think of Records as maps from strings to any data, with the possibility of extracting any strongly typed value via the RecordValue type-class:

class RecordValue a where
  exact :: Monad m => Value -> m a

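A small self-contained model of this idea: a record is a map from field names to values, `at` looks a field up, and `exact` narrows the untyped value to the type the caller asks for. The `Value` type below is a reduced stand-in, `sample` is made-up data, and `MonadFail` replaces `Monad` so that the sketch compiles with current GHC versions.

```haskell
import qualified Data.Map as M

-- Reduced, hypothetical stand-in for the driver's Value type.
data Value = I Int | T String | B Bool
  deriving (Show, Eq)

type Record = M.Map String Value

-- Look up a field by name, failing in the caller's monad if it is missing.
at :: MonadFail m => Record -> String -> m Value
at record key = maybe (fail ("no such key: " ++ key)) pure (M.lookup key record)

class RecordValue a where
  exact :: MonadFail m => Value -> m a

instance RecordValue Int where
  exact (I n) = pure n
  exact other = fail ("not an integer: " ++ show other)

instance RecordValue Bool where
  exact (B b) = pure b
  exact other = fail ("not a boolean: " ++ show other)

-- Made-up record for illustration.
sample :: Record
sample = M.fromList [("age", I 56), ("name", T "Tom")]
```

The type annotation at the use site selects the instance, so `sample `at` "age" >>= exact :: Maybe Int` succeeds, while asking for an `Int` under `"name"` fails.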
The implementation is provided for all Value types, nodes, relationships and paths.

With these building blocks, we can now write the full code of our original example:
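
{-# LANGUAGE OverloadedStrings #-}  -- string literals are used as Text

import Database.Bolt
import Data.Default (def)

myConfiguration :: BoltCfg
myConfiguration = def { user = "neo4j", password = "neo4j", host = "example.com" }

main :: IO ()
main = do
  -- open a connection, run one query over it and decode the first result
  pipe <- connect myConfiguration
  records <- run pipe $ query "MATCH (n:Person) WHERE n.name CONTAINS \"Tom\" RETURN n"
  let first = head records  -- fails if the query returns no records
  cruise <- first `at` "n" >>= exact :: IO Node
  print cruise
  close pipe
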
GitHub repository: https://github.com/zmactep/hasbolt
Docs: https://hackage.haskell.org/package/hasbolt
Example Application
The example movie application that is also used for many of the other Neo4j drivers is a single page web app that just uses jQuery to talk to three different endpoints of a backend implemented in the programming language and stack of a given driver.
The three endpoints provide movie search, single movie and cast listing and graph visualization of the whole example movie database. The front end just consumes the responses from these three endpoints and renders the results in place.
The HTTP backend uses the lightweight Scotty web framework. To store the internal state with the connection pool, we can use a ReaderT monad transformer over the resource-pool. Both packages will be installed from Stackage with the stack build command.

To deploy on Heroku, just follow these steps:
export app=neo4j-movies-haskell-`whoami`
heroku apps:create $app
Add the Neo4j addon and make it available from the application:
heroku addons:add graphenedb:chalk --app $app
Set the Haskell Stack buildpack:
heroku buildpacks:set https://github.com/mfine/heroku-buildpack-stack
Deploy a Heroku app:
git push heroku master
Open the application:
heroku open --app $app
Open the application on GrapheneDB:
heroku addons:open graphenedb
In the GrapheneDB user interface, use “Launch Neo4j Admin UI”. In the Neo4j Browser, import the :play movies dataset.

You can find the Hasbolt example application with a very detailed README here: https://github.com/neo4j-examples/neo4j-movies-haskell-bolt