Decentralized finance (DeFi) is a movement working to replicate and replace traditional finance (TradFi) using blockchain and cryptocurrencies.
Today you can lend, borrow, swap, margin trade, and even create your own mini hedge fund on chain. It’s all permissionless and trustless.
Most DeFi takes place on Ethereum. One notable exception is a protocol called Sovryn, which brings financial primitives to Bitcoin using the Rootstock sidechain (RSK).
My goals were to
- Learn how the Sovryn protocol really works. Blockchain and DeFi are advancing quickly, and the pace of progress is faster than the advances in tooling. After reading the documentation, you have to read the source code of the smart contracts or explore the raw block-level data. Both are difficult and unintuitive, particularly for developers who may be new to the space.
- Get a robust dataset about Sovryn. If you want to understand the activity on the protocol, and the developers don’t happen to provide the answer on the official app, you have to dig into the block data. This is clunky and much of the data is not human-readable.
It’s also difficult to get a summary of the data. This transaction describes a swap between bitcoin (technically WRBTC) and the stablecoin Tether. What if you wanted to find all such transactions? Today you would have to download the whole chain and interact with the ABI.
If you would rather spend your time doing data science instead of blockchain development, this is a serious barrier to entry.
Graphing the Chain
To create a knowledge graph of blockchain data we need to define only a few different types of nodes: Block, Transaction, Address, Token, Contract, and LogEvent. Token and Contract are subtypes of Address. Strictly speaking, Token and Contract aren’t necessary, but they certainly are convenient for helping humans make sense of what’s going on.
Each Block CONTAINS zero or more Transactions. The Transactions are where much of the action is. Each Transaction is from one Address and may be to another one. If these addresses describe known Tokens or Contracts, then the information for those will be filled in.
Each Transaction has one or more LogEvents. Each of these events CALLS various Addresses (or Tokens, or Contracts). In creating this knowledge graph, a number of ABIs were parsed so that the information in each of the CALLS relationships could be decoded into readable fields.
The result is a simple schema that can capture all the richness of the blockchain. Very satisfying!
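To make the schema concrete, here is a minimal in-memory sketch of the node types and relationships described above, written as Python dataclasses. It is only an illustration of the shape of the graph, not how the repository stores data; the field names and the placeholder hash, sender address, and event name are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Address:
    address: str


@dataclass
class Token(Address):
    # Token is a subtype of Address, so it can appear anywhere an Address can.
    name: str = ""


@dataclass
class Contract(Address):
    # Contract is also a subtype of Address.
    name: str = ""


@dataclass
class LogEvent:
    name: str
    # CALLS: the addresses (or tokens, or contracts) this event touches.
    calls: List[Address] = field(default_factory=list)


@dataclass
class Transaction:
    tx_hash: str
    from_address: Address                                  # FROM
    to_address: Optional[Address] = None                   # TO (may be absent)
    events: List[LogEvent] = field(default_factory=list)   # HAS_EVENT


@dataclass
class Block:
    height: int
    transactions: List[Transaction] = field(default_factory=list)  # CONTAINS


# Placeholder values for illustration only; nothing here is real chain data
# except the WRBTC token address quoted later in the article.
wrbtc = Token(address="0x542fda317318ebf1d3deaf76e0b632741a7e677d", name="WRBTC")
sender = Address(address="0xplaceholder-sender")
event = LogEvent(name="PlaceholderEvent", calls=[wrbtc])
tx = Transaction(tx_hash="0xplaceholder-hash", from_address=sender,
                 to_address=wrbtc, events=[event])
block = Block(height=1, transactions=[tx])
```

Because Token and Contract inherit from Address, anything that accepts an Address accepts them too, which mirrors the subtype relationship in the graph.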
A Quick Tour of the Knowledge Graph
If you want to take the tour with me, check out the repository or the video tour.
One goal is to be able to load data from the protocol directly into Python.
To that end I put a wrapper around the Neo4j session to give it a little syntactic sugar. You can type any query directly into a knowledge_graph.Query object. First, let’s see a few blocks with available data.
from sovrynkg.knowledge_graph import Query
q = Query()
q.add("MATCH (b:Block) RETURN b.height as height ORDER BY height LIMIT 10")
q.data()
[{'height': 2742418},
{'height': 2742441},
{'height': 2742445},
{'height': 2742446},
{'height': 2742448},
{'height': 2742450},
{'height': 2742451},
{'height': 2742453},
{'height': 2742457},
{'height': 2742460}]
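If you are curious what such a wrapper might look like, here is a rough sketch. It is not the actual sovrynkg code: the class name QuerySketch, the injected runner function, and fake_runner are all assumptions for illustration. The runner is passed in so the example runs without a Neo4j server.

```python
from typing import Any, Callable, Dict, List

# A runner takes Cypher text and returns rows as dictionaries.
Runner = Callable[[str], List[Dict[str, Any]]]


class QuerySketch:
    """Hypothetical stand-in for sovrynkg's Query; not the library's code."""

    def __init__(self, runner: Runner):
        self._runner = runner
        self._lines: List[str] = []

    def add(self, cypher: str) -> "QuerySketch":
        # Accumulate query fragments so a long query can be built in pieces.
        self._lines.append(cypher.strip())
        return self

    @property
    def text(self) -> str:
        return "\n".join(self._lines)

    def data(self) -> List[Dict[str, Any]]:
        # Run the accumulated query and return every row.
        return self._runner(self.text)

    def only(self) -> Dict[str, Any]:
        # Return the single row, or fail loudly if there is not exactly one.
        rows = self.data()
        if len(rows) != 1:
            raise ValueError(f"expected exactly one row, got {len(rows)}")
        return rows[0]


def fake_runner(cypher: str) -> List[Dict[str, Any]]:
    # Stand-in for a database call so the sketch needs no server.
    return [{"height": 2742418}]


q = QuerySketch(fake_runner)
q.add("MATCH (b:Block) RETURN b.height AS height")
q.add("ORDER BY height LIMIT 1")
first = q.only()
```

With the official neo4j Python driver, the runner could plausibly be something like `lambda cypher: session.run(cypher).data()` for an open session, but that wiring is an assumption about your setup rather than a description of the repository.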
So 2742418 is where it all began. Let’s see the transaction at that block. A Cypher query that gets us that data is:
MATCH (b:Block)-[:CONTAINS]->(tx:Transaction)-[:HAS_EVENT]->
(le:LogEvent)-[:CALLS]-(addy:Address)
WHERE b.height=2742418
RETURN b, tx, le, addy
Deciphering the Cypher: this says to find a block with the given block height, and also find the Transaction, LogEvent, and any Address that is connected (remember, Token and Contract are also Addresses).
Inspecting the CALLS relationship shows that it has these properties:
"name": "OwnershipTransferred",
"newOwner": "0x7be508451cd748ba55dcbe75c8067f9420909b49",
"previousOwner": "0x0000000000000000000000000000000000000000"
The first transaction on the Sovryn protocol is the creation of the contract. On RSK, as on other EVM-compatible chains, a newly deployed ownable contract records its creation by “transferring” ownership from the null address.
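As a small illustration, a check like the following could flag such creation-time events once the CALLS properties are loaded into Python. The function name is_initial_ownership and the idea of passing the properties as a plain dictionary are assumptions for this sketch; the event values are the ones shown above.

```python
# The null address: "0x" followed by 40 zeros.
NULL_ADDRESS = "0x" + "0" * 40


def is_initial_ownership(event: dict) -> bool:
    """True when ownership is 'transferred' from the null address."""
    return (
        event.get("name") == "OwnershipTransferred"
        and event.get("previousOwner", "").lower() == NULL_ADDRESS
    )


# Properties of the CALLS relationship shown above.
event = {
    "name": "OwnershipTransferred",
    "newOwner": "0x7be508451cd748ba55dcbe75c8067f9420909b49",
    "previousOwner": "0x0000000000000000000000000000000000000000",
}

created = is_initial_ownership(event)
```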
So What? I’m Here for the Money
Let’s chase the money. First, find a high-value transaction:
# Fetch the single transaction with the largest value field
q = Query()
q.add("MATCH (tx:Transaction) RETURN tx ORDER BY tx.value DESC LIMIT 1")
result = q.only()
result
{'tx': {'gas_price': 60000000,
'gas_offered': 172201,
'gas_spent': 172201,
'gas_quote': 0,
'gas_quote_rate': 4083,
'tx_offset': 4,
'value_quote': 7350,
'tx_hash': '0xcaefac99f076cd6e9e02a2b1309056eebab634f7cdf0ff28b7050dbc37c9110d',
'value': 1800000000000000000,
'successful': True}}
This transaction involved 1.8 wrapped BTC (about $55k USD). The value field is an integer expressed to 18 decimal places, so 1800000000000000000 means 1.8.
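The conversion from the raw integer to whole coins is just a division by 10 to the power of the number of decimals. Here is a minimal sketch using Decimal to avoid floating-point rounding; the helper name to_whole_units is made up for this example.

```python
from decimal import Decimal

WRBTC_DECIMALS = 18  # the number of decimals the article gives for WRBTC


def to_whole_units(raw_value: int, decimals: int = WRBTC_DECIMALS) -> Decimal:
    """Convert a raw on-chain integer amount into whole token units."""
    return Decimal(raw_value) / (Decimal(10) ** decimals)


# The value field of the transaction above.
amount = to_whole_units(1800000000000000000)
print(amount)  # 1.8
```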
Let’s get more details. We use the following (slightly verbose) query to pull out everything having to do with that single transaction.
It’s very similar to the above query, except this time we’re getting the Address that the bitcoin was sent TO and FROM, in addition to all the other information.
MATCH (b:Block)-[:CONTAINS]->(tx:Transaction)
WHERE tx.tx_hash="0xbef02237efff3788082b28d74e34c7c245e1e8ea6a5b1da4d40967ddd08fd5a8"
MATCH (frm:Address)<-[:FROM]-(tx)-[:TO*0..1]-(to:Address)
MATCH (tx)-[:HAS_EVENT]->(le:LogEvent)-[:CALLS]-(addy:Address)
RETURN tx, le, addy, frm, to
Looks like this transaction was a loan. Whoever owns the from address 0x5d0eeaeabd5123e3d557c8a552134f24c6271a74 borrowed 1.8 WRBTC.
This address doesn’t seem to match any Contract or Token documented as part of the Sovryn protocol, so it’s probably just some person out there on the chain.
Larger Scale Analysis
These colorful circles are all well and good, but what if you want to analyze meaningful amounts of data?
We can use the knowledge graph to do larger scale analysis as well. Let’s look at a swap — exchanging one type of token for an equal monetary value of another.
We’ll limit the number of results for this example, but you could just remove the limit and skip keyword arguments and get all the data.
import plotly.express as px
from sovrynkg.swaps import get_swap_df
df = get_swap_df(skip=1000, limit=1000)
df.head()
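If pulling everything in one call is too heavy, the same skip and limit idea can be used to page through the results in chunks. The sketch below shows the pattern with a made-up in-memory data source; iter_pages, fake_fetch, and ROWS are hypothetical names, and a real version would call something like get_swap_df inside the fetch function.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch: Callable[[int, int], List[dict]],
               page_size: int = 1000) -> Iterator[List[dict]]:
    """Yield pages of rows by advancing skip until the source runs dry."""
    skip = 0
    while True:
        page = fetch(skip, page_size)
        if not page:
            return
        yield page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            return
        skip += page_size


# Hypothetical in-memory rows standing in for the swap data.
ROWS = [{"id": i} for i in range(2500)]


def fake_fetch(skip: int, limit: int) -> List[dict]:
    return ROWS[skip:skip + limit]


pages = list(iter_pages(fake_fetch, page_size=1000))
sizes = [len(p) for p in pages]
```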
Great, we have the data. Now let’s try to make sense of it. If we want to get more information about the addresses, we can use a built-in tool.
import sovrynkg.contracts as contracts
wrbtc = contracts.BY_NAME['WRBTC']
wrbtc, wrbtc.address
(<Token WRBTC:0x542…677d>, '0x542fda317318ebf1d3deaf76e0b632741a7e677d')
You can slice and dice your dataframe in powerful ways. Let’s look at the history of the WRBTC/USDT swaps here.
# Keep only swaps from USDT into WRBTC; copy so the new column is added safely
mask = (df.to_token == 'WRBTC') & (df.from_token == 'USDT')
bt_pair = df[mask].copy()
# Both WRBTC and USDT have 18 decimals, so the raw amounts divide directly
bt_pair['exchange_rate'] = bt_pair.from_amount / bt_pair.to_amount
fig = px.line(bt_pair, x='signed_at', y='exchange_rate',
              title='WRBTC vs USDT swap on Sovryn')
fig.show()
Knowledge Graphs to Answer Any Question
The amazing thing about a knowledge graph is that for just about any question you can dream up, the answer is embedded in the data somehow.
You just have to be clever enough to craft a query to find it. It’s this richness of exploration that makes knowledge graphing with Neo4j such a good tool for exploring blockchains.
- How does the protocol work?
- Who are the biggest users?
- Are there any leading indicators of price movements between one cryptocurrency pair or another?
- Given an outside dataset of the Sovryn team’s marketing efforts, is there an effect on trading volume on the protocol?
If you’ve ever had the experience of setting up an SQL database to answer one question, only to be immediately asked an entirely different question, you’ll be able to sympathize.
To Ethereum, and Beyond
Because the RSK sidechain of Bitcoin is compatible with the Ethereum virtual machine, we could unleash this same code onto Ethereum and map out that entire chain as well.
A continuously updated knowledge graph plus a convenient SDK would make a very useful package.
If anyone out there is interested in seeing that, I’d invite you to get in touch.
Again, the GitHub repository is here if you want to try it yourself.
Knowledge Graph DeFi with Neo4j was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.