Introduction
For the past few years, Neosemantics has been the go-to solution for importing RDF data into Neo4j and exporting it again, for model mapping, for graph validation based on SHACL, and for inferencing. For Neo4j users, Neosemantics was the only way to ingest any type of RDF-formatted data and transform it into a labeled property graph in Neo4j. Unfortunately, a big limitation of Neosemantics is that it is not available on cloud-based Neo4j deployments.
Enter rdflib-neo4j, an open-source Python library that removes this limitation. It is a client-side evolution of Neosemantics that supports data ingestion not only in on-premise scenarios but also in Neo4j Aura. In this blog post, we will explore the reason for developing this library, what functionality is in place, how to use it, and how you can contribute to this open-source project.
Neosemantics vs. Rdflib-Neo4j
There is one important thing to highlight: rdflib-neo4j does not currently have the same capabilities as Neosemantics. The library is in early access and, for now, covers only the import functionality of Neosemantics. See the table below for a full comparison:
Getting Started With Rdflib-Neo4j
Incorporating the powerful capabilities of rdflib-neo4j into your workflow is a straightforward process. This section outlines the steps you need to follow for both your Neo4j database and your Python code.
Configuring Your Neo4j Graph Database
Configuring your Neo4j Graph Database to seamlessly integrate with rdflib-neo4j is a breeze. To get started:
Step 1: Initialize the database by creating a uniqueness constraint on Resources’ URIs. Execute the following Cypher fragment within your Neo4j environment:
CREATE CONSTRAINT n10s_unique_uri FOR (r:Resource) REQUIRE r.uri IS UNIQUE;
This constraint ensures the uniqueness of URIs for Resource nodes, simplifying the integration process significantly.
Alternative Approach: You can opt for a more streamlined setup. When attempting to open the store in your Python code, simply set create=True. The library will create the constraint for you, further simplifying the configuration process.
Step 2: Install rdflib-neo4j using Python’s package management tool, pip. Open your terminal or command prompt and execute the following command:
$ pip install rdflib-neo4j
And that is it! You are good to go.
Example Of How To Use the Library
To use the library, you need to import a few things at the beginning of your Python script:
- Neo4jStoreConfig: This object is used to configure the Neo4j store to connect to your Neo4j instance and to manage the parsing of a Triple Store.
- Neo4jStore: This class is an implementation of the rdflib Store class that uses Neo4j as a backend. This makes it possible to persist your RDF data directly in Neo4j while using the power of rdflib to process your data.
- Graph: RDFLib’s main data object is a Graph, which is a Python collection of RDF Subject, Predicate, Object Triples.
- Configuration strategies: You can choose between different strategies (which include custom prefixes, custom mappings, and multi-value arrays) depending on how you would like to convert your triples to nodes, relationships, and properties. For instance, you can keep the full URIs, or you can shorten the URIs using prefixes for property names, relationship names, and labels. In this simple example, we choose the ignore option, where the namespace part of each URI is ignored and only the local name is kept.
from rdflib_neo4j import Neo4jStoreConfig, Neo4jStore, HANDLE_VOCAB_URI_STRATEGY
from rdflib import Graph
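To make the ignore option more concrete, here is a minimal pure-Python sketch of what "keeping only the local name" means for a URI. It is an illustration only, not the library's implementation, and the helper name local_name is hypothetical.

```python
def local_name(uri: str) -> str:
    """Return the part of a URI after the last '#' or '/'."""
    # rfind returns -1 when the separator is absent, so max() picks
    # whichever separator occurs last in the string.
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


# With an "ignore"-style strategy, a class URI would become a short label
# and a predicate URI a short relationship or property name.
label = local_name("http://www.w3.org/2004/02/skos/core#Concept")
rel_type = local_name("http://www.w3.org/2004/02/skos/core#broader")
print(label, rel_type)  # prints: Concept broader
```

The trade-off is that two terms from different vocabularies with the same local name become indistinguishable, which is why strategies that keep or shorten the full URI also exist.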
Let’s explain what happens in the code below in some simple steps:
Step 1: Setting up Database Connection Credentials
The first step involves configuring the connection to your Neo4j Aura database. You’ll need to provide your database URI, database name, username, and password. These connection details are stored in a dictionary called auth_data, where the keys ‘uri’, ‘database’, ‘user’, and ‘pwd’ hold the respective values.
# Set the configuration to connect to your Aura DB
AURA_DB_URI = "your_db_uri"
AURA_DB_USERNAME = "neo4j"
AURA_DB_PWD = "your_db_pwd"
auth_data = {'uri': AURA_DB_URI,
             'database': "neo4j",
             'user': AURA_DB_USERNAME,
             'pwd': AURA_DB_PWD}
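If you would rather not hard-code credentials in your script, one option is to read them from environment variables. The sketch below builds a dictionary with the same four keys; the function name load_auth_data and the variable names NEO4J_URI, NEO4J_DATABASE, NEO4J_USERNAME, and NEO4J_PASSWORD are assumptions made for this example, not something the library requires.

```python
import os


def load_auth_data(env=os.environ):
    """Build an auth_data-style dictionary from environment variables."""
    return {
        "uri": env["NEO4J_URI"],                        # required
        "database": env.get("NEO4J_DATABASE", "neo4j"),  # optional, defaults to "neo4j"
        "user": env.get("NEO4J_USERNAME", "neo4j"),      # optional, defaults to "neo4j"
        "pwd": env["NEO4J_PASSWORD"],                    # required
    }


# Demonstration with an explicit mapping instead of the real environment.
example_env = {"NEO4J_URI": "your_db_uri", "NEO4J_PASSWORD": "your_db_pwd"}
auth_data_from_env = load_auth_data(example_env)
print(sorted(auth_data_from_env))  # prints: ['database', 'pwd', 'uri', 'user']
```

In a real script you would call load_auth_data() without arguments so that it reads from os.environ; a missing required variable then raises a KeyError early instead of failing at connection time.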
Step 2: Defining Custom Mapping and Store Configuration
Next, we define custom mappings and the store configuration. This step allows you to tailor how the RDF data will be handled during ingestion. It includes options for the vocabulary URI handling strategy and for batching (for optimized performance). The prefixes variable passed as custom_prefixes is not defined in this code snippet; it can be used to define custom namespace prefixes for your data.
# Define your custom mappings & store config
config = Neo4jStoreConfig(auth_data=auth_data,
                          custom_prefixes=prefixes,
                          handle_vocab_uri_strategy=HANDLE_VOCAB_URI_STRATEGY.IGNORE,
                          batching=True)
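As a rough illustration of what namespace prefixes are for, the following pure-Python sketch shortens a full URI to a prefixed name using a prefix-to-namespace mapping. It is a conceptual example only: the shorten helper is hypothetical, the mapping uses plain strings, and neither the exact type the library expects for custom_prefixes nor the exact output format it produces is shown in this post.

```python
def shorten(uri, prefixes):
    """Replace a known namespace at the start of a URI with its prefix."""
    # Try longer namespaces first so the most specific match wins.
    by_length = sorted(prefixes.items(), key=lambda item: len(item[1]), reverse=True)
    for prefix, namespace in by_length:
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    # No namespace matched: leave the URI untouched.
    return uri


example_prefixes = {"skos": "http://www.w3.org/2004/02/skos/core#"}
print(shorten("http://www.w3.org/2004/02/skos/core#Concept", example_prefixes))
# prints: skos:Concept
```

Compared with ignoring the namespace entirely, a prefixed name stays short while still recording which vocabulary a term came from.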
Step 3: Specifying RDF Data Source
In this example, we specify the source of the RDF data that we want to ingest. The file_path variable points to the location of an RDF file hosted on GitHub. You can replace this with the URL or file path to your own RDF data.
file_path = 'https://github.com/jbarrasa/gc-2022/raw/main/search/onto/concept-scheme-skos.ttl'
Step 4: Creating an RDF Graph, Parsing, and Ingesting Data
Now, we create an RDF graph called neo4j_aura using the Graph class from rdflib. We back it with a Neo4j store created with the Neo4jStore class from rdflib-neo4j, which receives the config object defined in Step 2 (containing the connection credentials in auth_data and the other settings).
We use the parse method to read and ingest the RDF data from the specified file (file_path) into the neo4j_aura graph. This step essentially imports the RDF data into your Neo4j Aura database.
# Create the RDF graph backed by the Neo4j store. If batching is set to True in the
# Neo4jStoreConfig, remember to close the store afterwards to prevent the loss of any
# uncommitted records.
neo4j_aura = Graph(store=Neo4jStore(config=config))
# Calling the parse method will implicitly open the store
neo4j_aura.parse(file_path, format="ttl")
Step 5: Closing the Store
Finally, we close the store to ensure that any uncommitted records are saved. This is especially important if batching is enabled in the store configuration (batching=True), as it prevents the loss of data that has not yet been committed.
neo4j_aura.close(True)
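To see why closing matters when batching is enabled, consider this small, self-contained sketch of a generic write buffer. It is not the library's code; the BatchBuffer class and its batch size are made up for illustration. Records are only written once a full batch has accumulated, so anything left over stays in memory until close is called.

```python
class BatchBuffer:
    """Collect records and write them out in fixed-size batches."""

    def __init__(self, write, batch_size=3):
        self.write = write          # callable that receives one batch (a list)
        self.batch_size = batch_size
        self.pending = []

    def add(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Write whatever is pending, if anything, then empty the buffer.
        if self.pending:
            self.write(list(self.pending))
            self.pending.clear()

    def close(self):
        # Closing flushes the final, possibly incomplete, batch.
        self.flush()


written = []
buffer = BatchBuffer(written.append, batch_size=3)
for record in range(5):
    buffer.add(record)

before_close = list(written)  # only the first full batch has been written
buffer.close()
print(before_close, written)  # prints: [[0, 1, 2]] [[0, 1, 2], [3, 4]]
```

Without the final close, records 3 and 4 in this sketch would never be written, which is the kind of data loss the step above guards against.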
Community and Contributions
rdflib-neo4j is an open-source project and part of the Neo4j Labs incubator. This means that you can use the project for free, and code contributions are very welcome! If you need additional support or help with extending the library and developing custom features, please reach out to us about a commercial agreement at ps_emea_pmo@neotechnology.com.
Conclusion
We shared more insights and updates about RDFLib at NODES 2023. In our talk, we dive deeper into the technical details of how we built the app and give you a crash course on getting started with the library.
Rdflib-Neo4j: A New Era in RDF Integration for Neo4j was originally published in Neo4j Developer Blog on Medium.