API Documentation
Components
KGWriter
- class neo4j_graphrag.experimental.components.kg_writer.KGWriter
Abstract class used to write a knowledge graph to a data store.
- abstract run(graph)
Writes the graph to a data store.
- Parameters:
graph (Neo4jGraph) – The knowledge graph to write to the data store.
- Return type:
Neo4jWriter
- class neo4j_graphrag.experimental.components.kg_writer.Neo4jWriter(driver, neo4j_database=None, max_concurrency=5)
Writes a knowledge graph to a Neo4j database.
- Parameters:
Example:
from neo4j import AsyncGraphDatabase
from neo4j_graphrag.experimental.components.kg_writer import Neo4jWriter
from neo4j_graphrag.experimental.pipeline import Pipeline

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"

driver = AsyncGraphDatabase.driver(URI, auth=AUTH, database=DATABASE)
writer = Neo4jWriter(driver=driver, neo4j_database=DATABASE)

pipeline = Pipeline()
pipeline.add_component(writer, "writer")
- run(graph)
Upserts a knowledge graph into a Neo4j database.
- Parameters:
graph (Neo4jGraph) – The knowledge graph to upsert into the database.
- Return type:
TextSplitter
LangChainTextSplitterAdapter
- class neo4j_graphrag.experimental.components.text_splitters.langchain.LangChainTextSplitterAdapter(text_splitter)
Adapter for LangChain TextSplitters. Allows instances of this class to be used in the knowledge graph builder pipeline.
- Parameters:
text_splitter (LangChainTextSplitter) – An instance of LangChain’s TextSplitter class.
Example:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from neo4j_graphrag.experimental.components.text_splitters.langchain import LangChainTextSplitterAdapter
from neo4j_graphrag.experimental.pipeline import Pipeline

pipeline = Pipeline()
text_splitter = LangChainTextSplitterAdapter(RecursiveCharacterTextSplitter())
pipeline.add_component(text_splitter, "text_splitter")
LlamaIndexTextSplitterAdapter
- class neo4j_graphrag.experimental.components.text_splitters.llamaindex.LlamaIndexTextSplitterAdapter(text_splitter)
Adapter for LlamaIndex TextSplitters. Allows instances of this class to be used in the knowledge graph builder pipeline.
- Parameters:
text_splitter (LlamaIndexTextSplitter) – An instance of LlamaIndex’s TextSplitter class.
Example:
from llama_index.core.node_parser.text.sentence import SentenceSplitter
from neo4j_graphrag.experimental.components.text_splitters.llamaindex import LlamaIndexTextSplitterAdapter
from neo4j_graphrag.experimental.pipeline import Pipeline

pipeline = Pipeline()
text_splitter = LlamaIndexTextSplitterAdapter(SentenceSplitter())
pipeline.add_component(text_splitter, "text_splitter")
TextChunkEmbedder
- class neo4j_graphrag.experimental.components.embedder.TextChunkEmbedder(embedder)
Component for creating embeddings from text chunks.
- Parameters:
embedder (Embedder) – The embedder to use to create the embeddings.
Example:
from neo4j_graphrag.experimental.components.embedder import TextChunkEmbedder
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline import Pipeline

embedder = OpenAIEmbeddings(model="text-embedding-3-large")
chunk_embedder = TextChunkEmbedder(embedder)

pipeline = Pipeline()
pipeline.add_component(chunk_embedder, "chunk_embedder")
- run(text_chunks)
Embed a list of text chunks.
- Parameters:
text_chunks (TextChunks) – The text chunks to embed.
- Returns:
The input text chunks with each one having an added embedding.
- Return type:
SchemaBuilder
- class neo4j_graphrag.experimental.components.schema.SchemaBuilder
A builder class for constructing SchemaConfig objects from given entities, relations, and their interrelationships defined in a potential schema.
Example:
from neo4j_graphrag.experimental.components.schema import (
    SchemaBuilder,
    SchemaEntity,
    SchemaProperty,
    SchemaRelation,
)
from neo4j_graphrag.experimental.pipeline import Pipeline

entities = [
    SchemaEntity(
        label="PERSON",
        description="An individual human being.",
        properties=[
            SchemaProperty(
                name="name", type="STRING", description="The name of the person"
            )
        ],
    ),
    SchemaEntity(
        label="ORGANIZATION",
        description="A structured group of people with a common purpose.",
        properties=[
            SchemaProperty(
                name="name", type="STRING", description="The name of the organization"
            )
        ],
    ),
]
relations = [
    SchemaRelation(
        label="EMPLOYED_BY", description="Indicates employment relationship."
    ),
]
potential_schema = [
    ("PERSON", "EMPLOYED_BY", "ORGANIZATION"),
]

pipe = Pipeline()
schema_builder = SchemaBuilder()
pipe.add_component(schema_builder, "schema_builder")
pipe_inputs = {
    "schema": {
        "entities": entities,
        "relations": relations,
        "potential_schema": potential_schema,
    },
    ...
}
pipe.run(pipe_inputs)
- run(entities, relations, potential_schema)
Asynchronously constructs and returns a SchemaConfig object.
- Parameters:
entities (List[SchemaEntity]) – List of Entity objects.
relations (List[SchemaRelation]) – List of Relation objects.
potential_schema (List[Tuple[str, str, str]]) – List of (source entity label, relation label, target entity label) triples, as in the example above.
- Returns:
A configured schema object, constructed asynchronously.
- Return type:
SchemaConfig
EntityRelationExtractor
- class neo4j_graphrag.experimental.components.entity_relation_extractor.EntityRelationExtractor(*args, on_error=OnError.IGNORE, create_lexical_graph=True, **kwargs)
Abstract class for entity relation extraction components.
- Parameters:
on_error (OnError) – What to do when an error occurs during extraction. Defaults to OnError.IGNORE, as shown in the class signature.
create_lexical_graph (bool) – Whether to include the text chunks in the graph in addition to the extracted entities and relations. Defaults to True.
args (Any)
kwargs (Any)
- abstract async run(chunks, document_info=None, **kwargs)
- Parameters:
chunks (TextChunks)
document_info (DocumentInfo | None)
kwargs (Any)
- Return type:
- update_ids(graph, chunk_index, run_id)
Make node IDs unique across chunks and pipeline runs by prefixing them with a custom prefix (set in the run method) and chunk index.
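The prefixing idea can be sketched with plain dataclasses standing in for the library's graph types. The names `Node`, `Relationship`, `Graph` and `prefix_ids`, and the `run_id:chunk_index:id` format, are assumptions made for illustration and are not taken from the library's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:  # hypothetical stand-in for a graph node
    id: str
    label: str


@dataclass
class Relationship:  # hypothetical stand-in for a graph relationship
    start_node_id: str
    end_node_id: str
    type: str


@dataclass
class Graph:  # hypothetical stand-in for Neo4jGraph
    nodes: List[Node] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


def prefix_ids(graph: Graph, chunk_index: int, run_id: str) -> Graph:
    """Prefix every node ID, and every reference to it, in place."""
    prefix = f"{run_id}:{chunk_index}"
    for node in graph.nodes:
        node.id = f"{prefix}:{node.id}"
    for rel in graph.relationships:
        # Relationships must point at the rewritten IDs, not the old ones.
        rel.start_node_id = f"{prefix}:{rel.start_node_id}"
        rel.end_node_id = f"{prefix}:{rel.end_node_id}"
    return graph


graph = Graph(
    nodes=[Node("0", "Person"), Node("1", "Person")],
    relationships=[Relationship("0", "1", "KNOWS")],
)
prefix_ids(graph, chunk_index=3, run_id="run42")
print([n.id for n in graph.nodes])  # ['run42:3:0', 'run42:3:1']
```

Because an LLM typically restarts its ID numbering for each chunk, two chunks can both produce a node "0"; the prefix keeps them distinct once the per-chunk graphs are merged.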
- Parameters:
graph (Neo4jGraph)
chunk_index (int)
run_id (str)
- Return type:
LLMEntityRelationExtractor
- class neo4j_graphrag.experimental.components.entity_relation_extractor.LLMEntityRelationExtractor(llm, prompt_template=<neo4j_graphrag.generation.prompts.ERExtractionTemplate object>, create_lexical_graph=True, on_error=OnError.RAISE, max_concurrency=5)
Extracts a knowledge graph from a series of text chunks using a large language model.
- Parameters:
llm (LLMInterface) – The language model to use for extraction.
prompt_template (ERExtractionTemplate | str) – A custom prompt template to use for extraction.
create_lexical_graph (bool) – Whether to include the text chunks in the graph in addition to the extracted entities and relations. Defaults to True.
on_error (OnError) – What to do when an error occurs during extraction. Defaults to raising an error.
max_concurrency (int) – The maximum number of concurrent tasks which can be used to make requests to the LLM.
Example:
from neo4j_graphrag.experimental.components.entity_relation_extractor import LLMEntityRelationExtractor
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.experimental.pipeline import Pipeline

# OpenAI's JSON mode is requested with {"type": "json_object"}
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0, "response_format": {"type": "json_object"}})
extractor = LLMEntityRelationExtractor(llm=llm)

pipe = Pipeline()
pipe.add_component(extractor, "extractor")
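The effect of max_concurrency can be illustrated with a semaphore that caps the number of requests in flight. This is a sketch using only the standard library; `extract_all`, `extract_one` and the sleep standing in for an LLM call are hypothetical, and the library's own scheduling may differ.

```python
import asyncio
from typing import List, Tuple


async def extract_all(
    chunks: List[str], max_concurrency: int = 5
) -> Tuple[List[str], int]:
    """Process all chunks, never running more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0  # tasks currently holding the semaphore
    peak = 0       # highest value in_flight ever reached

    async def extract_one(chunk: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for an LLM request
            in_flight -= 1
            return chunk.upper()

    # gather returns results in the same order as the input chunks.
    results = await asyncio.gather(*(extract_one(c) for c in chunks))
    return list(results), peak


results, peak = asyncio.run(
    extract_all([f"chunk {i}" for i in range(12)], max_concurrency=3)
)
print(len(results), "chunks processed; peak concurrency:", peak)
```

A lower limit reduces pressure on provider rate limits at the cost of total run time.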
- run(chunks, document_info=None, schema=None, examples='', **kwargs)
Perform entity and relation extraction for all chunks in a list.
- Parameters:
chunks (TextChunks)
document_info (DocumentInfo | None)
schema (SchemaConfig | None)
examples (str)
kwargs (Any)
- Return type:
Retrievers
RetrieverInterface
- class neo4j_graphrag.retrievers.base.Retriever(driver, neo4j_database=None)
Abstract class for Neo4j retrievers
- Parameters:
driver (neo4j.Driver)
neo4j_database (Optional[str])
- VERIFY_NEO4J_VERSION = True
- search(*args, **kwargs)
Search method. Calls get_search_results, which returns a list of neo4j.Record objects, then formats each record with the function returned by get_result_formatter and returns a RetrieverResult.
- Parameters:
- Return type:
RetrieverResult
- abstract get_search_results(*args, **kwargs)
This method must be implemented in each child class. It will receive the same parameters provided to the public interface via the search method, after validation. It returns a RawSearchResult object which comprises a list of neo4j.Record objects and an optional metadata dictionary that can contain retriever-level information.
Note that, even though this method is not intended to be called from outside the class, we make it public to make it clearer for the developers that it should be implemented in child classes.
- Returns:
List of Neo4j Records and optional metadata dict
- Return type:
RawSearchResult
- Parameters:
- get_result_formatter()
Returns the function to use to transform a neo4j.Record to a RetrieverResultItem.
- Return type:
Callable[[Record], RetrieverResultItem]
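The division of labour between search, get_search_results and get_result_formatter can be sketched as a template-method pattern. All classes below (`ResultItem`, `RawResult`, `Result`, `SketchRetriever`, `TitleRetriever`) are hypothetical stand-ins written for this illustration, with plain dicts in place of neo4j.Record; they are not the library's classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ResultItem:  # stand-in for RetrieverResultItem
    content: str
    metadata: Optional[Dict[str, Any]] = None


@dataclass
class RawResult:  # stand-in for RawSearchResult
    records: List[Dict[str, Any]]
    metadata: Optional[Dict[str, Any]] = None


@dataclass
class Result:  # stand-in for RetrieverResult
    items: List[ResultItem]
    metadata: Optional[Dict[str, Any]] = None


class SketchRetriever(ABC):
    def search(self, *args: Any, **kwargs: Any) -> Result:
        # Public entry point: fetch raw records, then format each one.
        raw = self.get_search_results(*args, **kwargs)
        formatter = self.get_result_formatter()
        return Result(
            items=[formatter(record) for record in raw.records],
            metadata=raw.metadata,
        )

    @abstractmethod
    def get_search_results(self, *args: Any, **kwargs: Any) -> RawResult:
        """Each child class supplies its own retrieval logic here."""

    def get_result_formatter(self) -> Callable[[Dict[str, Any]], ResultItem]:
        # Default formatter: stringify the whole record.
        return lambda record: ResultItem(content=str(record))


class TitleRetriever(SketchRetriever):
    def __init__(self, titles: List[str]) -> None:
        self.titles = titles

    def get_search_results(self, query_text: str, top_k: int = 5) -> RawResult:
        hits = [t for t in self.titles if query_text.lower() in t.lower()]
        return RawResult(
            records=[{"title": t} for t in hits[:top_k]],
            metadata={"query": query_text},
        )

    def get_result_formatter(self) -> Callable[[Dict[str, Any]], ResultItem]:
        return lambda record: ResultItem(content=record["title"])


result = TitleRetriever(["Dune", "Dune Messiah", "Emma"]).search(
    query_text="dune", top_k=5
)
print([item.content for item in result.items])  # ['Dune', 'Dune Messiah']
```

Callers only ever use search; a subclass changes behaviour by overriding the two hooks.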
VectorRetriever
- class neo4j_graphrag.retrievers.VectorRetriever(driver, index_name, embedder=None, return_properties=None, result_formatter=None, neo4j_database=None)
Provides retrieval method using vector search over embeddings. If an embedder is provided, it needs to have the required Embedder type.
Example:
import neo4j
from neo4j_graphrag.retrievers import VectorRetriever

# URI, AUTH and custom_embedder are assumed to be defined elsewhere
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
retriever = VectorRetriever(driver, "vector-index-name", custom_embedder)
retriever.search(query_text="Find me a book about Fremen", top_k=5)
- Parameters:
driver (neo4j.Driver) – The Neo4j Python driver.
index_name (str) – Vector index name.
embedder (Optional[Embedder]) – Embedder object to embed query text.
return_properties (Optional[list[str]]) – List of node properties to return.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) –
Provided custom function to transform a neo4j.Record to a RetrieverResultItem.
Two variables are provided in the neo4j.Record:
node: Represents the node retrieved from the vector index search.
score: Denotes the similarity score.
neo4j_database (Optional[str]) – The name of the Neo4j database. If not provided, the server's default database is used (“neo4j” unless configured otherwise; see the Neo4j documentation).
- Raises:
RetrieverInitializationError – If validation of the input arguments fails.
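A custom result_formatter reads the node and score variables from each record. The sketch below uses a plain dict in place of neo4j.Record (which also supports lookup by key) and a hypothetical `ResultItem` dataclass in place of RetrieverResultItem; the `title` property is an assumption about the stored nodes.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ResultItem:  # hypothetical stand-in for RetrieverResultItem
    content: str
    metadata: Optional[Dict[str, Any]] = None


def title_formatter(record: Dict[str, Any]) -> ResultItem:
    """Keep only the node's title as content and carry the score as metadata."""
    node = record["node"]
    return ResultItem(
        content=node["title"],
        metadata={"score": record["score"]},
    )


# A dict shaped like one search hit: the matched node plus its similarity score.
record = {"node": {"title": "Dune", "year": 1965}, "score": 0.92}
item = title_formatter(record)
print(item.content, item.metadata)  # Dune {'score': 0.92}
```

Such a function would be passed as the result_formatter argument when the retriever is constructed.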
- search(query_vector=None, query_text=None, top_k=5, filters=None)
Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. For more details, see the Neo4j documentation on querying a vector index and on db.index.vector.queryNodes().
To query by text, an embedder must be provided when the class is instantiated. The embedder is not required if query_vector is passed.
- Parameters:
query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.
query_text (Optional[str]) – The text to get the closest neighbors of. Defaults to None.
top_k (int) – The number of neighbors to return. Defaults to 5.
filters (Optional[dict[str, Any]]) – Filters for metadata pre-filtering. Defaults to None.
- Raises:
SearchValidationError – If validation of the input arguments fails.
EmbeddingRequiredError – If no embedder is provided.
- Returns:
The results of the search query as a list of neo4j.Record and an optional metadata dict
- Return type:
VectorCypherRetriever
- class neo4j_graphrag.retrievers.VectorCypherRetriever(driver, index_name, retrieval_query, embedder=None, result_formatter=None, neo4j_database=None)
Provides retrieval method using vector similarity augmented by a Cypher query. This retriever builds on VectorRetriever. If an embedder is provided, it needs to have the required Embedder type.
Note: node is a variable from the base query that can be used in retrieval_query as seen in the example below.
Example:
import neo4j
from neo4j_graphrag.retrievers import VectorCypherRetriever

# URI, AUTH and custom_embedder are assumed to be defined elsewhere
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
# A space separates the MATCH clause from RETURN
retrieval_query = "MATCH (node)-[:AUTHORED_BY]->(author:Author) RETURN author.name"
retriever = VectorCypherRetriever(
    driver, "vector-index-name", retrieval_query, custom_embedder
)
retriever.search(query_text="Find me a book about Fremen", top_k=5)
- Parameters:
driver (neo4j.Driver) – The Neo4j Python driver.
index_name (str) – Vector index name.
retrieval_query (str) – Cypher query that gets appended.
embedder (Optional[Embedder]) – Embedder object to embed query text.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Provided custom function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, the server's default database is used (“neo4j” unless configured otherwise; see the Neo4j documentation).
- search(query_vector=None, query_text=None, top_k=5, query_params=None, filters=None)
Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. For more details, see the Neo4j documentation on querying a vector index and on db.index.vector.queryNodes().
To query by text, an embedder must be provided when the class is instantiated. The embedder is not required if query_vector is passed.
- Parameters:
query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.
query_text (Optional[str]) – The text to get the closest neighbors of. Defaults to None.
top_k (int) – The number of neighbors to return. Defaults to 5.
query_params (Optional[dict[str, Any]]) – Parameters for the Cypher query. Defaults to None.
filters (Optional[dict[str, Any]]) – Filters for metadata pre-filtering. Defaults to None.
- Raises:
SearchValidationError – If validation of the input arguments fails.
EmbeddingRequiredError – If no embedder is provided.
- Returns:
The results of the search query as a list of neo4j.Record and an optional metadata dict
- Return type:
HybridRetriever
- class neo4j_graphrag.retrievers.HybridRetriever(driver, vector_index_name, fulltext_index_name, embedder=None, return_properties=None, result_formatter=None, neo4j_database=None)
Provides retrieval method using combination of vector search over embeddings and fulltext search. If an embedder is provided, it needs to have the required Embedder type.
Example:
import neo4j
from neo4j_graphrag.retrievers import HybridRetriever

# URI, AUTH and custom_embedder are assumed to be defined elsewhere
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
retriever = HybridRetriever(
    driver, "vector-index-name", "fulltext-index-name", custom_embedder
)
retriever.search(query_text="Find me a book about Fremen", top_k=5)
- Parameters:
driver (neo4j.Driver) – The Neo4j Python driver.
vector_index_name (str) – Vector index name.
fulltext_index_name (str) – Fulltext index name.
embedder (Optional[Embedder]) – Embedder object to embed query text.
return_properties (Optional[list[str]]) – List of node properties to return.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, the server's default database is used (“neo4j” unless configured otherwise; see the Neo4j documentation).
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) –
Provided custom function to transform a neo4j.Record to a RetrieverResultItem.
Two variables are provided in the neo4j.Record:
node: Represents the node retrieved from the vector index search.
score: Denotes the similarity score.
- search(query_text, query_vector=None, top_k=5)
Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search.
For more details, see the Neo4j documentation on querying a vector index, db.index.vector.queryNodes() and db.index.fulltext.queryNodes().
To query by text, an embedder must be provided when the class is instantiated.
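The precedence rule for the vector part of the search can be sketched as a small helper. `resolve_query_vector`, `fake_embed` and the use of ValueError are assumptions for illustration; the library raises its own EmbeddingRequiredError and its internals may differ.

```python
from typing import Callable, List, Optional


def resolve_query_vector(
    query_text: str,
    query_vector: Optional[List[float]] = None,
    embed: Optional[Callable[[str], List[float]]] = None,
) -> List[float]:
    """Pick the vector used for the vector-index part of a hybrid search."""
    if query_vector is not None:
        # An explicit vector wins over embedding the text.
        return query_vector
    if embed is None:
        # Stand-in for the library's EmbeddingRequiredError.
        raise ValueError("query_text alone requires an embedder")
    return embed(query_text)


def fake_embed(text: str) -> List[float]:
    # Toy embedder: a one-dimensional vector holding the text length.
    return [float(len(text))]


from_vector = resolve_query_vector("dune", query_vector=[0.1, 0.2], embed=fake_embed)
from_text = resolve_query_vector("dune", embed=fake_embed)
print(from_vector, from_text)  # [0.1, 0.2] [4.0]
```

In a hybrid search the query_text is still needed for the fulltext index, which is why it is a required argument even when query_vector is supplied.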
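- Parameters:
query_text (str) – The text to get the closest neighbors of.
query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.
top_k (int) – The number of neighbors to return. Defaults to 5.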
- Raises:
SearchValidationError – If validation of the input arguments fails.
EmbeddingRequiredError – If no embedder is provided.
- Returns:
The results of the search query as a list of neo4j.Record and an optional metadata dict
- Return type:
HybridCypherRetriever
- class neo4j_graphrag.retrievers.HybridCypherRetriever(driver, vector_index_name, fulltext_index_name, retrieval_query, embedder=None, result_formatter=None, neo4j_database=None)
Provides retrieval method using combination of vector search over embeddings and fulltext search, augmented by a Cypher query. This retriever builds on HybridRetriever. If an embedder is provided, it needs to have the required Embedder type.
Note: node is a variable from the base query that can be used in retrieval_query as seen in the example below.
Example:
import neo4j
from neo4j_graphrag.retrievers import HybridCypherRetriever

# URI, AUTH and custom_embedder are assumed to be defined elsewhere
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
# A space separates the MATCH clause from RETURN
retrieval_query = "MATCH (node)-[:AUTHORED_BY]->(author:Author) RETURN author.name"
retriever = HybridCypherRetriever(
    driver, "vector-index-name", "fulltext-index-name", retrieval_query, custom_embedder
)
retriever.search(query_text="Find me a book about Fremen", top_k=5)
To query by text, an embedder must be provided when the class is instantiated.
- Parameters:
driver (neo4j.Driver) – The Neo4j Python driver.
vector_index_name (str) – Vector index name.
fulltext_index_name (str) – Fulltext index name.
retrieval_query (str) – Cypher query that gets appended.
embedder (Optional[Embedder]) – Embedder object to embed query text.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Provided custom function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, the server's default database is used (“neo4j” unless configured otherwise; see the Neo4j documentation).
- Raises:
RetrieverInitializationError – If validation of the input arguments fails.
- search(query_text, query_vector=None, top_k=5, query_params=None)
Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search.
For more details, see the Neo4j documentation on querying a vector index, db.index.vector.queryNodes() and db.index.fulltext.queryNodes().
- Parameters:
query_text (str) – The text to get the closest neighbors of.
query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.
top_k (int) – The number of neighbors to return. Defaults to 5.
query_params (Optional[dict[str, Any]]) – Parameters for the Cypher query. Defaults to None.
- Raises:
SearchValidationError – If validation of the input arguments fails.
EmbeddingRequiredError – If no embedder is provided.
- Returns:
The results of the search query as a list of neo4j.Record and an optional metadata dict
- Return type:
Text2CypherRetriever
- class neo4j_graphrag.retrievers.Text2CypherRetriever(driver, llm, neo4j_schema=None, examples=None, result_formatter=None, custom_prompt=None)
Retrieves records from a Neo4j database using natural language. Converts a user’s natural language query to a Cypher query using an LLM, then retrieves records from the database using the generated Cypher query.
- Parameters:
driver (neo4j.Driver) – The Neo4j Python driver.
llm (neo4j_graphrag.llm.LLMInterface) – LLM object to generate the Cypher query.
neo4j_schema (Optional[str]) – Neo4j schema used to generate the Cypher query.
examples (Optional[list[str]]) – Optional user input/query pairs for the LLM to use as examples.
custom_prompt (Optional[str]) – Optional custom prompt to use instead of auto generated prompt. Will not include the neo4j_schema or examples args, if provided.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]])
- Raises:
RetrieverInitializationError – If validation of the input arguments fails.
- search(query_text)
Converts query_text to a Cypher query using an LLM, then retrieves records from a Neo4j database using the generated Cypher query.
- Parameters:
query_text (str) – The natural language query used to search the Neo4j database.
- Raises:
SearchValidationError – If validation of the input arguments fails.
Text2CypherRetrievalError – If the LLM fails to generate a correct Cypher query.
- Returns:
The results of the search query as a list of neo4j.Record and an optional metadata dict
- Return type:
External Retrievers
This section includes retrievers that integrate with databases external to Neo4j.
WeaviateNeo4jRetriever
- class neo4j_graphrag.retrievers.external.weaviate.weaviate.WeaviateNeo4jRetriever(driver, client, collection, id_property_external, id_property_neo4j, embedder=None, return_properties=None, retrieval_query=None, result_formatter=None, neo4j_database=None)
Provides retrieval method using vector search over embeddings with a Weaviate database. If an embedder is provided, it needs to have the required Embedder type.
Example:
from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import WeaviateNeo4jRetriever
from weaviate.connect.helpers import connect_to_local

# NEO4J_URL and NEO4J_AUTH are assumed to be defined elsewhere
with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
    with connect_to_local() as w_client:
        retriever = WeaviateNeo4jRetriever(
            driver=neo4j_driver,
            client=w_client,
            collection="Jeopardy",
            id_property_external="neo4j_id",
            id_property_neo4j="id"
        )
        result = retriever.search(query_text="biology", top_k=2)
- Parameters:
driver (neo4j.Driver) – The Neo4j Python driver.
client (WeaviateClient) – The Weaviate client object.
collection (str) – Name of a set of Weaviate objects that share the same data structure.
id_property_external (str) – The name of the Weaviate property that has the identifier that refers to a corresponding Neo4j node id property.
id_property_neo4j (str) – The name of the Neo4j node property that’s used as the identifier for relating matches from Weaviate to Neo4j nodes.
embedder (Optional[Embedder]) – Embedder object to embed query text.
return_properties (Optional[list[str]]) – List of node properties to return.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, the server's default database is used (“neo4j” unless configured otherwise; see the Neo4j documentation).
retrieval_query (Optional[str])
- Raises:
RetrieverInitializationError – If validation of the input arguments fails.
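The role of id_property_external and id_property_neo4j can be sketched as a join between external matches and Neo4j nodes. The dict shapes used for matches and nodes, and the `join_matches` helper, are hypothetical and chosen only for this illustration; the retriever performs the equivalent lookup against the database.

```python
from typing import Any, Dict, List


def join_matches(
    external_matches: List[Dict[str, Any]],
    neo4j_nodes: List[Dict[str, Any]],
    id_property_external: str,
    id_property_neo4j: str,
) -> List[Dict[str, Any]]:
    """Pair each external match with the Neo4j node that shares its identifier."""
    nodes_by_id = {node[id_property_neo4j]: node for node in neo4j_nodes}
    joined = []
    for match in external_matches:
        node = nodes_by_id.get(match["properties"][id_property_external])
        if node is not None:  # skip matches with no corresponding node
            joined.append({"node": node, "score": match["score"]})
    return joined


matches = [
    {"properties": {"neo4j_id": "q1"}, "score": 0.91},
    {"properties": {"neo4j_id": "q9"}, "score": 0.40},  # no such node
]
nodes = [
    {"id": "q1", "question": "What is DNA?"},
    {"id": "q2", "question": "What is RNA?"},
]
joined = join_matches(matches, nodes, "neo4j_id", "id")
print(joined)
```

The external store holds the vectors and an identifier; the graph holds the rest of the data, so the two property names must refer to the same identifier values.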
- search(query_vector=None, query_text=None, top_k=5, **kwargs)
Get the top_k nearest neighbor embeddings using Weaviate for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search. If query_text is provided, then it will check if an embedder is provided and use it to generate the query_vector. If no embedder is provided, then it will assume that the vectorizer is used in Weaviate.
Example:
import neo4j
from neo4j_graphrag.retrievers import WeaviateNeo4jRetriever

# URI, AUTH and weaviate_client are assumed to be defined elsewhere
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)
retriever = WeaviateNeo4jRetriever(
    driver=driver,
    client=weaviate_client,
    collection="Jeopardy",
    id_property_external="neo4j_id",
    id_property_neo4j="id",
)
biology_embedding = ...
retriever.search(query_vector=biology_embedding, top_k=2)
- Parameters:
- Raises:
SearchValidationError – If validation of the input arguments fails.
- Returns:
The results of the search query as a list of neo4j.Record and an optional metadata dict
- Return type:
PineconeNeo4jRetriever
- class neo4j_graphrag.retrievers.external.pinecone.pinecone.PineconeNeo4jRetriever(driver, client, index_name, id_property_neo4j, embedder=None, return_properties=None, retrieval_query=None, result_formatter=None, neo4j_database=None)
Provides retrieval method using vector search over embeddings with a Pinecone database. If an embedder is provided, it needs to have the required Embedder type.
Example:
from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import PineconeNeo4jRetriever
from pinecone import Pinecone

# NEO4J_URL, NEO4J_AUTH and PC_API_KEY are assumed to be defined elsewhere;
# HuggingFaceEmbeddings must be imported from the embeddings package in use
with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
    pc_client = Pinecone(PC_API_KEY)
    embedder = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    retriever = PineconeNeo4jRetriever(
        driver=neo4j_driver,
        client=pc_client,
        index_name="jeopardy",
        id_property_neo4j="id",
        embedder=embedder,
    )
    result = retriever.search(query_text="biology", top_k=2)
- Parameters:
driver (neo4j.Driver) – The Neo4j Python driver.
client (Pinecone) – The Pinecone client object.
index_name (str) – The name of the Pinecone index.
id_property_neo4j (str) – The name of the Neo4j node property that’s used as the identifier for relating matches from Pinecone to Neo4j nodes.
embedder (Optional[Embedder]) – Embedder object to embed query text.
return_properties (Optional[list[str]]) – List of node properties to return.
retrieval_query (str) – Cypher query that gets appended.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, the server's default database is used (“neo4j” unless configured otherwise; see the Neo4j documentation).
- Raises:
RetrieverInitializationError – If validation of the input arguments fails.
- search(query_vector=None, query_text=None, top_k=5, **kwargs)
Get the top_k nearest neighbor embeddings using Pinecone for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search. If query_text is provided, then it will check if an embedder is provided and use it to generate the query_vector.
See the following documentation for more details:
- Query a vector index
- db.index.vector.queryNodes()
- db.index.fulltext.queryNodes()
Example:
from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import PineconeNeo4jRetriever
from pinecone import Pinecone

# NEO4J_URL, NEO4J_AUTH and PC_API_KEY are assumed to be defined elsewhere
with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
    pc_client = Pinecone(PC_API_KEY)
    retriever = PineconeNeo4jRetriever(
        driver=neo4j_driver,
        client=pc_client,
        index_name="jeopardy",
        id_property_neo4j="id"
    )
    biology_embedding = ...
    retriever.search(query_vector=biology_embedding, top_k=2)
- Parameters:
- Raises:
SearchValidationError – If validation of the input arguments fails.
EmbeddingRequiredError – If no embedder is provided when using text as an input.
- Returns:
The results of the search query as a list of neo4j.Record and an optional metadata dict
- Return type:
Embedder
- class neo4j_graphrag.embedder.Embedder
Interface for embedding models. An embedder passed into a retriever must implement this interface.
SentenceTransformerEmbeddings
OpenAIEmbeddings
VertexAIEmbeddings
Generation
LLMInterface
- class neo4j_graphrag.llm.LLMInterface(model_name, model_params=None, **kwargs)
Interface for large language models.
- abstract invoke(input)
Sends a text input to the LLM and retrieves a response.
- Parameters:
input (str) – Text sent to the LLM
- Returns:
The response from the LLM.
- Return type:
- Raises:
LLMGenerationError – If anything goes wrong.
- abstract async ainvoke(input)
Asynchronously sends a text input to the LLM and retrieves a response.
- Parameters:
input (str) – Text sent to the LLM
- Returns:
The response from the LLM.
- Return type:
- Raises:
LLMGenerationError – If anything goes wrong.
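Implementing this interface amounts to supplying a synchronous invoke and an asynchronous ainvoke. The sketch below defines its own hypothetical `SketchLLM` base class and `Response` dataclass so that it runs without the library installed; a real implementation would subclass LLMInterface and return the library's response type instead.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Response:  # hypothetical stand-in for the library's response type
    content: str


class SketchLLM(ABC):  # hypothetical stand-in mirroring LLMInterface
    def __init__(
        self,
        model_name: str,
        model_params: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> None:
        self.model_name = model_name
        self.model_params = model_params or {}

    @abstractmethod
    def invoke(self, input: str) -> Response:
        """Send text to the model and return its response."""

    @abstractmethod
    async def ainvoke(self, input: str) -> Response:
        """Asynchronous variant of invoke."""


class EchoLLM(SketchLLM):
    """Toy model that echoes its input, useful for tests."""

    def invoke(self, input: str) -> Response:
        return Response(content=f"[{self.model_name}] {input}")

    async def ainvoke(self, input: str) -> Response:
        # A real client would await a network call here.
        return self.invoke(input)


llm = EchoLLM(model_name="echo", model_params={"temperature": 0})
sync_answer = llm.invoke("hello")
async_answer = asyncio.run(llm.ainvoke("hello"))
print(sync_answer.content)  # [echo] hello
```

A stub such as this can stand in for a hosted model when testing a pipeline offline.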
OpenAILLM
- class neo4j_graphrag.llm.OpenAILLM(model_name, model_params=None, **kwargs)
- invoke(input)
Sends a text input to the OpenAI chat completion model and returns the response’s content.
- Parameters:
input (str) – Text sent to the LLM
- Returns:
The response from OpenAI.
- Return type:
- Raises:
LLMGenerationError – If anything goes wrong.
- async ainvoke(input)
Asynchronously sends a text input to the OpenAI chat completion model and returns the response’s content.
- Parameters:
input (str) – Text sent to the LLM
- Returns:
The response from OpenAI.
- Return type:
- Raises:
LLMGenerationError – If anything goes wrong.
PromptTemplate
- class neo4j_graphrag.generation.prompts.PromptTemplate(template=None, expected_inputs=None)
This class is used to generate a parameterized prompt. It is defined from a string (the template) using the Python format syntax (parameters between curly braces {}) and a list of required inputs. Before sending the instructions to an LLM, call the format method that will replace parameters with the provided values. If any of the expected inputs is missing, a PromptMissingInputError is raised.
- format(*args, **kwargs)
This method is used to replace parameters with the provided values. Parameters must be provided:
- as kwargs
- as args if using the same order as in the expected inputs
Example:
prompt_template = PromptTemplate(
    template='''Explain the following concept to {target_audience}:
    Concept: {concept}
    Answer:
    ''',
    expected_inputs=['target_audience', 'concept']
)
prompt = prompt_template.format('12 yo children', concept='graph database')
print(prompt)

# Result:
# '''Explain the following concept to 12 yo children:
# Concept: graph database
# Answer:
# '''
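How positional and keyword arguments are matched to the expected inputs, and how a missing input is detected, can be sketched as follows. `SketchPromptTemplate` and `MissingInputError` are hypothetical names written for this illustration (the library raises PromptMissingInputError), and the matching logic is an assumption rather than the library's code.

```python
from typing import Any, List, Optional


class MissingInputError(Exception):
    """Stand-in for the library's PromptMissingInputError."""


class SketchPromptTemplate:
    def __init__(
        self, template: str, expected_inputs: Optional[List[str]] = None
    ) -> None:
        self.template = template
        self.expected_inputs = expected_inputs or []

    def format(self, *args: Any, **kwargs: Any) -> str:
        # Positional values are matched to expected_inputs in order.
        data = dict(zip(self.expected_inputs, args))
        data.update(kwargs)
        missing = [name for name in self.expected_inputs if name not in data]
        if missing:
            raise MissingInputError(f"Missing inputs: {missing}")
        return self.template.format(**data)


template = SketchPromptTemplate(
    template="Explain {concept} to {target_audience}.",
    expected_inputs=["target_audience", "concept"],
)
prompt = template.format("12 yo children", concept="graph database")
print(prompt)  # Explain graph database to 12 yo children.

try:
    template.format("12 yo children")  # 'concept' is not supplied
except MissingInputError as exc:
    error_message = str(exc)
    print(error_message)  # Missing inputs: ['concept']
```

Checking inputs before formatting gives a clearer error than the KeyError that str.format would raise on its own.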
RagTemplate
- class neo4j_graphrag.generation.prompts.RagTemplate(template=None, expected_inputs=None)
- DEFAULT_TEMPLATE: str = 'Answer the user question using the following context\n\nContext:\n{context}\n\nExamples:\n{examples}\n\nQuestion:\n{query_text}\n\nAnswer:\n'
- format(query_text, context, examples)
This method is used to replace parameters with the provided values. Parameters must be provided:
- as kwargs
- as args if using the same order as in the expected inputs
ERExtractionTemplate
- class neo4j_graphrag.generation.prompts.ERExtractionTemplate(template=None, expected_inputs=None)
- DEFAULT_TEMPLATE: str = '\nYou are a top-tier algorithm designed for extracting\ninformation in structured formats to build a knowledge graph.\n\nExtract the entities (nodes) and specify their type from the following text.\nAlso extract the relationships between these nodes.\n\nReturn result as JSON using the following format:\n{{"nodes": [ {{"id": "0", "label": "Person", "properties": {{"name": "John"}} }}],\n"relationships": [{{"type": "KNOWS", "start_node_id": "0", "end_node_id": "1", "properties": {{"since": "2024-08-01"}} }}] }}\n\nUse only fhe following nodes and relationships (if provided):\n{schema}\n\nAssign a unique ID (string) to each node, and reuse it to define relationships.\nDo respect the source and target node types for relationship and\nthe relationship direction.\n\nDo not return any additional information other than the JSON in it.\n\nExamples:\n{examples}\n\nInput text:\n\n{text}\n'
- format(text, schema, examples)
This method is used to replace parameters with the provided values. Parameters must be provided:
- as kwargs
- as args if using the same order as in the expected inputs
RAG
GraphRAG
- class neo4j_graphrag.generation.graphrag.GraphRAG(retriever, llm, prompt_template=<neo4j_graphrag.generation.prompts.RagTemplate object>)
Performs a GraphRAG search using a specific retriever and LLM.
Example:
    import neo4j
    from neo4j_graphrag.retrievers import VectorRetriever
    from neo4j_graphrag.llm.openai_llm import OpenAILLM
    from neo4j_graphrag.generation import GraphRAG

    URI = "neo4j://localhost:7687"
    AUTH = ("neo4j", "password")

    driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

    # custom_embedder is assumed to be an embedder instance defined elsewhere
    retriever = VectorRetriever(driver, "vector-index-name", custom_embedder)
    llm = OpenAILLM()
    graph_rag = GraphRAG(retriever, llm)
    graph_rag.search(query_text="Find me a book about Fremen")
- Parameters:
retriever (Retriever) – The retriever used to find relevant context to pass to the LLM.
llm (LLMInterface) – The LLM used to generate the answer.
prompt_template (RagTemplate) – The prompt template that will be formatted with context and user question and passed to the LLM.
- Raises:
RagInitializationError – If validation of the input arguments fails.
- search(query_text='', examples='', retriever_config=None, return_context=False, query=None)[source]¶
This method performs a full RAG search:
1. Retrieval: context retrieval
2. Augmentation: prompt formatting
3. Generation: answer generation with LLM
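The three steps can be sketched with plain Python functions. This is an illustration of the flow only: `DOCUMENTS`, `retrieve`, `build_prompt`, `generate` and this local `search` are hypothetical stand-ins, not the library's retriever, prompt template, or LLM classes, and the word-overlap ranking is a toy substitute for a real retriever.

```python
# Hypothetical in-memory corpus standing in for a Neo4j-backed retriever.
DOCUMENTS = [
    "Dune is a novel about the Fremen of Arrakis.",
    "Neuromancer is a cyberpunk novel.",
]


def retrieve(query_text: str, top_k: int = 1) -> list[str]:
    # 1. Retrieval: rank documents by the number of words shared with the query.
    words = set(query_text.lower().split())

    def overlap(doc: str) -> int:
        return len(words & set(doc.lower().rstrip(".").split()))

    return sorted(DOCUMENTS, key=overlap, reverse=True)[:top_k]


def build_prompt(context: list[str], query_text: str) -> str:
    # 2. Augmentation: place the retrieved context and the question in one prompt.
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query_text}\nAnswer:"


def generate(prompt: str) -> str:
    # 3. Generation: a real LLM call would go here.
    # This stub simply returns the first context line of the prompt.
    return prompt.splitlines()[1]


def search(query_text: str, top_k: int = 1) -> str:
    context = retrieve(query_text, top_k)
    prompt = build_prompt(context, query_text)
    return generate(prompt)


answer = search("Find me a book about Fremen")
print(answer)
```

In this sketch `top_k` plays the role of a value that would be passed through `retriever_config` in the real method.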
- Parameters:
query_text (str) – The user question
examples (str) – Examples added to the LLM prompt.
retriever_config (Optional[dict]) – Parameters passed to the retriever search method; e.g.: top_k
return_context (bool) – Whether to append the retriever result to the final result (default: False)
query (Optional[str]) – The user question. Will be deprecated in favor of query_text.
- Returns:
The LLM-generated answer
- Return type:
Database Interaction¶
- neo4j_graphrag.indexes.create_vector_index(driver, name, label, embedding_property, dimensions, similarity_fn, neo4j_database=None)[source]¶
This method constructs a Cypher query and executes it to create a new vector index in Neo4j.
See Cypher manual on creating vector indexes.
Important: This operation will fail if an index with the same name already exists. Ensure that the index name provided is unique within the database context.
Example:
    from neo4j import GraphDatabase
    from neo4j_graphrag.indexes import create_vector_index

    URI = "neo4j://localhost:7687"
    AUTH = ("neo4j", "password")
    INDEX_NAME = "vector-index-name"

    # Connect to Neo4j database
    driver = GraphDatabase.driver(URI, auth=AUTH)

    # Creating the index
    create_vector_index(
        driver,
        INDEX_NAME,
        label="Document",
        embedding_property="vectorProperty",
        dimensions=1536,
        similarity_fn="euclidean",
    )
- Parameters:
driver (neo4j.Driver) – Neo4j Python driver instance.
name (str) – The unique name of the index.
label (str) – The node label to be indexed.
embedding_property (str) – The property key of a node which contains embedding values.
dimensions (int) – Vector embedding dimension.
similarity_fn (str) – Case-insensitive value for the vector similarity function: euclidean or cosine.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the “neo4j” database (see reference to documentation).
- Raises:
ValueError – If validation of the input arguments fails.
neo4j.exceptions.ClientError – If creation of vector index fails.
- Return type:
None
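To make the two `similarity_fn` options more concrete, the sketch below computes the underlying quantities in plain Python. It shows only the textbook definitions of Euclidean distance and cosine similarity; how Neo4j converts these into index scores is not covered by this page, and the helper names are hypothetical.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    # Straight-line distance: sensitive to both direction and magnitude.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between the vectors: sensitive to direction only.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, different length
c = [0.0, 1.0]  # orthogonal to a

print(euclidean_distance(a, b), cosine_similarity(a, b))
print(euclidean_distance(a, c), cosine_similarity(a, c))
```

Vectors `a` and `b` point the same way, so their cosine similarity is maximal even though they are a nonzero distance apart. This is why the choice usually depends on whether embedding magnitude carries meaning.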
- neo4j_graphrag.indexes.create_fulltext_index(driver, name, label, node_properties, neo4j_database=None)[source]¶
This method constructs a Cypher query and executes it to create a new fulltext index in Neo4j.
See Cypher manual on creating fulltext indexes.
Important: This operation will fail if an index with the same name already exists. Ensure that the index name provided is unique within the database context.
Example:
    from neo4j import GraphDatabase
    from neo4j_graphrag.indexes import create_fulltext_index

    URI = "neo4j://localhost:7687"
    AUTH = ("neo4j", "password")
    INDEX_NAME = "fulltext-index-name"

    # Connect to Neo4j database
    driver = GraphDatabase.driver(URI, auth=AUTH)

    # Creating the index
    create_fulltext_index(
        driver,
        INDEX_NAME,
        label="Document",
        node_properties=["vectorProperty"],
    )
- Parameters:
driver (neo4j.Driver) – Neo4j Python driver instance.
name (str) – The unique name of the index.
label (str) – The node label to be indexed.
node_properties (list[str]) – The node properties to create the fulltext index on.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the “neo4j” database (see reference to documentation).
- Raises:
ValueError – If validation of the input arguments fails.
neo4j.exceptions.ClientError – If creation of fulltext index fails.
- Return type:
None
- neo4j_graphrag.indexes.drop_index_if_exists(driver, name, neo4j_database=None)[source]¶
This method constructs a Cypher query and executes it to drop an index in Neo4j, if the index exists. See Cypher manual on dropping vector indexes.
Example:
    from neo4j import GraphDatabase
    from neo4j_graphrag.indexes import drop_index_if_exists

    URI = "neo4j://localhost:7687"
    AUTH = ("neo4j", "password")
    INDEX_NAME = "fulltext-index-name"

    # Connect to Neo4j database
    driver = GraphDatabase.driver(URI, auth=AUTH)

    # Dropping the index if it exists
    drop_index_if_exists(
        driver,
        INDEX_NAME,
    )
- Parameters:
driver (neo4j.Driver) – Neo4j Python driver instance.
name (str) – The name of the index to delete.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the “neo4j” database (see reference to documentation).
- Raises:
neo4j.exceptions.ClientError – If dropping of index fails.
- Return type:
None
- neo4j_graphrag.indexes.upsert_vector(driver, node_id, embedding_property, vector, neo4j_database=None)[source]¶
This method constructs a Cypher query and executes it to upsert (insert or update) a vector property on a specific node.
Example:
    from neo4j import GraphDatabase
    from neo4j_graphrag.indexes import upsert_vector

    URI = "neo4j://localhost:7687"
    AUTH = ("neo4j", "password")

    # Connect to Neo4j database
    driver = GraphDatabase.driver(URI, auth=AUTH)

    # Upsert the vector data
    upsert_vector(
        driver,
        node_id="nodeId",
        embedding_property="vectorProperty",
        vector=...,
    )
- Parameters:
driver (neo4j.Driver) – Neo4j Python driver instance.
node_id (int) – The id of the node.
embedding_property (str) – The name of the property to store the vector in.
vector (list[float]) – The vector to store.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the “neo4j” database (see reference to documentation).
- Raises:
Neo4jInsertionError – If upserting of the vector fails.
- Return type:
None
- neo4j_graphrag.indexes.upsert_vector_on_relationship(driver, rel_id, embedding_property, vector, neo4j_database=None)[source]¶
This method constructs a Cypher query and executes it to upsert (insert or update) a vector property on a specific relationship.
Example:
    from neo4j import GraphDatabase
    from neo4j_graphrag.indexes import upsert_vector_on_relationship

    URI = "neo4j://localhost:7687"
    AUTH = ("neo4j", "password")

    # Connect to Neo4j database
    driver = GraphDatabase.driver(URI, auth=AUTH)

    # Upsert the vector data (the identifier parameter is rel_id, per the signature)
    upsert_vector_on_relationship(
        driver,
        rel_id="relId",
        embedding_property="vectorProperty",
        vector=...,
    )
- Parameters:
driver (neo4j.Driver) – Neo4j Python driver instance.
rel_id (int) – The id of the relationship.
embedding_property (str) – The name of the property to store the vector in.
vector (list[float]) – The vector to store.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the “neo4j” database (see reference to documentation).
- Raises:
Neo4jInsertionError – If upserting of the vector fails.
- Return type:
None
- async neo4j_graphrag.indexes.async_upsert_vector(driver, node_id, embedding_property, vector, neo4j_database=None)[source]¶
This method constructs a Cypher query and asynchronously executes it to upsert (insert or update) a vector property on a specific node.
Example:
    import asyncio

    from neo4j import AsyncGraphDatabase
    from neo4j_graphrag.indexes import async_upsert_vector

    URI = "neo4j://localhost:7687"
    AUTH = ("neo4j", "password")

    async def main():
        # Connect to Neo4j database
        driver = AsyncGraphDatabase.driver(URI, auth=AUTH)

        # Upsert the vector data (the coroutine must be awaited)
        await async_upsert_vector(
            driver,
            node_id="nodeId",
            embedding_property="vectorProperty",
            vector=...,
        )

    asyncio.run(main())
- Parameters:
driver (neo4j.AsyncDriver) – Neo4j Python asynchronous driver instance.
node_id (int) – The id of the node.
embedding_property (str) – The name of the property to store the vector in.
vector (list[float]) – The vector to store.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the “neo4j” database (see reference to documentation).
- Raises:
Neo4jInsertionError – If upserting of the vector fails.
- Return type:
None
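Because the function is a coroutine, calling it without `await` only creates a coroutine object and performs no write. The sketch below illustrates this with a hypothetical `fake_async_upsert` that writes to an in-memory dict instead of Neo4j. Running several upserts concurrently with `asyncio.gather` is shown as one possible usage pattern, not as something the library prescribes.

```python
import asyncio


async def fake_async_upsert(
    store: dict, node_id: int, embedding_property: str, vector: list[float]
) -> None:
    # Hypothetical stand-in for a database round trip.
    await asyncio.sleep(0)
    # "Upsert": create the entry if missing, otherwise overwrite the property.
    store.setdefault(node_id, {})[embedding_property] = vector


async def main() -> dict:
    store: dict = {}
    # Awaiting the coroutine is what actually runs the upsert.
    await fake_async_upsert(store, 1, "vectorProperty", [0.1, 0.2])
    # Several upserts can be awaited concurrently.
    await asyncio.gather(
        fake_async_upsert(store, 2, "vectorProperty", [0.3, 0.4]),
        fake_async_upsert(store, 1, "vectorProperty", [0.5, 0.6]),  # updates node 1
    )
    return store


result = asyncio.run(main())
print(result)
```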
- async neo4j_graphrag.indexes.async_upsert_vector_on_relationship(driver, rel_id, embedding_property, vector, neo4j_database=None)[source]¶
This method constructs a Cypher query and asynchronously executes it to upsert (insert or update) a vector property on a specific relationship.
Example:
    import asyncio

    from neo4j import AsyncGraphDatabase
    from neo4j_graphrag.indexes import async_upsert_vector_on_relationship

    URI = "neo4j://localhost:7687"
    AUTH = ("neo4j", "password")

    async def main():
        # Connect to Neo4j database
        driver = AsyncGraphDatabase.driver(URI, auth=AUTH)

        # Upsert the vector data (the coroutine must be awaited)
        await async_upsert_vector_on_relationship(
            driver,
            rel_id="relId",
            embedding_property="vectorProperty",
            vector=...,
        )

    asyncio.run(main())
- Parameters:
driver (neo4j.AsyncDriver) – Neo4j Python asynchronous driver instance.
rel_id (int) – The id of the relationship.
embedding_property (str) – The name of the property to store the vector in.
vector (list[float]) – The vector to store.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the “neo4j” database (see reference to documentation).
- Raises:
Neo4jInsertionError – If upserting of the vector fails.
- Return type:
None
Errors¶
Neo4jGraphRagError¶
RetrieverInitializationError¶
- class neo4j_graphrag.exceptions.RetrieverInitializationError(errors)[source]¶
Bases:
Neo4jGraphRagError
Exception raised when initialization of a retriever fails.
- Parameters:
errors (list[ErrorDetails])
SearchValidationError¶
- class neo4j_graphrag.exceptions.SearchValidationError(errors)[source]¶
Bases:
Neo4jGraphRagError
Exception raised for validation errors during search.
- Parameters:
errors (list[ErrorDetails])
FilterValidationError¶
- class neo4j_graphrag.exceptions.FilterValidationError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when input validation for metadata filtering fails.
EmbeddingRequiredError¶
- class neo4j_graphrag.exceptions.EmbeddingRequiredError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when an embedding method is required but not provided.
InvalidRetrieverResultError¶
- class neo4j_graphrag.exceptions.InvalidRetrieverResultError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when the Retriever fails to return a result.
Neo4jIndexError¶
- class neo4j_graphrag.exceptions.Neo4jIndexError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when handling Neo4j index fails.
Neo4jInsertionError¶
- class neo4j_graphrag.exceptions.Neo4jInsertionError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when inserting data into the Neo4j database fails.
Neo4jVersionError¶
- class neo4j_graphrag.exceptions.Neo4jVersionError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when Neo4j version does not meet minimum requirements.
Text2CypherRetrievalError¶
- class neo4j_graphrag.exceptions.Text2CypherRetrievalError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when text-to-cypher retrieval fails.
SchemaFetchError¶
- class neo4j_graphrag.exceptions.SchemaFetchError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when a Neo4jSchema cannot be fetched.
RagInitializationError¶
- class neo4j_graphrag.exceptions.RagInitializationError(errors)[source]¶
Bases:
Neo4jGraphRagError
- Parameters:
errors (list[ErrorDetails])
PromptMissingInputError¶
- class neo4j_graphrag.exceptions.PromptMissingInputError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when a required prompt input is missing.
LLMGenerationError¶
- class neo4j_graphrag.exceptions.LLMGenerationError[source]¶
Bases:
Neo4jGraphRagError
Exception raised when answer generation from LLM fails.
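Since each exception above lists Neo4jGraphRagError under "Bases", a caller can handle a specific failure first and fall back to the base class for everything else. The sketch below uses local stand-in classes that mirror the documented names so that it runs without the library installed. The error messages and the `run_step` helper are hypothetical.

```python
# Local stand-ins mirroring the documented names; the real classes
# live in neo4j_graphrag.exceptions.
class Neo4jGraphRagError(Exception):
    pass


class Neo4jInsertionError(Neo4jGraphRagError):
    pass


class LLMGenerationError(Neo4jGraphRagError):
    pass


def run_step(step) -> str:
    try:
        step()
    except Neo4jInsertionError as exc:
        # Most specific handler first.
        return f"insertion failed: {exc}"
    except Neo4jGraphRagError as exc:
        # Catch-all for any other error derived from the base class.
        return f"library error: {exc}"
    return "ok"


def failing_insert():
    raise Neo4jInsertionError("node not found")


def failing_llm():
    raise LLMGenerationError("timeout")


print(run_step(failing_insert))
print(run_step(failing_llm))
print(run_step(lambda: None))
```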
PipelineDefinitionError¶
- class neo4j_graphrag.experimental.pipeline.exceptions.PipelineDefinitionError[source]¶
Bases:
Neo4jGraphRagError
Raised when the pipeline graph is invalid
PipelineMissingDependencyError¶
- class neo4j_graphrag.experimental.pipeline.exceptions.PipelineMissingDependencyError[source]¶
Bases:
Neo4jGraphRagError
Raised when a task is scheduled but its dependencies are not yet done
PipelineStatusUpdateError¶
- class neo4j_graphrag.experimental.pipeline.exceptions.PipelineStatusUpdateError[source]¶
Bases:
Neo4jGraphRagError
Raised when attempting an invalid change of state (e.g. DONE => DOING).
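An invalid change of state can be pictured as a lookup in a table of allowed transitions. The sketch below is an assumption about how such a check might look, not the pipeline's actual implementation. `Status`, `ALLOWED` and `update_status` are hypothetical, the `TODO` state is invented for illustration, and the exception is a local stand-in for the documented class.

```python
from enum import Enum


class PipelineStatusUpdateError(Exception):
    """Local stand-in for the documented exception."""


class Status(Enum):
    TODO = "TODO"    # hypothetical initial state
    DOING = "DOING"
    DONE = "DONE"


# Hypothetical transition table: each state maps to the states it may move to.
ALLOWED = {
    Status.TODO: {Status.DOING},
    Status.DOING: {Status.DONE},
    Status.DONE: set(),
}


def update_status(current: Status, new: Status) -> Status:
    if new not in ALLOWED[current]:
        raise PipelineStatusUpdateError(f"{current.name} => {new.name}")
    return new


status = update_status(Status.TODO, Status.DOING)
status = update_status(status, Status.DONE)

try:
    update_status(status, Status.DOING)  # DONE => DOING is rejected
except PipelineStatusUpdateError as exc:
    message = str(exc)

print(message)
```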