API Documentation

Components

KGWriter

class neo4j_graphrag.experimental.components.kg_writer.KGWriter[source]

Abstract class used to write a knowledge graph to a data store.

abstract run(graph, lexical_graph_config=LexicalGraphConfig(id_prefix='', document_node_label='Document', chunk_node_label='Chunk', chunk_to_document_relationship_type='FROM_DOCUMENT', next_chunk_relationship_type='NEXT_CHUNK', node_to_chunk_relationship_type='FROM_CHUNK', chunk_index_property='index', chunk_text_property='text', chunk_embedding_property='embedding'))[source]

Writes the graph to a data store.

Parameters:
  • graph (Neo4jGraph) – The knowledge graph to write to the data store.

  • lexical_graph_config (LexicalGraphConfig) – Node labels and relationship types in the lexical graph.

Return type:

KGWriterModel

Neo4jWriter

class neo4j_graphrag.experimental.components.kg_writer.Neo4jWriter(driver, neo4j_database=None, batch_size=1000)[source]

Writes a knowledge graph to a Neo4j database.

Parameters:
  • driver (neo4j.driver) – The Neo4j driver to connect to the database.

  • neo4j_database (Optional[str]) – The name of the Neo4j database to write to. Defaults to ‘neo4j’ if not provided.

  • batch_size (int) – The number of nodes or relationships to write to the database in a batch. Defaults to 1000.
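
As a rough illustration of what batch_size controls, the sketch below splits a list of rows into consecutive batches. It is not the library's internal code: the batched helper and the sample node rows are hypothetical and only show the batching idea.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(rows: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        yield rows[start : start + batch_size]


# Hypothetical node rows standing in for the nodes of a knowledge graph.
nodes = [{"id": str(i), "label": "Person"} for i in range(2500)]
batch_lengths = [len(batch) for batch in batched(nodes, batch_size=1000)]
print(batch_lengths)  # [1000, 1000, 500]
```

With the default of 1000, a graph of 2500 nodes would be written in three batches under this scheme.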

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.experimental.components.kg_writer import Neo4jWriter
from neo4j_graphrag.experimental.pipeline import Pipeline

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"

driver = GraphDatabase.driver(URI, auth=AUTH, database=DATABASE)
writer = Neo4jWriter(driver=driver, neo4j_database=DATABASE)

pipeline = Pipeline()
pipeline.add_component(writer, "writer")
run(graph, lexical_graph_config=LexicalGraphConfig(id_prefix='', document_node_label='Document', chunk_node_label='Chunk', chunk_to_document_relationship_type='FROM_DOCUMENT', next_chunk_relationship_type='NEXT_CHUNK', node_to_chunk_relationship_type='FROM_CHUNK', chunk_index_property='index', chunk_text_property='text', chunk_embedding_property='embedding'))[source]

Upserts a knowledge graph into a Neo4j database.

Parameters:
  • graph (Neo4jGraph) – The knowledge graph to upsert into the database.

  • lexical_graph_config (LexicalGraphConfig) – Node labels and relationship types in the lexical graph.

Return type:

KGWriterModel

TextSplitter

class neo4j_graphrag.experimental.components.text_splitters.base.TextSplitter[source]

Interface for a text splitter.

abstract async run(text)[source]

Splits a piece of text into chunks.

Parameters:

text (str) – The text to be split.

Returns:

A list of chunks.

Return type:

TextChunks

FixedSizeSplitter

class neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter.FixedSizeSplitter(chunk_size=4000, chunk_overlap=200)[source]

Text splitter which splits the input text into fixed size chunks with optional overlap.

Parameters:
  • chunk_size (int) – The number of characters in each chunk.

  • chunk_overlap (int) – The number of characters from the previous chunk to overlap with each chunk. Must be less than chunk_size.
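
To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size splitting with overlap. The split_fixed_size function is a hypothetical stand-in written for this illustration; the library's own splitter may handle edge cases differently.

```python
from typing import List


def split_fixed_size(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split `text` into chunks of at most `chunk_size` characters.

    Each chunk after the first starts `chunk_overlap` characters before the
    end of the previous one, so neighbouring chunks share that many characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be less than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start : start + chunk_size] for start in range(0, len(text), step)]


chunks = split_fixed_size("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

The requirement that chunk_overlap be less than chunk_size follows from the step size: with an overlap equal to or larger than the chunk size, the window would never advance.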

Example:

from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import FixedSizeSplitter
from neo4j_graphrag.experimental.pipeline import Pipeline

pipeline = Pipeline()
text_splitter = FixedSizeSplitter(chunk_size=4000, chunk_overlap=200)
pipeline.add_component(text_splitter, "text_splitter")
run(text)[source]

Splits a piece of text into chunks.

Parameters:

text (str) – The text to be split.

Returns:

A list of chunks.

Return type:

TextChunks

LangChainTextSplitterAdapter

LlamaIndexTextSplitterAdapter

TextChunkEmbedder

class neo4j_graphrag.experimental.components.embedder.TextChunkEmbedder(embedder)[source]

Component for creating embeddings from text chunks.

Parameters:

embedder (Embedder) – The embedder to use to create the embeddings.

Example:

from neo4j_graphrag.experimental.components.embedder import TextChunkEmbedder
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline import Pipeline

embedder = OpenAIEmbeddings(model="text-embedding-3-large")
chunk_embedder = TextChunkEmbedder(embedder)
pipeline = Pipeline()
pipeline.add_component(chunk_embedder, "chunk_embedder")
run(text_chunks)[source]

Embed a list of text chunks.

Parameters:

text_chunks (TextChunks) – The text chunks to embed.

Returns:

The input text chunks with each one having an added embedding.

Return type:

TextChunks
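
The idea of returning the input chunks with an added embedding can be sketched without any external service. Everything below is an assumption made for illustration: Chunk is a hypothetical stand-in (not the library's chunk type), toy_embed is a deliberately trivial embedding function, and storing the vector under a metadata key named "embedding" is just one possible layout.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a text chunk; not the library's own type."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def toy_embed(text: str) -> List[float]:
    """Toy embedding: text length and vowel count (for illustration only)."""
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]


def embed_chunks(chunks: List[Chunk], embed: Callable[[str], List[float]]) -> List[Chunk]:
    """Return new chunks, each carrying its embedding in the metadata."""
    return [
        Chunk(text=c.text, metadata={**c.metadata, "embedding": embed(c.text)})
        for c in chunks
    ]


embedded = embed_chunks([Chunk("Dune"), Chunk("Arrakis")], toy_embed)
print(embedded[1].metadata["embedding"])  # [7.0, 3.0]
```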

LexicalGraphBuilder

Neo4jChunkReader

class neo4j_graphrag.experimental.components.neo4j_reader.Neo4jChunkReader(driver, fetch_embeddings=False)[source]
Parameters:
  • driver (neo4j.Driver)

  • fetch_embeddings (bool)

run(lexical_graph_config=LexicalGraphConfig(id_prefix='', document_node_label='Document', chunk_node_label='Chunk', chunk_to_document_relationship_type='FROM_DOCUMENT', next_chunk_relationship_type='NEXT_CHUNK', node_to_chunk_relationship_type='FROM_CHUNK', chunk_index_property='index', chunk_text_property='text', chunk_embedding_property='embedding'))[source]
Parameters:

lexical_graph_config (LexicalGraphConfig)

Return type:

TextChunks

SchemaBuilder

class neo4j_graphrag.experimental.components.schema.SchemaBuilder[source]

A builder class for constructing SchemaConfig objects from given entities, relations, and their interrelationships defined in a potential schema.

Example:

from neo4j_graphrag.experimental.components.schema import (
    SchemaBuilder,
    SchemaEntity,
    SchemaProperty,
    SchemaRelation,
)
from neo4j_graphrag.experimental.pipeline import Pipeline

entities = [
    SchemaEntity(
        label="PERSON",
        description="An individual human being.",
        properties=[
            SchemaProperty(
                name="name", type="STRING", description="The name of the person"
            )
        ],
    ),
    SchemaEntity(
        label="ORGANIZATION",
        description="A structured group of people with a common purpose.",
        properties=[
            SchemaProperty(
                name="name", type="STRING", description="The name of the organization"
            )
        ],
    ),
]
relations = [
    SchemaRelation(
        label="EMPLOYED_BY", description="Indicates employment relationship."
    ),
]
potential_schema = [
    ("PERSON", "EMPLOYED_BY", "ORGANIZATION"),
]
pipe = Pipeline()
schema_builder = SchemaBuilder()
pipe.add_component(schema_builder, "schema_builder")
pipe_inputs = {
    "schema": {
        "entities": entities,
        "relations": relations,
        "potential_schema": potential_schema,
    },
    ...
}
await pipe.run(pipe_inputs)  # Pipeline.run is a coroutine and must be awaited
run(entities, relations=None, potential_schema=None)[source]

Asynchronously constructs and returns a SchemaConfig object.

Parameters:
  • entities (List[SchemaEntity]) – List of Entity objects.

  • relations (List[SchemaRelation]) – List of Relation objects.

  • potential_schema (Dict[str, List[str]]) – Dictionary mapping entity names to lists of relation names.

Returns:

A configured schema object, constructed asynchronously.

Return type:

SchemaConfig
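
A schema of this shape is only coherent if every label used in potential_schema has been declared as an entity or a relation. The sketch below shows one way such a consistency check could look, using the (source, relation, target) triple form from the example above. The check_potential_schema function is hypothetical and is not claimed to match the validation that SchemaBuilder performs.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def check_potential_schema(
    entity_labels: Set[str],
    relation_labels: Set[str],
    potential_schema: Optional[List[Triple]],
) -> List[str]:
    """Return a list of problems found in the triples (empty if consistent)."""
    problems = []
    for source, relation, target in potential_schema or []:
        if source not in entity_labels:
            problems.append(f"unknown source entity: {source}")
        if relation not in relation_labels:
            problems.append(f"unknown relation: {relation}")
        if target not in entity_labels:
            problems.append(f"unknown target entity: {target}")
    return problems


entity_labels = {"PERSON", "ORGANIZATION"}
relation_labels = {"EMPLOYED_BY"}
ok = check_potential_schema(
    entity_labels, relation_labels, [("PERSON", "EMPLOYED_BY", "ORGANIZATION")]
)
bad = check_potential_schema(entity_labels, relation_labels, [("PERSON", "OWNS", "CAR")])
print(ok)   # []
print(bad)  # ['unknown relation: OWNS', 'unknown target entity: CAR']
```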

EntityRelationExtractor

LLMEntityRelationExtractor

SinglePropertyExactMatchResolver

class neo4j_graphrag.experimental.components.resolver.SinglePropertyExactMatchResolver(driver, filter_query=None, resolve_property='name', neo4j_database=None)[source]

Resolves entities that have the same label and exactly the same value for a given property (“name” by default).

Parameters:
  • driver (neo4j.Driver) – The Neo4j driver to connect to the database.

  • filter_query (Optional[str]) – To reduce the resolution scope, add a Cypher WHERE clause.

  • resolve_property (str) – The property that will be compared (default: “name”). If values match exactly, entities are merged.

  • neo4j_database (Optional[str]) – The name of the Neo4j database to write to. Defaults to ‘neo4j’ if not provided.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.experimental.components.resolver import SinglePropertyExactMatchResolver

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"

driver = GraphDatabase.driver(URI, auth=AUTH, database=DATABASE)
resolver = SinglePropertyExactMatchResolver(driver=driver, neo4j_database=DATABASE)
await resolver.run()  # no expected parameters
async run()[source]

Resolves entities according to the following rule: for each entity label, entities with the same ‘resolve_property’ value (exact match) are merged into a single node:

  • Properties: if the first node already has a value for a property, that value is kept; otherwise the first value found in the list of merged nodes is written.

  • Relationships: relationships with the same type and target node are merged.

See apoc.refactor.mergeNodes documentation for more details.

Return type:

ResolutionStats
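
The property rule above can be illustrated in plain Python. This is a simplified sketch under stated assumptions: nodes are modelled as dictionaries with hypothetical "label" and "properties" keys, relationship merging is not modelled, and the real component delegates the merge to the database rather than doing it in memory.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]


def resolve_exact_match(nodes: List[Node], resolve_property: str = "name") -> List[Node]:
    """Merge nodes sharing the same label and `resolve_property` value.

    For every property, the first value seen in the group is kept.
    Nodes without the property are left untouched.
    """
    groups: Dict[Tuple[str, Any], Node] = {}
    untouched: List[Node] = []
    for node in nodes:
        value = node["properties"].get(resolve_property)
        if value is None:
            untouched.append(node)
            continue
        key = (node["label"], value)
        if key not in groups:
            groups[key] = {"label": node["label"], "properties": dict(node["properties"])}
        else:
            for prop, prop_value in node["properties"].items():
                # setdefault keeps an existing value and only fills missing ones.
                groups[key]["properties"].setdefault(prop, prop_value)
    return list(groups.values()) + untouched


nodes = [
    {"label": "Person", "properties": {"name": "Paul", "house": "Atreides"}},
    {"label": "Person", "properties": {"name": "Paul", "house": "Corrino", "title": "Duke"}},
    {"label": "Planet", "properties": {"name": "Paul"}},
]
resolved = resolve_exact_match(nodes)
print(len(resolved))  # 2
```

The two Person nodes collapse into one that keeps house "Atreides" from the first node and gains title "Duke" from the second; the Planet node stays separate because its label differs.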

Pipelines

Pipeline

class neo4j_graphrag.experimental.pipeline.Pipeline(store=None)[source]

This is the main pipeline, where components and their execution order are defined.

Parameters:

store (Optional[ResultStore])

draw(path, layout='dot', hide_unused_outputs=True)[source]
Parameters:
  • path (str)

  • layout (str)

  • hide_unused_outputs (bool)

Return type:

Any

add_component(component, name)[source]

Add a new component. Components are uniquely identified by their name. If ‘name’ is already in the pipeline, a ValueError is raised.

Parameters:
  • component (Component)

  • name (str)

Return type:

None

connect(start_component_name, end_component_name, input_config=None)[source]

Connect one component to another.

Parameters:
  • start_component_name (str) – name of the component as defined in the add_component method

  • end_component_name (str) – name of the component as defined in the add_component method

  • input_config (Optional[dict[str, str]]) – input configuration for the end component, used to propagate outputs from previous components.

Raises:

PipelineDefinitionError – if the provided components are not in the Pipeline, or if the graph that would be created by this connection is cyclic.

Return type:

None
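
The two structural rules described for add_component and connect (unique names, no cycles) can be sketched with a small directed graph. TinyPipelineGraph is a hypothetical stand-in that tracks names and edges only; it raises built-in exceptions rather than the library's PipelineDefinitionError, and it is not the library's implementation.

```python
from typing import Dict, Set


class TinyPipelineGraph:
    """Hypothetical stand-in that only tracks component names and edges."""

    def __init__(self) -> None:
        self._edges: Dict[str, Set[str]] = {}

    def add_component(self, name: str) -> None:
        if name in self._edges:
            raise ValueError(f"Component name '{name}' is already used")
        self._edges[name] = set()

    def _reachable(self, start: str, goal: str) -> bool:
        """Depth-first search: can `goal` be reached from `start`?"""
        stack, seen = [start], set()
        while stack:
            current = stack.pop()
            if current == goal:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self._edges[current])
        return False

    def connect(self, start: str, end: str) -> None:
        if start not in self._edges or end not in self._edges:
            raise KeyError("Both components must be added before connecting them")
        # Adding start -> end creates a cycle if start is already reachable from end.
        if self._reachable(end, start):
            raise ValueError(f"Connecting '{start}' to '{end}' would create a cycle")
        self._edges[start].add(end)


graph = TinyPipelineGraph()
for name in ("splitter", "embedder", "writer"):
    graph.add_component(name)
graph.connect("splitter", "embedder")
graph.connect("embedder", "writer")
try:
    graph.connect("writer", "splitter")
    cycle_rejected = False
except ValueError:
    cycle_rejected = True
print(cycle_rejected)  # True
```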

async run(data)[source]
Parameters:

data (dict[str, Any])

Return type:

PipelineResult

SimpleKGPipeline

Retrievers

RetrieverInterface

class neo4j_graphrag.retrievers.base.Retriever(driver, neo4j_database=None)[source]

Abstract class for Neo4j retrievers

Parameters:
  • driver (neo4j.Driver)

  • neo4j_database (Optional[str])

index_name: str
VERIFY_NEO4J_VERSION = True
search(*args, **kwargs)[source]

Search method. Calls the get_search_results method, which returns a list of neo4j.Record, and formats them using the function returned by get_result_formatter to return a RetrieverResult.

Return type:

RetrieverResult

abstract get_search_results(*args, **kwargs)[source]

This method must be implemented in each child class. It will receive the same parameters provided to the public interface via the search method, after validation. It returns a RawSearchResult object which comprises a list of neo4j.Record objects and an optional metadata dictionary that can contain retriever-level information.

Note that, even though this method is not intended to be called from outside the class, we make it public to make it clearer for the developers that it should be implemented in child classes.

Returns:

List of Neo4j Records and optional metadata dict

Return type:

RawSearchResult

get_result_formatter()[source]

Returns the function to use to transform a neo4j.Record to a RetrieverResultItem.

Return type:

Callable[[Record], RetrieverResultItem]

default_record_formatter(record)[source]

Best effort to guess the node-to-text method. Inherited classes can override this method to implement custom text formatting.

Parameters:

record (Record)

Return type:

RetrieverResultItem
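
The division of labour between search, get_search_results, get_result_formatter and default_record_formatter can be sketched as follows. All types here are hypothetical stand-ins (plain dictionaries instead of neo4j.Record, small dataclasses instead of the library's result models), and the substring match merely plays the role of a real database query.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]  # hypothetical stand-in for neo4j.Record


@dataclass
class Item:
    """Stand-in for RetrieverResultItem."""

    content: str
    metadata: Optional[Dict[str, Any]] = None


@dataclass
class RawResult:
    """Stand-in for RawSearchResult: records plus optional metadata."""

    records: List[Record]
    metadata: Optional[Dict[str, Any]] = None


class SketchRetriever:
    def __init__(
        self,
        records: List[Record],
        result_formatter: Optional[Callable[[Record], Item]] = None,
    ) -> None:
        self._records = records
        self._result_formatter = result_formatter

    def get_search_results(self, query_text: str) -> RawResult:
        """Child-class hook: here, a naive substring match on the title."""
        hits = [r for r in self._records if query_text.lower() in r["title"].lower()]
        return RawResult(records=hits, metadata={"query": query_text})

    def default_record_formatter(self, record: Record) -> Item:
        """Fallback formatting: the string form of the whole record."""
        return Item(content=str(record))

    def get_result_formatter(self) -> Callable[[Record], Item]:
        return self._result_formatter or self.default_record_formatter

    def search(self, query_text: str) -> List[Item]:
        raw = self.get_search_results(query_text)
        formatter = self.get_result_formatter()
        return [formatter(record) for record in raw.records]


books = [{"title": "Dune", "score": 0.9}, {"title": "Emma", "score": 0.2}]
retriever = SketchRetriever(books, result_formatter=lambda r: Item(content=r["title"]))
print(retriever.search("dune"))  # [Item(content='Dune', metadata=None)]
```

When no custom formatter is supplied, the same search call falls back to default_record_formatter.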

VectorRetriever

class neo4j_graphrag.retrievers.VectorRetriever(driver, index_name, embedder=None, return_properties=None, result_formatter=None, neo4j_database=None)[source]

Provides retrieval method using vector search over embeddings. If an embedder is provided, it needs to have the required Embedder type.

Example:

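import neo4j
from neo4j_graphrag.retrievers import VectorRetriever

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

# custom_embedder is assumed to be an object implementing the Embedder interface.
retriever = VectorRetriever(driver, "vector-index-name", custom_embedder)
retriever.search(query_text="Find me a book about Fremen", top_k=5)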

or if the vector embedding of the query text is available:

retriever.search(query_vector=..., top_k=5)
Parameters:
  • driver (neo4j.Driver) – The Neo4j Python driver.

  • index_name (str) – Vector index name.

  • embedder (Optional[Embedder]) – Embedder object to embed query text.

  • return_properties (Optional[list[str]]) – List of node properties to return.

  • result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) –

    Provided custom function to transform a neo4j.Record to a RetrieverResultItem.

    Two variables are provided in the neo4j.Record:

    • node: Represents the node retrieved from the vector index search.

    • score: Denotes the similarity score.

  • neo4j_database (Optional[str]) – The name of the Neo4j database. Defaults to “neo4j” if not provided.

Raises:

RetrieverInitializationError – If validation of the input arguments fails.

search(query_vector=None, query_text=None, top_k=5, filters=None)

Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. See the following documentation for more details:

To query by text, an embedder must be provided when the class is instantiated. The embedder is not required if query_vector is passed.

Parameters:
  • query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.

  • query_text (Optional[str]) – The text to get the closest neighbors of. Defaults to None.

  • top_k (int) – The number of neighbors to return. Defaults to 5.

  • filters (Optional[dict[str, Any]]) – Filters for metadata pre-filtering. Defaults to None.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

RawSearchResult
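
To give a feel for what "top_k nearest neighbors" means, the sketch below ranks a handful of vectors by cosine similarity to a query vector. It is a brute-force illustration with hypothetical data and helper names; a real vector index typically uses an approximate search and whatever similarity function the index was created with.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_neighbors(
    query_vector: List[float], index: Dict[str, List[float]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Return the `top_k` (node_id, score) pairs with the highest similarity."""
    scored = [
        (node_id, cosine_similarity(query_vector, vector))
        for node_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional embeddings keyed by node id.
index = {"dune": [1.0, 0.0], "emma": [0.0, 1.0], "children_of_dune": [0.9, 0.1]}
results = top_k_neighbors([1.0, 0.0], index, top_k=2)
print([node_id for node_id, _ in results])  # ['dune', 'children_of_dune']
```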

VectorCypherRetriever

class neo4j_graphrag.retrievers.VectorCypherRetriever(driver, index_name, retrieval_query, embedder=None, result_formatter=None, neo4j_database=None)[source]

Provides retrieval method using vector similarity augmented by a Cypher query. This retriever builds on VectorRetriever. If an embedder is provided, it needs to have the required Embedder type.

Note: node is a variable from the base query that can be used in retrieval_query as seen in the example below.

The retrieval_query is additional Cypher that can allow for graph traversal after retrieving node.

Example:

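import neo4j
from neo4j_graphrag.retrievers import VectorCypherRetriever

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

# The trailing space keeps the two adjacent string literals from running together.
retrieval_query = "MATCH (node)-[:AUTHORED_BY]->(author:Author) " "RETURN author.name"
# custom_embedder is assumed to be an object implementing the Embedder interface.
retriever = VectorCypherRetriever(
    driver, "vector-index-name", retrieval_query, custom_embedder
)
retriever.search(query_text="Find me a book about Fremen", top_k=5)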
Parameters:
  • driver (neo4j.Driver) – The Neo4j Python driver.

  • index_name (str) – Vector index name.

  • retrieval_query (str) – Cypher query that gets appended to the base query.

  • embedder (Optional[Embedder]) – Embedder object to embed query text.

  • result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Provided custom function to transform a neo4j.Record to a RetrieverResultItem.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. Defaults to “neo4j” if not provided.

Read more in the User Guide.

search(query_vector=None, query_text=None, top_k=5, query_params=None, filters=None)

Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. See the following documentation for more details:

To query by text, an embedder must be provided when the class is instantiated. The embedder is not required if query_vector is passed.

Parameters:
  • query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.

  • query_text (Optional[str]) – The text to get the closest neighbors of. Defaults to None.

  • top_k (int) – The number of neighbors to return. Defaults to 5.

  • query_params (Optional[dict[str, Any]]) – Parameters for the Cypher query. Defaults to None.

  • filters (Optional[dict[str, Any]]) – Filters for metadata pre-filtering. Defaults to None.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

RawSearchResult
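
The relationship between query_vector, query_text and the embedder can be sketched as a small helper. This is an illustration only: resolve_query_vector is a hypothetical function, it assumes that exactly one of the two inputs should be given, and it raises ValueError where the library would raise its own exception types.

```python
from typing import Callable, List, Optional


def resolve_query_vector(
    query_vector: Optional[List[float]] = None,
    query_text: Optional[str] = None,
    embed: Optional[Callable[[str], List[float]]] = None,
) -> List[float]:
    """Return the vector to search with, embedding the text if necessary."""
    # Assumption for this sketch: exactly one of the two inputs must be given.
    if (query_vector is None) == (query_text is None):
        raise ValueError("Provide exactly one of query_vector or query_text")
    if query_vector is not None:
        return query_vector  # no embedder needed in this branch
    if embed is None:
        raise ValueError("An embedder is required to search by query_text")
    return embed(query_text)


# A trivial stand-in embedding function, used only to exercise the helper.
from_text = resolve_query_vector(query_text="Fremen", embed=lambda t: [float(len(t))])
from_vector = resolve_query_vector(query_vector=[0.1, 0.2])
print(from_text, from_vector)  # [6.0] [0.1, 0.2]
```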

HybridRetriever

class neo4j_graphrag.retrievers.HybridRetriever(driver, vector_index_name, fulltext_index_name, embedder=None, return_properties=None, result_formatter=None, neo4j_database=None)[source]

Provides retrieval method using combination of vector search over embeddings and fulltext search. If an embedder is provided, it needs to have the required Embedder type.

Example:

import neo4j
from neo4j_graphrag.retrievers import HybridRetriever

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

# custom_embedder is assumed to be an object implementing the Embedder interface.
retriever = HybridRetriever(
    driver, "vector-index-name", "fulltext-index-name", custom_embedder
)
retriever.search(query_text="Find me a book about Fremen", top_k=5)
Parameters:
  • driver (neo4j.Driver) – The Neo4j Python driver.

  • vector_index_name (str) – Vector index name.

  • fulltext_index_name (str) – Fulltext index name.

  • embedder (Optional[Embedder]) – Embedder object to embed query text.

  • return_properties (Optional[list[str]]) – List of node properties to return.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. Defaults to “neo4j” if not provided.

  • result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) –

    Provided custom function to transform a neo4j.Record to a RetrieverResultItem.

    Two variables are provided in the neo4j.Record:

    • node: Represents the node retrieved from the vector index search.

    • score: Denotes the similarity score.

search(query_text, query_vector=None, top_k=5)

Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search.

See the following documentation for more details:

To query by text, an embedder must be provided when the class is instantiated.

Parameters:
  • query_text (str) – The text to get the closest neighbors of.

  • query_vector (Optional[list[float]], optional) – The vector embeddings to get the closest neighbors of. Defaults to None.

  • top_k (int, optional) – The number of neighbors to return. Defaults to 5.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

RawSearchResult
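
Combining a vector search with a fulltext search requires some way of putting two differently scaled score lists on a common footing. The sketch below shows one plausible scheme: normalise each list by its best score, keep the higher score when a node appears in both, then take the top_k. The function names and sample scores are hypothetical, and the library's actual ranking may differ.

```python
from typing import Dict, List, Tuple


def normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Divide every score by the largest one so the best hit scores 1.0."""
    top = max(scores.values(), default=0.0)
    return {node_id: (score / top if top else 0.0) for node_id, score in scores.items()}


def merge_hybrid(
    vector_scores: Dict[str, float], fulltext_scores: Dict[str, float], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Merge two score lists, keeping the best normalised score per node."""
    merged: Dict[str, float] = {}
    for scores in (normalise(vector_scores), normalise(fulltext_scores)):
        for node_id, score in scores.items():
            merged[node_id] = max(score, merged.get(node_id, 0.0))
    ranked = sorted(merged.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical raw scores: similarity in [0, 1] and an unbounded fulltext score.
vector_scores = {"dune": 0.92, "emma": 0.40}
fulltext_scores = {"dune_messiah": 6.0, "dune": 3.0}
ranked = merge_hybrid(vector_scores, fulltext_scores, top_k=2)
print([node_id for node_id, _ in ranked])  # ['dune', 'dune_messiah']
```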

HybridCypherRetriever

class neo4j_graphrag.retrievers.HybridCypherRetriever(driver, vector_index_name, fulltext_index_name, retrieval_query, embedder=None, result_formatter=None, neo4j_database=None)[source]

Provides retrieval method using combination of vector search over embeddings and fulltext search, augmented by a Cypher query. This retriever builds on HybridRetriever. If an embedder is provided, it needs to have the required Embedder type.

Note: node is a variable from the base query that can be used in retrieval_query as seen in the example below.

Example:

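import neo4j
from neo4j_graphrag.retrievers import HybridCypherRetriever

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

# The trailing space keeps the two adjacent string literals from running together.
retrieval_query = "MATCH (node)-[:AUTHORED_BY]->(author:Author) " "RETURN author.name"
# custom_embedder is assumed to be an object implementing the Embedder interface.
retriever = HybridCypherRetriever(
    driver, "vector-index-name", "fulltext-index-name", retrieval_query, custom_embedder
)
retriever.search(query_text="Find me a book about Fremen", top_k=5)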

To query by text, an embedder must be provided when the class is instantiated.

Parameters:
  • driver (neo4j.Driver) – The Neo4j Python driver.

  • vector_index_name (str) – Vector index name.

  • fulltext_index_name (str) – Fulltext index name.

  • retrieval_query (str) – Cypher query that gets appended to the base query.

  • embedder (Optional[Embedder]) – Embedder object to embed query text.

  • result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Provided custom function to transform a neo4j.Record to a RetrieverResultItem.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. Defaults to “neo4j” if not provided.

Raises:

RetrieverInitializationError – If validation of the input arguments fails.

search(query_text, query_vector=None, top_k=5, query_params=None)

Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search.

See the following documentation for more details:

Parameters:
  • query_text (str) – The text to get the closest neighbors of.

  • query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.

  • top_k (int) – The number of neighbors to return. Defaults to 5.

  • query_params (Optional[dict[str, Any]]) – Parameters for the Cypher query. Defaults to None.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

RawSearchResult

Text2CypherRetriever

class neo4j_graphrag.retrievers.Text2CypherRetriever(driver, llm, neo4j_schema=None, examples=None, result_formatter=None, custom_prompt=None)[source]

Allows for the retrieval of records from a Neo4j database using natural language. Converts a user’s natural language query to a Cypher query using an LLM, then retrieves records from a Neo4j database using the generated Cypher query.

Parameters:
  • driver (neo4j.Driver) – The Neo4j Python driver.

  • llm (neo4j_graphrag.generation.llm.LLMInterface) – LLM object to generate the Cypher query.

  • neo4j_schema (Optional[str]) – Neo4j schema used to generate the Cypher query.

  • examples (Optional[list[str]], optional) – Optional user input/query pairs for the LLM to use as examples.

  • custom_prompt (Optional[str]) – Optional custom prompt to use instead of the auto-generated prompt. If provided, the neo4j_schema and examples arguments are not included in it.

  • result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]])

Raises:

RetrieverInitializationError – If validation of the input arguments fails.

search(query_text, prompt_params=None)

Converts query_text to a Cypher query using an LLM, then retrieves records from a Neo4j database using the generated Cypher query.

Parameters:
  • query_text (str) – The natural language query used to search the Neo4j database.

  • prompt_params (Dict[str, Any]) – Additional values to inject into the custom prompt, if one is provided. Example: {‘schema’: ‘this is the graph schema’}

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

RawSearchResult

External Retrievers

This section includes retrievers that integrate with databases external to Neo4j.

WeaviateNeo4jRetriever

PineconeNeo4jRetriever

QdrantNeo4jRetriever

Embedder

class neo4j_graphrag.embeddings.base.Embedder[source]

Interface for embedding models. An embedder passed into a retriever must implement this interface.

abstract embed_query(text)[source]

Embed query text.

Parameters:

text (str) – Text to convert to vector embedding

Returns:

A vector embedding.

Return type:

list[float]
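
Implementing the interface amounts to providing a single embed_query method that turns text into a list of floats. The sketch below uses a local abstract base class, EmbedderSketch, as a stand-in so that it runs without the library installed; HashEmbedder is a toy that derives numbers from a hash and carries no semantic meaning, so it is only suitable for wiring tests.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class EmbedderSketch(ABC):
    """Stand-in mirroring the interface described above: one abstract method."""

    @abstractmethod
    def embed_query(self, text: str) -> List[float]:
        ...


class HashEmbedder(EmbedderSketch):
    """Toy embedder: derives a fixed-length vector from a SHA-256 digest."""

    def __init__(self, dimensions: int = 8) -> None:
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def embed_query(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale each byte (0..255) into the range [0.0, 1.0].
        return [byte / 255 for byte in digest[: self.dimensions]]


vector = HashEmbedder(dimensions=4).embed_query("Find me a book about Fremen")
print(len(vector))  # 4
```

In real use the class would subclass the library's Embedder and call an actual embedding model inside embed_query.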

SentenceTransformerEmbeddings

class neo4j_graphrag.embeddings.sentence_transformers.SentenceTransformerEmbeddings(model='all-MiniLM-L6-v2', *args, **kwargs)[source]

embed_query(text)[source]

Embed query text.

Parameters:

text (str) – Text to convert to vector embedding

Returns:

A vector embedding.

Return type:

list[float]

OpenAIEmbeddings

class neo4j_graphrag.embeddings.openai.OpenAIEmbeddings(model='text-embedding-ada-002', **kwargs)[source]

OpenAI embeddings class. This class uses the OpenAI python client to generate embeddings for text data.

Parameters:
  • model (str) – The name of the OpenAI embedding model to use. Defaults to “text-embedding-ada-002”.

  • kwargs (Any) – All other parameters will be passed to the openai.OpenAI init.

AzureOpenAIEmbeddings

class neo4j_graphrag.embeddings.openai.AzureOpenAIEmbeddings(model='text-embedding-ada-002', **kwargs)[source]

Azure OpenAI embeddings class. This class uses the Azure OpenAI python client to generate embeddings for text data.

Parameters:
  • model (str) – The name of the Azure OpenAI embedding model to use. Defaults to “text-embedding-ada-002”.

  • kwargs (Any) – All other parameters will be passed to the openai.AzureOpenAI init.

VertexAIEmbeddings

class neo4j_graphrag.embeddings.vertexai.VertexAIEmbeddings(model='text-embedding-004')[source]

Vertex AI embeddings class. This class uses the Vertex AI Python client to generate vector embeddings for text data.

Parameters:

model (str) – The name of the Vertex AI text embedding model to use. Defaults to “text-embedding-004”.

embed_query(text, task_type='RETRIEVAL_QUERY', **kwargs)[source]

Generate embeddings for a given query using a Vertex AI text embedding model.

Return type:

list[float]

MistralAIEmbeddings

class neo4j_graphrag.embeddings.mistral.MistralAIEmbeddings(model='mistral-embed', **kwargs)[source]

Mistral AI embeddings class. This class uses the Mistral AI Python client to generate vector embeddings for text data.

Parameters:
  • model (str) – The name of the Mistral AI text embedding model to use. Defaults to “mistral-embed”.

  • kwargs (Any)

embed_query(text, **kwargs)[source]

Generate embeddings for a given query using a Mistral AI text embedding model.

Parameters:
  • text (str) – The text to generate an embedding for.

  • **kwargs (Any) – Additional keyword arguments to pass to the Mistral AI client.

Return type:

list[float]

CohereEmbeddings

class neo4j_graphrag.embeddings.cohere.CohereEmbeddings(model='', **kwargs)[source]
Parameters:
  • model (str)

  • kwargs (Any)

embed_query(text, **kwargs)[source]

Embed query text.

Parameters:
  • text (str) – Text to convert to vector embedding

  • kwargs (Any)

Returns:

A vector embedding.

Return type:

list[float]

Generation

LLM

LLMInterface

class neo4j_graphrag.llm.LLMInterface(model_name, model_params=None, **kwargs)[source]

Interface for large language models.

Parameters:
  • model_name (str) – The name of the language model.

  • model_params (Optional[dict], optional) – Additional parameters passed to the model when text is sent to it. Defaults to None.

  • **kwargs (Any) – Arguments passed to the model when the class is initialised. Defaults to None.

abstract invoke(input)[source]

Sends a text input to the LLM and retrieves a response.

Parameters:

input (str) – Text sent to the LLM

Returns:

The response from the LLM.

Return type:

LLMResponse

Raises:

LLMGenerationError – If anything goes wrong.

abstract async ainvoke(input)[source]

Asynchronously sends a text input to the LLM and retrieves a response.

Parameters:

input (str) – Text sent to the LLM

Returns:

The response from the LLM.

Return type:

LLMResponse

Raises:

LLMGenerationError – If anything goes wrong.
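
A custom LLM needs both a synchronous invoke and an asynchronous ainvoke. The sketch below mirrors that shape with local stand-ins (LLMSketch and Response are hypothetical and replace the library's LLMInterface and LLMResponse so the code runs on its own). EchoLLM simply echoes its input; delegating ainvoke to a worker thread is one possible design, not necessarily what the library's implementations do.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Response:
    """Stand-in for LLMResponse."""

    content: str


class LLMSketch(ABC):
    """Stand-in for the interface: a model name, parameters, and two methods."""

    def __init__(self, model_name: str, model_params: Optional[Dict[str, Any]] = None) -> None:
        self.model_name = model_name
        self.model_params = model_params or {}

    @abstractmethod
    def invoke(self, input: str) -> Response:
        ...

    @abstractmethod
    async def ainvoke(self, input: str) -> Response:
        ...


class EchoLLM(LLMSketch):
    def invoke(self, input: str) -> Response:
        return Response(content=f"[{self.model_name}] {input}")

    async def ainvoke(self, input: str) -> Response:
        # Run the blocking call in a worker thread so the event loop stays free.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.invoke, input)


llm = EchoLLM("echo-model")
sync_answer = llm.invoke("Who is the mother of Paul Atreides?")
async_answer = asyncio.run(llm.ainvoke("Who is the mother of Paul Atreides?"))
print(sync_answer.content)  # [echo-model] Who is the mother of Paul Atreides?
```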

OpenAILLM

class neo4j_graphrag.llm.openai_llm.OpenAILLM(model_name, model_params=None, **kwargs)[source]
Parameters:
  • model_name (str)

  • model_params (Optional[dict[str, Any]])

  • kwargs (Any)

AzureOpenAILLM

class neo4j_graphrag.llm.openai_llm.AzureOpenAILLM(model_name, model_params=None, **kwargs)[source]
Parameters:
  • model_name (str)

  • model_params (Optional[dict[str, Any]])

  • kwargs (Any)

VertexAILLM

class neo4j_graphrag.llm.vertexai_llm.VertexAILLM(model_name='gemini-1.5-flash-001', model_params=None, **kwargs)[source]

Interface for large language models on Vertex AI

Parameters:
  • model_name (str, optional) – Name of the LLM to use. Defaults to “gemini-1.5-flash-001”.

  • model_params (Optional[dict], optional) – Additional parameters passed to the model when text is sent to it. Defaults to None.

  • **kwargs (Any) – Arguments passed to the model when the class is initialised. Defaults to None.

Raises:

LLMGenerationError – If there’s an error generating the response from the model.

Example:

from neo4j_graphrag.llm import VertexAILLM
from vertexai.generative_models import GenerationConfig

generation_config = GenerationConfig(temperature=0.0)
llm = VertexAILLM(
    model_name="gemini-1.5-flash-001", generation_config=generation_config
)
llm.invoke("Who is the mother of Paul Atreides?")
invoke(input)[source]

Sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.

Returns:

The response from the LLM.

Return type:

LLMResponse

async ainvoke(input)[source]

Asynchronously sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.

Returns:

The response from the LLM.

Return type:

LLMResponse

AnthropicLLM

class neo4j_graphrag.llm.anthropic_llm.AnthropicLLM(model_name, model_params=None, **kwargs)[source]

Interface for large language models on Anthropic

Parameters:
  • model_name (str) – Name of the LLM to use.

  • model_params (Optional[dict], optional) – Additional parameters passed to the model when text is sent to it. Defaults to None.

  • **kwargs (Any) – Arguments passed to the model when the class is initialised. Defaults to None.

Raises:

LLMGenerationError – If there’s an error generating the response from the model.

Example:

from neo4j_graphrag.llm import AnthropicLLM

llm = AnthropicLLM(
    model_name="claude-3-opus-20240229",
    model_params={"max_tokens": 1000},
    api_key="sk...",   # can also be read from env vars
)
llm.invoke("Who is the mother of Paul Atreides?")
invoke(input)[source]

Sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.

Returns:

The response from the LLM.

Return type:

LLMResponse

async ainvoke(input)[source]

Asynchronously sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.

Returns:

The response from the LLM.

Return type:

LLMResponse

CohereLLM

class neo4j_graphrag.llm.cohere_llm.CohereLLM(model_name='', model_params=None, **kwargs)[source]

Interface for large language models on the Cohere platform

Parameters:
  • model_name (str, optional) – Name of the LLM to use. Defaults to an empty string.

  • model_params (Optional[dict], optional) – Additional parameters passed to the model when text is sent to it. Defaults to None.

  • **kwargs (Any) – Arguments passed to the model when the class is initialised. Defaults to None.

Raises:

LLMGenerationError – If there’s an error generating the response from the model.

Example:

from neo4j_graphrag.llm import CohereLLM

llm = CohereLLM(api_key="...")
llm.invoke("Say something")
invoke(input)[source]

Sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.

Returns:

The response from the LLM.

Return type:

LLMResponse

async ainvoke(input)[source]

Asynchronously sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.

Returns:

The response from the LLM.

Return type:

LLMResponse

MistralAILLM

class neo4j_graphrag.llm.mistralai_llm.MistralAILLM(model_name, model_params=None, **kwargs)[source]
Parameters:
  • model_name (str)

  • model_params (Optional[dict[str, Any]])

  • kwargs (Any)

get_messages(input)[source]
Parameters:

input (str)

Return type:

list[MessageType]

invoke(input)[source]

Sends a text input to the Mistral chat completion model and returns the response’s content.

Parameters:

input (str) – Text sent to the LLM

Returns:

The response from MistralAI.

Return type:

LLMResponse

Raises:

LLMGenerationError – If anything goes wrong.

async ainvoke(input)[source]

Asynchronously sends a text input to the MistralAI chat completion model and returns the response’s content.

Parameters:

input (str) – Text sent to the LLM

Returns:

The response from MistralAI.

Return type:

LLMResponse

Raises:

LLMGenerationError – If anything goes wrong.
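
The invoke/ainvoke pair documented for the LLM classes above can be illustrated with a small stand-in. This is a minimal sketch, not library code: StubLLM and ask_all are hypothetical names, and the stub returns plain strings rather than LLMResponse objects. It shows why the asynchronous variant is useful: several requests can be awaited concurrently.

```python
import asyncio


class StubLLM:
    """Hypothetical stand-in exposing the same invoke/ainvoke pair."""

    def invoke(self, input: str) -> str:
        # A real implementation would call the model provider here.
        return f"echo: {input}"

    async def ainvoke(self, input: str) -> str:
        await asyncio.sleep(0)  # placeholder for a non-blocking network call
        return self.invoke(input)


async def ask_all(llm: StubLLM, questions: list[str]) -> list[str]:
    # Schedule all requests concurrently and collect answers in input order.
    return await asyncio.gather(*(llm.ainvoke(q) for q in questions))


answers = asyncio.run(ask_all(StubLLM(), ["first", "second"]))
print(answers)
```

With a real LLM class, the same pattern would apply to its ainvoke method, assuming the provider client supports concurrent requests.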

PromptTemplate

class neo4j_graphrag.generation.prompts.PromptTemplate(template=None, expected_inputs=None)[source]

This class is used to generate a parameterized prompt. It is defined from a string (the template) using the Python format syntax (parameters between curly braces {}) and a list of required inputs. Before sending the instructions to an LLM, call the format method, which replaces the parameters with the provided values. If any of the expected inputs is missing, a PromptMissingInputError is raised.

Parameters:
  • template (Optional[str])

  • expected_inputs (Optional[list[str]])

DEFAULT_TEMPLATE: str = ''
EXPECTED_INPUTS: list[str] = []
format(*args, **kwargs)[source]

This method is used to replace parameters with the provided values. Parameters must be provided:
  • as kwargs

  • as args if using the same order as in the expected inputs

Example:

prompt_template = PromptTemplate(
    template='''Explain the following concept to {target_audience}:
    Concept: {concept}
    Answer:
    ''',
    expected_inputs=['target_audience', 'concept']
)
prompt = prompt_template.format('12 yo children', concept='graph database')
print(prompt)

# Result:
# '''Explain the following concept to 12 yo children:
# Concept: graph database
# Answer:
# '''
Return type:

str
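
The argument handling described above can be sketched in plain Python. The class below is a hypothetical stand-in (MiniPromptTemplate and MissingInputError are illustrative names, not part of the package). It assumes positional arguments are matched to the expected inputs in order and that a missing input raises an error, as the description states; the library's actual implementation may differ.

```python
class MissingInputError(Exception):
    """Hypothetical stand-in for PromptMissingInputError."""


class MiniPromptTemplate:
    """Illustrative sketch only; not the library's implementation."""

    def __init__(self, template: str, expected_inputs: list[str]) -> None:
        self.template = template
        self.expected_inputs = expected_inputs

    def format(self, *args: str, **kwargs: str) -> str:
        # Positional values are paired with expected inputs in declaration order.
        values = dict(zip(self.expected_inputs, args))
        # Keyword values fill in (or override) the remaining inputs.
        values.update(kwargs)
        missing = [name for name in self.expected_inputs if name not in values]
        if missing:
            raise MissingInputError(f"Missing inputs: {missing}")
        return self.template.format(**values)


template = MiniPromptTemplate(
    template="Explain {concept} to {target_audience}.",
    expected_inputs=["target_audience", "concept"],
)
prompt = template.format("12 yo children", concept="graph database")
print(prompt)
```

Calling template.format("12 yo children") without the concept raises MissingInputError in this sketch.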

RagTemplate

class neo4j_graphrag.generation.prompts.RagTemplate(template=None, expected_inputs=None)[source]
Parameters:
  • template (Optional[str])

  • expected_inputs (Optional[list[str]])

DEFAULT_TEMPLATE: str = 'Answer the user question using the following context\n\nContext:\n{context}\n\nExamples:\n{examples}\n\nQuestion:\n{query_text}\n\nAnswer:\n'
EXPECTED_INPUTS: list[str] = ['context', 'query_text', 'examples']
format(query_text, context, examples)[source]

This method is used to replace parameters with the provided values. Parameters must be provided:
  • as kwargs

  • as args if using the same order as in the expected inputs

Example:

prompt_template = PromptTemplate(
    template='''Explain the following concept to {target_audience}:
    Concept: {concept}
    Answer:
    ''',
    expected_inputs=['target_audience', 'concept']
)
prompt = prompt_template.format('12 yo children', concept='graph database')
print(prompt)

# Result:
# '''Explain the following concept to 12 yo children:
# Concept: graph database
# Answer:
# '''
Parameters:
  • query_text (str)

  • context (str)

  • examples (str)

Return type:

str
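
To see what the default RAG prompt looks like once filled in, the documented DEFAULT_TEMPLATE string can be formatted directly with str.format. This is a sketch that assumes RagTemplate.format substitutes the three inputs in the same way; the context sentence is an illustrative value, not retriever output.

```python
# Copied from RagTemplate.DEFAULT_TEMPLATE as documented above.
DEFAULT_TEMPLATE = (
    "Answer the user question using the following context\n\n"
    "Context:\n{context}\n\n"
    "Examples:\n{examples}\n\n"
    "Question:\n{query_text}\n\n"
    "Answer:\n"
)

prompt = DEFAULT_TEMPLATE.format(
    query_text="Who is the mother of Paul Atreides?",
    context="Lady Jessica is the mother of Paul Atreides.",  # illustrative context
    examples="",  # no few-shot examples supplied
)
print(prompt)
```

Note that an empty examples value still leaves the "Examples:" heading in the rendered prompt.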

ERExtractionTemplate

class neo4j_graphrag.generation.prompts.ERExtractionTemplate(template=None, expected_inputs=None)[source]
Parameters:
  • template (Optional[str])

  • expected_inputs (Optional[list[str]])

DEFAULT_TEMPLATE: str = '\nYou are a top-tier algorithm designed for extracting\ninformation in structured formats to build a knowledge graph.\n\nExtract the entities (nodes) and specify their type from the following text.\nAlso extract the relationships between these nodes.\n\nReturn result as JSON using the following format:\n{{"nodes": [ {{"id": "0", "label": "Person", "properties": {{"name": "John"}} }}],\n"relationships": [{{"type": "KNOWS", "start_node_id": "0", "end_node_id": "1", "properties": {{"since": "2024-08-01"}} }}] }}\n\nUse only the following nodes and relationships (if provided):\n{schema}\n\nAssign a unique ID (string) to each node, and reuse it to define relationships.\nDo respect the source and target node types for relationship and\nthe relationship direction.\n\nDo not return any additional information other than the JSON in it.\n\nExamples:\n{examples}\n\nInput text:\n\n{text}\n'
EXPECTED_INPUTS: list[str] = ['text']
format(schema, examples, text='')[source]

This method is used to replace parameters with the provided values. Parameters must be provided:
  • as kwargs

  • as args if using the same order as in the expected inputs

Example:

prompt_template = PromptTemplate(
    template='''Explain the following concept to {target_audience}:
    Concept: {concept}
    Answer:
    ''',
    expected_inputs=['target_audience', 'concept']
)
prompt = prompt_template.format('12 yo children', concept='graph database')
print(prompt)

# Result:
# '''Explain the following concept to 12 yo children:
# Concept: graph database
# Answer:
# '''
Return type:

str
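
The doubled braces in DEFAULT_TEMPLATE are worth a note: under Python's format syntax, {{ and }} render as literal braces, which is how the JSON example survives formatting while {schema}, {examples} and {text} are substituted. The sketch below uses a shortened, hypothetical template in the same style to show the effect.

```python
import json

# Shortened, hypothetical template in the same style as DEFAULT_TEMPLATE.
template = (
    "Return result as JSON using the following format:\n"
    '{{"nodes": [{{"id": "0", "label": "Person"}}]}}\n\n'
    "Input text:\n\n{text}\n"
)

prompt = template.format(text="John knows Mary.")

# The second line of the rendered prompt is now valid JSON with single braces.
json_line = prompt.splitlines()[1]
print(json_line)
print(json.loads(json_line))
```

A custom template passed to ERExtractionTemplate would need the same escaping for any literal braces it contains.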

RAG

GraphRAG

class neo4j_graphrag.generation.graphrag.GraphRAG(retriever, llm, prompt_template=<neo4j_graphrag.generation.prompts.RagTemplate object>)[source]

Performs a GraphRAG search using a specific retriever and LLM.

Example:

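import neo4j
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm.openai_llm import OpenAILLM
from neo4j_graphrag.retrievers import VectorRetriever

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

# Any embedder can be used here; OpenAIEmbeddings is shown for illustration
custom_embedder = OpenAIEmbeddings(model="text-embedding-3-large")
retriever = VectorRetriever(driver, "vector-index-name", custom_embedder)
llm = OpenAILLM(model_name="gpt-4o")
graph_rag = GraphRAG(retriever, llm)
graph_rag.search(query_text="Find me a book about Fremen")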
Parameters:
  • retriever (Retriever) – The retriever used to find relevant context to pass to the LLM.

  • llm (LLMInterface) – The LLM used to generate the answer.

  • prompt_template (RagTemplate) – The prompt template that will be formatted with context and user question and passed to the LLM.

Raises:

RagInitializationError – If validation of the input arguments fails.

search(query_text='', examples='', retriever_config=None, return_context=None)[source]

Warning

The default value of ‘return_context’ will change from ‘False’ to ‘True’ in a future version.

This method performs a full RAG search:
  1. Retrieval: context retrieval

  2. Augmentation: prompt formatting

  3. Generation: answer generation with LLM

Parameters:
  • query_text (str) – The user question

  • examples (str) – Examples added to the LLM prompt.

  • retriever_config (Optional[dict]) – Parameters passed to the retriever search method; e.g.: top_k

  • return_context (bool) – Whether to append the retriever result to the final result (default: False)

Returns:

The LLM-generated answer

Return type:

RagResultModel
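
The three steps listed for search can be sketched end to end with stand-ins. Everything below is hypothetical (StubRetriever, StubLLM, rag_search and the shortened TEMPLATE are illustrative names, not the library's implementation); the sketch only shows how retriever_config reaches the retriever and how the retrieved context ends up in the prompt.

```python
from dataclasses import dataclass

TEMPLATE = (
    "Context:\n{context}\n\nExamples:\n{examples}\n\n"
    "Question:\n{query_text}\n\nAnswer:\n"
)


@dataclass
class StubRetriever:
    """Hypothetical retriever: returns documents sharing a word with the query."""

    documents: list[str]

    def search(self, query_text: str, top_k: int = 5) -> list[str]:
        words = set(query_text.lower().split())
        hits = [d for d in self.documents if words & set(d.lower().split())]
        return hits[:top_k]


class StubLLM:
    """Hypothetical LLM: reports the prompt size instead of calling a model."""

    def invoke(self, prompt: str) -> str:
        return f"(stub answer based on a {len(prompt)}-character prompt)"


def rag_search(retriever, llm, query_text, examples="", retriever_config=None):
    # 1. Retrieval: fetch context, forwarding retriever_config (e.g. top_k).
    items = retriever.search(query_text, **(retriever_config or {}))
    # 2. Augmentation: fill the prompt template with context and question.
    prompt = TEMPLATE.format(
        context="\n".join(items), examples=examples, query_text=query_text
    )
    # 3. Generation: send the formatted prompt to the LLM.
    return {"answer": llm.invoke(prompt), "context": items}


retriever = StubRetriever(["Dune is a novel about Fremen", "Neo4j is a graph database"])
result = rag_search(
    retriever, StubLLM(), "books about Fremen", retriever_config={"top_k": 1}
)
print(result["context"])
print(result["answer"])
```

Returning the retrieved items next to the answer mirrors the purpose of return_context: the caller can inspect what the answer was based on.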

Database Interaction

neo4j_graphrag.indexes.create_vector_index(driver, name, label, embedding_property, dimensions, similarity_fn, fail_if_exists=False, neo4j_database=None)[source]

This method constructs a Cypher query and executes it to create a new vector index in Neo4j.

See Cypher manual on creating vector indexes.

Ensure that the index name provided is unique within the database context.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import create_vector_index

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "vector-index-name"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Creating the index
create_vector_index(
    driver,
    INDEX_NAME,
    label="Document",
    embedding_property="vectorProperty",
    dimensions=1536,
    similarity_fn="euclidean",
    fail_if_exists=False,
)
Parameters:
  • driver (neo4j.Driver) – Neo4j Python driver instance.

  • name (str) – The unique name of the index.

  • label (str) – The node label to be indexed.

  • embedding_property (str) – The property key of a node which contains embedding values.

  • dimensions (int) – Vector embedding dimension

  • similarity_fn (str) – The vector similarity function to use, either “euclidean” or “cosine” (case-insensitive).

  • fail_if_exists (bool) – If True raise an error if the index already exists. Defaults to False.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. If not provided, this defaults to “neo4j” in the database (see reference to documentation).

Raises:
  • ValueError – If validation of the input arguments fails.

  • neo4j.exceptions.ClientError – If creation of vector index fails.

Return type:

None
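
For readers curious about what "constructs a Cypher query" involves, the sketch below builds a statement of the kind such a call might produce. It is an assumption-laden illustration: build_vector_index_query is a hypothetical helper, and the exact query text generated by the library may differ. It does show how fail_if_exists and the case-insensitive similarity_fn check could map onto Cypher.

```python
VALID_SIMILARITY_FNS = {"euclidean", "cosine"}


def build_vector_index_query(
    label: str,
    embedding_property: str,
    similarity_fn: str,
    fail_if_exists: bool = False,
) -> str:
    """Return an illustrative CREATE VECTOR INDEX statement (hypothetical helper)."""
    if similarity_fn.lower() not in VALID_SIMILARITY_FNS:
        raise ValueError(f"Unsupported similarity function: {similarity_fn!r}")
    # Without IF NOT EXISTS, the server raises an error when the index already exists.
    guard = "" if fail_if_exists else "IF NOT EXISTS "
    # Labels and property keys cannot be query parameters, so they are
    # interpolated here; a real implementation must validate or escape them.
    return (
        f"CREATE VECTOR INDEX $name {guard}"
        f"FOR (n:{label}) ON n.{embedding_property} "
        "OPTIONS { indexConfig: { "
        "`vector.dimensions`: toInteger($dimensions), "
        "`vector.similarity_function`: $similarity_fn } }"
    )


query = build_vector_index_query("Document", "vectorProperty", "Euclidean")
print(query)
```

The remaining values (name, dimensions, similarity_fn) stay as query parameters in this sketch and would be supplied when the statement is executed.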

neo4j_graphrag.indexes.create_fulltext_index(driver, name, label, node_properties, fail_if_exists=False, neo4j_database=None)[source]

This method constructs a Cypher query and executes it to create a new fulltext index in Neo4j.

See Cypher manual on creating fulltext indexes.

Ensure that the index name provided is unique within the database context.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import create_fulltext_index

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "fulltext-index-name"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Creating the index
create_fulltext_index(
    driver,
    INDEX_NAME,
    label="Document",
    node_properties=["vectorProperty"],
    fail_if_exists=False,
)
Parameters:
  • driver (neo4j.Driver) – Neo4j Python driver instance.

  • name (str) – The unique name of the index.

  • label (str) – The node label to be indexed.

  • node_properties (list[str]) – The node properties to create the fulltext index on.

  • fail_if_exists (bool) – If True raise an error if the index already exists. Defaults to False.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. If not provided, this defaults to “neo4j” in the database (see reference to documentation).

Raises:
  • ValueError – If validation of the input arguments fails.

  • neo4j.exceptions.ClientError – If creation of fulltext index fails.

Return type:

None

neo4j_graphrag.indexes.drop_index_if_exists(driver, name, neo4j_database=None)[source]

This method constructs a Cypher query and executes it to drop an index in Neo4j, if the index exists. See Cypher manual on dropping vector indexes.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import drop_index_if_exists

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "fulltext-index-name"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Dropping the index if it exists
drop_index_if_exists(
    driver,
    INDEX_NAME,
)
Parameters:
  • driver (neo4j.Driver) – Neo4j Python driver instance.

  • name (str) – The name of the index to delete.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. If not provided, this defaults to “neo4j” in the database (see reference to documentation).

Raises:

neo4j.exceptions.ClientError – If dropping of index fails.

Return type:

None

neo4j_graphrag.indexes.upsert_vector(driver, node_id, embedding_property, vector, neo4j_database=None)[source]

This method constructs a Cypher query and executes it to upsert (insert or update) a vector property on a specific node.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import upsert_vector

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Upsert the vector data
upsert_vector(
    driver,
    node_id="nodeId",
    embedding_property="vectorProperty",
    vector=...,
)
Parameters:
  • driver (neo4j.Driver) – Neo4j Python driver instance.

  • node_id (int) – The id of the node.

  • embedding_property (str) – The name of the property to store the vector in.

  • vector (list[float]) – The vector to store.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. If not provided, this defaults to “neo4j” in the database (see reference to documentation).

Raises:

Neo4jInsertionError – If upserting of the vector fails.

Return type:

None

neo4j_graphrag.indexes.upsert_vector_on_relationship(driver, rel_id, embedding_property, vector, neo4j_database=None)[source]

This method constructs a Cypher query and executes it to upsert (insert or update) a vector property on a specific relationship.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import upsert_vector_on_relationship

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Upsert the vector data
upsert_vector_on_relationship(
    driver,
    rel_id="relId",
    embedding_property="vectorProperty",
    vector=...,
)
Parameters:
  • driver (neo4j.Driver) – Neo4j Python driver instance.

  • rel_id (int) – The id of the relationship.

  • embedding_property (str) – The name of the property to store the vector in.

  • vector (list[float]) – The vector to store.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. If not provided, this defaults to “neo4j” in the database (see reference to documentation).

Raises:

Neo4jInsertionError – If upserting of the vector fails.

Return type:

None

async neo4j_graphrag.indexes.async_upsert_vector(driver, node_id, embedding_property, vector, neo4j_database=None)[source]

This method constructs a Cypher query and asynchronously executes it to upsert (insert or update) a vector property on a specific node.

Example:

import asyncio

from neo4j import AsyncGraphDatabase
from neo4j_graphrag.indexes import async_upsert_vector

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")


async def main():
    # Connect to Neo4j database
    driver = AsyncGraphDatabase.driver(URI, auth=AUTH)

    # Upsert the vector data (the coroutine must be awaited)
    await async_upsert_vector(
        driver,
        node_id="nodeId",
        embedding_property="vectorProperty",
        vector=...,
    )
    await driver.close()


asyncio.run(main())
Parameters:
  • driver (neo4j.AsyncDriver) – Neo4j Python asynchronous driver instance.

  • node_id (int) – The id of the node.

  • embedding_property (str) – The name of the property to store the vector in.

  • vector (list[float]) – The vector to store.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. If not provided, this defaults to “neo4j” in the database (see reference to documentation).

Raises:

Neo4jInsertionError – If upserting of the vector fails.

Return type:

None

async neo4j_graphrag.indexes.async_upsert_vector_on_relationship(driver, rel_id, embedding_property, vector, neo4j_database=None)[source]

This method constructs a Cypher query and asynchronously executes it to upsert (insert or update) a vector property on a specific relationship.

Example:

import asyncio

from neo4j import AsyncGraphDatabase
from neo4j_graphrag.indexes import async_upsert_vector_on_relationship

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")


async def main():
    # Connect to Neo4j database
    driver = AsyncGraphDatabase.driver(URI, auth=AUTH)

    # Upsert the vector data (the coroutine must be awaited)
    await async_upsert_vector_on_relationship(
        driver,
        rel_id="relId",
        embedding_property="vectorProperty",
        vector=...,
    )
    await driver.close()


asyncio.run(main())
Parameters:
  • driver (neo4j.AsyncDriver) – Neo4j Python asynchronous driver instance.

  • rel_id (int) – The id of the relationship.

  • embedding_property (str) – The name of the property to store the vector in.

  • vector (list[float]) – The vector to store.

  • neo4j_database (Optional[str]) –

    The name of the Neo4j database. If not provided, this defaults to “neo4j” in the database (see reference to documentation).

Raises:

Neo4jInsertionError – If upserting of the vector fails.

Return type:

None

Errors

Neo4jGraphRagError

class neo4j_graphrag.exceptions.Neo4jGraphRagError[source]

Bases: Exception

Global exception used for the neo4j-graphrag package.

RetrieverInitializationError

class neo4j_graphrag.exceptions.RetrieverInitializationError(errors)[source]

Bases: Neo4jGraphRagError

Exception raised when initialization of a retriever fails.

Parameters:

errors (list[ErrorDetails])

SearchValidationError

class neo4j_graphrag.exceptions.SearchValidationError(errors)[source]

Bases: Neo4jGraphRagError

Exception raised for validation errors during search.

Parameters:

errors (list[ErrorDetails])

FilterValidationError

class neo4j_graphrag.exceptions.FilterValidationError[source]

Bases: Neo4jGraphRagError

Exception raised when input validation for metadata filtering fails.

EmbeddingRequiredError

class neo4j_graphrag.exceptions.EmbeddingRequiredError[source]

Bases: Neo4jGraphRagError

Exception raised when an embedding method is required but not provided.

InvalidRetrieverResultError

class neo4j_graphrag.exceptions.InvalidRetrieverResultError[source]

Bases: Neo4jGraphRagError

Exception raised when the Retriever fails to return a result.

Neo4jIndexError

class neo4j_graphrag.exceptions.Neo4jIndexError[source]

Bases: Neo4jGraphRagError

Exception raised when handling a Neo4j index fails.

Neo4jInsertionError

class neo4j_graphrag.exceptions.Neo4jInsertionError[source]

Bases: Neo4jGraphRagError

Exception raised when inserting data into the Neo4j database fails.

Neo4jVersionError

class neo4j_graphrag.exceptions.Neo4jVersionError[source]

Bases: Neo4jGraphRagError

Exception raised when Neo4j version does not meet minimum requirements.

Text2CypherRetrievalError

class neo4j_graphrag.exceptions.Text2CypherRetrievalError[source]

Bases: Neo4jGraphRagError

Exception raised when text-to-cypher retrieval fails.

SchemaFetchError

class neo4j_graphrag.exceptions.SchemaFetchError[source]

Bases: Neo4jGraphRagError

Exception raised when a Neo4jSchema cannot be fetched.

RagInitializationError

class neo4j_graphrag.exceptions.RagInitializationError(errors)[source]

Bases: Neo4jGraphRagError

Parameters:

errors (list[ErrorDetails])

PromptMissingInputError

class neo4j_graphrag.exceptions.PromptMissingInputError[source]

Bases: Neo4jGraphRagError

Exception raised when a required prompt input is missing.

LLMGenerationError

class neo4j_graphrag.exceptions.LLMGenerationError[source]

Bases: Neo4jGraphRagError

Exception raised when answer generation from LLM fails.

PipelineDefinitionError

class neo4j_graphrag.experimental.pipeline.exceptions.PipelineDefinitionError[source]

Bases: Neo4jGraphRagError

Raised when the pipeline graph is invalid.

PipelineMissingDependencyError

class neo4j_graphrag.experimental.pipeline.exceptions.PipelineMissingDependencyError[source]

Bases: Neo4jGraphRagError

Raised when a task is scheduled but its dependencies are not yet done.

PipelineStatusUpdateError

class neo4j_graphrag.experimental.pipeline.exceptions.PipelineStatusUpdateError[source]

Bases: Neo4jGraphRagError

Raised when attempting an invalid change of state (e.g. DONE => DOING).
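
Because every exception listed in this section derives from Neo4jGraphRagError, callers can handle specific failures first and fall back to the base class for the rest. The sketch below demonstrates the pattern with local stand-in classes that mirror the documented hierarchy; they are defined here for illustration rather than imported from the package, and risky_step is a hypothetical operation.

```python
# Local stand-ins mirroring the documented hierarchy (not imported from the library).
class Neo4jGraphRagError(Exception):
    pass


class LLMGenerationError(Neo4jGraphRagError):
    pass


class Neo4jInsertionError(Neo4jGraphRagError):
    pass


def risky_step(kind: str) -> str:
    """Hypothetical operation that can fail in different ways."""
    if kind == "llm":
        raise LLMGenerationError("model call failed")
    if kind == "db":
        raise Neo4jInsertionError("write failed")
    return "ok"


def run(kind: str) -> str:
    try:
        return risky_step(kind)
    except LLMGenerationError as exc:
        # Handle one specific failure first ...
        return f"retry later: {exc}"
    except Neo4jGraphRagError as exc:
        # ... then fall back to the shared base class for everything else.
        return f"package error: {type(exc).__name__}"


print(run("llm"), run("db"), run("none"), sep="\n")
```

The order of the except clauses matters: placing the base class first would shadow the more specific handler.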