Join Us on Nov 6 for 24 Hours of Live Sessions at NODES 2025

Neo4j

From Unstructured Docs to Graph Intelligence: A Framework for Building Graph+Embedding Pipelines

Session Track: Knowledge Graphs

Session Time:

Session description

Enterprise data lives in documents (PDFs, contracts, emails, product manuals), but extracting actionable insights from them remains a major challenge. In this session, Satej Sahu will introduce a practical, modular framework that transforms unstructured and semi-structured documents into a hybrid graph-and-embedding representation, enabling LLM-based reasoning and GraphRAG applications. The framework, D2GEP (Document-to-Graph-and-Embeddings Pipeline), parses raw text into knowledge graph triples, embeds both textual content and graph structure, and stores them for hybrid retrieval in systems such as Neo4j and vector databases.

Using open-source tools and sample data (e.g. legal texts, scientific publications, or customer-service transcripts), he will demonstrate how to:

- Parse and chunk documents using LangChain or spaCy
- Extract entities and relationships into Cypher-friendly graph triples using LLMs
- Generate node- and passage-level embeddings with OpenAI embedding models or SentenceTransformers
- Store the structured graph data in Neo4j and the text embeddings in Pinecone or FAISS
- Enable natural-language querying via GraphRAG, combining vector similarity search with Cypher queries

This session will walk through an end-to-end, reproducible pipeline, with reusable code, a template schema, and prompt-engineering examples for extracting domain-specific knowledge.
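The steps described above can be illustrated with a minimal, dependency-free Python sketch. It is not the D2GEP implementation, and every name in it (`chunk_text`, `triples_to_cypher`, `embed`, `cosine`, `hybrid_search`) is hypothetical. To keep it runnable offline, it swaps in simplified stand-ins: the triples are hard-coded in place of LLM extraction output, a sparse term-frequency vector replaces OpenAI or SentenceTransformers embeddings, a plain Python list replaces Pinecone or FAISS, and the Cypher `MERGE` statements are only generated, not sent to Neo4j.

```python
import math
import re
from collections import Counter, defaultdict


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows (stand-in for LangChain/spaCy chunking)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    # Each new window starts `step` words later, so consecutive chunks share `overlap` words.
    return [" ".join(words[i:i + max_words])
            for i in range(0, max(len(words) - overlap, 1), step)]


def triples_to_cypher(triples):
    """Turn (subject, relation, object) triples into parameterised Cypher MERGE statements."""
    statements = []
    for subj, rel, obj in triples:
        # Relationship types cannot be Cypher parameters, so sanitise them into the query text.
        rel_type = re.sub(r"[^A-Z0-9_]", "_", rel.upper())
        query = ("MERGE (s:Entity {name: $subj}) "
                 "MERGE (o:Entity {name: $obj}) "
                 f"MERGE (s)-[:{rel_type}]->(o)")
        statements.append((query, {"subj": subj, "obj": obj}))
    return statements


def embed(text):
    """Sparse term-frequency vector (toy stand-in for a dense embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(question, chunks, triples, k=2):
    """Rank chunks by vector similarity, then add graph facts about entities the top hits mention."""
    q = embed(question)
    hits = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
    # In-memory adjacency list; against Neo4j this step could be a query such as
    # MATCH (e:Entity {name: $name})-[r]-(n) RETURN e.name, type(r), n.name
    adjacency = defaultdict(list)
    for s, r, o in triples:
        adjacency[s].append((s, r, o))
        adjacency[o].append((s, r, o))
    facts, seen = [], set()
    for chunk in hits:
        for entity, edges in adjacency.items():
            # Naive entity linking: case-insensitive substring match.
            if entity.lower() in chunk.lower():
                for edge in edges:
                    if edge not in seen:
                        seen.add(edge)
                        facts.append(edge)
    return hits, facts


if __name__ == "__main__":
    doc = ("Acme Corp signed a supply contract with Globex on 1 March. "
           "The contract is governed by German law. "
           "Globex manufactures industrial sensors for Acme Corp.")
    chunks = chunk_text(doc, max_words=12, overlap=3)
    # Hypothetical LLM extraction output for the document above.
    triples = [("Acme Corp", "signed contract with", "Globex"),
               ("Globex", "manufactures", "industrial sensors")]
    for query, params in triples_to_cypher(triples):
        print(query, params)
    hits, facts = hybrid_search("Who makes sensors?", chunks, triples, k=1)
    print(hits)
    print(facts)
```

In a real deployment, each `(query, params)` pair could be executed through a Neo4j driver session (for example `session.run(query, params)` in the official Python driver), and the toy `embed` would be replaced by a real model plus a vector index. The "vector hits first, then graph expansion" ordering shown here is one common GraphRAG pattern, chosen as an assumption for illustration rather than taken from the session.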

Speaker


Satej Sahu

Principal Data Engineer, Zalando SE

Satej Sahu works as a principal data engineer at Zalando SE and has more than 14 years of experience in the industry. He has worked with renowned organizations such as Boeing, Adidas, and Honeywell, specializing in architecture, big data, and machine learning use cases. With a strong track record of architecting scalable and efficient systems, Satej has successfully delivered data-driven and applied ML solutions. He is also the author of two programming books, "Building Secure PHP Applications" and "PHP 8 Basics: For Programming and Web Development" (Apress/Springer), and has another book in the pipeline.