Neo4j Live: Generating Graph Data from Unstructured Data with BAML

In the fast-evolving world of AI and data engineering, making unstructured data usable is one of the biggest challenges. At a recent Neo4j Live session, Vaibhav Gupta, co-founder and CEO of Boundary, joined us to showcase BAML, a powerful open-source framework designed to make LLM outputs structured, reliable and graph-ready.

A recording of the full session is available to watch.

In this post, we’ll explore how BAML works, its impact on LLM-powered graph generation and how you can build more innovative and trustworthy applications with Neo4j and BAML.

The Challenge: Making LLMs Reliable for Structured Data

Large Language Models (LLMs) are great at generating free-form text. But when it comes to:

Extracting structured data (like JSON, graphs, or SQL)
Handling multimodal inputs (images, PDFs, text)
Preventing hallucinations and enforcing schema consistency

…they often fall short.

Most production teams struggle with two problems:

Unpredictable outputs: LLMs don’t naturally produce well-formed, structured data without significant prompt engineering.
Fragile pipelines: Even minor formatting errors can break downstream systems.

This is the gap BAML aims to bridge: LLM flexibility on one side, software engineering rigour on the other. BAML is a domain-specific language (DSL) and framework that adds reliability to LLM interactions. Instead of manually parsing unpredictable LLM outputs, you define strict data models in BAML. The LLM is then steered to produce output that matches these models, and BAML automatically validates the result and corrects errors where needed.
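
To make the idea concrete, here is a minimal Python sketch of the general pattern: declare a data model, then validate raw LLM text against it. It is not BAML syntax and does not use BAML's API. The `Movie` model, the `parse_movie` helper and the sample string are all hypothetical, and the "error correction" shown (accepting a numeric string for an integer field) is only one simple illustration of the idea.

```python
import json
from dataclasses import dataclass


@dataclass
class Movie:
    """Hypothetical target model: the shape we want the LLM to return."""
    title: str
    year: int
    actors: list[str]


def parse_movie(raw: str) -> Movie:
    """Parse raw LLM text into a Movie, raising ValueError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    title = data.get("title")
    if not isinstance(title, str) or not title:
        raise ValueError("title must be a non-empty string")

    year = data.get("year")
    # Light error correction: accept a numeric string such as "1999".
    if isinstance(year, str) and year.strip().isdigit():
        year = int(year.strip())
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(year, int) or isinstance(year, bool):
        raise ValueError("year must be an integer")

    actors = data.get("actors")
    if not isinstance(actors, list) or not all(isinstance(a, str) for a in actors):
        raise ValueError("actors must be a list of strings")

    return Movie(title=title, year=year, actors=actors)


# Sample LLM output (made up for this sketch); note the year arrives as a string.
movie = parse_movie('{"title": "The Matrix", "year": "1999", "actors": ["Keanu Reeves"]}')
print(movie)
```

The point of the sketch is the division of labour: the model definition is the single source of truth, and anything that does not fit it fails loudly at the boundary instead of breaking a downstream system.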

Live Demo: Structured Graphs from Any Data

During the livestream, Vaibhav demonstrated how BAML can transform unstructured inputs, like images, log files, visa forms and even spontaneous webcam pictures, into structured graph data, ready for ingestion into Neo4j.

How it works:

Input: Unstructured data (text, image, audio, PDF, etc.)
Zero-shot or few-shot prompt: Ask the LLM to generate a BAML schema describing the data.
Schema Execution: BAML enforces structure, checks types, and validates the LLM output.
Output: Reliable structured data, like a graph schema, ready for database insertion.
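
For graph-shaped output, the validation step in this flow has to check more than field types: every edge must also point at a node that exists. The following Python sketch shows one way such a check could look. It is an illustration under assumptions, not how BAML implements it; the `Node`, `Edge` and `validate_graph` names and the payload layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    label: str


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    type: str


def _require_str(obj: object, key: str) -> str:
    """Return obj[key] if it is a non-empty string, else raise ValueError."""
    if not isinstance(obj, dict):
        raise ValueError(f"expected an object, got {obj!r}")
    value = obj.get(key)
    if not isinstance(value, str) or not value:
        raise ValueError(f"{key!r} must be a non-empty string, got {value!r}")
    return value


def validate_graph(payload: dict) -> tuple[list[Node], list[Edge]]:
    """Type-check a parsed payload and reject edges that reference unknown nodes."""
    nodes = [
        Node(_require_str(n, "id"), _require_str(n, "label"))
        for n in payload.get("nodes", [])
    ]
    ids = {n.id for n in nodes}
    if len(ids) != len(nodes):
        raise ValueError("duplicate node ids")

    edges = []
    for e in payload.get("edges", []):
        edge = Edge(
            _require_str(e, "source"),
            _require_str(e, "target"),
            _require_str(e, "type"),
        )
        for endpoint in (edge.source, edge.target):
            if endpoint not in ids:
                raise ValueError(f"edge references unknown node {endpoint!r}")
        edges.append(edge)
    return nodes, edges


# Hypothetical parsed LLM output for a tiny movie-actor graph.
payload = {
    "nodes": [{"id": "m1", "label": "Movie"}, {"id": "a1", "label": "Actor"}],
    "edges": [{"source": "a1", "target": "m1", "type": "ACTED_IN"}],
}
nodes, edges = validate_graph(payload)
print(len(nodes), "nodes,", len(edges), "edges")
```

Catching a dangling edge here is much cheaper than discovering it after a partial database insert.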

In seconds, Vaibhav showed examples like:

Parsing system logs into navigable graph structures
Extracting entities from visa forms
Building a movie-actor graph directly from unstructured lists

And impressively, all of this was done without any complex manual prompting.

Why BAML Matters for Graph Builders

BAML is a big deal for anyone building AI-powered applications with graphs because it:

✅ Guarantees structure: You define the schema; BAML enforces it. No more messy post-processing.
✅ Handles multimodal inputs: Text, images, audio and PDFs can all be transformed into graph structures.
✅ Reduces hallucinations: By combining LLM flexibility with strict validation, you minimise bad data.
✅ Simplifies code: Instead of brittle prompt chains, you define clear schemas and workflows.
✅ Supports hybrid reasoning: Marry LLM creativity with deterministic algorithms for powerful systems.

From Unstructured Data to Cypher Queries

Vaibhav went further by showing how BAML can generate graph data formatted for Cytoscape.js and transform it into Cypher statements ready for Neo4j.

He demonstrated:

Generating nodes (movies, actors) and relationships (e.g., “ACTED_IN”) from LLM outputs
Automatically enforcing data types and preventing structural errors
Pushing the structured data directly into a Neo4j database using simple pipelines
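
As a rough illustration of that last hop, the Python sketch below turns a Cytoscape.js-style elements object (`nodes` and `edges`, each wrapping its fields in `data`) into parameterised Cypher statements. It is not the code from the session. The `cytoscape_to_cypher` function is hypothetical, and it assumes a convention in which each node carries a `label` field and each edge a `type` field. Because node labels and relationship types cannot be passed as Cypher parameters, the sketch only interpolates them after checking they are plain identifiers.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name: object) -> str:
    """Allow only plain identifiers, since these are interpolated into the query."""
    if not isinstance(name, str) or not _IDENT.match(name):
        raise ValueError(f"unsafe Cypher identifier: {name!r}")
    return name


def cytoscape_to_cypher(elements: dict) -> list[tuple[str, dict]]:
    """Return (query, parameters) pairs: nodes first, then relationships."""
    statements = []
    for node in elements.get("nodes", []):
        data = dict(node["data"])  # copy so the input is not mutated
        label = _ident(data.pop("label"))
        node_id = data.pop("id")
        statements.append((
            f"MERGE (n:{label} {{id: $id}}) SET n += $props",
            {"id": node_id, "props": data},
        ))
    for edge in elements.get("edges", []):
        data = edge["data"]
        rel = _ident(data["type"])
        statements.append((
            f"MATCH (a {{id: $source}}), (b {{id: $target}}) MERGE (a)-[:{rel}]->(b)",
            {"source": data["source"], "target": data["target"]},
        ))
    return statements


# Hypothetical graph data in Cytoscape.js elements form.
elements = {
    "nodes": [
        {"data": {"id": "m1", "label": "Movie", "title": "The Matrix"}},
        {"data": {"id": "a1", "label": "Actor", "name": "Keanu Reeves"}},
    ],
    "edges": [
        {"data": {"id": "e1", "source": "a1", "target": "m1", "type": "ACTED_IN"}},
    ],
}
statements = cytoscape_to_cypher(elements)
for query, params in statements:
    print(query, params)
```

With the official Neo4j Python driver, each pair could then be executed with something like `session.run(query, params)`. Using `MERGE` instead of `CREATE` keeps the load idempotent, which matters when a pipeline may retry.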

This makes BAML a natural fit for anyone working on GenAI, knowledge graphs, or Retrieval-Augmented Generation (RAG) projects.

Beyond Generation: Smart Pipelines and Error Handling

One of the biggest hidden challenges with LLM-based pipelines is error recovery: what happens when the LLM gets it wrong?

Vaibhav explained best practices, including:

Automatic retries: If a generated query fails, capture the error, modify the input, and retry intelligently.
Context pruning: Keep conversations and states clean by pruning irrelevant or outdated messages.
Soft constraints: Use validation inside BAML to gently nudge models without over-restricting flexibility.
Hard caps: Prevent infinite loops by setting maximum retries or fallback strategies.
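
Three of these practices (automatic retries, context pruning and hard caps) fit naturally into one loop. The Python sketch below is a generic illustration, not code shown in the session: `run_with_retries`, `RetriesExhausted` and the two stub functions are hypothetical, and the stubs stand in for a real LLM call and a real database call.

```python
from typing import Callable


class RetriesExhausted(Exception):
    """Raised when the hard cap on attempts is reached."""


def run_with_retries(
    generate: Callable[[list[str]], str],
    execute: Callable[[str], object],
    task: str,
    max_attempts: int = 3,
    max_feedback: int = 2,
) -> object:
    if max_attempts < 1 or max_feedback < 1:
        raise ValueError("max_attempts and max_feedback must be at least 1")
    feedback: list[str] = []
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # The generator sees the task plus only the retained error feedback.
        query = generate([task, *feedback])
        try:
            return execute(query)
        except Exception as exc:
            last_error = exc
            # Automatic retry: feed the failure back into the next attempt.
            feedback.append(f"Attempt {attempt} produced {query!r} and failed with: {exc}")
            # Context pruning: keep only the most recent feedback messages.
            feedback = feedback[-max_feedback:]
    # Hard cap: stop instead of looping forever.
    raise RetriesExhausted(f"gave up after {max_attempts} attempts") from last_error


def fake_generate(messages: list[str]) -> str:
    # Stand-in for an LLM call: "fixes" the query once it sees error feedback.
    if len(messages) > 1:
        return "MATCH (m:Movie) RETURN m"
    return "MATCH (m:Movie RETURN m"


def fake_execute(query: str) -> str:
    # Stand-in for a database call: rejects unbalanced parentheses.
    if query.count("(") != query.count(")"):
        raise ValueError("syntax error: unbalanced parentheses")
    return "ok"


result = run_with_retries(fake_generate, fake_execute, "List all movies")
print(result)
```

In a real pipeline the caller would catch `RetriesExhausted` and apply a fallback strategy, such as returning a partial result or escalating to a human.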

These engineering patterns help LLM-powered agents behave reliably, which is crucial when building real-world applications.

Key Takeaways for Developers

If you’re exploring how to generate structured graph data from unstructured inputs, here are the main insights from the session:

Structure matters. Defining clear data schemas, whether with BAML or another tool, can dramatically reduce LLM errors and make downstream processing much easier.
Combine flexibility with validation. Let LLMs generate creative outputs, but use validation steps to enforce structure, maintain reliability, and catch issues early.
Work with multiple data types. Modern workflows can and should handle inputs beyond plain text, including images, PDFs and logs, to unlock new use cases.
Expect to iterate. No pipeline will be perfect on the first try. Building intelligent retries, error handling, and context management into your LLM systems makes them much more resilient.
Think beyond raw outputs. Structured outputs like graphs are more useful, queryable and scalable than flat text responses, especially for knowledge-driven applications.