Implementing an Automated Report-Generation Agent


I’m a fan of agentic flows with LLMs. They not only enable more advanced Text2Cypher implementations but also open the door to various semantic-layer implementations. It’s an incredibly powerful and versatile approach.

In this blog post, I set out to implement a different kind of agent. Instead of the usual question-and-answer use case, this agent is designed to generate detailed reports about specific industries in a given location. The implementation uses crewAI, a framework for orchestrating collaborating AI agents.

Three agents forming a crew

This system orchestrates three agents working in harmony to deliver a comprehensive business report:

  1. Data Researcher Agent: Specializes in gathering and analyzing industry-specific data for organizations in a given city, providing insights into company counts, public companies, combined revenue, and top-performing organizations
  2. News Analyst Agent: Focuses on extracting and summarizing the latest news about relevant companies, offering a snapshot of trends, market movements, and sentiment analysis
  3. Report Writer Agent: Synthesizes the research and news insights into a well-structured, actionable markdown report, ensuring clarity and precision without adding any unsupported information

Together, these agents form a flow for generating insightful industry reports tailored to specific locations.

The code is available on GitHub.

Dataset

We will use the companies database available on the Neo4j demo server, which includes detailed information about organizations, individuals, and even the latest news for some of these organizations. This data was fetched via the Diffbot API.

Graph schema

The dataset focuses on details such as investors, board members, and related aspects, making it an excellent resource for demonstrating industry report generation.

```python
from neo4j import GraphDatabase

# Neo4j connection setup (public demo server credentials)
URI = "neo4j+s://demo.neo4jlabs.com"
AUTH = ("companies", "companies")
driver = GraphDatabase.driver(URI, auth=AUTH)
```

Next, we need to define the OpenAI key as we will be using GPT-4o throughout this blog post:

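```python
import getpass
import os

from crewai import LLM

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI key: ")
# temperature=0 keeps the agents' outputs as deterministic as possible
llm = LLM(model='gpt-4o', temperature=0)
```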

Knowledge Graph-Based Tools

We will begin by implementing tools that enable an agent/LLM to retrieve relevant information from the database. The first tool will focus on fetching key statistics about companies within a specific industry in a given city:

```python
from typing import Type

from crewai.tools import BaseTool
from pydantic import BaseModel, Field

# Industry categories available in the demo database
industry_options = ["Software Companies", "Professional Service Companies", "Enterprise Software Companies", "Manufacturing Companies", "Software As A Service Companies", "Computer Hardware Companies", "Media And Information Companies", "Financial Services Companies", "Artificial Intelligence Companies", "Advertising Companies"]


class GetCityInfoInput(BaseModel):
    """Input schema for GetCityInfo."""
    city: str = Field(..., description="City name")
    # Listing the options lets the LLM map free-text input to a valid category
    industry: str = Field(..., description=f"Industry name, available options are: {industry_options}")


class GetCityInfo(BaseTool):
    name: str = "Get information about a specific city"
    description: str = "You can use this tool when you want to find information about a specific industry within a city."
    args_schema: Type[BaseModel] = GetCityInfoInput

    def _run(self, city: str, industry: str) -> list:
        # Aggregate statistics for the industry in the city;
        # the top five organizations are ranked by number of employees
        data, _, _ = driver.execute_query("""MATCH (c:City)<-[:IN_CITY]-(o:Organization)-[:HAS_CATEGORY]->(i:IndustryCategory)
        WHERE c.name = $city AND i.name = $industry
        WITH o
        ORDER BY o.nbrEmployees DESC
        RETURN count(o) AS organizationCount,
               sum(CASE WHEN o.isPublic THEN 1 ELSE 0 END) AS publicCompanies,
               sum(o.revenue) AS combinedRevenue,
               collect(CASE WHEN o.nbrEmployees IS NOT NULL THEN o END)[..5] AS topFiveOrganizations""", city=city, industry=industry)
        return [el.data() for el in data]
```

The GetCityInfo tool retrieves key statistics about companies in a specific industry within a given city. It provides information such as the total number of organizations, the count of public companies, the combined revenue, and details about the top five organizations by number of employees. This tool could be expanded, but for our purposes I kept it simple.
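To make the query's aggregation semantics concrete, here is a plain-Python sketch that mirrors it over hypothetical in-memory records (the names `OrgRecord`, `city_industry_stats`, and the sample data are illustrative, not part of the post's code). It highlights how nulls are handled: Cypher's `sum()` skips null revenues, the `CASE` expression counts a null `isPublic` as 0, and `collect()` drops the nulls produced by the second `CASE`, so organizations without an employee count never reach the top five.

```python
from typing import Optional, TypedDict


class OrgRecord(TypedDict):
    """Hypothetical in-memory stand-in for an (:Organization) node."""
    name: str
    isPublic: Optional[bool]
    revenue: Optional[float]
    nbrEmployees: Optional[int]


def city_industry_stats(orgs: list[OrgRecord]) -> dict:
    """Plain-Python mirror of the aggregation in GetCityInfo's Cypher query."""
    # collect(CASE WHEN o.nbrEmployees IS NOT NULL THEN o END) drops nulls;
    # the rest keep the ORDER BY o.nbrEmployees DESC order and [..5] keeps five
    ranked = sorted(
        (o for o in orgs if o["nbrEmployees"] is not None),
        key=lambda o: o["nbrEmployees"],
        reverse=True,
    )
    return {
        # count(o) counts every matched organization
        "organizationCount": len(orgs),
        # CASE WHEN o.isPublic THEN 1 ELSE 0 END: a null isPublic counts as 0
        "publicCompanies": sum(1 for o in orgs if o["isPublic"]),
        # sum(o.revenue) skips null revenues
        "combinedRevenue": sum(o["revenue"] for o in orgs if o["revenue"] is not None),
        # Names only here; the real query returns whole nodes
        "topFiveOrganizations": [o["name"] for o in ranked[:5]],
    }


orgs: list[OrgRecord] = [
    {"name": "A", "isPublic": True, "revenue": 10.0, "nbrEmployees": 100},
    {"name": "B", "isPublic": None, "revenue": None, "nbrEmployees": None},
    {"name": "C", "isPublic": False, "revenue": 5.0, "nbrEmployees": 300},
]
print(city_industry_stats(orgs))
# {'organizationCount': 3, 'publicCompanies': 1, 'combinedRevenue': 15.0, 'topFiveOrganizations': ['C', 'A']}
```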

The second tool fetches the latest news about a given company:

```python
class GetNews(BaseTool):
    name: str = "Get the latest news for a specific company"
    description: str = "You can use this tool when you want to find the latest news about a specific company."

    def _run(self, company: str) -> list:
        # LIMIT 5 applies to chunk rows before collect(), so the tool
        # returns at most five chunks in total, grouped by article
        data, _, _ = driver.execute_query("""MATCH (c:Chunk)<-[:HAS_CHUNK]-(a:Article)-[:MENTIONS]->(o:Organization)
        WHERE o.name = $company AND a.date IS NOT NULL
        WITH c, a
        ORDER BY a.date DESC
        LIMIT 5
        RETURN a.title AS title, a.date AS date, a.sentiment AS sentiment, collect(c.text) AS chunks""", company=company)
        return [el.data() for el in data]
```

The GetNews tool retrieves the latest news about a specific company. It provides details such as article titles, publication dates, sentiment analysis, and key excerpts from the articles. This tool is ideal for staying updated on recent developments and market trends related to a particular organization, allowing us to generate more detailed summaries.
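One detail of this query is easy to miss: `LIMIT 5` runs before the aggregation, so it caps the number of chunk rows, not articles. The tool therefore returns at most five chunks spread over at most five articles, and one long recent article can crowd out older ones. The sketch below mirrors that behavior in plain Python over hypothetical chunk rows (`latest_news` and the row layout are illustrative assumptions); if you wanted the five most recent articles instead, you would group first and limit afterwards.

```python
from datetime import date


def latest_news(chunk_rows: list[dict], limit: int = 5) -> list[dict]:
    """Mirror of the GetNews query over hypothetical rows.

    Each row stands for one (Chunk)<-[:HAS_CHUNK]-(Article) match and has
    the keys "title", "date", "sentiment", and "text".
    """
    # WHERE a.date IS NOT NULL
    dated = [r for r in chunk_rows if r["date"] is not None]
    # ORDER BY a.date DESC LIMIT 5: the limit counts chunk rows, not articles
    top = sorted(dated, key=lambda r: r["date"], reverse=True)[:limit]
    # RETURN a.title, a.date, a.sentiment, collect(c.text): group chunks per article
    grouped: dict[tuple, list[str]] = {}
    for r in top:
        grouped.setdefault((r["title"], r["date"], r["sentiment"]), []).append(r["text"])
    return [
        {"title": t, "date": d, "sentiment": s, "chunks": texts}
        for (t, d, s), texts in grouped.items()
    ]


rows = [{"title": "A", "date": date(2024, 3, 1), "sentiment": 0.8, "text": f"a{i}"} for i in range(4)]
rows += [{"title": "B", "date": date(2024, 2, 1), "sentiment": 0.1, "text": f"b{i}"} for i in range(3)]
rows.append({"title": "C", "date": None, "sentiment": 0.5, "text": "c0"})
for article in latest_news(rows):
    print(article["title"], article["chunks"])
# A ['a0', 'a1', 'a2', 'a3']
# B ['b0']
```

Article B contributes only one of its three chunks, because article A's four chunks already use up four of the five slots.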

Agents

As mentioned, we will implement three agents. With crewAI, little prompt engineering is required: you describe each agent with a role, a goal, and a backstory, and the framework builds the prompts from them.

The agents are implemented as follows:

```python
from crewai import Agent


# Define Agents
class ReportAgents:
    def __init__(self):
        # Gathers industry statistics for a city via the GetCityInfo tool
        self.researcher = Agent(
            role='Data Researcher',
            goal='Gather comprehensive information about specific companies that are in relevant cities and industries',
            backstory="""You are an expert data researcher with deep knowledge of
            business ecosystems and city demographics. You excel at analyzing
            complex data relationships.""",
            verbose=True,
            allow_delegation=False,
            tools=[GetCityInfo()],
            llm=llm
        )

        # Summarizes recent news about the companies via the GetNews tool
        self.news_analyst = Agent(
            role='News Analyst',
            goal='Find and analyze recent news about relevant companies in the specified industry and city',
            backstory="""You are a seasoned news analyst with expertise in
            business journalism and market research. You can identify key trends
            and developments from news articles.""",
            verbose=True,
            allow_delegation=False,
            tools=[GetNews()],
            llm=llm
        )

        # Writes the final report from the other agents' outputs; no tools
        self.report_writer = Agent(
            role='Report Writer',
            goal="Create comprehensive, well-structured reports combining the provided research and news analysis. Do not include any information that isn't explicitly provided.",
            backstory="""You are a professional report writer with experience in
            business intelligence and market analysis. You excel at synthesizing
            information into clear, actionable insights. Do not include any information that isn't explicitly provided.""",
            verbose=True,
            allow_delegation=False,
            llm=llm
        )
```

In crewAI, agents are defined by specifying their role, goal, and backstory, with optional tools to enhance their capabilities. In this setup, three agents are implemented: a Data Researcher responsible for gathering detailed information about companies in specific cities and industries using the GetCityInfo tool; a News Analyst tasked with analyzing recent news about relevant companies using the GetNews tool; and a Report Writer, who synthesizes the gathered information and news into a structured, actionable report without relying on external tools. This clear definition of roles and objectives ensures effective collaboration among the agents.

Tasks

In addition to defining the agents, we also need to outline the tasks they will tackle. In this case, we’ll define three distinct tasks:

```python
from crewai import Crew, Process, Task


def generate_report(city_name, industry_name):
    # Instantiate the three agents defined above
    agents = ReportAgents()

    # Define Tasks
    city_research_task = Task(
        description=f"""Research and analyze {city_name} and its business ecosystem in {industry_name} industry:
        1. Get city summary and key information
        2. Find organizations in the specified industry
        3. Analyze business relationships and economic indicators""",
        agent=agents.researcher,
        expected_output="Basic statistics about the companies in the given city and industry as well as top performers"
    )

    news_analysis_task = Task(
        description="""Analyze recent news about the companies provided by the city researcher""",
        agent=agents.news_analyst,
        expected_output="Summarization of the latest news for the company and how it might affect the market",
        # Receives the city research output as input
        context=[city_research_task]
    )

    report_writing_task = Task(
        description="""Create a detailed markdown report about the
        results you got from city research and news analysis tasks.
        Do not include any information that isn't provided""",
        agent=agents.report_writer,
        expected_output="Markdown summary",
        # Receives the outputs of both previous tasks
        context=[city_research_task, news_analysis_task]
    )
```

The tasks are designed to align with the agents' capabilities. The Data Researcher handles the city research task, which analyzes the business ecosystem of the specified city and industry, gathering key statistics and identifying top-performing organizations. The News Analyst handles the news analysis task, which takes the city research output as context and summarizes recent developments, key trends, and market impacts for those companies. Finally, the Report Writer synthesizes the findings from both previous tasks into a comprehensive markdown report.

All that's left is to put it all together inside the same function, run the crew, and return its result:

```python
    # Create and run the crew (still inside generate_report)
    crew = Crew(
        agents=[agents.researcher, agents.news_analyst, agents.report_writer],
        tasks=[city_research_task, news_analysis_task, report_writing_task],
        verbose=True,
        process=Process.sequential,
    )
    # kickoff() runs the tasks in order and returns the final task's output
    return crew.kickoff()
```
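`Process.sequential` runs the tasks in the order they are listed, and each task's `context` list names the earlier tasks whose outputs it receives. The toy model below is not crewAI's implementation: `ToyTask` and `run_sequential` are hypothetical names, and plain functions stand in for the LLM-backed agents. It only illustrates the data flow: the news analysis sees the research output, and the report sees both.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTask:
    """Hypothetical stand-in for a crewAI Task: a name, a worker, and context tasks."""
    name: str
    run: Callable[[str], str]  # receives the joined context, returns this task's output
    context: list["ToyTask"] = field(default_factory=list)


def run_sequential(tasks: list[ToyTask]) -> dict[str, str]:
    """Run tasks in list order, feeding each one the outputs of its context tasks."""
    outputs: dict[str, str] = {}
    for task in tasks:
        # Context tasks appear earlier in the list, so their outputs already exist
        ctx = "\n\n".join(outputs[t.name] for t in task.context)
        outputs[task.name] = task.run(ctx)
    return outputs


research = ToyTask("research", lambda ctx: "stats")
news = ToyTask("news", lambda ctx: f"news about [{ctx}]", context=[research])
report = ToyTask("report", lambda ctx: f"report: {ctx}", context=[research, news])
print(run_sequential([research, news, report])["report"])
```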

Let’s test it!

```python
city = "Seattle"
industry = "Hardware Companies"
report = generate_report(city, industry)
print(report)
```

The crew's intermediate steps are too detailed to include here, but the process essentially begins by gathering key statistics for the specified industry and identifying relevant companies, followed by retrieving the latest news about those companies.
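Note that "Hardware Companies" is not one of the `industry_options`, and the GetCityInfo query matches `i.name = $industry` exactly, so passing the input through unchanged would return no organizations. The report heading below suggests the agent mapped the request to "Computer Hardware Companies" on its own, guided by the options listed in the tool's input schema. If you would rather not rely on the LLM for that mapping, one option is a deterministic fallback using Python's `difflib`; `normalize_industry` and `INDUSTRY_OPTIONS` are hypothetical names in this sketch, not part of the post's code.

```python
from difflib import get_close_matches
from typing import Optional, Sequence

# Same values as industry_options in the GetCityInfo tool
INDUSTRY_OPTIONS = (
    "Software Companies", "Professional Service Companies", "Enterprise Software Companies",
    "Manufacturing Companies", "Software As A Service Companies", "Computer Hardware Companies",
    "Media And Information Companies", "Financial Services Companies",
    "Artificial Intelligence Companies", "Advertising Companies",
)


def normalize_industry(industry: str, options: Sequence[str] = INDUSTRY_OPTIONS,
                       cutoff: float = 0.6) -> Optional[str]:
    """Map free-text input to the closest known industry name, or None if nothing is close."""
    if industry in options:
        return industry
    # get_close_matches ranks candidates by difflib.SequenceMatcher's similarity ratio
    matches = get_close_matches(industry, options, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize_industry("Hardware Companies"))  # Computer Hardware Companies
```

String similarity is only a heuristic and can pick the wrong category for a genuinely different industry, so returning `None` below the cutoff lets the caller ask for clarification instead of silently guessing.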

The results:

# Seattle Computer Hardware Industry Report

## Overview

The Computer Hardware Companies industry in Seattle comprises 24 organizations, including 4 public companies. The combined revenue of these companies is approximately $229.14 billion. This report highlights the top performers in this industry and recent news developments affecting them.

## Top Performers

1. **Microsoft Corporation**
- **Revenue**: $198.27 billion
- **Employees**: 221,000
- **Status**: Public Company
- **Mission**: To empower every person and organization on the planet to achieve more.

2. **Nvidia Corporation**
- **Revenue**: $26.97 billion
- **Employees**: 26,196
- **Status**: Public Company
- **Formerly Known As**: Mellanox Technologies and Cumulus Networks

3. **F5 Networks**
- **Revenue**: $2.695 billion
- **Employees**: 7,089
- **Status**: Public Company
- **Focus**: Multi-cloud cybersecurity and application delivery

4. **Quest Software**
- **Revenue**: $857.415 million
- **Employees**: 4,055
- **Status**: Public Company
- **Base**: California

5. **SonicWall**
- **Revenue**: $310 million
- **Employees**: 1,600
- **Status**: Private Company
- **Focus**: Cybersecurity

These companies significantly contribute to Seattle's economic landscape, driving growth and innovation in the hardware industry.

## Recent News and Developments

- **Microsoft Corporation**: Faces legal challenges with its Activision Blizzard acquisition, which could impact its gaming market strategy.

- **Nvidia Corporation**: Experiences strong demand for GPUs in China, highlighting its critical role in AI advancements and potentially boosting its market position.

- **F5 Networks**: Gains recognition for its cybersecurity solutions, enhancing its industry reputation.

- **Quest Software**: Launches a new data intelligence platform aimed at improving data accessibility and AI model development.

- **SonicWall**: Undergoes leadership changes and releases a threat report, emphasizing its focus on cybersecurity growth and challenges.

These developments are poised to influence market dynamics, investor perceptions, and competitive strategies within the industry.

Note that the demo dataset is outdated; we don’t import news regularly.

Summary

Building an automated report-generation pipeline using agentic flows, Neo4j, and crewAI offers a glimpse into how LLMs can move beyond simple question-and-answer interactions. By assigning specialized tasks to a suite of agents and arming them with the right tools, we can orchestrate a dynamic, exploratory workflow that pulls relevant data, processes it, and composes well-structured insights.

Through this approach, agents collaborate to uncover key statistics about an industry in a given city, gather the latest news, and synthesize everything into a polished markdown report. This goes to show that LLMs can be deployed in creative, multi-step processes, enabling more sophisticated use cases like automated business intelligence, data-driven content creation, and beyond.

The code is available on GitHub.


Implementing an Automated Report-Generation Agent was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.