Linghua Jin

Posted on Dec 11

I Turned My Meeting Notes Into a Self-Updating Neo4j Knowledge Graph

#neo4j #ai #python #tutorial

Every team swims in meeting notes, but almost nobody can answer simple questions like "Who was in all the budget meetings?" or "What tasks did Alex get assigned this month?" from that pile of docs.

This article walks through a CocoIndex example that turns Markdown meeting notes in Google Drive into a live Neo4j knowledge graph that updates itself whenever notes change.

The pain: meetings as dead text

Large organizations generate tens of thousands to millions of meeting notes, spread across folders, inboxes, and different tools.

Those notes evolve constantly—names are fixed, tasks move between people, decisions get revised—yet most systems treat them as static text searched by keywords at best.

Without incremental processing, trying to build a knowledge graph from this data forces you to choose between huge compute/LLM bills and an outdated graph that never reflects reality.

The idea: a live meeting graph

The CocoIndex flow in this example connects directly to Google Drive, detects only changed documents, runs LLM extraction on just those sections, and pushes updates into Neo4j with upserts instead of full rewrites.

In the graph you end up with three node types—Meeting, Person, Task—and three relationship types—ATTENDED, DECIDED, ASSIGNED_TO.

That structure is enough to power questions like:

"Which meetings did Dana attend?"
"Where was this task decided?"
"Who currently owns all tasks created in Q4?"

Architecture at a glance

The pipeline is intentionally linear and easy to reason about.

Google Drive (with change tracking)
Identify changed documents
Split each file into individual meetings
Use an LLM to extract structured data (only for changed meetings)
Collect nodes and relationships
Export to Neo4j with upsert semantics

CocoIndex's Drive source authenticates with the service account and uses last-modified timestamps to make sure only new or updated documents flow downstream, which keeps LLM usage and database writes under control even at enterprise scale.
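
To make the change-detection idea concrete, here is a minimal sketch in plain Python. It is not CocoIndex's implementation, just an illustration under the assumption that every file has an ID and a last-modified timestamp; the names `find_changed`, `listing`, and `seen` are hypothetical.

```python
from datetime import datetime, timezone


def find_changed(listing: dict[str, datetime], seen: dict[str, datetime]) -> list[str]:
    """Return IDs of files that are new or modified since the last run.

    `listing` maps file ID -> last-modified time reported by the source.
    `seen` maps file ID -> last-modified time recorded on the previous run.
    Both names are hypothetical and only serve this sketch.
    """
    changed = [
        file_id
        for file_id, modified in listing.items()
        if file_id not in seen or modified > seen[file_id]
    ]
    return sorted(changed)


seen = {
    "notes-a": datetime(2024, 12, 1, tzinfo=timezone.utc),
    "notes-b": datetime(2024, 12, 1, tzinfo=timezone.utc),
}
listing = {
    "notes-a": datetime(2024, 12, 1, tzinfo=timezone.utc),  # unchanged
    "notes-b": datetime(2024, 12, 5, tzinfo=timezone.utc),  # edited since last run
    "notes-c": datetime(2024, 12, 6, tzinfo=timezone.utc),  # new file
}
changed_ids = find_changed(listing, seen)
```

Only the IDs in `changed_ids` would need to go through splitting and LLM extraction again; everything else can be skipped.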

Minimal setup

You need:

A local Neo4j instance (default UI at http://localhost:7474), with user neo4j and password cocoindex.
An OpenAI API key configured in your environment.
A Google Cloud service account that can read the meeting note folders in Drive.

Environment variables:

export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2

The Drive root IDs can be a comma‑separated list if your meetings live in multiple folders.
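
If you type that list by hand, stray spaces or a trailing comma are easy to introduce. A small helper like the one below could tidy the value before it reaches the source; `parse_folder_ids` is a hypothetical name and not part of the example repo.

```python
def parse_folder_ids(raw: str) -> list[str]:
    """Split a comma-separated list of folder IDs, dropping blanks and padding."""
    return [part.strip() for part in raw.split(",") if part.strip()]


# A value with a stray space and trailing commas still yields two clean IDs.
example_ids = parse_folder_ids("folderId1, folderId2,,")
```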

Defining the incremental flow

The flow is declared once using CocoIndex's decorator, then wired to Google Drive as a source.

import datetime
import os

import cocoindex

@cocoindex.flow_def(name="MeetingNotesGraph")
def meeting_notes_graph_flow(flow_builder: cocoindex.FlowBuilder,
                             data_scope: cocoindex.DataScope) -> None:
    credential_path = os.environ["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    root_folder_ids = os.environ["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",")

    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.GoogleDrive(
            service_account_credential_path=credential_path,
            root_folder_ids=root_folder_ids,
            recent_changes_poll_interval=datetime.timedelta(seconds=10),
        ),
        refresh_interval=datetime.timedelta(minutes=1),
    )

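The recent_changes_poll_interval sets how often the source polls Drive for recently modified files, which is a cheap check. The refresh_interval sets how often the source does a full refresh of its file listing, which acts as a safety net for changes the lightweight poll can miss, such as deleted files.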

Splitting each file into meetings

Many teams keep multiple meetings in a single Markdown file separated by headings.

The flow splits the content into separate "meeting" chunks using CocoIndex's transformer.

with data_scope["documents"].row() as document:
    document["meetings"] = document["content"].transform(
        cocoindex.functions.SplitBySeparators(
            separators_regex=[r"\n\n##?\ "],
            keep_separator="RIGHT",
        )
    )

Keeping the header with the right segment preserves titles, dates, and other cues that help the LLM infer meeting metadata.
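
To see what "keep the separator on the right" means in practice, here is a rough stand-alone approximation using Python's `re` module. It is not how `SplitBySeparators` is implemented; `split_keep_right` is a hypothetical helper, and trimming whitespace from each chunk is an assumption made for readability.

```python
import re


def split_keep_right(text: str, pattern: str) -> list[str]:
    """Cut `text` at the start of every match so the separator stays with the chunk after it."""
    starts = [match.start() for match in re.finditer(pattern, text)]
    bounds = [0] + starts + [len(text)]
    pieces = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    # Trim surrounding whitespace and drop empty pieces (assumption of this sketch).
    return [piece.strip() for piece in pieces if piece.strip()]


notes = (
    "# Team notes\n\n"
    "## 2024-12-02 Budget sync\nAttendees: Dana, Alex\n\n"
    "## 2024-12-09 Budget review\nAttendees: Dana"
)
chunks = split_keep_right(notes, r"\n\n##? ")
```

Each meeting chunk starts with its own heading, so the date and title travel with the text that gets sent to the LLM.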

Teaching the LLM what a meeting looks like

Instead of asking the model for "some JSON," the example defines explicit dataclasses that describe people, tasks, and meetings.

import datetime
from dataclasses import dataclass

@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]

CocoIndex feeds this schema to the LLM, which returns structured data that already fits the types, making it much simpler to map into a graph later.
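
CocoIndex does this conversion for you, but it can help to see what "data that fits the types" looks like. The sketch below repeats the dataclasses so it runs on its own and builds a Meeting from a JSON-like dict. The `meeting_from_dict` helper and the sample payload are hypothetical, and the ISO date string is an assumption about how the date might be serialized.

```python
import datetime
from dataclasses import dataclass


@dataclass
class Person:
    name: str


@dataclass
class Task:
    description: str
    assigned_to: list[Person]


@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]


def meeting_from_dict(raw: dict) -> Meeting:
    """Build a Meeting from a JSON-like dict; a missing key raises KeyError."""
    return Meeting(
        time=datetime.date.fromisoformat(raw["time"]),
        note=raw["note"],
        organizer=Person(**raw["organizer"]),
        participants=[Person(**p) for p in raw["participants"]],
        tasks=[
            Task(
                description=t["description"],
                assigned_to=[Person(**p) for p in t["assigned_to"]],
            )
            for t in raw["tasks"]
        ],
    )


# Hypothetical payload shaped like the schema above.
raw = {
    "time": "2024-12-02",
    "note": "Agreed to draft the Q1 budget.",
    "organizer": {"name": "Dana"},
    "participants": [{"name": "Alex"}],
    "tasks": [{"description": "Draft Q1 budget", "assigned_to": [{"name": "Alex"}]}],
}
meeting = meeting_from_dict(raw)
```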

LLM extraction with caching

Each meeting chunk goes through an extraction step that uses the dataclass as the output type.

with document["meetings"].row() as meeting:
    parsed = meeting["parsed"] = meeting["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI,
                model="gpt-4o",
            ),
            output_type=Meeting,
        )
    )

Because this step is expensive, CocoIndex caches the result and reuses it as long as the input text, model, and schema do not change, which is critical for keeping costs down in environments with frequent edits.
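
The caching rule is easy to picture as a lookup keyed on everything that could change the answer. Below is a minimal sketch of that idea, not CocoIndex's internal cache; `cache_key`, `ExtractionCache`, and `fake_extract` are hypothetical names, and the extractor is a stand-in for the real LLM call.

```python
import hashlib
import json


def cache_key(text: str, model: str, schema: dict) -> str:
    """Hash the inputs that influence the extraction result."""
    payload = json.dumps({"text": text, "model": model, "schema": schema}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExtractionCache:
    """Call the extractor only when the (text, model, schema) combination is new."""

    def __init__(self, extract):
        self._extract = extract
        self._store: dict[str, dict] = {}
        self.misses = 0

    def get(self, text: str, model: str, schema: dict) -> dict:
        key = cache_key(text, model, schema)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._extract(text)
        return self._store[key]


def fake_extract(text: str) -> dict:
    # Stand-in for an expensive LLM call.
    return {"length": len(text)}


schema = {"Meeting": ["time", "note"]}
cache = ExtractionCache(fake_extract)
cache.get("## Budget sync", "gpt-4o", schema)           # miss: first time seen
cache.get("## Budget sync", "gpt-4o", schema)           # hit: nothing changed
cache.get("## Budget sync (edited)", "gpt-4o", schema)  # miss: text changed
```

Changing the model name or the schema would produce a different key in the same way, which matches the behavior described above.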

Collecting nodes and relationships

Collectors are like in‑memory tables for the data you want to push to Neo4j.

meeting_nodes = data_scope.add_collector()
attended_rels = data_scope.add_collector()
decided_tasks_rels = data_scope.add_collector()
assigned_rels = data_scope.add_collector()

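# The calls below sit inside the `with document["meetings"].row() as meeting:` block,
# where `parsed` is in scope.
meeting_key = {"note_file": document["filename"], "time": parsed["time"]}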

meeting_nodes.collect(**meeting_key, note=parsed["note"])
attended_rels.collect(
    id=cocoindex.GeneratedField.UUID,
    **meeting_key,
    person=parsed["organizer"]["name"],
    is_organizer=True,
)

Similar loops over participants and tasks populate ATTENDED edges for all attendees, DECIDED edges from meetings to tasks, and ASSIGNED_TO edges from people to tasks.
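
The article does not show those loops, so here is a plain-Python sketch of the row shapes they might produce. Lists of dicts stand in for collectors, the sample `parsed` value is made up, and `stable_id` is a hypothetical helper that derives an ID from the row's fields as one way to keep IDs identical across re-runs.

```python
import uuid


def stable_id(*parts: str) -> str:
    """Derive a repeatable ID from the fields that identify a relationship."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, "|".join(parts)))


# Made-up extraction result for one meeting.
parsed = {
    "time": "2024-12-02",
    "organizer": {"name": "Dana"},
    "participants": [{"name": "Alex"}, {"name": "Sam"}],
    "tasks": [{"description": "Draft Q1 budget", "assigned_to": [{"name": "Alex"}]}],
}
meeting_key = {"note_file": "budget.md", "time": parsed["time"]}
key_parts = (meeting_key["note_file"], meeting_key["time"])

attended_rows, decided_rows, assigned_rows = [], [], []

# ATTENDED: the organizer first, then every other participant.
people = [(parsed["organizer"]["name"], True)]
people += [(p["name"], False) for p in parsed["participants"]]
for name, is_organizer in people:
    attended_rows.append(
        {"id": stable_id("attended", *key_parts, name), **meeting_key,
         "person": name, "is_organizer": is_organizer}
    )

for task in parsed["tasks"]:
    # DECIDED: meeting -> task.
    decided_rows.append(
        {"id": stable_id("decided", *key_parts, task["description"]), **meeting_key,
         "description": task["description"]}
    )
    # ASSIGNED_TO: person -> task.
    for assignee in task["assigned_to"]:
        assigned_rows.append(
            {"id": stable_id("assigned", assignee["name"], task["description"]),
             "person": assignee["name"], "task": task["description"]}
        )
```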

Exporting to Neo4j

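Meetings become graph nodes with a clear label and primary key. In the snippets below, conn_spec is the Neo4j connection reference, which is set up separately and not shown here.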

meeting_nodes.export(
    "meeting_nodes",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Nodes(label="Meeting"),
    ),
    primary_key_fields=["note_file", "time"],
)

By using the note file and meeting time as the key, edits update existing nodes rather than creating duplicates across runs.
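
If upserts are new to you, this toy version shows the behavior with a dict keyed on the primary key fields. It only models the semantics and makes no claim about how CocoIndex talks to Neo4j; the `upsert` function and the sample rows are hypothetical. In Cypher terms, the closest everyday analogue is a MERGE on the key followed by a SET.

```python
def upsert(store: dict[tuple, dict], row: dict, key_fields: list[str]) -> str:
    """Insert or update `row` in `store`, keyed by the values of `key_fields`."""
    key = tuple(row[field] for field in key_fields)
    if key not in store:
        store[key] = dict(row)
        return "created"
    if store[key] == row:
        return "unchanged"
    store[key] = dict(row)
    return "updated"


store: dict[tuple, dict] = {}
key_fields = ["note_file", "time"]
results = [
    upsert(store, {"note_file": "budget.md", "time": "2024-12-02", "note": "Draft"}, key_fields),
    upsert(store, {"note_file": "budget.md", "time": "2024-12-02", "note": "Draft"}, key_fields),
    upsert(store, {"note_file": "budget.md", "time": "2024-12-02", "note": "Final"}, key_fields),
]
```

Three writes for the same file and meeting time leave exactly one entry in `store`, with the latest note.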

People and tasks are declared once so relationships can refer to them consistently.

flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Person",
        primary_key_fields=["name"],
    )
)

flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Task",
        primary_key_fields=["description"],
    )
)

Wiring ATTENDED, DECIDED, ASSIGNED_TO

Relationships are exported with clear types that tie everything together.

attended_rels.export(
    "attended_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ATTENDED",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[cocoindex.targets.TargetFieldMapping(
                    source="person", target="name"
                )],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)

Equivalent exports for DECIDED and ASSIGNED_TO define edges from Meeting → Task and Person → Task, with relationship IDs used to avoid duplicates when the flow re-runs.

Running the flow and exploring the graph

To build or update the graph:

pip install -e .
cocoindex update main

Then open the Neo4j browser at http://localhost:7474 and run queries like:

MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p, m;

MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m, t;

MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p, t;
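
For a question like "Which meetings did Dana attend?", a parameterized query is a natural next step. The sketch below only builds the query text and parameters in Python; `meetings_attended_by` is a hypothetical helper, and the property names (name, note_file, time) are taken from the exports shown earlier.

```python
MEETINGS_FOR_PERSON = """
MATCH (p:Person {name: $name})-[:ATTENDED]->(m:Meeting)
RETURN m.note_file AS note_file, m.time AS time
ORDER BY m.time
"""


def meetings_attended_by(name: str) -> tuple[str, dict[str, str]]:
    """Return the Cypher text and parameters for one person's meetings."""
    return MEETINGS_FOR_PERSON.strip(), {"name": name}


query, params = meetings_attended_by("Dana")
```

With the official Neo4j Python driver, a pair like this would typically be passed to `session.run(query, params)`; the connection setup is left out here.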

CocoIndex only mutates nodes and relationships that actually changed, so the graph stays synced with your documents without churning the database.

Beyond meetings: a reusable pattern

This "source → detect changes → split → extract → collect → export" pattern generalizes well beyond meeting notes.

You can apply the same flow to research papers, support tickets, emails, compliance docs, or competitive intel, as long as you can describe the entities and relationships you care about.

If your org is already drowning in text, the fastest win might be: "pick one messy document type and give it a graph." Meetings just happen to be where the pain—and the payoff—are immediately obvious.

Want to try this yourself? Check out the full example on GitHub or dive into the CocoIndex docs.
