Cameron Archer for Tinybird

Posted on • Originally published at tinybird.co

Build an analytics agent to analyze your Ghost blog traffic with the Vercel AI SDK and Tinybird

What excites me most about the inflow of AI and LLMs into every nook and cranny of my daily life is the opportunity to point an agent at a big block of data, ask questions, and get answers. I've spent most of my career analyzing data. I like data. I like analyzing data. I used to like writing SQL. Now I don't (mostly). Now that I've tasted the sweet goodness of agentic analytics powered by tools like Vercel AI SDK and Tinybird MCP Server, I'd rather give an agent context and have it query the database for me.

However, this isn't as simple as it sounds. LLMs are surprisingly bad at writing SQL. But with Tinybird (and the Tinybird MCP Server) and the Vercel AI SDK, it's quite straightforward (and mostly prompt engineering).

In this post, I'll show you how to build an agent with sufficient contextual understanding of underlying analytics data - and the tools to query it - so that you can have a chat with your data (any data!). Specifically, I'll build a simple analytics agent for a blog - hosted on the open-source publishing platform Ghost. The agent will tell us which content is performing the best, and why.

The tools for the job:

  • Ghost (blog hosting)
  • Vercel AI SDK (Typescript agent framework)
  • Tinybird (database + analytics MCP + tools)

Blog analytics on Ghost

Ghost is an open-source publishing platform, great for anybody who wants to own their blogging stack. You can get up and running immediately with Ghost(Pro), their hosted offering, but many publishers choose to self-host.

In either case, the recent release of Ghost 6.0 includes real-time, multi-channel analytics via Tinybird, the platform on which Ghost's analytics run. If you choose to self-host, you'll also get unfettered access to the underlying data in Tinybird. Alongside Ghost, you can use Tinybird Cloud - for managed analytics hosting - or Tinybird self-managed regions for a complete self-hosted setup.

Self-hosting Ghost + Tinybird is straightforward, and I won't cover it here. You can follow Ghost's self-hosting guides to learn how to do it.

For now, I'll assume you've already set up Ghost self-hosted with Tinybird web analytics enabled, and, as a part of that setup, you have created a Tinybird workspace (hosted or self-managed) that is receiving web traffic events from Ghost.

If you don't already have that, but want to follow along anyway, here's how to set up Tinybird for basic testing:

Basic Tinybird setup

tb login
# deploy the Ghost Web Analytics template
tb --cloud deploy --template https://github.com/TryGhost/Ghost/tree/main/ghost/core/core/server/data/tinybird

The template includes a small fixture of sample data you can append for testing:

tb --cloud datasource append analytics_events fixtures/analytics_events.ndjson

You can then query the data in the analytics_events data source to get a basic feel for what's in it:

tb --cloud sql 'select uniq(session_id) from analytics_events'
# Running against Tinybird Cloud
#   uniq(session_id)  
#             UInt64  
# ─────────────────────
#                 24 

(Or use Tinybird's built-in analytics agent, Tinybird Code):

tb --prompt "How many sessions are there?"
# Based on the data in the analytics_events data source, I can see that you have 24 unique sessions

In addition to the data source holding the blog traffic events, the template also includes several SQL-based API endpoints that provide metrics like top pages, top browsers, top sources, etc.

ls endpoints
# api_active_visitors.pipe        api_top_browsers.pipe           api_top_os.pipe
# api_kpis.pipe                   api_top_devices.pipe            api_top_pages.pipe
# api_post_visitor_counts.pipe    api_top_locations.pipe          api_top_sources.pipe

You can test the response from any of these APIs locally:

tb endpoint url api_top_browsers --language curl
# curl -X GET "http://localhost:7181/v0/pipes/api_top_browsers.json?site_uuid=mock_site_uuid&timezone=Etc%2FUTC&skip=0&limit=50&token=p.eyJ1Ijo..."
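You can also call these endpoints programmatically. Here's a sketch in Node.js that builds the same request URL as the curl example above; the host, pipe name, and parameters mirror that example, and the token would come from an environment variable:

```javascript
// Sketch: building the URL for a published Tinybird API endpoint.
// The pipe name and parameters mirror the curl example above.
function buildEndpointUrl(host, pipe, params) {
    const url = new URL(`/v0/pipes/${pipe}.json`, host);
    for (const [key, value] of Object.entries(params)) {
        url.searchParams.set(key, String(value));
    }
    return url.toString();
}

// Usage (requires a running Tinybird instance and a valid token):
// const url = buildEndpointUrl('http://localhost:7181', 'api_top_browsers', {
//     site_uuid: 'mock_site_uuid',
//     timezone: 'Etc/UTC',
//     token: process.env.TINYBIRD_TOKEN,
// });
// const res = await fetch(url);
// const { data } = await res.json();
```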

So that's the Tinybird setup. You have data in the database and APIs that query it. Now for the fun part: building an AI agent that can turn a natural language question into helpful insights about our blog traffic.

Creating an analytics agent with Vercel AI SDK and Tinybird MCP Server

You can quickly build a simple analytical CLI agent with the Vercel AI SDK and Tinybird MCP Server. That agent will be able to analyze the blog traffic data stored in Tinybird and answer questions using the set of tools provided by the Tinybird MCP Server.

Setting up the agent basics

We're going to make this as simple as possible: a single Node.js CLI script that accepts user input via the terminal and prints out a response, maintaining a chat history in the context window for multi-message functionality. You'll need a recent version of Node installed.

You can start by creating a file, cli.js (full gist here), installing the packages you'll need (npm install ai @ai-sdk/anthropic @modelcontextprotocol/sdk dotenv), and importing them:

import { streamText, experimental_createMCPClient as createMCPClient } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';
import readline from 'readline';
import * as dotenv from 'dotenv';

dotenv.config();

Here's what we're importing:

  • streamText function from the AI SDK to enable streaming output from the LLM
  • experimental_createMCPClient from the AI SDK to create a client for the Tinybird MCP Server
  • StreamableHTTPClientTransport support from the MCP project, since the Tinybird MCP Server uses StreamableHTTP transport
  • readline to enable the CLI interface
  • dotenv to handle environment variables

Important! Defining a system prompt

The most important task when creating an analytics agent (or any agent) is prompt engineering. The tooling around agent development is so solid now that 90% of the work is writing a good prompt.

Here's the simple system prompt I gave my agent.

const SYSTEM_PROMPT = `You are a Ghost blog analytics assistant. Your job is to answer the user's questions to help them understand the performance of their Ghost blog, and to provide recommendations about how to improve their blog performance. Use available Tinybird tools to answer questions about page views, visitors, and site data. When querying data, use a default time range of the last 24 hours unless the user specifically requests a different time range. Be concise in your responses. Today's date is ${new Date().toLocaleDateString()}.`

This works, but there's plenty of room for improvement. Anthropic's prompt engineering docs are a great reference, specifically for Anthropic models but also for LLMs in general.
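As one illustration of that room for improvement, here's a sketch of a more structured prompt builder. The sections and wording below are my own suggestions, not the prompt this post actually uses:

```javascript
// Sketch: a more structured system prompt with separate role, instruction,
// and constraint sections. The wording is illustrative, not prescriptive.
function buildSystemPrompt() {
    return [
        'You are a Ghost blog analytics assistant.',
        '',
        'Instructions:',
        '- Answer questions about page views, visitors, and site data using the available Tinybird tools.',
        '- Prefer a published API endpoint tool when one matches the question; fall back to SQL generation otherwise.',
        '- Default to the last 24 hours unless the user requests a different time range.',
        '',
        'Constraints:',
        '- Be concise. State the time range you used in every answer.',
        `- Today's date is ${new Date().toLocaleDateString()}.`,
    ].join('\n');
}
```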

Set up the CLI interface

We're using readline to enable a simple CLI chat interface:

const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

const chatHistory = [];

We're also maintaining the chat history so the agent has full context. Keep in mind this is a naive approach with zero summarization or compaction. There are plenty of resources out there on how to minimize token usage as the chat history grows, including retaining only recent messages while summarizing old ones (with another agent), using a vector DB + RAG approach to recall important context, etc. For now, we'll just maintain the full history and eat the tokens :).
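The simplest of those strategies, keeping only the most recent messages, can be sketched in a few lines (the cutoff is arbitrary, and a real implementation would summarize what it drops rather than discard it):

```javascript
// Sketch: naive chat-history trimming. Keeps the most recent messages and
// drops the rest; a production version would summarize dropped messages.
function trimHistory(history, maxMessages = 20) {
    if (history.length <= maxMessages) return history;
    return history.slice(history.length - maxMessages);
}
```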

Creating an MCP client for the Tinybird MCP Server

Next, we need to instantiate an MCP Client to connect to the Tinybird MCP Server:

let mcpClient = null;

async function initializeMCP() {
    // Check to make sure the Tinybird token is available
    if (!process.env.TINYBIRD_TOKEN) {
        console.error('TINYBIRD_TOKEN environment variable not found');
        console.log('Get your token with `tb info ls` and `tb token copy`');
        process.exit(1);
    }

    try {
        // Tinybird remote MCP server URL with token for scoped permissions
        const url = new URL(`https://mcp.tinybird.co?token=${process.env.TINYBIRD_TOKEN}`);

        // Create the client with a unique session ID
        mcpClient = await createMCPClient({
            transport: new StreamableHTTPClientTransport(url, {
                sessionId: `ghost-cli-${Date.now()}`,
            }),
        });

        console.log('✅ Connected to Tinybird MCP server');

        // Fetch and list the available tools from the MCP Server
        const tools = await mcpClient.tools();
        console.log('\nAvailable tools:');
        Object.keys(tools).forEach(tool => {
            console.log(`  • ${tool}`);
        });
        return tools;

    } catch (error) {
        console.error('❌ Failed to connect to Tinybird MCP server:', error.message);
        process.exit(1);
    }
}

A couple notes about this:

  • The Tinybird MCP Server requires a Tinybird token for authentication. The token limits the scope of the MCP Server so that it can only access data allowed by the permissions granted to that token. This can be really useful when you need to add security policies to your agentic interfaces, and you can even use custom JWTs with row-level policies in multi-tenant environments, for example, where a specific user should only see rows that match their user_id in a shared data source.
  • The Tinybird MCP Server includes a list of core tools, like explore_data, text_to_sql, and list_datasources. In addition, every published API endpoint in the Tinybird workspace is enabled as a tool. This can be very useful when you want more deterministic results for certain questions. For example, since I have an api_top_browsers API endpoint in my Ghost project, I would want the agent to respond to a question like What are the top 3 browsers used to access my blog in the last 7 days? by making a tool call to that API, rather than attempting its own SQL generation. This allows for much more efficient and deterministic analysis of common query patterns. Every Tinybird API endpoint includes a plaintext description, which can assist the agent with tool discovery and selection. Write good descriptions!

More info about the Tinybird MCP Server, available tools, token scopes, and observability can be found in the Tinybird MCP Server docs.

Creating the question/answer CLI interface

Now let's create this CLI chat interface! We'll start with a bit of housekeeping: If the user types exit we should exit the chat:

async function askQuestion(question, mcpTools) {
    if (question.toLowerCase().trim() === 'exit') {
        console.log('👋 Goodbye!');
        process.exit(0);
    }

(we will also add Ctrl+C handling in the main CLI function).

Then, we add the question to the chatHistory and use the streamText AI SDK function to get a response:

    try {
        chatHistory.push({ role: 'user', content: question });

        const result = await streamText({
            model: anthropic("claude-3-7-sonnet-20250219"),
            messages: chatHistory,
            maxSteps: 5,
            tools: { ...mcpTools },
            system: SYSTEM_PROMPT,
        });

        let fullResponse = '';
        console.log('\n💭 Thinking...');

The streamText result exposes a stream broken into pieces, which we'll call deltas. Each delta has a type - like text, tool call, tool result, etc. - that we should handle for graceful chat output:

        for await (const delta of result.fullStream) {
            try {
                switch (delta.type) {
                    case 'tool-call':
                        if (delta.toolName) {
                            console.log(`\n🔧 ${delta.toolName}`);
                            if (delta.args && Object.keys(delta.args).length > 0) {
                                console.log(`   args: ${JSON.stringify(delta.args)}`);
                            }
                        }
                        break;

                    case 'tool-result':
                        break;

                    case 'text-delta':
                        if (delta.textDelta) {
                            process.stdout.write(delta.textDelta);
                            fullResponse += delta.textDelta;
                        }
                        break;

                    case 'finish':
                        console.log('\n');
                        break;

                    default:
                        break;
                }
            } catch (err) {
                console.error(`⚠️  Error: ${err.message}`);
            }
        }

        chatHistory.push({ role: 'assistant', content: fullResponse });

        console.log('\n' + '─'.repeat(50));

    } catch (error) {
        console.error('\n❌ Error:', error.message);
    }
}

We're doing some very basic handling here. If the delta is just text, we write the text to stdout. If it's a tool call, we show which tool is being called and any arguments passed to the tool. If it's a tool result, we show nothing. Why? Because tool results are a bit of JSON gobbledygook and require some finesse to parse and look pretty. To me, that felt like a waste of time for this demo. But feel free to play around with parsing the delta.result object yourself :).
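If you do decide to render tool results, a small defensive helper goes a long way. This sketch assumes only that the result is JSON-serializable, and truncates long payloads instead of dumping them into the terminal:

```javascript
// Sketch: compact rendering of a tool result. Assumes only that the result
// is JSON-serializable; truncates long payloads rather than printing them all.
function summarizeToolResult(result, maxChars = 200) {
    let text;
    try {
        text = typeof result === 'string' ? result : JSON.stringify(result);
    } catch {
        text = String(result);
    }
    return text.length > maxChars ? `${text.slice(0, maxChars)}…` : text;
}
```

You could call this from the 'tool-result' case in the switch above, e.g. `console.log(summarizeToolResult(delta.result))`.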

Note when the agent finishes we push its response to the chat history to maintain the full context.

Starting the CLI interface

Now we just need a main function to start the CLI, initialize the MCP server, and open the input for questions:

async function startCLI() {
    console.log('👻 Ghost Blog Analytics CLI Agent');
    console.log('Type "exit" to quit.\n');
    console.log('Connecting to Tinybird MCP Server...\n');

    const tbTools = await initializeMCP();

    const prompt = () => {
        rl.question('\n💭 Question: ', async (input) => {
            if (input.trim()) {
                await askQuestion(input.trim(), tbTools);
            }
            console.log('');
            prompt();
        });
    };

    prompt();
}

// Handle Ctrl+C
process.on('SIGINT', () => {
    console.log('\n👋 Goodbye!');
    process.exit(0);
});

startCLI().catch(console.error);

The self-calling prompt arrow function creates a continuous prompt loop until the user types "exit" or presses Ctrl+C. At each iteration, the loop awaits the result of the askQuestion function, displays the agent output, and starts over again.

Let's chat with our blog data

Running the agent is as simple as:

node cli.js

Of course, running an analytics agent on a blog pageviews events table containing 31 rows is kind of boring. In theory you could use the tb --cloud mock command to generate some mock data, but it may not properly populate the nested JSON in the payload column in such a way that the existing API endpoints return meaningful data. No problem! I vibe coded a simple data generator that you can use to generate some "realistic" Ghost blog traffic matching the default Ghost data schema.
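For reference, the core of such a generator is just a loop that emits one JSON object per line. This is a sketch, not the generator I used; the field names (timestamp, session_id, action, payload) and payload shape are assumptions and need to match your actual analytics_events schema:

```javascript
// Sketch: a minimal NDJSON traffic generator. Field names and payload shape
// are assumptions; verify them against your analytics_events schema.
function randomEvent() {
    const paths = ['/', '/blog/first-post/', '/about/'];
    return {
        timestamp: new Date(Date.now() - Math.floor(Math.random() * 86400000)).toISOString(),
        session_id: `sess-${Math.random().toString(36).slice(2)}`,
        action: 'page_hit',
        payload: JSON.stringify({
            pathname: paths[Math.floor(Math.random() * paths.length)],
            referrer: 'https://www.google.com/',
        }),
    };
}

function generateNdjson(rows) {
    const lines = [];
    for (let i = 0; i < rows; i++) lines.push(JSON.stringify(randomEvent()));
    return lines.join('\n') + '\n';
}

// Usage: write the output to a file, then append it with
// tb --cloud datasource append analytics_events events.ndjson
// import { writeFileSync } from 'fs';
// writeFileSync('events.ndjson', generateNdjson(100000));
```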

Just run the following in your terminal:

node generate-blog-data.js 100000 # outputs an ndjson with 100k rows of "realistic" blog traffic data
tb --cloud datasource append analytics_events name_of_the_generated_ndjson_file.ndjson

You can verify the data was appended:

tb --cloud sql 'select count() from analytics_events'
# Running against Tinybird Cloud: Workspace ghost
#   count()  
#    UInt64  
# ───────────
#    100031  

With some more data to analyze, you can fire up the agent! Here's an example of the agent helping me identify the top performing blog post and making content recommendations:

[Screenshot: the agent's output when asked about the top-performing blog posts]

Conclusion

I've shown you how to create a simple analytics agent using the Vercel AI SDK and the Tinybird MCP Server. In this example, we used it to analyze traffic data from our Ghost blog, but the concept can easily be applied to any use case and any kind of data. The only things that change are the data you store in Tinybird, the token you use to authenticate to the Tinybird MCP Server, and the system prompt. Everything else stays the same.

To get started with Tinybird, sign up for a free account. The Tinybird MCP Server is enabled for all workspaces by default, so you can begin testing this agentic workflow for free.
