Timothy Spann

All Data and AI Weekly #207: 15 Sept 2025

All Data and AI Weekly

(AI, Data, NiFi, Iceberg, Polaris, Streamlit, Flink, Kafka, Python, Java, SQL, MCP, LLM, RAG, Cortex AI, AISQL, Search, Unstructured Data)

#207: 15 Sept 2025

https://bsky.app/profile/paasdev.bsky.social

NiFi + AI + AI Data Cloud + Iceberg.

https://www.reddit.com/r/DataEngineeringForAI/hot/

Monthly NYC and YouTube Events

https://lu.ma/PINSAI

Code and Open Source Projects

AWS New York Summit
https://github.com/tspannhw/conferences/tree/main/2025/awsny

Hex + Snowflake Hackathon
https://github.com/tspannhw/hackathons/tree/main/2025-07-15

Apache NiFi + AI Agents + Cortex AI + Snowflake AISQL

https://github.com/tspannhw/TrafficAI/tree/main/Agents

https://github.com/tspannhw/transit-ridership

https://github.com/tspannhw/conferences

Greetings,

This is a special edition highlighting Community Over Code 2025, where I presented three talks. It was a great chance to connect with the open-source community and share insights on topics ranging from Apache NiFi to real-time data optimization.

Below, you'll find a recap of the talks, along with other key updates from the world of data engineering and AI.

Community Over Code 2025 - A Deep Dive

I was thrilled to present three talks at COC25. For anyone who missed them or wants to revisit the material, you can find the slide decks and related resources below:

  • NiFi Man: We're Here, But Should We Have Come?: This talk explored the practical considerations and real-world implications of deploying Apache NiFi. We went beyond the "how-to" and covered the "why" and "when" of using this dataflow tool.
  • Utilizing Real-Time Transit Data for Travel Optimization: This session demonstrated how to use real-time transit data streams to build intelligent travel optimization solutions. We discussed the architecture, data processing, and benefits of such a system.
  • Enhancing Apache NiFi 2.x with Python Processors: For the more technical audience, this talk showed how to extend Apache NiFi's functionality with custom Python processors. It's a great way to integrate specialized logic and libraries directly into your dataflows.
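
As a rough illustration of the Python processor topic, a NiFi 2.x Python processor is typically a class that extends `FlowFileTransform` from the `nifiapi` package and returns a `FlowFileTransformResult`. The sketch below is not taken from the talk materials: the processor name `UppercaseText` and the helper `to_upper` are hypothetical, and the fallback stand-in classes exist only so the file can be tried outside NiFi.

```python
# Hypothetical example processor; not from the talk materials.
try:
    # Available only inside the NiFi 2.x Python runtime.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Minimal stand-ins for local experiments (an assumption, not NiFi code).
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, contents=None, attributes=None):
            self.relationship = relationship
            self.contents = contents
            self.attributes = attributes


def to_upper(data: bytes) -> str:
    """Decode FlowFile bytes as UTF-8 and upper-case the text."""
    return data.decode("utf-8").upper()


class UppercaseText(FlowFileTransform):
    class Java:
        # Tells NiFi which processor interface this class implements.
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1-SNAPSHOT"
        description = "Upper-cases the text content of each FlowFile."
        tags = ["example", "text"]

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the incoming content, transform it, and route to "success".
        text = to_upper(flowfile.getContentsAsBytes())
        return FlowFileTransformResult(
            relationship="success",
            contents=text,
            attributes={"text.length": str(len(text))},
        )
```

Keeping the transformation in a plain function such as `to_upper` makes the logic easy to unit test without a running NiFi instance.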

All the code and materials for these presentations can be found in my public GitHub repository for the conference.

Other Industry and Partner Updates

Here is a quick look at other noteworthy developments and releases from the past week:

Weekly Agent Update

Spotlight: CA Open Data AI Agent

Project Link: https://medium.com/@gabriel.mullen/ca-open-data-ai-agent-d09b10d09e32

Summary: This week, we're taking a look at the California Open Data AI Agent. Built in just 60 minutes using Snowflake, this agent demonstrates how to create a real-time Retrieval-Augmented Generation (RAG) workflow over live government data without setting up new servers. It showcases the power of agentic AI in synthesizing answers from thousands of datasets with clear citations.
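
To make the RAG pattern concrete, here is a minimal sketch of the retrieve-then-prompt step in plain Python. It is not the agent from the article: the dataset records, the word-overlap scoring, and the function names are all illustrative assumptions, and a real system would use a search service for retrieval and an LLM call for the final answer.

```python
import re

# Hypothetical catalog entries standing in for open-data datasets.
DATASETS = [
    {"id": "ds-1", "title": "Wildfire incidents",
     "text": "Acres burned by wildfire per county and year."},
    {"id": "ds-2", "title": "Water reservoirs",
     "text": "Daily reservoir storage levels across the state."},
    {"id": "ds-3", "title": "Road traffic counts",
     "text": "Annual average daily traffic on state highways."},
]


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, docs, k=2):
    """Return up to k docs ranked by word overlap with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(d["title"] + " " + d["text"])), d) for d in docs]
    scored = [s for s in scored if s[0] > 0]  # drop docs with no overlap
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(question, docs):
    """Assemble a prompt whose sources carry ids the model can cite."""
    context = "\n".join(f"[{d['id']}] {d['title']}: {d['text']}" for d in docs)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\nQuestion: {question}"
    )


question = "How many acres burned by wildfire?"
hits = retrieve(question, DATASETS)
prompt = build_prompt(question, hits)
```

Tagging each source with an id in the prompt is one simple way to get answers that can point back to the datasets they came from.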

Key Takeaway: The project highlights the practicality and speed of deploying production-ready, serverless agent solutions for real-world data challenges.

Framework & Tool of the Week: AgentScope

GitHub Link: https://github.com/agentscope-ai/agentscope

Summary: AgentScope is an agent-oriented programming library that makes it easier to build LLM applications. It's designed to be "developer-centric" with features like asynchronous execution, parallel tool calls, and real-time steering. It takes a transparent approach in which prompt engineering and API invocation are fully visible and controllable.
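
The idea of asynchronous execution with parallel tool calls can be sketched with Python's standard `asyncio` module. This does not use the library's own API; the two tool functions and their return values are hypothetical stand-ins for real tools.

```python
import asyncio


async def get_weather(city):
    """Hypothetical tool: pretend to call a weather service."""
    await asyncio.sleep(0.1)  # simulate I/O latency
    return f"weather for {city}"


async def get_transit(city):
    """Hypothetical tool: pretend to call a transit status service."""
    await asyncio.sleep(0.1)  # simulate I/O latency
    return f"transit status for {city}"


async def run_tools_in_parallel(city):
    # gather runs both coroutines concurrently and keeps argument order.
    return await asyncio.gather(get_weather(city), get_transit(city))


results = asyncio.run(run_tools_in_parallel("NYC"))
```

Because both tools wait at the same time, the total wall-clock time is roughly that of the slowest tool rather than the sum of both.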

Why it's important: AgentScope, along with related libraries such as agentscope-runtime and agentscope-studio, provides a toolkit for developing, deploying, and visualizing agent-based applications.

Technical Deep Dive: Snowflake Cortex Agents API

Article Link: https://medium.com/@masato.takada/%EF%B8%8F-snowflake-cortex-agents-a-rest-api-guide-49b3a754ef92

Summary: The Snowflake Cortex Agent is a powerful AI data assistant that automates complex data workflows. This guide explains how to use its REST API to build applications that can orchestrate across both structured (using Cortex Analyst) and unstructured (using Cortex Search) data. It's designed to be secure, with existing Snowflake security controls applying automatically.

Key Concepts:

  • Planning: The agent analyzes a request and creates a comprehensive plan.
  • Tool Use: It selects the right tools (Cortex Analyst for SQL, Cortex Search for text).
  • Reflection: It evaluates results and refines its approach.
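
The three concepts above can be sketched as a small loop in plain Python. This is not the Cortex Agents REST API: `analyst_tool` and `search_tool` are hypothetical stubs, and the keyword-based planner is a deliberately naive assumption used only to show the control flow.

```python
def analyst_tool(question):
    """Stand-in for a structured-data (SQL) tool."""
    return {"tool": "analyst", "answer": f"SQL result for: {question}"}


def search_tool(question):
    """Stand-in for an unstructured-text search tool."""
    return {"tool": "search", "answer": f"Documents matching: {question}"}


TOOLS = {"analyst": analyst_tool, "search": search_tool}

# Phrases that suggest the question needs an aggregate over tables.
STRUCTURED_HINTS = ("how many", "total", "average", "count")


def plan(question):
    """Planning: choose the order in which tools will be tried."""
    q = question.lower()
    if any(hint in q for hint in STRUCTURED_HINTS):
        return ["analyst", "search"]
    return ["search", "analyst"]


def reflect(result):
    """Reflection: accept a result only if it carries a non-empty answer."""
    return bool(result.get("answer"))


def run_agent(question):
    """Tool use: run the planned tools until one result passes reflection."""
    for name in plan(question):
        result = TOOLS[name](question)
        if reflect(result):
            return result
    return {"tool": None, "answer": "No tool produced an answer."}
```

For example, `run_agent("How many riders last week?")` is routed to the analyst stub first, while `run_agent("What does the safety policy say?")` goes to the search stub.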

Model Watch: Google VaultGemma-1B

Hugging Face Link: https://huggingface.co/google/vaultgemma-1b

Summary: VaultGemma is a variant of the Gemma family of open models from Google, with a key difference: it's pre-trained from the ground up using Differential Privacy (DP). This provides strong, mathematically backed privacy guarantees for its training data, making it a good fit for applications where data privacy is a critical concern.
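
Differentially private training is commonly implemented with DP-SGD, which clips each example's gradient to a fixed norm and adds Gaussian noise before averaging. The sketch below shows one such aggregation step in plain Python. It illustrates the general technique only and is not VaultGemma's training code; the function names and the `clip_norm` and `noise_multiplier` defaults are assumptions.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale a gradient vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, clip_norm=1.0,
                        noise_multiplier=1.0, rng=None):
    """Clip per-example gradients, sum them, add noise, then average."""
    rng = rng or random.Random()
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise scale is tied to the clipping norm, which bounds each
    # example's contribution to the sum.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Two hypothetical per-example gradients for a 2-parameter model.
grads = [[3.0, 4.0], [0.0, 0.5]]
update = dp_average_gradient(grads, rng=random.Random(0))
```

Clipping limits how much any single training example can move the model, and the added noise masks what remains, which is where the utility trade-off against non-private training comes from.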

Note: While it may have a utility trade-off compared to non-private models, its primary benefit is providing privacy by design, making it a significant step forward in private AI.

Videos & Webinars 🎥

Thanks

https://sessionize.com/tspann

https://github.com/timothyspann

© 2020-2025 Tim Spann https://www.youtube.com/@FLaNK-Stack
