All Data and AI Weekly
( AI, Data, NiFi, Iceberg, Polaris, Streamlit, Flink, Kafka, Python, Java, SQL, MCP, LLM, RAG, Cortex AI, AISQL, Search, Unstructured Data )
#207: 15 Sept 2025
https://bsky.app/profile/paasdev.bsky.social
NiFi + AI + AI Data Cloud + Iceberg.
https://www.reddit.com/r/DataEngineeringForAI/hot/
Monthly NYC and YouTube Events
Code and Open Source Projects
AWS New York Summit
https://github.com/tspannhw/conferences/tree/main/2025/awsny
Hex + Snowflake Hackathon
https://github.com/tspannhw/hackathons/tree/main/2025-07-15
Apache NiFi + AI Agents + Cortex AI + Snowflake AISQL
https://github.com/tspannhw/TrafficAI/tree/main/Agents
https://github.com/tspannhw/transit-ridership
https://github.com/tspannhw/conferences
Greetings,
This edition is a special one, as we're highlighting the fantastic Community Over Code 2025 event where I had the opportunity to present three talks. It was a great chance to connect with the open-source community and share insights on a variety of topics, from Apache NiFi to real-time data optimization.
Below, you'll find a recap of the talks, along with other key updates from the world of data engineering and AI.
Community Over Code 2025 - A Deep Dive
I was thrilled to present three talks at COC25. For anyone who missed them or wants to revisit the material, you can find the slide decks and related resources below:
- NiFi Man: We're Here, But Should We Have Come? This talk explored the practical considerations and real-world implications of deploying Apache NiFi. We went beyond the "how-to" and delved into the "why" and "when" of using this powerful dataflow tool.
- Utilizing Real-Time Transit Data for Travel Optimization: This session demonstrated how to leverage real-time transit data streams to build intelligent travel optimization solutions. We discussed the architecture, data processing, and benefits of such a system.
- Enhancing Apache NiFi 2.x with Python Processors: For the more technical audience, this talk showcased how to extend Apache NiFi's functionality using custom Python processors. It's a great way to integrate specialized logic and libraries directly into your dataflows.
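For readers who have not written one yet, here is a minimal sketch of what a NiFi 2.x Python processor can look like. It is not code from the talk. It follows the FlowFileTransform pattern from the NiFi Python developer guide; the processor name `UppercaseText` and the helper `to_upper` are made up for illustration. The `nifiapi` module is only importable inside NiFi, so the sketch falls back to small stand-in classes so it can be tried locally.

```python
try:
    # Importable only when NiFi 2.x loads this file as a processor.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Local stand-ins (assumption: just enough surface to run outside NiFi).
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


def to_upper(data: bytes) -> str:
    """Hypothetical business logic: decode the content and upper-case it."""
    return data.decode("utf-8").upper()


class UppercaseText(FlowFileTransform):
    class Java:
        # Tells NiFi which Java interface this Python class implements.
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1-SNAPSHOT"
        description = "Upper-cases the text content of each FlowFile."
        tags = ["example", "python"]

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the incoming content, transform it, and route to "success".
        text = to_upper(flowfile.getContentsAsBytes())
        return FlowFileTransformResult(
            relationship="success",
            contents=text,
            attributes={"uppercased": "true"},
        )
```

Keeping the logic in a plain function such as `to_upper` makes it easy to unit test without a running NiFi instance.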
All the code and materials for these presentations can be found in my public GitHub repository for the conference:
Other Industry and Partner Updates
Here is a quick look at other noteworthy developments and releases from the past week:
- Apache Iceberg: A new article from The New Stack aims to dispel common myths about the complexity of open-source frameworks like Apache Iceberg. Additionally, the Snowflake Engineering blog released a post detailing the new features and fixes in Apache Iceberg 1.10.0.
- Snowflake: Snowflake has announced the general availability of Workspaces, a feature designed to enhance collaboration and organization. We also saw some great articles on using Snowflake Cortex Agents via a REST API and the upgrade of the open-source MCP Server for Snowflake.
Weekly Agent Update
Spotlight: CA Open Data AI Agent
Project Link: https://medium.com/@gabriel.mullen/ca-open-data-ai-agent-d09b10d09e32
Summary: This week, we're taking a look at the California Open Data AI Agent. Built in just 60 minutes using Snowflake, this agent demonstrates how to create a real-time Retrieval-Augmented Generation (RAG) workflow over live government data without setting up new servers. It showcases the power of agentic AI in synthesizing answers from thousands of datasets with clear citations.
Key Takeaway: The project highlights the practicality and speed of deploying production-ready, serverless agent solutions for real-world data challenges.
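To make the RAG idea concrete, here is a toy sketch of the retrieve-then-prompt pattern with citations. It is not the article's implementation: the scorer is simple word overlap, and the dataset ids and descriptions in `DOCS` are invented placeholders. A real agent would swap the toy scorer for a managed search service and send the prompt to an LLM.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=2):
    """Rank docs by how many words they share with the question (toy scorer)."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Assemble a prompt that asks the model to cite source ids."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\nQuestion: {question}"
    )


# Invented placeholder records, not real catalog entries.
DOCS = [
    {"id": "ds-1", "text": "Statewide wildfire incidents by county and year"},
    {"id": "ds-2", "text": "Public school enrollment by district"},
    {"id": "ds-3", "text": "Wildfire acres burned by year"},
]

if __name__ == "__main__":
    question = "wildfire by year"
    print(build_prompt(question, retrieve(question, DOCS)))
```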
Framework & Tool of the Week: AgentScope
GitHub Link: https://github.com/agentscope-ai/agentscope
Summary: AgentScope is an agent-oriented programming library that makes it easier to build LLM applications. It's designed to be "developer-centric" with features like asynchronous execution, parallel tool calls, and real-time steering. It offers a transparent approach where prompt engineering and API invocation are fully visible and controllable.
Why it's important: AgentScope, along with its related libraries like agentscope-runtime and agentscope-studio, provides a comprehensive toolkit for not only developing but also deploying and visualizing agent-based applications.
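If "parallel tool calls" sounds abstract, the underlying idea can be sketched with nothing but the Python standard library. This is deliberately not the AgentScope API: the two tools (`get_weather`, `get_time`) are hypothetical stand-ins that simulate I/O, and `asyncio.gather` does the concurrent scheduling.

```python
import asyncio


async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # simulate a network call
    return f"weather:{city}"


async def get_time(city: str) -> str:
    await asyncio.sleep(0.01)  # simulate a network call
    return f"time:{city}"


async def run_tools_in_parallel(city: str) -> list:
    # gather runs both coroutines concurrently and returns results
    # in the same order as its arguments.
    return await asyncio.gather(get_weather(city), get_time(city))


if __name__ == "__main__":
    print(asyncio.run(run_tools_in_parallel("NYC")))
```

Because the two calls overlap while waiting, total latency is roughly that of the slower tool rather than the sum of both.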
Technical Deep Dive: Snowflake Cortex Agents API
Article Link: https://medium.com/@masato.takada/%EF%B8%8F-snowflake-cortex-agents-a-rest-api-guide-49b3a754ef92
Summary: The Snowflake Cortex Agent is a powerful AI data assistant that automates complex data workflows. This guide explains how to use its REST API to build applications that can orchestrate across both structured (using Cortex Analyst) and unstructured (using Cortex Search) data. It's designed to be secure, with existing Snowflake security controls applying automatically.
Key Concepts:
- Planning: The agent analyzes a request and creates a comprehensive plan.
- Tool Use: It selects the right tools (Cortex Analyst for SQL, Cortex Search for text).
- Reflection: It evaluates results and refines its approach.
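The three concepts above can be sketched as a small control loop. This is an illustration only and makes no calls to Snowflake: the keyword-based `plan`, the two lambda "tools", and the `reflect` check are all hypothetical stand-ins for what the agent does with an LLM, Cortex Analyst, and Cortex Search.

```python
def plan(question: str) -> str:
    """Toy planner: numeric-sounding questions go to SQL, the rest to search."""
    numeric_words = ("how many", "total", "average", "count")
    q = question.lower()
    return "analyst" if any(w in q for w in numeric_words) else "search"


# Stand-in tools; a real agent would call Cortex Analyst / Cortex Search here.
TOOLS = {
    "analyst": lambda q: f"SQL result for: {q}",
    "search": lambda q: f"Documents matching: {q}",
}


def reflect(answer: str) -> bool:
    """Toy reflection: accept any non-empty answer."""
    return bool(answer.strip())


def run_agent(question: str, max_rounds: int = 2):
    tool = plan(question)                    # Planning
    for _ in range(max_rounds):
        answer = TOOLS[tool](question)       # Tool use
        if reflect(answer):                  # Reflection
            return tool, answer
        # Answer rejected: try the other tool on the next round.
        tool = "search" if tool == "analyst" else "analyst"
    return tool, ""
```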
Model Watch: Google VaultGemma-1B
Hugging Face Link: https://huggingface.co/google/vaultgemma-1b
Summary: VaultGemma is a variant of the Gemma family of open models from Google, but with a key difference: it's pre-trained from the ground up using Differential Privacy (DP). This provides strong, mathematically-backed privacy guarantees for its training data, making it a great choice for applications where data privacy is a critical concern.
Note: While it may have a utility trade-off compared to non-private models, its primary benefit is providing privacy by design, making it a significant step forward in private AI.
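For intuition on where the privacy guarantee and the utility trade-off come from, here is a minimal sketch of the core step in DP-SGD, a common way to train with differential privacy: clip each example's gradient to a maximum norm, then add Gaussian noise before averaging. This is not VaultGemma's training code, and the function and parameter names are illustrative assumptions.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average(per_example_grads, max_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip each per-example gradient, add Gaussian noise to the sum, average."""
    rng = rng or random.Random()
    clipped = [clip(g, max_norm) for g in per_example_grads]
    n, dim = len(clipped), len(clipped[0])
    sigma = noise_multiplier * max_norm  # noise scale is tied to the clip bound
    noisy_sum = [
        sum(g[i] for g in clipped) + rng.gauss(0.0, sigma) for i in range(dim)
    ]
    return [s / n for s in noisy_sum]
```

Clipping bounds how much any single example can move the model, and the noise hides what remains; both also blur the learning signal, which is the utility cost mentioned in the note.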
Videos & Webinars 🎥
- Building Cortex Agents On Snowflake: Why It Matters And Best Practices
Thanks
https://github.com/timothyspann
© 2020-2025 Tim Spann https://www.youtube.com/@FLaNK-Stack