<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benson King'ori</title>
    <description>The latest articles on DEV Community by Benson King'ori (@virgoalpha).</description>
    <link>https://dev.to/virgoalpha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1142344%2Fef0f07a9-d21c-4b26-bd16-9a90053c92a3.jpeg</url>
      <title>DEV Community: Benson King'ori</title>
      <link>https://dev.to/virgoalpha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/virgoalpha"/>
    <language>en</language>
    <item>
      <title>Resurrecting Google Reader for the modern web using Kiro</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Sat, 20 Dec 2025 14:13:09 +0000</pubDate>
      <link>https://dev.to/aws-builders/resurrecting-google-reader-for-the-modern-web-using-kiro-201k</link>
      <guid>https://dev.to/aws-builders/resurrecting-google-reader-for-the-modern-web-using-kiro-201k</guid>
      <description>&lt;h1&gt;
  
  
  TLDR;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Google Reader solved content tracking for the open web, but died when RSS could not keep up with SPAs.&lt;/li&gt;
&lt;li&gt;Watcher resurrects the Google Reader experience for the modern web.&lt;/li&gt;
&lt;li&gt;Instead of relying on RSS, Watcher creates RSS by monitoring live webpages.&lt;/li&gt;
&lt;li&gt;Users define “haunts” using natural language; AI generates selectors and structure.&lt;/li&gt;
&lt;li&gt;The UI intentionally mirrors Google Reader’s three-column layout and power-user workflow.&lt;/li&gt;
&lt;li&gt;AI + scraping + RSS unlocks a new class of user-controlled web monitoring tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;
Kiro and its features

&lt;ul&gt;
&lt;li&gt;Spec driven development&lt;/li&gt;
&lt;li&gt;Vibe coding&lt;/li&gt;
&lt;li&gt;Agent hooks&lt;/li&gt;
&lt;li&gt;Steering docs&lt;/li&gt;
&lt;li&gt;MCP&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Google Reader&lt;/li&gt;

&lt;li&gt;

Watcher

&lt;ul&gt;
&lt;li&gt;How we built it&lt;/li&gt;
&lt;li&gt;Similarities Between Watcher and Google Reader&lt;/li&gt;
&lt;li&gt;Differences Between Watcher and Google Reader&lt;/li&gt;
&lt;li&gt;Lessons learnt&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;/ul&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;I was honored to participate in the Kiroween Hackathon on Devpost, an event that challenged participants to build ambitious projects using Kiro, AWS’s newly released AI-native IDE. The hackathon encouraged not just technical execution, but creative re-thinking across four themes: resurrecting dead technologies, stitching together unlikely systems, building flexible foundations, or delivering unforgettable interfaces.&lt;/p&gt;

&lt;p&gt;For my submission, I chose &lt;strong&gt;Resurrection&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Over a decade ago, Google Reader quietly disappeared from the web. Its shutdown marked more than the loss of a product; it signaled a shift away from user-controlled, open content consumption toward algorithmically curated feeds. Yet the problem Reader solved never went away. In fact, it became harder: the modern web moved to dynamic, JavaScript-heavy applications that no longer expose RSS at all.&lt;/p&gt;

&lt;p&gt;This project explores a simple question: What would Google Reader look like if it were rebuilt for today’s web?&lt;/p&gt;

&lt;p&gt;The result is Watcher, a system that haunts modern websites, detects meaningful change, and resurrects the RSS model using AI, scraping, and a deliberately nostalgic interface. Kiro made this possible.&lt;/p&gt;

&lt;h1&gt;
  
  
  Kiro and its features
&lt;/h1&gt;

&lt;p&gt;Kiro is an IDE that AWS released this year. It has several cool features such as:&lt;/p&gt;

&lt;h2&gt;
  
  
  Spec driven development
&lt;/h2&gt;

&lt;p&gt;Spec-driven development in Kiro places formal specifications at the center of the workflow. Instead of writing code first and documenting later, developers define structured specs that describe intent, constraints, and expected behavior. These specs are then used by the IDE and its agents to guide implementation, validation, and refactoring. This approach reduces ambiguity, improves alignment between stakeholders, and creates a durable source of truth that evolves alongside the codebase. For AI-assisted development, specs act as guardrails, ensuring that generated code remains consistent with the system’s design goals.&lt;/p&gt;

&lt;p&gt;To use this mode, I first wrote the PRD (Product Requirements Document) by hand and added it as a spec in Kiro. Kiro then used it to generate the requirements, design, and tasks files. I executed the tasks one by one and checked that the generated code met my standards and creative vision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe coding
&lt;/h2&gt;

&lt;p&gt;Vibe coding is Kiro’s term for an exploratory, conversational style of development where developers work at the level of intent rather than syntax. Instead of issuing narrowly scoped prompts, developers express what they are trying to achieve, architecturally or experientially, and allow the IDE to propose implementations that fit the broader context of the project. This mode is particularly effective in early-stage prototyping, where requirements are fluid and rapid iteration is essential. Vibe coding prioritizes flow and momentum while still grounding outputs in the project’s specifications and constraints.&lt;/p&gt;

&lt;p&gt;I used vibe coding to debug and refine the UI components to my needs. It proved useful for understanding the generated code as well as the cause of errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent hooks
&lt;/h2&gt;

&lt;p&gt;Agent hooks allow developers to attach AI agents to specific lifecycle events such as file changes, test failures, or deployment steps. These agents can observe state, reason about deltas, and take targeted actions, ranging from suggesting fixes to generating artifacts or alerts. Rather than operating as a monolithic assistant, Kiro’s agents are modular and event-driven, which makes them predictable and composable. This model mirrors how modern systems are built: loosely coupled components reacting to well-defined signals.&lt;/p&gt;

&lt;p&gt;I created agent hooks for security, performance and unit testing goals. These ensured that I had the basics covered as I continued to iteratively develop my project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steering docs
&lt;/h2&gt;

&lt;p&gt;Steering documents in Kiro are lightweight, high-leverage artifacts that encode architectural principles, design philosophies, and non-functional requirements. They serve as long-lived guidance for both humans and AI agents, shaping decisions without prescribing implementation details. In practice, steering docs help maintain coherence as a project grows, especially when multiple contributors or agents are involved. They are particularly valuable in AI-assisted environments, where consistent direction is necessary to avoid fragmentation and unintended complexity.&lt;/p&gt;

&lt;p&gt;I used steering docs to set guardrails for the design and setup. I wanted to mimic Google Reader’s UI and functionality as closely as possible, and this came in handy.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) provides a standardized way to supply structured context (such as schemas, APIs, domain models, and external tools) to AI agents. By formalizing how context is shared, MCP reduces hallucinations and increases the reliability of agent outputs. It enables agents to operate with a clear understanding of the system’s boundaries and available capabilities, making them more effective collaborators rather than generic text generators. MCP is a critical enabler for building production-grade, AI-native developer workflows.&lt;/p&gt;

&lt;h1&gt;
  
  
  Google Reader
&lt;/h1&gt;

&lt;p&gt;Google Reader was a web-based RSS and Atom feed aggregator launched by Google in 2005. At its core, it allowed users to subscribe to content feeds (blogs, news sites, academic journals, forums) and consume updates in a single, unified interface. Rather than visiting dozens of websites individually, users could rely on Google Reader to surface new content as it was published, ordered chronologically and optimized for rapid scanning. Its minimal, text-first interface emphasized efficiency over distraction, enabling power users to process large volumes of information quickly.&lt;/p&gt;

&lt;p&gt;Google Reader was important because it embodied an open, decentralized model of the web. It rewarded publishers who exposed structured feeds and gave users direct control over how and where they consumed information, independent of proprietary algorithms. For researchers, journalists, developers, and analysts, it became an indispensable tool for monitoring changes across many sources. It also pioneered interaction patterns (keyboard shortcuts, starring, tagging, and sharing) that influenced later content consumption tools.&lt;/p&gt;

&lt;p&gt;Despite its loyal user base, Google shut down Reader in 2013, citing declining usage and a strategic shift toward fewer, more focused products. In practice, its closure reflected a broader industry transition away from open syndication toward algorithmically curated social feeds. While platforms like Twitter and Facebook offered scale and engagement, they replaced user intent with opaque ranking systems. The shutdown left a lasting gap for users who valued transparency, control, and signal over noise, a gap that many modern tools, including Watcher, aim to address.&lt;/p&gt;

&lt;h1&gt;
  
  
  Watcher
&lt;/h1&gt;

&lt;p&gt;Google Reader was one of the most beloved tools on the web: simple, fast, and incredibly efficient at keeping people updated. But as the web shifted to SPAs and dynamic content, most of it without RSS, Reader’s death left a real gap.&lt;/p&gt;

&lt;p&gt;Watcher was born from the idea: What if we resurrected Google Reader, but upgraded it to haunt the modern web? Meaning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It can watch any page, including SPAs&lt;/li&gt;
&lt;li&gt;It understands natural language&lt;/li&gt;
&lt;li&gt;It detects meaningful changes&lt;/li&gt;
&lt;li&gt;And it exposes everything again as RSS, just like the old days&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That mix of nostalgia and modern constraints was the spark.&lt;/p&gt;

&lt;p&gt;Watcher resurrects the Google Reader experience for the modern web.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjq6259h0pacxg5cxpa9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjq6259h0pacxg5cxpa9.gif" alt="Watcher GIF" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can view the deployed website &lt;a href="https://dclgubjbfzflp.cloudfront.net/welcome" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Use the credentials below:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Email&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;demo@watcher.local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Password&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;demo123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It lets users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define a haunt by giving a URL + natural language description like: “Tell me when the admissions page says applications are open for 2026.”&lt;/li&gt;
&lt;li&gt;Behind the scenes, an LLM generates selectors, keys, and normalization rules.&lt;/li&gt;
&lt;li&gt;A headless browser (Playwright) scrapes the target on a schedule.&lt;/li&gt;
&lt;li&gt;Watcher tracks key/value state diffs, not raw HTML, and generates structured change events.&lt;/li&gt;
&lt;li&gt;Each haunt produces an RSS feed.&lt;/li&gt;
&lt;li&gt;The UI is a faithful rebirth of the 3-column Google Reader layout, complete with folders, unread counts, stars, refresh, and keyboard shortcuts.&lt;/li&gt;
&lt;/ul&gt;
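&lt;p&gt;As a rough illustration, the key/value state diffing described above can be sketched like this (a simplified sketch; the field names are illustrative, not the production schema):&lt;/p&gt;

```python
def diff_state(old, new):
    """Compare two key/value snapshots and emit structured change events."""
    events = []
    for key in new:
        if key not in old:
            events.append({"type": "added", "key": key, "value": new[key]})
        elif old[key] != new[key]:
            # Only a changed value produces an event, so cosmetic HTML churn
            # that normalizes to the same key/value state is ignored.
            events.append({"type": "changed", "key": key,
                           "old": old[key], "new": new[key]})
    for key in old:
        if key not in new:
            events.append({"type": "removed", "key": key, "value": old[key]})
    return events

# Example: an admissions page flips from "closed" to "open"
old = {"status": "closed", "deadline": "TBD"}
new = {"status": "open", "deadline": "2026-01-15"}
print(diff_state(old, new))
```

Each event is a small, structured record, which is what makes downstream steps like RSS generation and alerting straightforward.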

&lt;p&gt;In short: Watcher turns any webpage, even SPAs, into a live RSS source.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we built it
&lt;/h2&gt;

&lt;p&gt;Watcher is built as a Django-based system with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spec-driven functional requirements covering scraping, diffing, RSS construction, and a Reader-style UI.&lt;/li&gt;
&lt;li&gt;Playwright for SPA rendering and key extraction.&lt;/li&gt;
&lt;li&gt;Celery for periodic haunting and change detection.&lt;/li&gt;
&lt;li&gt;A fully modeled haunt configuration, derived via LLM from natural language.&lt;/li&gt;
&lt;li&gt;Structured state tracking, storing only key/value diffs and summaries.&lt;/li&gt;
&lt;li&gt;RSS feed generation for both private and public haunts.&lt;/li&gt;
&lt;li&gt;A Google Reader–inspired front-end, implemented to feel as close as possible to the original.&lt;/li&gt;
&lt;/ul&gt;
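&lt;p&gt;To give a feel for the RSS construction step, here is a minimal, self-contained sketch using only Python’s standard library (function and field names are illustrative assumptions, not the project’s actual code):&lt;/p&gt;

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate

def change_events_to_rss(haunt_title, haunt_url, events):
    """Turn structured change events for one haunt into an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = haunt_title
    ET.SubElement(channel, "link").text = haunt_url
    ET.SubElement(channel, "description").text = "Changes detected by Watcher"
    for event in events:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = "{}: {}".format(event["type"], event["key"])
        ET.SubElement(item, "description").text = str(event.get("new", event.get("value")))
        ET.SubElement(item, "pubDate").text = formatdate(usegmt=True)
    # Serialize to a feed string a reader can subscribe to.
    return ET.tostring(rss, encoding="unicode")
```

Building the tree programmatically (rather than concatenating markup strings) keeps the output well-formed even when scraped values contain special characters.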

&lt;p&gt;Kiro powered the development loop, particularly around specs, architecture constraints, steering for UI generation, and consistency between backend and frontend layers.&lt;/p&gt;

&lt;p&gt;You can find the code base on &lt;a href="https://github.com/Virgo-Alpha/Watcher" rel="noopener noreferrer"&gt;github&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Similarities Between Watcher and Google Reader
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feed-Oriented Information Consumption&lt;/strong&gt;
Both Watcher and Google Reader organize information in a feed-like format that lets users see updates from multiple sources in a unified view. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RSS Integration and Support&lt;/strong&gt;
Both systems can work with RSS sources: Google Reader was built around RSS/Atom feed aggregation, while Watcher supports adding existing RSS sources into its monitoring. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three-Panel Interface and Navigation&lt;/strong&gt;
Watcher’s interface intentionally draws on the three-panel layout that was characteristic of Google Reader: navigation pane, feed list, and content view. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unread/Read Tracking&lt;/strong&gt;
Both platforms include mechanisms to mark items as read or unread, enabling users to track what they have and have not seen. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keyboard Shortcuts and Power User Features&lt;/strong&gt;
Google Reader popularized keyboard shortcuts (J/K/M/S) and Watcher includes similar navigation controls inspired by Reader. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subscription Model for Content&lt;/strong&gt;
Google Reader let users subscribe to feeds; Watcher lets users subscribe to monitoring configurations (“haunts”) and view updates similarly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Differences Between Watcher and Google Reader
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Google Reader&lt;/th&gt;
&lt;th&gt;Watcher&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General-purpose RSS/Atom feed aggregator for web content and news.&lt;/td&gt;
&lt;td&gt;Website change monitoring and alerting with AI-assisted context.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Functionality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Aggregates syndicated feeds and surfaces updates for reading.&lt;/td&gt;
&lt;td&gt;Continuously monitors pages (including SPAs) and detects meaningful changes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None; designed as a human-driven feed reader.&lt;/td&gt;
&lt;td&gt;Uses AI to interpret change relevance and generate selectors from natural language.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update Detection Mechanism&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pulls standardized feed entries as published by websites.&lt;/td&gt;
&lt;td&gt;Uses headless browsers (e.g., Playwright) to detect changes beyond RSS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Notification Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;In-app unread counts and keyword search; limited alerts.&lt;/td&gt;
&lt;td&gt;Email alerts and structured summaries when defined conditions trigger.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User Interaction Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Users subscribe to feeds and consume published entries.&lt;/td&gt;
&lt;td&gt;Users define what to monitor (“haunts”); the system proactively watches for changes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Social Features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Experimental sharing features (later removed).&lt;/td&gt;
&lt;td&gt;Public haunts and subscriptions to other users’ monitoring configurations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scope of Content&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited to content explicitly exposed via RSS/Atom.&lt;/td&gt;
&lt;td&gt;Can monitor arbitrary webpages, including dynamic and JavaScript-rendered content.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Historical Status&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Discontinued in 2013.&lt;/td&gt;
&lt;td&gt;Actively developed and deployable.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Lessons learnt
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;LLMs work exceptionally well when guided by tight specs + steering documents.&lt;/li&gt;
&lt;li&gt;The web’s move to SPAs made RSS unavailable, but changes are still detectable.&lt;/li&gt;
&lt;li&gt;State diffs matter more than raw HTML when building meaningful alerts.&lt;/li&gt;
&lt;li&gt;Nostalgia is a powerful design force; porting old UX patterns into modern stacks teaches discipline.&lt;/li&gt;
&lt;li&gt;Combining AI + scraping + RSS can create genuinely new value.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Watcher began as an exercise in nostalgia, but it ended as a statement about the modern web. RSS disappeared not because it was flawed, but because the web outgrew it; the underlying need, to know when something meaningfully changes, never went away.&lt;/p&gt;

&lt;p&gt;By combining AI-driven interpretation, structured state diffing, and headless browser scraping, Watcher turns even the most dynamic SPA into a first-class, queryable feed. In doing so, it restores user intent, transparency, and control, values that defined tools like Google Reader but are largely absent today.&lt;/p&gt;

&lt;p&gt;Kiro proved to be more than an IDE in this process. Its emphasis on specs, steering documents, and agent-driven workflows enabled a level of architectural consistency that would have been difficult to maintain in an AI-assisted build. Rather than fighting the model, the system was shaped by constraints.&lt;/p&gt;

&lt;p&gt;The broader lesson is this: AI does not replace structure, it amplifies it. When paired with clear specs, thoughtful design, and a respect for proven UX patterns, it enables entirely new classes of systems.&lt;/p&gt;

&lt;p&gt;Watcher is one such system. A resurrection, not of a product, but of an idea: that the web should work for its users, not the other way around.&lt;/p&gt;

</description>
      <category>kiroween</category>
      <category>webdev</category>
      <category>programming</category>
      <category>kiro</category>
    </item>
    <item>
      <title>AWS Cloud Resume Challenge - my attempt</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Tue, 02 Dec 2025 10:25:13 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-cloud-resume-challenge-my-attempt-544m</link>
      <guid>https://dev.to/aws-builders/aws-cloud-resume-challenge-my-attempt-544m</guid>
      <description>&lt;h1&gt;
  
  
  TLDR;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;I built and deployed an AWS Cloud Resume Challenge project: a secure, static resume site with a live visitor counter.&lt;/li&gt;
&lt;li&gt;Frontend: HTML/CSS/JS hosted in S3 and served globally via CloudFront with HTTPS and a private S3 origin (OAI).&lt;/li&gt;
&lt;li&gt;Backend: API Gateway → Lambda → DynamoDB to increment and return the visitor count, then display it on the page.&lt;/li&gt;
&lt;li&gt;IaC + CI/CD: Provisioned resources with AWS SAM (CloudFormation under the hood) and automated deployments/testing with GitHub Actions.&lt;/li&gt;
&lt;li&gt;Production-minded extras: Added CloudWatch logs, metrics, alarms, a dashboard, and SNS email notifications for monitoring and alerting.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Table Of Contents
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;The Problem&lt;/li&gt;
&lt;li&gt;My Design&lt;/li&gt;
&lt;li&gt;Implementation&lt;/li&gt;
&lt;li&gt;Issues encountered&lt;/li&gt;
&lt;li&gt;Lessons learnt&lt;/li&gt;
&lt;li&gt;Future Work&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;The Cloud Resume Challenge is a great introduction to cloud concepts and gives you a detailed specification of what to build. Even for experienced builders, this challenge offers a refresher on the basic principles of the cloud such as databases and serverless architecture. You can find out more information about the AWS version of the challenge &lt;a href="https://cloudresumechallenge.dev/docs/the-challenge/aws/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Personally, I decided to attempt the challenge for three main reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I am AWS Certified twice, which gives me a head start. You see, the first step of the challenge is certification and the last step is a blog post.&lt;/li&gt;
&lt;li&gt;I already have AWS credits that I could utilize. I got these as part of the benefits of being an AWS Community Builder. This was also how I got the vouchers to get certified in the first place. If you’re interested in learning more about the program and how to join the upcoming cohort, have a look at the guide &lt;a href="https://builder.aws.com/content/35c9eBOBVhurX9sjp3YEyMtbOxU/future-aws-community-builder-step-by-step-guide" rel="noopener noreferrer"&gt;here&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;I wanted to learn new technologies that I had not interacted with before such as SAM (Serverless Application Model).&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  The Problem
&lt;/h1&gt;

&lt;p&gt;The Problem Statement of the Cloud Resume Challenge, in my own words, is to host your resume on the cloud, keep a record of the number of visitors to the exposed page, and display this number on your frontend. It also encourages the use of Infrastructure as Code and CI/CD principles. For the AWS challenge, the specification includes the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Certification&lt;/li&gt;
&lt;li&gt; HTML&lt;/li&gt;
&lt;li&gt; CSS&lt;/li&gt;
&lt;li&gt; Static Website&lt;/li&gt;
&lt;li&gt; HTTPS&lt;/li&gt;
&lt;li&gt; DNS (no custom domain yet – using CloudFront URL)&lt;/li&gt;
&lt;li&gt; JavaScript&lt;/li&gt;
&lt;li&gt; Database (DynamoDB)&lt;/li&gt;
&lt;li&gt; API (API Gateway + Lambda)&lt;/li&gt;
&lt;li&gt; Python&lt;/li&gt;
&lt;li&gt; Tests &lt;/li&gt;
&lt;li&gt; Infrastructure as Code (AWS SAM)&lt;/li&gt;
&lt;li&gt; Source Control (GitHub)&lt;/li&gt;
&lt;li&gt; CI/CD (Backend)&lt;/li&gt;
&lt;li&gt; CI/CD (Frontend)&lt;/li&gt;
&lt;li&gt; Blog Post&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Spoiler alert: I did not complete all of the steps, but I will expound on that later in this blog post.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  My Design
&lt;/h1&gt;

&lt;p&gt;I made a simple design that included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The infrastructure defined in a SAM template&lt;/li&gt;
&lt;li&gt;The frontend: HTML, CSS, and JS hosted in an S3 bucket and served via CloudFront&lt;/li&gt;
&lt;li&gt;The backend: the Lambda function, DynamoDB database, and API Gateway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdy2z1n3myocl4vwqu2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdy2z1n3myocl4vwqu2z.png" alt="Image of my design" width="634" height="626"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also extended the design to add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CloudWatch metrics and alarms&lt;/li&gt;
&lt;li&gt;An SNS topic to email me the CloudWatch alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I decided not to implement the custom domain name using Route 53 because of the costs. So far everything else would cost me nothing because of my cloud credits. However, the credits do not cover the purchase of a domain.&lt;/p&gt;

&lt;h1&gt;
  
  
  Implementation
&lt;/h1&gt;

&lt;p&gt;Regardless of the order of the specs, I needed a way to build iteratively that worked for me. I started with the frontend and tested it locally first. I then deployed all the resources I needed for the frontend using the SAM template, and after that implemented the Lambda, API Gateway, and DynamoDB for the backend. I tested the Lambda locally using unit tests and the deployed API Gateway using integration tests. Only then did I add monitoring via CloudWatch logs, alarms, and a dashboard, followed by an SNS topic to receive the CloudWatch alerts.&lt;/p&gt;
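&lt;p&gt;For context, the core of the visitor-counter backend is a small Lambda handler along these lines (a simplified sketch; the table and key names here are placeholders, not the exact production values):&lt;/p&gt;

```python
import json
import os

TABLE_NAME = os.environ.get("TABLE_NAME", "visitor-count")  # placeholder name

def build_response(count):
    """Shape the API Gateway proxy response, including a CORS header."""
    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*",
                    "Content-Type": "application/json"},
        "body": json.dumps({"count": count}),
    }

def handler(event, context):
    # boto3 ships with the Lambda runtime; importing it here keeps the
    # pure response-shaping logic above testable without AWS.
    import boto3
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    # Atomic increment: ADD creates the attribute on first use and bumps it
    # on every subsequent call, so no read-modify-write race is possible.
    result = table.update_item(
        Key={"id": "resume"},
        UpdateExpression="ADD visits :one",
        ExpressionAttributeValues={":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    return build_response(int(result["Attributes"]["visits"]))
```

The frontend JS then fetches this endpoint and writes the returned count into the page.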

&lt;p&gt;As part of the extension, I added dark mode on the frontend and a button to download my resume as a PDF. Initially this just used the browser’s &lt;code&gt;print to pdf&lt;/code&gt; functionality, but the resume kept coming out as two pages. Eventually, I made the button open a PDF version hosted online, which visitors can download.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;p&gt;Security is a major issue to consider when designing any system. I addressed it in the following ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I prevented direct access to the site via the S3 bucket link by implementing Origin Access Identity (OAI), so that only CloudFront is allowed to fetch objects from the bucket.&lt;/li&gt;
&lt;li&gt;CloudFront redirects HTTP to HTTPS, ensuring that all data in transit is encrypted.&lt;/li&gt;
&lt;li&gt;I implemented least-privilege permissions in IAM. For example, the Lambda can only call its DynamoDB table and publish CloudWatch metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Costs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What typically stays free / very low-cost for this project
&lt;/h3&gt;

&lt;p&gt;Within normal personal-portfolio traffic, the costs tend to be near-zero because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S3 storage&lt;/strong&gt; is tiny (a few files)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudFront&lt;/strong&gt; has a generous free tier (and static content is cheap)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda free tier&lt;/strong&gt; includes lots of requests and compute time&lt;/li&gt;
&lt;li&gt;DynamoDB on-demand for a single item updated occasionally is tiny&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch&lt;/strong&gt; basic metrics are included; small log volume is cheap&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Where cost risks can appear
&lt;/h3&gt;

&lt;p&gt;Even “free-tier-friendly” setups can cost money if something spikes:&lt;br&gt;
&lt;strong&gt;1. CloudWatch Logs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your Lambda logs a lot (especially per request), logs can grow.&lt;/li&gt;
&lt;li&gt;Log ingestion and retention may incur costs.&lt;/li&gt;
&lt;li&gt;Mitigation: set log retention (e.g., 7–14 days) and avoid noisy logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. API Gateway&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It charges per request.&lt;/li&gt;
&lt;li&gt;A traffic spike or bot traffic can increase costs.&lt;/li&gt;
&lt;li&gt;Mitigation: rate limiting (usage plans), WAF, or CloudFront in front of API (advanced).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Lambda invocations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Still cheap, but if your API is hammered, invocations increase.&lt;/li&gt;
&lt;li&gt;Mitigation: caching, bot protection, or reducing how often the browser calls &lt;code&gt;/count&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. CloudFront data transfer&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you ever serve large files or heavy traffic, bandwidth is usually the cost driver.&lt;/li&gt;
&lt;li&gt;Mitigation: caching, compression, keep assets small.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CloudFront caching reduces origin traffic for static assets, which keeps performance high and costs low. This is because after the first request, many users are served from CloudFront edge locations instead of pulling from S3 every time.&lt;/p&gt;

&lt;p&gt;The visitor counter remains dynamic and triggers Lambda; we could reduce those calls by caching the API response or only calling it once per session.&lt;/p&gt;

&lt;h1&gt;
  
  
  Issues encountered
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Permissions &amp;amp; Least Privilege (IAM)
&lt;/h2&gt;

&lt;p&gt;One of the first issues I ran into was related to IAM permissions. I started with a strict least-privilege policy for the Lambda function: it could only update a single DynamoDB table. That worked fine until I expanded the project to include custom CloudWatch metrics (for things like page views).&lt;br&gt;
At that point, my integration tests began failing with HTTP 500 responses from the API. Unit tests still passed because they used mocked AWS services, but the deployed Lambda was failing at runtime. The root cause was that the Lambda role didn’t have permission to publish metrics to CloudWatch (&lt;code&gt;cloudwatch:PutMetricData&lt;/code&gt;). Adding that permission fixed the 500s.&lt;/p&gt;
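&lt;p&gt;In a SAM template, the fix looks roughly like this (an illustrative excerpt; resource names are placeholders, and note that &lt;code&gt;cloudwatch:PutMetricData&lt;/code&gt; does not support resource-level restriction, hence the wildcard):&lt;/p&gt;

```yaml
# SAM template excerpt (illustrative)
VisitorCounterFunction:
  Type: AWS::Serverless::Function
  Properties:
    Policies:
      # SAM-managed policy template for table access
      - DynamoDBCrudPolicy:
          TableName: !Ref VisitorTable
      # Inline statement granting the metric publish that was missing
      - Statement:
          - Effect: Allow
            Action: cloudwatch:PutMetricData
            Resource: "*"
```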

&lt;h2&gt;
  
  
  Testing Strategy (Unit + Integration)
&lt;/h2&gt;

&lt;p&gt;Testing was another area where the implementation forced me to think more clearly about what I was validating.&lt;br&gt;
Unit tests ran fully offline using mocks (e.g., moto), allowing me to test the Lambda logic quickly and repeatedly.&lt;/p&gt;

&lt;p&gt;Integration tests hit the live API Gateway endpoint and verified that DynamoDB was actually updating. This was useful because it caught problems unit tests could never detect, such as missing permissions, incorrect region configuration, and miswired resources.&lt;/p&gt;

&lt;p&gt;To keep CI/CD efficient and reduce noise, I configured the pipeline so tests run only when relevant code changes. For example, backend tests trigger on changes in backend folders or infrastructure templates, while deployment and integration tests only run on the main branch. That approach keeps pull requests fast while still protecting the main branch with “real” end-to-end validation.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI/CD Failures Due to DynamoDB Region / AWS Region Configuration
&lt;/h2&gt;

&lt;p&gt;A particularly annoying CI issue came from region configuration. The Lambda and DynamoDB code relied on boto3’s default region discovery. Locally, I had a region configured, so everything seemed fine. But in CI/CD, boto3 sometimes didn’t resolve a region the way I expected, which caused failures like &lt;code&gt;NoRegionError&lt;/code&gt; when the code tried to talk to DynamoDB.&lt;/p&gt;

&lt;p&gt;The fix was to be explicit: set the region consistently via environment variables and ensure boto3 clients/resources use it. It was a good lesson in writing cloud code that behaves the same in three environments: local development, GitHub Actions, and AWS Lambda. When something works locally but fails in CI, it’s often because local credentials or config is hiding assumptions.&lt;/p&gt;
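&lt;p&gt;The shape of that fix, sketched (the fallback chain and error message are illustrative):&lt;/p&gt;

```python
import os

def resolve_region(env=None):
    """Resolve the AWS region explicitly rather than relying on boto3's
    default discovery chain, which can behave differently in CI."""
    env = os.environ if env is None else env
    region = env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    if not region:
        raise RuntimeError("No AWS region configured; set AWS_REGION explicitly")
    return region

# Then pass the result explicitly when building clients, e.g. (assuming boto3):
# dynamodb = boto3.resource("dynamodb", region_name=resolve_region())
```

&lt;p&gt;Failing loudly at startup when no region is set turns a confusing downstream &lt;code&gt;NoRegionError&lt;/code&gt; into an obvious configuration error.&lt;/p&gt;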

&lt;h2&gt;
  
  
  SNS Subscription Emails Going to Spam
&lt;/h2&gt;

&lt;p&gt;After setting up monitoring alerts, I added an SNS topic to notify my email address. The infrastructure deployed fine, but I initially missed the subscription confirmation email because it landed in my email’s spam folder. Since SNS won’t send alerts until the subscription is confirmed, this can silently break alerting.&lt;br&gt;
Once I confirmed the subscription and moved the email out of spam, notifications started working.&lt;/p&gt;

&lt;h1&gt;
  
  
  Lessons learnt
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;SAM templates compile down to standard CloudFormation resources, which means you can inspect the CloudFormation stack in the AWS console when debugging.&lt;/li&gt;
&lt;li&gt;Permissions and infrastructure management, just like code, are iterative.&lt;/li&gt;
&lt;li&gt;Observability and monitoring are part of the non-functional requirements; they can catch errors that tests won’t.&lt;/li&gt;
&lt;li&gt;CloudFront caching can reduce costs and improve performance for static assets.&lt;/li&gt;
&lt;li&gt;Operational details still matter: if a user does not confirm an SNS subscription, it doesn’t matter how correct the infrastructure is.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Future Work
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Enable active tracing for the Lambda + API Gateway using AWS X-Ray to visualize request paths, latency, and failures end-to-end.&lt;/li&gt;
&lt;li&gt;Attach a custom domain to CloudFront using Route 53 + ACM.&lt;/li&gt;
&lt;li&gt;Improve visitor counting: cumulative, daily, or unique visitors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Overall, I found the challenge fun and engaging, and I hope you decide to take it on. Like me, you can take it for your own reasons, and you can even skip parts. Don’t let the lack of a certification or anything else stop you. You can view my deployed site &lt;a href="https://d1aafmx13pzuu5.cloudfront.net/" rel="noopener noreferrer"&gt;here&lt;/a&gt; and the GitHub repository &lt;a href="https://github.com/Virgo-Alpha/cloud-resume-challenge" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>githubactions</category>
      <category>sam</category>
    </item>
    <item>
      <title>Building “Sentinel”: multi-agent cybersecurity news triage and publishing system on AWS</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Mon, 29 Sep 2025 11:36:16 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-sentinel-multi-agent-cybersecurity-news-triage-and-publishing-system-on-aws-5h5h</link>
      <guid>https://dev.to/aws-builders/building-sentinel-multi-agent-cybersecurity-news-triage-and-publishing-system-on-aws-5h5h</guid>
      <description>&lt;h1&gt;
  
  
  TLDR;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;I built &lt;strong&gt;Sentinel&lt;/strong&gt;, a serverless, agentic pipeline that turns noisy RSS feeds into &lt;strong&gt;actionable cybersecurity intel&lt;/strong&gt; (dedup, triage, publish).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic first, agentic later&lt;/strong&gt;: Step Functions → Lambdas, then flipped a feature flag to Bedrock AgentCore (Strands) without changing contracts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability by design&lt;/strong&gt;: SQS buffering, DLQs, idempotency keys, guardrails, and graceful degradation (semantic → heuristic).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search that scales&lt;/strong&gt;: OpenSearch Serverless with &lt;strong&gt;BM25 + vectors&lt;/strong&gt;, cached embeddings, clusters for near-dupes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure &amp;amp; observable&lt;/strong&gt;: Cognito + least privilege, KMS, WAF, VPC endpoints, JSON logs + X-Ray, SLOs &amp;amp; cost alarms.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Table Of Contents
&lt;/h1&gt;

&lt;p&gt;Introduction&lt;br&gt;
Problem statement&lt;br&gt;
My solution&lt;br&gt;
The architecture&lt;br&gt;
Use of Strands&lt;br&gt;
Use of Bedrock&lt;br&gt;
Human in the loop&lt;br&gt;
Challenges and breakthroughs&lt;br&gt;
Key learnings&lt;br&gt;
Future Plans&lt;br&gt;
Conclusion&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Exploring AWS AI offerings has been on my TODO list for the longest time. I was particularly interested in Strands, Bedrock, and Nova Act. So, for this AI Engineering month, I decided to take on the challenge of solving a practical problem I have seen in my industry using these tools, learning and exploring in the process. I recently earned the AWS Certified Solutions Architect – Associate certification and also got access to Kiro, so this project let me play the part of a technical PM and apply my system design skills. I hope you learn something that may aid you in your work. Enjoy.&lt;/p&gt;

&lt;h1&gt;
  
  
  Problem statement
&lt;/h1&gt;

&lt;p&gt;As my company’s CISO, I would like to develop an internal cybersecurity newsletter that collates news from different RSS feeds, filters the items relevant to my organization based on a list of keywords, and shares them with fellow employees either via email or published on an internal site. I wanted to be kept abreast of the latest happenings in the industry, but I also wanted to automatically share anything that might be relevant to colleagues, which is why I expanded my requirements.&lt;/p&gt;

&lt;h1&gt;
  
  
  My solution
&lt;/h1&gt;

&lt;p&gt;Sentinel is an AWS-native, multi-agent cybersecurity news triage and publishing system that autonomously ingests, processes, and publishes cybersecurity intelligence from RSS feeds and news sources. The system reduces analyst workload by automatically deduplicating content, extracting relevant entities, and intelligently routing items for human review or auto-publication.&lt;br&gt;
For the full code, visit my repository &lt;a href="https://github.com/Virgo-Alpha/Sentinel" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxvykidmuherucqdqcso.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxvykidmuherucqdqcso.png" alt="Information flow" width="800" height="1006"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The architecture
&lt;/h1&gt;

&lt;p&gt;I designed a &lt;strong&gt;decoupled, serverless microservices architecture&lt;/strong&gt; that scaled cleanly and kept costs predictable. The core remained a set of Lambda functions behind Step Functions, with EventBridge schedules kicking off ingestion. I routed all content through a buffered pipeline: EventBridge fanned into SQS so bursts of feeds didn’t cascade into failures, and every consumer Lambda processed messages idempotently using a canonical-URL SHA-256 as the key. I attached DLQs at each hop (EventBridge, SQS consumers, Step Functions tasks) and wrote compensation paths so partial successes (e.g., stored raw content but failed dedup) re-queued safely.&lt;/p&gt;
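&lt;p&gt;A sketch of the idempotency key (the exact canonicalization rules here are simplified assumptions):&lt;/p&gt;

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    # Simplified canonicalization: lowercase scheme/host, drop the query,
    # fragment, and trailing slash. The production rules may differ.
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), "", ""))

def idempotency_key(url):
    # SHA-256 of the canonical URL, used to deduplicate SQS redeliveries.
    return hashlib.sha256(canonical_url(url).encode("utf-8")).hexdigest()

# Tracking variants of the same story hash to the same key.
assert idempotency_key("https://Example.com/story/") == \
       idempotency_key("https://example.com/story?utm_source=rss")
```

&lt;p&gt;Any consumer that sees a key it has already processed can acknowledge the message and move on, which is what makes at-least-once delivery safe.&lt;/p&gt;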

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk6xu3t9ek6vkqorkkxc1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk6xu3t9ek6vkqorkkxc1.png" alt="My architecture" width="800" height="658"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I modeled &lt;strong&gt;storage&lt;/strong&gt; deliberately. In DynamoDB I used a primary table for articles with GSIs for state#published_at (queues and dashboards), cluster_id#published_at (duplicates), and tags#published_at (topic browsing). I enabled TTL for short-term session memory and configured PITR for recovery. For search, I provisioned OpenSearch Serverless with a BM25 collection for keyword queries and a k-NN vector collection for semantic near-duplicate detection. I cached embeddings by content hash to avoid recomputation and cut latency.&lt;/p&gt;
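&lt;p&gt;The composite sort keys behind those GSIs look roughly like this (attribute names are assumptions, and in practice the tag index needs one entry per tag):&lt;/p&gt;

```python
def gsi_sort_keys(state, published_at, cluster_id, tag):
    # Composite "field#timestamp" values let each GSI range-query by recency
    # within a state (queues/dashboards), a duplicate cluster, or a tag.
    return {
        "state_published_at": f"{state}#{published_at}",
        "cluster_published_at": f"{cluster_id}#{published_at}",
        "tag_published_at": f"{tag}#{published_at}",
    }

keys = gsi_sort_keys("REVIEW", "2025-09-01T12:00:00Z", "cluster-42", "ransomware")
```

&lt;p&gt;Because ISO-8601 timestamps sort lexicographically, a begins_with/between query on any of these keys returns items in time order.&lt;/p&gt;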

&lt;p&gt;On the &lt;strong&gt;agent&lt;/strong&gt; side, I started with direct orchestration so I could validate the pipeline deterministically. Step Functions called Lambdas (FeedParser → Relevancy → Dedup → Summarize → Guardrail → Decision), and a thin “agent shim” Lambda exposed the same interface I knew I’d use later. When I was ready, I deployed my Strands-defined Ingestor and Analyst Assistant agents to Bedrock AgentCore and flipped a feature flag so Step Functions invoked the agents instead. If AgentCore became unavailable or too slow, the same flag let me fall back instantly to direct Lambda orchestration.&lt;/p&gt;

&lt;p&gt;I treated &lt;strong&gt;configuration&lt;/strong&gt; and behavior as data. I moved feed lists, keyword taxonomies, similarity thresholds, guardrail strictness, and rollout switches into SSM Parameter Store, and I read them at runtime. I kept a small but explicit flags matrix: enable_agents (direct vs AgentCore), enable_opensearch (heuristic vs semantic dedup), enable_amplify (backend-only vs full stack), enable_guardrails_strict, and enable_digest_email. This let me ship incrementally without redeploys.&lt;/p&gt;
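&lt;p&gt;A sketch of that flags matrix (the defaults here are assumptions; in production the JSON blob comes from SSM Parameter Store at runtime):&lt;/p&gt;

```python
import json

# Hypothetical flag matrix mirroring the SSM parameters described above.
DEFAULT_FLAGS = {
    "enable_agents": False,          # direct pipeline vs Bedrock AgentCore
    "enable_opensearch": False,      # heuristic vs semantic dedup
    "enable_amplify": False,         # backend-only vs full stack
    "enable_guardrails_strict": True,
    "enable_digest_email": False,
}

def load_flags(raw):
    """Merge a JSON blob (e.g., fetched from Parameter Store) over safe defaults,
    so a missing or partial parameter never crashes the pipeline."""
    flags = dict(DEFAULT_FLAGS)
    if raw:
        flags.update(json.loads(raw))
    return flags
```

&lt;p&gt;Reading the flags on every invocation (or with a short cache) is what makes rollback a parameter change instead of a redeploy.&lt;/p&gt;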

&lt;p&gt;For &lt;strong&gt;Bedrock usage&lt;/strong&gt;, I separated concerns. I used LLM calls to score relevance to my taxonomy and extract entities (CVE IDs, threat actors, malware, vendors/products) with confidences and rationales. I generated two summaries per item (executive two-liner and analyst card) and enforced a reflection checklist so outputs consistently included who/what/impact/source. I produced embeddings for semantic search and dedup, and I versioned prompts and model IDs in SSM, logging token usage per call. Every LLM output passed a JSON-Schema validation step; failures, PII findings, or suspicious CVE formats triggered Human Escalation automatically. I also kept a small “golden set” of seeded dupes and fake CVEs to regression-test prompts and thresholds.&lt;/p&gt;
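&lt;p&gt;A minimal sketch of that validation gate (the field names and CVE pattern are illustrative, not the exact production schema):&lt;/p&gt;

```python
import json
import re

CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")

def validate_llm_output(payload):
    """Return a list of problems found in an LLM response; a non-empty list
    routes the item to human review instead of auto-publish."""
    try:
        doc = json.loads(payload)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    errors = []
    for field in ("relevance", "entities", "rationale"):
        if field not in doc:
            errors.append(f"missing field: {field}")
    for cve in doc.get("entities", {}).get("cves", []):
        if not CVE_RE.match(cve):
            errors.append(f"suspicious CVE format: {cve}")
    return errors
```

&lt;p&gt;The point is that every model response passes through a deterministic gate before anything downstream trusts it.&lt;/p&gt;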

&lt;p&gt;&lt;strong&gt;Security and networking&lt;/strong&gt; were explicit. I authenticated users through Cognito user and identity pools and authorized them with group roles (Analyst/Admin) mapped to least-privilege IAM policies. I stored secrets in Secrets Manager (and encrypted everything with KMS) and placed WAF in front of Amplify/API Gateway, with usage plans and rate limits. I used Gateway VPC endpoints for S3 and DynamoDB and added interface endpoints selectively (Bedrock/OpenSearch) where the security benefit outweighed their per-hour cost. I documented a PII policy: I kept raw HTML in a restricted S3 prefix, stored normalized/redacted text separately, applied tight access controls, and retained artifacts only as long as needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzz72a9fqm0ox0r2aanpb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzz72a9fqm0ox0r2aanpb.png" alt="Cognito and Amplify architecture" width="800" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability and operations&lt;/strong&gt; were top priority. I standardized on structured JSON logs with correlation IDs flowing from EventBridge through Step Functions, Lambdas, and agent tool calls, and I enabled X-Ray tracing end-to-end. I tracked SLOs and KPIs—duplicate detection precision, auto-publish precision, p95 latency per stage, and cost per article—and I wired CloudWatch alarms to anomalies and DLQs. I added daily and monthly cost monitors, and I wrote short runbooks for common incidents (OpenSearch throttling, SES sandbox, token bursts).&lt;/p&gt;
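&lt;p&gt;A sketch of the structured-logging setup (the field set is an assumption, not the exact production schema):&lt;/p&gt;

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so CloudWatch Logs Insights
    can filter and aggregate on individual fields."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", None),
        })

logger = logging.getLogger("sentinel")
_handler = logging.StreamHandler()
_handler.setFormatter(JsonFormatter())
logger.addHandler(_handler)
logger.setLevel(logging.INFO)

# The correlation ID arrives via `extra` and flows through every stage's logs.
logger.info("article ingested", extra={"correlation_id": "req-123"})
```

&lt;p&gt;Passing the same correlation ID from EventBridge through Step Functions and each Lambda is what lets one query reconstruct a single article’s journey.&lt;/p&gt;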

&lt;p&gt;I also covered &lt;strong&gt;reliability and data protection&lt;/strong&gt;. I enabled DynamoDB PITR, S3 versioning, and OpenSearch snapshots, and I documented RPO/RTO targets (≤15 minutes for metadata, ≤24 hours for search if I restored from snapshots). In a degraded state, I allowed the system to bypass semantic dedup and fall back to heuristic matching so ingestion never fully stalled.&lt;/p&gt;

&lt;p&gt;On the &lt;strong&gt;product and API&lt;/strong&gt; side, I clarified contracts. I exposed clean endpoints with pagination, filters, and error schemas. For exports, I generated XLSX in a worker pattern that wrote to S3 and returned pre-signed URLs, so large batches didn’t hit Lambda memory/timeouts. In the Amplify app I added a chat UI for natural-language queries with citations, a review queue with decision traces, and threaded commentary. I hardened the NL interface against prompt injection by allow-listing data sources, stripping HTML/JS from prompts, and refusing unsafe actions.&lt;/p&gt;

&lt;p&gt;Finally, I shipped it &lt;strong&gt;as code&lt;/strong&gt;. I organized Terraform into modules with remote state and environment isolation, hashed Lambda artifacts for deterministic deploys, and used canary releases for riskier changes. I tagged everything for cost allocation and preserved a full audit trail—who approved or rejected, which prompt/version ran, and the exact tool call sequence and parameters—for a defined retention window. The result read well in a demo and behaved like production: buffered, idempotent, observable, secure, and ready to toggle between deterministic pipelines and fully agentic orchestration.&lt;/p&gt;

&lt;h1&gt;
  
  
  Use of Strands
&lt;/h1&gt;

&lt;p&gt;I used Strands as the authoring/orchestration layer to define two agents—Ingestor Agent and Analyst Assistant Agent—including their roles, instructions, and the tool bindings to my Lambda “tools” (FeedParser, RelevancyEvaluator, DedupTool, GuardrailTool, StorageTool, HumanEscalation, Notifier, QueryKB, etc.). &lt;/p&gt;

&lt;p&gt;Strands packages those definitions and deploys them to Bedrock AgentCore so they can run at scale with standardized tool I/O, built-in observability, and clean A2A (agent-to-agent) patterns. In short: Strands is where I declare what each agent knows and &lt;strong&gt;which&lt;/strong&gt; tools it can call; AgentCore is where they &lt;strong&gt;run&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I used feature flags in my deployment to allow easy rollback as well as phased rollout of features. Here is how my agents worked before and after deployment to AgentCore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Before AgentCore (direct orchestration)&lt;/strong&gt;: Step Functions call my Lambdas directly in a deterministic pipeline (fetch → normalize → evaluate relevance/entities → dedupe → summarize/guardrail → decide publish/review/drop). This let me validate logic, data models, and infra without introducing another moving part. The “agent shim” simply proxied to those Lambdas so the Step Functions contract never changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;After AgentCore (agentic orchestration)&lt;/strong&gt;: I flipped a flag and Step Functions (or API Gateway for chat) invokes the Strands-defined agents on Bedrock AgentCore. The Ingestor Agent plans and chooses which Lambda tools to call (ReAct + reflection), applies guardrails, clusters duplicates, and returns a triage decision; the Analyst Assistant Agent serves NL queries from Amplify, pulling from DynamoDB/OpenSearch, posting commentary, and even coordinating with the Ingestor via A2A for duplicate context. Functionally it’s the same tools, but now the agent decides when/why to call them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Use of Bedrock
&lt;/h1&gt;

&lt;p&gt;Bedrock underpins every intelligent step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reasoning + tool use (Agents)&lt;/strong&gt;: Both Strands-defined agents run on Bedrock AgentCore to plan, call tools, and maintain context (ReAct + reflection).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Relevance &amp;amp; entity extraction&lt;/strong&gt;: LLM calls score relevance to my topic taxonomy and extract structured entities (CVEs, actors, malware, vendors, products), emitting JSON with confidence and rationale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Summarization with reflection&lt;/strong&gt;: The agent (or a summarizer tool) produces an executive 2-liner and an analyst card; a reflection checklist enforces “who/what/impact/source” and validates entity formatting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Embeddings for semantic dedup/search&lt;/strong&gt;: Bedrock embeddings vectorize normalized content; OpenSearch Serverless k-NN handles near-duplicate detection and semantic retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Guardrails support&lt;/strong&gt;: While PII and schema checks run in Lambda, the LLM is steered to reduce sensationalism and format errors; suspect outputs route to review.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conversational NL queries&lt;/strong&gt;: The Analyst Assistant uses Bedrock to interpret questions, translate to DynamoDB/OpenSearch queries, and generate cited answers (and optionally initiate exports).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faty01o2pw4uqol2g9fuy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faty01o2pw4uqol2g9fuy.png" alt="Bedrock infra architecture diagram" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Human in the loop
&lt;/h1&gt;

&lt;p&gt;When the Ingestor Agent (or the direct pipeline pre-AgentCore) isn’t fully confident—e.g., borderline relevance, suspected hallucinated CVE, PII detection—it escalates to review. Those items land in the Amplify review queue where an analyst can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open the decision trace (what tools were called and why),&lt;/li&gt;
&lt;li&gt;approve/reject or edit tags/summaries,&lt;/li&gt;
&lt;li&gt;leave threaded commentary (stored in DynamoDB),&lt;/li&gt;
&lt;li&gt;and provide thumbs up/down feedback that is logged for continuous improvement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Approved items publish immediately; rejected items are archived with rationale for future training/tuning. The Analyst Assistant Agent also helps humans explore duplicate clusters, ask trend questions, and post comments via natural language.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fei5vxlbjls608v5s2f81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fei5vxlbjls608v5s2f81.png" alt="Dashboard" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Challenges and breakthroughs
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bursty feeds &amp;amp; reliability&lt;/strong&gt;: Initial direct triggers caused cascading failures under load. Introducing &lt;strong&gt;SQS between stages&lt;/strong&gt;, DLQs, and &lt;strong&gt;idempotency via URL hash&lt;/strong&gt; stabilized the pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Near-duplicate detection&lt;/strong&gt;: Title/URL heuristics weren’t enough. Pairing &lt;strong&gt;Bedrock embeddings&lt;/strong&gt; with OpenSearch k-NN and clustering solved syndicated/rewritten stories; caching by content hash cut cost/latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guardrails that matter&lt;/strong&gt;: Early LLM runs occasionally hallucinated CVEs and included stray PII. A &lt;strong&gt;JSON Schema validator&lt;/strong&gt;, &lt;strong&gt;PII filters&lt;/strong&gt;, and a &lt;strong&gt;reflection checklist&lt;/strong&gt; reduced errors and routed edge cases to review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent flipover&lt;/strong&gt;: Moving from Step Functions → Lambdas to AgentCore risked churn. A thin &lt;strong&gt;agent shim&lt;/strong&gt; and a simple &lt;strong&gt;feature flag&lt;/strong&gt; delivered a zero-drama cutover (and instant fallback).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exports at scale&lt;/strong&gt;: XLSX generation hit Lambda limits. Switching to an &lt;strong&gt;async export worker&lt;/strong&gt; that writes to S3 and returns a pre-signed URL made large reports reliable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost visibility&lt;/strong&gt;: Token use and vector storage costs spiked during traffic bursts. Adding &lt;strong&gt;token budgets&lt;/strong&gt;, embedding caching, and cost-per-article metrics made FinOps actionable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro hooks in practice&lt;/strong&gt;: Instrumenting prompts and tool calls with Kiro hooks gave clean &lt;strong&gt;traceability&lt;/strong&gt; for demos and debugging.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Key learnings
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ship boring first&lt;/strong&gt;: A deterministic pipeline (without agents) is the best baseline for correctness, tests, and rollbacks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents as an overlay&lt;/strong&gt;: Treat agents as &lt;strong&gt;pluggable orchestrators&lt;/strong&gt; over stable tools; keep I/O contracts tight and versioned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature flags are product features&lt;/strong&gt;: enable_agents, enable_opensearch, guardrail levels, and digests let you &lt;strong&gt;canary&lt;/strong&gt; safely and roll back instantly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability is a graph problem&lt;/strong&gt;: Backpressure, retries, DLQs, and idempotency must be &lt;strong&gt;end-to-end&lt;/strong&gt;, not per function.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure what you promise&lt;/strong&gt;: SLOs (dup precision, auto-publish precision, p95 latency, cost/article) drive better architectural choices than gut feel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security posture is layered&lt;/strong&gt;: Cognito authZ + least privilege, KMS everywhere, WAF, Secrets Manager rotation, and clear PII retention policies matter in real orgs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search is product, not plumbing&lt;/strong&gt;: Hybrid &lt;strong&gt;BM25 + vectors&lt;/strong&gt;, synonyms, and recency boosts directly impact analyst happiness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small golden datasets pay off&lt;/strong&gt;: A handful of labeled dupes, fake CVEs, and PII cases catch regressions early and keep prompts honest.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Future Plans
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enrichment &amp;amp; intel quality&lt;/strong&gt;: Integrate KEV/EPSS/NVD lookups, vendor advisories, and STIX/TAXII feeds; auto-normalize vendors/products; add IOC extraction and de-dup across entities, not just articles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation &amp;amp; guardrails maturity&lt;/strong&gt;: Build a small gold dataset (true dupes, fake CVEs, PII cases) and run scheduled evals; add prompt A/B testing, drift detection for embeddings, and policy-as-code for guardrails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic depth&lt;/strong&gt;: Introduce planning memory (per-topic context), multi-turn self-verification (“second opinion” model), and a research sub-agent to cross-source claims before auto-publish.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human workflow &amp;amp; governance&lt;/strong&gt;: Add SLAs/priority queues, multi-approver rules for high-impact stories, granular roles/permissions, and full audit export (JSONL) to S3 for compliance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product UX&lt;/strong&gt;: Faceted search (tags/entities/time/source), cluster views for dup families, inline diff of similar articles, saved queries, and per-team digests; async XLSX/CSV with presets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search relevance&lt;/strong&gt;: OpenSearch synonyms for vendor/product aliases, recency boosting, hybrid BM25+vector reranking, and feedback-driven learning-to-rank.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost &amp;amp; FinOps&lt;/strong&gt;: Track cost per processed article and per published item; autoscale OpenSearch collections; cache embeddings by hash; token budgets per source; nightly right-sizing reports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenancy &amp;amp; data boundaries&lt;/strong&gt;: Partition DynamoDB/OpenSearch by tenant (PK prefix), isolate KMS keys, and add per-tenant throttles/quotas for fairness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform &amp;amp; delivery&lt;/strong&gt;: Canary deploys for Lambdas/agents, blue-green for Step Functions, schema registry + contract tests for tool I/O, and one-click backfill/replay tooling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security posture&lt;/strong&gt;: Secrets Manager rotation, WAF rules for bot mitigation, DLP on raw S3 prefixes with automated quarantine, SBOM/supply-chain scanning, and optional private CA for mTLS between services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrations&lt;/strong&gt;: Slack/Teams notifications with approve/reject actions, Jira/ServiceNow ticket hooks for critical items, and webhooks for downstream dashboards.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Sentinel proved that you can take a messy, high-volume RSS firehose and ship a &lt;strong&gt;reliable, secure, and explainable pipeline&lt;/strong&gt; that analysts actually want to use. The key was sequencing: build a &lt;strong&gt;buffered, idempotent&lt;/strong&gt; backbone; define &lt;strong&gt;clear tool contracts&lt;/strong&gt;; then layer on &lt;strong&gt;agentic behavior&lt;/strong&gt; for planning and tool use. With Strands + Bedrock AgentCore, I kept autonomy where it helps (reasoning, tool selection) and guardrails where it counts (schemas, PII checks, human review). From here, the roadmap is about depth, not breadth: richer enrichment (KEV/EPSS/NVD), stronger evaluation loops, hybrid search relevance, and governance (SLAs, multi-approver flows). The system is already production-shaped—now it’s about making it &lt;strong&gt;smarter, cheaper, and harder to break&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>bedrock</category>
      <category>awsstrands</category>
      <category>kiro</category>
    </item>
    <item>
      <title>Keeping your Streamlit app awake using Selenium and Github Actions</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Fri, 29 Aug 2025 12:56:15 +0000</pubDate>
      <link>https://dev.to/virgoalpha/keeping-your-streamlit-app-awake-using-selenium-and-github-actions-4ajd</link>
      <guid>https://dev.to/virgoalpha/keeping-your-streamlit-app-awake-using-selenium-and-github-actions-4ajd</guid>
      <description>&lt;h1&gt;
  
  
  TLDR;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Streamlit apps sleep after a period of inactivity if hosted on Streamlit Community Cloud (free tier)&lt;/li&gt;
&lt;li&gt;To wake your app up, you need to click a button&lt;/li&gt;
&lt;li&gt;We can use GitHub Actions + Selenium to automate clicking this button every few hours&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Table of Contents
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
Introduction
&lt;/li&gt;
&lt;li&gt;
Step 1: Create Repo
&lt;/li&gt;
&lt;li&gt;
Step 2: Create Python Script
&lt;/li&gt;
&lt;li&gt;
Step 3: Create Github Workflow
&lt;/li&gt;
&lt;li&gt;
Step 4: Commit and Push
&lt;/li&gt;
&lt;li&gt;
Step 5: Run the workflow manually
&lt;/li&gt;
&lt;li&gt;
Conclusion
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Streamlit apps hosted on Streamlit Community Cloud (free tier) go to sleep after a period of inactivity. This used to be 24 hours during weekdays and 72 hours on weekends, but it has since been cut to 12 hours, and that number may be even lower now. Given how often we use Streamlit for project demos and portfolio pages, this is a real concern. In this article, I show how to use GitHub Actions to run a script every 4 hours that keeps your Streamlit app from going to sleep.&lt;/p&gt;

&lt;h1&gt;
  
  
  Step 1: Create Repo
&lt;/h1&gt;

&lt;p&gt;On your GitHub account, create a new repo and clone it locally into your desired folder.&lt;/p&gt;

&lt;h1&gt;
  
  
  Step 2: Create Python Script
&lt;/h1&gt;

&lt;p&gt;In your local repository, paste the following code into a main.py file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from selenium.common.exceptions import TimeoutException
import os

&lt;span class="c"&gt;# Streamlit app URL from environment variable (or default)&lt;/span&gt;
STREAMLIT_URL &lt;span class="o"&gt;=&lt;/span&gt; os.environ.get&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"STREAMLIT_APP_URL"&lt;/span&gt;, &lt;span class="s2"&gt;"https://benson-mugure-portfolio.streamlit.app/"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

def main&lt;span class="o"&gt;()&lt;/span&gt;:
    options &lt;span class="o"&gt;=&lt;/span&gt; Options&lt;span class="o"&gt;()&lt;/span&gt;
    options.add_argument&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'--headless=new'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    options.add_argument&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'--no-sandbox'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    options.add_argument&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'--disable-dev-shm-usage'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    options.add_argument&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'--disable-gpu'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    options.add_argument&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'--window-size=1920,1080'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

    driver &lt;span class="o"&gt;=&lt;/span&gt; webdriver.Chrome&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Service&lt;span class="o"&gt;(&lt;/span&gt;ChromeDriverManager&lt;span class="o"&gt;()&lt;/span&gt;.install&lt;span class="o"&gt;())&lt;/span&gt;, &lt;span class="nv"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;options&lt;span class="o"&gt;)&lt;/span&gt;

    try:
        driver.get&lt;span class="o"&gt;(&lt;/span&gt;STREAMLIT_URL&lt;span class="o"&gt;)&lt;/span&gt;
        print&lt;span class="o"&gt;(&lt;/span&gt;f&lt;span class="s2"&gt;"Opened {STREAMLIT_URL}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

        &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; WebDriverWait&lt;span class="o"&gt;(&lt;/span&gt;driver, 15&lt;span class="o"&gt;)&lt;/span&gt;
        try:
            &lt;span class="c"&gt;# Look for the wake-up button&lt;/span&gt;
            button &lt;span class="o"&gt;=&lt;/span&gt; wait.until&lt;span class="o"&gt;(&lt;/span&gt;
                EC.element_to_be_clickable&lt;span class="o"&gt;((&lt;/span&gt;By.XPATH, &lt;span class="s2"&gt;"//button[contains(text(),'Yes, get this app back up')]"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
            &lt;span class="o"&gt;)&lt;/span&gt;
            print&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Wake-up button found. Clicking..."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            button.click&lt;span class="o"&gt;()&lt;/span&gt;

            &lt;span class="c"&gt;# After clicking, check if it disappears&lt;/span&gt;
            try:
                wait.until&lt;span class="o"&gt;(&lt;/span&gt;EC.invisibility_of_element_located&lt;span class="o"&gt;((&lt;/span&gt;By.XPATH, &lt;span class="s2"&gt;"//button[contains(text(),'Yes, get this app back up')]"&lt;/span&gt;&lt;span class="o"&gt;)))&lt;/span&gt;
                print&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Button clicked and disappeared ✅ (app should be waking up)"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            except TimeoutException:
                print&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Button was clicked but did NOT disappear ❌ (possible failure)"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt;

        except TimeoutException:
            &lt;span class="c"&gt;# No button at all → app is assumed to be awake&lt;/span&gt;
            print&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"No wake-up button found. Assuming app is already awake ✅"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

    except Exception as e:
        print&lt;span class="o"&gt;(&lt;/span&gt;f&lt;span class="s2"&gt;"Unexpected error: {e}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt;
    finally:
        driver.quit&lt;span class="o"&gt;()&lt;/span&gt;
        print&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Script finished."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;__name__ &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"__main__"&lt;/span&gt;:
    main&lt;span class="o"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace the default value of the &lt;code&gt;STREAMLIT_URL&lt;/code&gt; variable with your own app’s URL as a string, or set the &lt;code&gt;STREAMLIT_APP_URL&lt;/code&gt; environment variable instead.&lt;/p&gt;

&lt;h1&gt;
  
  
  Step 3: Create GitHub Workflow
&lt;/h1&gt;

&lt;p&gt;Also in your local repository, create the file &lt;code&gt;.github/workflows/wake.yml&lt;/code&gt; and paste the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;name: Wake Streamlit App

on:
  schedule:
    - cron: &lt;span class="s2"&gt;"0 */4 * * *"&lt;/span&gt;   &lt;span class="c"&gt;# every 4 hours&lt;/span&gt;
  workflow_dispatch:         &lt;span class="c"&gt;# allow manual trigger&lt;/span&gt;

&lt;span class="nb"&gt;jobs&lt;/span&gt;:
  wake:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: &lt;span class="s2"&gt;"3.10"&lt;/span&gt;

      - name: Install dependencies
        run: |
          python &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; pip
          pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

      - name: Run Selenium script
        run: python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the root of your repository, add a &lt;code&gt;requirements.txt&lt;/code&gt; file with the following contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;selenium
webdriver-manager
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Step 4: Commit and Push
&lt;/h1&gt;

&lt;p&gt;Run the following commands to add and commit your changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add &lt;span class="nb"&gt;.&lt;/span&gt;
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; "add files"
git push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Needless to say, you can modify the commit message as you wish.&lt;/p&gt;

&lt;h1&gt;
  
  
  Step 5: Run the workflow manually
&lt;/h1&gt;

&lt;p&gt;Go to your repository on GitHub and confirm that your files have been pushed. On that repo, click on the Actions tab as can be seen below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fegfmxrll6s3kegjcfjyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fegfmxrll6s3kegjcfjyx.png" alt="Github Menu" width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You should see the &lt;code&gt;Wake Streamlit App&lt;/code&gt; workflow listed under All Workflows. Click on &lt;code&gt;Wake Streamlit App&lt;/code&gt;. On the right, you should see a button that says &lt;code&gt;Run Workflow&lt;/code&gt;. Click it, then click the green button with the same label that appears. The workflow will now run, which takes about 2 minutes. Once it finishes, check whether your Streamlit app is awake. If not, check the logs on GitHub and debug.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;And there you have it: a simple and free way to keep your Streamlit app awake (hopefully forever). You can find the full code example in this repository &lt;a href="https://github.com/Virgo-Alpha/Coffee" rel="noopener noreferrer"&gt;here&lt;/a&gt;. You could also modify this workflow to make an empty commit to the branch that the Streamlit app is deployed from. Empty commits do not necessarily wake your app up, but they do reset the idle-time countdown that puts your app to sleep, so explore this option if you are feeling adventurous. Let me know if you found this article helpful.&lt;/p&gt;
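&lt;p&gt;As a hedged sketch of that empty-commit variant, an extra workflow step could look like the following (the step name and commit message are illustrative; depending on repository settings, the workflow may also need &lt;code&gt;permissions: contents: write&lt;/code&gt; to push):&lt;/p&gt;

```yaml
      - name: Reset idle timer with an empty commit
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git commit --allow-empty -m "chore: keep-alive empty commit"
          git push
```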

</description>
      <category>selenium</category>
      <category>githubactions</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Mastering Google Apps Script: Free Automation in Google Workspace</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Sun, 03 Aug 2025 13:51:59 +0000</pubDate>
      <link>https://dev.to/virgoalpha/mastering-google-apps-script-free-automation-in-google-workspace-3g1e</link>
      <guid>https://dev.to/virgoalpha/mastering-google-apps-script-free-automation-in-google-workspace-3g1e</guid>
      <description>&lt;h1&gt;
  
  
  TLDR;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Google Apps Script is a serverless Google service for automating workflows across Google Workspace&lt;/li&gt;
&lt;li&gt;It uses JavaScript, and scripts can be run automatically using triggers&lt;/li&gt;
&lt;li&gt;It is limited by factors such as the lack of a package manager and execution time limits&lt;/li&gt;
&lt;li&gt;Apps Script projects can be Bound Scripts, directly linked to a specific Google file, or Standalone Scripts, existing independently in Google Drive, with each type suited to different purposes&lt;/li&gt;
&lt;li&gt;For practical usage without the theory, you can skip straight to the case study.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Table Of Contents
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;Sample Use Cases&lt;/li&gt;
&lt;li&gt;Triggers&lt;/li&gt;
&lt;li&gt;Core Services&lt;/li&gt;
&lt;li&gt;Bound vs Stand Alone Scripts&lt;/li&gt;
&lt;li&gt;Handling Environment Variables&lt;/li&gt;
&lt;li&gt;Project Management&lt;/li&gt;
&lt;li&gt;Limitations&lt;/li&gt;
&lt;li&gt;Case Study&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;I was looking for a way to automate some processes in Google Workspace and thought this was a good use case to try out n8n or Zapier. However, I took a step back and wondered if there was a free solution within the Google suite. That’s how I stumbled upon Google Apps Script and decided to explore it. In this article, I go through the main features, limitations, and use cases of Google Apps Script, then demonstrate with sample code how to automate a monthly financial process.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Google Apps Script?
&lt;/h2&gt;

&lt;p&gt;Google Apps Script is a cloud-based scripting platform based on JavaScript that lets you automate, integrate, and extend Google Workspace applications. It's "serverless," meaning you don't need to worry about hosting or infrastructure. Google handles it all. This makes it incredibly accessible.&lt;/p&gt;

&lt;h1&gt;
  
  
  Sample Use Cases
&lt;/h1&gt;

&lt;p&gt;The big 3 use cases for Google Apps Script are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Automation: The most common use. Automate repetitive tasks like sending templated emails, generating reports in Sheets from data, or organizing files in Drive.&lt;/li&gt;
&lt;li&gt;Integration: Connect different Google services. For example, automatically create a Calendar event from a Google Form submission, or log Gmail attachments into a Google Sheet.&lt;/li&gt;
&lt;li&gt;Customization: Extend the user interface of Google Workspace. You can create custom menus, dialogs, and sidebars in Sheets, Docs, and Forms to build custom workflows for users.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The above offer endless possibilities in both business and personal areas. A few of the ones that I thought of were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically saving email attachments to a folder and alerting the user, as well as updating a spreadsheet&lt;/li&gt;
&lt;li&gt;Making daily API calls to a weather site to see if there is a torrential rain warning or cyclone and alerting the user via an email if there is&lt;/li&gt;
&lt;li&gt;Creating a custom menu in Google Sheets where a single click generates a report and sends it to clients, mail-merge style&lt;/li&gt;
&lt;li&gt;Google Form validation&lt;/li&gt;
&lt;li&gt;Automatically adding invites from a certain email address to my calendar&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Triggers
&lt;/h1&gt;

&lt;p&gt;Triggers are what make your scripts run automatically in response to specific events.&lt;br&gt;
Types of triggers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Simple Triggers: Easy-to-use, built-in functions like onOpen() (runs when a document is opened) and onEdit() (runs when a cell is edited).&lt;/li&gt;
&lt;li&gt;Installable Triggers: More powerful and flexible. These can be time-driven (e.g., run a script every morning at 9 AM) or event-driven (e.g., run a script when a Google Form is submitted).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To automate your scripts, you will need to add a new trigger from the menu bar on the left, as can be seen below.&lt;/p&gt;

&lt;p&gt;Once you go to that page, on the bottom right, you will see a button to add a trigger. Clicking that button opens the modal below.&lt;/p&gt;

&lt;p&gt;The trigger can be time-driven or calendar-driven. The time-driven option offers the following timer categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specific date and time&lt;/li&gt;
&lt;li&gt;Minutes timer&lt;/li&gt;
&lt;li&gt;Hours timer&lt;/li&gt;
&lt;li&gt;Day timer&lt;/li&gt;
&lt;li&gt;Week timer&lt;/li&gt;
&lt;li&gt;Month timer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These timers let you select how often the script should run, e.g., every 5 minutes for the minutes timer or every Monday for the week timer.&lt;/p&gt;
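&lt;p&gt;Installable triggers can also be created from code instead of the trigger modal. A minimal sketch (the 4-hour interval is illustrative; &lt;code&gt;processAllEmails&lt;/code&gt; is the entry function used later in the case study):&lt;/p&gt;

```javascript
// Sketch: creating an installable, time-driven trigger programmatically.
// Equivalent to picking the "Hours timer" in the trigger modal.
function createHourlyTrigger() {
  ScriptApp.newTrigger('processAllEmails') // name of the function to run
    .timeBased()
    .everyHours(4)                         // run every 4 hours
    .create();
}
```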
&lt;h2&gt;
  
  
  Pitfalls to Watch Out For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Time-driven triggers can fail silently if the script takes too long or errors out.&lt;/li&gt;
&lt;li&gt;Installable triggers require authorization—if not granted properly, they won’t run.&lt;/li&gt;
&lt;li&gt;Google may throttle or delay time-based executions under heavy load or policy violations.&lt;/li&gt;
&lt;li&gt;Always monitor the Executions panel for logs and failures.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Core Services
&lt;/h1&gt;

&lt;p&gt;These are the built-in libraries that allow your script to interact with Google services. You don't need to import anything; they are just available. However, you may need to enable them or add them to your project.&lt;br&gt;
Key services include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SpreadsheetApp: For reading, writing, and formatting data in Google Sheets.&lt;/li&gt;
&lt;li&gt;GmailApp: For reading, searching, and sending emails.&lt;/li&gt;
&lt;li&gt;DocumentApp: For creating and editing Google Docs.&lt;/li&gt;
&lt;li&gt;DriveApp: For managing files and folders in Google Drive.&lt;/li&gt;
&lt;li&gt;UrlFetchApp: For connecting to external, third-party APIs on the internet.&lt;/li&gt;
&lt;/ul&gt;
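&lt;p&gt;As a quick sketch of how two of these services compose (the sheet name, cell, and recipient below are placeholders, not part of any real project):&lt;/p&gt;

```javascript
// Sketch: SpreadsheetApp + GmailApp working together. Reads one cell from
// a sheet named "Report" and emails its value. All names are illustrative.
function emailDailySummary() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Report');
  const total = sheet.getRange('B2').getValue();
  GmailApp.sendEmail('you@example.com', 'Daily summary', 'Total: ' + total);
}
```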
&lt;h1&gt;
  
  
  Bound Vs Stand Alone Scripts
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bound Scripts&lt;/strong&gt;: These are linked directly to a specific Google Sheet, Doc, or Form. They are best for scripts that are only meant to work with that one file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standalone Scripts&lt;/strong&gt;: These exist as their own independent files in Google Drive. They are better for general-purpose scripts or for building web apps and add-ons.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Deployment Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Bound Scripts are easier to deploy for quick file-specific automations.&lt;/li&gt;
&lt;li&gt;Standalone Scripts are necessary for publishing web apps, libraries, or add-ons, and for handling broader integrations across multiple services and files.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Handling Environment variables
&lt;/h1&gt;

&lt;p&gt;When working with sensitive data such as API keys or tokens, &lt;strong&gt;never hardcode credentials directly into your code&lt;/strong&gt;. Doing so risks exposing them—especially if your script is shared or published as a web app. Instead, use the PropertiesService to securely store and access secrets.&lt;/p&gt;

&lt;p&gt;This approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeps your credentials separate from your code logic.&lt;/li&gt;
&lt;li&gt;Prevents accidental leaks in version control or shared scripts.&lt;/li&gt;
&lt;li&gt;Makes it easier to manage and rotate secrets without editing source files.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Step 1: Store the Secret
&lt;/h2&gt;

&lt;p&gt;Create a separate function to set your secret. You only need to run this function once manually from the script editor to save the key.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;function &lt;/span&gt;storeApiKey&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  // Get the script private properties store
  const scriptProperties &lt;span class="o"&gt;=&lt;/span&gt; PropertiesService.getScriptProperties&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  // Set a key-value pair &lt;span class="k"&gt;for &lt;/span&gt;your secret
  scriptProperties.setProperty&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'MY_API_KEY'&lt;/span&gt;, &lt;span class="s1"&gt;'your-secret-api-key-goes-here'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"API Key has been stored securely."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Step 2: Retrieve the Secret in Your Code
&lt;/h2&gt;

&lt;p&gt;In your main functions, you can then retrieve the key without ever exposing it in the script itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;function &lt;/span&gt;makeApiCall&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  // Get the script properties store
  const scriptProperties &lt;span class="o"&gt;=&lt;/span&gt; PropertiesService.getScriptProperties&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  // Retrieve the stored secret by its key
  const apiKey &lt;span class="o"&gt;=&lt;/span&gt; scriptProperties.getProperty&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'MY_API_KEY'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  // Now you can use the apiKey variable &lt;span class="k"&gt;in &lt;/span&gt;your API call
  const url &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;https://api.example.com/data?key&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;apiKey&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  const response &lt;span class="o"&gt;=&lt;/span&gt; UrlFetchApp.fetch&lt;span class="o"&gt;(&lt;/span&gt;url&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  Logger.log&lt;span class="o"&gt;(&lt;/span&gt;response.getContentText&lt;span class="o"&gt;())&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method ensures your sensitive information is kept separate from your code, which is essential for security.&lt;/p&gt;

&lt;h1&gt;
  
  
  Project management
&lt;/h1&gt;

&lt;p&gt;A single Apps Script project can contain multiple files, whose functions can be triggered separately, but the files cannot contain duplicate declarations of the same variable names.&lt;br&gt;
All script files (.gs) within a single Apps Script project are executed in one shared global scope. Think of it as Google taking all your separate files, concatenating them into one large file behind the scenes, and then running it.&lt;br&gt;
This is why you can't redeclare a variable with const or let in another file: from the engine's perspective, you're trying to declare the same variable twice in the same script. This global nature is also what makes calling functions between files so seamless.&lt;/p&gt;
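&lt;p&gt;A tiny sketch of this shared scope (the file names and the tax-rate constant are illustrative; in the editor these would be two separate .gs files):&lt;/p&gt;

```javascript
// --- utils.gs ---
// Declared once; redeclaring TAX_RATE with const in any other file of the
// same project would raise a duplicate-declaration error.
const TAX_RATE = 0.16;
function applyTax(amount) {
  return amount * (1 + TAX_RATE);
}

// --- main.gs ---
// No import needed: applyTax and TAX_RATE are visible here because both
// files run in one shared global scope.
function run() {
  return applyTax(100);
}
```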
&lt;h2&gt;
  
  
  Considerations for Splitting a Project into Multiple Files
&lt;/h2&gt;

&lt;p&gt;Splitting your code is purely for organization and readability. It has no effect on how the code runs. Here are a few things to consider before you do it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logical Separation: Group related functions into the same file. For example, have one file for all functions that interact with Google Sheets (sheets.gs), another for Gmail logic (gmail.gs), and a main file for the primary workflow (main.gs).&lt;/li&gt;
&lt;li&gt;Configuration: Keep global constants and configuration settings (like spreadsheet IDs, email addresses, or API keys stored in Properties Service) in a dedicated file (e.g., config.gs). This makes them easy to find and update.&lt;/li&gt;
&lt;li&gt;Maintainability: For large projects, splitting the code makes it much easier to navigate, debug, and for other people to understand. A single 1,000-line file is much harder to work with than five 200-line files with clear purposes.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  One Project vs. Multiple Projects
&lt;/h2&gt;

&lt;p&gt;The decision to keep code in one project or split it into different projects depends on the tasks you are automating.&lt;br&gt;
Keep it in one project if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The scripts are part of a single, cohesive workflow (e.g., reading from a Sheet, processing the data, and sending an email).&lt;/li&gt;
&lt;li&gt;The functions in different files need to call each other or share global variables.&lt;/li&gt;
&lt;li&gt;The entire workflow can operate under a single set of permissions (e.g., the whole project needs access to both Sheets and Gmail).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Split it into multiple projects if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The automations are completely unrelated (e.g., one script organizes your Drive, and another sends you a daily weather report).&lt;/li&gt;
&lt;li&gt;The automations require different security permissions. Separating them ensures one script doesn't have access to services it doesn't need (e.g., one script only needs access to a specific Sheet, while another needs access to your entire Calendar).&lt;/li&gt;
&lt;li&gt;They run on completely independent triggers and have no logical connection.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  How Different Files Interact
&lt;/h2&gt;

&lt;p&gt;Because all files share the same global environment, calling a function from another file is effortless. You just call it directly by name as if it were in the same file.&lt;/p&gt;
&lt;h1&gt;
  
  
  Limitations
&lt;/h1&gt;
&lt;h2&gt;
  
  
  1. Cannot Decrypt Password-Protected Files
&lt;/h2&gt;

&lt;p&gt;A script can see a password-protected file in Google Drive, but it cannot open or read its contents. The Apps Script environment has no built-in mechanism to supply a password to decrypt a file, which is why a more capable environment like a Python script or Google Cloud Function is required for this task.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Limited Native File Processing
&lt;/h2&gt;

&lt;p&gt;Apps Script cannot natively parse complex file formats like PDFs or Excel spreadsheets to extract data directly. While it can convert some files to Google Workspace formats (e.g., PDF to Google Doc), it doesn't offer granular control to read the raw data or structure from the original file itself.&lt;/p&gt;
&lt;h2&gt;
  
  
  3. Execution Time Limits
&lt;/h2&gt;

&lt;p&gt;Scripts have a maximum execution time. For most standard Google accounts (including Gmail and Google Workspace), this limit is 6 minutes per run. For long-running tasks like processing hundreds of files or spreadsheet rows, your script may time out before it can finish.&lt;/p&gt;
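&lt;p&gt;One common workaround, sketched below, is to watch the clock inside the loop and return a resume point before hitting the cutoff. The 5-minute budget is an illustrative safety margin below the hard limit.&lt;/p&gt;

```javascript
// Sketch: staying under the ~6-minute limit by checking elapsed time and
// stopping early with a resume index that a follow-up run can continue from.
const MAX_RUNTIME_MS = 5 * 60 * 1000; // safety margin below the hard limit

function processRows(rows, startIndex, startTime) {
  for (let i = startIndex; i < rows.length; i++) {
    if (Date.now() - startTime > MAX_RUNTIME_MS) {
      return i; // out of time: resume from row i on the next run
    }
    // ... process rows[i] here ...
  }
  return -1; // all rows processed
}
```

&lt;p&gt;In practice, the returned resume index could be saved with PropertiesService and the function re-run by a time-driven trigger until it returns -1.&lt;/p&gt;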
&lt;h2&gt;
  
  
  4. Service Quotas and Rate Limits
&lt;/h2&gt;

&lt;p&gt;To prevent abuse, Google imposes daily quotas and rate limits on the use of its services. For example, there are limits on how many emails GmailApp can send per day, how many API calls you can make, or how many triggers can run. For large-scale automations, you can hit these limits.&lt;/p&gt;
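&lt;p&gt;Some quotas can be checked at runtime; for example, MailApp exposes the remaining daily email allowance via &lt;code&gt;getRemainingDailyQuota()&lt;/code&gt;. A hedged sketch (the recipient handling is illustrative):&lt;/p&gt;

```javascript
// Sketch: trim a bulk send to the remaining daily email quota instead of
// failing mid-run. MailApp.getRemainingDailyQuota() returns how many
// recipients you can still email today.
function sendWithinQuota(recipients, subject, body) {
  const remaining = MailApp.getRemainingDailyQuota();
  const batch = recipients.slice(0, remaining); // only send what still fits
  batch.forEach(function (addr) {
    MailApp.sendEmail(addr, subject, body);
  });
  return batch.length; // number of emails actually sent
}
```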
&lt;h2&gt;
  
  
  5. Sandboxed Environment and No Package Manager
&lt;/h2&gt;

&lt;p&gt;Apps Script runs in a secure, sandboxed environment, which means:&lt;br&gt;
You cannot use standard package managers like npm to import external JavaScript libraries.&lt;br&gt;
You have no direct access to the server's file system or the ability to make arbitrary network connections.&lt;/p&gt;
&lt;h2&gt;
  
  
  6. Simple Trigger Restrictions
&lt;/h2&gt;

&lt;p&gt;Simple triggers like onOpen(e) and onEdit(e) run in a restricted mode. They cannot access any service that requires user authorization. For example, an onEdit trigger cannot send an email or create a calendar event, which is a common source of confusion for new developers.&lt;/p&gt;
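&lt;p&gt;A minimal sketch of this restriction (the logged message is illustrative): the simple trigger below may inspect the edit event and log it, but an authorized call like sending an email from inside it would fail.&lt;/p&gt;

```javascript
// Sketch: a simple trigger runs in restricted mode. Logging the edit is
// fine; anything needing authorization (email, Calendar) is not.
function onEdit(e) {
  const cell = e.range.getA1Notation(); // e.g. "B2"
  Logger.log("Edited " + cell + ", new value: " + e.value);
  // GmailApp.sendEmail(...) here would fail: use an installable
  // "on edit" trigger for that instead.
}
```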
&lt;h1&gt;
  
  
  Case Study
&lt;/h1&gt;

&lt;p&gt;The original idea I wanted to automate was that of investing. Every time I get my payslip, I save it to a certain folder, calculate how much of a particular stock I can buy that month, and then place the order via email. The step-by-step guide below shows how to automate this entire workflow. Now, let’s get coding.&lt;/p&gt;

&lt;p&gt;If you want to jump straight into the code, find the repository &lt;a href="https://github.com/Virgo-Alpha/Financial_Automation" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: Initial Setup (Do this first!)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a New Google Sheet. Name it "My Stock Portfolio".&lt;/li&gt;
&lt;li&gt;Inside the sheet, create two tabs: Trading and Transactions.&lt;/li&gt;
&lt;li&gt;Go to Extensions &amp;gt; Apps Script to open the script editor. This will create a new Apps Script project that is bound to your spreadsheet.&lt;/li&gt;
&lt;li&gt;Get a Free API Key from &lt;a href="https://site.financialmodelingprep.com/developer" rel="noopener noreferrer"&gt;Financial Modeling Prep&lt;/a&gt;. You'll need this for the stock price data.&lt;/li&gt;
&lt;li&gt;Create a Google Drive Folder where your payslips and contract notes will be saved. Right-click the folder and get its ID from the URL.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Step 2: The Code (Create these files in your Apps Script project)
&lt;/h2&gt;

&lt;p&gt;In the Apps Script editor, create the following files by clicking the + icon next to "Files". Copy and paste the code for each one.&lt;br&gt;
Config.gs (Configuration)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;// &lt;span class="nt"&gt;---&lt;/span&gt; CONFIGURATION FILE &lt;span class="nt"&gt;---&lt;/span&gt;
// Store all your personal settings here.


const CONFIG &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 // &lt;span class="nt"&gt;---&lt;/span&gt; Email Settings &lt;span class="nt"&gt;---&lt;/span&gt;
 MY_EMAIL: &lt;span class="s2"&gt;"your_email@example.com"&lt;/span&gt;, // Your primary email address
 BROKER_EMAIL: &lt;span class="s2"&gt;"broker@example.com"&lt;/span&gt;,   // Your stockbroker's email address
  // &lt;span class="nt"&gt;---&lt;/span&gt; Payslip Email Settings &lt;span class="nt"&gt;---&lt;/span&gt;
 PAYSLIP_SENDER: &lt;span class="s2"&gt;"payslips@company.com"&lt;/span&gt;,
 PAYSLIP_SUBJECT_CONTAINS: &lt;span class="s2"&gt;"Your Monthly Payslip"&lt;/span&gt;,
  // &lt;span class="nt"&gt;---&lt;/span&gt; Contract Note Email Settings &lt;span class="nt"&gt;---&lt;/span&gt;
 CONTRACT_NOTE_SENDER: &lt;span class="s2"&gt;"contracts@broker.com"&lt;/span&gt;,
 CONTRACT_NOTE_SUBJECT_CONTAINS: &lt;span class="s2"&gt;"Contract Note"&lt;/span&gt;,


 // &lt;span class="nt"&gt;---&lt;/span&gt; Drive Folder &lt;span class="nt"&gt;---&lt;/span&gt;
 FINANCE_FOLDER_ID: &lt;span class="s2"&gt;"YOUR_GOOGLE_DRIVE_FOLDER_ID"&lt;/span&gt;,


 // &lt;span class="nt"&gt;---&lt;/span&gt; Financial Settings &lt;span class="nt"&gt;---&lt;/span&gt;
 MONTHLY_SALARY: 10000,
 INVESTMENT_PERCENTAGE: 0.20, // 20%
 STOCK_TICKER: &lt;span class="s2"&gt;"AAPL"&lt;/span&gt;,
 CDS_ACCOUNT: &lt;span class="s2"&gt;"CDS123456FI00"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Secrets.gs (API Key Management)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;// &lt;span class="nt"&gt;---&lt;/span&gt; API KEY MANAGEMENT &lt;span class="nt"&gt;---&lt;/span&gt;
// Use this file to securely store and retrieve your API key.


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Stores the API key &lt;span class="k"&gt;in &lt;/span&gt;PropertiesService.
&lt;span class="k"&gt;*&lt;/span&gt; Run this &lt;span class="k"&gt;function &lt;/span&gt;ONCE MANUALLY from the editor after pasting your key.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;storeApiKey&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 const scriptProperties &lt;span class="o"&gt;=&lt;/span&gt; PropertiesService.getScriptProperties&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 // &lt;span class="o"&gt;!!!&lt;/span&gt; PASTE YOUR API KEY HERE &lt;span class="o"&gt;!!!&lt;/span&gt;
 scriptProperties.setProperty&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'FINANCE_API_KEY'&lt;/span&gt;, &lt;span class="s1"&gt;'YOUR_FINANCIAL_MODELING_PREP_API_KEY'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"API Key has been stored securely."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Retrieves the stored API key.
&lt;span class="k"&gt;*&lt;/span&gt; @returns &lt;span class="o"&gt;{&lt;/span&gt;string&lt;span class="o"&gt;}&lt;/span&gt; The API key.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;getApiKey&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 const scriptProperties &lt;span class="o"&gt;=&lt;/span&gt; PropertiesService.getScriptProperties&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="k"&gt;return &lt;/span&gt;scriptProperties.getProperty&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'FINANCE_API_KEY'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Main.gs (Triggers &amp;amp; Menus)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;// &lt;span class="nt"&gt;---&lt;/span&gt; MAIN SCRIPT FILE &lt;span class="nt"&gt;---&lt;/span&gt;
// Contains the main triggers and UI functions.


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Creates a custom menu &lt;span class="k"&gt;in &lt;/span&gt;the spreadsheet when its opened.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;onOpen&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 SpreadsheetApp.getUi&lt;span class="o"&gt;()&lt;/span&gt;
   .createMenu&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Stock Trading'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
   .addItem&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'📈 Place New Trade Order'&lt;/span&gt;, &lt;span class="s1"&gt;'showTradeDialog'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
   .addToUi&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Main &lt;span class="k"&gt;function &lt;/span&gt;to process incoming emails.
&lt;span class="k"&gt;*&lt;/span&gt; Set up a time-driven trigger to run this every 5-10 minutes.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;processAllEmails&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"--- Starting email processing cycle ---"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 processPayslipEmails&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 processContractNoteEmails&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"--- Finished email processing cycle ---"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
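The comment in processAllEmails asks you to set up a time-driven trigger by hand. If you prefer to install it from code, a one-off setup function along these lines should work (run it once from the Apps Script editor; the function name and the 10-minute interval are my own example choices, not part of the original project):

```javascript
// One-off setup: installs a time-driven trigger for processAllEmails.
// Hypothetical helper; run once manually, then delete or leave in place.
function installEmailTrigger() {
  ScriptApp.newTrigger('processAllEmails')
    .timeBased()
    .everyMinutes(10) // accepted values are 1, 5, 10, 15 or 30
    .create();
}
```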



&lt;p&gt;GmailProcessing.gs (Email Handling)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;// &lt;span class="nt"&gt;---&lt;/span&gt; EMAIL PROCESSING LOGIC &lt;span class="nt"&gt;---&lt;/span&gt;


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Processes payslip emails, saves the attachment, and sends a notification.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;processPayslipEmails&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 const query &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;from:&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;.PAYSLIP_SENDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; subject:&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;.PAYSLIP_SUBJECT_CONTAINS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; is:unread&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;Searching &lt;span class="k"&gt;for &lt;/span&gt;payslips with query: &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;query&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


 const threads &lt;span class="o"&gt;=&lt;/span&gt; GmailApp.search&lt;span class="o"&gt;(&lt;/span&gt;query&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;threads.length &lt;span class="o"&gt;===&lt;/span&gt; 0&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;const thread of threads&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   const message &lt;span class="o"&gt;=&lt;/span&gt; thread.getMessages&lt;span class="o"&gt;()[&lt;/span&gt;0]&lt;span class="p"&gt;;&lt;/span&gt; // Process first message
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;message.isUnread&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
     // 1. Save attachment
     const attachment &lt;span class="o"&gt;=&lt;/span&gt; message.getAttachments&lt;span class="o"&gt;()[&lt;/span&gt;0]&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;attachment &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; attachment.getContentType&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'application/pdf'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       const folder &lt;span class="o"&gt;=&lt;/span&gt; DriveApp.getFolderById&lt;span class="o"&gt;(&lt;/span&gt;CONFIG.FINANCE_FOLDER_ID&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       folder.createFile&lt;span class="o"&gt;(&lt;/span&gt;attachment.copyBlob&lt;span class="o"&gt;())&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;Saved payslip: &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;attachment&lt;/span&gt;&lt;span class="p"&gt;.getName()&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="o"&gt;}&lt;/span&gt;

     // 2. Get stock price and calculate investment
     const stockData &lt;span class="o"&gt;=&lt;/span&gt; getStockPrice&lt;span class="o"&gt;(&lt;/span&gt;CONFIG.STOCK_TICKER&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="nb"&gt;let &lt;/span&gt;investmentInfo &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Could not retrieve stock price at this time."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;stockData&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       const investmentAmount &lt;span class="o"&gt;=&lt;/span&gt; CONFIG.MONTHLY_SALARY &lt;span class="k"&gt;*&lt;/span&gt; CONFIG.INVESTMENT_PERCENTAGE&lt;span class="p"&gt;;&lt;/span&gt;
       const sharesToBuy &lt;span class="o"&gt;=&lt;/span&gt; Math.floor&lt;span class="o"&gt;(&lt;/span&gt;investmentAmount / stockData.price&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       investmentInfo &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;The current price of &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;.STOCK_TICKER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; is &lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="o"&gt;{&lt;/span&gt;stockData.price.toFixed&lt;span class="o"&gt;(&lt;/span&gt;2&lt;span class="o"&gt;)}&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
With 20% of your salary &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="o"&gt;{&lt;/span&gt;investmentAmount.toFixed&lt;span class="o"&gt;(&lt;/span&gt;2&lt;span class="o"&gt;)})&lt;/span&gt;, you could buy approximately &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;sharesToBuy&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; shares.&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="o"&gt;}&lt;/span&gt;


     // 3. Send notification email
     const subject &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"✅ Your Payslip Has Been Processed"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     const body &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;Hello,&lt;span class="se"&gt;\n\n&lt;/span&gt;Your payslip has been saved to Google Drive.&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;investmentInfo&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;Thank you.&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     GmailApp.sendEmail&lt;span class="o"&gt;(&lt;/span&gt;CONFIG.MY_EMAIL, subject, body&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

     // 4. Mark as &lt;span class="nb"&gt;read &lt;/span&gt;and check the box &lt;span class="k"&gt;in &lt;/span&gt;the sheet
     thread.markRead&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     updateMonthlyChecklist&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Processes contract note emails.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;processContractNoteEmails&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 const query &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;from:&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;.CONTRACT_NOTE_SENDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; subject:&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;.CONTRACT_NOTE_SUBJECT_CONTAINS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; is:unread&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;Searching &lt;span class="k"&gt;for &lt;/span&gt;contract notes with query: &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;query&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  const threads &lt;span class="o"&gt;=&lt;/span&gt; GmailApp.search&lt;span class="o"&gt;(&lt;/span&gt;query&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;threads.length &lt;span class="o"&gt;===&lt;/span&gt; 0&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;const thread of threads&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    const message &lt;span class="o"&gt;=&lt;/span&gt; thread.getMessages&lt;span class="o"&gt;()[&lt;/span&gt;0]&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;message.isUnread&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       const attachment &lt;span class="o"&gt;=&lt;/span&gt; message.getAttachments&lt;span class="o"&gt;()[&lt;/span&gt;0]&lt;span class="p"&gt;;&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;attachment&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
          const folder &lt;span class="o"&gt;=&lt;/span&gt; DriveApp.getFolderById&lt;span class="o"&gt;(&lt;/span&gt;CONFIG.FINANCE_FOLDER_ID&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          folder.createFile&lt;span class="o"&gt;(&lt;/span&gt;attachment.copyBlob&lt;span class="o"&gt;())&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;Saved contract note: &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;attachment&lt;/span&gt;&lt;span class="p"&gt;.getName()&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       &lt;span class="o"&gt;}&lt;/span&gt;

       // As we cannot parse the PDF, we notify the user to update the sheet.
       const subject &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"📝 Action Required: Log Your Recent Trade"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       const body &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;Hello,&lt;span class="se"&gt;\n\n&lt;/span&gt;A new contract note has been saved to your Drive.&lt;span class="se"&gt;\n\n&lt;/span&gt;Please open your &lt;span class="s1"&gt;'My Stock Portfolio'&lt;/span&gt; spreadsheet and log the details of this transaction &lt;span class="k"&gt;in &lt;/span&gt;the &lt;span class="s1"&gt;'Transactions'&lt;/span&gt; tab.&lt;span class="se"&gt;\n\n&lt;/span&gt;Thank you.&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       GmailApp.sendEmail&lt;span class="o"&gt;(&lt;/span&gt;CONFIG.MY_EMAIL, subject, body&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

       thread.markRead&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Finds the current month/year row &lt;span class="k"&gt;in &lt;/span&gt;the &lt;span class="s1"&gt;'Trading'&lt;/span&gt; sheet and checks the box.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;updateMonthlyChecklist&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 const sheet &lt;span class="o"&gt;=&lt;/span&gt; SpreadsheetApp.getActiveSpreadsheet&lt;span class="o"&gt;()&lt;/span&gt;.getSheetByName&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Trading"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 const data &lt;span class="o"&gt;=&lt;/span&gt; sheet.getDataRange&lt;span class="o"&gt;()&lt;/span&gt;.getValues&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 const now &lt;span class="o"&gt;=&lt;/span&gt; new Date&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 const monthYear &lt;span class="o"&gt;=&lt;/span&gt; Utilities.formatDate&lt;span class="o"&gt;(&lt;/span&gt;now, Session.getScriptTimeZone&lt;span class="o"&gt;()&lt;/span&gt;, &lt;span class="s2"&gt;"MMMM yyyy"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;let &lt;/span&gt;i &lt;span class="o"&gt;=&lt;/span&gt; 1&lt;span class="p"&gt;;&lt;/span&gt; i &amp;lt; data.length&lt;span class="p"&gt;;&lt;/span&gt; i++&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;data[i][0] &lt;span class="o"&gt;===&lt;/span&gt; monthYear&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
     sheet.getRange&lt;span class="o"&gt;(&lt;/span&gt;i + 1, 2&lt;span class="o"&gt;)&lt;/span&gt;.check&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; // Check the box &lt;span class="k"&gt;in &lt;/span&gt;column B
     &lt;span class="nb"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="o"&gt;}&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
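The investment arithmetic in step 2 of processPayslipEmails is plain JavaScript, so it is easy to sanity-check in isolation. A minimal sketch (the salary, percentage and price values here are made-up examples, not real CONFIG values):

```javascript
// Computes how many whole shares a fixed salary percentage can buy.
// Mirrors the calculation inside processPayslipEmails; values are examples.
function sharesToBuy(monthlySalary, investmentPercentage, sharePrice) {
  const investmentAmount = monthlySalary * investmentPercentage;
  return Math.floor(investmentAmount / sharePrice);
}

// e.g. 20% of a 5000 salary at a share price of 150:
console.log(sharesToBuy(5000, 0.20, 150)); // 6
```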



&lt;p&gt;StockTrading.gs (UI &amp;amp; Trading Logic)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;// &lt;span class="nt"&gt;---&lt;/span&gt; STOCK TRADING UI AND LOGIC &lt;span class="nt"&gt;---&lt;/span&gt;


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Shows the custom HTML dialog &lt;span class="k"&gt;for &lt;/span&gt;placing a trade.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;showTradeDialog&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 const html &lt;span class="o"&gt;=&lt;/span&gt; HtmlService.createHtmlOutputFromFile&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Dialog.html'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
     .setWidth&lt;span class="o"&gt;(&lt;/span&gt;400&lt;span class="o"&gt;)&lt;/span&gt;
     .setHeight&lt;span class="o"&gt;(&lt;/span&gt;450&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 SpreadsheetApp.getUi&lt;span class="o"&gt;()&lt;/span&gt;.showModalDialog&lt;span class="o"&gt;(&lt;/span&gt;html, &lt;span class="s1"&gt;'Place a Stock Trade Order'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Fetches the current stock price to populate the dialog.
&lt;span class="k"&gt;*&lt;/span&gt; This &lt;span class="k"&gt;function &lt;/span&gt;is called from the client-side HTML.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;getLiveStockPrice&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return &lt;/span&gt;getStockPrice&lt;span class="o"&gt;(&lt;/span&gt;CONFIG.STOCK_TICKER&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Processes the trade order submitted from the dialog.
&lt;span class="k"&gt;*&lt;/span&gt; @param &lt;span class="o"&gt;{&lt;/span&gt;object&lt;span class="o"&gt;}&lt;/span&gt; orderDetails An object from the dialog form.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;placeTradeOrder&lt;span class="o"&gt;(&lt;/span&gt;orderDetails&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 const &lt;span class="o"&gt;{&lt;/span&gt; tradeDirection, quantity, price &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; orderDetails&lt;span class="p"&gt;;&lt;/span&gt;
  const subject &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;Trade Order: &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;tradeDirection&lt;/span&gt;&lt;span class="p"&gt;.toUpperCase()&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;quantity&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;.STOCK_TICKER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; @ &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;price&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 const body &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;
   Hello,


   Please execute the following trade order &lt;span class="k"&gt;for &lt;/span&gt;my account &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;.CDS_ACCOUNT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;:


   &lt;span class="nt"&gt;----------------------------------&lt;/span&gt;
   Security Name:    &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;.STOCK_TICKER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
   Trade Direction:  &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;tradeDirection&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
   Number of Shares: &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;quantity&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; Shares
   Price:            Market or MUR &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;price&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
   Validity:         Maximum 30 days
   &lt;span class="nt"&gt;----------------------------------&lt;/span&gt;


   Thank you.
 &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


 try &lt;span class="o"&gt;{&lt;/span&gt;
   // Send email to the broker and BCC self.
   GmailApp.sendEmail&lt;span class="o"&gt;(&lt;/span&gt;CONFIG.BROKER_EMAIL, subject, body, &lt;span class="o"&gt;{&lt;/span&gt;
     bcc: CONFIG.MY_EMAIL
   &lt;span class="o"&gt;})&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

   // Log the transaction to the &lt;span class="s1"&gt;'Transactions'&lt;/span&gt; sheet
   const transactionsSheet &lt;span class="o"&gt;=&lt;/span&gt; SpreadsheetApp.getActiveSpreadsheet&lt;span class="o"&gt;()&lt;/span&gt;.getSheetByName&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Transactions"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   transactionsSheet.appendRow&lt;span class="o"&gt;([&lt;/span&gt;new Date&lt;span class="o"&gt;()&lt;/span&gt;, CONFIG.STOCK_TICKER, tradeDirection.toUpperCase&lt;span class="o"&gt;()&lt;/span&gt;, quantity, price, &lt;span class="s2"&gt;"PLACED"&lt;/span&gt;&lt;span class="o"&gt;])&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;"✅ Success! Your trade order has been emailed to the broker and logged."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt; catch &lt;span class="o"&gt;(&lt;/span&gt;e&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;Failed to send trade email: &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;e&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;❌ Error: Could not send the trade order. Please check the logs.&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
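Since placeTradeOrder receives whatever the dialog form submits, the subject-line formatting is worth exercising on its own. A standalone sketch (the helper name and the explicit ticker parameter are mine, added for illustration):

```javascript
// Builds the broker email subject from form inputs, mirroring the
// template literal in placeTradeOrder. Hypothetical helper for illustration.
function buildOrderSubject(tradeDirection, quantity, price, ticker) {
  return `Trade Order: ${tradeDirection.toUpperCase()} ${quantity} ${ticker} @ ${price}`;
}

console.log(buildOrderSubject('Buy', 10, 150.25, 'AAPL'));
// Trade Order: BUY 10 AAPL @ 150.25
```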



&lt;p&gt;APIs.gs (External API Calls)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;// &lt;span class="nt"&gt;---&lt;/span&gt; EXTERNAL API CALLS &lt;span class="nt"&gt;---&lt;/span&gt;


/&lt;span class="k"&gt;**&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; Fetches the latest stock price from Financial Modeling Prep.
&lt;span class="k"&gt;*&lt;/span&gt; @param &lt;span class="o"&gt;{&lt;/span&gt;string&lt;span class="o"&gt;}&lt;/span&gt; ticker The stock symbol &lt;span class="o"&gt;(&lt;/span&gt;e.g., &lt;span class="s2"&gt;"AAPL"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt; @returns &lt;span class="o"&gt;{&lt;/span&gt;object|null&lt;span class="o"&gt;}&lt;/span&gt; An object with price and volume, or null on error.
&lt;span class="k"&gt;*&lt;/span&gt;/
&lt;span class="k"&gt;function &lt;/span&gt;getStockPrice&lt;span class="o"&gt;(&lt;/span&gt;ticker&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
 const apiKey &lt;span class="o"&gt;=&lt;/span&gt; getApiKey&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;apiKey&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"API Key not found. Please run storeApiKey() first."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;return &lt;/span&gt;null&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
  const url &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;https://financialmodelingprep.com/api/v3/quote-short/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ticker&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;?apikey&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;apiKey&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  try &lt;span class="o"&gt;{&lt;/span&gt;
   const response &lt;span class="o"&gt;=&lt;/span&gt; UrlFetchApp.fetch&lt;span class="o"&gt;(&lt;/span&gt;url, &lt;span class="o"&gt;{&lt;/span&gt; muteHttpExceptions: &lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="o"&gt;})&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   const responseCode &lt;span class="o"&gt;=&lt;/span&gt; response.getResponseCode&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   const content &lt;span class="o"&gt;=&lt;/span&gt; response.getContentText&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;responseCode &lt;span class="o"&gt;===&lt;/span&gt; 200&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
     const data &lt;span class="o"&gt;=&lt;/span&gt; JSON.parse&lt;span class="o"&gt;(&lt;/span&gt;content&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;data &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; data.length &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 0&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; price: data[0].price, volume: data[0].volume &lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="o"&gt;}&lt;/span&gt;
   &lt;span class="o"&gt;}&lt;/span&gt;
   Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;API Error: Response code &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;responseCode&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; Content: &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;content&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;return &lt;/span&gt;null&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt; catch &lt;span class="o"&gt;(&lt;/span&gt;e&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   Logger.log&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;Failed to fetch stock price: &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;e&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;return &lt;/span&gt;null&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
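The quote-short endpoint returns a one-element JSON array, so the happy-path parsing inside getStockPrice can be exercised with a canned payload. A plain-JavaScript sketch (the sample figures below are invented, not real market data):

```javascript
// Extracts { price, volume } from a quote-short style JSON payload,
// mirroring the 200-response branch of getStockPrice. Sample data is invented.
function parseQuoteShort(content) {
  const data = JSON.parse(content);
  if (!Array.isArray(data) || data.length === 0) {
    return null; // same fall-through as the API error path
  }
  return { price: data[0].price, volume: data[0].volume };
}

const sample = '[{"symbol":"AAPL","price":175.5,"volume":12345678}]';
console.log(parseQuoteShort(sample)); // { price: 175.5, volume: 12345678 }
console.log(parseQuoteShort('[]'));   // null
```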



&lt;p&gt;Dialog.html (Custom UI)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&amp;lt;&lt;span class="o"&gt;!&lt;/span&gt;DOCTYPE html&amp;gt;
&amp;lt;html&amp;gt;
 &amp;lt;&lt;span class="nb"&gt;head&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
   &amp;lt;base &lt;span class="nv"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"_top"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
   &amp;lt;&lt;span class="nb"&gt;link &lt;/span&gt;&lt;span class="nv"&gt;rel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"stylesheet"&lt;/span&gt; &lt;span class="nv"&gt;href&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
   &amp;lt;style&amp;gt;
     body &lt;span class="o"&gt;{&lt;/span&gt; padding: 20px&lt;span class="p"&gt;;&lt;/span&gt; font-family: sans-serif&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
     .loader &lt;span class="o"&gt;{&lt;/span&gt;
       border: 4px solid &lt;span class="c"&gt;#f3f3f3;&lt;/span&gt;
       border-radius: 50%&lt;span class="p"&gt;;&lt;/span&gt;
       border-top: 4px solid &lt;span class="c"&gt;#3498db;&lt;/span&gt;
       width: 20px&lt;span class="p"&gt;;&lt;/span&gt;
       height: 20px&lt;span class="p"&gt;;&lt;/span&gt;
       animation: spin 2s linear infinite&lt;span class="p"&gt;;&lt;/span&gt;
       display: inline-block&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="o"&gt;}&lt;/span&gt;
     @keyframes spin &lt;span class="o"&gt;{&lt;/span&gt;
       0% &lt;span class="o"&gt;{&lt;/span&gt; transform: rotate&lt;span class="o"&gt;(&lt;/span&gt;0deg&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
       100% &lt;span class="o"&gt;{&lt;/span&gt; transform: rotate&lt;span class="o"&gt;(&lt;/span&gt;360deg&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
     &lt;span class="o"&gt;}&lt;/span&gt;
     &lt;span class="c"&gt;#status { margin-top: 15px; font-weight: bold; }&lt;/span&gt;
   &amp;lt;/style&amp;gt;
 &amp;lt;/head&amp;gt;
 &amp;lt;body&amp;gt;
   &amp;lt;h4&amp;gt;Place Trade Order&amp;lt;/h4&amp;gt;
   &amp;lt;p&amp;gt;Place a buy or sell order &lt;span class="k"&gt;for&lt;/span&gt; &amp;lt;strong&amp;gt;AAPL&amp;lt;/strong&amp;gt;.&amp;lt;/p&amp;gt;

   &amp;lt;form &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"tradeForm"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
     &amp;lt;div &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"form-group"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
       &amp;lt;label &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"tradeDirection"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;Action&amp;lt;/label&amp;gt;
       &amp;lt;&lt;span class="k"&gt;select &lt;/span&gt;&lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"form-control"&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"tradeDirection"&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"tradeDirection"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
         &amp;lt;option &lt;span class="nv"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Buy"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;Buy&amp;lt;/option&amp;gt;
         &amp;lt;option &lt;span class="nv"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Sell"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;Sell&amp;lt;/option&amp;gt;
       &amp;lt;/select&amp;gt;
     &amp;lt;/div&amp;gt;


     &amp;lt;div &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"form-group"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
       &amp;lt;label &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"quantity"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;Quantity &lt;span class="o"&gt;(&lt;/span&gt;Number of Shares&lt;span class="o"&gt;)&lt;/span&gt;&amp;lt;/label&amp;gt;
       &amp;lt;input &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt; &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"form-control"&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"quantity"&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"quantity"&lt;/span&gt; required&amp;gt;
     &amp;lt;/div&amp;gt;


     &amp;lt;div &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"form-group"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
       &amp;lt;label &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"price"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;Price &lt;span class="o"&gt;(&lt;/span&gt;USD&lt;span class="o"&gt;)&lt;/span&gt;&amp;lt;/label&amp;gt;
       &amp;lt;div &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"input-group"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
         &amp;lt;input &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt; &lt;span class="nv"&gt;step&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"0.01"&lt;/span&gt; &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"form-control"&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"price"&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"price"&lt;/span&gt; required&amp;gt;
         &amp;lt;div &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"input-group-append"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
           &amp;lt;button &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"btn btn-outline-secondary"&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"button"&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"fetchPriceBtn"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;Get Live Price&amp;lt;/button&amp;gt;
         &amp;lt;/div&amp;gt;
       &amp;lt;/div&amp;gt;
       &amp;lt;small &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"priceLoader"&lt;/span&gt; &lt;span class="nv"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"display:none;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;Fetching... &amp;lt;div &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"loader"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;lt;/div&amp;gt;&amp;lt;/small&amp;gt;
     &amp;lt;/div&amp;gt;


     &amp;lt;button &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"submit"&lt;/span&gt; &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"btn btn-primary btn-block"&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"submitBtn"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;Place Order&amp;lt;/button&amp;gt;
   &amp;lt;/form&amp;gt;


   &amp;lt;div &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt; &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"alert"&lt;/span&gt; &lt;span class="nv"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"alert"&lt;/span&gt; &lt;span class="nv"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"display:none;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;lt;/div&amp;gt;


   &amp;lt;script&amp;gt;
     // Fetch live price when button is clicked
     document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"fetchPriceBtn"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.addEventListener&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"click"&lt;/span&gt;, &lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"priceLoader"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.style.display &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"block"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       google.script.run
         .withSuccessHandler&lt;span class="o"&gt;(&lt;/span&gt;priceData &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
           document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"priceLoader"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.style.display &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"none"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
           &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;priceData&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
             document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"price"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.value &lt;span class="o"&gt;=&lt;/span&gt; priceData.price.toFixed&lt;span class="o"&gt;(&lt;/span&gt;2&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
           &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
             alert&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Could not fetch live price."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
           &lt;span class="o"&gt;}&lt;/span&gt;
         &lt;span class="o"&gt;})&lt;/span&gt;
         .getLiveStockPrice&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="o"&gt;})&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


     // Handle form submission
     document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tradeForm"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.addEventListener&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"submit"&lt;/span&gt;, &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;e&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       e.preventDefault&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"submitBtn"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.disabled &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.style.display &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"block"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.className &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alert alert-info"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
       document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.innerText &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Placing order..."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


       google.script.run
         .withSuccessHandler&lt;span class="o"&gt;(&lt;/span&gt;response &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
           document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.innerText &lt;span class="o"&gt;=&lt;/span&gt; response&lt;span class="p"&gt;;&lt;/span&gt;
           &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;response.includes&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Success"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
             document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.className &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alert alert-success"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
             setTimeout&lt;span class="o"&gt;(&lt;/span&gt;google.script.host.close, 3000&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; // Close dialog on success
           &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
             document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.className &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alert alert-danger"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
             document.getElementById&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"submitBtn"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.disabled &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
           &lt;span class="o"&gt;}&lt;/span&gt;
         &lt;span class="o"&gt;})&lt;/span&gt;
         .placeTradeOrder&lt;span class="o"&gt;(&lt;/span&gt;this&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="o"&gt;})&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &amp;lt;/script&amp;gt;
 &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: How to Set It All Up
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Update Config.gs&lt;/strong&gt;: Fill in all your personal details in the Config.gs file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store Your API Key&lt;/strong&gt;: In Secrets.gs, paste your API key from Financial Modeling Prep. Then, from the script editor, select the storeApiKey function from the dropdown menu and click Run. You only need to do this once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set Up Triggers&lt;/strong&gt;:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;In the script editor, go to the Triggers tab (clock icon).&lt;/li&gt;
&lt;li&gt;Click + Add Trigger.&lt;/li&gt;
&lt;li&gt;Choose function to run: processAllEmails.&lt;/li&gt;
&lt;li&gt;Select event source: Time-driven.&lt;/li&gt;
&lt;li&gt;Select type: Minutes timer.&lt;/li&gt;
&lt;li&gt;Select interval: Every 10 minutes.&lt;/li&gt;
&lt;li&gt;Click Save. You will be asked to authorize the script.&lt;/li&gt;
&lt;/ol&gt;
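&lt;p&gt;If you prefer code over clicking through the UI, the same trigger can be created programmatically. This sketch is not part of the project files; ScriptApp is a built-in Apps Script service, so the function only runs inside the Apps Script editor:&lt;/p&gt;

```javascript
// One-off helper: creates the same time-driven trigger as the UI steps.
// Run it once from the Apps Script editor, then delete or ignore it.
function createEmailTrigger() {
  ScriptApp.newTrigger('processAllEmails') // function to run
    .timeBased()                           // event source: time-driven
    .everyMinutes(10)                      // every 10 minutes
    .create();
}
```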

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prepare Your Trading Sheet&lt;/strong&gt;: In the Trading tab of your spreadsheet, set up two columns:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Column A: Month (e.g., "August 2025", "September 2025")&lt;/li&gt;
&lt;li&gt;Column B: Payslip Received (Format this column as checkboxes via Insert &amp;gt; Checkbox)&lt;/li&gt;
&lt;/ol&gt;
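&lt;p&gt;For context on how the script might use these two columns, here is a hedged sketch (the helper name is illustrative, not from the project) of locating the current month's row in column A:&lt;/p&gt;

```javascript
// Illustrative helper: given column A's values (e.g. read from the
// Trading sheet and flattened) and a label like "August 2025",
// return the 1-based row whose checkbox to tick, or -1 if absent.
function findMonthRow(columnAValues, monthLabel) {
  var row = -1;
  columnAValues.forEach(function (value, i) {
    if (row === -1 && String(value).trim() === monthLabel) {
      row = i + 1; // Sheets rows are 1-based
    }
  });
  return row;
}
```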

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prepare Your Transactions Sheet&lt;/strong&gt;: In the Transactions tab, create these headers: Date, Ticker, Type, Quantity, Price, Status&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reload the Spreadsheet&lt;/strong&gt;: Refresh your Google Sheet. You should now see a new "Stock Trading" menu at the top.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 4: Testing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Scenario 1: The Automated Payslip Workflow&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This demonstrates the script's ability to react to incoming emails, save attachments, perform API calls, and update you.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;How to Test / Simulate It:&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The script is looking for a new, unread email that matches the criteria in your Config.gs file. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you receive your payslip in Outlook like I do and are wondering how to set this up, create an Outlook rule that always forwards the payslip email to your personal Gmail account.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To test this, you need to simulate receiving a payslip:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Important&lt;/strong&gt;: For testing, temporarily change the PAYSLIP_SENDER in your Config.gs file to your own email address (e.g., const PAYSLIP_SENDER = "&lt;a href="mailto:your_email@example.com"&gt;your_email@example.com&lt;/a&gt;";).&lt;/li&gt;
&lt;li&gt;From that same email address, send a new email to yourself.&lt;/li&gt;
&lt;li&gt;Subject Line: The subject must contain the phrase "Your Monthly Payslip".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attachment&lt;/strong&gt;: Attach any PDF file to the email.&lt;/li&gt;
&lt;li&gt;Send the email. Once it arrives in your inbox, make sure it is marked as unread.&lt;/li&gt;
&lt;/ul&gt;
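&lt;p&gt;Under the hood, a match like this is typically expressed as a Gmail search string. The helper below is a hypothetical sketch of what the script's criteria could look like as a query (GmailApp.search accepts Gmail's standard query syntax):&lt;/p&gt;

```javascript
// Hypothetical: turn the Config.gs criteria into a Gmail search query.
function buildPayslipQuery(sender, subjectPhrase) {
  return 'from:' + sender +
    ' subject:"' + subjectPhrase + '"' +
    ' is:unread has:attachment';
}
```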

&lt;h4&gt;
  
  
  &lt;strong&gt;What the Script Does (The Demo):&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Once you've sent the email, you can either wait for the 10-minute trigger to fire or manually run the processAllEmails function from the script editor to see the results immediately.&lt;/li&gt;
&lt;li&gt;The script will find your unread "payslip" email.&lt;/li&gt;
&lt;li&gt;It will save the PDF attachment to the Google Drive folder you specified.&lt;/li&gt;
&lt;li&gt;It will make an API call to get the latest price for AAPL.&lt;/li&gt;
&lt;li&gt;It will calculate how many shares you can buy with 20% of your $10,000 salary.&lt;/li&gt;
&lt;li&gt;It will find the current month in your "Trading" sheet and check the box in the "Payslip Received" column.&lt;/li&gt;
&lt;li&gt;Finally, it will mark the payslip email as read.&lt;/li&gt;
&lt;/ul&gt;
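&lt;p&gt;The share calculation in the list above boils down to simple arithmetic; a minimal sketch, assuming the $10,000 salary and 20% rate from the walkthrough:&lt;/p&gt;

```javascript
// How many whole shares 20% of the salary can buy at the live price.
function sharesAffordable(salary, investRate, sharePrice) {
  var budget = salary * investRate;       // 10000 * 0.2 = 2000
  return Math.floor(budget / sharePrice); // whole shares only
}
```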

&lt;h4&gt;
  
  
  &lt;strong&gt;What You Receive / See:&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;You'll get a new email with the subject "✅ Your Payslip Has Been Processed" containing the stock price information.&lt;/li&gt;
&lt;li&gt;The PDF attachment will appear in your designated Google Drive folder.&lt;/li&gt;
&lt;li&gt;The checkbox for the current month in your "Trading" sheet will be checked.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28twd9rihemwmv6tmyjm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28twd9rihemwmv6tmyjm.png" alt="Payslip saving email" width="649" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Scenario 2: The Manual Stock Trading Workflow&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This demonstrates the custom user interface you built into the spreadsheet for placing trades.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;How to Test / Simulate It:&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;This workflow is initiated manually by you.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open your "My Stock Portfolio" Google Sheet.&lt;/li&gt;
&lt;li&gt;A new menu item named "Stock Trading" should appear at the top.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzahdh61sa7po3scytj67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzahdh61sa7po3scytj67.png" alt="Google sheet menu with the custom menu item Stock trading" width="777" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on Stock Trading &amp;gt; 📈 Place New Trade Order.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xbamqvq03xu1oczkmic.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xbamqvq03xu1oczkmic.png" alt="Stock Trading modal in google sheet" width="472" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;What the Script Does (The Demo):&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;A custom dialog box titled "Place a Stock Trade Order" will appear.&lt;/li&gt;
&lt;li&gt;You can click the "Get Live Price" button to have the script fetch and populate the current AAPL stock price.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fill out the form&lt;/strong&gt;: choose Buy or Sell, and enter a Quantity.&lt;/li&gt;
&lt;li&gt;Click the "Place Order" button.&lt;/li&gt;
&lt;li&gt;The script will compose an email with all the trade details and send it to your broker's email address.&lt;/li&gt;
&lt;li&gt;It will BCC you on that email.&lt;/li&gt;
&lt;li&gt;It will add a new row to your "Transactions" sheet to log that the order has been placed.&lt;/li&gt;
&lt;/ul&gt;
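&lt;p&gt;The order email itself can be as simple as a one-line instruction assembled from the form fields. A hedged sketch (the exact wording the script sends may differ):&lt;/p&gt;

```javascript
// Illustrative: compose the broker instruction from the dialog's fields.
function buildOrderEmail(direction, quantity, price, ticker) {
  return 'Please ' + direction.toLowerCase() + ' ' + quantity +
    ' shares of ' + ticker + ' at USD ' + price.toFixed(2) + '.';
}
```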

&lt;h4&gt;
  
  
  &lt;strong&gt;What You Receive / See:&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;A confirmation message will appear in the dialog box.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0mivukdylm62crwe4dv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0mivukdylm62crwe4dv.png" alt="Stock Trading modal in google sheet showing order has been placed" width="472" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You will receive a BCC'd copy of the order email in your inbox.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw8j5cyu8p4cs5nvlmgx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw8j5cyu8p4cs5nvlmgx.png" alt="Email placing the buy order" width="738" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A new row will be added to the "Transactions" sheet.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqu9ojhnutwqd97phgth.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqu9ojhnutwqd97phgth.png" alt="Google sheet with updated details on the buy order" width="777" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Scenario 3: The Automated Contract Note Workflow&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This demonstrates how the script handles incoming trade confirmations.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;How to Test / Simulate It:&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Similar to the payslip, you need to simulate receiving a contract note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Change the CONTRACT_NOTE_SENDER in your Config.gs file to your own email address for the test.&lt;/li&gt;
&lt;li&gt;Send a new email to yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subject Line&lt;/strong&gt;: The subject must contain the phrase "Contract Note".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attachment&lt;/strong&gt;: Attach any PDF file.&lt;/li&gt;
&lt;li&gt;Send the email and ensure it's unread.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;What the Script Does (The Demo):&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;When the processAllEmails function runs, it will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find your unread "Contract Note" email.&lt;/li&gt;
&lt;li&gt;Save the PDF attachment to your Google Drive folder.&lt;/li&gt;
&lt;li&gt;Because the script cannot read the PDF's contents, it will send you a notification email.&lt;/li&gt;
&lt;li&gt;It will mark the contract note email as read.&lt;/li&gt;
&lt;/ul&gt;
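&lt;p&gt;Since one processAllEmails pass handles both payslips and contract notes, routing by subject line is the natural first step. A simplified sketch (the return labels are illustrative):&lt;/p&gt;

```javascript
// Route a thread by its subject line before any other processing.
function classifyEmail(subject) {
  if (subject.indexOf('Your Monthly Payslip') !== -1) { return 'payslip'; }
  if (subject.indexOf('Contract Note') !== -1) { return 'contract-note'; }
  return 'ignore';
}
```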

&lt;h4&gt;
  
  
  &lt;strong&gt;What You Receive / See:&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;You'll get a new email with the subject "📝 Action Required: Log Your Recent Trade", prompting you to manually update your "Transactions" sheet with the final details from the PDF.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7063hytwqihphg7ef849.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7063hytwqihphg7ef849.png" alt="Contract note saving confirmation email" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The contract note PDF will be saved in your Google Drive folder.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 5: GitHub Integration with clasp&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Feel free to skip this step if you are not technical.&lt;/em&gt;&lt;br&gt;
clasp is a command-line tool that lets you manage your Apps Script projects locally, so you can version-control them with Git and push them to GitHub.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install Node.js&lt;/strong&gt;: If you don't have it, install Node.js from nodejs.org.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install clasp&lt;/strong&gt;: Open your terminal or command prompt and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @google/clasp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Login to Google:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;clasp login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will open a browser window for you to authorize clasp.&lt;br&gt;
&lt;strong&gt;Enable the Apps Script API&lt;/strong&gt;: Go to the Apps Script API settings page and turn it on.&lt;br&gt;
&lt;strong&gt;Clone Your Project:&lt;/strong&gt;&lt;br&gt;
In your Apps Script editor, go to Project Settings (gear icon) and copy the Script ID.&lt;br&gt;
In your terminal, navigate to your desired folder (e.g., cd Documents/GitHub) and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;clasp clone &lt;span class="s2"&gt;"YOUR_SCRIPT_ID_HERE"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will download all your .gs and .html files into a new folder.&lt;br&gt;
&lt;strong&gt;Work with GitHub&lt;/strong&gt;: You can now treat this folder as a standard Git repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project-name
git init
git add &lt;span class="nb"&gt;.&lt;/span&gt;
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Initial commit of finance automation script"&lt;/span&gt;
&lt;span class="c"&gt;# Add your remote and push to GitHub&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pushing Changes Back to Apps Script&lt;/strong&gt;: After making changes locally, just run clasp push.&lt;br&gt;
This setup provides a powerful, automated workflow for managing your finances, all orchestrated from within your Google Workspace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6 (Bonus): Visualization using Looker Studio
&lt;/h2&gt;

&lt;p&gt;You can also take your stock tracking to the next level by building a portfolio dashboard using Looker Studio, with the “My Stock Portfolio” Google Sheet as the data source. This dashboard can display key metrics such as total value bought and sold over time, monthly performance, and even stock-specific trends. By connecting your sheet directly to Looker Studio and visualizing your data through bar charts, line graphs, or scorecards, you gain a real-time, interactive view of your portfolio’s evolution. It’s a great way to stay informed and make data-driven investment decisions.&lt;/p&gt;

&lt;p&gt;Apps Script, however, cannot automate the creation of these charts, so you will need to add, format, and align them manually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfi9l3papn6r45knky5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfi9l3papn6r45knky5v.png" alt="Investment portfolio dashboard on looker studio" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;And there you have it! A simple, project-based introduction to Google Apps Script.&lt;/p&gt;

&lt;p&gt;We’ve covered how to set up triggers, interact with Gmail, parse attachments, and store secrets securely, while also touching on important limitations. The biggest takeaway? You don’t need external tools to start automating tasks right inside your Google Workspace; Apps Script gives you a surprisingly powerful head start.&lt;br&gt;
I’m curious what you’ll choose to automate first. Let me know in the comments. Also, let me know whether I should deploy the automated stock-ordering custom menu as a Google Sheets add-on. It’s definitely a time-saver for me.&lt;/p&gt;

&lt;p&gt;I will get to n8n and Zapier in due time but for now, Google Apps Script serves me well. Till next time, have fun.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>googleworkspace</category>
      <category>googleappsscript</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Frisque – Using AI agents for Due Diligence</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Mon, 23 Jun 2025 17:40:20 +0000</pubDate>
      <link>https://dev.to/virgoalpha/frisque-using-ai-agents-for-due-diligence-4old</link>
      <guid>https://dev.to/virgoalpha/frisque-using-ai-agents-for-due-diligence-4old</guid>
      <description>&lt;h2&gt;
  
  
  TLDR;
&lt;/h2&gt;

&lt;p&gt;Frisque uses Django, AI agents, Celery, and RabbitMQ to automate due diligence on startups. It takes text, pitch decks, financial spreadsheets, and even video as input and outputs &lt;a href="https://drive.google.com/file/d/1jO7UjrZn8zDurIlp8sPRDpUQgXQkG3mu/view?usp=sharing" rel="noopener noreferrer"&gt;an investment memo&lt;/a&gt;. It was built for the Agent Development Kit Hackathon with Google Cloud on DevPost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;The Problem&lt;/li&gt;
&lt;li&gt;Our Solution&lt;/li&gt;
&lt;li&gt;Architecture and Stack&lt;/li&gt;
&lt;li&gt;Agentic AI – Orchestration in Action&lt;/li&gt;
&lt;li&gt;Challenges and Revelations&lt;/li&gt;
&lt;li&gt;Key Learnings&lt;/li&gt;
&lt;li&gt;Future Plans&lt;/li&gt;
&lt;li&gt;DIY&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the dynamic world of Venture Capital (VC), conducting due diligence on potential startup investments is a critical yet cumbersome bottleneck. This process became painfully familiar through firsthand experience interning in VC, revealing the sheer volume of information, meticulous cross-referencing, and relentless pressure to identify both opportunity and risk within tight timeframes. Frisque was born directly from this experience, aiming to automate and augment these very tasks and challenges.&lt;/p&gt;

&lt;p&gt;VC firms inherently dedicate significant time and resources to due diligence, as it is crucial for assessing a startup's viability and growth potential. Given that a fund's returns often originate from a small percentage of its investments, streamlining this process is absolutely crucial for success. Frisque aims to go beyond mere efficiency, enabling VCs to make smarter, faster, and more informed investment decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Having both spent time interning in the dynamic world of Venture Capital, we quickly became painfully familiar with a critical, yet cumbersome, bottleneck: &lt;em&gt;due diligence&lt;/em&gt;. The sheer volume of information, the meticulous cross-referencing, and the relentless pressure to identify both opportunity and risk within a tight timeframe becomes incredibly apparent in such roles. Frisque was born out of this firsthand experience, directly addressing the very tasks and challenges we faced, aiming to automate and augment the work we were doing.&lt;/p&gt;

&lt;p&gt;Venture Capital (VC) firms dedicate a significant amount of time and resources to conducting due diligence on potential startup investments. This process is critical for assessing a startup's viability and growth potential, but it is incredibly time-intensive. Given that a fund's returns often come from a small percentage of its investments (pareto principle), streamlining this process is absolutely crucial for success. It's about more than just efficiency; it's about making smarter, faster, and more informed investment decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Our Solution
&lt;/h2&gt;

&lt;p&gt;Frisque, aptly named from the French "Faux Risque" (false risk), is an AI-powered platform built to revolutionize the VC due diligence process by significantly reducing the time and effort VCs spend on initial assessments while dramatically improving the depth and breadth of insights.&lt;br&gt;
At its core, Frisque is a web-based platform built on Django, uniquely leveraging Google's open-source Agent Development Kit (ADK) to create a sophisticated multi-agent AI system. This approach means Frisque isn't just one large AI, but a coordinated team of specialized AI "agents" working together, mirroring a human due diligence team.&lt;br&gt;
Here's how Frisque's agentic system streamlines the process:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Comprehensive Input Collection&lt;/em&gt;: Analysts can initiate "scans" on target companies. They provide a wide array of inputs, including company names, website URLs, business plans, pitch decks, lean canvases, founder profiles, social media links, and even financial documents (like spreadsheets) and government registration documents. The system also allows users to select which specific types of scans they want to perform, such as Tech, Legal, or Financial analysis.&lt;br&gt;
&lt;em&gt;Intelligent Agent Orchestration&lt;/em&gt;: Once a scan is initiated, a Master Bot, or Orchestrator Agent, takes charge. It intelligently delegates specific sub-tasks to a team of specialized worker agents. This multi-agent by design approach is a core strength of Google's ADK, enabling complex coordination and task delegation.&lt;/p&gt;

&lt;p&gt;Specialized Agents in Action:&lt;br&gt;
The Tech Bot assesses a startup's technology stack, scalability, and, potentially, its intellectual property.&lt;br&gt;
The Legal Bot sifts through provided legal documents to identify basic red flags or critical phrases in contracts and registrations.&lt;br&gt;
The Market Research Bot gathers crucial data on market size, industry trends, and competitor landscapes.&lt;br&gt;
The Social Media Sentiment Bot analyzes public sentiment around the company and its founders from various social media profiles.&lt;br&gt;
The Financial Bot performs basic analysis of financial statements, capable of detecting anomalies or inconsistencies in the data.&lt;/p&gt;

&lt;p&gt;These agents utilize Large Language Models (LLMs), Natural Language Processing (NLP) tools, and can integrate with external APIs or custom tools as needed. A key learning was that ADK's inherent agency allows bots to choose their own tools, meaning we didn't need to explicitly direct them, which streamlined our development. We also adopted a "Pipeline / Assembly line architecture" or "Dumb Worker, Smart Master" pattern, where complex logic is handled by the master agent and a dedicated worker agent formats responses, effectively solving issues like prompt leakage and hallucination we initially encountered. This approach reinforces the benefits of a microservices design over a monolithic one for scalability and isolation.&lt;/p&gt;
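&lt;p&gt;The "Dumb Worker, Smart Master" pattern above can be sketched in plain Python. This is a framework-agnostic illustration, not Frisque's actual code: the worker bots, the field list, and the company name are all invented for the example.&lt;/p&gt;

```python
# A minimal sketch of the "Dumb Worker, Smart Master" pattern.
# Worker names and fields are illustrative, not Frisque's real code.

REQUIRED_FIELDS = ("summary", "score", "sources")

def tech_bot(company):
    # Worker: does one narrow job and returns raw findings only.
    return {"summary": f"{company}: stack looks scalable", "score": 7,
            "sources": ["site"]}

def legal_bot(company):
    return {"summary": f"{company}: no red flags found", "score": 8,
            "sources": ["registry"]}

def formatter_agent(raw_results):
    # Dedicated formatting worker: enforces the output schema so the
    # master never mixes delegation logic with presentation logic.
    report = {}
    for name, result in raw_results.items():
        missing = [f for f in REQUIRED_FIELDS if f not in result]
        if missing:
            raise ValueError(f"{name} is missing fields: {missing}")
        report[name] = {f: result[f] for f in REQUIRED_FIELDS}
    return report

def master_agent(company, workers):
    # Master: delegates, collects, then hands off to the formatter.
    raw = {name: worker(company) for name, worker in workers.items()}
    return formatter_agent(raw)

memo = master_agent("Acme AI", {"tech": tech_bot, "legal": legal_bot})
```

&lt;p&gt;Because the formatter is the only agent that touches presentation, a schema change is a one-place edit, and a worker that drops a field fails loudly instead of leaking a malformed memo downstream.&lt;/p&gt;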

&lt;p&gt;&lt;em&gt;Comprehensive Output Generation&lt;/em&gt;: The system synthesizes the findings from all the specialized agents into a comprehensive and actionable suite of outputs. This includes a structured Investment Memo (a go/no-go document), a summary Dashboard with key findings and scores, and if financial data is provided, basic financial projections. It also provides a valuable list of assumptions made, key questions to ask the startup, and all cited sources.&lt;br&gt;
&lt;em&gt;Real-time Updates and Notifications&lt;/em&gt;: To keep analysts informed throughout the process, Frisque provides real-time updates on scan progress directly on the results page using Django Channels (WebSockets). Users also receive both in-app notifications and email notifications once a scan is complete.&lt;/p&gt;

&lt;p&gt;By leveraging Google's ADK and a modern stack including Django, PostgreSQL, Celery, RabbitMQ, and Google Cloud services like Google Cloud Storage and Vertex AI, Frisque is designed to be modular, scalable, and deployment-ready. This project is also a contribution to the Agent Development Kit Hackathon with Google Cloud, highlighting our use of Google Cloud technologies and the open-source ADK.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture and Stack
&lt;/h2&gt;

&lt;p&gt;Frisque's architecture and technology stack are designed for modularity, scalability, and efficient AI-powered due diligence. It aims to support asynchronous workloads and intelligent processing.&lt;br&gt;
Here's a breakdown of the key components:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw21uqhepoqp4bb7506vy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw21uqhepoqp4bb7506vy.jpg" alt="Our Architecture diagram" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Backend Framework (Django)&lt;/em&gt;: Frisque is a web-based platform built on Django. Django provides a self-contained framework for the application's backend. Its ORM (Object-Relational Mapper) simplifies database interactions by managing models for users, companies, and scan jobs. This structure allows for quick integration with other technologies, such as Docker, for consistent development and deployment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AI Agents (Google Agent Development Kit - ADK)&lt;/em&gt;: The platform leverages Google's Agent Development Kit (ADK) to create its multi-agent AI system. ADK is an open-source, code-first framework designed for building and deploying sophisticated AI agents. It is "Multi-Agent by Design," enabling complex coordination and delegation of tasks within a team of agents. The Agent Starter Pack provides an easier way to quickly set up, customize, and deploy agents. This approach supports modular and scalable development, breaking down intricate problems into manageable sub-tasks handled by specialized agents.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Asynchronous Task Queue (Celery) and Message Broker (RabbitMQ)&lt;/em&gt;: Frisque uses Celery for background task processing. When a scan is initiated, a Celery task is dispatched to handle it asynchronously. This allows for scheduling and managing complex, time-consuming operations outside of the main web request flow. RabbitMQ (or Redis) serves as the message broker for Celery, facilitating communication between the application and the worker processes.&lt;/p&gt;
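&lt;p&gt;Celery's dispatch model can be mimicked with the standard library alone. The sketch below, with invented names like run_scan, shows the essential handoff: the caller enqueues a job and returns immediately, while a background worker drains the queue.&lt;/p&gt;

```python
# Stdlib-only sketch of the Celery-style handoff; run_scan is invented.
import queue
import threading

task_queue = queue.Queue()
results = {}

def run_scan(scan_id):
    # Stand-in for the long-running due diligence scan.
    results[scan_id] = "complete"

def worker():
    while True:
        scan_id = task_queue.get()
        if scan_id is None:      # sentinel: shut the worker down
            break
        run_scan(scan_id)
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# The web request enqueues and returns immediately, like .delay() in Celery.
task_queue.put("scan-42")
task_queue.put(None)
t.join()
```

&lt;p&gt;In the real system, RabbitMQ plays the role of the in-process queue, which is what lets the workers live in separate containers or machines.&lt;/p&gt;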

&lt;p&gt;&lt;em&gt;Containerization (Docker and Docker Compose)&lt;/em&gt;: Docker is used for containerization, ensuring that the application and all its dependencies are packaged into isolated units. Docker Compose simplifies the management of multi-container Docker applications for local development. This setup provides reproducibility across different environments, making it easy to get the development environment up and running consistently. All development commands are designed to be run inside the web container for a consistent workflow.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Database (PostgreSQL)&lt;/em&gt;: PostgreSQL is the chosen database for storing structured data. This includes details of target companies and scan job metadata.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Object Storage (Google Cloud Storage - GCS)&lt;/em&gt;: Google Cloud Storage (GCS) is integrated for storing unstructured data. This includes uploaded documents like pitch decks and financial spreadsheets, as well as generated reports and memos.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Real-time Communication (Django Channels)&lt;/em&gt;: Django Channels, utilizing WebSockets, enables real-time updates and notifications. This allows the scan results page to display live progress updates and provides in-app notifications upon scan completion.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Infrastructure as Code (Terraform)&lt;/em&gt;: Terraform is used for provisioning Google Cloud Platform (GCP) resources. This ensures that the cloud infrastructure is managed consistently and repeatably.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cloud Platform (Google Cloud Platform - GCP)&lt;/em&gt;: The entire system is designed to leverage Google Cloud Platform services for deployment and scalability. This includes potential use of Vertex AI for Agent Engine and LLM hosting, Cloud Run for serverless agent deployment, and Cloud SQL for managed PostgreSQL. Frisque is also a contribution to the Agent Development Kit Hackathon with Google Cloud.&lt;/p&gt;

&lt;p&gt;This comprehensive stack allows Frisque to efficiently process complex due diligence tasks, manage data, and provide real-time insights to users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic AI – Orchestration in action
&lt;/h2&gt;

&lt;p&gt;Frisque's power lies in its agentic AI system, meticulously designed to replicate and enhance the collaborative nature of a human due diligence team. This sophisticated structure is made possible by leveraging Google's open-source Agent Development Kit (ADK), a framework built to develop, evaluate, and deploy sophisticated AI agents and multi-agent systems. ADK is inherently "Multi-Agent by Design," which means it excels at enabling complex coordination and delegation of tasks within a hierarchy or team of agents.&lt;br&gt;
When an analyst initiates a "scan" on a target company within Frisque, a comprehensive process of intelligent orchestration begins.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Orchestrator Agent (Master Bot)&lt;/em&gt;: At the core of this system is a Master Bot, acting as the Orchestrator Agent. Its primary role is to receive the initial scan request and intelligently delegate specific sub-tasks to a team of specialized worker agents. This delegation is crucial for breaking down intricate due diligence problems into manageable parts.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Specialized Agents in a Pipeline&lt;/em&gt;: Frisque employs a diverse set of specialized worker agents, each with a distinct focus. These include the Tech Bot, Legal Bot, Market Research Bot, Social Media Sentiment Bot, and Financial Bot. A key learning during development was the adoption of a "Pipeline / Assembly line architecture" or "Dumb Worker, Smart Master" pattern. In this architecture, the complex logic and coordination are handled by the master agent, while a dedicated worker agent is specifically responsible for formatting the responses. This separation of concerns proved vital in solving initial challenges like prompt leakage and hallucination, reinforcing the benefits of a microservices design for scalability and isolation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Intelligent Tool Selection and Inquiry&lt;/em&gt;: A significant aspect of the agents' intelligence lies in their inherent agency. The ADK's design allows bots to choose their own tools without explicit direction from the developer. This means agents can intelligently decide which resources to use for their tasks, whether it's utilizing Large Language Models (LLMs), Natural Language Processing (NLP) tools, integrating with external APIs, or even using other agents as tools. This self-directed tool selection, and the ability to inquire further after obtaining initial results, streamlines the development process and enhances the depth of research. For instance, the Market Research Bot might autonomously decide to use web search tools to gather market size data or a sentiment API to analyze social media.&lt;/p&gt;
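&lt;p&gt;The idea of self-directed tool selection can be shown with a toy registry. Everything here, the tool names and the matching rule, is invented for illustration; in ADK the LLM itself chooses among the tools it is given.&lt;/p&gt;

```python
# Toy illustration of tool choice from a registry; tool names are invented.
def web_search(q):
    return f"search results for {q!r}"

def sentiment_api(q):
    return f"sentiment for {q!r}: positive"

TOOLS = {"market size": web_search, "public opinion": sentiment_api}

def research_agent(task):
    # Pick the first tool whose registered topic appears in the task.
    for topic, tool in TOOLS.items():
        if topic in task:
            return tool(task)
    return "no suitable tool found"

answer = research_agent("estimate market size for fintech in Kenya")
```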

&lt;p&gt;&lt;em&gt;Synthesis and Output&lt;/em&gt;: As each specialized agent completes its analysis, it gathers, processes, and analyzes information based on its function and the provided inputs. The Orchestrator then synthesizes these findings from all the specialized agents into comprehensive and actionable outputs. This culminates in a structured Investment Memo, a summary Dashboard with key findings and scores, and potentially basic financial projections. The system also provides a valuable list of assumptions made, key questions to ask the startup, and all cited sources.&lt;/p&gt;

&lt;p&gt;This orchestrative, multi-agent approach allows Frisque to efficiently process complex due diligence tasks, manage vast amounts of data, and provide real-time, insightful analyses to VC firms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges and Revelations
&lt;/h2&gt;

&lt;p&gt;One of the first and most critical challenges was agent hallucination. Agents were generating incorrect or fabricated information. Closely related was prompt leakage. This occurred due to difficulties in system integration. Initially, the master agent was responsible for both task delegation and response formatting. This design inadvertently led to the agents' tendency to hallucinate and expose prompts in unintended ways.&lt;/p&gt;

&lt;p&gt;We fixed this by creating a new agent to format the final output before it is returned by the master agent. Even then, we had to be explicit about the fields we required in the output, in both the master agent and the formatting agent; otherwise we would get missing-field errors.&lt;/p&gt;

&lt;p&gt;Another inherent challenge in building multi-agent systems, particularly with complex interactions, involves designing and debugging their orchestration. Ensuring the consistency and accuracy of Large Language Model (LLM) calls across various agent tasks, while also managing their associated costs, proved challenging. The overall quality and availability of input data for target companies also directly impacted the effectiveness of the agents. Finally, effectively quantifying and training agents to achieve the depth of insight expected by experienced Venture Capitalists was a significant undertaking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Embracing a Pipeline/Microservices Architecture for Agentic Systems: Our development journey revealed that complex multi-agent systems, especially those dealing with detailed outputs, can suffer from agent hallucination and prompt leakage. This was a profound revelation, teaching us the crucial importance of a pipeline or assembly line architecture. By introducing a new, dedicated agent solely for formatting the final output, we achieved a clearer separation of concerns. This "Dumb Worker, Smart Master" pattern proved far more effective for managing complex logic, allowing agents to specialize. This experience solidified our conviction that a microservices approach is generally superior to a monolith for deployment, offering benefits like enhanced scalability, technology flexibility, reduced single points of failure, and faster deployments. We saw how individual management of agents became possible, allowing us to explore other technologies if needed.&lt;/li&gt;
&lt;li&gt;Leveraging Google Agent Development Kit (ADK) for Multi-Agent Orchestration: ADK, as an open-source, code-first framework, became the backbone of Frisque, empowering us to build, evaluate, and deploy sophisticated AI agents. Its "Multi-Agent by Design" principle was instrumental for enabling the complex coordination and task delegation within our system. We learned to fully utilize ADK's flexible orchestration capabilities, including both workflow agents for predictable pipelines (like SequentialAgent, ParallelAgent, and LoopAgent) and LLM-Driven Dynamic Routing for adaptive behaviors. The integrated developer experience, complete with a command-line interface (CLI) and a visual Web UI, significantly aided our development, allowing us to run agents, inspect execution steps, and debug interactions in real-time. The built-in observability and debugging tools, which log agent decisions, tool usage, and trace delegation paths, were invaluable for understanding and refining our agents' behavior.&lt;/li&gt;
&lt;li&gt;Precision in Prompting and Agent Tooling: A critical insight gained was that agents within ADK do not need explicit direction on which tools to use. Simply providing the prompt is sufficient, as ADK already exposes the available tools to the agent, and the agent's selection of tools is part of its inherent agency. However, we also learned the critical importance of being explicit in terms of required output fields (both in the master agent's instructions and the formatting agent's directives) to prevent errors and ensure consistent data.&lt;/li&gt;
&lt;li&gt;The Paramount Importance of Data Quality: The effectiveness of our AI agents in due diligence directly correlated with the quality and availability of input data. This highlighted the absolute necessity of establishing robust data management processes and infrastructure for input collection and storage. We chose Google Cloud Storage (GCS) for securely housing uploaded documents and generated reports, with PostgreSQL maintaining structured data and references.&lt;/li&gt;
&lt;li&gt;Balancing AI Capabilities with Human Expectation: Quantifying and training agents to achieve the depth of insight expected by experienced Venture Capitalists proved a significant undertaking. Our learning here was the value of an iterative and specialized approach. By developing distinct agents for different domains—such as Tech Bot, Legal Bot, Market Research Bot, Social Media Sentiment Bot, and Financial Bot—we could address specific analytical tasks. This modularity, combined with a basic scoring mechanism, is our path to incrementally achieving sophisticated VC-level insights, understanding that AI augments, rather than replaces, human judgment in complex financial decisions.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Future Plans
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Collapse the results into a downloadable PDF document&lt;/li&gt;
&lt;li&gt;Add more AI agents&lt;/li&gt;
&lt;li&gt;Add email notifications for when a scan is done&lt;/li&gt;
&lt;li&gt;Allow selective scans, e.g., security, sentiment analysis, social media, legal, etc.&lt;/li&gt;
&lt;li&gt;Create scan history and dashboard pages&lt;/li&gt;
&lt;li&gt;Integrate MCP, A2A, and other protocols&lt;/li&gt;
&lt;li&gt;Score startups for investment purposes&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  DIY
&lt;/h2&gt;

&lt;p&gt;Please find the code &lt;a href="https://github.com/Virgo-Alpha/Frisque" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;br&gt;
Follow the steps in the README to reproduce the project. No API keys or environment variables are needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The comprehensive technology stack employed by Frisque allows it to efficiently process complex due diligence tasks, manage data, and provide real-time insights to users. The process of due diligence mirrors that of fundraising. Marc Andreessen once compared fundraising rounds to peeling back the layers of an onion, and we hope Frisque can make this less tear-worthy for VCs.&lt;/p&gt;

</description>
      <category>adk</category>
      <category>adkhackathon</category>
      <category>gcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Data Engineering Concepts: A project based introduction</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Wed, 14 May 2025 09:14:41 +0000</pubDate>
      <link>https://dev.to/virgoalpha/data-engineering-concepts-a-project-based-introduction-4pka</link>
      <guid>https://dev.to/virgoalpha/data-engineering-concepts-a-project-based-introduction-4pka</guid>
      <description>&lt;p&gt;I recently finished the &lt;a href="https://github.com/DataTalksClub/data-engineering-zoomcamp" rel="noopener noreferrer"&gt;Data Engineering Zoomcamp&lt;/a&gt; by &lt;a href="https://datatalks.club/" rel="noopener noreferrer"&gt;DataTalks Club&lt;/a&gt;. For my certification, I was required to undertake a capstone project that would culminate in a dashboard showing insights from the data I had processed in my pipeline.&lt;/p&gt;

&lt;p&gt;Instead of a step-by-step guide (which can be easily found in the project’s &lt;a href="https://github.com/Virgo-Alpha/LinkedIn_Job_Posts_Insights" rel="noopener noreferrer"&gt;README&lt;/a&gt;), this article explores &lt;strong&gt;data engineering concepts from a high-level view&lt;/strong&gt;, explaining the decisions I made and the trade-offs I considered.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nysrnixaiuh2inf8x3q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nysrnixaiuh2inf8x3q.png" alt="My Project chart" width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Data Sourcing and Problem Definition&lt;/li&gt;
&lt;li&gt;Containerization&lt;/li&gt;
&lt;li&gt;Infrastructure-as-Code (IaC)&lt;/li&gt;
&lt;li&gt;Orchestration vs Automation&lt;/li&gt;
&lt;li&gt;Data Lake and Data Warehouse&lt;/li&gt;
&lt;li&gt;Analytics Engineering and Data Modeling&lt;/li&gt;
&lt;li&gt;Batch vs Streaming&lt;/li&gt;
&lt;li&gt;Exposure: Visualization and Predictions&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. Data Sourcing and Problem Definition
&lt;/h2&gt;

&lt;p&gt;Before coding, I sourced the data and defined the problem. I chose the &lt;strong&gt;LinkedIn Job Postings dataset from Kaggle&lt;/strong&gt; due to its richness and descriptive documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem Statement:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How can data from LinkedIn job posts (2023–2024) help us make informed decisions on a career path?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I then went ahead to break it down into the following issues, each of which would be addressed by a chart in my dashboard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which job titles offer the highest salaries?&lt;/li&gt;
&lt;li&gt;Which companies, industries, and skills are the most lucrative?&lt;/li&gt;
&lt;li&gt;What percentage of companies offer remote work?&lt;/li&gt;
&lt;li&gt;What are the highest salaries and average experience levels?&lt;/li&gt;
&lt;li&gt;Which countries have the most job postings?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These were, of course, not MECE-compliant (MECE: mutually exclusive and collectively exhaustive).&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Containerization
&lt;/h2&gt;

&lt;p&gt;Before I could think of extracting my data, I took a high-level, long-term view of my project and considered aspects such as collaboration and reproducibility. While I could easily have created the pipelines and files I needed on my local machine with no packaging whatsoever, this would pose a challenge to anyone looking to evaluate or reproduce my project. I therefore decided to use Docker containers to package my project for replication, either locally or in the cloud.&lt;/p&gt;

&lt;p&gt;Docker containers have other advantages as well: they are lightweight; they are easily replicable, enabling scalability via horizontal scaling and load balancing; Dockerfiles simplify environment management, which improves maintainability; isolation between containers prevents dependency conflicts; and versioned images support version control.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Infrastructure as Code (IaC)
&lt;/h2&gt;

&lt;p&gt;I would be using GCP (Google Cloud Platform) for my data lake, data warehouse, and dashboard hosting, so I needed a reliable way to interact with the cloud. Infrastructure-as-Code (IaC) is the practice of managing and provisioning computing infrastructure—such as servers, networks, databases, and other resources—through machine-readable configuration files rather than through manual processes.&lt;/p&gt;

&lt;p&gt;The use of IaC tools simplifies the process of cloud infrastructure management and allows for scalability, version control, testability and automation. Apart from provisioning infrastructure, IaC tools can be used for other management activities such as enabling APIs in GCP and many more. It also allows reusability of resources since it avoids creating new resources if the defined ones already exist.&lt;/p&gt;

&lt;p&gt;Terraform is the IaC tool I used because of its simplicity. I made my configuration modular and used variables and outputs to integrate Terraform into my project's workflow. An alternative is AWS CloudFormation, which is used in AWS-only setups.&lt;/p&gt;
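&lt;p&gt;To give a flavour of what this looks like, here is a minimal, hypothetical Terraform resource for a GCS bucket of the kind such a project provisions; the variable names are illustrative, not taken from my actual configuration.&lt;/p&gt;

```hcl
# Hypothetical GCS bucket for the data lake; names are illustrative.
resource "google_storage_bucket" "data_lake" {
  name          = var.data_lake_bucket_name
  location      = var.region
  force_destroy = true

  # Optional housekeeping: expire raw files after 30 days.
  lifecycle_rule {
    condition {
      age = 30
    }
    action {
      type = "Delete"
    }
  }
}
```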




&lt;h2&gt;
  
  
  4. Orchestration vs Automation
&lt;/h2&gt;

&lt;p&gt;A data workflow is a sequence of automated data processing steps that specifies the steps, inputs, outputs, and dependencies in a data processing pipeline. Data workflows are commonly modeled as DAGs (Directed Acyclic Graphs): directed means the edges have direction, and acyclic means there are no cycles. A workflow may contain loops but not cycles; the difference is that a loop has a defined starting and ending point and terminates when a condition is met, whereas a cycle has none. DAGs are run using orchestration tools and engines such as Apache Airflow, Luigi, Prefect, Dagster, and Kestra. Smaller workflows can be run using make and/or cron jobs, but this is usually done locally.&lt;/p&gt;
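&lt;p&gt;The "acyclic" property is what lets an engine schedule a workflow at all, and it is easy to check. The sketch below validates a dependency graph with a depth-first search; the task names are invented.&lt;/p&gt;

```python
# Validate the "acyclic" part of a DAG with depth-first search.
def has_cycle(graph):
    # graph maps each task to the tasks that depend on it
    WHITE, GRAY, BLACK = 0, 1, 2     # unvisited / in progress / done
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            if color.get(nxt) == GRAY:          # back edge: a cycle
                return True
            if color.get(nxt, WHITE) == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

dag = {"download": ["unzip"], "unzip": ["upload"], "upload": []}
cyclic = {"a": ["b"], "b": ["a"]}
```

&lt;p&gt;Orchestrators run this kind of validation when a DAG is registered, which is why a cyclic dependency is rejected before any task executes.&lt;/p&gt;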

&lt;p&gt;In software engineering and data management, an orchestrator is a tool that automates, manages, and coordinates various workflows and tasks across different services, systems, or applications. Because an orchestrator allows everything to run smoothly without the need for manual intervention, it is easy to confuse orchestration with automation.&lt;/p&gt;

&lt;p&gt;Whereas automation refers to the execution of individual tasks or actions without manual intervention, orchestration goes beyond automation by managing the flow of multiple interconnected tasks or processes. Orchestration defines not only what happens but also when and how things happen, ensuring that all tasks (whether automated or not) are executed in the correct order, with the right dependencies and error handling in place. While automation focuses on individual tasks, orchestration ensures all those tasks are arranged and managed within a broader, cohesive system. This matters if you need to reliably handle complex processes with many interdependent steps.&lt;/p&gt;

&lt;p&gt;Use cases for automation include automated testing after code commits, automated backups and automated email notifications. Use cases for orchestration include data pipeline orchestration, CI/CD pipeline orchestration and cloud infrastructure orchestration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages of workflow orchestration include:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scalability&lt;/li&gt;
&lt;li&gt;Error handling and resilience&lt;/li&gt;
&lt;li&gt;Improved monitoring and control&lt;/li&gt;
&lt;li&gt;Process standardization&lt;/li&gt;
&lt;li&gt;Faster time to value since no need to reinvent the wheel&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What’s the Difference?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Automation&lt;/th&gt;
&lt;th&gt;Orchestration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single task execution&lt;/td&gt;
&lt;td&gt;Coordination of multiple tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Focus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Efficiency of individual actions&lt;/td&gt;
&lt;td&gt;Dependency management, error handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automated backups&lt;/td&gt;
&lt;td&gt;CI/CD pipelines, data pipeline scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For my workflow orchestration tool, I chose Apache Airflow because it is the most common in the industry. I ran Airflow using a docker-compose.yml file and a Dockerfile that installs the Google Cloud SDK (the toolkit for interacting with GCP). I then created a DAG with multiple steps, including downloading, unzipping, and uploading the data to the provisioned GCS bucket.&lt;/p&gt;
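&lt;p&gt;Stripped of Airflow, the DAG's three tasks reduce to the sketch below, using only the standard library: a stand-in "download" writes a zip, "unzip" extracts it, and "upload" copies the file into a directory playing the role of the GCS bucket. All paths and contents are invented for the example.&lt;/p&gt;

```python
# Stdlib-only miniature of the download/unzip/upload DAG; paths are invented.
import pathlib
import shutil
import tempfile
import zipfile

work = pathlib.Path(tempfile.mkdtemp())

def download():
    # Stand-in for fetching the Kaggle archive.
    archive = work / "postings.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("postings.csv", "title,salary\nData Engineer,120000\n")
    return archive

def unzip(archive):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(work)
    return work / "postings.csv"

def upload(csv_path):
    bucket = work / "bucket"        # stand-in for the GCS bucket
    bucket.mkdir(exist_ok=True)
    return shutil.copy(csv_path, bucket)

# Tasks run in dependency order, exactly as the DAG enforces.
uploaded = upload(unzip(download()))
```

&lt;p&gt;Airflow's value over this chain of calls is everything around it: retries, scheduling, alerting, and a record of every run.&lt;/p&gt;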

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsxmyzd09gk8oo5bbabg1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsxmyzd09gk8oo5bbabg1.png" alt="Tasks and steps in my Airflow DAG" width="800" height="191"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I Used Apache Airflow
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Industry standard for DAG orchestration&lt;/li&gt;
&lt;li&gt;Allows complex workflows&lt;/li&gt;
&lt;li&gt;Supports retries, alerts, and dependency management&lt;/li&gt;
&lt;li&gt;Easily containerized using &lt;code&gt;docker-compose.yml&lt;/code&gt; and custom &lt;code&gt;Dockerfile&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. Data Lake vs Data Warehouse
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data Lake
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;data lake&lt;/strong&gt; is a centralized repository that allows you to store structured, semi-structured, and unstructured data at any scale, in its raw, native format until it's needed for analysis.&lt;/p&gt;

&lt;h4&gt;
  
  
  Features of a data lake:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Allows ingestion of structured and unstructured data&lt;/li&gt;
&lt;li&gt;Catalogs and indexes data for analysis without data movement&lt;/li&gt;
&lt;li&gt;Stores, secures and protects data at an unlimited scale&lt;/li&gt;
&lt;li&gt;Connects data with analytics and ML tools&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Why do we need a data lake:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Companies realized the value of data&lt;/li&gt;
&lt;li&gt;Allows for quick storage and access of data (contingent on the storage tier, as with S3)&lt;/li&gt;
&lt;li&gt;It is hard to always be able to define the structure of data at the onset&lt;/li&gt;
&lt;li&gt;Data usefulness is sometimes realized later in the project lifecycle&lt;/li&gt;
&lt;li&gt;R&amp;amp;D on data products requires huge amounts of data&lt;/li&gt;
&lt;li&gt;The need for cheap storage of Big Data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud providers of data lakes include Google Cloud Storage on GCP, S3 on AWS, and Azure Blob Storage on Azure.&lt;/p&gt;

&lt;h4&gt;
  
  
  Dangers in a data lake:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Conversion into a data swamp (disorganized, inaccessible, and untrustworthy data lake)&lt;/li&gt;
&lt;li&gt;No versioning&lt;/li&gt;
&lt;li&gt;Incompatible schemas for same data without versioning&lt;/li&gt;
&lt;li&gt;No metadata associated&lt;/li&gt;
&lt;li&gt;Joins not possible&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Warehouse
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;data warehouse&lt;/strong&gt; is a centralized, structured repository designed to store, manage, and analyze large volumes of cleaned and organized data from multiple sources to support business intelligence (BI), reporting, and decision-making. This is where we have the partitioning and clustering capabilities.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Data Lake&lt;/th&gt;
&lt;th&gt;Data Warehouse&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Raw (structured + unstructured)&lt;/td&gt;
&lt;td&gt;Refined (structured only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Storage for future use&lt;/td&gt;
&lt;td&gt;Fast analytics &amp;amp; reporting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Design&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Schema-on-read&lt;/td&gt;
&lt;td&gt;Schema-on-write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example Tool&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Cloud Storage&lt;/td&gt;
&lt;td&gt;BigQuery, Redshift, Snowflake&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cloud data warehouses include BigQuery on GCP, Redshift on AWS, and Snowflake, which runs on all three major clouds.&lt;/p&gt;
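&lt;p&gt;The partitioning capability mentioned above can be illustrated in miniature: rows are bucketed by a partition key, so a filtered query reads one bucket rather than scanning the whole table. The column names below are invented; an engine like BigQuery applies the same pruning at the storage level.&lt;/p&gt;

```python
# Toy sketch of partition pruning; column names are invented.
from collections import defaultdict

partitions = defaultdict(list)

def insert(row):
    partitions[row["posted_date"]].append(row)   # partition key: posted_date

insert({"posted_date": "2024-01-01", "title": "Data Engineer"})
insert({"posted_date": "2024-01-01", "title": "ML Engineer"})
insert({"posted_date": "2024-02-01", "title": "Analyst"})

def query_by_date(date):
    # Partition pruning: only one bucket is read, not the whole table.
    return [r["title"] for r in partitions[date]]

jobs = query_by_date("2024-01-01")
```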

&lt;h3&gt;
  
  
  Additional Topics about Data Warehousing
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Unbundling
&lt;/h4&gt;

&lt;p&gt;Data warehouse unbundling is the process of breaking apart a traditional, monolithic data warehouse into distinct, independently scalable components. In practice, this involves decoupling ingestion, storage, processing, and compute, allowing you to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scale storage and compute independently&lt;/li&gt;
&lt;li&gt;Adopt best-of-breed tools, since modularity leads to better performance, agility, and innovation&lt;/li&gt;
&lt;li&gt;Improve agility and maintainability&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Not all data warehouses are unbundled, so check whether the one you plan to use is.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;OLAP vs OLTP&lt;/strong&gt;:
&lt;/h4&gt;

&lt;p&gt;Online analytical processing (OLAP) and online transaction processing (OLTP) are database management systems for storing and processing business data in large volumes. You can collect and store data from multiple sources—such as websites, applications, smart meters, and internal systems. OLAP combines and groups the data so you can analyze it from different points of view. Conversely, OLTP stores and updates transactional data reliably and efficiently in high volumes. OLTP databases can be one among several data sources for an OLAP system.&lt;/p&gt;

&lt;p&gt;The primary purpose of online analytical processing (OLAP) is to analyze aggregated data, while the primary purpose of online transaction processing (OLTP) is to process database transactions. You use OLAP systems to generate reports, perform complex data analysis, and identify trends. In contrast, you use OLTP systems to process orders, update inventory, and manage customer accounts. A data warehouse is an OLAP solution.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;OLTP&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;OLAP&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manage and process real-time transactions/business operations&lt;/td&gt;
&lt;td&gt;Analyze large volumes of data to support decision-making&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Updates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Short, fast updates initiated by users&lt;/td&gt;
&lt;td&gt;Data periodically refreshed with scheduled, long-running batch jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database Design&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Normalized databases for efficiency and consistency&lt;/td&gt;
&lt;td&gt;Denormalized databases using star/snowflake schemas for analytical queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Space Requirements&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generally small (if historical data is archived)&lt;/td&gt;
&lt;td&gt;Generally large due to aggregating large datasets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Response Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Milliseconds – optimized for speed&lt;/td&gt;
&lt;td&gt;Seconds or minutes – optimized for complex queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backup and Recovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Frequent backups required for business continuity&lt;/td&gt;
&lt;td&gt;Data can be reloaded from OLTP systems in lieu of regular backups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Productivity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Increases productivity of end-users and transaction handlers&lt;/td&gt;
&lt;td&gt;Increases productivity of analysts, executives, and decision-makers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data View&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Detailed, day-to-day business transactions&lt;/td&gt;
&lt;td&gt;Aggregated, multi-dimensional view of enterprise data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example Applications&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Order processing, payments, inventory updates&lt;/td&gt;
&lt;td&gt;Trend analysis, forecasting, executive dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Customer-facing staff, clerks, online shoppers&lt;/td&gt;
&lt;td&gt;Business analysts, data scientists, senior management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Structure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Row-based storage&lt;/td&gt;
&lt;td&gt;Columnar storage (in most modern OLAP systems like BigQuery, Redshift, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Examples by Provider&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Cloud SQL, Amazon Aurora, Cloud Spanner&lt;/td&gt;
&lt;td&gt;BigQuery, Amazon Redshift, Snowflake&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
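&lt;p&gt;To make the "Data Structure" row concrete, here is a toy Python sketch (illustrative only, not how any engine is actually implemented) of the same records laid out row-wise, as an OLTP engine stores them, versus column-wise, as BigQuery or Redshift do. An analytical aggregate only needs to touch one column in the columnar layout:&lt;/p&gt;

```python
# The same three orders stored two ways (illustrative only).

# Row-based layout: each record kept together, ideal for
# fetching or updating a single order (OLTP).
rows = [
    {"order_id": 1, "customer": "alice", "amount": 120.0},
    {"order_id": 2, "customer": "bob",   "amount": 80.0},
    {"order_id": 3, "customer": "alice", "amount": 200.0},
]

# Columnar layout: each column kept together, ideal for
# scanning one column across many records (OLAP).
columns = {
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "alice"],
    "amount":   [120.0, 80.0, 200.0],
}

# An analytical query (total revenue) reads only the "amount"
# column in the columnar layout...
total_columnar = sum(columns["amount"])

# ...but must walk every full record in the row layout.
total_rows = sum(r["amount"] for r in rows)

print(total_columnar, total_rows)  # both 400.0
```

&lt;p&gt;Columnar engines also compress each column far better than mixed-type rows, which is part of why OLAP scans stay affordable at scale.&lt;/p&gt;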




&lt;h2&gt;
  
  
  6. Analytics Engineering and Data Modeling
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Is Analytics Engineering?
&lt;/h3&gt;

&lt;p&gt;Analytics engineering is a field that bridges the gap between data engineering and data analysis. It brings good software engineering practices (such as modularity, version control, testing, documentation and DRY) to the work of data analysts and data scientists, using tools such as dbt, Dataform, AWS Glue and SQLMesh.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Modeling:
&lt;/h3&gt;

&lt;p&gt;Data modeling is the process of defining and organizing the structure of data within a system or database to ensure consistency, clarity, and usability. It involves creating abstract representations (models) of how data is stored, connected, and processed. There are three levels of data models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conceptual - High-level view of business entities and relationships&lt;/li&gt;
&lt;li&gt;Logical - Defines the structure and attributes of data without database-specific constraints&lt;/li&gt;
&lt;li&gt;Physical - Maps the logical model to actual database schemas, tables, indexes, and storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In dimensional modeling (common in data warehouses), fact tables hold measurements and metrics while dimension tables hold the descriptive attributes used to slice them.&lt;/p&gt;

&lt;h3&gt;
  
  
  ETL vs ELT
&lt;/h3&gt;

&lt;p&gt;We can use either ETL or ELT when moving and transforming data. The letters stand for the same words (Extract, Transform, Load), but the order in which the steps happen matters.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;ETL&lt;/th&gt;
&lt;th&gt;ELT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Volume&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;Large&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transformation Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Before loading&lt;/td&gt;
&lt;td&gt;After loading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lower&lt;/td&gt;
&lt;td&gt;Higher&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher compute&lt;/td&gt;
&lt;td&gt;Lower overall&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
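&lt;p&gt;The ordering difference can be sketched in a few lines of Python (a toy model: the "warehouse" here is just a list, and the transformation is a simple string cleanup):&lt;/p&gt;

```python
# Illustrative sketch: ETL transforms before loading,
# ELT loads raw data first and transforms inside the warehouse.

raw_source = ["  Alice ", "BOB", "  carol"]

def transform(records):
    # Example transformation: trim whitespace and normalize case.
    return [r.strip().title() for r in records]

# ETL: transform first, then load the cleaned data.
etl_warehouse = []
etl_warehouse.extend(transform(raw_source))

# ELT: load the raw data as-is, then transform it later
# using the warehouse's own compute (dbt models, SQL, etc.).
elt_warehouse = []
elt_warehouse.extend(raw_source)          # load raw
elt_warehouse = transform(elt_warehouse)  # transform in place

print(etl_warehouse)  # ['Alice', 'Bob', 'Carol']
print(elt_warehouse)  # ['Alice', 'Bob', 'Carol']
```

&lt;p&gt;Both paths end with the same cleaned data; ELT's advantage is that the raw copy stays available in the warehouse for re-transformation later.&lt;/p&gt;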

&lt;p&gt;For data modeling, I used dbt Cloud because it is easy to use and integrates with GitHub. The free tier is limited to one dbt project, so I had to delete a previous project.&lt;/p&gt;

&lt;p&gt;Instead of duplicating data by uploading everything from the GCS bucket into BigQuery, I used dbt to create external tables, which reference the data without loading it into BigQuery’s native storage. The trade-off is that access and query performance may be somewhat slower due to on-the-fly reading.&lt;/p&gt;

&lt;p&gt;I also used dbt to partition and cluster my tables in BigQuery. Partitioning splits a table into segments (partitions) based on the values of a specific column (usually a date/time or integer range). Each partition stores a subset of the table’s data, and queries can skip entire partitions that aren’t relevant. The same query therefore processes less data on a partitioned table than on a non-partitioned one, saving both time and money for frequently run queries. A table can have a maximum of 4,000 partitions.&lt;/p&gt;
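&lt;p&gt;The pruning effect can be illustrated with a toy Python model in which a "table" is a dict keyed by date partition (made-up data; the real pruning happens inside BigQuery's engine, not in client code):&lt;/p&gt;

```python
# A date-partitioned "table": each key is a partition.
table = {
    "2024-01-01": [{"job": "DE", "salary": 100}, {"job": "DA", "salary": 80}],
    "2024-01-02": [{"job": "DE", "salary": 110}],
    "2024-01-03": [{"job": "ML", "salary": 150}],
}

def query(table, wanted_date):
    """Scan only the relevant partition, skipping the rest."""
    scanned = 0
    results = []
    for date, partition in table.items():
        if date != wanted_date:
            continue  # partition pruned: its rows are never read
        for row in partition:
            scanned += 1
            results.append(row["salary"])
    return results, scanned

salaries, rows_scanned = query(table, "2024-01-02")
print(salaries, rows_scanned)  # [110] 1  (only 1 of 4 rows scanned)
```

&lt;p&gt;Since BigQuery bills by bytes scanned, skipping whole partitions translates directly into lower cost, not just lower latency.&lt;/p&gt;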

&lt;p&gt;Clustering, on the other hand, organizes rows within a partition (or an unpartitioned table) based on the values in one or more columns. It enables fine-grained pruning of data during query execution and optimizes filters, joins and aggregations by organizing data on disk. You can specify up to four clustering columns; they must be top-level, non-repeated fields. BigQuery performs automatic reclustering of newly added data at no cost.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Partitioning&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Clustering&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Granularity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Coarse (splits table into partitions)&lt;/td&gt;
&lt;td&gt;Fine (organizes rows within partitions/tables)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Basis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One column (DATE/TIMESTAMP/INTEGER)&lt;/td&gt;
&lt;td&gt;Up to 4 columns (any type)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Skips entire partitions&lt;/td&gt;
&lt;td&gt;Skips blocks of rows within a table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost Efficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reduces scan by entire partitions. Cost is predictable&lt;/td&gt;
&lt;td&gt;Reduces scan via pruning but cost benefit varies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage Layout&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Logical partitioning (physically separated partitions)&lt;/td&gt;
&lt;td&gt;Physical sorting within storage blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best Used For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Time-series or log data&lt;/td&gt;
&lt;td&gt;Frequently filtered or grouped columns with repetition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Limitations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Max 4000 partitions per table&lt;/td&gt;
&lt;td&gt;Max 4 clustering columns; no nested/repeated fields&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
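&lt;p&gt;Clustering's block-level pruning can be mimicked in Python by sorting rows on the cluster column and keeping per-block metadata (a simplified model: real engines store min/max ranges per block rather than full value sets):&lt;/p&gt;

```python
# Rows clustered (sorted) by the "country" column, then split
# into fixed-size storage blocks.
rows = sorted(
    [("KE", 1), ("US", 2), ("DE", 3), ("KE", 4), ("US", 5), ("DE", 6)]
)
block_size = 2
blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]

# Each block records which cluster-key values it contains;
# real engines store a min/max range instead.
metadata = [set(c for c, _ in b) for b in blocks]

def filter_country(blocks, metadata, country):
    """Read only the blocks that can contain the filter value."""
    hits, blocks_read = [], 0
    for block, keys in zip(blocks, metadata):
        if country not in keys:
            continue  # block pruned without reading its rows
        blocks_read += 1
        hits.extend(v for c, v in block if c == country)
    return hits, blocks_read

values, blocks_read = filter_country(blocks, metadata, "US")
print(values, blocks_read)  # [2, 5] 1  (only 1 of 3 blocks read)
```

&lt;p&gt;Because sorting groups equal values into contiguous blocks, a filter on the cluster column touches far fewer blocks than it would on randomly ordered data.&lt;/p&gt;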




&lt;h2&gt;
  
  
  7. Batch vs Streaming
&lt;/h2&gt;

&lt;p&gt;In data engineering and big data, large amounts of data are usually being generated all the time. A data engineer thus needs to decide whether to process the data as it arrives (streaming) or to batch it up and process it at intervals.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to decide between streaming and batch
&lt;/h3&gt;

&lt;p&gt;A good heuristic is to use streaming only when there is an automated response to the data at the end of the pipeline, rather than just a human analyst looking at it (anything else would be over-engineering). Typical streaming use cases include fraud detection, hacked-account detection and surge pricing (Uber).&lt;/p&gt;

&lt;p&gt;Batch is best for use cases where data is generated in large volumes but not continuously and can be processed at intervals. You can even use micro-batching (e.g., 15- to 60-minute batches) if you have a lot of data but not enough to justify streaming.&lt;/p&gt;

&lt;p&gt;Streaming uses a pub/sub model in which publishers write data and subscribers read and process it. Messages are published to named channels known as topics, and each message carries its own timestamp.&lt;/p&gt;
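&lt;p&gt;A minimal in-memory sketch of the pub/sub pattern (a toy broker with made-up topic names; real systems like Kafka add persistence, partitioning and consumer groups on top of this idea):&lt;/p&gt;

```python
# Minimal in-memory pub/sub: publishers write to named topics,
# and every subscriber registered on a topic receives each message.
import time

class Broker:
    def __init__(self):
        self.subscribers = {}  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        # Each message carries its own timestamp, as in real systems.
        message = {"timestamp": time.time(), "payload": payload}
        for callback in self.subscribers.get(topic, []):
            callback(message)

broker = Broker()
seen = []
broker.subscribe("rides", lambda msg: seen.append(msg["payload"]))

broker.publish("rides", {"ride_id": 1, "fare": 12.5})
broker.publish("payments", {"amount": 12.5})  # no subscriber here

print(seen)  # [{'ride_id': 1, 'fare': 12.5}]
```

&lt;p&gt;The key property is decoupling: the publisher does not know who (if anyone) consumes a topic, which is what lets streaming pipelines add automated downstream responders independently.&lt;/p&gt;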

&lt;p&gt;The use cases for streaming analytical data (the main kind of data that data engineers work with) are few. Running a streaming pipeline is more like operating a server, website or REST API than a batch pipeline or offline process. It is much more complex, and some organizations even use different job titles for batch and streaming engineers (e.g., at Netflix, data engineers handle batch processing whereas data-focused software engineers handle stream processing).&lt;/p&gt;

&lt;p&gt;In terms of technology, Apache Spark is primarily used for batch processing (though it also supports micro-batch streaming), Apache Kafka is used for streaming message transport, and Apache Flink supports both batch and stream processing but was built for streaming.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Exposure: Visualization and Predictions
&lt;/h2&gt;

&lt;p&gt;In data engineering - especially in the context of modern data tooling like dbt - an exposure refers to the end-use or downstream dependency of data models that shows where and how the data is being used outside of the transformation layer. Example exposures can be assets such as dashboards, machine learning models, external reports and APIs.&lt;/p&gt;

&lt;p&gt;My exposure was, of course, the Looker Studio dashboard (shown below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmorshsfaxmv6poun7w9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmorshsfaxmv6poun7w9z.png" alt="Image description" width="800" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I only used my final fact table in the dashboard as I had condensed all my previous staging and dimension tables into it using dbt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhab9egajsuntltt20g11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhab9egajsuntltt20g11.png" alt="dbt data modelling" width="800" height="635"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some of the insights I got were that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Software development&lt;/strong&gt; is the highest paying industry.&lt;/li&gt;
&lt;li&gt;Top-paying skills: &lt;strong&gt;Sales, IT, Management, Manufacturing&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Average required experience: &lt;strong&gt;6 years&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I learned a lot during the Zoomcamp and capstone project. The hands-on nature, real-world tooling, and community support made the journey insightful and practical.&lt;/p&gt;

&lt;p&gt;If you're interested in data engineering, I highly recommend joining a &lt;strong&gt;live cohort&lt;/strong&gt; of the DataTalks Club Zoomcamp to get the full experience and earn certification.&lt;/p&gt;

&lt;p&gt;This article was just a &lt;strong&gt;high-level tour of the data engineering landscape&lt;/strong&gt;—feel free to dig deeper into any concept that intrigued you.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Bon voyage!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>terraform</category>
      <category>gcp</category>
      <category>airflow</category>
    </item>
    <item>
      <title>Terraform on AWS: An introductory guide</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Wed, 05 Feb 2025 09:04:01 +0000</pubDate>
      <link>https://dev.to/aws-builders/terraform-on-aws-an-introductory-guide-5dfb</link>
      <guid>https://dev.to/aws-builders/terraform-on-aws-an-introductory-guide-5dfb</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Terraform Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When people hear the term &lt;strong&gt;terraform&lt;/strong&gt;, they often think of &lt;strong&gt;terraforming planets&lt;/strong&gt;—a concept popularized by scientists and visionaries like &lt;strong&gt;Elon Musk&lt;/strong&gt;, who envisions making &lt;strong&gt;Mars habitable&lt;/strong&gt;. In that sense, to terraform means to transform a barren or dead planet so that it has the conditions necessary to sustain life. Terraforming Mars would involve generating an &lt;strong&gt;atmosphere&lt;/strong&gt;, introducing &lt;strong&gt;water sources&lt;/strong&gt;, and fostering &lt;strong&gt;plant life&lt;/strong&gt; to create conditions where humans could survive. Similarly, in the world of &lt;strong&gt;software development&lt;/strong&gt;, HashiCorp’s &lt;strong&gt;Terraform&lt;/strong&gt; follows the same principle—except instead of reshaping planets, it transforms &lt;strong&gt;cloud platforms&lt;/strong&gt; like &lt;strong&gt;AWS, GCP, and vSphere&lt;/strong&gt;, or on-premises resources, into structured environments where applications can thrive. Just as planetary terraforming establishes the foundation for life, &lt;strong&gt;Terraform as Infrastructure-as-Code (IaC)&lt;/strong&gt; lays the groundwork for scalable, automated infrastructure where software can run seamlessly.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;What is Terraform?&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Terraform is an &lt;strong&gt;Infrastructure as Code (IaC)&lt;/strong&gt; tool developed by HashiCorp. It allows users to define cloud and on-premises infrastructure using human-readable configuration files. The tool provides a &lt;strong&gt;consistent workflow&lt;/strong&gt; to provision, manage, and automate infrastructure across its lifecycle.  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Why Use Terraform?&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity&lt;/strong&gt; – All infrastructure is defined in a single file, making it easy to track changes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration&lt;/strong&gt; – Code can be stored in version control systems like GitHub for team collaboration.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility&lt;/strong&gt; – Configurations can be reused for different environments (e.g., development and production).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Cleanup&lt;/strong&gt; – Ensures unused resources are properly destroyed to avoid unnecessary costs.
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;What Terraform is NOT&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Not a Software Deployment Tool&lt;/strong&gt; – It doesn’t manage or update software on existing infrastructure.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cannot Modify Immutable Resources&lt;/strong&gt; – Some changes (e.g., VM type) require destroying and recreating the resource.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Does Not Manage External Resources&lt;/strong&gt; – Terraform only manages what is explicitly defined in its configuration files.
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Terraform Workflow&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Terraform Installed Locally&lt;/strong&gt; – The CLI runs on a user’s machine.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uses Providers&lt;/strong&gt; – These connect Terraform to cloud services (AWS, Azure, GCP, etc.).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication Required&lt;/strong&gt; – API keys or service accounts authenticate access to cloud platforms.
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Terraform Installation&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Go to this &lt;a href="https://developer.hashicorp.com/terraform/install" rel="noopener noreferrer"&gt;link&lt;/a&gt; and follow the commands that match your system’s specifications.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Key Terraform Commands&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;terraform init&lt;/code&gt;&lt;/strong&gt; – Downloads provider plugins and initializes the working directory.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;terraform plan&lt;/code&gt;&lt;/strong&gt; – Shows what changes Terraform will make before applying them.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;terraform apply&lt;/code&gt;&lt;/strong&gt; – Provisions the defined infrastructure.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;terraform destroy&lt;/code&gt;&lt;/strong&gt; – Removes all resources defined in the Terraform configuration.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Terraform Files and Their Generation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Terraform uses several files to manage your infrastructure state, configuration, and dependencies. Below is an overview of the key files, what they represent, and which Terraform command triggers their creation or update:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configuration Files (&lt;code&gt;*.tf&lt;/code&gt; files):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt;
These files (such as &lt;code&gt;main.tf&lt;/code&gt;, &lt;code&gt;variables.tf&lt;/code&gt;, &lt;code&gt;outputs.tf&lt;/code&gt;, etc.) are written by you to define your infrastructure. They describe the resources you wish to provision and how they interrelate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When They Are Created:&lt;/strong&gt;
You create these manually. They form the blueprint for Terraform to understand and manage your environment.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Terraform State File (&lt;code&gt;terraform.tfstate&lt;/code&gt;):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt;
This file tracks the current state of your infrastructure. It maps your configuration to the real-world resources, ensuring that Terraform can determine what changes need to be made.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When It Is Generated/Updated:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;After &lt;code&gt;terraform apply&lt;/code&gt;:&lt;/strong&gt;
When you run &lt;code&gt;terraform apply&lt;/code&gt;, Terraform provisions your infrastructure based on your configuration. During this process, it creates or updates the &lt;code&gt;terraform.tfstate&lt;/code&gt; file with the current state of the resources.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After &lt;code&gt;terraform destroy&lt;/code&gt;:&lt;/strong&gt;
Similarly, when you destroy resources, the state file is updated to reflect that the resources no longer exist.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Terraform Lock File (&lt;code&gt;.terraform.lock.hcl&lt;/code&gt;):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt;
This file locks the versions of the provider plugins used in your configuration to ensure consistency and prevent unexpected changes from newer versions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When It Is Generated/Updated:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;After &lt;code&gt;terraform init&lt;/code&gt;:&lt;/strong&gt;
Running &lt;code&gt;terraform init&lt;/code&gt; downloads the required provider plugins and creates the &lt;code&gt;.terraform.lock.hcl&lt;/code&gt; file. This ensures that every team member or CI/CD pipeline uses the same provider versions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Terraform Directory (&lt;code&gt;.terraform&lt;/code&gt;):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt;
This hidden directory stores downloaded provider plugins, module sources, and backend configuration. It is essential for Terraform's operation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When It Is Generated:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;After &lt;code&gt;terraform init&lt;/code&gt;:&lt;/strong&gt;
The &lt;code&gt;.terraform&lt;/code&gt; directory is automatically created when you initialize your Terraform working directory using &lt;code&gt;terraform init&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Plan Output File (optional):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt;
If you choose to save the execution plan to a file (using the &lt;code&gt;-out&lt;/code&gt; flag with &lt;code&gt;terraform plan&lt;/code&gt;), this binary file captures the set of changes Terraform intends to make.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When It Is Generated:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;After &lt;code&gt;terraform plan -out=&amp;lt;filename&amp;gt;&lt;/code&gt;:&lt;/strong&gt;
Running this command generates a plan file that can later be applied using &lt;code&gt;terraform apply &amp;lt;filename&amp;gt;&lt;/code&gt;. This is useful for reviewing changes or automating deployment workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Configure aws credentials locally&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is important if you want to access your AWS account and resources from your terminal, or inside your code using an SDK such as boto3.&lt;br&gt;
There are other ways to do this, but I will cover two here:&lt;br&gt;
1. &lt;code&gt;aws configure&lt;/code&gt;&lt;br&gt;
2. Exporting the credentials as environment variables&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;1. aws configure&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To use this, you will need to install the AWS CLI, which you can do from &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I found that &lt;strong&gt;&lt;code&gt;sudo apt install awscli&lt;/code&gt;&lt;/strong&gt; or &lt;strong&gt;&lt;code&gt;pip3 install awscli&lt;/code&gt;&lt;/strong&gt; worked just as well for me (note that the package is named &lt;code&gt;awscli&lt;/code&gt;, without a hyphen).&lt;/p&gt;

&lt;p&gt;You can confirm that you have installed it by checking its version as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To configure your credentials locally, you will need to create an IAM user and give them some permissions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Log onto your aws console&lt;/li&gt;
&lt;li&gt;Navigate to the IAM section&lt;/li&gt;
&lt;li&gt;Create a new user and grant them the required permissions&lt;/li&gt;
&lt;li&gt;Download that user’s access key and secret key as a CSV and store it securely. I prefer this to copying them from the console. Just make sure never to commit the file publicly: if you’re working in a repository with a remote on GitHub/GitLab/Bitbucket, add the CSV to your &lt;code&gt;.gitignore&lt;/code&gt; before adding, committing and pushing changes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, run the following command to configure your credentials locally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws configure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will be prompted to enter the access key ID, secret access key, region, and output format (json, text, or table; the default is json). Once you fill them in, it creates two files in the ~/.aws directory: credentials and config. The credentials file contains the access key and secret key; the config file contains the region and output format.&lt;/p&gt;
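&lt;p&gt;Both generated files are plain INI files. This hedged sketch (using dummy keys and a temp directory, not your real ~/.aws) shows the layout that Terraform, boto3 and the AWS CLI read:&lt;/p&gt;

```python
# Sketch of the ~/.aws/credentials and ~/.aws/config layout that
# `aws configure` generates (dummy values, written to a temp dir).
import configparser
import os
import tempfile

aws_dir = tempfile.mkdtemp()

credentials = configparser.ConfigParser()
credentials["default"] = {
    "aws_access_key_id": "AKIAEXAMPLE",         # dummy value
    "aws_secret_access_key": "secret-example",  # dummy value
}
with open(os.path.join(aws_dir, "credentials"), "w") as f:
    credentials.write(f)

config = configparser.ConfigParser()
config["default"] = {"region": "us-east-1", "output": "json"}
with open(os.path.join(aws_dir, "config"), "w") as f:
    config.write(f)

# Reading them back, the way an SDK resolves the default profile:
loaded = configparser.ConfigParser()
loaded.read(os.path.join(aws_dir, "credentials"))
print(loaded["default"]["aws_access_key_id"])  # AKIAEXAMPLE
```

&lt;p&gt;The &lt;code&gt;[default]&lt;/code&gt; section is the profile name; you can add more sections for additional named profiles.&lt;/p&gt;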

&lt;p&gt;To test if you have access to your aws account from your local terminal, create a dummy s3 bucket then run the following command to list your buckets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3 ls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;2. Export the credentials as environment variables&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Run the following command on your terminal, replacing the stringed text with your actual values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_DEFAULT_REGION="your-region"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terraform, boto3, and the AWS CLI will automatically pick these up from the environment variables or from the files in the ~/.aws directory.&lt;/p&gt;

&lt;h3&gt;
  
  
&lt;strong&gt;Managing resources with Terraform&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Create a main.tf file in the folder you are working in and save the following code in the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "5.85.0"
    }
  }
}

provider "aws" {
  # Configuration options
  region = var.aws_region # use the variable defined below
}

# Variable definitions
variable "aws_region" {
  description = "AWS region for resources"
  type        = string
  default     = "your-region"
}

resource "aws_s3_bucket" "example" {
  bucket = "my-tf-test-bucket-${random_id.bucket_suffix.hex}" # Make bucket name unique

  tags = {
    Name        = "My bucket"
    Environment = "Dev"
  }
}

# Add a random suffix to ensure bucket name uniqueness
resource "random_id" "bucket_suffix" {
  byte_length = 4
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace “your-region” with your actual region.&lt;/p&gt;

&lt;p&gt;Now we can run the terraform commands.&lt;/p&gt;

&lt;h4&gt;
  
  
&lt;strong&gt;1. Initialize Terraform in your folder&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It initializes the backend and provider plugins, creates a lock file (.terraform.lock.hcl) to record the provider selections, and creates a .terraform folder. Include the lock file in your version control repository so that Terraform is guaranteed to make the same selections by default when you run "terraform init" in the future (the .terraform folder itself should not be committed).&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;2. Plan&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Run the following command to see any changes that are required for your infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It generates an execution plan based on the code in main.tf. At the end of the output there is a summary that looks like the below:&lt;/p&gt;

&lt;p&gt;Plan: 2 to add, 0 to change, 0 to destroy.&lt;/p&gt;

&lt;p&gt;Changes will be suggested, which you can accept by running the apply command explained below. You can save the plan using the &lt;code&gt;-out&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform plan -out=filepath-to-save-file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;3. Apply&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To apply the changes suggested run the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will be prompted to confirm by typing yes. Be careful: this actually creates the resources, so you will incur costs on your AWS account unless you have cloud credits or are using the free tier.&lt;/p&gt;

&lt;p&gt;After typing yes, go to the console and navigate to the s3 section. Check if you have a new bucket created. Alternatively, you can run the following command to list your buckets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3 ls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you had previously saved your plan to a file, please run the following command to apply that plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform apply “filename”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;4. Delete&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Run the following command to delete the resources provisioned by Terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your resources marked for deletion will be listed and you will again be prompted for confirmation. Confirm by typing yes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This guide walks you through setting up AWS credentials locally using both the AWS CLI configuration and environment variables, and demonstrates how to manage AWS resources with Terraform. You learned how to write a basic Terraform configuration to create an S3 bucket, initialize your project, preview changes with a plan, apply those changes, and ultimately destroy the resources when they are no longer needed. This systematic approach to infrastructure management not only ensures consistency and repeatability but also aligns with modern DevOps best practices.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>infrastructureascode</category>
      <category>devops</category>
    </item>
    <item>
      <title>Deploy Your Site in Seconds Using AWS Amplify</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Sun, 12 Jan 2025 05:27:43 +0000</pubDate>
      <link>https://dev.to/aws-builders/deploy-your-site-in-seconds-using-aws-amplify-3m50</link>
      <guid>https://dev.to/aws-builders/deploy-your-site-in-seconds-using-aws-amplify-3m50</guid>
      <description>&lt;p&gt;In today’s fast-paced digital world, deploying a web application quickly and reliably is crucial. AWS Amplify provides a seamless way to get your site online in seconds. To demonstrate its power, I built &lt;strong&gt;Choicepool&lt;/strong&gt;, an interactive web app designed to simplify two-way door decision-making through fun games like coin flips, dice rolls, and rock-paper-scissors. The project combines interactivity with ease of deployment, showcasing how AWS Amplify can power robust web applications.  &lt;/p&gt;

&lt;p&gt;This post will introduce the Choicepool project, compare various hosting options, highlight the benefits of AWS Amplify, and guide you through deploying your site using Amplify and its integrations.  &lt;/p&gt;




&lt;h2&gt;
  
  
  About Choicepool
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choicepool&lt;/strong&gt; is a simple yet engaging web app built with HTML, CSS, and JavaScript. It allows users to input their choices and randomly pick one through gamified experiences. By leveraging games, Choicepool makes decision-making both efficient and fun. The app was deployed using AWS Amplify, illustrating how simple and fast deployment can be when using the right tools.&lt;/p&gt;

&lt;p&gt;You can find Choicepool &lt;a href="https://main.d1h51wsna4r80m.amplifyapp.com/index.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hosting Options for Web Apps
&lt;/h3&gt;

&lt;p&gt;When deploying a web app, choosing the right hosting provider is vital. Below are common hosting options, along with their pros and cons:  &lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;GitHub Pages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Free for public repositories.
&lt;/li&gt;
&lt;li&gt;Easy to deploy static websites.
&lt;/li&gt;
&lt;li&gt;Well-integrated with GitHub repositories.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Cons&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Limited to static sites; no backend support.
&lt;/li&gt;
&lt;li&gt;Lacks advanced scalability options.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Netlify&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Simple CI/CD for static and serverless apps.
&lt;/li&gt;
&lt;li&gt;Built-in features like form handling and serverless functions.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Cons&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Costs can increase with higher traffic.
&lt;/li&gt;
&lt;li&gt;Less powerful integration with backend services.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Vercel&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Optimized for React and Next.js apps.
&lt;/li&gt;
&lt;li&gt;Automatic builds and global CDN.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Cons&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Limited support for backend integrations.
&lt;/li&gt;
&lt;li&gt;Pricing tiers can become costly for large teams.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;AWS Amplify&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;AWS Amplify is a game-changer for hosting web apps, offering seamless deployment and integration with other AWS services.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ease and Speed of Deployment&lt;/strong&gt;: Amplify makes deployment fast, whether through uploading zipped files or connecting a Git repository.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Powerful Integrations&lt;/strong&gt;: Supports services like Amazon S3 for file storage, DynamoDB for databases, and Lambda for serverless functions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Automatically scales with traffic demands.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in CI/CD Pipelines&lt;/strong&gt;: Automates builds and deployments directly from your Git repositories.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich Feature Set&lt;/strong&gt;: Includes hosting, authentication, analytics, and AI/ML capabilities via other AWS services.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial learning curve for those new to AWS services.
&lt;/li&gt;
&lt;li&gt;Costs can increase with extensive feature usage.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step-by-Step Guide to Hosting with AWS Amplify
&lt;/h2&gt;

&lt;p&gt;Amplify simplifies the deployment process, making it accessible to both beginners and experienced developers. Here’s how you can deploy your site:  &lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Prepare Your Web App&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Ensure your web app is ready for deployment. This involves:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Testing your app locally.
&lt;/li&gt;
&lt;li&gt;Ensuring all assets (CSS, JavaScript, images) are included.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Create an AWS Account&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you don’t already have an AWS account, create one at &lt;a href="https://aws.amazon.com" rel="noopener noreferrer"&gt;AWS&lt;/a&gt;. Navigate to the AWS Management Console and search for Amplify.  &lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Deploy Your Site&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Option 1: Upload a Zipped File
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Compress your project folder into a &lt;code&gt;.zip&lt;/code&gt; file.
&lt;/li&gt;
&lt;li&gt;Go to the Amplify console and select &lt;strong&gt;Host a Web App&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Upload your zipped file.
&lt;/li&gt;
&lt;li&gt;Amplify will handle the rest, generating a live URL for your site.
&lt;/li&gt;
&lt;/ol&gt;
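&lt;p&gt;Step 1 above can also be scripted. Here is a minimal sketch using only Python's standard library; the function name and the directory names are just illustrative placeholders:&lt;/p&gt;

```python
import shutil
from pathlib import Path

def zip_project(project_dir: str, archive_name: str = "site") -> str:
    """Compress a project folder into archive_name.zip, ready to upload to Amplify."""
    project = Path(project_dir)
    if not project.is_dir():
        raise FileNotFoundError(f"No such directory: {project_dir}")
    # shutil.make_archive appends the .zip extension and returns the full archive path
    return shutil.make_archive(archive_name, "zip", root_dir=project)
```

&lt;p&gt;For example, zip_project("choicepool") would produce site.zip in the current directory, which you can then drag into the Amplify console.&lt;/p&gt;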

&lt;h4&gt;
  
  
  Option 2: Deploy from GitHub, GitLab, or Bitbucket
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;In the Amplify console, select &lt;strong&gt;Host a Web App&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Connect your GitHub, GitLab, or Bitbucket account.
&lt;/li&gt;
&lt;li&gt;Choose the repository and branch you want to deploy.
&lt;/li&gt;
&lt;li&gt;Amplify will build and deploy your app.
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;View Your Hosted Site&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Once deployed, Amplify generates a live URL for your site. You can share this link or configure a custom domain.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Enhancing Deployments with Amazon Q Developer
&lt;/h2&gt;

&lt;p&gt;Amazon Q Developer, a powerful tool for AWS users, enhances Amplify’s deployment capabilities by integrating intelligence into the CI/CD pipeline.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages of Amazon Q Developer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated Insights&lt;/strong&gt;: Identify potential issues during deployment.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized Resource Allocation&lt;/strong&gt;: Suggests configurations to reduce costs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified Integration&lt;/strong&gt;: Easily integrates with other AWS services.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Gain Access
&lt;/h3&gt;

&lt;p&gt;Amazon Q Developer is available through the AWS Management Console. To use it:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to the AWS Marketplace.
&lt;/li&gt;
&lt;li&gt;Search for Amazon Q Developer.
&lt;/li&gt;
&lt;li&gt;Follow the instructions to enable it for your account.
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Why AWS Amplify Stands Out
&lt;/h2&gt;

&lt;p&gt;Amplify-hosted sites are highly versatile and capable. As demonstrated by &lt;strong&gt;Choicepool&lt;/strong&gt;, a simple app built entirely with HTML, CSS, and JavaScript, Amplify can host interactive applications with minimal setup.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Examples of Amplify’s Power:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;API Calls to ML Models&lt;/strong&gt;: With JavaScript, you can make API calls to AWS SageMaker endpoints, enabling advanced AI capabilities in your app.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database Integrations&lt;/strong&gt;: Amplify supports direct integration with AWS DynamoDB, allowing real-time data storage and retrieval.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standalone Apps&lt;/strong&gt;: Amplify handles hosting and scaling, so even complex apps can run independently without additional backend infrastructure.
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What’s Next for Choicepool
&lt;/h2&gt;

&lt;p&gt;AWS Amplify continues to push the boundaries of what’s possible with web app hosting. For Choicepool, future updates might include:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sound effects and animations for an even more interactive experience.
&lt;/li&gt;
&lt;li&gt;Support for multiple languages to reach a broader audience.
&lt;/li&gt;
&lt;li&gt;User preference tracking through Amplify’s built-in analytics and backend integrations.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AWS Amplify is a powerful tool for developers seeking fast, scalable, and feature-rich hosting. By enabling rapid deployment and seamless integration with AWS services, Amplify allows developers to focus on building impactful applications. Whether it’s a simple decision-making app like &lt;strong&gt;Choicepool&lt;/strong&gt; or a complex AI-driven platform, Amplify ensures your project is ready to reach the world in seconds.  &lt;/p&gt;

&lt;p&gt;Ready to deploy your site? Head to &lt;a href="https://aws.amazon.com/amplify/" rel="noopener noreferrer"&gt;AWS Amplify&lt;/a&gt; and get started today!  &lt;/p&gt;




</description>
    </item>
    <item>
      <title>Deep Fake, Easily Made</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Thu, 22 Feb 2024 14:33:53 +0000</pubDate>
      <link>https://dev.to/aws-builders/deep-fake-easily-made-279i</link>
      <guid>https://dev.to/aws-builders/deep-fake-easily-made-279i</guid>
      <description>&lt;p&gt;Refacer is an open-source library that makes it easy to replace faces in a video. In this article, I will detail how to use it to do exactly that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;A laptop/desktop device (I suppose even a tablet could work)&lt;/li&gt;
&lt;li&gt;A GitHub account&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You will also need to prepare the files used in the refacing process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Source Video: the original video that you wish to clone; for this tutorial, we’ll use a scene from “The Harder They Fall.” For the initial test run, I suggest using a short video that is a minute long or even shorter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Target Face Images: Images of the faces you intend to manipulate. These could be screenshots from the video or other sources. For this tutorial, my target face was that of Cherokee Bill played by LaKeith Stanfield.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Replacement Face Images: Images of the faces that will replace the original faces in the target video. Please make sure you have each person’s consent to use their picture. For this tutorial I used my own face.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Refacing
&lt;/h2&gt;

&lt;p&gt;To create the deep fake, we will use the refacer repository available on GitHub &lt;a href="https://github.com/xaviviro/refacer?ref=alxappliedai.com" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Before using the app, make sure you have read the disclaimer at the end of this post or on the GitHub repository. You can access Refacer through a &lt;a href="https://colab.research.google.com/drive/1gyhEp2WDvFhPJ5YthN2W0XccU700TSJv?usp=sharing&amp;amp;ref=alxappliedai.com" rel="noopener noreferrer"&gt;Google Colab notebook&lt;/a&gt;, but since that did not work for me, this tutorial takes the longer route: running the project locally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;I will assume that you are working on linux or any other unix-based system (as is the Godly thing to do as a developer).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to the directory you want to store the project in (I recommend creating a new folder altogether)&lt;/li&gt;
&lt;li&gt;Download this file &lt;a href="https://github.com/facefusion/facefusion-assets/releases/download/models/inswapper_128.onnx?ref=alxappliedai.com" rel="noopener noreferrer"&gt;inswapper_128.onnx&lt;/a&gt; and place it inside the folder you just created.&lt;/li&gt;
&lt;li&gt;In the terminal, navigate to the folder you just created&lt;/li&gt;
&lt;li&gt;Clone the Refacer repository from GitHub using the below command: &lt;code&gt;git clone https://github.com/xaviviro/refacer.git&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Navigate to the refacer directory by using the command &lt;code&gt;cd refacer&lt;/code&gt; if you are on linux/mac &lt;del&gt;or &lt;code&gt;chdir refacer&lt;/code&gt; if you are on windows&lt;/del&gt;
&lt;/li&gt;
&lt;li&gt;Open the requirements.txt file and replace gradio==3.33.1 with gradio==3.36.1 and save &amp;amp; close&lt;/li&gt;
&lt;li&gt;Install packages: &lt;code&gt;pip install -r requirements.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run the app: &lt;code&gt;python app.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Finally, open your web browser and navigate to the following address: &lt;code&gt;http://127.0.0.1:7860&lt;/code&gt; (the default Gradio port; check the terminal output for the exact address)
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Refacer Interface
&lt;/h3&gt;

&lt;p&gt;On the specified port, you should see an interface that looks like the one below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8kay6p6jahkeiaswkjq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8kay6p6jahkeiaswkjq.png" alt="Reface User Interface on the Web" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main sections are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Original Video Upload: Upload the source video to this section.&lt;/li&gt;
&lt;li&gt;Target Faces Placement: Place the faces in the video that you want to replace; up to five faces are supported.&lt;/li&gt;
&lt;li&gt;Replacement Faces Placement: Position the faces that will replace the corresponding faces uploaded to section (2).&lt;/li&gt;
&lt;li&gt;Output File Display: The resulting file will be displayed here.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Upload the required files
&lt;/h3&gt;

&lt;p&gt;Upload the respective files to their designated sections and click “Reface” (The big orange button at the bottom of the page). This action initiates a process that will “reface” all frames in the video using the provided faces. Please note that this process may take some time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F884l6k6rimxzgbrxwkd0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F884l6k6rimxzgbrxwkd0.png" alt="Refacer Interface with uploaded media" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can check the terminal to see the progress being made. If you have a lightweight device, I would also recommend closing all other running apps and leaving the device to focus on refacing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accessing your video
&lt;/h3&gt;

&lt;p&gt;Your output/refaced video will be in the /out folder. For the full path, please check your terminal. You can also view the refaced video on the browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharing
&lt;/h2&gt;

&lt;p&gt;Please be sure to share your refaced video like I did &lt;a href="https://www.linkedin.com/posts/benson-mugure-017153196_alxabrai-deepfake-appliedai-activity-7142568176705347584-PVdY?utm_source=share&amp;amp;utm_medium=member_desktop" rel="noopener noreferrer"&gt;here&lt;/a&gt; and if you can, tag me!&lt;/p&gt;

&lt;p&gt;Till next time comrades, may the force  be with you!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Feature Selection On Zindi Starter Notebook</title>
      <dc:creator>Benson King'ori</dc:creator>
      <pubDate>Wed, 07 Feb 2024 13:32:02 +0000</pubDate>
      <link>https://dev.to/aws-builders/feature-selection-on-zindi-starternotebook-1f7l</link>
      <guid>https://dev.to/aws-builders/feature-selection-on-zindi-starternotebook-1f7l</guid>
      <description>&lt;p&gt;On the path to making a predictive model, we are sometimes faced with the choice to cherry-pick amongst our list of features. (If the term cherry-pick still gives you nightmares from your adventures in gitland, then here’s a fistbump 👊). Perhaps this is because of the high dimensionality of our data, or simply part of the cyclic process of model hyperparameter fine-tuning. Regardless of the reason, learning the different ways to decide which features to use and which to drop can improve model performance and even reduce computational time and complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  EDA, Cleaning and Preprocessing
&lt;/h2&gt;

&lt;p&gt;For this exercise, I employed the &lt;a href="https://zindi.africa/competitions/financial-inclusion-in-africa" rel="noopener noreferrer"&gt;Financial Inclusion in Africa Competition on Zindi&lt;/a&gt; for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Readily available data&lt;/li&gt;
&lt;li&gt;A starter notebook that deals with EDA and basic data cleaning&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As such, the very next step from these offerings in the competition is to do feature engineering. If you want to join in for a follow-along kind of reading, feel free to download the data and starter Notebook from Zindi &lt;a href="https://zindi.africa/competitions/financial-inclusion-in-africa/data" rel="noopener noreferrer"&gt;here&lt;/a&gt; (You might need to create a Zindi account though).&lt;/p&gt;

&lt;p&gt;The starter notebook makes use of the following beautiful function that is used to transform both the test and the train datasets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

le = LabelEncoder()

# function to preprocess our data before training models
def preprocessing_data(data):

    # Convert the following numerical columns from integer to float
    float_array = data[["household_size", "age_of_respondent", "year"]].values.astype(float)

    # categorical features to be converted with One Hot Encoding
    categ = ["relationship_with_head",
             "marital_status",
             "education_level",
             "job_type",
             "country"]

    # One Hot Encoding conversion
    data = pd.get_dummies(data, prefix_sep="_", columns=categ)

    # Label Encoder conversion
    data["location_type"] = le.fit_transform(data["location_type"])
    data["cellphone_access"] = le.fit_transform(data["cellphone_access"])
    data["gender_of_respondent"] = le.fit_transform(data["gender_of_respondent"])

    # drop the uniqueid column
    data = data.drop(["uniqueid"], axis=1)

    # scale our data into the range 0 to 1
    scaler = MinMaxScaler(feature_range=(0, 1))
    data = scaler.fit_transform(data)

    return data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, after the transformations we are left with a long list of 37 features. Although we could use all of them, it is advisable to select the features that best help predict the target variable.&lt;/p&gt;

&lt;p&gt;After the processing, we are left with two numpy array sets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# preprocess the train and test data
processed_train = preprocessing_data(X_train)
processed_test = preprocessing_data(test)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I converted the arrays to dataframes and then saved them to CSV files for easy processing in another notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Save to csv
processed_train = pd.DataFrame(processed_train)
processed_test = pd.DataFrame(processed_test)

processed_train.to_csv('data/preprocessed_train.csv', index = False)
processed_test.to_csv('data/preprocessed_test.csv', index = False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I could have simply used them in the starter notebook, but I wanted a saved version of my preprocessed data for future experimentation with feature engineering and other techniques that could improve model performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Feature Selection
&lt;/h2&gt;

&lt;p&gt;In another &lt;a href="https://gist.github.com/Virgo-Alpha/eb2daeccf24194d1b38c9c4c2d00015b" rel="noopener noreferrer"&gt;notebook&lt;/a&gt;, I imported the required libraries as well as the datasets:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foc0ohmqdvl5zbg9eidnn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foc0ohmqdvl5zbg9eidnn.png" alt="Code block" width="545" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Afterwards, I worked through the different ways to select features.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Univariate Statistics
&lt;/h3&gt;

&lt;p&gt;Statistical tests can be used to select those features that have the strongest relationship with the output variable.&lt;/p&gt;

&lt;p&gt;The scikit-learn library provides the SelectKBest class that can be used with a suite of different statistical tests to select a specific number of features.&lt;/p&gt;

&lt;p&gt;Many different statistical tests can be used with this selection method. For example, the ANOVA F-value method is appropriate for numerical inputs with a categorical target, and can be used via the f_classif() function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.feature_selection import SelectKBest
from numpy import set_printoptions
from sklearn.feature_selection import f_classif

# feature extraction
test = SelectKBest(score_func=f_classif, k=4)
fit = test.fit(X, y)
# summarize scores
set_printoptions(precision=3)
print(fit.scores_)
features = fit.transform(X)
# summarize selected features
print(features[0:5,:])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the example below, we use the chi-squared (chi2) test to select the 10 best features.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fwnd3ulg0eu6f80tl0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fwnd3ulg0eu6f80tl0w.png" alt="Code Block" width="800" height="655"&gt;&lt;/a&gt;&lt;/p&gt;
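&lt;p&gt;The screenshot above applies this on the competition data; as a self-contained sketch of the same pattern, here is SelectKBest with chi2 on synthetic stand-in data (the toy X and y are assumptions for illustration only):&lt;/p&gt;

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy stand-in for the preprocessed data: chi2 requires non-negative inputs,
# which the earlier MinMaxScaler step already guarantees for the real features.
rng = np.random.default_rng(42)
X = rng.random((100, 8))
y = (X[:, 0] + X[:, 3] > 1).astype(int)  # target driven by features 0 and 3

selector = SelectKBest(score_func=chi2, k=4)
X_best = selector.fit_transform(X, y)

print(X_best.shape)                        # (100, 4): only 4 columns survive
print(selector.get_support(indices=True))  # indices of the selected features
```

&lt;p&gt;On the real dataset you would set k=10 and pass the preprocessed train features and target instead of the toy arrays.&lt;/p&gt;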

&lt;h4&gt;
  
  
  Alternatives to chi-squared and ANOVA F-value (some of which are also available in sklearn.feature_selection)
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Mutual Information&lt;/em&gt;: Measures the mutual dependence between two variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Information Gain&lt;/em&gt;: Measures the reduction in entropy achieved by splitting data on a particular feature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Correlation Coefficient&lt;/em&gt;: Measures the linear relationship between two numerical variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Distance Correlation&lt;/em&gt;: Measures the dependence between two random variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;ReliefF&lt;/em&gt;: Computes feature importance based on the ability to distinguish between instances of different classes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
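&lt;p&gt;To make one of these concrete, the correlation coefficient in (3) can be computed by hand. A minimal, dependency-free sketch:&lt;/p&gt;

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # covariance numerator and the two variance terms of the denominator
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

&lt;p&gt;For example, pearson_r([1, 2, 3], [2, 4, 6]) returns 1.0, a perfect positive correlation; in practice you would compute this between each candidate feature and the target.&lt;/p&gt;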

&lt;h3&gt;
  
  
  2. Feature Importance
&lt;/h3&gt;

&lt;p&gt;Bagged decision trees like Random Forest and Extra Trees can be used to estimate the importance of features.&lt;/p&gt;

&lt;p&gt;Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4dr2kmwynsmp52mwmux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4dr2kmwynsmp52mwmux.png" alt="Code Block" width="800" height="620"&gt;&lt;/a&gt;&lt;/p&gt;
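&lt;p&gt;As a runnable sketch of the idea in the screenshot above, here is ExtraTreesClassifier ranking features on toy data where only one feature carries signal (the synthetic data is an assumption for illustration):&lt;/p&gt;

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data in which only feature 2 determines the class label
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = (X[:, 2] > 0.5).astype(int)

model = ExtraTreesClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Importances sum to 1; feature 2 should receive by far the largest share
print(model.feature_importances_)
```

&lt;p&gt;On the real dataset you would sort the importances and keep the top-scoring features.&lt;/p&gt;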

&lt;h3&gt;
  
  
  3. Recursive Feature Elimination
&lt;/h3&gt;

&lt;p&gt;Recursive Feature Elimination (RFE) works by recursively removing attributes and building a model on the attributes that remain.&lt;/p&gt;

&lt;p&gt;It uses the model accuracy to identify which attributes (and combination of attributes) contribute the most to predicting the target attribute.&lt;/p&gt;

&lt;p&gt;You can learn more about the RFE class in the scikit-learn documentation.&lt;/p&gt;

&lt;p&gt;The example below uses RFE with the logistic regression algorithm to select the top 10 features. The choice of algorithm does not matter too much as long as it is skillful and consistent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgyvnzx3kuxjiwpezwst.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgyvnzx3kuxjiwpezwst.png" alt="Code Block" width="800" height="595"&gt;&lt;/a&gt;&lt;/p&gt;
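&lt;p&gt;A self-contained sketch of the same approach, using a synthetic dataset in place of the competition features:&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# 20 features, only 5 informative: RFE should prune down to a useful subset
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask over the 20 features; exactly 10 are True
print(rfe.ranking_)   # rank 1 marks a selected feature
```

&lt;p&gt;The support_ mask can then be used to slice the original feature matrix down to the retained columns.&lt;/p&gt;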

&lt;h3&gt;
  
  
  4. Principal Component Analysis (PCA)
&lt;/h3&gt;

&lt;p&gt;Principal Component Analysis (PCA) uses linear algebra to transform the dataset into a compressed form.&lt;/p&gt;

&lt;p&gt;Generally this is called a data reduction technique. A useful property of PCA is that you can choose the number of dimensions, or principal components, in the transformed result.&lt;/p&gt;

&lt;p&gt;In the example below, we use PCA and select 3 principal components.&lt;/p&gt;

&lt;p&gt;Learn more about the PCA class in scikit-learn by reviewing the PCA API. Dive deeper into the math behind PCA on the Principal Component Analysis Wikipedia article.&lt;/p&gt;

&lt;h4&gt;
  
  
  PCA Usefulness:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Dimension reduction&lt;/em&gt;&lt;/strong&gt;:  When dealing with datasets containing a large number of features, PCA can help reduce the dimensionality while preserving most of the variability in the data. This can lead to simpler models, reduced computational complexity, and alleviation of the curse of dimensionality. Our case here, with 37 features, is one example.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Data exploration/visualization&lt;/em&gt;&lt;/strong&gt;: PCA can be used to visualize high-dimensional data in lower-dimensional space (e.g., 2D or 3D) for exploratory data analysis and visualization. This can help uncover patterns, clusters, and relationships between variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Noise reduction&lt;/em&gt;&lt;/strong&gt;: PCA identifies and removes redundant information (noise) in the data by focusing on the directions of maximum variance. This can lead to improved model performance by reducing overfitting and improving generalization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feature Creation&lt;/em&gt;&lt;/strong&gt;: PCA can be used to create new composite features (principal components) that capture the most important information in the original features. These components may be more informative or less correlated than the original features, potentially enhancing the performance of machine learning algorithms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Reducing computational Complexity&lt;/em&gt;&lt;/strong&gt;: In cases where the original dataset is large and computationally expensive to process, PCA can be used to reduce the size of the dataset without sacrificing much information. This can lead to faster training and inference times for machine learning models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Addressing multicollinearity in the features&lt;/em&gt;&lt;/strong&gt;: PCA can mitigate multicollinearity issues by transforming correlated features into orthogonal principal components. This can improve the stability and interpretability of regression models.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qtgkmpq7xa3kzy5s9uj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qtgkmpq7xa3kzy5s9uj.png" alt="Code Block" width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;
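&lt;p&gt;As a self-contained sketch of the code in the screenshot, here is PCA reducing a matrix of the same width as our preprocessed data down to 3 components (the random matrix is a stand-in for illustration):&lt;/p&gt;

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in matrix with the same width as our 37 preprocessed features
rng = np.random.default_rng(1)
X = rng.random((150, 37))

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 3)
print(pca.explained_variance_ratio_)  # variance captured per component
```

&lt;p&gt;The explained variance ratios tell you how much information each component retains, which helps you decide whether 3 components are enough.&lt;/p&gt;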

&lt;h3&gt;
  
  
  5. Correlation Matrix With HeatMap
&lt;/h3&gt;

&lt;p&gt;A correlation matrix is a square matrix that shows the correlation coefficients between pairs of variables in a dataset. Each cell in the matrix represents the correlation coefficient between two variables.&lt;br&gt;
The correlation coefficient ranges from -1 to 1, where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 indicates a perfect positive correlation,&lt;/li&gt;
&lt;li&gt;0 indicates no correlation, and&lt;/li&gt;
&lt;li&gt;-1 indicates a perfect negative correlation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A heatmap is a graphical representation of the correlation matrix, where each cell's color indicates the strength and direction of the correlation between two variables.&lt;br&gt;
With the coolwarm colormap used below, warm colors (red) represent stronger positive correlations, while cool colors (blue) represent stronger negative correlations.&lt;br&gt;
The diagonal of the heatmap shows correlation values of 1, as each variable is perfectly correlated with itself.&lt;br&gt;
By visualizing the correlation matrix as a heatmap, you can quickly identify patterns of correlation between features.&lt;/p&gt;

&lt;p&gt;In the heatmap, you can look for clusters of high correlation (e.g., dark red squares) to identify groups of features that are highly correlated with each other.&lt;br&gt;
Once identified, you can decide whether to keep, remove, or transform these features based on their importance to the model and their contribution to multicollinearity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Get the correlations between every pair of features in the dataset
corrmat = X.corr()

# Plot the correlation matrix as an annotated heatmap
plt.figure(figsize=(37, 37))
g = sns.heatmap(corrmat, annot=True, cmap="coolwarm")
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, feature selection is an important part of the machine learning process and can bring a myriad of benefits. There are many ways to go about it, and depending on your particular use case, supervised or unsupervised, some may work better than others. In supervised learning, as in our case above, we compared the features against their usefulness in predicting a target variable. In unsupervised learning, however, there is no target variable; most feature selection algorithms that are useful there compare the variables to each other, helping cull the ones with high multicollinearity.&lt;/p&gt;
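&lt;p&gt;The unsupervised approach just described can be sketched in a few lines: drop one feature from every pair whose absolute correlation exceeds a threshold. The 0.9 threshold and the toy data below are illustrative assumptions, not a universal rule.&lt;/p&gt;

```python
# Hedged sketch: remove features with high pairwise correlation.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f1", "f2", "f3", "f4"])
df["f2"] = df["f1"] * 0.98 + rng.normal(scale=0.05, size=200)  # near-duplicate of f1

corr = df.corr().abs()

# Keep only the upper triangle so each pair is considered once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop every column that is correlated above 0.9 with an earlier column
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_reduced = df.drop(columns=to_drop)

print(to_drop)  # -> ['f2']
```

&lt;p&gt;Notice that no target variable is involved: the decision is made purely from the relationships between the features themselves.&lt;/p&gt;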

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6eo2puh8u63j178bbpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6eo2puh8u63j178bbpt.png" alt="Code Block" width="681" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another important factor in choosing a feature selection algorithm is the data types of your features. In general, different data types call for different selection algorithms. Please see below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr59f50uwc6c3vxg1je0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr59f50uwc6c3vxg1je0.png" alt="Code Block" width="762" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have a happy time exploring the other algorithms and may the force be with you!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
    </item>
  </channel>
</rss>
