<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: hridyesh bisht</title>
    <description>The latest articles on DEV Community by hridyesh bisht (@hridyeshbisht).</description>
    <link>https://dev.to/hridyeshbisht</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F468199%2Fe4a23a5f-a1f2-4844-b2c5-e2bc9b847acb.jpg</url>
      <title>DEV Community: hridyesh bisht</title>
      <link>https://dev.to/hridyeshbisht</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hridyeshbisht"/>
    <language>en</language>
    <item>
      <title>Serverless Computing in Kubernetes: A Developer’s Guide</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Sun, 10 Aug 2025 10:40:46 +0000</pubDate>
      <link>https://dev.to/aws-builders/serverless-computing-in-kubernetes-a-developers-guide-2c2n</link>
      <guid>https://dev.to/aws-builders/serverless-computing-in-kubernetes-a-developers-guide-2c2n</guid>
      <description>&lt;p&gt;Serverless computing allows you to focus on writing business logic without managing infrastructure. While often confused with Functions as a Service (FaaS), serverless is broader. It includes event-driven execution, auto-scaling, stateless workloads, and billing based on usage rather than uptime.&lt;/p&gt;

&lt;p&gt;This guide explores serverless from a developer's point of view using a coffee shop application deployed on Kubernetes. It also covers setup, advanced use cases, observability, deployment strategies, and production readiness.&lt;/p&gt;

&lt;p&gt;For this blog, consider a coffee shop app with the following services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;order-service&lt;/code&gt; (handles orders)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;payment-service&lt;/code&gt; (processes payments)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inventory-service&lt;/code&gt; (manages beans and stock)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Combine Serverless and Kubernetes?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Kubernetes is designed to run containers continuously. This works well for workloads that must always be available. However, many workloads are event-driven and only need to run when triggered.&lt;/p&gt;

&lt;p&gt;OpenFaaS extends Kubernetes to run workloads only when needed, scaling them down to zero when idle. This approach saves costs, improves resource efficiency, and accelerates development.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqi745noqv3lvepmaqar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqi745noqv3lvepmaqar.png" alt="An image flow displaying OpenFaas integration with Kubernetes." width="742" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenFaaS does not replace Kubernetes; it enhances it with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scale to zero&lt;/strong&gt;: No pods running = zero resource cost during inactivity.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function templates&lt;/strong&gt;: Developers write the logic; OpenFaaS handles packaging, networking, scaling, and observability.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard Kubernetes integration&lt;/strong&gt;: Works with any Kubernetes distribution; no special infrastructure required.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event-driven triggers&lt;/strong&gt;: Supports HTTP, Kafka, MQTT, cron, and more.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in monitoring&lt;/strong&gt;: Integrates with Prometheus and Grafana out of the box.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, in the coffee shop app, the &lt;strong&gt;payment-service&lt;/strong&gt; is only used during checkout. Running it all day wastes resources. With OpenFaaS:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;payment-service&lt;/code&gt; function spins up only when a customer checks out.
&lt;/li&gt;
&lt;li&gt;During busy hours (e.g., morning rush), it automatically scales to handle high demand.
&lt;/li&gt;
&lt;li&gt;After hours, it scales down to zero, using no resources.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Serverless in Kubernetes&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Kubernetes is not inherently serverless, but open-source projects like OpenFaaS bring serverless capabilities to it. These platforms provide abstraction layers over Kubernetes primitives like Pods, Services, and Deployments.&lt;/p&gt;

&lt;p&gt;To run a serverless function on Kubernetes, the following components are typically required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A container image containing your function or application
&lt;/li&gt;
&lt;li&gt;A container registry to store the image
&lt;/li&gt;
&lt;li&gt;A Pod to run the container
&lt;/li&gt;
&lt;li&gt;A Service or Ingress to expose it
&lt;/li&gt;
&lt;li&gt;An autoscaler (e.g., HPA, KEDA) to handle scale
&lt;/li&gt;
&lt;li&gt;ConfigMaps and Secrets to store configuration and credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmp595wfuvh0txfd1zj3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmp595wfuvh0txfd1zj3i.png" alt="Flow of development in Kubernetes." width="800" height="689"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: OpenFaaS supports Serverless 2.0 out of the box.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;OpenFaaS&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenFaaS enables developers to run functions and microservices on Kubernetes using Rancher or containerd. It supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build templates for languages like Python, Go, Node.js
&lt;/li&gt;
&lt;li&gt;Scale to zero using Prometheus or Kubernetes HPA v2
&lt;/li&gt;
&lt;li&gt;Event triggers: HTTP, cron, Kafka, SQS, MQTT
&lt;/li&gt;
&lt;li&gt;CLI and web UI for deployment and monitoring
&lt;/li&gt;
&lt;li&gt;Secrets management
&lt;/li&gt;
&lt;li&gt;OpenFaaS Cloud for CI/CD and team-based management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: OpenFaaS also offers faasd, a minimal single-node alternative to Kubernetes using containerd and CNI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftf6qlz70sajygzc00y6v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftf6qlz70sajygzc00y6v.png" alt="A service using OpenFaas in coffee shop container app." width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the coffee shop app, a typical order flows like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An order is placed by the user
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;order-service&lt;/code&gt; validates and sends the request to &lt;code&gt;inventory-service&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inventory-service&lt;/code&gt; checks stock and updates it
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;payment-service&lt;/code&gt; processes the transaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You could write a function to check inventory:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;order&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;\&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\&lt;span class="p"&gt;]&lt;/span&gt; \&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;\&lt;span class="nf"&gt;_stock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;\&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;item&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\&lt;span class="p"&gt;]):&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Out of stock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;inventory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decrement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;\&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;item&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;\&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Order accepted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And trigger it through HTTP or MQTT when new orders arrive.&lt;/p&gt;
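&lt;p&gt;A runnable sketch of the handler above, with a dict-backed stand-in for the &lt;code&gt;inventory&lt;/code&gt; module the article assumes (the stock numbers here are illustrative):&lt;/p&gt;

```python
import json

# Stand-in for the inventory service the article assumes;
# a real deployment would call inventory-service instead.
_stock = {"latte": 10, "espresso": 5}


def get_stock(item):
    return _stock.get(item, 0)


def decrement(item, quantity):
    _stock[item] -= quantity


def handle(req):
    order = json.loads(req)
    if order["quantity"] > get_stock(order["item"]):
        return "Out of stock"
    decrement(order["item"], order["quantity"])
    return "Order accepted"
```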

&lt;h2&gt;
  
  
  &lt;strong&gt;Architecture of OpenFaaS&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenFaaS is composed of the following layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gateway&lt;/strong&gt;: Handles all incoming requests
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watchdog&lt;/strong&gt;: Converts HTTP to stdin/stdout and back
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function containers&lt;/strong&gt;: Stateless business logic
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autoscaler&lt;/strong&gt;: Monitors metrics and adjusts replicas
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connector SDK&lt;/strong&gt;: Connects external events (e.g., MQTT, Kafka)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus&lt;/strong&gt;: Collects metrics for observability
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;faas-cli&lt;/strong&gt;: CLI tool for development and CI/CD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnggc3461jweof9h7hvdn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnggc3461jweof9h7hvdn.png" alt="An architecture image integrating Rancher, OpenFaas and Kubernetes." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The function deployment flow with OpenFaas is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write your function using a template (e.g., Python, Go, Node.js).

&lt;ol&gt;
&lt;li&gt;Write handler logic for your function.
&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;Package it as a container image.
&lt;/li&gt;

&lt;li&gt;Deploy to Kubernetes using faas-cli.
&lt;/li&gt;

&lt;li&gt;Invoke it through HTTP, MQTT, cron, or a message queue.
&lt;/li&gt;

&lt;li&gt;Prometheus gathers metrics; logs go to stdout.
&lt;/li&gt;

&lt;li&gt;Scale automatically based on usage or metrics.&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr88nzply44urh92yjxx4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr88nzply44urh92yjxx4.png" alt="Function deployment flow with OpenFaas." width="800" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Interacting with OpenFaaS&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You can manage OpenFaaS functions in three ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;faas-cli (recommended for scripting and CI)
&lt;/li&gt;
&lt;li&gt;Web UI (good for demos or quick insights)
&lt;/li&gt;
&lt;li&gt;REST API (custom app integration)&lt;/li&gt;
&lt;/ul&gt;
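&lt;p&gt;For REST API integration, requests go to the Gateway with basic auth. A sketch that only constructs the request without sending it (the gateway URL and credentials are placeholders; &lt;code&gt;/system/functions&lt;/code&gt; is the endpoint that lists deployed functions):&lt;/p&gt;

```python
import base64
import urllib.request

GATEWAY = "http://127.0.0.1:8080"    # placeholder; use your gateway URL
USER, PASSWORD = "admin", "example"  # placeholder credentials

token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
req = urllib.request.Request(
    f"{GATEWAY}/system/functions",  # lists deployed functions
    headers={"Authorization": f"Basic {token}"},
)
# urllib.request.urlopen(req) would return a JSON array of functions.
```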

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F40nsctfxoazfgidf56js.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F40nsctfxoazfgidf56js.png" alt="Interacting with OpenFaas" width="781" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenFaaS supports various trigger mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP (default)
&lt;/li&gt;
&lt;li&gt;MQTT (great for IoT devices)
&lt;/li&gt;
&lt;li&gt;Apache Kafka
&lt;/li&gt;
&lt;li&gt;cron (time-based)
&lt;/li&gt;
&lt;li&gt;AWS SQS
&lt;/li&gt;
&lt;li&gt;MinIO
&lt;/li&gt;
&lt;li&gt;RabbitMQ&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of these use the connector-sdk, allowing custom event bridges.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Accessing the OpenFaaS Gateway&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;After installing OpenFaaS (using Helm or arkade), you can access the Gateway: an HTTP API and UI that manages all deployed functions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forward the OpenFaaS Gateway to your local machine&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout status &lt;span class="se"&gt;\-&lt;/span&gt;n openfaas deploy/gateway

kubectl port-forward &lt;span class="se"&gt;\-&lt;/span&gt;n openfaas svc/gateway 8080:8080 &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This exposes the Gateway at &lt;a href="http://127.0.0.1:8080" rel="noopener noreferrer"&gt;http://127.0.0.1:8080&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jmthdkdtjnfcan0br88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jmthdkdtjnfcan0br88.png" alt="A screenshot of OpenFaas dashboard" width="800" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If the port becomes unavailable later, rerun the port-forward command.&lt;/p&gt;

&lt;p&gt;Some of the key features are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deploy New Function&lt;/strong&gt;: From the store or using custom container images
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invoke Functions&lt;/strong&gt;: Test your functions manually with input data
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor Logs and Metrics&lt;/strong&gt;: Includes basic Prometheus metrics and live logs
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manage Deployments&lt;/strong&gt;: Delete or update existing functions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;OpenFaaS CLI&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The faas-cli is the primary developer interface for building, deploying, and managing OpenFaaS functions. It communicates directly with the Gateway.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;faas-cli --help&lt;/code&gt; to learn about available options for each command. You can also find help for some of the commands in the &lt;a href="https://docs.openfaas.com/" rel="noopener noreferrer"&gt;OpenFaaS documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww6i5vnl3v2kkkigfose.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww6i5vnl3v2kkkigfose.png" alt="A flow of data from Faas CLI to Container registery" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, each store tracks daily espresso counts. An OpenFaaS function reads an MQTT message, then pushes usage stats to a central dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;count&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; \&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Daily threshold reached\!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Usage normal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Function and Template Stores&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenFaaS simplifies function development with two built-in stores:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Function Store&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Function Store is a curated catalog of ready-to-deploy serverless functions. These functions follow reusable patterns such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image conversion
&lt;/li&gt;
&lt;li&gt;Sentiment analysis
&lt;/li&gt;
&lt;li&gt;Slack notifications
&lt;/li&gt;
&lt;li&gt;PDF generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, in a coffee ordering app, you can search for functions related to coffee logic, such as coffee-order, inspect their behavior, and deploy them instantly. A deployed function could receive order data (e.g., drink type, size, customer name) and respond with a formatted confirmation receipt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb323vr3c1c267dhhgn3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb323vr3c1c267dhhgn3.png" alt="An image displaying flow of data from customer to coffee order function." width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Template Store&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Template Store provides scaffolding to build your own functions using supported languages and frameworks (e.g., Python, Flask, Node.js, Go).&lt;/p&gt;

&lt;p&gt;Templates handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP input and response setup
&lt;/li&gt;
&lt;li&gt;Boilerplate build and deploy logic
&lt;/li&gt;
&lt;li&gt;Language-specific structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, you could scaffold a &lt;code&gt;payment-service&lt;/code&gt; function using a Python template and extend it to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse JSON order data
&lt;/li&gt;
&lt;li&gt;Validate payment information
&lt;/li&gt;
&lt;li&gt;Return a payment confirmation status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Templates are extensible: you can add packages such as jinja2 for HTML rendering or numpy for calculations by modifying the template's dependency file.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Observability: Prometheus + Grafana&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenFaaS integrates with Prometheus by default to enable real-time observability.&lt;/p&gt;

&lt;p&gt;For example, you can track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number of coffee orders processed
&lt;/li&gt;
&lt;li&gt;Payment success vs. failure rate
&lt;/li&gt;
&lt;li&gt;Low-stock alerts for ingredients&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To access Prometheus (hidden by default for security), use port-forwarding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="se"&gt;\-&lt;/span&gt;n openfaas port-forward deployment/prometheus 9090:9090 &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each function automatically exposes a /metrics endpoint. Prometheus scrapes this and Grafana can visualize metrics on dashboards.&lt;/p&gt;
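&lt;p&gt;For example, the gateway exposes per-function invocation counts under the &lt;code&gt;gateway_function_invocation_total&lt;/code&gt; metric. A sketch that builds a Prometheus HTTP API query for it without sending it (host and port assume the port-forward above; the function name is from the coffee shop example):&lt;/p&gt;

```python
from urllib.parse import urlencode

PROMETHEUS = "http://127.0.0.1:9090"  # assumes the port-forward shown above

# Rate of successful order-service invocations over 5 minutes.
query = 'rate(gateway_function_invocation_total{function_name="order-service", code="200"}[5m])'
url = f"{PROMETHEUS}/api/v1/query?{urlencode({'query': query})}"
# urllib.request.urlopen(url) would return a JSON result vector.
```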

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2xfta4zh9mh6h4anaq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2xfta4zh9mh6h4anaq4.png" alt="Integrating order-service with Grafana dashboard." width="800" height="48"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Create Your First Function&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenFaaS offers templates that scaffold functions, handling HTTP entry, code wiring, and build scripts automatically.&lt;/p&gt;

&lt;p&gt;You can source templates from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenFaaS official repo
&lt;/li&gt;
&lt;li&gt;OpenFaaS incubator or community stores
&lt;/li&gt;
&lt;li&gt;Custom template repos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To build a function from scratch, you’ll:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose a template (e.g., python3-flask-debian)
&lt;/li&gt;
&lt;li&gt;Generate your function using the CLI
&lt;/li&gt;
&lt;li&gt;Edit logic and dependencies
&lt;/li&gt;
&lt;li&gt;Build, push, and deploy to OpenFaaS&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fup2ib86pqxbwt5rpnk5g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fup2ib86pqxbwt5rpnk5g.png" alt="Creating a function template using Faas CLI ." width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Templates are pulled from the Template Store. You can use community-curated templates or your own custom versions. Each function includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lang: the template type
&lt;/li&gt;
&lt;li&gt;handler: the path to your business logic
&lt;/li&gt;
&lt;li&gt;image: the container image to publish&lt;/li&gt;
&lt;/ul&gt;
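&lt;p&gt;Put together, a minimal &lt;strong&gt;stack.yml&lt;/strong&gt; for the &lt;code&gt;order-service&lt;/code&gt; might look like this (the image name and gateway URL are placeholders):&lt;/p&gt;

```yaml
provider:
  name: openfaas
  gateway: http://127.0.0.1:8080   # placeholder gateway URL

functions:
  order-service:
    lang: python3-http             # the template type
    handler: ./order-service       # path to your business logic
    image: registry.example.com/order-service:latest  # placeholder image
```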

&lt;p&gt;Once set up, the CLI can build, push, and deploy your function in a single command. This creates a Kubernetes deployment behind the scenes, ready to accept HTTP requests.&lt;/p&gt;

&lt;p&gt;You can run each step individually or use faas-cli up:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;faas-cli up -f order-service.yml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This does the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Builds the container image locally via Rancher.
&lt;/li&gt;

&lt;li&gt;Pushes the image to your registry.
&lt;/li&gt;

&lt;li&gt;Deploys via the OpenFaaS API → Kubernetes → Pod.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Templating with Jinja2&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For functions that return HTML, Jinja2 can render dynamic content using variables. In the coffee app, a receipt template could include placeholders for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer name
&lt;/li&gt;
&lt;li&gt;Coffee type
&lt;/li&gt;
&lt;li&gt;Timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This improves user-facing responses without hardcoding the output.&lt;/p&gt;
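&lt;p&gt;The article uses Jinja2 inside the function template; the same placeholder idea can be sketched with Python's built-in &lt;code&gt;string.Template&lt;/code&gt; (the receipt wording here is hypothetical):&lt;/p&gt;

```python
from string import Template

# Placeholders for customer name, coffee type, and timestamp.
RECEIPT = Template("Thanks, $customer! Your $coffee was ordered at $timestamp.")

receipt = RECEIPT.substitute(
    customer="Ada",
    coffee="flat white",
    timestamp="2025-08-10 10:40",
)
```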

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; To include large or compiled packages (e.g., NumPy or Flask), use a Debian-based template like python3-debian. These templates support native compilation and pip installs that Alpine-based templates might not.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Controlling HTTP Responses in OpenFaaS&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When you need precise control over status codes, headers, and response types (like JSON or binary), OpenFaaS offers flexible templates, especially python3-flask and python3-http. These allow you to build rich APIs with familiar HTTP semantics.&lt;/p&gt;

&lt;p&gt;For example, consider the &lt;code&gt;payment-service&lt;/code&gt; in a coffee shop. It needs to return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A 201 Created status for successful payments
&lt;/li&gt;
&lt;li&gt;A 400 Bad Request for invalid inputs
&lt;/li&gt;
&lt;li&gt;Custom headers with order IDs and trace IDs
&lt;/li&gt;
&lt;li&gt;Structured JSON for frontend integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These templates allow you to define all of the above without additional tools.&lt;/p&gt;
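&lt;p&gt;With the python3-http template, the handler returns a dict carrying the status code, headers, and body. A sketch for the &lt;code&gt;payment-service&lt;/code&gt; (the request fields and the &lt;code&gt;X-Order-Id&lt;/code&gt; header are illustrative):&lt;/p&gt;

```python
import json
from types import SimpleNamespace


def handle(event, context):
    # event.body carries the raw request body in the python3-http template.
    try:
        payment = json.loads(event.body)
    except (TypeError, json.JSONDecodeError):
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}

    if "order_id" not in payment:
        return {"statusCode": 400, "body": json.dumps({"error": "missing order_id"})}

    return {
        "statusCode": 201,  # Created: payment accepted
        "headers": {"X-Order-Id": str(payment["order_id"])},
        "body": json.dumps({"status": "paid"}),
    }


# Exercise the handler with a stub event object.
resp = handle(SimpleNamespace(body='{"order_id": 42}'), None)
```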

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vuqbwi7azcwoefhzzmm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vuqbwi7azcwoefhzzmm.png" alt="Controlling HTTP responses in OpenFaas." width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Serving Static Sites and Microservices&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With HTTP-based templates, you can also serve static content or build lightweight services.&lt;/p&gt;

&lt;p&gt;For example, a coffee shop menu function could serve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;menu.html&lt;/strong&gt; for your store’s website.
&lt;/li&gt;
&lt;li&gt;Promotional flyers as PDFs.
&lt;/li&gt;
&lt;li&gt;Static assets such as HTML, CSS, or JSON.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Functions with Secrets&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To protect sensitive operations like payment validation or admin APIs, OpenFaaS supports secret management.&lt;/p&gt;

&lt;p&gt;You can integrate common HTTP API authentication methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Token in Header&lt;/strong&gt;: A shared API key is sent in the request header.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HMAC (Hash-based Message Authentication Code)&lt;/strong&gt;: Used by providers like GitHub, PayPal, and Stripe to sign payloads with a shared key.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth2&lt;/strong&gt;: Delegates authentication to a third-party identity provider.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, your &lt;code&gt;payment-service&lt;/code&gt; might require an API key passed as a header. The function reads the key from a mounted secret and compares it with the request input. This ensures only trusted clients can access sensitive endpoints such as payment processing or refunds.&lt;/p&gt;
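&lt;p&gt;A sketch of that check (the secret name is illustrative; OpenFaaS mounts secrets under &lt;code&gt;/var/openfaas/secrets/&lt;/code&gt;):&lt;/p&gt;

```python
import hmac

SECRET_PATH = "/var/openfaas/secrets/payment-api-key"  # illustrative secret name


def load_secret(path=SECRET_PATH):
    with open(path) as f:
        return f.read().strip()


def authorized(request_key: str, secret: str) -> bool:
    # Constant-time comparison avoids leaking key material via timing.
    return hmac.compare_digest(request_key, secret)
```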

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo561sfyacp49kzet4n18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo561sfyacp49kzet4n18.png" alt="Calling Secrets with OpenFaas Function." width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Asynchronous Invocations&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;High-traffic periods, such as the morning coffee rush, can cause latency spikes. OpenFaaS supports asynchronous function calls to mitigate this.&lt;/p&gt;

&lt;p&gt;For example, when rendering large receipt PDFs or syncing inventory with external systems, your function can be invoked asynchronously. &lt;/p&gt;

&lt;p&gt;Async calls return an immediate acknowledgment while processing jobs in the background. You can optionally send results to a callback endpoint.&lt;/p&gt;
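&lt;p&gt;Asynchronous calls use the &lt;code&gt;/async-function/&lt;/code&gt; route instead of &lt;code&gt;/function/&lt;/code&gt;, with an optional &lt;code&gt;X-Callback-Url&lt;/code&gt; header. A sketch that only builds such a request (the function name and URLs are placeholders):&lt;/p&gt;

```python
import urllib.request

GATEWAY = "http://127.0.0.1:8080"  # placeholder gateway URL

req = urllib.request.Request(
    f"{GATEWAY}/async-function/receipt-pdf",  # async route for a hypothetical function
    data=b'{"order_id": 42}',
    headers={"X-Callback-Url": "http://example.com/hooks/receipts"},
    method="POST",
)
# The gateway answers 202 Accepted immediately and delivers the
# result to the callback URL when the job finishes.
```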

&lt;h2&gt;
  
  
  &lt;strong&gt;Autoscaling Functions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenFaaS supports both horizontal scaling and scale-to-zero based on real-time demand.&lt;/p&gt;

&lt;p&gt;The minimum (initial) and maximum replica count can be set at deployment time by adding a label to the function.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;com.openfaas.scale.min&lt;/strong&gt;: defaults to 1, which is also the lowest value; this label is unrelated to scale-to-zero.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;com.openfaas.scale.max&lt;/strong&gt;: defaults to 20 replicas.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;com.openfaas.scale.factor&lt;/strong&gt;: defaults to 20% and must be a value between 0 and 100 (inclusive).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, if you want a function to have at least 5 replicas at all times, but to scale up to 15 when under load, set it as follows in your &lt;strong&gt;stack.yml&lt;/strong&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;com.openfaas.scale.min&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;

  &lt;span class="na"&gt;com.openfaas.scale.max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Horizontal Scaling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You can configure functions with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimum replicas (for readiness)
&lt;/li&gt;
&lt;li&gt;Maximum replicas (to conserve resources)
&lt;/li&gt;
&lt;li&gt;Scale factor to control how fast functions scale out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, during peak morning hours, the coffee shop scales &lt;code&gt;order-service&lt;/code&gt; from 2 to 10 replicas to meet demand. In off-peak hours, the function scales back down.&lt;/p&gt;
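&lt;p&gt;That morning-rush behaviour can be sketched with the same labels in &lt;strong&gt;stack.yml&lt;/strong&gt; (the image name is a placeholder):&lt;/p&gt;

```yaml
functions:
  order-service:
    image: coffeeshop/order-service:latest   # placeholder image
    labels:
      com.openfaas.scale.min: 2
      com.openfaas.scale.max: 10
      # Scale out in steps of 20% of max replicas (the default).
      com.openfaas.scale.factor: 20
```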

&lt;h3&gt;
  
  
  &lt;strong&gt;Scale-to-Zero and Cold Starts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You can enable scale-to-zero by setting minimum replicas to zero; the trade-off is a cold start on the next request. This reduces idle costs for functions like &lt;code&gt;inventory-audit&lt;/code&gt; that run infrequently.&lt;/p&gt;
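&lt;p&gt;As a sketch, scale-to-zero is opted into per function with a label (in OpenFaaS Pro setups, where the idler scales idle functions down); the image name is a placeholder:&lt;/p&gt;

```yaml
functions:
  inventory-audit:
    image: coffeeshop/inventory-audit:latest   # placeholder image
    labels:
      # Allow the idler to scale this function down to zero replicas
      # after a period of no traffic (an OpenFaaS Pro feature).
      com.openfaas.scale.zero: "true"
```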

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3xomjep1lrt27bo9r16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3xomjep1lrt27bo9r16.png" alt="An image displaying scale to zero and cold starts with OpenFaas in Kubernetes." width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes is often described as “eventually consistent,” so cold starts need some tuning: out of the box they can take 1–2 seconds. Keep 1–5 warm replicas to avoid delays, or use asynchronous calls to hide the scaling latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;TLS and Production Readiness&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;TLS is optional for local testing because &lt;code&gt;kubectl port-forward&lt;/code&gt; already provides an encrypted tunnel. For production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install Ingress with TLS.
&lt;/li&gt;
&lt;li&gt;Use cert-manager for certificate management.
&lt;/li&gt;
&lt;li&gt;Route traffic over HTTPS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once set up, you can log in with the CLI using a secure gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Advanced Use Cases&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Custom HTTP Responses&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Using templates like python3-http or python3-flask, you can control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP status codes (e.g., 201 Created, 500 Internal Server Error)
&lt;/li&gt;
&lt;li&gt;Custom headers (e.g., Content-Type)
&lt;/li&gt;
&lt;li&gt;JSON-formatted responses for frontend apps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, your function could return &lt;code&gt;{"error": "Insufficient balance"}&lt;/code&gt; with a &lt;code&gt;402 Payment Required&lt;/code&gt; status code.&lt;/p&gt;
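&lt;p&gt;A minimal sketch of such a handler with the python3-http template, which returns the status code, headers, and body as a dict (the balance check itself is illustrative):&lt;/p&gt;

```python
import json

def handle(event, context):
    """python3-http style handler: return status code, headers, and body."""
    order = json.loads(event.body or "{}")
    balance = order.get("balance", 0)
    price = order.get("price", 0)
    if price > balance:
        return {
            "statusCode": 402,  # Payment Required
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "Insufficient balance"}),
        }
    return {
        "statusCode": 201,  # Created
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"status": "order accepted"}),
    }
```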

&lt;h3&gt;
  
  
  &lt;strong&gt;Binary Data Handling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To support raw byte input/output (e.g., uploading a receipt image), set &lt;code&gt;RAW_BODY: True&lt;/code&gt; in the function’s environment.&lt;/p&gt;

&lt;p&gt;For example, in a coffee shop's self-ordering kiosk, a function could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receive a JPEG from a camera
&lt;/li&gt;
&lt;li&gt;Convert it to grayscale
&lt;/li&gt;
&lt;li&gt;Return the processed image as a binary payload&lt;/li&gt;
&lt;/ul&gt;
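&lt;p&gt;A dependency-free sketch of a raw-body handler: it passes the bytes through with a checksum header rather than doing the grayscale conversion, which in practice would use an imaging library such as Pillow (all names are illustrative):&lt;/p&gt;

```python
import hashlib

def handle(event, context):
    """Raw-body handler sketch (requires RAW_BODY: True in the function env).

    event.body arrives as raw bytes; here we pass them through unchanged and
    attach a checksum so the kiosk can verify the upload. A real deployment
    would do the grayscale conversion at this point.
    """
    payload = event.body or b""
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/octet-stream",
            "X-Payload-Sha256": hashlib.sha256(payload).hexdigest(),
        },
        "body": payload,
    }
```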

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0w8o4lmebr9yv1h0fjrb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0w8o4lmebr9yv1h0fjrb.png" alt="Converting binary data using OpenFaas" width="800" height="43"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Serving Static Pages&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You can serve a micro-site using the python3-http template.&lt;/p&gt;

&lt;p&gt;For example, a function named homepage could return static HTML pages like &lt;code&gt;/about.html&lt;/code&gt; or &lt;code&gt;/menu.html&lt;/code&gt;.&lt;/p&gt;
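&lt;p&gt;A minimal sketch of such a &lt;code&gt;homepage&lt;/code&gt; function, routing on the request path exposed by the python3-http template (page contents are stubbed for brevity):&lt;/p&gt;

```python
# Minimal static pages; a real micro-site would return full HTML documents.
PAGES = {
    "/about.html": "About our coffee shop",
    "/menu.html": "Espresso, Latte, Cold Brew",
}

def handle(event, context):
    """Serve a static page based on the request path, or 404."""
    body = PAGES.get(event.path)
    if body is None:
        return {"statusCode": 404, "body": "Not Found"}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html"},
        "body": body,
    }
```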

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqos9c4cft6ik6t4x3qo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqos9c4cft6ik6t4x3qo.png" alt="Serving static pages with OpenFaas." width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Combining OpenFaaS and MQTT for Edge Use Cases&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;MQTT (Message Queuing Telemetry Transport) is a lightweight, pub-sub messaging protocol designed for unreliable or constrained networks. It’s ideal for edge use cases like IoT and retail.&lt;/p&gt;

&lt;p&gt;Some benefits of integrating OpenFaas and MQTT are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low bandwidth and power usage
&lt;/li&gt;
&lt;li&gt;Decouples producers and consumers
&lt;/li&gt;
&lt;li&gt;Buffers messages locally when offline
&lt;/li&gt;
&lt;li&gt;Reliable delivery once reconnected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8j7xu8qnm5k0wfk6bdyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8j7xu8qnm5k0wfk6bdyx.png" alt="Integrating OpenFaas and MQTT for edge use case." width="800" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In edge computing scenarios, OpenFaaS and MQTT work together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MQTT brokers handle sensor data (e.g., temperature, order count).
&lt;/li&gt;
&lt;li&gt;OpenFaaS functions are triggered by these MQTT events.
&lt;/li&gt;
&lt;li&gt;Responses are logged, alerts are triggered, or orders are adjusted.&lt;/li&gt;
&lt;/ul&gt;
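&lt;p&gt;In practice the broker-to-function wiring is done by a connector, but the routing idea can be sketched without a broker: match published topics against patterns using MQTT's single-level &lt;code&gt;+&lt;/code&gt; wildcard and dispatch to the mapped function (topic and function names are illustrative):&lt;/p&gt;

```python
def topic_matches(pattern, topic):
    """MQTT-style topic match supporting the single-level '+' wildcard."""
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    if len(p_parts) != len(t_parts):
        return False
    return all(p in ("+", t) for p, t in zip(p_parts, t_parts))

# Map MQTT topic patterns to the functions they should trigger.
ROUTES = {
    "store/+/fridge/temp": "check-fridge-temp",
    "store/+/orders/count": "adjust-inventory",
}

def dispatch(topic):
    """Return the names of functions to invoke for a published topic."""
    return [fn for pat, fn in ROUTES.items() if topic_matches(pat, topic)]
```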

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: For more information, refer to &lt;a href="https://programmerprodigy.code.blog/2025/07/09/microservices-at-edge-with-k3s-and-fleet/" rel="noopener noreferrer"&gt;Microservices at Edge with K3s and Fleet&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Balancing Containers and OpenFaaS
&lt;/h2&gt;

&lt;p&gt;Choosing between a traditional cloud-native app and a serverless approach with OpenFaaS is not an "either-or" decision. The most effective cloud-native solutions often combine both to balance their strengths.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqxjsjt6lfrsanulw5nv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqxjsjt6lfrsanulw5nv.png" alt="How to choose balance between Containers and OpenFaaS" width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a coffee shop app, Kubernetes container workloads are ideal for services that must always be available.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The core order-service runs continuously to ensure customers can place orders anytime.&lt;/li&gt;
&lt;li&gt;For event-driven or infrequently used workloads, such as payment-service and inventory-service, OpenFaaS offers a more efficient, cost-effective option. It can scale these services to zero when idle, reducing unnecessary resource use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A hybrid approach delivers the best of both worlds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimize costs by running resource-intensive services only when needed.&lt;/li&gt;
&lt;li&gt;Improve resource efficiency by reducing idle workloads.&lt;/li&gt;
&lt;li&gt;Accelerate development by breaking down logic into small, manageable functions.&lt;/li&gt;
&lt;li&gt;Scale intelligently to handle unpredictable traffic spikes without over-provisioning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is to use the right tool for each job. Kubernetes provides persistence and control for always-on workloads, while OpenFaaS adds event-driven, scalable, and cost-efficient capabilities. Together, they enable a resilient, adaptable, and optimized cloud-native architecture.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>kubernetes</category>
      <category>observability</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Microservices at Edge with K3s and Fleet</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Thu, 10 Jul 2025 14:29:07 +0000</pubDate>
      <link>https://dev.to/hridyeshbisht/microservices-at-edge-with-k3s-and-fleet-5cjd</link>
      <guid>https://dev.to/hridyeshbisht/microservices-at-edge-with-k3s-and-fleet-5cjd</guid>
      <description>&lt;p&gt;Imagine you're building a coffee shop application with three core microservices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;order-service&lt;/strong&gt;: Handles customer orders.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;payment-service&lt;/strong&gt;: Processes transactions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;inventory-service&lt;/strong&gt;: Tracks beans, milk, cups, and other inventory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now scale this across hundreds or thousands of physical store locations—each with its own on-site device like a Raspberry Pi or Intel NUC. How do you maintain consistency, ensure reliability, and keep deployments secure across all sites?&lt;/p&gt;

&lt;p&gt;That’s where K3s (a lightweight Kubernetes distribution for the edge) and Rancher Fleet (a GitOps engine built to manage thousands of clusters) come in.&lt;/p&gt;

&lt;p&gt;In this guide, you learn how to build a resilient, scalable edge infrastructure using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;K3s to run Kubernetes clusters on resource-constrained devices,
&lt;/li&gt;
&lt;li&gt;Fleet to manage deployments across hundreds or thousands of clusters,
&lt;/li&gt;
&lt;li&gt;OpenFaaS and MQTT for lightweight event-driven automation and telemetry,
&lt;/li&gt;
&lt;li&gt;And strategies for handling air-gapped stores, remote access, logging, and storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Is a Container Image?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before diving into Kubernetes or K3s, it’s essential to understand container images.&lt;/p&gt;

&lt;p&gt;A container image is a lightweight, portable unit that includes everything needed to run a service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application code&lt;/li&gt;
&lt;li&gt;Runtime (like Python, Java, or Node.js)&lt;/li&gt;
&lt;li&gt;System libraries and binaries&lt;/li&gt;
&lt;li&gt;Configuration and dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, in the coffee shop app, each service is built as a container image:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;order-service: Python + Flask + order logic&lt;/li&gt;
&lt;li&gt;payment-service: Node.js + Stripe SDK&lt;/li&gt;
&lt;li&gt;inventory-service: Go + SQLite + inventory tracker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzdqpoett0yuwjyvzonc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzdqpoett0yuwjyvzonc.png" alt="A visual example of coffee shop app container with CI pipeline." width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes (K8s) is an open-source system that automates the deployment, scaling, and management of containerized applications. It provides a robust, extensible platform for orchestrating containers across clusters of machines, simplifying the management of distributed, cloud-native systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; For more information, refer to &lt;a href="https://programmerprodigy.code.blog/2025/05/06/introduction-to-container-images-and-orchestration/" rel="noopener noreferrer"&gt;Introduction to Container Images and Orchestration&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Understanding Edge Computing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Edge computing&lt;/strong&gt; refers to placing compute resources as close as possible to the data source or end-user. In our case, this means deploying services directly into each coffee shop rather than relying on centralized cloud data centers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits of Edge Computing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time inventory decisions:&lt;/strong&gt; Milk running low, auto order more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low-latency UX:&lt;/strong&gt; Instant response at self-service kiosks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data sovereignty:&lt;/strong&gt; Payment data stays local to comply with regulations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Challenges at the Edge:&lt;/strong&gt; While beneficial, edge environments present unique challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resource constraints&lt;/strong&gt;: Devices at the edge, such as those in your coffee shops, often have lower compute and memory capacity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intermittent connectivity&lt;/strong&gt;: Edge devices may not maintain a constant 24/7 internet connection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote management difficulty&lt;/strong&gt;: You cannot manually SSH into hundreds of individual store servers to update software.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;K3s: Kubernetes Optimized for the Edge&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;K3s is a lightweight, CNCF-certified Kubernetes distribution built specifically for edge and IoT environments. Developed by Rancher, K3s simplifies cluster setup while remaining fully compatible with Kubernetes tooling and APIs. It’s ideal for resource-constrained devices—such as Raspberry Pis or Intel NUCs deployed in coffee shops or retail branches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key K3s Features for Edge Deployments:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single binary distribution (~100MB)&lt;/strong&gt; for fast installs and minimal overhead.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedded components&lt;/strong&gt;: containerd (container runtime), runc, and kubectl.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supports multiple storage backends&lt;/strong&gt;: SQLite (via Kine), embedded etcd, and external SQL (MySQL/PostgreSQL).
&lt;/li&gt;
&lt;li&gt;Works on ARM and x86_64 platforms.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-configured defaults&lt;/strong&gt;: Includes Flannel for networking, Traefik for ingress, and metrics-server for observability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt5vmz04kl66foayrsp9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt5vmz04kl66foayrsp9.png" alt="A visual example of K3s for coffee shop app." width="666" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How K3s Optimizes for the Edge&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unified Binary&lt;/strong&gt;: Embeds everything needed—container runtime, CRI tools, control plane in one downloadable file. Makes updates easy via binary replacement.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kine (SQLite integration)&lt;/strong&gt;: Reduces overhead by emulating etcd with SQLite, ideal for single-node clusters without consensus requirements.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal System Requirements&lt;/strong&gt;: Can run in 512MB RAM, uses minimal CPU.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Comparing K3s and Kubernetes&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;K3s&lt;/th&gt;
&lt;th&gt;Upstream Kubernetes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Installation&lt;/td&gt;
&lt;td&gt;Single binary, HTTP tunnel&lt;/td&gt;
&lt;td&gt;kubeadm, multiple binaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control plane&lt;/td&gt;
&lt;td&gt;Embedded etcd or SQLite&lt;/td&gt;
&lt;td&gt;etcd only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;Embedded containerd + runc&lt;/td&gt;
&lt;td&gt;External container runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System requirements&lt;/td&gt;
&lt;td&gt;~500MB RAM, low CPU&lt;/td&gt;
&lt;td&gt;1GB+ RAM, higher CPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HA options&lt;/td&gt;
&lt;td&gt;SQLite, etcd, SQL&lt;/td&gt;
&lt;td&gt;etcd only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Targets&lt;/td&gt;
&lt;td&gt;Edge, IoT, hobbyists&lt;/td&gt;
&lt;td&gt;Data centers, cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes has several personas, or types of users. Here is how K3s affects each of them:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developers&lt;/strong&gt;: Write standard manifests. No changes needed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform Engineers&lt;/strong&gt;: Use pre-bundled defaults to reduce setup time.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SREs/SecOps&lt;/strong&gt;: Need to understand K3s-specific HA, upgrades, and bootstrap.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;K3s renames Kubernetes control plane nodes to "servers" and worker nodes to "agents". It uses an HTTP tunnel for simplified server-agent communication, Flannel for networking by default, and Traefik for ingress.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Server&lt;/strong&gt;: Control plane node
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt;: Worker node&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;K3s Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;K3s is purpose-built for edge and resource-constrained environments. Unlike upstream Kubernetes, which requires multiple binaries and external dependencies, K3s is distributed as a single binary. Internally, K3s embeds several core components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;containerd and runc&lt;/strong&gt;: Used as the default container runtime.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedded SQLite or etcd&lt;/strong&gt;: For storing Kubernetes cluster state. SQLite is ideal for single-node clusters; etcd enables high availability.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in defaults&lt;/strong&gt;: Includes Flannel (CNI), Traefik (Ingress), metrics-server, and Helm controller.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;K3s uses a simplified bootstrapping process where agents (workers) connect to the control plane (server) via an encrypted tunnel (k3s-agent uses reverse tunnels).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iys4wfmt6ytrz5z4y94.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iys4wfmt6ytrz5z4y94.png" alt="A visual example of K3s server components." width="653" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; In smaller stores, a single-node K3s cluster can run all three services—order, payment, and inventory—on one device.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High Availability (HA) with K3s&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want your cluster to tolerate failures (e.g., a power outage at one device), K3s supports HA using embedded etcd. This mode is recommended for clusters that run critical systems and must remain available even during node failures.&lt;/p&gt;

&lt;p&gt;For resilience, K3s supports two HA modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedded etcd (Raft consensus)&lt;/strong&gt;: Suitable for production. Requires 3 or 5 server nodes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External SQL DB (PostgreSQL/MySQL)&lt;/strong&gt;: Lightweight but harder to scale.&lt;/li&gt;
&lt;/ul&gt;
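&lt;p&gt;As a sketch, the first server bootstraps embedded etcd and the remaining servers join it; with the K3s config file (&lt;code&gt;/etc/rancher/k3s/config.yaml&lt;/code&gt;) that can look like the following (token and address are placeholders):&lt;/p&gt;

```yaml
# /etc/rancher/k3s/config.yaml on the first server (bootstraps embedded etcd):
cluster-init: true
token: "REPLACE_WITH_SHARED_SECRET"   # placeholder shared secret

# On the second and third servers, join via the first instead:
# server: "https://first-server.store.lan:6443"   # placeholder address
# token: "REPLACE_WITH_SHARED_SECRET"
```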

&lt;h2&gt;
  
  
  &lt;strong&gt;What's up with k3sup?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;k3sup (pronounced "ketchup") is a community tool that simplifies K3s installation and joining agents to a cluster over SSH, making remote setup trivial. k3sup can also update your kubectl configuration file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Use k3sup?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remote installation via SSH&lt;/strong&gt;: great for air-gapped or low-access environments.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-line cluster provisioning&lt;/strong&gt;: ideal for bootstrapping multiple stores quickly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kubectl auto-merge&lt;/strong&gt;: automatically adds new K3s clusters to your KUBECONFIG file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bj9904l90laq62oyjuf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bj9904l90laq62oyjuf.png" alt="A visual example for flow of data in K3sup." width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Fleet: Centralized Multi-Cluster Management&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Fleet is Rancher’s GitOps controller purpose-built to manage thousands of Kubernetes clusters—ideal for widespread edge deployments like your coffee shop application. While K3s runs a lightweight cluster at each store, Fleet offers centralized visibility, version control, and deployment across all of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Use Fleet with K3s?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Central control, local autonomy&lt;/strong&gt;: You manage manifests, Helm charts, and configurations from a single Git repository. Each K3s store fetches its configuration independently.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Designed for scale&lt;/strong&gt;: Fleet supports &lt;strong&gt;up to 1 million clusters&lt;/strong&gt; and uses lightweight agents with minimal overhead.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pull-based sync&lt;/strong&gt;: Suitable for remote stores with dynamic IPs or unreliable connectivity.&lt;/li&gt;
&lt;/ul&gt;
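&lt;p&gt;Registration itself is declarative: you apply a &lt;code&gt;GitRepo&lt;/code&gt; resource to the management cluster and Fleet watches the repository from then on. A sketch (the repository URL and labels are placeholders):&lt;/p&gt;

```yaml
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: coffee-shop
  namespace: fleet-default
spec:
  repo: https://github.com/example/coffee-shop-fleet   # placeholder repo
  branch: main
  paths:
  - manifests
  # Deploy only to clusters labelled as edge stores.
  targets:
  - clusterSelector:
      matchLabels:
        location: edge
```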

&lt;p&gt;&lt;strong&gt;How Fleet Works?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Git Repository as Source of Truth&lt;/strong&gt;
You store application definitions (YAML, Helm, Kustomize, or OCI bundles) in a Git repo. These include:

&lt;ul&gt;
&lt;li&gt;fleet.yaml for configuration
&lt;/li&gt;
&lt;li&gt;Manifests for each microservice (order.yaml, payment.yaml, inventory.yaml)
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fleet Manager (Controller)&lt;/strong&gt;
The Fleet controller runs in a central management cluster (on-prem or in the cloud). It watches Git repositories and generates bundles for deployment.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fleet Agents in Edge Clusters&lt;/strong&gt;
Each K3s cluster runs a Fleet agent. This agent:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Connects back to the Fleet manager securely.
&lt;/li&gt;
&lt;li&gt;Pulls the correct workload bundle.
&lt;/li&gt;
&lt;li&gt;Applies the manifests locally.
&lt;/li&gt;
&lt;li&gt;Reports status.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1fl6p8fp8pyiup47hfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1fl6p8fp8pyiup47hfk.png" alt="A visual example of K3s integration with Fleet." width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coffee Shop Deployment: &lt;code&gt;fleet.yaml&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;defaultNamespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-shop&lt;/span&gt;  
  &lt;span class="s"&gt;targets&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;edge-shops&lt;/span&gt;  
    &lt;span class="s"&gt;clusterSelector&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;edge&lt;/span&gt;  
    &lt;span class="na"&gt;yaml&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;\- path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;manifests/order.yaml&lt;/span&gt;  
      &lt;span class="na"&gt;\- path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;manifests/payment.yaml&lt;/span&gt;  
      &lt;span class="na"&gt;\- path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;manifests/inventory.yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits of Using Fleet in Edge Deployments&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resilient by Design:&lt;/strong&gt; If a cluster goes offline, Fleet retries syncing when it's back online. No manual recovery needed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Declarative GitOps Workflow:&lt;/strong&gt; All changes are Git-driven and reproducible. Every store gets the same, tested configuration.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security &amp;amp; Separation:&lt;/strong&gt; Since clusters &lt;strong&gt;pull&lt;/strong&gt; workloads from Fleet, they don't need inbound access or public IPs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Remote Access&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Because edge locations (coffee shops) often lack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Static IPs&lt;/li&gt;
&lt;li&gt;VPNs&lt;/li&gt;
&lt;li&gt;SSH access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fleet enables secure pull-based management, where the cluster reaches out to Fleet rather than requiring a central controller to initiate communication.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Alternatives and Tradeoffs&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fleet (GitOps)&lt;/td&gt;
&lt;td&gt;Secure, scalable, automatic retries&lt;/td&gt;
&lt;td&gt;Requires pre-registration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VPN&lt;/td&gt;
&lt;td&gt;Secure tunneling&lt;/td&gt;
&lt;td&gt;Complex to set up at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSH&lt;/td&gt;
&lt;td&gt;Quick setup, compatible with k3sup&lt;/td&gt;
&lt;td&gt;Hard to scale, brittle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inlets&lt;/td&gt;
&lt;td&gt;Easy tunneling of HTTP/TCP&lt;/td&gt;
&lt;td&gt;Requires cloud relay or tunnel server&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When managing 1,000+ coffee shops, you can't SSH into each branch to make changes. Most shops lack static IPs or stable internet. Instead of managing clusters by pushing changes from a central control plane, Fleet agents run inside each K3s cluster and initiate outbound connections to the central Fleet controller. This helps you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avoid the need for public or routable IP addresses.
&lt;/li&gt;
&lt;li&gt;Work over common outbound ports (e.g., HTTPS).
&lt;/li&gt;
&lt;li&gt;Enable auto-recovery when clusters reboot or reconnect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wvy3kwo6cx3n7oqoslk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wvy3kwo6cx3n7oqoslk.png" alt="A visual example of remote access for Fleet and Edge of coffee shop." width="609" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, in coffee shop app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A barista at "Store 042" restarts their Raspberry Pi.
&lt;/li&gt;
&lt;li&gt;It boots K3s, pulls the latest manifests from Fleet via Git.
&lt;/li&gt;
&lt;li&gt;Fleet ensures order-service, payment-service, and inventory-service run correctly without manual intervention.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Air-Gapped Environment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Many remote stores may lack internet access. K3s and Fleet support full air-gapped deployments.&lt;/p&gt;

&lt;p&gt;In these setups, you prepare everything ahead of time using USB drives or local staging networks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Where to Load&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;K3s binary&lt;/td&gt;
&lt;td&gt;Flash drive or staging laptop&lt;/td&gt;
&lt;td&gt;Install manually&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container images&lt;/td&gt;
&lt;td&gt;/var/lib/rancher/k3s/agent/images/&lt;/td&gt;
&lt;td&gt;Use ctr images export and import&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fleet.yaml and app manifests&lt;/td&gt;
&lt;td&gt;Local Git repo clone or OCI registry mirror&lt;/td&gt;
&lt;td&gt;Load from USB or portable device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fleet agent&lt;/td&gt;
&lt;td&gt;Bundled with app or installed offline&lt;/td&gt;
&lt;td&gt;Uses local bundle and syncs when connected&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; For air-gapped shops, you can use a portable Git repo or OCI registry mirror hosted temporarily on a laptop or USB stick.&lt;/p&gt;

&lt;p&gt;Fleet still works offline by syncing from this local source. Later, when connectivity returns, these nodes can reconnect and pull updates as usual.&lt;/p&gt;
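&lt;p&gt;K3s can also pull images from a local mirror via &lt;code&gt;/etc/rancher/k3s/registries.yaml&lt;/code&gt;, which suits images staged on a laptop or small registry inside the store network (the mirror address is a placeholder):&lt;/p&gt;

```yaml
# /etc/rancher/k3s/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "http://192.168.1.50:5000"   # placeholder: staging registry in the store
```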

&lt;h2&gt;
  
  
  &lt;strong&gt;What Is MQTT and Why Use It at the Edge?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;MQTT (Message Queuing Telemetry Transport) is a lightweight, publish-subscribe messaging protocol designed for constrained devices and unreliable networks—making it ideal for edge computing.&lt;/p&gt;

&lt;p&gt;For example, in the Coffee Shop app, each store has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A temperature sensor inside a fridge.
&lt;/li&gt;
&lt;li&gt;An espresso machine that counts daily cycles.
&lt;/li&gt;
&lt;li&gt;A grinder that logs wear or usage hours.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MQTT enables these devices to publish events (like temp=97°F) to a local broker running in the store (e.g., Mosquitto). Meanwhile, your backend services (e.g., an OpenFaaS function or logging collector) subscribe to relevant topics (/store104/fridge/temp).&lt;/p&gt;

&lt;p&gt;Why MQTT at the Edge?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Offline Resilience&lt;/strong&gt;: Devices buffer messages even when the internet is unavailable.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loose Coupling&lt;/strong&gt;: Producers don’t need to know who the consumers are.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous&lt;/strong&gt;: Ideal for non-blocking sensor data or logs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth Efficient&lt;/strong&gt;: Optimized for unreliable or metered networks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When connectivity returns, MQTT relays buffered messages to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;central cloud MQTT broker&lt;/strong&gt; (e.g., EMQX, HiveMQ, or AWS IoT)
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;data lake&lt;/strong&gt;, &lt;strong&gt;time-series database&lt;/strong&gt;, or &lt;strong&gt;dashboard&lt;/strong&gt; via Kafka or Prometheus exporters&lt;/li&gt;
&lt;/ul&gt;
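&lt;p&gt;As an illustrative sketch (not code from the app), the publishing side might build topics and payloads like this. The &lt;code&gt;make_reading&lt;/code&gt; helper and the commented paho-mqtt calls are assumptions, following the /store104/fridge/temp layout described above:&lt;/p&gt;

```python
import json
import time

def make_reading(store_id, device, metric, value, unit):
    """Build the MQTT topic and JSON payload for one sensor reading."""
    topic = f"/store{store_id}/{device}/{metric}"
    payload = json.dumps({metric: value, "unit": unit, "ts": time.time()})
    return topic, payload

topic, payload = make_reading(104, "fridge", "temp", 97.4, "F")
# topic is "/store104/fridge/temp"

# Publishing with paho-mqtt (assumes a local Mosquitto broker on port 1883):
#   import paho.mqtt.client as mqtt
#   client = mqtt.Client()
#   client.connect("localhost", 1883)
#   client.publish(topic, payload, qos=1)  # qos=1 asks the broker to retry delivery
```

Because the publish step needs a running broker, it is shown commented out; the topic and payload construction is the part your services would share.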

&lt;h2&gt;
  
  
  &lt;strong&gt;Using OpenFaaS for Functions at the Edge&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenFaaS (Function-as-a-Service) is a serverless framework designed to run lightweight, containerized functions on Kubernetes—including K3s at the edge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Use It at the Edge?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces the overhead of running full-blown apps
&lt;/li&gt;
&lt;li&gt;Fast deployment via Kubernetes CRDs
&lt;/li&gt;
&lt;li&gt;Supports HTTP, MQTT, and CRON as event sources
&lt;/li&gt;
&lt;li&gt;Integrates with container registries and GitOps workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1f4hss36af6zj3bcgxk2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1f4hss36af6zj3bcgxk2.png" alt="A visual example in coffee shop app using MQTT." width="800" height="48"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, in the coffee shop app, consider store 104, which has a Raspberry Pi that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receives real-time fridge temperature via MQTT.
&lt;/li&gt;
&lt;li&gt;Triggers a Python-based function when data is published.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  
    &lt;span class="n"&gt;temp&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt; \&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Warning: Too hot\!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Temperature OK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
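&lt;p&gt;The handler above expects a bare number, but sensors in this setup often publish JSON payloads. A JSON-aware variant might look like this; it is a sketch, not the article's exact function:&lt;/p&gt;

```python
import json

def handle(req):
    """OpenFaaS-style handler accepting either a JSON body or a plain numeric string."""
    try:
        body = json.loads(req)
    except json.JSONDecodeError:
        body = req  # fall back to treating the request as a raw number
    temp = float(body["temp"]) if isinstance(body, dict) else float(body)
    if temp > 95:
        return "Warning: Too hot!"
    return "Temperature OK"

print(handle('{"temp": 97.4, "unit": "F"}'))  # Warning: Too hot!
print(handle("72"))                            # Temperature OK
```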



&lt;p&gt;&lt;strong&gt;Other Edge Functions You Can Run:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detect low inventory thresholds and alert inventory-service.
&lt;/li&gt;
&lt;li&gt;Sanitize incoming sensor data and write to local disk.
&lt;/li&gt;
&lt;li&gt;Detect anomalies and forward to a machine-learning model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenFaaS wraps this handler as a function, builds a container image, and deploys it via Kubernetes CRDs. It supports MQTT, HTTP, and CRON-based triggers.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Integrating MQTT and OpenFaaS&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Integrating MQTT and OpenFaaS provides a powerful, event-driven model for edge computing. MQTT acts as the event transport layer, while OpenFaaS serves as the event processor. Together, they create a reactive system that processes local data in real time and only syncs to the cloud when necessary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ibuf581ugogwmpbgtak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ibuf581ugogwmpbgtak.png" alt="A visual example integrating MQTT and OpenFaas for coffee shop app." width="800" height="750"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Consider store 104's fridge monitoring in the coffee shop app:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sensor detects fridge temperature and publishes a message:

&lt;ul&gt;
&lt;li&gt;Topic: /store104/fridge/temp
&lt;/li&gt;
&lt;li&gt;Payload: {"temp": 97.4, "unit": "F"}
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Mosquitto (local MQTT broker) receives the message.
&lt;/li&gt;
&lt;li&gt;OpenFaaS MQTT Connector subscribes to the topic, reads the payload, and invokes a &lt;strong&gt;temperature-check function&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;The function evaluates the temperature:

&lt;ul&gt;
&lt;li&gt;Logs a warning if it's too high.
&lt;/li&gt;
&lt;li&gt;Optionally triggers an alert (e.g., send webhook, write to disk, store in SQLite).
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If the internet is offline, logs and events are buffered locally.

&lt;ul&gt;
&lt;li&gt;MQTT persists the message.
&lt;/li&gt;
&lt;li&gt;OpenFaaS writes logs to a mounted persistent volume.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;When connectivity is restored:

&lt;ul&gt;
&lt;li&gt;Logs or alerts can be forwarded to a centralized logging system or metrics database.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
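&lt;p&gt;The topic-to-function dispatch in this flow can be sketched as a small routing table. This is illustrative logic only; the real OpenFaaS MQTT connector invokes functions over HTTP via the gateway, and &lt;code&gt;route_message&lt;/code&gt; and &lt;code&gt;ROUTES&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```python
import json

def check_temperature(payload):
    """Stand-in for the temperature-check function the connector would invoke."""
    reading = json.loads(payload)
    if reading["temp"] > 95:
        return "Warning: Too hot!"
    return "Temperature OK"

# Map MQTT topics to the functions subscribed to them.
ROUTES = {
    "/store104/fridge/temp": check_temperature,
}

def route_message(topic, payload):
    """Invoke the function registered for this topic, if any."""
    handler = ROUTES.get(topic)
    if handler is None:
        return None  # no function subscribed to this topic
    return handler(payload)

result = route_message("/store104/fridge/temp", '{"temp": 97.4, "unit": "F"}')
```

In a real deployment the connector subscribes to the broker and performs this lookup for every incoming message.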

&lt;h2&gt;
  
  
  &lt;strong&gt;Logging and Storage in Edge Scenarios&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In edge environments—like individual coffee shop branches—connectivity isn't guaranteed. Devices must operate independently for logging, metrics collection, and temporary data storage. Here's how to plan for these constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Local Storage Considerations&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Each store’s K3s device should have storage for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System:&lt;/strong&gt; K3s binary, operating system, configuration files.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime:&lt;/strong&gt; Container logs, telemetry, ephemeral workloads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent Data:&lt;/strong&gt; Customer orders, payment events, inventory state.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures store operations can continue even if cloud connectivity is lost.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Logging Options&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;K3s includes metrics-server for lightweight resource monitoring. For advanced use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus/Grafana can run on each edge cluster for real-time observability.
&lt;/li&gt;
&lt;li&gt;Use Loki or Elasticsearch to forward logs when reconnected.
&lt;/li&gt;
&lt;li&gt;Local logs can be written using Kubernetes emptyDir or persistent volumes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; For more information, refer to &lt;a href="https://programmerprodigy.code.blog/2025/06/04/understanding-observability-with-opentelemetry-and-coffee/" rel="noopener noreferrer"&gt;Understanding Observability with OpenTelemetry and Coffee&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Syncing Logs and Events&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Logs and events generated at the edge can be buffered and forwarded later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use MQTT to publish sensor data or application events to a central broker.
&lt;/li&gt;
&lt;li&gt;Store logs on local disk or persistent volume.
&lt;/li&gt;
&lt;li&gt;Sync to the cloud (object storage, log aggregation system) when a connection becomes available.&lt;/li&gt;
&lt;/ul&gt;
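&lt;p&gt;The buffer-then-sync flow above can be sketched in a few lines. &lt;code&gt;SyncBuffer&lt;/code&gt; is a hypothetical name, and a production setup would persist records to disk or an MQTT queue rather than an in-memory list:&lt;/p&gt;

```python
class SyncBuffer:
    """Buffer records locally; flush them upstream once a connection exists."""

    def __init__(self, send):
        self.send = send   # callable that ships one record to the cloud
        self.pending = []  # stand-in for a disk-backed queue

    def log(self, record, online):
        if online:
            self.flush(online=True)  # drain the backlog first, preserving order
            self.send(record)
        else:
            self.pending.append(record)

    def flush(self, online):
        if not online:
            return
        while self.pending:
            self.send(self.pending.pop(0))

shipped = []
buf = SyncBuffer(shipped.append)
buf.log("order #1", online=False)  # offline: buffered locally
buf.log("order #2", online=False)
buf.log("order #3", online=True)   # back online: backlog drains, then #3 ships
```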

&lt;h3&gt;
  
  
  &lt;strong&gt;Best Practices&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cache locally:&lt;/strong&gt; Ensure services like inventory-service and payment-service write to disk.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use queues:&lt;/strong&gt; MQTT or Redis queues help avoid data loss when offline.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSI Plugins:&lt;/strong&gt; Use Kubernetes-compatible storage interfaces suited for edge devices.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backups:&lt;/strong&gt; Use the 3-2-1 rule: 3 copies, 2 local (disk + USB/flash), 1 remote (cloud or data center).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Addressing Common Edge Challenges with K3s and Fleet&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The combined power of K3s and Fleet helps you overcome typical edge computing hurdles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connectivity&lt;/strong&gt;: K3s nodes might go offline, but Fleet handles retries and synchronizes changes when network connectivity is restored.

&lt;ul&gt;
&lt;li&gt;For scenarios where continuous connectivity is a concern, MQTT can buffer and forward messages from edge devices to a central broker.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Upgrades&lt;/strong&gt;: K3s offers a simplified one-binary upgrade process, and Fleet centrally manages the redeployment of your applications across clusters.
&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Storage&lt;/strong&gt;: At the edge, you need to consider storage for the operating system, K3s itself, logged data, and container images.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local Data&lt;/strong&gt;: For local logs or transactions, you can use persistent volumes (if available) or write data to local disk and sync to the cloud when connected.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concerns&lt;/strong&gt;: Be mindful of latency, capacity, and reliability when designing your storage solution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backups&lt;/strong&gt;: Employ the 3-2-1 backup method: three copies of data, on two different media, with one copy off-site.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

</description>
      <category>microservices</category>
      <category>rancher</category>
    </item>
    <item>
      <title>Simplifying Microservices with Istio and Service Mesh Architecture</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Thu, 03 Jul 2025 07:04:23 +0000</pubDate>
      <link>https://dev.to/aws-builders/simplifying-microservices-with-istio-and-service-mesh-architecture-41bp</link>
      <guid>https://dev.to/aws-builders/simplifying-microservices-with-istio-and-service-mesh-architecture-41bp</guid>
      <description>&lt;p&gt;As apps shift from monoliths to microservices, managing service-to-service communication becomes complex. Developers must handle traffic routing, retries, timeouts, load balancing, TLS encryption, metrics, and logs for each service. This leads to duplicated code and operational complexity.&lt;/p&gt;

&lt;p&gt;Service mesh is an infrastructure layer that manages service-to-service communication transparently. Istio, a popular open-source service mesh, addresses these challenges by deploying Envoy proxies as sidecars to your pods. The proxies intercept traffic and apply consistent policies without requiring code changes.&lt;/p&gt;

&lt;p&gt;I use a coffee shop microservices app as a running example, since I love coffee. The app includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;order-service (handles customer orders)
&lt;/li&gt;
&lt;li&gt;payment-service (processes payments)
&lt;/li&gt;
&lt;li&gt;inventory-service (manages coffee inventory)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Core Capabilities of Istio&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Istio provides powerful capabilities for traffic management, security, and observability within a microservices environment.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Traffic Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Define routing, fault injection, and load balancing using CRDs like &lt;code&gt;VirtualService&lt;/code&gt;, &lt;code&gt;DestinationRule&lt;/code&gt;, and &lt;code&gt;ServiceEntry&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Traffic Routing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flexible traffic routing configurations using &lt;code&gt;VirtualService&lt;/code&gt; and &lt;code&gt;DestinationRule&lt;/code&gt; resources.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resiliency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enforce timeouts, retries, circuit breaking, and failover without changing app logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mesh Extension&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Integrate external services, Virtual Machines (VMs), and custom Envoy configurations into the mesh&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apply strong authentication (AuthN) and authorization (AuthZ) using mTLS and identity-based policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Export telemetry (metrics, traces, logs) using integrations like SigNoz, Prometheus, Jaeger, and Kiali&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Installing Istio&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Istio provides flexible installation options to support different environments and use cases. You can install Istio using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;istioctl&lt;/code&gt; CLI
&lt;/li&gt;
&lt;li&gt;Helm charts
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Istio Operator&lt;/code&gt; (for GitOps and declarative installs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Istio includes predefined installation profiles optimized for different scenarios. Each profile configures control and data plane components through the &lt;code&gt;IstioOperator&lt;/code&gt; resource.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;istioctl profile list&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Istio provides different profiles:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;default&lt;/td&gt;
&lt;td&gt;Production-ready; installs control plane and ingress gateway.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;demo&lt;/td&gt;
&lt;td&gt;Best for demos and learning; enables tracing, logging, ingress, and egress.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;minimal&lt;/td&gt;
&lt;td&gt;Control plane only; no gateways.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;external&lt;/td&gt;
&lt;td&gt;Used in remote clusters for multi-cluster mesh; installs nothing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;empty&lt;/td&gt;
&lt;td&gt;Baseline config for custom setups.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;preview&lt;/td&gt;
&lt;td&gt;Includes experimental features.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ambient&lt;/td&gt;
&lt;td&gt;Sets up sidecar-less ambient mesh (Alpha; not for production).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To install Istio using the Istio CLI, you can use the &lt;code&gt;--set&lt;/code&gt; flag to specify the profile:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;istioctl install --set profile=demo&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Use the demo profile during development to enable full telemetry. For production, switch to default to improve performance and security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get full configuration of a profile:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;istioctl profile dump demo&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compare two profiles:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;istioctl profile diff demo default&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Combining Helm and Operator&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You can use the IstioOperator resource alongside Helm:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Helm to install base components.
&lt;/li&gt;
&lt;li&gt;Use IstioOperator to apply profile-level and mesh-level configurations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This modular setup is useful in environments where GitOps or CI/CD pipelines manage different aspects of configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3phznx6jtui5xnf5p6h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3phznx6jtui5xnf5p6h.png" alt="An image displaying Combining Helm and Operator" width="800" height="182"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Istio Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Istio consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data Plane: Lightweight Envoy proxies injected as sidecars to each pod. These proxies handle all ingress and egress traffic for the pod.
&lt;/li&gt;
&lt;li&gt;Control Plane: The Istiod component configures and manages the behavior of Envoy proxies by pushing policies and configuration dynamically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following diagram shows the overall architecture of an Istio-based application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdq4hwvpvrmltt7g0ink.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdq4hwvpvrmltt7g0ink.png" alt="The overall architecture of an Istio-based application." width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://istio.io/latest/docs/ops/deployment/architecture/" rel="noopener noreferrer"&gt;Istio documentation.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, each pod in the Coffee Shop app has a sidecar Envoy proxy that intercepts all traffic. This enables Istio to provide seamless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic routing
&lt;/li&gt;
&lt;li&gt;mTLS encryption
&lt;/li&gt;
&lt;li&gt;Metrics and tracing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51b8xyy55cgcfexoqcd1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51b8xyy55cgcfexoqcd1.png" alt="An architecture image of coffee shop app " width="800" height="129"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Sidecar&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Manually modifying manifests to add sidecars is error-prone and not scalable. Istio supports two approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual Sidecar&lt;/strong&gt;: Use the istioctl CLI to inject sidecars into your YAML manifests. Run the following command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;istioctl kube-inject -f deployment.yaml | kubectl apply -f -&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic Sidecar&lt;/strong&gt;: This is the recommended approach for most use cases. Istio uses a &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook" rel="noopener noreferrer"&gt;Mutating Admission Webhook&lt;/a&gt; to inject sidecars into all pods created in a namespace labeled with &lt;code&gt;istio-injection=enabled&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; All new pods in that namespace get sidecars injected automatically.&lt;/p&gt;

&lt;p&gt;To enable for a namespace, run:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl label namespace coffee-shop istio-injection=enabled&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Routing Traffic Through Sidecars&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Istio uses &lt;strong&gt;iptables rules&lt;/strong&gt; or &lt;strong&gt;CNI plugins&lt;/strong&gt; to transparently route traffic through the Envoy sidecar.&lt;/p&gt;

&lt;p&gt;An init container sets up the iptables rules before the application starts. This ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Outbound traffic from the app is redirected to the Envoy sidecar
&lt;/li&gt;
&lt;li&gt;Inbound traffic hits the sidecar before reaching the app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, a request from &lt;code&gt;order-service&lt;/code&gt; to &lt;code&gt;payment-service&lt;/code&gt; is transparently routed via sidecars.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56ptab74sszazfgo3rej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56ptab74sszazfgo3rej.png" alt="An image of request from order-service to payment-service routed via sidecars." width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Configuring Envoy Proxies&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When an app makes a call to another service, that call is now intercepted by its sidecar. The job of configuring the proxies with all the information they need to handle both incoming and outgoing traffic falls to the Istio control plane.&lt;/p&gt;

&lt;p&gt;The Istio control plane configures all Envoy proxies dynamically using &lt;strong&gt;xDS APIs&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Delivered via Istio Control Plane&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Route discovery&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;VirtualService&lt;/code&gt; and &lt;code&gt;DestinationRule&lt;/code&gt; updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load balancing&lt;/td&gt;
&lt;td&gt;Weighted or subset-based routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry and timeout&lt;/td&gt;
&lt;td&gt;Policy enforcement without app changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mTLS and security&lt;/td&gt;
&lt;td&gt;Dynamic certificate provisioning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Istio automates all sidecar configurations according to the current mesh topology. Each time services are added, removed, or updated, Istio ensures that the latest configurations—network, routing, or security policies—are distributed to the appropriate sidecars.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4x6kqbr9yuuimtnhogib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4x6kqbr9yuuimtnhogib.png" alt="A flow of data image when order-service wants to call payment-service in coffee shop example" width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, order-service wants to call payment-service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Istio discovers endpoints of payment-service
&lt;/li&gt;
&lt;li&gt;Pushes routing config to order-service’s proxy
&lt;/li&gt;
&lt;li&gt;Applies retries, timeouts, load balancing, and TLS&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Traffic Management&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Istio's traffic management relies on Custom Resource Definitions (CRDs) like &lt;code&gt;VirtualService&lt;/code&gt;, &lt;code&gt;DestinationRule&lt;/code&gt;, and &lt;code&gt;ServiceEntry&lt;/code&gt; to define fine-grained routing, resiliency, and fault injection policies.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Ingress and Egress Gateways&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Istio uses the &lt;code&gt;Gateway&lt;/code&gt; resource to manage how Envoy proxies handle inbound and outbound traffic. Unlike Kubernetes Ingress, Istio Gateways provide richer Layer 7 routing.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gateway Type&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Example Scenario&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ingress Gateway&lt;/td&gt;
&lt;td&gt;Accepts external traffic into the mesh&lt;/td&gt;
&lt;td&gt;Client → Ingress Gateway → order-service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Egress Gateway&lt;/td&gt;
&lt;td&gt;Manages outbound traffic to external APIs&lt;/td&gt;
&lt;td&gt;payment-service → Egress Gateway → Payment API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: &lt;code&gt;Gateway&lt;/code&gt; and &lt;code&gt;VirtualService&lt;/code&gt; Configuration&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;

&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Gateway&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-ingress&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;istio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingressgateway&lt;/span&gt;

  &lt;span class="na"&gt;servers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;\- port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;

        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;

        &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HTTP&lt;/span&gt;

      &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="s"&gt;\- "order.coffee.com"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;VirtualService with Gateway&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To route external traffic to internal services, a &lt;code&gt;Gateway&lt;/code&gt; must be used in conjunction with a &lt;code&gt;VirtualService&lt;/code&gt;. If a &lt;code&gt;VirtualService&lt;/code&gt; is not bound to a Gateway, Envoy returns an HTTP 404, indicating no route has been defined.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhf8iv4591ztv4chr8dkx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhf8iv4591ztv4chr8dkx.png" alt="A flow of data with virtualService and Gateway in coffee shop app" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create a corresponding &lt;code&gt;VirtualService&lt;/code&gt; that binds to the Gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;

&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VirtualService&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;order-route&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="s"&gt;\- "\*"&lt;/span&gt;

  &lt;span class="na"&gt;gateways&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="s"&gt;\- coffee-gateway&lt;/span&gt;

  &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;order-service.default.svc.cluster.local&lt;/span&gt;

        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

          &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Apply the configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f order-route.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test the route:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;curl -v http://$GATEWAY_IP/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Usually, the &lt;code&gt;istio-ingressgateway&lt;/code&gt; service is exposed using the Kubernetes LoadBalancer type, which assigns an external IP to receive HTTP(S) traffic.&lt;/p&gt;

&lt;p&gt;How the LoadBalancer Kubernetes service type works depends on how and where you run the Kubernetes cluster. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;LoadBalancer Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS, GCP, Azure&lt;/td&gt;
&lt;td&gt;Provisions a cloud load balancer and assigns external IP.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minikube&lt;/td&gt;
&lt;td&gt;Requires minikube tunnel to simulate external access.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, in the coffee shop app, gateways expose services such as &lt;code&gt;order-service&lt;/code&gt;, &lt;code&gt;payment-service&lt;/code&gt;, and &lt;code&gt;inventory-service&lt;/code&gt; to the outside world or to external systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9aqhngpdv80ro3093mtc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9aqhngpdv80ro3093mtc.png" alt="An image displaying gateways for services in coffee shop app." width="800" height="96"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Traffic Routing and Resiliency&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Istio allows flexible traffic routing configurations using &lt;code&gt;VirtualService&lt;/code&gt; and &lt;code&gt;DestinationRule&lt;/code&gt; resources.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;VirtualService&lt;/td&gt;
&lt;td&gt;Defines traffic routing rules to one or more destinations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DestinationRule&lt;/td&gt;
&lt;td&gt;Configures policies for routed traffic, such as load balancing and TLS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ServiceEntry&lt;/td&gt;
&lt;td&gt;Adds external services to the mesh registry.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, consider a coffee-shop app with these services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;web-frontend&lt;/code&gt;: the UI for customers.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;customer-service&lt;/code&gt;: handles customer profiles.

&lt;ul&gt;
&lt;li&gt;Two versions of customer-service: v1 and v2.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ezidu2z7d4bn9dpre6j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ezidu2z7d4bn9dpre6j.png" alt="An image of coffee shop Kubernetes pod." width="619" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can define subsets of a service based on labels in the pod spec (e.g., &lt;code&gt;version: v1&lt;/code&gt; or &lt;code&gt;version: v2&lt;/code&gt; for customer-service), then reference those labels as subsets in a &lt;code&gt;DestinationRule&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;

&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DestinationRule&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

  &lt;span class="na"&gt;subsets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;

    &lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;

  &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v2&lt;/span&gt;

    &lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Routing Traffic with VirtualService&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the &lt;code&gt;VirtualService&lt;/code&gt;, you can specify the traffic matching and routing rules that decide which destinations traffic is routed to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; To generate some traffic, open a separate terminal window and start making requests to the &lt;strong&gt;GATEWAY_IP&lt;/strong&gt; in an endless loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GATEWAY_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get svc &lt;span class="se"&gt;\-&lt;/span&gt;n istio-system istio-ingressgateway &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nv"&gt;ojsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.loadBalancer.ingress\[0\].ip}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;curl http://&lt;span class="nv"&gt;$GATEWAY_IP&lt;/span&gt;/&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Weight-Based Routing&lt;/strong&gt;: Distributes traffic across different subsets of the same service based on assigned weights (e.g., 70% to v1 and 30% to v2).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;

&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VirtualService&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="s"&gt;\- customer-service.default.svc.cluster.local&lt;/span&gt;

  &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

        &lt;span class="na"&gt;subset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;

      &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;70&lt;/span&gt;

    &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

        &lt;span class="na"&gt;subset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v2&lt;/span&gt;

      &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Match-Based Routing&lt;/strong&gt;: Routes traffic based on specific conditions, such as HTTP headers (e.g., User-Agent) or URI paths.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;\- match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;user-agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;*Firefox.&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;

  &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

      &lt;span class="na"&gt;subset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;

&lt;span class="na"&gt;\- route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

      &lt;span class="na"&gt;subset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Redirect and Rewrite&lt;/strong&gt;: Redirects traffic (HTTP 301) to a different URI or hostname, or rewrites path prefixes before forwarding.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;\- match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;exact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/v1/hello&lt;/span&gt;

  &lt;span class="na"&gt;redirect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/v2/hello&lt;/span&gt;

    &lt;span class="na"&gt;authority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello.default.svc.cluster.local&lt;/span&gt;

&lt;span class="na"&gt;Rewrite path prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;\- match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/v1/api&lt;/span&gt;

  &lt;span class="na"&gt;rewrite&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/v2/api&lt;/span&gt;

  &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;redirect&lt;/code&gt; and &lt;code&gt;destination&lt;/code&gt; fields are mutually exclusive: if you set &lt;code&gt;redirect&lt;/code&gt;, there is no need to set a destination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mirroring Traffic&lt;/strong&gt;: Sends a copy of live traffic to another service version (e.g., mirroring 100% of traffic sent to v1 to v2). This "fire and forget" mechanism is useful for testing and debugging with production traffic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;\- route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

      &lt;span class="na"&gt;subset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;

    &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;

  &lt;span class="na"&gt;mirror&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

    &lt;span class="na"&gt;subset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v2&lt;/span&gt;

  &lt;span class="na"&gt;mirrorPercentage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Header Manipulation&lt;/strong&gt;: Allows you to add, set, or remove request and response headers, either for individual destinations or all destinations within a &lt;code&gt;VirtualService&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;\- headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;set&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="na"&gt;debug&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;

&lt;span class="err"&gt;  &lt;/span&gt;&lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

      &lt;span class="na"&gt;subset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v2&lt;/span&gt;

    &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;

  &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

      &lt;span class="na"&gt;subset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;

    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="na"&gt;remove&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="s"&gt;\- x-api-key&lt;/span&gt;

    &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;70&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the example above, the &lt;strong&gt;debug: true&lt;/strong&gt; request header is set for all traffic sent to the host, and the &lt;strong&gt;x-api-key&lt;/strong&gt; response header is removed for the v1 subset. So whenever traffic reaches subset v1, the response from the service will not include the &lt;strong&gt;x-api-key&lt;/strong&gt; header.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AND Matching&lt;/strong&gt;: Rules can combine multiple conditions using AND logic (e.g., matching a URI prefix &lt;em&gt;and&lt;/em&gt; a specific header).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;\- uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/v1&lt;/span&gt;

  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;exact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;debug&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OR Matching&lt;/strong&gt;: Rules can combine multiple conditions using OR logic (matching either a URI prefix &lt;em&gt;or&lt;/em&gt; a header).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;\- uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/v1&lt;/span&gt;

&lt;span class="na"&gt;\- headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;exact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;debug&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the first &lt;strong&gt;match&lt;/strong&gt; does not evaluate to true, the algorithm moves to the second &lt;strong&gt;match&lt;/strong&gt; field and tries to match the header. If you omit the &lt;code&gt;match&lt;/code&gt; field on a route, it always evaluates to true.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: With either option, provide a fallback route where applicable, so that traffic that doesn’t match any of the conditions is still routed to a “default” destination.&lt;/p&gt;
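
&lt;p&gt;A minimal sketch of such a fallback, reusing the customer-service subsets defined earlier (the match condition is illustrative): the final route has no &lt;code&gt;match&lt;/code&gt;, so it always applies and catches everything the first rule misses.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;http:
- match:
  - uri:
      prefix: /v1
  route:
  - destination:
      host: customer-service.default.svc.cluster.local
      subset: v1
# No match field: always true, acts as the default route
- route:
  - destination:
      host: customer-service.default.svc.cluster.local
      subset: v2
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;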

&lt;h3&gt;
  
  
  &lt;strong&gt;Resiliency Patterns&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Istio enables you to apply resiliency policies at the network layer, reducing the need for application code changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Both retries and timeouts happen on the client side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timeouts&lt;/strong&gt;: If a request exceeds the configured timeout, Envoy cancels it and responds with HTTP 504 (Gateway Timeout).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;\- route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service.default.svc.cluster.local&lt;/span&gt;

      &lt;span class="na"&gt;subset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;

  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Retries&lt;/strong&gt;: If the first pod fails, Envoy retries with a different healthy endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;attempts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

  &lt;span class="na"&gt;perTryTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2s&lt;/span&gt;

  &lt;span class="na"&gt;retryOn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gateway-error,connect-failure,reset&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Circuit Breaking with Outlier Detection&lt;/strong&gt;: This prevents cascading failures by automatically rejecting requests to overloaded or unhealthy services.&lt;/p&gt;

&lt;p&gt;Istio implements circuit breaking using &lt;strong&gt;outlier detection&lt;/strong&gt;, a passive health-checking mechanism. Envoy doesn't actively probe services but observes runtime metrics such as failure rate, latency, and connection health.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;

&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DestinationRule&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service&lt;/span&gt;

  &lt;span class="na"&gt;trafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;outlierDetection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;consecutive5xxErrors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;

      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1s&lt;/span&gt;

      &lt;span class="na"&gt;baseEjectionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3m&lt;/span&gt;

      &lt;span class="na"&gt;maxEjectionPercent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;consecutive5xxErrors&lt;/strong&gt;: Number of consecutive 5xx responses before ejection.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;interval&lt;/strong&gt;: How often Envoy checks pod health.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;baseEjectionTime&lt;/strong&gt;: Initial duration a pod remains ejected. This increases with repeated failures.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;maxEjectionPercent&lt;/strong&gt;: Caps the percentage of pods that can be ejected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When thresholds are met, Envoy temporarily removes the unhealthy pod from the load-balancing pool. Over time, the pod is gradually reintroduced if it recovers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fault Injection&lt;/strong&gt;: This lets you simulate network failures or delays, which helps validate your service's resilience and fallback mechanisms.&lt;/p&gt;

&lt;p&gt;Istio supports two types of fault injection in the VirtualService:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Abort&lt;/strong&gt;: Simulate HTTP errors by terminating requests with a specified status code.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delay&lt;/strong&gt;: Introduce artificial latency before forwarding requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Abort 30% of requests:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;fault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;abort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;percentage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;

    &lt;span class="na"&gt;httpStatus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;404&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;:  If you omit the percentage field, all matching requests will be aborted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inject delay to 5% of requests:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;fault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;percentage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;

    &lt;span class="na"&gt;fixedDelay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fault injection only affects services matched by the &lt;code&gt;VirtualService&lt;/code&gt;. It does &lt;strong&gt;not&lt;/strong&gt; impact other consumers.&lt;/p&gt;
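
&lt;p&gt;For context, the &lt;code&gt;fault&lt;/code&gt; block sits alongside &lt;code&gt;route&lt;/code&gt; inside a &lt;code&gt;VirtualService&lt;/code&gt; rule. Here is a minimal sketch combining a delay fault with the customer-service subsets (the resource name and values are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: customer-service-fault
spec:
  hosts:
  - customer-service.default.svc.cluster.local
  http:
  - fault:
      delay:
        percentage:
          value: 5
        fixedDelay: 3s
    route:
    - destination:
        host: customer-service.default.svc.cluster.local
        subset: v1
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;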

&lt;h2&gt;
  
  
  &lt;strong&gt;Extending the Istio Mesh&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Istio provides mechanisms to bring external services and Virtual Machines (VMs) into the mesh, and to customize Envoy proxies.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Bringing External Services into the Mesh&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Istio tracks internal services automatically. To include external or non-Kubernetes services, use the &lt;code&gt;ServiceEntry&lt;/code&gt; custom resource. This allows you to manage traffic and apply policies like retries, timeouts, mirroring, and fault injection to external endpoints.&lt;/p&gt;

&lt;p&gt;For example, the Coffee Shop microservices application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;payment-service needs to call an external payment API (mesh-external)
&lt;/li&gt;
&lt;li&gt;rewards-service communicates with an internal legacy database (mesh-internal)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MESH_EXTERNAL&lt;/strong&gt;: Used for services outside the mesh (e.g., &lt;a href="http://www.googleapis.com" rel="noopener noreferrer"&gt;www.googleapis.com&lt;/a&gt;), typically with resolution: DNS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceEntry&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;googleapis-svc-entry&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="s"&gt;\- www.googleapis.com&lt;/span&gt;  
  &lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MESH_EXTERNAL&lt;/span&gt;  
  &lt;span class="na"&gt;resolution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DNS&lt;/span&gt;  
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;443&lt;/span&gt;  
    &lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https&lt;/span&gt;  
    &lt;span class="s"&gt;protocol&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TLS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;location: MESH_EXTERNAL: Specifies the service is outside the mesh.
&lt;/li&gt;
&lt;li&gt;resolution: DNS: Istio uses DNS to resolve the host.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MESH_INTERNAL&lt;/strong&gt;: Used for services within the mesh that do not have DNS, requiring resolution: &lt;code&gt;STATIC&lt;/code&gt; and explicit IP addresses. The hosts field is optional with &lt;code&gt;STATIC&lt;/code&gt; resolution. You can also use &lt;code&gt;workloadSelector&lt;/code&gt; for endpoint selection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceEntry&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;legacy-loyalty-db&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;addresses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="s"&gt;\- 192.192.192.192/24&lt;/span&gt;  
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;27018&lt;/span&gt;  
    &lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mongodb&lt;/span&gt;  
    &lt;span class="s"&gt;protocol&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MONGO&lt;/span&gt;  
  &lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MESH_INTERNAL&lt;/span&gt;  
  &lt;span class="na"&gt;resolution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STATIC&lt;/span&gt;  
  &lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.0.0.2&lt;/span&gt;  
  &lt;span class="na"&gt;\- address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.0.0.3&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;Outbound Traffic Policy&lt;/strong&gt;: The &lt;code&gt;REGISTRY_ONLY&lt;/code&gt; outbound traffic policy can be configured to ensure traffic is only allowed to known services registered in the mesh.&lt;/p&gt;

&lt;p&gt;Configure Mesh to Registry-Only:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;istioctl install --set profile=demo --set meshConfig.outboundTrafficPolicy.mode=REGISTRY_ONLY&lt;/code&gt;&lt;/p&gt;
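
&lt;p&gt;The same setting can be expressed declaratively in an &lt;code&gt;IstioOperator&lt;/code&gt; resource and applied with &lt;code&gt;istioctl install -f&lt;/code&gt; (a minimal sketch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  meshConfig:
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;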

&lt;p&gt;Confirm Configuration:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get cm -n istio-system istio -o yaml | grep -A 1 outboundTrafficPolicy&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Centralized Egress via Gateway&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use an Egress Gateway to manage and monitor all outbound traffic. This setup enables centralized TLS termination, access control, and observability.&lt;/p&gt;

&lt;p&gt;Required resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AuthorizationPolicy&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Gateway&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;VirtualService&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DestinationRule&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
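
&lt;p&gt;As a sketch, the &lt;code&gt;Gateway&lt;/code&gt; piece for a hypothetical external payment API could look like the following (the host name is illustrative); a &lt;code&gt;VirtualService&lt;/code&gt; would then route sidecar traffic through &lt;code&gt;istio-egressgateway&lt;/code&gt; to that host:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: payments-egress
  namespace: istio-system
spec:
  selector:
    istio: egressgateway
  servers:
  - port:
      number: 443
      name: tls
      protocol: TLS
    hosts:
    - api.payments.example.com
    tls:
      mode: PASSTHROUGH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;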

&lt;h3&gt;
  
  
  &lt;strong&gt;Onboarding VMs into the Mesh&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;VMs can join the mesh using the &lt;code&gt;WorkloadEntry&lt;/code&gt; and &lt;code&gt;WorkloadGroup&lt;/code&gt; resources. Istio treats VMs similarly to Kubernetes pods, assigning identities based on namespace and service account.&lt;/p&gt;

&lt;p&gt;The general procedure for onboarding a VM can be summarized by the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the Istio sidecar using &lt;code&gt;.deb&lt;/code&gt; or &lt;code&gt;.rpm&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Define a &lt;code&gt;WorkloadGroup&lt;/code&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;  
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WorkloadGroup&lt;/span&gt;  
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;barista-vm&lt;/span&gt;  
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-shop&lt;/span&gt;  
  &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;barista-service&lt;/span&gt;  
    &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;serviceAccount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;barista-account&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Configure the east-west gateway:
&lt;code&gt;./samples/multicluster/gen-eastwest-gateway.sh --single-cluster | istioctl install -y -f -&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Expose &lt;code&gt;istiod&lt;/code&gt;:
&lt;code&gt;kubectl apply -n istio-system -f ./samples/multicluster/expose-istiod.yaml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Generate and copy configs:
&lt;code&gt;istioctl x workload entry configure -f barista.yaml -o ./output-dir&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Place files in the correct locations and start the sidecar on the VM.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Communication when extending the mesh to VMs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Services in the cluster can reach VMs using DNS.
&lt;/li&gt;
&lt;li&gt;VMs can access services inside Kubernetes using mesh DNS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An east-west gateway is necessary to enable communication between the sidecar that will be running on the VM and istiod, the Istio control plane (see the &lt;a href="https://istio.io/latest/docs/ops/deployment/vm-architecture/" rel="noopener noreferrer"&gt;Istio documentation&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To Install the East-West Gateway and Expose Istiod&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the east-west gateway: &lt;strong&gt;&lt;code&gt;./samples/multicluster/gen-eastwest-gateway.sh --single-cluster | istioctl install -y -f -&lt;/code&gt;&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;If you list the pods in the &lt;strong&gt;istio-system&lt;/strong&gt; namespace you’ll notice the &lt;strong&gt;istio-eastwestgateway&lt;/strong&gt; instance was created.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Expose istiod through the east-west gateway: &lt;strong&gt;&lt;code&gt;kubectl apply -n istio-system -f ./samples/multicluster/expose-istiod.yaml&lt;/code&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvp95nh6odaumnh1cnut.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvp95nh6odaumnh1cnut.png" alt="An image displaying flow of data from Coffee Shop pods to VM + Sidecar." width="800" height="58"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Customizing and Extending Envoy Proxies&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Istio automatically generates the Envoy configuration for each proxy. However, for advanced use cases, you can customize this configuration and extend Envoy's functionality.&lt;/p&gt;

&lt;p&gt;Envoy's configuration is structured into several key components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Listeners:&lt;/strong&gt; Network locations (IP and port) where Envoy listens for incoming connections and requests. Istio generates multiple listeners for each sidecar.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filters&lt;/strong&gt;: Ordered lists of processing logic that a request flows through (Listener, Network, and HTTP filters). The router filter is typically the last HTTP filter and is responsible for routing traffic.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routes&lt;/strong&gt;: URI/path-based traffic routing rules defined within the route configuration. These rules match incoming requests and specify where traffic should be sent.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clusters&lt;/strong&gt;: Groups of similar upstream hosts (destinations or servers), analogous to Kubernetes Services, that accept traffic.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Endpoints&lt;/strong&gt;: Concrete &lt;code&gt;IP:port&lt;/code&gt; pairs within a cluster, representing the specific addresses where traffic can be sent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy63roukzs6livzn2t1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy63roukzs6livzn2t1t.png" alt="A flow displaying extending envoy proxies for coffee shop app" width="800" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, when a request reaches coffee-frontend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Envoy listens on port 15001.
&lt;/li&gt;
&lt;li&gt;Filters inspect and process the request.
&lt;/li&gt;
&lt;li&gt;Routing sends it to the barista-service cluster.
&lt;/li&gt;
&lt;li&gt;One of the barista pods (endpoint) handles the request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can inspect the Envoy configuration using the &lt;code&gt;istioctl proxy-config&lt;/code&gt; command. For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;istioctl proxy-config clusters coffee-frontend-xyz --namespace coffee-shop&lt;/code&gt;&lt;/p&gt;
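
&lt;p&gt;The other configuration layers can be inspected the same way (the pod name is illustrative):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;istioctl proxy-config listeners coffee-frontend-xyz --namespace coffee-shop&lt;/code&gt;&lt;br&gt;
&lt;code&gt;istioctl proxy-config routes coffee-frontend-xyz --namespace coffee-shop&lt;/code&gt;&lt;br&gt;
&lt;code&gt;istioctl proxy-config endpoints coffee-frontend-xyz --namespace coffee-shop&lt;/code&gt;&lt;/p&gt;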

&lt;p&gt;The EnvoyFilter resource allows you to customize portions of the auto-generated Envoy proxy configuration by patching existing settings. This enables updating values, adding or removing filters, or creating new listeners and clusters.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application Scope: EnvoyFilter resources can be applied at three levels: globally (affecting all proxies in the mesh), per namespace, or to specific workloads.
&lt;/li&gt;
&lt;li&gt;Patch Location (applyTo): You can target specific configuration sections, such as LISTENER, HTTP_FILTER, NETWORK_FILTER, or CLUSTER.
&lt;/li&gt;
&lt;li&gt;Patch Target (match): The scope can be narrowed using context (e.g., SIDECAR_INBOUND, SIDECAR_OUTBOUND, GATEWAY), listener properties, route configuration, or cluster properties.
&lt;/li&gt;
&lt;li&gt;Patch Action (patch): Defines how the patch is applied, with operations like MERGE, ADD, REMOVE, INSERT_BEFORE, or INSERT_AFTER.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example to patch with EnvoyFilter&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;\- applyTo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EXTENSION_CONFIG&lt;/span&gt;  
  &lt;span class="s"&gt;patch&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;operation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ADD&lt;/span&gt;  
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;custom-metrics&lt;/span&gt;  
      &lt;span class="na"&gt;typed_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@type"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm&lt;/span&gt;  
        &lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
          &lt;span class="na"&gt;root_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metrics-root&lt;/span&gt;  
          &lt;span class="na"&gt;vm_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
            &lt;span class="na"&gt;vm_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metrics-vm&lt;/span&gt;  
            &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;envoy.wasm.runtime.v8&lt;/span&gt;  
            &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
              &lt;span class="na"&gt;remote&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
                &lt;span class="na"&gt;http_uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
                  &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://wasm-module-uri&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Extending Envoy with WebAssembly&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Envoy's functionality can be extended using custom filters written in different languages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;C++: Offers native, high-performance extensions but requires rebuilding Envoy.
&lt;/li&gt;
&lt;li&gt;Lua: Script-based, suitable for simpler use cases.
&lt;/li&gt;
&lt;li&gt;WebAssembly (Wasm): Enables run-time loaded plugins compiled from languages like Rust, Go, or AssemblyScript. Wasm plugins run in a sandboxed virtual machine (VM), providing isolation and memory safety.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wasm allows dynamic extensibility of the Envoy data plane without needing to rebuild Envoy or manually modify its configurations. Istio's istio-agent handles the distribution of Wasm plugins, fetching them from registries and mounting them into Envoy's file system.&lt;/p&gt;

&lt;p&gt;For example, in the Coffee Shop app: Use a WASM filter to collect metrics on espresso orders handled by barista-service. This plugin runs inside the Envoy proxy and logs telemetry data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuz2t3yxl0b7ejwtjmddr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuz2t3yxl0b7ejwtjmddr.png" alt="An image displaying flow in coffee shop app using WebAssembly" width="800" height="55"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;Wasm Plugin Deployment Workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Compile Plugin&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Use your SDK to generate a &lt;code&gt;.wasm&lt;/code&gt; file
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publish to Registry&lt;/strong&gt;:&lt;br&gt;&lt;code&gt;docker build -t registry.io/barista-metrics:v1 .&lt;/code&gt;&lt;br&gt;&lt;code&gt;docker push registry.io/barista-metrics:v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy with WasmPlugin&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;extensions.istio.io/v1alpha1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WasmPlugin&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;barista-metrics&lt;/span&gt;  
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-shop&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;barista-service&lt;/span&gt;  
  &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oci://registry.io/barista-metrics:v1&lt;/span&gt;  
  &lt;span class="na"&gt;pluginConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;trackEspresso&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  
    &lt;span class="na"&gt;debug&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Plugin Source Options&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;oci://&lt;/code&gt;: OCI registry
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;http://&lt;/code&gt;: Direct HTTP URL
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/path/to/local&lt;/code&gt;: Local file path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using &lt;code&gt;WasmPlugin&lt;/code&gt; is preferred over &lt;code&gt;EnvoyFilter&lt;/code&gt; as it simplifies deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5imsec4ljrrb79k3hno1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5imsec4ljrrb79k3hno1.png" alt="An image displaying OCI registry to Wasm Runtime." width="800" height="52"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Security&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Istio enhances security by enforcing strong authentication (AuthN) and authorization (AuthZ) policies.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Authentication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Istio issues X.509 SPIFFE-compliant certificates to each pod, based on Kubernetes ServiceAccounts.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;mTLS&lt;/strong&gt;: Ensures both client and server verify each other’s identities.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Certificates&lt;/strong&gt;: Automatically rotated and managed by Istio agent using SDS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1muvxvgvsd3nzv64hi61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1muvxvgvsd3nzv64hi61.png" alt="An image displaying Authentication flow using coffee shop services." width="770" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automated Identity Provisioning&lt;/strong&gt;: Istio automates workload identity through these components:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Istio Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs in the sidecar, manages certificates and bootstraps Envoy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SDS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Envoy’s Secret Discovery Service; fetches certs dynamically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Istiod&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Acts as the Certificate Authority (CA); issues and rotates certs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When a sidecar starts, the Istio agent sends a &lt;strong&gt;Certificate Signing Request (CSR)&lt;/strong&gt; to &lt;code&gt;istiod&lt;/code&gt;. Once verified, &lt;code&gt;istiod&lt;/code&gt; returns a signed certificate. This identity is used for secure communication between services. Certificates are rotated automatically.&lt;/p&gt;

&lt;p&gt;For example, in the coffee shop microservices app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;barista-service&lt;/code&gt; runs with a sidecar.
&lt;/li&gt;
&lt;li&gt;Istio agent requests a certificate for the &lt;code&gt;barista-service&lt;/code&gt; account.
&lt;/li&gt;
&lt;li&gt;CA authenticates the request and returns a signed certificate.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;barista-service&lt;/code&gt; uses this identity for secure communication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mutual TLS (mTLS)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mutual TLS ensures encrypted and authenticated communication. Both client and server validate each other using their certificates. Envoy sidecars handle this process transparently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PeerAuthentication (Inbound)&lt;/strong&gt;: Configures the mTLS mode for incoming traffic to a service or workload.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;security.istio.io/v1beta1&lt;/span&gt;

&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PeerAuthentication&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;mtls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STRICT&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;DestinationRule (Outbound)&lt;/strong&gt;: Configures the mTLS mode for outgoing traffic from a service or workload. This also applies to outgoing traffic through an egress gateway.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;trafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
 &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
   &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ISTIO_MUTUAL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
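
&lt;p&gt;In context, that fragment sits inside a full &lt;code&gt;DestinationRule&lt;/code&gt;, sketched here for the barista service (the host is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: barista-mtls
  namespace: coffee-shop
spec:
  host: barista-service.coffee-shop.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;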



&lt;p&gt;&lt;strong&gt;mTLS Modes Overview&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PERMISSIVE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Accepts both plain text and mTLS connections (default for onboarding)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;STRICT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only mTLS connections are allowed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ISTIO_MUTUAL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Uses Istio-managed certificates for mTLS (recommended default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SIMPLE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One-way TLS (client verifies server)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MUTUAL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;mTLS using custom certificates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PASSTHROUGH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Routes encrypted TLS traffic without termination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AUTO_PASSTHROUGH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatically forwards TLS based on SNI (no VirtualService required)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: mTLS can be applied at the mesh, namespace, workload, or port level, and these modes apply to both ingress and egress gateways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Set STRICT mTLS for payment-service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;security.istio.io/v1beta1&lt;/span&gt;

&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PeerAuthentication&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payment-service-mtls&lt;/span&gt;

  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-shop&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payment-service&lt;/span&gt;

  &lt;span class="na"&gt;mtls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STRICT&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcomyvgi7vuaiwwq3ablr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcomyvgi7vuaiwwq3ablr.png" alt="An image displaying flow from coffee-order to external payment-service." width="800" height="25"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Request Authentication (User Authentication)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;RequestAuthentication&lt;/code&gt; to verify &lt;strong&gt;JWT tokens&lt;/strong&gt; from end users. If a token is invalid or missing, the request is rejected. Valid tokens yield an authenticated principal for policy enforcement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Require JWT for customer-service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;security.istio.io/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RequestAuthentication&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-jwt&lt;/span&gt;  
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-shop&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service&lt;/span&gt;  
  &lt;span class="na"&gt;jwtRules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- issuer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://auth.coffeeshop.com"&lt;/span&gt;  
    &lt;span class="na"&gt;jwksUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://auth.coffeeshop.com/.well-known/jwks.json"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Authorization (Access Control)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use the &lt;code&gt;AuthorizationPolicy&lt;/code&gt; resource to enforce fine-grained control over what services or users can access. Policies use service identities (via mTLS) and user identities (via JWT).&lt;/p&gt;

&lt;p&gt;For example, only allow authenticated users to call customer-service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;security.istio.io/v1&lt;/span&gt;

&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AuthorizationPolicy&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;require-jwt&lt;/span&gt;

  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-shop&lt;/span&gt;

&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-service&lt;/span&gt;

  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ALLOW&lt;/span&gt;

  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;\- from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;\- source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

        &lt;span class="na"&gt;requestPrincipals&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\["\*"\]&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flu92y471g0lcgc3fv8uz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flu92y471g0lcgc3fv8uz.png" alt="An image displaying Authorization in Istio." width="800" height="30"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Match Conditions: Rules can match requests based on:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Source identity: service accounts, IPs, JWT principals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;to&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Operation match: HTTP methods, ports, paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;when&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Additional conditions: headers, claims, IPs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: Allow DELETE only from admin-service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;\- from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;principals&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\["spiffe://cluster.local/ns/coffee-shop/sa/admin-service"\]&lt;/span&gt;  
  &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- operation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;methods&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\["DELETE"\]&lt;/span&gt;  
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\["/customers/\*"\]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Action Types in &lt;code&gt;AuthorizationPolicy&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ALLOW&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Permit matching requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DENY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Block matching requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CUSTOM&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delegate evaluation to a custom extension&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AUDIT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Log matching requests without enforcing access decisions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Istio evaluates policies in this order: &lt;code&gt;CUSTOM&lt;/code&gt; → &lt;code&gt;DENY&lt;/code&gt; → &lt;code&gt;ALLOW&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Best Practices&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Start with a &lt;strong&gt;DENY-all&lt;/strong&gt; policy and incrementally allow access using &lt;code&gt;ALLOW&lt;/code&gt; rules.
&lt;/li&gt;
&lt;li&gt;Assign &lt;strong&gt;dedicated ServiceAccounts&lt;/strong&gt; per workload to ensure identity isolation.
&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;STRICT&lt;/code&gt; mTLS once workloads are mesh-ready.
&lt;/li&gt;
&lt;li&gt;Combine &lt;code&gt;PeerAuthentication&lt;/code&gt;, &lt;code&gt;RequestAuthentication&lt;/code&gt;, and &lt;code&gt;AuthorizationPolicy&lt;/code&gt; for zero-trust enforcement.&lt;/li&gt;
&lt;/ul&gt;
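
&lt;p&gt;As a sketch of the first practice, an &lt;code&gt;AuthorizationPolicy&lt;/code&gt; with an empty &lt;code&gt;spec&lt;/code&gt; denies all requests to workloads in its namespace; the resource name and the &lt;code&gt;coffee-shop&lt;/code&gt; namespace below are illustrative:&lt;/p&gt;

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: coffee-shop
spec: {}   # an empty spec matches no rules, so every request is denied
```

&lt;p&gt;From this baseline, each service-to-service path is opened explicitly with a scoped &lt;code&gt;ALLOW&lt;/code&gt; policy, keeping the default posture closed.&lt;/p&gt;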

&lt;h2&gt;
  
  
  &lt;strong&gt;Observability&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Observability is essential for understanding and operating microservices in production. Istio provides out-of-the-box observability by capturing telemetry at the network layer through sidecar proxies.&lt;/p&gt;

&lt;p&gt;Istio enables deep insights across services by capturing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt;: Quantitative measurements such as request latency or error rates.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traces&lt;/strong&gt;: End-to-end request flow across services.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logs&lt;/strong&gt;: Context-rich records for debugging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These signals work together. For example, a spike in latency (metrics) leads you to a specific service call (trace), and the logs explain the failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: For more information, refer to &lt;a href="https://programmerprodigy.code.blog/2025/06/04/understanding-observability-with-opentelemetry-and-coffee/" rel="noopener noreferrer"&gt;Understanding Observability with OpenTelemetry and Coffee&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For this example, the coffee shop app has three microservices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;order-service
&lt;/li&gt;
&lt;li&gt;payment-service
&lt;/li&gt;
&lt;li&gt;inventory-service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each service includes an injected Envoy sidecar that automatically collects and exposes telemetry.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Setup for Observability&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Install Istio using the &lt;code&gt;demo&lt;/code&gt; profile to enable full telemetry:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;istioctl install --set profile=demo -y&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This profile enables 100% trace sampling, which is ideal for development. In production, reduce the sampling rate (for example, to 1%) to limit overhead.&lt;/p&gt;
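
&lt;p&gt;In recent Istio releases, one way to lower the mesh-wide sampling rate is the &lt;code&gt;Telemetry&lt;/code&gt; resource; the resource name and percentage below are illustrative:&lt;/p&gt;

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system   # root namespace, so the setting applies mesh-wide
spec:
  tracing:
  - randomSamplingPercentage: 1.0   # sample roughly 1% of requests
```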

&lt;p&gt;Envoy sidecars expose Prometheus scrape endpoints. Metrics can also be accessed via each pod’s Envoy admin dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcopkzd1e1cit3teucl7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcopkzd1e1cit3teucl7.png" alt="An image displaying Observability in coffee pod using Istio, SigNoz, Prometheus, and Grafana" width="800" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Tracing and Logs with SigNoz&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;SigNoz is an OpenTelemetry-compatible observability tool that integrates seamlessly with Istio:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;helm install signoz signoz/signoz -n platform&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You can use the SigNoz UI to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for traces by service (e.g., &lt;code&gt;order-service&lt;/code&gt;)
&lt;/li&gt;
&lt;li&gt;Visualize trace duration and latency
&lt;/li&gt;
&lt;li&gt;Correlate logs and spans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Refer to &lt;a href="https://signoz.io/docs" rel="noopener noreferrer"&gt;SigNoz Installation Guide&lt;/a&gt; for setup instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Optimization and Advanced Deployments&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In large meshes, every sidecar receives service discovery updates for all mesh services. This can lead to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excessive configuration updates
&lt;/li&gt;
&lt;li&gt;Increased startup time for proxies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To limit this, use the &lt;code&gt;Sidecar&lt;/code&gt; resource to restrict which services a workload can see.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qz9v3k9pv1612nrj1ir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qz9v3k9pv1612nrj1ir.png" alt="An image displaying optimization of coffee app mesh" width="694" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, a Sidecar resource can restrict outbound (egress) traffic from the coffee-frontend workload so that it only communicates with order-service and payment-service within its namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1beta1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Sidecar&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-frontend&lt;/span&gt;  
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-shop&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;workloadSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coffee-frontend&lt;/span&gt;  
  &lt;span class="na"&gt;egress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="s"&gt;\- "order-service.coffee-shop.svc.cluster.local"&lt;/span&gt;  
    &lt;span class="s"&gt;\- "payment-service.coffee-shop.svc.cluster.local"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration allows &lt;code&gt;coffee-frontend&lt;/code&gt; to communicate only with &lt;code&gt;order-service&lt;/code&gt; and &lt;code&gt;payment-service&lt;/code&gt;, reducing its load.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Multi-Cluster Deployments&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Deploying Istio across multiple Kubernetes clusters offers benefits such as high availability, failover capabilities, and organizational separation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Network Models:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single Network: Pods across different clusters can communicate directly.
&lt;/li&gt;
&lt;li&gt;Multiple Networks: East-west gateways are used to facilitate communication between clusters.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqfrhs95x86szvgtcx79.png" alt="An image displaying multiple networks connected by East-west gateway." width="679" height="63"&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Control Plane Models:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single Control Plane: A single Istiod instance manages all clusters. While simpler, it represents a single point of failure.
&lt;/li&gt;
&lt;li&gt;Per-Cluster Control Plane: Each cluster has its own Istiod instance, providing better high availability and isolation.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Mesh Models:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single Mesh: A unified trust domain and configuration across all clusters.
&lt;/li&gt;
&lt;li&gt;Multi-Mesh Federation: Separate meshes can share trust bundles (root certificates), define shared ServiceEntry resources, and apply AuthorizationPolicy for secure cross-mesh communication.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Tenancy Models:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Soft Tenancy: Achieves isolation at the namespace level.
&lt;/li&gt;
&lt;li&gt;Hard Tenancy: Provides isolation at the cluster level with separate meshes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Locality-Aware Load Balancing:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;localityLbSetting&lt;/code&gt; to steer traffic based on geographic proximity (region, zone, sub-zone).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failover Example&lt;/strong&gt;: If all endpoints in &lt;code&gt;us-west&lt;/code&gt; are unavailable, traffic fails over to &lt;code&gt;us-east&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;trafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;localityLbSetting&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

    &lt;span class="na"&gt;failover&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;\- from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-west&lt;/span&gt;

        &lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Weighted Distribution&lt;/strong&gt;: This routes 50% of the traffic to local zone &lt;code&gt;us-west1-a&lt;/code&gt;, 30% to the neighboring zone, and 20% to a remote zone.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;trafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;localityLbSetting&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

    &lt;span class="na"&gt;distribute&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

      &lt;span class="na"&gt;\- from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-west1/us-west1-a/&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;

        &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

          &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-west1/us-west1-a/&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;

          &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-west1/us-west1-b/&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;

          &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east1/us-east1-a/&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Wrapping Up: Istio as Your Mesh Barista&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Istio helps transform your distributed microservices from a tangle of complexity into a well-orchestrated system, whether you need fine-grained traffic routing, observability across services, security enforced through mTLS, or mesh extensions built with WebAssembly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fis795yzgzpsydodj4oq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fis795yzgzpsydodj4oq4.png" alt="Summary image of the coffee shop with WebAssembly." width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Like a well-run coffee shop, every component of your system needs to collaborate in real time. Istio acts as the operations manager behind the scenes, ensuring communication flows smoothly, issues are detected early, and only trusted interactions are allowed.&lt;/p&gt;

</description>
      <category>microservices</category>
      <category>architecture</category>
      <category>servicemesh</category>
      <category>observability</category>
    </item>
    <item>
      <title>Understanding Observability with OpenTelemetry and Coffee</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Wed, 04 Jun 2025 16:39:38 +0000</pubDate>
      <link>https://dev.to/hridyeshbisht/understanding-observability-with-opentelemetry-and-coffee-19l1</link>
      <guid>https://dev.to/hridyeshbisht/understanding-observability-with-opentelemetry-and-coffee-19l1</guid>
      <description>&lt;p&gt;Solutions are increasingly built using microservices architecture, leading to complex distributed systems. Monitoring these systems becomes challenging due to the diversity of tools, protocols, and data formats. &lt;/p&gt;

&lt;p&gt;This blog focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explaining the basics of OpenTelemetry, its role in observability, and the current state of observability in the industry.
&lt;/li&gt;
&lt;li&gt;Explaining how to instrument code and identify when to use manual and automatic instrumentation.
&lt;/li&gt;
&lt;li&gt;Discussing the OpenTelemetry Collector and Connector, which are responsible for processing and forwarding telemetry data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is OpenTelemetry?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry addresses these challenges by providing a unified framework for collecting, processing, and exporting telemetry data, enabling you to gain deep insights into your apps’ behavior.&lt;/p&gt;

&lt;p&gt;For this guide, consider a modern coffee shop app with the following microservices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Order Service&lt;/strong&gt;: Handles customer orders.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payment Service&lt;/strong&gt;: Processes payments.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inventory Service&lt;/strong&gt;: Manages stock levels.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification Service&lt;/strong&gt;: Sends order confirmations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tfjtu5xm0mxzbcjn1i7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tfjtu5xm0mxzbcjn1i7.png" alt="An visual asset displaying flow of data from Service -&amp;gt; OTel Collector -&amp;gt; Observability Backend" width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each service operates independently, possibly written in different languages and deployed across various environments. When a customer places an order, the request traverses multiple services, making it essential to have a comprehensive observability solution to monitor and troubleshoot the system effectively. &lt;/p&gt;

&lt;p&gt;Key Benefits of OpenTelemetry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unified Instrumentation&lt;/strong&gt;: Instrument your code once and send telemetry data to multiple backends without re-instrumentation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor-Neutral&lt;/strong&gt;: Avoid vendor lock-in by using standard APIs and protocols, since you can switch platforms without having to re-instrument your entire solution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified telemetry:&lt;/strong&gt; Combines tracing, logging, and metrics into a single framework, enabling correlation across all data and establishing an open standard for telemetry.

&lt;ul&gt;
&lt;li&gt;Linking these signals helps you make better decisions.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Community-Driven&lt;/strong&gt;: Benefit from a vibrant open-source community contributing to continuous improvements.
&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Improved Correlation&lt;/strong&gt;: Easily correlate data across different telemetry signals for better insights.&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Three pillars of Observability&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Telemetry is collected via instrumentation and flows through a pipeline that enriches, batches, and stores it for later analysis. Most observability tooling revolves around three categories of telemetry: &lt;strong&gt;logs, metrics, and traces&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;While they share architectural similarities, such as instrumentation, ingestion, storage, and visualization, each type presents unique challenges and is best suited to answer different types of questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Logs&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Logs are immutable, timestamped records of discrete events. Each log entry typically contains a message and optional structured metadata. However, coming up with a standardized log format is no easy task, since different pieces of information are critical for different types of software. &lt;/p&gt;

&lt;p&gt;You can also build logging agents and protocols to forward logs to a central location for efficient storage. For example, consider a user placing an order in a microservices-based coffee shop app. The order-service logs a line like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;{&lt;/span&gt;  
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-06-01T08:43:12Z"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;  
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;  
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-service"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;  
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;order&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;placed:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;latte"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;  
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order&lt;/span&gt;&lt;span class="se"&gt;\_&lt;/span&gt;&lt;span class="s"&gt;id"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORD-20250601-001"&lt;/span&gt;  
&lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
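
&lt;p&gt;A minimal sketch of how the order-service might emit that structured line using only the Python standard library; real services usually rely on a logging library with a JSON formatter, and the function name here is illustrative:&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

def log_order(order_id, beverage):
    """Emit one structured, machine-parseable log entry for a new order."""
    entry = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": "INFO",
        "service": "order-service",
        "message": f"New order placed: {beverage}",
        "order_id": order_id,
    }
    print(json.dumps(entry))  # forwarders ship this line to central storage
    return entry

entry = log_order("ORD-20250601-001", "latte")
```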



&lt;h3&gt;
  
  
  &lt;strong&gt;Metrics&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Metrics help you understand a high-level view of the current state of your system. A metric is a single numerical value derived by applying a statistical measure to a group of events. &lt;/p&gt;

&lt;p&gt;In other words, metrics represent an aggregate. This is useful because their compact representation allows us to graph how a system changes over time. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Different Metric Types:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Counters:&lt;/strong&gt; Total number of orders placed
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gauges:&lt;/strong&gt; Current number of in-progress orders
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Histograms:&lt;/strong&gt; Distribution of order preparation times
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summaries:&lt;/strong&gt; Quantiles of response times&lt;/li&gt;
&lt;/ul&gt;
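
&lt;p&gt;The four types can be sketched without any metrics library; the counters and timings below are illustrative stand-ins for what an OpenTelemetry or Prometheus client would track for the order-service:&lt;/p&gt;

```python
orders_placed_total = 0   # Counter: only ever increases
in_progress_orders = 0    # Gauge: can go up and down
prep_times = []           # raw data behind a histogram or summary

def place_order():
    global orders_placed_total, in_progress_orders
    orders_placed_total += 1
    in_progress_orders += 1

def complete_order(prep_seconds):
    global in_progress_orders
    in_progress_orders -= 1
    prep_times.append(prep_seconds)

for secs in [30, 45, 60, 90, 120]:
    place_order()
    complete_order(secs)

place_order()  # one latte still being prepared

# Histogram: count preparation times per bucket (0-60s vs. over 60s)
fast_bucket = sum(1 for t in prep_times if t in range(0, 61))
slow_bucket = len(prep_times) - fast_bucket

# Summary: the median (p50) preparation time
median_prep = sorted(prep_times)[len(prep_times) // 2]

print(orders_placed_total, in_progress_orders, fast_bucket, slow_bucket, median_prep)
```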

&lt;p&gt;&lt;strong&gt;Coffee Shop Example&lt;/strong&gt;: The order-service emits a metric such as the following.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;orders_placed_total{beverage="latte"} 1560&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmm2i32q88y1zuiliob17.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmm2i32q88y1zuiliob17.png" alt="A visual asset displaying Metrics agent sending data to Prometheus DB and then Grafana Dashboard." width="800" height="57"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A Prometheus dashboard may show a sharp spike in latte orders, suggesting a promotional campaign is working or an anomaly is occurring.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Traces&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To understand the larger context in a distributed solution, you must identify related events, such as the request or transaction that produced a given log entry and the sequence of services involved in processing that request across the system.&lt;/p&gt;

&lt;p&gt;Traces visualize the full journey of a single request across services. A trace consists of multiple &lt;strong&gt;spans&lt;/strong&gt;, each representing a step in the request’s lifecycle. This makes it possible to reconstruct the journey of requests in the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coffee Shop Example&lt;/strong&gt;: A user places an order. The request flows through UI -&amp;gt; order-service -&amp;gt; payment-service -&amp;gt; inventory-service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe5coseuoqikbeextnoxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe5coseuoqikbeextnoxp.png" alt="A visual asset displaying flow of request through UI-&amp;gt; order-service -&amp;gt; payment-service -&amp;gt; inventory-service." width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each service adds a span with trace and span IDs, allowing you to view:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total request duration
&lt;/li&gt;
&lt;li&gt;Which service caused a delay
&lt;/li&gt;
&lt;li&gt;Any failed steps in the chain&lt;/li&gt;
&lt;/ul&gt;
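
&lt;p&gt;A sketch of how spans sharing a trace ID can be stitched back into a single timeline; the span fields (&lt;code&gt;trace_id&lt;/code&gt;, &lt;code&gt;start_ms&lt;/code&gt;, &lt;code&gt;end_ms&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt;) are illustrative, not an official span schema:&lt;/p&gt;

```python
spans = [
    {"trace_id": "t1", "service": "ui",                "start_ms": 0,   "end_ms": 840, "error": False},
    {"trace_id": "t1", "service": "order-service",     "start_ms": 20,  "end_ms": 800, "error": False},
    {"trace_id": "t1", "service": "payment-service",   "start_ms": 60,  "end_ms": 620, "error": False},
    {"trace_id": "t1", "service": "inventory-service", "start_ms": 640, "end_ms": 790, "error": True},
]

def summarize(trace_id, spans):
    """Answer the three questions above from one trace's spans."""
    trace = [s for s in spans if s["trace_id"] == trace_id]
    total = max(s["end_ms"] for s in trace) - min(s["start_ms"] for s in trace)
    slowest = max(trace, key=lambda s: s["end_ms"] - s["start_ms"])
    failed = [s["service"] for s in trace if s["error"]]
    return total, slowest["service"], failed

total, slowest, failed = summarize("t1", spans)
print(total)    # 840
print(slowest)  # ui
print(failed)   # ['inventory-service']
```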

&lt;h2&gt;
  
  
  &lt;strong&gt;Problems with the Current Observability Approach&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Logs, metrics, and traces typically live in separate systems, with different formats and tooling. This fragmentation forces you to jump between dashboards and correlate data manually. Even with shared metadata like timestamps or service names, stitching information together remains time-consuming and error-prone.&lt;/p&gt;

&lt;p&gt;Coffee Shop Example: Imagine a spike in failed order-service requests. You check metrics and see a high error rate. You then switch to logs, scan for failures, and try to match logs with trace IDs. Without consistent context, root cause analysis becomes guesswork.&lt;/p&gt;
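&lt;p&gt;That manual correlation step amounts to filtering logs on a shared trace ID, which only works when every service stamps the ID into its log entries; this is exactly the consistency that is often missing. The log fields below are illustrative:&lt;/p&gt;

```python
logs = [
    {"service": "order-service",   "trace_id": "t1", "level": "ERROR", "message": "payment timeout"},
    {"service": "order-service",   "trace_id": "t2", "level": "INFO",  "message": "order placed"},
    {"service": "payment-service", "trace_id": "t1", "level": "ERROR", "message": "gateway unreachable"},
]

def logs_for_trace(trace_id, logs):
    """Pull every log line stamped with the failing trace's ID."""
    return [entry for entry in logs if entry["trace_id"] == trace_id]

related = logs_for_trace("t1", logs)
print([entry["message"] for entry in related])  # ['payment timeout', 'gateway unreachable']
```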

&lt;h3&gt;
  
  
  &lt;strong&gt;Lack of Built-in Instrumentation in Open Source Software&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Many open source libraries expose hooks but do not include native telemetry support. You must build and maintain custom adapters.&lt;/p&gt;

&lt;p&gt;Problems this causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Version Compatibility:&lt;/strong&gt; Library updates may break adapters.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telemetry Loss:&lt;/strong&gt; Converting data between formats can degrade signal quality.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engineering Overhead:&lt;/strong&gt; Teams spend time wiring telemetry instead of building features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Coffee Shop Example&lt;/strong&gt;: If the inventory-service uses a third-party stock manager with no OpenTelemetry support, you must manually instrument it or depend on its observability hooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What OpenTelemetry is NOT&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry simplifies telemetry collection and export, but it doesn’t offer end-to-end observability out of the box. It’s a toolkit, not a monitoring platform.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Not OpenTelemetry’s Job&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenTelemetry exports data; it doesn’t store it. You’ll need systems like SigNoz, Prometheus, or Elasticsearch.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Visualization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No dashboards or charts are included. Use tools like Grafana, Jaeger, or Datadog.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Alerting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenTelemetry doesn’t generate alerts. Integrate it with systems that support alert rules.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monitoring out-of-the-box&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;It doesn’t auto-instrument everything or provide prebuilt dashboards. You must configure and integrate.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance optimization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;It helps identify bottlenecks, but doesn’t tune your app.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OpenTelemetry standardizes how you collect logs, metrics, and traces. It enables observability, but doesn’t deliver it on its own. You still need storage, visualization, alerting, and analysis platforms to complete the picture.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Signals in OpenTelemetry&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry organizes observability data into three core signals:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Traces&lt;/td&gt;
&lt;td&gt;Capture the lifecycle and flow of a request across services.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metrics&lt;/td&gt;
&lt;td&gt;Measure system and app performance over time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logs&lt;/td&gt;
&lt;td&gt;Record discrete events and state changes in the app.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each signal is independent but can be correlated to provide richer observability. OpenTelemetry’s architecture ensures signal consistency and interoperability across programming languages through its official OpenTelemetry Specification.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ek8wkxixvh6qyxhu3hf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ek8wkxixvh6qyxhu3hf.png" alt="A visual asset displaying flow of data in OpenTelemetry Specifications" width="800" height="24"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenTelemetry Specification Components&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Common Terminology&lt;/strong&gt;: Ensures a consistent vocabulary across implementations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Specification&lt;/strong&gt;: Provides language-agnostic interfaces to generate telemetry (traces, metrics, logs). APIs are backend-agnostic and enable portable instrumentation.

&lt;ul&gt;
&lt;li&gt;For more information, refer to &lt;a href="https://opentelemetry.io/docs/specs/otel/trace/api/" rel="noopener noreferrer"&gt;&lt;strong&gt;Tracing API&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://opentelemetry.io/docs/specs/otel/metrics/api/" rel="noopener noreferrer"&gt;&lt;strong&gt;Metrics API&lt;/strong&gt;&lt;/a&gt;, and &lt;a href="https://opentelemetry.io/docs/specs/otel/logs/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenTelemetry Logging&lt;/strong&gt;&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;SDK Specification&lt;/strong&gt;: Defines how SDKs process, sample, and export telemetry. Ensures consistent behavior across languages.

&lt;ul&gt;
&lt;li&gt;For more information, refer to &lt;a href="https://opentelemetry.io/docs/specs/otel/trace/sdk/" rel="noopener noreferrer"&gt;&lt;strong&gt;Tracing SDK&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://opentelemetry.io/docs/specs/otel/metrics/sdk/" rel="noopener noreferrer"&gt;&lt;strong&gt;Metrics SDK&lt;/strong&gt;&lt;/a&gt;, and &lt;a href="https://opentelemetry.io/docs/specs/otel/logs/sdk/" rel="noopener noreferrer"&gt;&lt;strong&gt;Logs SDK&lt;/strong&gt;&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Semantic Conventions&lt;/strong&gt;: Standardizes names and attributes for telemetry data (e.g., HTTP status codes, DB queries).
&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;OpenTelemetry Protocol (OTLP)&lt;/strong&gt;: Describes a vendor-neutral transport protocol for sending telemetry data between SDKs, Collectors, and backends.&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why separate API from SDK?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;API–SDK split&lt;/strong&gt; improves &lt;strong&gt;modularity, portability, and vendor neutrality&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Library safety&lt;/strong&gt;: A shared library (e.g., a database driver) can safely include only the API, avoiding heavy SDK dependencies and version conflicts in user apps.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portability&lt;/strong&gt;: You can ship apps with OpenTelemetry APIs baked in, and let platform teams decide which SDK/exporter to use later.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: You can write your own SDK or replace components (e.g., use a custom sampler or exporter).&lt;/li&gt;
&lt;/ul&gt;
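&lt;p&gt;The split is easiest to see as a pattern. The sketch below is plain Python with made-up names (not the real OpenTelemetry packages): the "API" side defaults to a no-op, a library instruments against it safely, and the host app may install a recording "SDK" later:&lt;/p&gt;

```python
# Illustrative sketch of the API/SDK split (hypothetical names, not the
# real OpenTelemetry packages): libraries depend only on the "API" side,
# which defaults to a no-op; the host app may install a real "SDK".

class NoOpTracer:
    """API default: safe to call, records nothing."""
    def start_span(self, name):
        return None  # no-op: no telemetry is produced

class RecordingTracer:
    """Stand-in for an SDK implementation that actually records spans."""
    def __init__(self):
        self.spans = []
    def start_span(self, name):
        self.spans.append(name)
        return name

_tracer = NoOpTracer()  # default: API only, no SDK configured

def get_tracer():
    return _tracer

def set_tracer(tracer):
    """Called once by the host app to install an SDK implementation."""
    global _tracer
    _tracer = tracer

# A shared library instruments itself against the API only:
def handle_request():
    get_tracer().start_span("handle_request")
    return "ok"

handle_request()      # no SDK installed: the call is a harmless no-op
sdk = RecordingTracer()
set_tracer(sdk)       # platform team wires in the SDK at startup
handle_request()
print(sdk.spans)      # ['handle_request']
```

&lt;p&gt;The library never changes: whether telemetry is recorded is decided entirely by what the host app installs.&lt;/p&gt;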

&lt;h3&gt;
  
  
  &lt;strong&gt;OpenTelemetry API vs SDK&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;OpenTelemetry API&lt;/th&gt;
&lt;th&gt;OpenTelemetry SDK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Purpose&lt;/td&gt;
&lt;td&gt;Defines interfaces to generate telemetry&lt;/td&gt;
&lt;td&gt;Implements logic to process and export telemetry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Responsibility&lt;/td&gt;
&lt;td&gt;Exposes functions to create spans, metrics, logs&lt;/td&gt;
&lt;td&gt;Manages batching, sampling, context, and export&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language-specific?&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Included by default?&lt;/td&gt;
&lt;td&gt;Yes, lightweight&lt;/td&gt;
&lt;td&gt;No, must be explicitly added and configured&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Default behavior&lt;/td&gt;
&lt;td&gt;No-op&lt;/td&gt;
&lt;td&gt;Active when configured&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Used by&lt;/td&gt;
&lt;td&gt;App and library developers&lt;/td&gt;
&lt;td&gt;DevOps, SREs, platform engineers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stability&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;May evolve with backends and exporter needs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customizable&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (exporters, processors, samplers)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, consider the following scenarios:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Best choice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Open-source library with tracing support&lt;/td&gt;
&lt;td&gt;API only (lightweight, no deps)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production microservice exporting to Grafana&lt;/td&gt;
&lt;td&gt;API + SDK + OTLP Exporter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI tool needing optional debug tracing&lt;/td&gt;
&lt;td&gt;API (enabled conditionally with SDK)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to Instrument Code with OpenTelemetry&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry supports multiple instrumentation approaches to capture telemetry from apps. Understanding these methods helps choose the right approach based on your app’s complexity, development stage, and observability goals.&lt;/p&gt;

&lt;p&gt;OpenTelemetry classifies instrumentation into three categories, often overlapping in practice:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Effort&lt;/th&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;Customization&lt;/th&gt;
&lt;th&gt;Code Changes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Automatic Instrumentation (Zero-Code)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Instrumentation Libraries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Minimal to moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Manual Instrumentation (Fully Code-Based)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Full&lt;/td&gt;
&lt;td&gt;Full&lt;/td&gt;
&lt;td&gt;Extensive&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The following sections walk through each approach using the coffee shop services as examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Automatic Instrumentation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Auto-instrumentation in .NET 8 is available via the &lt;strong&gt;OpenTelemetry .NET Auto-Instrumentation Agent&lt;/strong&gt;, which instruments common libraries like ASP.NET Core, HttpClient, and SQL clients at runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ideal use case:&lt;/strong&gt; Use this to quickly add observability to .NET services without modifying source code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: orders-service (.NET 8, ASP.NET Core)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download and install the auto-instrumentation binaries from the &lt;a href="https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation" rel="noopener noreferrer"&gt;&lt;strong&gt;.NET Auto-Instrumentation GitHub&lt;/strong&gt;&lt;/a&gt; repository
&lt;/li&gt;
&lt;li&gt;Run the app with the auto-instrumentation profiler
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set &lt;/span&gt;&lt;span class="nv"&gt;OTEL_SERVICE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;orders-service
&lt;span class="nb"&gt;set &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://otel-collector:4317
&lt;span class="nb"&gt;set &lt;/span&gt;&lt;span class="nv"&gt;CORECLR_ENABLE_PROFILING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="nb"&gt;set &lt;/span&gt;&lt;span class="nv"&gt;CORECLR_PROFILER_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;C:&lt;span class="se"&gt;\o&lt;/span&gt;tel-dotnet auto&lt;span class="se"&gt;\O&lt;/span&gt;penTelemetry.AutoInstrumentation.Native.dll
dotnet run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it captures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP requests and responses
&lt;/li&gt;
&lt;li&gt;Outgoing HTTP/gRPC calls
&lt;/li&gt;
&lt;li&gt;SQL queries via ADO.NET&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No code changes required
&lt;/li&gt;
&lt;li&gt;Fast onboarding
&lt;/li&gt;
&lt;li&gt;Works well for ASP.NET Core, Entity Framework, and HttpClient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limited to supported libraries
&lt;/li&gt;
&lt;li&gt;Less control over span names and metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Library-Based Instrumentation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Library-based instrumentation uses the OpenTelemetry SDK and prebuilt instrumentation packages such as &lt;code&gt;AddAspNetCoreInstrumentation&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ideal use case:&lt;/strong&gt; You want to customize configuration and capture high-value signals without full manual control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: menu-service (.NET 8, ASP.NET Core)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install NuGet packages:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configure in Program.cs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;OpenTelemetry.Metrics&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;OpenTelemetry.Resources&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;OpenTelemetry.Trace&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WebApplication&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddOpenTelemetry&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithTracing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tracerProviderBuilder&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;tracerProviderBuilder&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SetResourceBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;ResourceBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateDefault&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"menu-service"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddAspNetCoreInstrumentation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddHttpClientInstrumentation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddOtlpExporter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;otlp&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;otlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Endpoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://otel-collector:4317"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;MapGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"Hello from Menu Service"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it captures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inbound ASP.NET Core request spans
&lt;/li&gt;
&lt;li&gt;Outbound calls (HttpClient, gRPC)
&lt;/li&gt;
&lt;li&gt;Custom span and resource metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy to configure
&lt;/li&gt;
&lt;li&gt;Integrates well with DI and hosting model
&lt;/li&gt;
&lt;li&gt;Supports enrichment and filtering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires adding code/configuration
&lt;/li&gt;
&lt;li&gt;Less flexible than full manual instrumentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Manual Instrumentation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Manual instrumentation lets you define custom spans for critical business logic (e.g., awarding loyalty points or calculating discounts).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ideal use case:&lt;/strong&gt; You need to trace domain-specific logic not covered by auto or library-based methods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: loyalty-service (.NET 8 Worker Service)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install packages:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Trace
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configure tracing in Program.cs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;OpenTelemetry.Trace&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;OpenTelemetry.Resources&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;System.Diagnostics&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateApplicationBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddOpenTelemetry&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithTracing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tracerProviderBuilder&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;tracerProviderBuilder&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SetResourceBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ResourceBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateDefault&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;AddService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"loyalty-service"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"LoyaltyService"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddOtlpExporter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Endpoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://otel-collector:4317"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Start a custom span manually&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ActivitySource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"LoyaltyService"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;activity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;StartActivity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AwardLoyaltyPoints"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ActivityKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Internal&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;activity&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;SetTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"customer.id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"cust-123"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;activity&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;SetTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"points.awarded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Simulate business logic&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Loyalty points awarded."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it captures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom spans for logic like point calculations
&lt;/li&gt;
&lt;li&gt;Rich metadata (tags)
&lt;/li&gt;
&lt;li&gt;Correlation with other telemetry (metrics/logs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full control over telemetry
&lt;/li&gt;
&lt;li&gt;Capture domain-specific operations
&lt;/li&gt;
&lt;li&gt;High value for debugging or performance tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires development effort
&lt;/li&gt;
&lt;li&gt;Must manage span lifecycle correctly
&lt;/li&gt;
&lt;li&gt;Potential for inconsistent usage without guidelines&lt;/li&gt;
&lt;/ul&gt;
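&lt;p&gt;The span-lifecycle pitfall above is worth a sketch. This is plain Python with a hypothetical &lt;code&gt;Tracer&lt;/code&gt; (not the OpenTelemetry SDK): wrapping start/end in a context manager guarantees every span is closed, even when the traced business logic throws:&lt;/p&gt;

```python
# Sketch of safe span lifecycle management (hypothetical Tracer class,
# not the real OpenTelemetry SDK): a context manager guarantees that
# every started span is ended, even if the wrapped logic raises.
from contextlib import contextmanager
import time

class Tracer:
    def __init__(self):
        self.finished = []  # (name, duration_seconds, status) tuples

    @contextmanager
    def span(self, name):
        start = time.monotonic()
        try:
            yield
        except Exception:
            # span is still closed, marked as failed, then re-raised
            self.finished.append((name, time.monotonic() - start, "error"))
            raise
        else:
            self.finished.append((name, time.monotonic() - start, "ok"))

tracer = Tracer()

def award_points(customer_id):
    with tracer.span("AwardLoyaltyPoints"):
        if customer_id is None:
            raise ValueError("missing customer")
        return 20

award_points("cust-123")
try:
    award_points(None)
except ValueError:
    pass

# Both spans were closed: one "ok", one "error"
print([(name, status) for name, _, status in tracer.finished])
```

&lt;p&gt;Without the context manager, the failing call would leave a span open forever, which is exactly the kind of inconsistency manual instrumentation guidelines should prevent.&lt;/p&gt;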

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7z79zmtpkmzfa5ijao0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7z79zmtpkmzfa5ijao0.png" alt="A visual asset displaying three ways to capture telemetry from your app and send it to OLTP" width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overlaps and Clarifications&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Instrumentation libraries sometimes provide automatic instrumentation after import, blurring the line between zero-code and code-based.
&lt;/li&gt;
&lt;li&gt;Under the hood, all approaches use some form of libraries.
&lt;/li&gt;
&lt;li&gt;Zero-code is broad and quick; libraries add customization; manual is full control.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Recommended Approach and Strategy&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with automatic instrumentation&lt;/strong&gt; to gain immediate insight with minimal effort.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add instrumentation libraries&lt;/strong&gt; where you need more coverage or framework-specific tracing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use manual instrumentation&lt;/strong&gt; for critical business logic or custom metrics requiring fine-grained control.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why use OpenTelemetry Collector?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The OpenTelemetry Collector is a vendor-agnostic, standalone service that simplifies telemetry management in production. It decouples telemetry generation from ingestion and export.&lt;/p&gt;

&lt;p&gt;The Collector provides three core capabilities:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Receive&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Accepts telemetry from apps, agents, or other Collectors via OTLP or other supported protocols.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Process&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filters, enriches, transforms, batches, or samples telemetry data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Export&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sends processed data to one or more observability backends.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzb8j66uwczcz8sefuzau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzb8j66uwczcz8sefuzau.png" alt="A visual asset displaying information sent from OTel Collector to Prometheus, Jaeger and S3" width="800" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key benefits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Without Collector&lt;/th&gt;
&lt;th&gt;With Collector&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Apps must export data directly to each backend&lt;/td&gt;
&lt;td&gt;Central point of control for all telemetry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk of tight coupling to backend protocols&lt;/td&gt;
&lt;td&gt;Decouples app logic from backend details&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Difficult to enforce consistent processing&lt;/td&gt;
&lt;td&gt;Apply transformations consistently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No central routing or batching&lt;/td&gt;
&lt;td&gt;Route and batch data efficiently&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Understanding OpenTelemetry Protocol (OTLP)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;OTLP&lt;/strong&gt; is the &lt;strong&gt;native telemetry transport&lt;/strong&gt; used across OpenTelemetry. It standardizes how telemetry is serialized, transmitted, and received.&lt;/p&gt;

&lt;p&gt;Key benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unified&lt;/strong&gt;: Handles traces, metrics, and logs in one format.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor-neutral&lt;/strong&gt;: Reduces backend lock-in and removes custom exporters.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient&lt;/strong&gt;: Uses &lt;strong&gt;gRPC&lt;/strong&gt; and &lt;strong&gt;Protobuf&lt;/strong&gt; for high-performance streaming.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible&lt;/strong&gt;: Schema evolves without breaking compatibility.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated&lt;/strong&gt;: Collector and most observability tools support OTLP out of the box.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OTLP transport options:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Transport&lt;/th&gt;
&lt;th&gt;Encoding&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;gRPC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Protobuf&lt;/td&gt;
&lt;td&gt;Default for performance and bi-directional streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HTTP/1.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;Debugging, human-readable payloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HTTP/2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Protobuf&lt;/td&gt;
&lt;td&gt;Efficient, firewall-friendly alternative to gRPC&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Example: Pre-OTLP &lt;em&gt;vs&lt;/em&gt; OTLP&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before OTLP&lt;/th&gt;
&lt;th&gt;With OTLP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus exporter, Zipkin exporter, Fluentd plugin&lt;/td&gt;
&lt;td&gt;One OTLP exporter and one Collector instance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple exporters in each service&lt;/td&gt;
&lt;td&gt;Centralized, simplified telemetry pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In OpenTelemetry, logs are a critical signal for observability. Any data that isn’t a trace or metric is categorized as a log. Events, for instance, are specialized log entries.&lt;/p&gt;

&lt;p&gt;Unlike traces and metrics, which OpenTelemetry implements via dedicated APIs and SDKs, logging is designed to integrate with existing logging frameworks in various programming languages. Instead of requiring a brand-new logging API, OpenTelemetry provides a &lt;strong&gt;Logs Bridge API&lt;/strong&gt; that links traditional logging systems with telemetry signals such as traces and metrics.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How Logging Works in OpenTelemetry&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You instrument logging using the &lt;strong&gt;Logs Bridge API&lt;/strong&gt;, which connects popular logging frameworks (like Serilog, ILogger, or log4net in .NET) to OpenTelemetry’s pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Components&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LoggerProvider&lt;/strong&gt;: Factory for creating loggers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logger&lt;/strong&gt;: Used to create log entries (LogRecord).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LogRecord&lt;/strong&gt;: Represents a single log entry with metadata.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LogRecordExporter&lt;/strong&gt;: Sends logs to destinations like the OpenTelemetry Collector.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LogRecordProcessor&lt;/strong&gt;: Processes logs before they’re exported.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LogRecord Structure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A LogRecord typically includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;timestamp: When the log occurred.
&lt;/li&gt;
&lt;li&gt;trace_id, span_id: Links to a trace/span for correlation.
&lt;/li&gt;
&lt;li&gt;severity_text: e.g., INFO, WARNING, ERROR.
&lt;/li&gt;
&lt;li&gt;body: The log message or structured content.
&lt;/li&gt;
&lt;li&gt;attributes: Custom metadata (e.g., user.id, http.method).&lt;/li&gt;
&lt;/ul&gt;
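&lt;p&gt;The shape of a LogRecord can be sketched with a plain dataclass (illustrative only; the authoritative shape is defined by the OpenTelemetry log data model):&lt;/p&gt;

```python
# Minimal sketch of the fields a LogRecord carries (illustrative; the
# authoritative structure is the OpenTelemetry log data model).
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class LogRecord:
    timestamp: datetime                 # when the log occurred
    severity_text: str                  # e.g. INFO, WARNING, ERROR
    body: str                           # message or structured content
    trace_id: Optional[str] = None      # correlation with an active trace
    span_id: Optional[str] = None       # correlation with an active span
    attributes: dict = field(default_factory=dict)  # custom metadata

record = LogRecord(
    timestamp=datetime.now(timezone.utc),
    severity_text="ERROR",
    body="Missing coffee ID",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",  # example value
    span_id="00f067aa0ba902b7",                   # example value
    attributes={"http.status_code": 400, "coffee_id": None},
)
print(record.severity_text, record.attributes["http.status_code"])
```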

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftn5az021i18hbvug2sbr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftn5az021i18hbvug2sbr.png" alt="A visual asset displaying flow of logs from OpenTelemetry Logging module" width="800" height="25"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Use Case&lt;/strong&gt;: The coffee shop app exposes a &lt;code&gt;/get_coffee&lt;/code&gt; endpoint. When a coffee request fails due to a missing ID, the app logs the event:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;logger.error("Missing coffee ID", extra={"http.status_code": 400, "coffee_id": None})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This log entry can be linked to the trace of the request, helping correlate the failure with upstream service calls and backend metrics.&lt;/p&gt;
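&lt;p&gt;This kind of correlation can be sketched with the standard library alone (illustrative; production setups use the Logs Bridge API rather than a manual filter): a logging filter stamps the active trace ID onto every record so a backend can join logs with their traces:&lt;/p&gt;

```python
# Sketch: stamp a (hypothetical) current trace ID onto every log record
# so a backend can join logs with traces. Real deployments use the
# OpenTelemetry Logs Bridge instead of a hand-rolled filter like this.
import logging

# In a real app this would come from the active span's context.
CURRENT_TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"

class TraceContextFilter(logging.Filter):
    def filter(self, record):
        record.trace_id = CURRENT_TRACE_ID  # attach the correlation field
        return True  # keep the record

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(levelname)s trace=%(trace_id)s %(message)s")
)
logger = logging.getLogger("coffee")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter())
logger.setLevel(logging.INFO)

# Emits: ERROR trace=4bf92f3577b34da6a3ce929d0e0e4736 Missing coffee ID
logger.error("Missing coffee ID", extra={"coffee_id": None})
```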

&lt;h3&gt;
  
  
  &lt;strong&gt;Collector Configuration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The OpenTelemetry Collector decouples telemetry generation from backend concerns. It processes logs, traces, and metrics independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collector Pipeline Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;protocols&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;grpc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;


&lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;


&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;logging&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;loglevel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;debug&lt;/span&gt;


&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipelines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;logs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;logging&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Collector Deployment Topologies&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The OpenTelemetry Collector supports multiple deployment models, allowing you to tailor observability pipelines based on your architecture and scalability needs. Each topology serves different use cases—from tightly coupled microservices to centralized processing in large-scale environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sidecar Deployment:&lt;/strong&gt; The OpenTelemetry Collector runs &lt;strong&gt;as a sidecar&lt;/strong&gt; alongside each application instance. This setup is common in containerized environments like Kubernetes, where the Collector is injected into each Pod.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqe6ofltku5dt4ee2rtv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqe6ofltku5dt4ee2rtv.png" alt="A visual asset displaying Sidecar Deployment in OpenTelemetry Collector" width="635" height="158"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low latency:&lt;/strong&gt; The Collector runs on the same host or Pod, reducing network overhead for exporting telemetry data.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolation:&lt;/strong&gt; Each service has a dedicated Collector instance, ensuring telemetry data stays service-specific and avoids cross-contamination.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified trace correlation:&lt;/strong&gt; Local logs, traces, and metrics can be more easily linked.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ideal for &lt;strong&gt;microservices architectures&lt;/strong&gt; where services operate independently and require individual telemetry pipelines.&lt;/p&gt;
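&lt;p&gt;As a sketch of this topology, the following Pod manifest runs a Collector container next to the application container. The image tags, ConfigMap name, and environment variable value are illustrative assumptions, not part of the coffee shop deployment.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: order-service
spec:
  containers:
  - name: order-service
    image: coffee-shop/order-service:latest   # hypothetical application image
    env:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: "http://localhost:4317"          # the sidecar is reachable over localhost
  - name: otel-collector
    image: otel/opentelemetry-collector:latest
    args: ["--config=/etc/otel/config.yaml"]
    volumeMounts:
    - name: otel-config
      mountPath: /etc/otel
  volumes:
  - name: otel-config
    configMap:
      name: otel-collector-config             # assumed ConfigMap holding the Collector pipeline
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;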

&lt;p&gt;&lt;strong&gt;Node Agent Deployment:&lt;/strong&gt; A single Collector instance runs per host or node. This is typically implemented as a &lt;strong&gt;Kubernetes DaemonSet&lt;/strong&gt; or a similar system service in virtual machine environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff39agp28mqlkm92w0rwr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff39agp28mqlkm92w0rwr.png" alt="A visual asset displaying Node Agent Deployment in OpenTelemetry Collector" width="630" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Centralized control per node:&lt;/strong&gt; One Collector handles telemetry for all services on the same node.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource-efficient:&lt;/strong&gt; Fewer Collector instances are required compared to the sidecar model.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System metrics access:&lt;/strong&gt; Can collect host-level metrics (CPU, memory, disk, etc.) in addition to application telemetry.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ideal Use Case:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suitable for &lt;strong&gt;clusters with many lightweight services&lt;/strong&gt; that share node resources.
&lt;/li&gt;
&lt;li&gt;Often used to monitor &lt;strong&gt;node-level infrastructure and runtime metrics&lt;/strong&gt; alongside service-level data.&lt;/li&gt;
&lt;/ul&gt;
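&lt;p&gt;In Kubernetes, this topology can be sketched as a DaemonSet so that exactly one Collector Pod runs on every node. The namespace, labels, and image tag below are assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-node-agent
  namespace: observability        # assumed namespace
spec:
  selector:
    matchLabels:
      app: otel-node-agent
  template:
    metadata:
      labels:
        app: otel-node-agent
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector:latest
        ports:
        - containerPort: 4317     # OTLP gRPC from pods on this node
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;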

&lt;p&gt;&lt;strong&gt;Standalone or Gateway Deployment&lt;/strong&gt;: The Collector runs as a &lt;strong&gt;dedicated service&lt;/strong&gt;, often behind a load balancer. Applications send telemetry data remotely to this central Collector (typically over OTLP).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2sz3cq53gld3hrdkeopu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2sz3cq53gld3hrdkeopu.png" alt="A visual asset displaying Standalone Deployment in OpenTelemetry Collector" width="800" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; A centralized Collector cluster can scale independently from application workloads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified configuration management:&lt;/strong&gt; Telemetry pipelines and transformations are managed in one place.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decoupling from application logic:&lt;/strong&gt; Developers don’t need to worry about backend changes or exporter configurations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ideal Use Case:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Best suited for &lt;strong&gt;large-scale systems&lt;/strong&gt; with high telemetry volume.
&lt;/li&gt;
&lt;li&gt;Useful for teams that want to &lt;strong&gt;offload all processing from applications&lt;/strong&gt; and maintain a consistent observability architecture.&lt;/li&gt;
&lt;/ul&gt;
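&lt;p&gt;As a sketch, a central Collector is typically exposed behind a ClusterIP Service that applications target with their OTLP exporters. The names and namespace below are assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: otel-gateway
  namespace: observability
spec:
  selector:
    app: otel-gateway             # assumed labels on the Collector Deployment
  ports:
  - name: otlp-grpc
    port: 4317
    targetPort: 4317
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Each application would then point its OTLP exporter at &lt;code&gt;http://otel-gateway.observability:4317&lt;/code&gt;.&lt;/p&gt;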

&lt;p&gt;&lt;strong&gt;Benefits of OpenTelemetry Collectors&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Separation of concerns&lt;/strong&gt;: Developers emit logs; operators manage pipelines.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized management&lt;/strong&gt;: All configuration is in one place.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource efficiency&lt;/strong&gt;: Offloads processing from app.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No redeployments needed&lt;/strong&gt;: Change pipelines without touching app code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;SigNoz with OpenTelemetry&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;SigNoz is a powerful observability platform built specifically for OpenTelemetry. It provides a seamless experience for collecting, storing, visualizing, and querying telemetry data, without vendor lock-in.&lt;/p&gt;

&lt;p&gt;With OpenTelemetry, you collect signals (logs, metrics, and traces) from the coffee shop services. These signals are sent to the OpenTelemetry Collector, which processes and forwards them to SigNoz.&lt;/p&gt;
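&lt;p&gt;A minimal Collector exporter section for forwarding these signals to SigNoz might look like the following sketch. The endpoint and TLS settings are assumptions that depend on how SigNoz is installed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;exporters:
  otlp:
    endpoint: "signoz-otel-collector:4317"   # assumed SigNoz ingestion endpoint
    tls:
      insecure: true                          # sketch only; enable TLS in production

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;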

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6lpz5bk1pthpo7xakxf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6lpz5bk1pthpo7xakxf.png" alt="A visual asset displaying flow of metrics from OTel Collector to SigNoz" width="800" height="35"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our coffee shop microservices example, SigNoz plays the role of the observability backend, giving your team full visibility into traces, metrics, and logs generated by the app. Here’s how SigNoz helps the coffee shop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Traces&lt;/strong&gt;: Visualize how an order moves through the system, from frontend-service to payment-service and inventory-service. Identify latency bottlenecks or failed calls.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt;: Monitor key service-level indicators like espresso_orders_per_minute, latency, and error_rate without writing custom dashboards.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logs&lt;/strong&gt;: Correlate logs with trace IDs and span IDs to troubleshoot order failures (e.g., inventory out-of-stock or payment declined).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhd1pmx1r5d4oxbwmut77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhd1pmx1r5d4oxbwmut77.png" alt="A screenshot displaying logs linked with traces in SigNoz" width="800" height="714"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more information, refer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://signoz.io/docs/cloud/" rel="noopener noreferrer"&gt;SigNoz Cloud documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://signoz.io/docs/instrumentation/overview/" rel="noopener noreferrer"&gt;SigNoz instrumentation overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>observability</category>
      <category>opentelemetry</category>
      <category>logs</category>
      <category>traces</category>
    </item>
    <item>
      <title>Introduction to Container Images and Orchestration</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Wed, 07 May 2025 08:09:02 +0000</pubDate>
      <link>https://dev.to/hridyeshbisht/introduction-to-container-images-and-orchestration-32pb</link>
      <guid>https://dev.to/hridyeshbisht/introduction-to-container-images-and-orchestration-32pb</guid>
      <description>&lt;p&gt;As modern apps shift toward microservices and cloud-native architectures, containers have become the standard for packaging and deploying software. However, running containers in production requires more than just building images—it demands scalable orchestration and intelligent management.&lt;/p&gt;

&lt;p&gt;This blog introduces container images. It explains why orchestration is essential in production environments. The blog also explores Kubernetes, the industry-standard platform for container orchestration.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What Are Container Images?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;container image&lt;/strong&gt; packages app code, its runtime, libraries, and all dependencies in a predefined, portable format. This enables consistent deployment across various environments.&lt;/p&gt;

&lt;p&gt;Container runtimes, such as &lt;strong&gt;containerd&lt;/strong&gt;, &lt;strong&gt;runC&lt;/strong&gt;, and &lt;strong&gt;CRI-O&lt;/strong&gt;, use these prebuilt images to create and run one or more containers. While these runtimes are effective on a single host, they lack the scalability and fault tolerance required for production environments.&lt;/p&gt;

&lt;p&gt;In production scenarios, apps must meet several critical requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fault tolerance&lt;/strong&gt;: Automatically recover from failures.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Adjust resources based on demand.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient resource utilization&lt;/strong&gt;: Optimize hardware usage.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service discovery&lt;/strong&gt;: Enable components to find each other dynamically.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External accessibility&lt;/strong&gt;: Expose services to external clients.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless updates and rollbacks&lt;/strong&gt;: Deploy new versions without downtime.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Managing containers manually or through scripts becomes impractical as the number of containers grows. This is where container orchestrators come into play.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What Is a Container Orchestrator?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A container orchestrator automates the deployment, scaling, networking, and management of containers across multiple hosts. It treats a group of systems as a single cluster, providing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High availability&lt;/strong&gt;: Ensures services are always accessible.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed workloads&lt;/strong&gt;: Balances tasks across nodes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized resource allocation&lt;/strong&gt;: Efficiently utilizes system resources.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated health checks and restarts&lt;/strong&gt;: Maintains application health.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common features of orchestrators include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster management&lt;/strong&gt;: Combine multiple hosts into a unified cluster.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container scheduling&lt;/strong&gt;: Deploy containers based on resource availability.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service discovery&lt;/strong&gt;: Enable communication across containers, regardless of the host.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage binding&lt;/strong&gt;: Attach persistent storage volumes to containers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing&lt;/strong&gt;: Distribute traffic across containers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security policies&lt;/strong&gt;: Control access to containerized applications.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource optimization&lt;/strong&gt;: Automatically manage and scale resources based on demand.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Popular container orchestrators and services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes&lt;/strong&gt;: Open-source and cloud-agnostic; the industry standard for container orchestration.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon ECS&lt;/strong&gt;: A fully managed service by AWS for running Docker containers at scale.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EKS&lt;/strong&gt;: A managed Kubernetes service by AWS.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure Kubernetes Service (AKS)&lt;/strong&gt;: Microsoft's managed Kubernetes offering.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Kubernetes Engine (GKE)&lt;/strong&gt;: Google's managed Kubernetes service.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HashiCorp Nomad&lt;/strong&gt;: A flexible orchestrator for containers and other workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Container orchestrators are platform-agnostic and can be deployed on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bare metal servers
&lt;/li&gt;
&lt;li&gt;Virtual machines (VMs)
&lt;/li&gt;
&lt;li&gt;On-premises infrastructure
&lt;/li&gt;
&lt;li&gt;Public clouds (AWS, Azure, Google Cloud, etc.)
&lt;/li&gt;
&lt;li&gt;Hybrid cloud environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For instance, Kubernetes can be deployed on a local machine, in a private data center, or across public cloud services like AWS EC2, Google Compute Engine, or OpenStack.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Understanding Kubernetes&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What Is Kubernetes?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Kubernetes (K8s) is an open-source system that automates the deployment, scaling, and management of containerized applications. It provides a robust, extensible platform for orchestrating containers across clusters of machines, simplifying the management of distributed, cloud-native systems.&lt;/p&gt;

&lt;p&gt;Key features of Kubernetes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated scheduling&lt;/strong&gt;: Assigns containers to nodes based on resource requirements and constraints.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility&lt;/strong&gt;: Supports custom resources and controllers without modifying the core codebase.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-healing&lt;/strong&gt;: Monitors container health and replaces failed or unresponsive containers automatically.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service discovery and load balancing&lt;/strong&gt;: Assigns stable DNS names and IP addresses to services, distributing network traffic evenly across pods.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated rollouts and rollbacks&lt;/strong&gt;: Manages application updates and configuration changes incrementally, with automatic rollbacks on failure.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret and configuration management&lt;/strong&gt;: Separates sensitive data and configuration from application code, injecting secrets securely into the runtime environment.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage orchestration&lt;/strong&gt;: Mounts persistent storage from various sources dynamically, based on declarative configuration.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch and job processing&lt;/strong&gt;: Supports batch jobs, cron jobs, and long-running tasks with automatic retries and failure handling.&lt;/li&gt;
&lt;/ul&gt;
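&lt;p&gt;Several of these features surface directly in a Deployment manifest. The sketch below assumes a generic &lt;code&gt;nginx&lt;/code&gt; image and illustrates replica scaling, rolling updates, and resource requests that guide scheduling:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                     # scalability: desired number of pods
  strategy:
    type: RollingUpdate           # automated, incremental rollouts
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27         # assumed example image
        resources:
          requests:
            cpu: 100m             # informs the scheduler's placement decision
            memory: 128Mi
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;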

&lt;h3&gt;
  
  
  &lt;strong&gt;Managed Kubernetes-as-a-Service (KaaS)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Managed Kubernetes offerings simplify setup and operations, allowing you to provision production-grade clusters with minimal effort. Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon EKS
&lt;/li&gt;
&lt;li&gt;Azure Kubernetes Service (AKS)
&lt;/li&gt;
&lt;li&gt;Google Kubernetes Engine (GKE)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These platforms handle cluster provisioning, scaling, patching, and security, enabling teams to focus on application development.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Kubernetes Architecture Overview&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A Kubernetes cluster consists of two main node types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Control plane nodes&lt;/strong&gt;: Manage the cluster and maintain its desired state.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker nodes&lt;/strong&gt;: Run the containerized applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlk20cwacakeh7lt573o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlk20cwacakeh7lt573o.png" alt="Kubernetes architecture" width="800" height="372"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Control Plane Node&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The control plane is the brain of the Kubernetes cluster, managing cluster state, responding to user requests, scheduling workloads, and ensuring the desired state matches the actual state. Users interact with the control plane using the Kubernetes API—through the CLI (&lt;code&gt;kubectl&lt;/code&gt;), a web UI (Dashboard), or external tools.&lt;/p&gt;

&lt;p&gt;Core components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Server (&lt;code&gt;kube-apiserver&lt;/code&gt;)&lt;/strong&gt;: Exposes the Kubernetes API, validating and processing requests, and communicating with &lt;code&gt;etcd&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler (&lt;code&gt;kube-scheduler&lt;/code&gt;)&lt;/strong&gt;: Assigns pods to nodes based on resource availability and constraints.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controller Manager (&lt;code&gt;kube-controller-manager&lt;/code&gt;)&lt;/strong&gt;: Runs background reconciliation loops to maintain the desired cluster state.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Controller Manager&lt;/strong&gt;: Integrates the cluster with cloud provider APIs for storage, load balancing, and node management.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;etcd&lt;/strong&gt;: Stores all configuration and state data for the cluster, using the Raft consensus algorithm for leader election and fault tolerance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7aqwbk3zvyh1wvxlbog.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7aqwbk3zvyh1wvxlbog.png" alt="Kubeadm HA topology" width="800" height="559"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For high availability (HA), replicate control plane nodes and configure them in HA mode. In HA setups, one node acts as the leader, while the others remain synchronized and ready to take over if needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Worker Node&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Pods are the smallest deployable units in Kubernetes. A pod can contain one or more containers sharing the same network and storage context.&lt;/p&gt;
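&lt;p&gt;A minimal Pod manifest with two containers sharing the same network context might look like this sketch (the image names and command are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: web-with-logger
spec:
  containers:
  - name: web
    image: nginx:1.27             # assumed example image
  - name: log-tailer
    image: busybox:1.36
    command: ["sh", "-c", "tail -f /dev/null"]
    # both containers share the Pod's network namespace,
    # so they can reach each other over localhost
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;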

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8waayij5tdw8wqfe829.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8waayij5tdw8wqfe829.png" alt="Container Runtime Interface" width="800" height="238"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Worker nodes host the containerized applications in pods. Each worker node contains the necessary services to run and manage these pods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Container Runtime&lt;/strong&gt;: Executes containers. Supported runtimes include &lt;code&gt;containerd&lt;/code&gt;, &lt;code&gt;CRI-O&lt;/code&gt;, and Docker (via &lt;code&gt;cri-dockerd&lt;/code&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubelet&lt;/strong&gt;: Agent that communicates with the control plane and manages pods on the node.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CRI Shim&lt;/strong&gt;: Interfaces between the Kubelet and container runtime using the Container Runtime Interface (CRI).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kube-proxy&lt;/strong&gt;: Manages network rules and forwards traffic to the correct pods based on Kubernetes services.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add-ons&lt;/strong&gt;: Optional services like DNS, logging, monitoring, and dashboards.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Networking in Kubernetes&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Kubernetes networking supports four main types of communication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Container-to-container&lt;/strong&gt;: Containers in the same pod communicate over &lt;code&gt;localhost&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pod-to-pod&lt;/strong&gt;: Uses the "IP-per-pod" model, with each pod receiving a unique IP.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service-to-pod&lt;/strong&gt;: Enables load-balanced access to pods using stable service endpoints.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External-to-service&lt;/strong&gt;: Routes external traffic into the cluster via NodePorts, Ingress, or LoadBalancers.&lt;/li&gt;
&lt;/ul&gt;
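&lt;p&gt;The last two communication types can be sketched with a single Service manifest. The selector, ports, and NodePort value below are assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort                  # external-to-service: opens a port on every node
  selector:
    app: web                      # service-to-pod: load-balances across matching pods
  ports:
  - port: 80
    targetPort: 8080              # assumed container port
    nodePort: 30080               # assumed port in the 30000-32767 NodePort range
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;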

&lt;h3&gt;
  
  
  &lt;strong&gt;Container Network Interface (CNI)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Kubernetes relies on the CNI specification to configure networking. Common CNI plugins include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flannel
&lt;/li&gt;
&lt;li&gt;Calico
&lt;/li&gt;
&lt;li&gt;Cilium
&lt;/li&gt;
&lt;li&gt;Weave&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These plugins handle IP allocation, routing, and network policies.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Kubernetes Extensibility and Ecosystem&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Kubernetes has a modular, pluggable architecture, supporting the development of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom resources and operators
&lt;/li&gt;
&lt;li&gt;Custom APIs and admission controllers
&lt;/li&gt;
&lt;li&gt;Custom scheduling rules and plugins&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility enables you to tailor Kubernetes to your specific needs, especially in complex microservices environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Kubernetes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You can install Kubernetes using several cluster configurations, each serving different use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;All-in-One Single-Node Installation&lt;/strong&gt;: Installs both control plane and worker components on a single node. Ideal for learning, development, and testing. Not recommended for production due to lack of high availability and scalability.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-Control Plane and Multi-Worker Installation&lt;/strong&gt;: Includes a single control plane node running a stacked &lt;code&gt;etcd&lt;/code&gt; instance, managing multiple worker nodes. Suitable for small-scale environments but introduces a single point of failure.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-Control Plane with External &lt;code&gt;etcd&lt;/code&gt; and Multi-Worker Installation&lt;/strong&gt;: The control plane runs independently from an external &lt;code&gt;etcd&lt;/code&gt; instance, improving data durability. The single control plane manages multiple worker nodes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Control Plane and Multi-Worker Installation&lt;/strong&gt;: High-availability setup with multiple control plane nodes, each running a stacked &lt;code&gt;etcd&lt;/code&gt; instance forming an HA &lt;code&gt;etcd&lt;/code&gt; cluster. Offers better fault tolerance.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Control Plane with External &lt;code&gt;etcd&lt;/code&gt; and Multi-Worker Installation&lt;/strong&gt;: The most robust and production-ready configuration. Each control plane node connects to a dedicated external &lt;code&gt;etcd&lt;/code&gt; instance, all configured in a highly available cluster. Ensures maximum resilience and scalability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6k77nsi07wa1xl4kf7u8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6k77nsi07wa1xl4kf7u8.png" alt="Installing Kubernetes selection diagram" width="800" height="61"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As cluster complexity increases, so do the hardware and infrastructure requirements. For production environments, use a multi-node setup with high availability and redundant control planes.&lt;/p&gt;

&lt;p&gt;When planning infrastructure, consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Environment&lt;/strong&gt;: Bare metal, public cloud, private cloud, or hybrid cloud?
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operating System&lt;/strong&gt;: Red Hat-based, Debian-based, or Windows OS?
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Networking&lt;/strong&gt;: Which CNI plugin best fits your needs?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;For more information, refer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/hridyeshbisht/getting-started-with-minikube-for-kubernetes-40a6"&gt;Getting started with minikube.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/hridyeshbisht/a-developers-guide-to-kubernetes-components-222g"&gt;A Developer’s Guide to Kubernetes Components.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>A Developer's Guide to Kubernetes Components</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Wed, 07 May 2025 07:56:08 +0000</pubDate>
      <link>https://dev.to/hridyeshbisht/a-developers-guide-to-kubernetes-components-222g</link>
      <guid>https://dev.to/hridyeshbisht/a-developers-guide-to-kubernetes-components-222g</guid>
      <description>&lt;p&gt;Kubernetes is the backbone of modern cloud-native applications. It simplifies deploying, scaling, and managing containerized workloads. But for developers, understanding its core concepts—like Pods, Deployments, and Services—is essential to building scalable and resilient apps.&lt;/p&gt;

&lt;p&gt;In this guide, you learn about Kubernetes components from a developer’s point of view, complete with real-life use cases and visual diagrams.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Understanding Kubernetes Objects&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Kubernetes uses a declarative object model. Each object’s &lt;code&gt;spec&lt;/code&gt; defines the desired state. The &lt;code&gt;status&lt;/code&gt; reflects the current state. Common fields are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;apiVersion&lt;/code&gt;&lt;/strong&gt;: API version (e.g., &lt;code&gt;v1&lt;/code&gt;)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;kind&lt;/code&gt;&lt;/strong&gt;: Object type (e.g., Pod, Service)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;metadata&lt;/code&gt;&lt;/strong&gt;: Name, namespace, labels
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;spec&lt;/code&gt;&lt;/strong&gt;: Desired configuration
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;status&lt;/code&gt;&lt;/strong&gt;: System-managed status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Certain objects, such as Secrets and ConfigMaps, use &lt;code&gt;data&lt;/code&gt; and &lt;code&gt;stringData&lt;/code&gt; fields to store key-value pairs. The Kubernetes API Server accepts object definitions in JSON format. Most definition manifests, however, are written in YAML, which &lt;strong&gt;kubectl&lt;/strong&gt; converts into a JSON payload before sending it to the API Server.&lt;/p&gt;
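&lt;p&gt;The fields above fit together in a minimal manifest such as the following sketch (the image and command are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1                    # API version
kind: Pod                         # object type
metadata:                         # name, namespace, labels
  name: hello
  labels:
    app: hello
spec:                             # desired state
  containers:
  - name: hello
    image: busybox:1.36           # assumed example image
    command: ["sh", "-c", "echo Hello &amp;&amp; sleep 3600"]
# status: populated by the system after the object is created
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;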

&lt;h3&gt;
  
  
  &lt;strong&gt;Enable Autocompletion (Optional)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Autocompletion enhances your CLI experience. For example, in Bash:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;source &amp;lt;(minikube completion bash)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Nodes&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A node is a physical or virtual machine in a Kubernetes cluster. Each node runs the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;kubelet&lt;/strong&gt;: Ensures containers in a Pod are running.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kube-proxy&lt;/strong&gt;: Manages network rules for communication.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container runtime&lt;/strong&gt;: Runs containers (e.g., Docker, containerd).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a cloud-native e-commerce app, each node could run pods handling different services—payments, inventory, or recommendations.&lt;/p&gt;

&lt;p&gt;There are two types of nodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Control plane nodes&lt;/strong&gt;: Manage the cluster. They run the API server, scheduler, controller manager, and etcd.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker nodes&lt;/strong&gt;: Run application workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznwvq46c5cbiassivkd1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznwvq46c5cbiassivkd1.png" alt="image of Kubernetes architecture" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image credits : &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Namespaces&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Namespaces partition cluster resources and isolate workloads. They allow teams to share a cluster without interfering with each other.&lt;/p&gt;

&lt;p&gt;Default namespaces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;default&lt;/code&gt;: For user-defined resources.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kube-system&lt;/code&gt;: For system-level components.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kube-public&lt;/code&gt;: Readable by all users.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kube-node-lease&lt;/code&gt;: Used for node heartbeats.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If multiple users and teams share the same Kubernetes cluster, you can partition it into virtual sub-clusters using Namespaces. Resource names must be unique within a Namespace, but not across Namespaces in the cluster.&lt;/p&gt;

&lt;p&gt;To list all namespaces:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get namespaces&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;To create a namespace:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl create namespace &amp;lt;namespace-name&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Namespaces support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unique resource names within each namespace
&lt;/li&gt;
&lt;li&gt;Resource isolation by team, project, or environment
&lt;/li&gt;
&lt;li&gt;Resource quotas and limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good practice is to create additional Namespaces as needed, to virtualize the cluster and isolate users, developer teams, apps, or tiers.&lt;/p&gt;
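
&lt;p&gt;As a minimal sketch, a Namespace can also be created declaratively (the name &lt;code&gt;dev&lt;/code&gt; below is just an example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Namespace
metadata:
  name: dev
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Apply it with &lt;code&gt;kubectl apply -f dev-namespace.yaml&lt;/code&gt;, then scope queries to it with the &lt;code&gt;-n&lt;/code&gt; flag, e.g. &lt;code&gt;kubectl get pods -n dev&lt;/code&gt;.&lt;/p&gt;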

&lt;h2&gt;
  
  
  &lt;strong&gt;Pods&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A Pod is the smallest deployable unit in Kubernetes. It can contain one or more containers that share:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The same network namespace
&lt;/li&gt;
&lt;li&gt;Storage volumes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pods are ephemeral and typically managed by higher-level objects like Deployments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2rn764wcnnopqr5q2at.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2rn764wcnnopqr5q2at.png" alt="Single- and Multi-Container Pods" width="800" height="241"&gt;&lt;/a&gt;&lt;br&gt;
Image credits : &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example, &lt;code&gt;nginx-pod.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-pod&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;  
    &lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.22.1&lt;/span&gt;  
    &lt;span class="s"&gt;ports&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;\- containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply the manifest:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f nginx-pod.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Generate the manifest without creating the Pod:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl run nginx-pod --image=nginx:1.22.1 --dry-run=client -o yaml&lt;/code&gt;&lt;/p&gt;
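
&lt;p&gt;Once the Pod is applied, it can be inspected and cleaned up with standard kubectl commands:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List Pods and check their status
kubectl get pods

# Show detailed state and events for this Pod
kubectl describe pod nginx-pod

# Delete the Pod when done
kubectl delete pod nginx-pod
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;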

&lt;h2&gt;
  
  
  &lt;strong&gt;Labels and Selectors&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Labels are key-value pairs used to organize, select, and manage Kubernetes objects. Controllers and Services use them to group and manage resources, so many objects can carry the same Label(s).&lt;/p&gt;

&lt;p&gt;Label Selectors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Equality-based&lt;/strong&gt;: Select resources matching specific key-value pairs.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubectl get pods -l env=dev&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Set-based&lt;/strong&gt;: Select resources matching a set of values.

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubectl get pods -l 'env in (dev, qa)'&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp7wgsp796nk22hi1qrs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp7wgsp796nk22hi1qrs.png" alt="Labels and Selector diagram" width="800" height="668"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image credits : &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use labels to group resources logically (e.g., by environment, app version).&lt;/p&gt;
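
&lt;p&gt;To illustrate, reusing the &lt;code&gt;nginx-pod&lt;/code&gt; from earlier as an example, a label can be attached imperatively and then used to filter:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Attach a label to an existing Pod
kubectl label pod nginx-pod env=dev

# Equality-based selection
kubectl get pods -l env=dev

# Set-based selection
kubectl get pods -l 'env in (dev, qa)'
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;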

&lt;h2&gt;
  
  
  &lt;strong&gt;ReplicaSets&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A ReplicaSet ensures a specific number of identical Pods are always running.&lt;/p&gt;

&lt;p&gt;Key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-healing: Replaces failed Pods
&lt;/li&gt;
&lt;li&gt;Scalable: Supports manual and automated scaling
&lt;/li&gt;
&lt;li&gt;Uses label selectors to identify Pods&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, consider a ReplicaSet with a replica count of 3 for a specific Pod template. Pod-1, Pod-2, and Pod-3 are identical: they run the same application container image, cloned from the same Pod template.&lt;/p&gt;

&lt;p&gt;Although the three Pod replicas are said to be identical, each has a unique name and IP address. The Pod object ensures that the application can be placed on any worker node of the cluster as a result of the scheduling process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ReplicaSet&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;  
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;guestbook&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;guestbook&lt;/span&gt;  
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;guestbook&lt;/span&gt;  
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;php-redis&lt;/span&gt;  
        &lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcr.io/google\_samples/gb-frontend:v3&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create the ReplicaSet:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl create -f redis-rs.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Assume that one of the Pods is forced to unexpectedly terminate (due to insufficient resources, timeout, its hosting node has crashed, etc.), causing the current state to no longer match the desired state.&lt;/p&gt;

&lt;p&gt;The ReplicaSet detects that the current state is no longer matching the desired state and triggers a request for an additional Pod to be created, thus ensuring that the current state matches the desired state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqok3xj1fnpjdh9miogo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqok3xj1fnpjdh9miogo.png" alt="ReplicaSet (Creating a Pod to Match Current State with Desired State)" width="800" height="469"&gt;&lt;/a&gt;&lt;br&gt;
Image credits : &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;
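
&lt;p&gt;Scaling works the same way: change the desired replica count and the controller reconciles. For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Compare desired vs current vs ready replica counts
kubectl get replicasets

# Scale the ReplicaSet up; the controller creates the extra Pods
kubectl scale replicaset frontend --replicas=5
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;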
&lt;h2&gt;
  
  
  &lt;strong&gt;Deployments&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Deployments manage the creation, deletion, and updates of Pods. A Deployment automatically creates a ReplicaSet, which then creates a Pod.&lt;/p&gt;

&lt;p&gt;There is no need to manage ReplicaSets and Pods separately; the Deployment manages them on our behalf.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-deployment&lt;/span&gt;  
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-deployment&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-deployment&lt;/span&gt;  
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-deployment&lt;/span&gt;  
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;  
        &lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.20.2&lt;/span&gt;  
        &lt;span class="s"&gt;ports&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;\- containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Apply the manifest:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f def-deploy.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;When you update the Pod template (for example, a container image), Kubernetes performs a rolling update. Each update creates a new ReplicaSet and marks it as a new revision.&lt;/p&gt;

&lt;p&gt;Once the rolling update has completed, the &lt;strong&gt;Deployment&lt;/strong&gt; will show both &lt;strong&gt;ReplicaSets A&lt;/strong&gt; and &lt;strong&gt;B&lt;/strong&gt;, where &lt;strong&gt;A&lt;/strong&gt; is scaled to 0 (zero) Pods, and &lt;strong&gt;B&lt;/strong&gt; is scaled to 3 Pods. This is how the Deployment records its prior state configuration settings, as &lt;strong&gt;Revisions&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsztaes92j7y8hytmidkb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsztaes92j7y8hytmidkb.png" alt="Deployment (ReplicaSet B Created)" width="800" height="491"&gt;&lt;/a&gt;&lt;br&gt;
Image credits : &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

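&lt;p&gt;Rollouts can be inspected and reverted with the &lt;code&gt;kubectl rollout&lt;/code&gt; subcommands. For example (the target image tag below is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Trigger a rolling update by changing the container image
kubectl set image deployment/nginx-deployment nginx=nginx:1.21.5

# Watch the rollout progress
kubectl rollout status deployment/nginx-deployment

# List recorded revisions
kubectl rollout history deployment/nginx-deployment

# Roll back to the previous revision
kubectl rollout undo deployment/nginx-deployment
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;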
&lt;h2&gt;
  
  
  &lt;strong&gt;DaemonSets&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A DaemonSet ensures a Pod runs on all (or some) Nodes in the cluster. It's ideal for running background agents (e.g., log collectors, monitoring tools).&lt;/p&gt;

&lt;p&gt;In contrast, the ReplicaSet and Deployment operators by default have no control over the scheduling and placement of multiple Pod replicas on the same Node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YAML Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DaemonSet&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fluentd-agent&lt;/span&gt;  
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;k8s-app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fluentd-agent&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;k8s-app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fluentd-agent&lt;/span&gt;  
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;k8s-app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fluentd-agent&lt;/span&gt;  
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fluentd&lt;/span&gt;  
        &lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;quay.io/fluentd\_elasticsearch/fluentd:v4.5.2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the DaemonSet:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl create -f fluentd-ds.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Whenever a Node is added to the cluster, a Pod from a given DaemonSet is automatically placed on it. When any one Node crashes or it is removed from the cluster, the respective DaemonSet operated Pods are garbage collected. If a DaemonSet is deleted, all Pod replicas it created are deleted as well.&lt;/p&gt;
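
&lt;p&gt;To verify that one agent Pod landed on each node:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# One fluentd Pod per node; -o wide shows the node each Pod runs on
kubectl get pods -l k8s-app=fluentd-agent -o wide

# DaemonSet summary: desired vs ready Pod counts
kubectl get daemonsets fluentd-agent
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;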

&lt;h2&gt;
  
  
  &lt;strong&gt;Authentication, Authorization, and Admission Control&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To manage Kubernetes resources, all API requests go through three control stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authentication:&lt;/strong&gt; Authenticate a user based on credentials provided as part of API requests.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization:&lt;/strong&gt; Authorizes the API requests submitted by the authenticated user.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Admission Control:&lt;/strong&gt; Software modules that validate and/or modify user requests. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5vrl8dnyezy6ghtxwqb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5vrl8dnyezy6ghtxwqb.png" alt="Controlling Access to the API" width="800" height="418"&gt;&lt;/a&gt;&lt;br&gt;
Image credits : &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Authentication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Authentication verifies the identity of a user or service making a request to the API server. Kubernetes doesn’t store user objects but supports various authentication methods:&lt;/p&gt;

&lt;p&gt;Different user types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Normal users&lt;/strong&gt;: Managed externally (e.g., client certificates, static token files, OIDC).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service accounts&lt;/strong&gt;: Used by in-cluster processes. Each Service Account is tied to a particular Namespace and mounts its credentials into Pods as Secrets so they can communicate with the API server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If properly configured, Kubernetes can also support anonymous requests, along with requests from Normal Users and Service Accounts.&lt;/p&gt;
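
&lt;p&gt;For instance, a Service Account can be created imperatively (the name &lt;code&gt;app-reader&lt;/code&gt; is illustrative) and then referenced from a Pod spec via &lt;code&gt;serviceAccountName&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create a Service Account in the current namespace
kubectl create serviceaccount app-reader

# Inspect it; Kubernetes manages its token credentials
kubectl describe serviceaccount app-reader
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;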
&lt;h3&gt;
  
  
  &lt;strong&gt;Authorization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Authorization determines whether an authenticated user is allowed to perform an action.&lt;/p&gt;

&lt;p&gt;More than one module can be configured for one Kubernetes cluster, and each module is checked in sequence. If any authorizer approves or denies a request, then that decision is returned immediately.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Supported Modes&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node Authorization&lt;/strong&gt;: Grants kubelet access to node- and pod-related APIs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attribute-Based Access Control (ABAC)&lt;/strong&gt;: Policy-based access using user attributes.

&lt;ul&gt;
&lt;li&gt;To enable ABAC mode, you must start the API server with the --authorization-mode=ABAC option, while specifying the authorization policy with --authorization-policy-file=PolicyFile.json.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhook&lt;/strong&gt;: Sends authorization requests to an external service.

&lt;ul&gt;
&lt;li&gt;To enable the Webhook authorizer, we need to start the API server with the --authorization-webhook-config-file=SOME_FILENAME option, where SOME_FILENAME is the configuration of the remote authorization service.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Role-Based Access Control (RBAC)&lt;/strong&gt; &lt;em&gt;(Recommended)&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: Role granting pod read access in lfs158 namespace&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lfs158&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pod-reader&lt;/span&gt;  
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;\- apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\[""\]&lt;/span&gt;  
  &lt;span class="s"&gt;resources&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\["pods"\]&lt;/span&gt;  
  &lt;span class="s"&gt;verbs&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\["get", "watch", "list"\]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Kubernetes, multiple Roles can be attached to subjects like users, service accounts, etc. In RBAC, you can create two kinds of Roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Role&lt;/strong&gt;: Grants namespace-scoped permissions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClusterRole&lt;/strong&gt;: Grants cluster-wide permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the role is created, you can bind it to users with a RoleBinding object. There are two kinds of RoleBindings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RoleBinding&lt;/strong&gt;: Binds a Role or ClusterRole to users/groups/service accounts in a namespace.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClusterRoleBinding&lt;/strong&gt;: Binds a ClusterRole at the cluster scope.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: RoleBinding for user bob&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RoleBinding&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pod-read-access&lt;/span&gt;  
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lfs158&lt;/span&gt;  
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;\- kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;User&lt;/span&gt;  
  &lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bob&lt;/span&gt;  
  &lt;span class="s"&gt;apiGroup&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;  
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pod-reader&lt;/span&gt;  
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
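
&lt;p&gt;The binding can then be checked with &lt;code&gt;kubectl auth can-i&lt;/code&gt;, impersonating the user:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Should answer "yes" once the RoleBinding is in place
kubectl auth can-i list pods --as bob -n lfs158

# Outside the granted verbs, the answer is "no"
kubectl auth can-i delete pods --as bob -n lfs158
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;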



&lt;h3&gt;
  
  
  &lt;strong&gt;Admission Control&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Admission controllers validate or modify API requests after authentication and authorization but before persistence.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Controller Types&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validating&lt;/strong&gt;: Check request validity.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mutating&lt;/strong&gt;: Modify request objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Examples&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;LimitRanger: Enforces resource limits.
&lt;/li&gt;
&lt;li&gt;ResourceQuota: Enforces resource quotas.
&lt;/li&gt;
&lt;li&gt;DefaultStorageClass: Sets default storage class.
&lt;/li&gt;
&lt;li&gt;AlwaysPullImages: Forces images to always be pulled.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enable admission controllers with:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--enable-admission-plugins=NamespaceLifecycle,ResourceQuota,PodSecurity,DefaultStorageClass&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Custom controllers can be created as &lt;strong&gt;admission webhooks&lt;/strong&gt; to support dynamic, external validation or mutation.&lt;/p&gt;
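
&lt;p&gt;To see which admission plugins a &lt;code&gt;kube-apiserver&lt;/code&gt; binary enables by default, a quick check (run wherever the binary is available) is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# The flag's help text lists the default and available plugins
kube-apiserver -h | grep enable-admission-plugins
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;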

&lt;h2&gt;
  
  
  &lt;strong&gt;Accessing Application Pods&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Each Pod is assigned a dynamic IP address. If a Pod is restarted, Kubernetes assigns a new IP. If you're connecting directly to a Pod IP, you'll lose access when the Pod is replaced.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6p697aclampfuajraaw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6p697aclampfuajraaw.png" alt="A New Pod Is Created After an Old One Terminated Unexpectedly" width="800" height="506"&gt;&lt;/a&gt;&lt;br&gt;
Image credits : &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Accessing Pods by IP&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, a client accesses Pods using their individual IPs. If one Pod fails, a new Pod is created with a different IP. The client must then detect and update its target IPs, which adds complexity and increases overhead.&lt;/p&gt;

&lt;p&gt;To overcome this situation, Kubernetes provides a higher-level abstraction called &lt;strong&gt;Service&lt;/strong&gt;, which logically groups Pods and defines a policy to access them. This grouping is achieved via &lt;em&gt;Labels&lt;/em&gt; and &lt;em&gt;Selectors&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;**apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1**&lt;/span&gt;  
&lt;span class="na"&gt;**kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment**&lt;/span&gt;  
&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="nv"&gt;*metadata&lt;/span&gt;&lt;span class="s"&gt;:**&lt;/span&gt;  
  &lt;span class="s"&gt;**labels:**&lt;/span&gt;  
    &lt;span class="s"&gt;**app&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend**&lt;/span&gt;  
  &lt;span class="s"&gt;**name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend**&lt;/span&gt;  
&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="nv"&gt;*spec&lt;/span&gt;&lt;span class="s"&gt;:**&lt;/span&gt;  
  &lt;span class="s"&gt;**replicas&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3**&lt;/span&gt;  
  &lt;span class="s"&gt;**selector:**&lt;/span&gt;  
    &lt;span class="s"&gt;**matchLabels:**&lt;/span&gt;  
      &lt;span class="s"&gt;**app&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend**&lt;/span&gt;  
    &lt;span class="s"&gt;**template:**&lt;/span&gt;  
      &lt;span class="s"&gt;**metadata:**&lt;/span&gt;  
        &lt;span class="s"&gt;**labels:**&lt;/span&gt;  
          &lt;span class="s"&gt;**app&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend**&lt;/span&gt;  
      &lt;span class="s"&gt;**spec:**&lt;/span&gt;  
        &lt;span class="s"&gt;**containers:**&lt;/span&gt;  
        &lt;span class="s"&gt;**\- image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend-application**&lt;/span&gt;  
        &lt;span class="s"&gt;**name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend-application**&lt;/span&gt;  
        &lt;span class="s"&gt;**ports:**&lt;/span&gt;  
        &lt;span class="s"&gt;**\- containerPort&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5000**&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Services&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A Kubernetes Service provides a stable network endpoint for a group of Pods. It automatically routes traffic to healthy Pods and load balances requests.&lt;/p&gt;

&lt;p&gt;Services use label selectors to identify target Pods.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Create a Service to Expose Pods&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;First, define a Deployment that runs your application and labels the Pods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;  
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;  
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend-application&lt;/span&gt;  
          &lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend-application&lt;/span&gt;  
          &lt;span class="s"&gt;ports&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
            &lt;span class="na"&gt;\- containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, expose these Pods using a Service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend-svc&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;  
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;\- port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;80&lt;/span&gt;  
      &lt;span class="s"&gt;targetPort&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5000&lt;/span&gt;  
      &lt;span class="s"&gt;protocol&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default, the Service type is &lt;code&gt;ClusterIP&lt;/code&gt;, which exposes the Service only within the cluster.&lt;/p&gt;

&lt;p&gt;To apply the Service:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f frontend-svc.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Or use &lt;code&gt;kubectl expose&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl expose deployment frontend &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;frontend-svc &lt;span class="se"&gt;\\&lt;/span&gt;  
  &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;80 &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-target-port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5000&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;How Services Group Pods&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Services use label selectors to identify groups of Pods. You can define separate Services for each group:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;frontend-svc&lt;/code&gt; — targets Pods with &lt;code&gt;app=frontend&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;db-svc&lt;/code&gt; — targets Pods with &lt;code&gt;app=db&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hbzsrt82youbqxgu928.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hbzsrt82youbqxgu928.png" alt="Accessing the Pods Using Service Object" width="717" height="559"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you create a Service, Kubernetes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assigns it a ClusterIP (accessible only inside the cluster)
&lt;/li&gt;
&lt;li&gt;Maps that ClusterIP to a list of Pod IPs and ports (called Endpoints)
&lt;/li&gt;
&lt;li&gt;Uses kube-proxy to route traffic based on IP rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To view the Service and its endpoints:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get service,endpoints frontend-svc&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The client connects to the Service via its &lt;code&gt;ClusterIP&lt;/code&gt;. The Service forwards traffic to one of the selected Pods and performs load balancing.&lt;/p&gt;
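
&lt;p&gt;The output looks similar to the following (the ClusterIP and Pod IPs shown here are illustrative; yours will differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/frontend-svc   ClusterIP   10.96.142.11   &amp;lt;none&amp;gt;        80/TCP    2m

NAME                     ENDPOINTS                                         AGE
endpoints/frontend-svc   10.244.1.5:5000,10.244.1.6:5000,10.244.2.7:5000   2m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;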

&lt;h3&gt;
  
  
  &lt;strong&gt;Load Balancing and Failover&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A Service balances traffic across all healthy Pods. When a Pod is replaced, the Service updates its endpoints list and redirects traffic to the new Pod—no changes needed in client configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Each endpoint includes the Pod's IP and its target port.&lt;/p&gt;
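
&lt;p&gt;You can watch this happen by observing the endpoints while a Pod is replaced (the Pod name below is illustrative; substitute one of your own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# In one terminal, watch the Service's endpoints
kubectl get endpoints frontend-svc --watch

# In another, delete one frontend Pod; the Deployment creates a replacement
# and the endpoints list updates with the new Pod's IP
kubectl delete pod frontend-5d8c7b9f4-abcde
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
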
&lt;h2&gt;
  
  
  &lt;strong&gt;Kube-proxy&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Each node in a Kubernetes cluster runs kube-proxy, a network proxy that maintains network rules on the node. These rules allow network sessions inside or outside the cluster to reach your Pods.&lt;/p&gt;

&lt;p&gt;kube-proxy has two main responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service Management&lt;/strong&gt;: kube-proxy watches the Kubernetes API for changes in Service and Endpoint objects, updating the node's network rules accordingly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traffic Routing&lt;/strong&gt;: It uses either iptables or IPVS to handle traffic routing. By default, iptables is used, which is simple and well-supported but less efficient than IPVS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each new Service, kube-proxy configures iptables rules on each node to capture traffic for the Service's ClusterIP and forward it to one of the Service's endpoints. This enables any node to receive external traffic and route it internally based on the iptables rules. When a Service is removed, kube-proxy deletes the corresponding iptables rules from all nodes.&lt;/p&gt;

&lt;p&gt;The kube-proxy agent runs on every node, and iptables rules are redundantly populated across nodes. Each iptables instance stores routing rules for the entire cluster, ensuring that Service objects implement distributed load balancing.&lt;/p&gt;
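
&lt;p&gt;When running in the default iptables mode, you can inspect these rules directly on a node. A sketch (rules land in the &lt;code&gt;KUBE-SERVICES&lt;/code&gt; chain; exact output varies by cluster and CNI):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# On a node: list the NAT rules kube-proxy created and filter for our Service
sudo iptables -t nat -L KUBE-SERVICES -n | grep frontend-svc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;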

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaqj2k8cn0xqus0403z4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaqj2k8cn0xqus0403z4.png" alt="Kube-proxy description diagram" width="800" height="464"&gt;&lt;/a&gt; &lt;br&gt;
Image credits: &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Traffic Policies&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Kubernetes Services support traffic policies that influence routing decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster (Default)&lt;/strong&gt;: Routes traffic to all ready endpoints, regardless of their node.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local&lt;/strong&gt;: Routes traffic only to endpoints on the same node as the client. If no local endpoints are available, the traffic is dropped. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can configure these policies in your Service manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend-svc&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;  
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;\- protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;  
      &lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;80&lt;/span&gt;  
      &lt;span class="s"&gt;targetPort&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5000&lt;/span&gt;  
  &lt;span class="na"&gt;internalTrafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Local&lt;/span&gt;  
  &lt;span class="na"&gt;externalTrafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Service Discovery&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Kubernetes supports two service discovery mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Environment Variables: For each active Service, Kubernetes injects environment variables into new Pods. For example: &lt;code&gt;REDIS_MASTER_SERVICE_PORT&lt;/code&gt;.

&lt;ul&gt;
&lt;li&gt;Note: These variables are set only when the Pod starts.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;DNS-Based Discovery: Kubernetes DNS creates names like: &lt;code&gt;my-svc.my-namespace.svc.cluster.local&lt;/code&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This allows Services to be discoverable within the cluster.&lt;/p&gt;
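
&lt;p&gt;You can try both mechanisms against the &lt;code&gt;frontend-svc&lt;/code&gt; Service from earlier (the throwaway busybox Pod is used only for its &lt;code&gt;nslookup&lt;/code&gt; tool):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Environment variables injected for active Services
kubectl exec deploy/frontend -- printenv | grep -i svc

# DNS-based discovery from a temporary Pod
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never \
  -- nslookup frontend-svc.default.svc.cluster.local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;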

&lt;h3&gt;
  
  
  &lt;strong&gt;ServiceType&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Kubernetes Services can be exposed in different ways, defined by the &lt;code&gt;type&lt;/code&gt; field. A Service can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accessible only from within the cluster.
&lt;/li&gt;
&lt;li&gt;Accessible from within the cluster and from the external world.
&lt;/li&gt;
&lt;li&gt;Mapped to an entity that resides either inside or outside the cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Access scope is decided by &lt;strong&gt;ServiceType&lt;/strong&gt; property, defined when creating the Service.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;ClusterIP (default)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Exposes the Service on an internal IP, making it accessible only within the cluster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe56q6ai1b1wi2nr7wbh9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe56q6ai1b1wi2nr7wbh9.png" alt="ClusterIP exposing the service on NodePort" width="800" height="464"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;NodePort&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Exposes the Service on a static port on each node’s IP. A ClusterIP Service, to which the NodePort Service routes, is automatically created.&lt;/p&gt;
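
&lt;p&gt;A NodePort variant of the frontend Service might look like this (the &lt;code&gt;nodePort&lt;/code&gt; value 30080 is illustrative; if omitted, Kubernetes picks a free port from the 30000-32767 range):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: frontend-nodeport-svc
spec:
  type: NodePort
  selector:
    app: frontend
  ports:
    - port: 80
      targetPort: 5000
      nodePort: 30080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
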
&lt;h3&gt;
  
  
  &lt;strong&gt;LoadBalancer&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Exposes the Service externally using a cloud provider's load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsm53h7ynnj18b3zxidr2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsm53h7ynnj18b3zxidr2.png" alt="LoadBalancer exposing the service" width="800" height="530"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: Requires cloud provider support.&lt;/p&gt;
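
&lt;p&gt;On a supported cloud provider, changing the &lt;code&gt;type&lt;/code&gt; is usually all that is needed; the provider provisions the load balancer and reports its address in the Service's status:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: frontend-lb-svc
spec:
  type: LoadBalancer
  selector:
    app: frontend
  ports:
    - port: 80
      targetPort: 5000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
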
&lt;h3&gt;
  
  
  &lt;strong&gt;ExternalIP&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Maps a Service to an external IP address. Traffic that enters the cluster with the ExternalIP as its destination, on the Service port, is routed to one of the Service endpoints.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ozl5cc2qrt5mjl8uj11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ozl5cc2qrt5mjl8uj11.png" alt="External Ip to access the Service" width="800" height="507"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The administrator must configure external routing.&lt;/p&gt;
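
&lt;p&gt;ExternalIP is not a separate &lt;code&gt;type&lt;/code&gt;; it is set through the &lt;code&gt;externalIPs&lt;/code&gt; field. The address below (203.0.113.10, a documentation range) is a placeholder and must be routable to one of the nodes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: frontend-external-svc
spec:
  selector:
    app: frontend
  ports:
    - port: 80
      targetPort: 5000
  externalIPs:
    - 203.0.113.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
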
&lt;h3&gt;
  
  
  &lt;strong&gt;ExternalName&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Maps the Service to an external DNS name using a CNAME record. No proxying occurs.&lt;/p&gt;
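
&lt;p&gt;For example, in-cluster clients could reach an external database through a stable internal name (&lt;code&gt;my-db.example.com&lt;/code&gt; is a placeholder hostname):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: external-db-svc
spec:
  type: ExternalName
  externalName: my-db.example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
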
&lt;h3&gt;
  
  
  &lt;strong&gt;Multi-Port Services&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Expose multiple ports in a single Service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-service&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myapp&lt;/span&gt;  
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NodePort&lt;/span&gt;  
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;  
      &lt;span class="s"&gt;protocol&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;  
      &lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8080&lt;/span&gt;  
      &lt;span class="s"&gt;targetPort&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;80&lt;/span&gt;  
      &lt;span class="s"&gt;nodePort&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;31080&lt;/span&gt;  
    &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https&lt;/span&gt;  
      &lt;span class="s"&gt;protocol&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;  
      &lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8443&lt;/span&gt;  
      &lt;span class="s"&gt;targetPort&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;443&lt;/span&gt;  
      &lt;span class="s"&gt;nodePort&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="m"&gt;31443&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a helpful feature when exposing Pods with one container listening on more than one port, or when exposing Pods with multiple containers listening on one or more ports.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Port Forwarding for Local Testing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;kubectl&lt;/code&gt; to forward a local port to a Service for testing:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl port-forward svc/frontend-svc 8080:80&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This is useful for debugging applications without exposing the Service externally.&lt;/p&gt;
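
&lt;p&gt;With the forward running, the application answers on the local port:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# In a second terminal, send a request through the tunnel
curl http://localhost:8080/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;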

&lt;h2&gt;
  
  
  &lt;strong&gt;Kubernetes Ingress&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Kubernetes Services support internal routing, but defining routing logic per Service leads to duplication and limited flexibility. Ingress decouples routing rules from individual Services and acts as a centralized entry point for external traffic.&lt;/p&gt;

&lt;p&gt;Ingress defines HTTP and HTTPS routing rules and acts as a single entry point for external traffic into your cluster. It configures a Layer 7 (application layer) load balancer and supports the following capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TLS termination&lt;/strong&gt;: Offload SSL at the edge.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name-based virtual hosting&lt;/strong&gt;: Route traffic by hostname.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fanout routing&lt;/strong&gt;: Route traffic by URL path.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom routing rules&lt;/strong&gt;: Use annotations to enable advanced behaviors.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing&lt;/strong&gt;: Distribute traffic across Service backends.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxgktmyegae4e9mekyah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxgktmyegae4e9mekyah.png" alt="Kubernetes Ingrees" width="800" height="286"&gt;&lt;/a&gt;&lt;br&gt;
Image credits : &lt;a href="https://trainingportal.linuxfoundation.or" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of accessing a Service directly, clients connect to the Ingress endpoint. The Ingress resource defines routing rules that forward requests to the appropriate Service based on hostnames and URL paths.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The Ingress resource itself does not handle traffic. An Ingress Controller—such as NGINX—interprets the rules and manages request forwarding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Name-Based Virtual Hosting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use this pattern to route traffic based on the request hostname.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;virtual-host-ingress&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/service-upstream&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;blue.example.com&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ImplementationSpecific&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webserver-blue-svc&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;green.example.com&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ImplementationSpecific&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webserver-green-svc&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, requests to &lt;code&gt;blue.example.com&lt;/code&gt; and &lt;code&gt;green.example.com&lt;/code&gt; are routed to their respective backend Services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zkas1u7vzga2yq9o1lj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zkas1u7vzga2yq9o1lj.png" alt="Name-Based Virtual Hosting Ingress" width="800" height="286"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://trainingportal.linuxfoundation.org" rel="noopener noreferrer"&gt;https://trainingportal.linuxfoundation.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Path-Based Fanout Routing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use this pattern to route traffic based on the URL path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fan-out-ingress&lt;/span&gt;  
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;  
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/service-upstream&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;  
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;example.com&lt;/span&gt;  
    &lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;\- path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/blue&lt;/span&gt;  
        &lt;span class="s"&gt;pathType&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ImplementationSpecific&lt;/span&gt;  
        &lt;span class="s"&gt;backend&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webserver-blue-svc&lt;/span&gt;  
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;80&lt;/span&gt;  
      &lt;span class="na"&gt;\- path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/green&lt;/span&gt;  
        &lt;span class="s"&gt;pathType&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ImplementationSpecific&lt;/span&gt;  
        &lt;span class="s"&gt;backend&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webserver-green-svc&lt;/span&gt;  
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Requests to &lt;code&gt;example.com/blue&lt;/code&gt; and &lt;code&gt;example.com/green&lt;/code&gt; are routed to the corresponding Services.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;Ingress Controller&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Ingress resource only defines routing rules. It does not route traffic on its own. An &lt;strong&gt;Ingress Controller&lt;/strong&gt; is responsible for fulfilling these rules.&lt;/p&gt;

&lt;p&gt;An Ingress Controller:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitors the Kubernetes API for changes to Ingress resources
&lt;/li&gt;
&lt;li&gt;Configures the Layer 7 load balancer
&lt;/li&gt;
&lt;li&gt;Acts as a reverse proxy for external traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Popular Ingress Controllers include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NGINX Ingress Controller
&lt;/li&gt;
&lt;li&gt;AWS Load Balancer Controller
&lt;/li&gt;
&lt;li&gt;GCE L7 Load Balancer
&lt;/li&gt;
&lt;li&gt;Istio Ingress&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Each controller may require specific annotations. Always specify the correct &lt;code&gt;ingressClassName&lt;/code&gt; and annotations for compatibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Deploy an Ingress Resource&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;After enabling the Ingress Controller, deploy your Ingress resource using:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl create -f virtual-host-ingress.yaml&lt;/code&gt;&lt;/p&gt;
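
&lt;p&gt;You can then confirm that the controller has picked up the rules (the ADDRESS column is populated once the controller processes the resource):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List the Ingress and check its assigned address
kubectl get ingress virtual-host-ingress

# Inspect the rules and backends in detail
kubectl describe ingress virtual-host-ingress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;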

&lt;h2&gt;
  
  
  &lt;strong&gt;Annotations&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Annotations allow you to store non-identifying metadata on Kubernetes objects in key-value pairs. They're not used for selection but provide auxiliary information to tools.&lt;/p&gt;

&lt;p&gt;Common use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store build/release IDs, Git branch names.
&lt;/li&gt;
&lt;li&gt;Reference logging or monitoring tools.
&lt;/li&gt;
&lt;li&gt;Annotate ingress controller data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: Add annotations during Deployment creation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webserver&lt;/span&gt;  
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deployment&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PoC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Mar&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2022"&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="s"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Resource Quotas and Limit Ranges&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In multi-tenant Kubernetes clusters, it's essential to prevent any single user or team from consuming excessive resources. Kubernetes provides ResourceQuota and LimitRange objects to enforce such constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Resource Quotas&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;ResourceQuota objects limit the aggregate resource consumption per namespace. They can restrict:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Object counts (Pods, Services, ConfigMaps, etc.)
&lt;/li&gt;
&lt;li&gt;Compute resources (CPU, memory).
&lt;/li&gt;
&lt;li&gt;Storage resources (PersistentVolumeClaims).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ResourceQuota&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;compute-resources&lt;/span&gt;  
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;devspace&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;hard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;requests.cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;  
    &lt;span class="na"&gt;limits.cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;  
    &lt;span class="na"&gt;requests.memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;  
    &lt;span class="na"&gt;limits.memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
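
&lt;p&gt;Once the quota is applied, you can compare current consumption in the namespace against the hard limits with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe resourcequota compute-resources -n devspace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;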



&lt;h3&gt;
  
  
  &lt;strong&gt;Limit Ranges&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;LimitRange objects set default request and limit values for Pods or Containers within a namespace. They ensure that containers don't consume excessive resources and help maintain cluster stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LimitRange&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu-limits&lt;/span&gt;
 &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;devspace&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;500m&lt;/span&gt;
   &lt;span class="na"&gt;defaultRequest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;500m&lt;/span&gt;
   &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
   &lt;span class="na"&gt;min&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100m&lt;/span&gt;
   &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Container&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Autoscaling&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Autoscaling in Kubernetes automatically adjusts capacity (the number of Pod replicas, the resources assigned to Pods, or the number of cluster nodes) based on resource utilization and demand. There are several types of autoscalers:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Horizontal Pod Autoscaler (HPA)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;HPA automatically scales the number of pod replicas based on CPU utilization or other select metrics.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl autoscale deploy myapp --min=2 --max=10 --cpu-percent=80&lt;/code&gt;&lt;/p&gt;
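
&lt;p&gt;The same autoscaler can be written declaratively. A minimal sketch using the &lt;code&gt;autoscaling/v2&lt;/code&gt; API, assuming a Deployment named &lt;code&gt;myapp&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # scale out above 80% average CPU
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;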

&lt;h3&gt;
  
  
  &lt;strong&gt;Vertical Pod Autoscaler (VPA)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;VPA adjusts the CPU and memory requests and limits for containers based on usage. It helps optimize resource allocation for individual pods.&lt;/p&gt;
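
&lt;p&gt;VPA is not built into most clusters; it ships as a separate add-on with its own custom resource. Assuming it is installed, a minimal manifest targeting the &lt;code&gt;myapp&lt;/code&gt; Deployment might look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"   # VPA evicts and recreates Pods with updated requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;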

&lt;h3&gt;
  
  
  &lt;strong&gt;Cluster Autoscaler&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Cluster Autoscaler adjusts the number of nodes in your cluster when pods fail to launch due to insufficient resources or when nodes in the cluster are underutilized. In Azure Kubernetes Service (AKS), it's recommended to let the Kubernetes Cluster Autoscaler manage the required scale settings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdrd64vxavk6eg4d9hym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdrd64vxavk6eg4d9hym.png" alt="Autoscaling in Kubernetes" width="728" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Job Scheduling&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Jobs&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A Job creates one or more Pods to perform a specific task and ensures that the specified number of Pods successfully terminate. Jobs are useful for batch processing tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Options:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;parallelism&lt;/strong&gt;: Number of Pods to run in parallel.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;completions&lt;/strong&gt;: Number of successful completions needed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;activeDeadlineSeconds&lt;/strong&gt;: Duration in seconds the Job may be active.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;backoffLimit&lt;/strong&gt;: Number of retries before marking the Job as failed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ttlSecondsAfterFinished&lt;/strong&gt;: Time to retain the Job after completion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Job&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data-cleanup&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cleaner&lt;/span&gt;  
        &lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox&lt;/span&gt;  
        &lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\["sh", "-c", "cleanup.sh"\]&lt;/span&gt;  
      &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;  
  &lt;span class="na"&gt;backoffLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
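
&lt;p&gt;The &lt;code&gt;parallelism&lt;/code&gt; and &lt;code&gt;completions&lt;/code&gt; options combine as follows; a hypothetical sketch that processes five work items, two Pods at a time (the &lt;code&gt;process.sh&lt;/code&gt; script is an assumption):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: batch-process
spec:
  parallelism: 2                   # run two Pods at a time
  completions: 5                   # until five Pods terminate successfully
  ttlSecondsAfterFinished: 300     # delete the Job 5 minutes after it finishes
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "process.sh"]
      restartPolicy: Never
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;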



&lt;h3&gt;
  
  
  &lt;strong&gt;CronJobs&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CronJobs schedule Jobs to run periodically at fixed times, dates, or intervals. They are useful for recurring tasks like backups or report generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Options:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;schedule&lt;/strong&gt;: Cron format schedule string.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;startingDeadlineSeconds&lt;/strong&gt;: Deadline in seconds for starting the Job if it misses its scheduled time.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;concurrencyPolicy&lt;/strong&gt;: Specifies how to treat concurrent executions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CronJob&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-backup&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;  
  &lt;span class="na"&gt;jobTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
          &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
          &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backup&lt;/span&gt;  
            &lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backup-tool&lt;/span&gt;  
          &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OnFailure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
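
&lt;p&gt;The remaining options from the list slot into the same spec. For example, to skip a run that would overlap a still-running backup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;spec:
  schedule: "0 1 * * *"
  concurrencyPolicy: Forbid       # skip the new run if the previous one is still active
  startingDeadlineSeconds: 300    # give up on a run that cannot start within 5 minutes
  jobTemplate:
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;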



&lt;h2&gt;
  
  
  &lt;strong&gt;StatefulSets&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;StatefulSets&lt;/strong&gt; manage the deployment and scaling of Pods with unique, persistent identities. Unlike Deployments, StatefulSets guarantee the ordering and uniqueness of Pods, making them ideal for stateful workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Key Features&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent storage&lt;/strong&gt;: Each Pod in a StatefulSet gets its own PersistentVolume. This volume is retained across Pod restarts or rescheduling.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stable network identity&lt;/strong&gt;: Each Pod receives a unique and consistent DNS name, allowing predictable network communication (for example, &lt;code&gt;pod-0.service-name&lt;/code&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ordered operations&lt;/strong&gt;: Pods are created, updated, and deleted in a defined order, one at a time. This ensures safe startup, updates, and shutdowns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Real-World Example&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use StatefulSets to deploy clustered databases or distributed systems where each node must retain its identity and storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case: Cassandra or Redis Clusters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a Cassandra cluster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each node (Pod) requires a stable hostname for cluster gossip protocol.
&lt;/li&gt;
&lt;li&gt;Each node needs its own storage volume to persist data across rescheduling.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;StatefulSet&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;serviceName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis"&lt;/span&gt;  
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3&lt;/span&gt;  
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;  
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;  
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;  
          &lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis:7.0&lt;/span&gt;  
          &lt;span class="s"&gt;volumeMounts&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
            &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis-data&lt;/span&gt;  
              &lt;span class="s"&gt;mountPath&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/data&lt;/span&gt;  
  &lt;span class="na"&gt;volumeClaimTemplates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;\- metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis-data&lt;/span&gt;  
      &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;\[ "ReadWriteOnce" \]&lt;/span&gt;  
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
            &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
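
&lt;p&gt;The &lt;code&gt;serviceName&lt;/code&gt; field refers to a headless Service, which must exist for the Pods to receive their stable DNS names (&lt;code&gt;redis-0.redis&lt;/code&gt;, &lt;code&gt;redis-1.redis&lt;/code&gt;, and so on):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  clusterIP: None    # headless: DNS resolves to individual Pod IPs, not a virtual IP
  selector:
    app: redis
  ports:
  - port: 6379
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;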



&lt;h2&gt;
  
  
  &lt;strong&gt;Custom Resources&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Custom Resources are user-defined API objects that allow you to store and retrieve structured data in Kubernetes. Combined with controllers, they help automate custom workflows or represent external systems inside your cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Custom Resource Definitions (CRDs)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CRDs are the most common way to add custom resources. They allow you to define custom objects without modifying the Kubernetes source code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apiextensions.k8s.io/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CustomResourceDefinition&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;databases.example.com&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;example.com&lt;/span&gt;  
&lt;span class="na"&gt;versions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;  
&lt;span class="na"&gt;served&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  
&lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  
&lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;openAPIV3Schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;object&lt;/span&gt;  
&lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;object&lt;/span&gt;  
&lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;  
&lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespaced&lt;/span&gt;  
&lt;span class="na"&gt;names&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;plural&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;databases&lt;/span&gt;  
&lt;span class="na"&gt;singular&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Database&lt;/span&gt;  
&lt;span class="na"&gt;shortNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="s"&gt;\- db&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once registered, you can create resources like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;example.com/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Database&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-postgres-db&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
&lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use a CRD to manage custom services like Database, Cache, or Queue, with a controller automating provisioning tasks across your infrastructure.&lt;/p&gt;
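
&lt;p&gt;Once the CRD is registered, the new kind behaves like any built-in resource in &lt;code&gt;kubectl&lt;/code&gt;, including the short name declared above (the manifest file name here is an assumption):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply -f my-postgres-db.yaml
kubectl get databases
kubectl get db                      # shortName alias
kubectl describe db my-postgres-db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;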

&lt;h3&gt;
  
  
  &lt;strong&gt;API Aggregation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;API Aggregation&lt;/strong&gt; is a more advanced extension mechanism. It lets you run a separate API server behind the main Kubernetes API server, which delegates requests for your custom API group to it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your API server must implement Kubernetes-style authentication, authorization, and admission control.
&lt;/li&gt;
&lt;li&gt;You write and deploy your own API server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This method is more complex but offers greater flexibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Security Contexts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Security Contexts define privilege and access controls for Pods and containers. You can use them to enforce non-root execution, set file system permissions, and limit privilege escalation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Secure Pod Configuration&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;secure-pod&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1000       \# Runs Pod as non-root user&lt;/span&gt;  
    &lt;span class="na"&gt;fsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2000         \# Shared file system group&lt;/span&gt;  
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;  
    &lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox&lt;/span&gt;  
    &lt;span class="s"&gt;securityContext&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="s"&gt;  \# Prevents gaining extra privileges&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Always run containers as a non-root user unless absolutely necessary.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Pod Security Admission&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pod Security Admission (PSA)&lt;/strong&gt; is a built-in admission controller in Kubernetes. It enforces security standards at the namespace level by applying predefined policies.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Restricted&lt;/strong&gt;: Strictest, enforces non-root and drops capabilities.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Baseline&lt;/strong&gt;: Reasonably secure defaults.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privileged&lt;/strong&gt;: Allows full capabilities—use with caution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: Enable Restricted Policy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl label namespace my-namespace pod-security.kubernetes.io/enforce=restricted&lt;/code&gt;&lt;/p&gt;
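
&lt;p&gt;PSA also supports &lt;code&gt;warn&lt;/code&gt; and &lt;code&gt;audit&lt;/code&gt; modes, which surface violations without blocking Pods. This is useful for trying a stricter level before enforcing it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl label namespace my-namespace \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;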

&lt;h3&gt;
  
  
  &lt;strong&gt;Network Policies&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Network Policies&lt;/strong&gt; control traffic flow to and from Pods. By default, all traffic is allowed unless restricted by a policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Allow Only Frontend to Access Database&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;  
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;  
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow-frontend&lt;/span&gt;  
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db&lt;/span&gt;  
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;\- from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;\- podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
          &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;  
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
    &lt;span class="na"&gt;\- port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combine multiple policies to fine-tune network security for microservices.&lt;/p&gt;
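
&lt;p&gt;Allow rules like the one above are typically paired with a default-deny policy, so that any traffic not explicitly allowed is blocked:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}    # selects every Pod in the namespace
  policyTypes:
  - Ingress          # no ingress rules listed, so all inbound traffic is denied
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;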

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm0y2zx5jbhbjmv2oy24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm0y2zx5jbhbjmv2oy24.png" alt="A diagram explaining network policies" width="644" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Metrics Server&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Metrics Server&lt;/strong&gt; is a lightweight resource monitoring component. It provides CPU and memory usage data for Pods and nodes, and supplies the metrics that the Horizontal Pod Autoscaler consumes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Commands:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl top pods  
kubectl top nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8xnmpya0yykln46pcyr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8xnmpya0yykln46pcyr.png" alt="Metrics Server in Kubernetes" width="800" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Prometheus&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prometheus&lt;/strong&gt; is a robust monitoring tool that collects and queries time-series metrics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scrapes metrics from applications and Kubernetes components.
&lt;/li&gt;
&lt;li&gt;Supports alerting and visualizations via Grafana.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example Use Case:&lt;/strong&gt;&lt;br&gt;
Monitor HTTP request rates and latency in a web application. Integrate alerts when request rates spike or response times degrade.&lt;/p&gt;
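
&lt;p&gt;A common convention (honored by many Prometheus scrape configurations, though not built into Prometheus itself) is to mark Pods for scraping with annotations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"       # assumed metrics port
    prometheus.io/path: "/metrics"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;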

&lt;h2&gt;
  
  
  &lt;strong&gt;Helm: Kubernetes Package Manager&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Helm is the de facto package manager for Kubernetes. It enables developers to deploy and manage complex applications using templated YAML files—called &lt;em&gt;charts&lt;/em&gt;—that follow DRY (Don't Repeat Yourself) principles.&lt;/p&gt;

&lt;p&gt;Package and deploy complex applications like WordPress, NGINX Ingress, or custom APIs using reusable charts. A chart includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Template files for resource definitions
&lt;/li&gt;
&lt;li&gt;Configuration values
&lt;/li&gt;
&lt;li&gt;Metadata (e.g., chart name, version)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Charts can be stored in repositories—similar to how .deb or .rpm packages are stored for Linux distributions—or in container registries.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Key Benefits&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reusable templates&lt;/strong&gt;: Use Helm charts to define, install, and upgrade Kubernetes applications.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version control&lt;/strong&gt;: Roll back to previous releases with a single command.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitOps friendly&lt;/strong&gt;: Integrates well with tools like Argo CD and Flux for continuous delivery.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How Helm Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Helm is a command-line tool that runs alongside kubectl and uses your existing kubeconfig file to connect securely to your cluster. It performs the following actions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Searches chart repositories based on your criteria.
&lt;/li&gt;
&lt;li&gt;Downloads the selected chart to your local system.
&lt;/li&gt;
&lt;li&gt;Uses the Kubernetes API to deploy the resources defined in the chart.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can also use Helm to upgrade or delete deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DRY deployment configs
&lt;/li&gt;
&lt;li&gt;Easy upgrades/rollbacks
&lt;/li&gt;
&lt;li&gt;Integrates with GitOps workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Service Mesh: Advanced Service Communication&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A service mesh abstracts communication between microservices into a dedicated infrastructure layer. It helps manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Secure communication&lt;/strong&gt;: Enforces mutual TLS (mTLS) between Pods.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced traffic routing&lt;/strong&gt;: Supports strategies like canary releases and A/B testing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt;: Collects traffic metrics, latency data, and failure insights without modifying app code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Architecture Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Each Pod includes a &lt;strong&gt;sidecar proxy&lt;/strong&gt; that handles communication and policy enforcement. These proxies form the &lt;em&gt;data plane&lt;/em&gt;, while a central &lt;em&gt;control plane&lt;/em&gt; manages configuration and telemetry.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Plane&lt;/strong&gt;: Handles service traffic. This usually includes sidecar proxies injected into each Pod, or node-level proxies in some implementations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control Plane&lt;/strong&gt;: Manages configuration, service discovery, policy enforcement, and telemetry.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sidecar proxy intercepts all inbound and outbound traffic to the Pod, enabling consistent policy enforcement and observability without modifying the application code.&lt;/p&gt;

&lt;p&gt;This decouples networking logic from application code.&lt;/p&gt;
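&lt;p&gt;For example, the canary-release routing mentioned above can be expressed declaratively. This sketch assumes Istio; the &lt;code&gt;order-service&lt;/code&gt; host and the &lt;code&gt;v1&lt;/code&gt;/&lt;code&gt;v2&lt;/code&gt; subsets are illustrative:&lt;/p&gt;

```yaml
# Illustrative Istio VirtualService: send 90% of traffic to v1, 10% to a canary v2
# Host and subset names are assumptions for this sketch
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
```

&lt;p&gt;The sidecar proxies enforce these weights, so the application itself needs no code changes to participate in a canary rollout.&lt;/p&gt;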

&lt;h3&gt;
  
  
  &lt;strong&gt;Popular Service Meshes&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Notable Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Istio&lt;/td&gt;
&lt;td&gt;Full-featured, widely adopted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Linkerd&lt;/td&gt;
&lt;td&gt;Lightweight and CNCF-supported&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Next steps&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;For more information, refer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/hridyeshbisht/getting-started-with-minikube-for-kubernetes-40a6"&gt;Getting started with minikube.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://programmerprodigy.code.blog/2025/05/06/introduction-to-container-images-and-orchestration/" rel="noopener noreferrer"&gt;Introduction to Container Images and Orchestration.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://programmerprodigy.code.blog/2025/05/06/managing-configuration-with-configmaps-and-secrets/" rel="noopener noreferrer"&gt;Managing Configuration with ConfigMaps and Secrets.&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>containers</category>
    </item>
    <item>
      <title>Getting Started with minikube for Kubernetes</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Tue, 06 May 2025 15:57:10 +0000</pubDate>
      <link>https://dev.to/hridyeshbisht/getting-started-with-minikube-for-kubernetes-40a6</link>
      <guid>https://dev.to/hridyeshbisht/getting-started-with-minikube-for-kubernetes-40a6</guid>
      <description>&lt;p&gt;Minikube is a lightweight tool that runs a full Kubernetes cluster on your local machine. It's ideal for developers who want to test applications, simulate production behavior, or learn Kubernetes without using cloud resources. Whether you’re developing microservices, testing Ingress routing, or deploying with Helm, Minikube provides a fast, isolated environment with full Kubernetes features.&lt;/p&gt;

&lt;p&gt;Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installs and manages Kubernetes components
&lt;/li&gt;
&lt;li&gt;Supports single-node and multi-node clusters
&lt;/li&gt;
&lt;li&gt;Offers both CLI and web-based access
&lt;/li&gt;
&lt;li&gt;Enables custom profiles and driver selection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Hardware Requirements&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Minikube provisions local resources for cluster nodes. Minimum recommended per node:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU&lt;/strong&gt;: 2 cores
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: 2 GB (4–8 GB recommended)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: 20 GB
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internet&lt;/strong&gt;: Required for initial setup and image downloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ensure your system has enough resources for both Minikube and any workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Software Requirements&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Minikube requires a supported Type-2 hypervisor or container runtime. It uses these to isolate Kubernetes from your host OS.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Hypervisors&lt;/th&gt;
&lt;th&gt;Container Runtimes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Linux&lt;/td&gt;
&lt;td&gt;VirtualBox, KVM2, QEMU&lt;/td&gt;
&lt;td&gt;Docker, Podman&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;macOS&lt;/td&gt;
&lt;td&gt;VirtualBox, HyperKit, VMware Fusion, Parallels&lt;/td&gt;
&lt;td&gt;Docker, Podman&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;VirtualBox, Hyper-V, VMware Workstation, QEMU&lt;/td&gt;
&lt;td&gt;Docker, Podman&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: On Linux, you can use &lt;code&gt;--driver=none&lt;/code&gt; to run Minikube directly on the host. This bypasses isolation and requires root access.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Create and Manage Clusters&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Run the following to create your first cluster:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;minikube start&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detects or uses the specified driver
&lt;/li&gt;
&lt;li&gt;Creates a VM or container with:

&lt;ul&gt;
&lt;li&gt;2 CPUs
&lt;/li&gt;
&lt;li&gt;6 GB memory
&lt;/li&gt;
&lt;li&gt;20 GB storage
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Bootstraps Kubernetes using &lt;code&gt;kubeadm&lt;/code&gt;
&lt;/li&gt;

&lt;li&gt;Installs Docker as the default container runtime
&lt;/li&gt;

&lt;li&gt;Creates a default profile to track cluster state&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;To create a multi-node cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube start \--profile custom-cluster \--nodes 3 \--kubernetes-version=v1.28.1 \--driver=docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the &lt;code&gt;--profile&lt;/code&gt; flag to create and manage multiple clusters.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Cluster Profiles&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Minikube uses profiles to store cluster configuration. By default, all commands apply to the default profile. To work with a custom profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube stop \--profile custom-cluster  
minikube start  \# Uses the default profile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;--profile&lt;/code&gt; in commands to manage multiple environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Enable Autocompletion (Optional)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Autocompletion enhances your CLI experience. For example, in Bash:&lt;/p&gt;

&lt;p&gt;source &amp;lt;(minikube completion bash)&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Accessing Minikube&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;kubectl (CLI)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Minikube includes a bundled version of &lt;code&gt;kubectl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube kubectl \-- get pods

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, for convenience, install &lt;code&gt;kubectl&lt;/code&gt; separately. It automatically detects your Minikube cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Kubernetes Dashboard (Web UI)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To use the web-based UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;addons enable metrics-server  
minikube addons enable dashboard  
minikube dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens a browser UI for inspecting deployments, pods, and logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Kubernetes API Access&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The API server is the entry point to the cluster. You can access it via:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kubectl and CLI tools
&lt;/li&gt;
&lt;li&gt;Web Dashboard
&lt;/li&gt;
&lt;li&gt;Custom automation using HTTP API calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use &lt;code&gt;kubectl proxy&lt;/code&gt; to expose the API server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens access at &lt;code&gt;http://localhost:8001/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To explore the API: &lt;code&gt;curl http://localhost:8001/api/v1&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;To run the proxy in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl proxy &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;Access Without a Proxy&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To access the Kubernetes API directly, use an authentication token or certificate credentials.&lt;/p&gt;

&lt;p&gt;For example, generate a token and use it to authenticate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export TOKEN=$(kubectl create token default)  
export APISERVER=$(kubectl config view \--minify \-o jsonpath='{.clusters\[0\].cluster.server}')
curl $APISERVER \--header "Authorization: Bearer $TOKEN" \--insecure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Deploy and Access an NGINX Application Using the Minikube Dashboard&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You’ll use the nginx container image from Docker Hub.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start Minikube: To launch your local Kubernetes cluster, run:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;minikube start&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Verify the status:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;minikube status&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Launch the Dashboard: Start the web-based Kubernetes Dashboard:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;minikube dashboard&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9x2occ5vqpcq15lbfh3c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9x2occ5vqpcq15lbfh3c.png" alt="Dashboard displaying Deployments, Pods, and ReplicaSets" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This command opens the dashboard in your default browser. By default, it connects to the default namespace, where all operations occur unless you change the context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you reboot your machine or log out and log back in, rerun the minikube dashboard command to reopen the dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Deploying an Application - Accessing the Dashboard&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To deploy the NGINX application:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Dashboard, select &lt;strong&gt;Deploy&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Choose &lt;strong&gt;Show Advanced Options&lt;/strong&gt; to set:

&lt;ul&gt;
&lt;li&gt;Labels
&lt;/li&gt;
&lt;li&gt;Namespace
&lt;/li&gt;
&lt;li&gt;Resource Requests
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Use nginx as the container image.
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Deploy&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The deployment creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Deployment resource (for example, web-dash)
&lt;/li&gt;
&lt;li&gt;A ReplicaSet (e.g., web-dash-74d8bd488f)
&lt;/li&gt;
&lt;li&gt;A Pod (e.g., web-dash-74d8bd488f-dwbzz)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Resource names are unique and may vary in your cluster. The naming pattern follows Kubernetes conventions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3lzmd2lvxrtdef0486q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3lzmd2lvxrtdef0486q.png" alt="Accessing the Application In the Browser over the NodePort" width="800" height="978"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the left navigation panel to explore the Deployment, ReplicaSet, and Pod resources.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;View properties by selecting resource names.
&lt;/li&gt;
&lt;li&gt;Scale the deployment from the vertical three-dots menu.
&lt;/li&gt;
&lt;li&gt;Delete individual Pods and observe them automatically recreated.
&lt;/li&gt;
&lt;li&gt;Delete the deployment to remove all Pods.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once we create the &lt;strong&gt;web-dash&lt;/strong&gt; Deployment, we can use the resource navigation panel from the left side of the Dashboard to display details of Deployments, ReplicaSets, and Pods in the &lt;strong&gt;default&lt;/strong&gt; Namespace. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz0f7vnsrxo4fvfy5sw1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz0f7vnsrxo4fvfy5sw1.png" alt="Deploy a Containerized Application - Interface" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Access the Application&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To access the application: get the Minikube IP  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;minikube ip&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In your browser, go to &lt;code&gt;http://192.168.99.100:&amp;lt;NodePort&amp;gt;&lt;/code&gt;, replacing &amp;lt;NodePort&amp;gt; with the actual port assigned to your service (e.g., 31074).&lt;/p&gt;

&lt;p&gt;Minikube opens the application in your browser. You should see the default NGINX welcome page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy42w4d86k1zwscjjuf2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy42w4d86k1zwscjjuf2c.png" alt="Deploying an Application - Accessing the Dashboard" width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see the &lt;em&gt;NGINX&lt;/em&gt; welcome page, served by the application running inside the Pods we created. Any of the Pod endpoints logically grouped by the Service could serve a given request, because the Service acts as a load balancer in front of its endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Enable Ingress for Routing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Minikube includes the NGINX Ingress Controller as a built-in add-on. To enable it, run:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;minikube addons enable ingress&lt;/code&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j9tfixn64gons33ixel.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j9tfixn64gons33ixel.png" alt="Enable Ingress for Routing" width="657" height="251"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Apply an Ingress manifest:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f virtual-host-ingress.yaml&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Add entries to &lt;code&gt;/etc/hosts&lt;/code&gt;:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;192.168.99.100   blue.example.com green.example.com&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Deploy an Ingress Resource&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After enabling the Ingress Controller, deploy your Ingress resource using:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl create -f virtual-host-ingress.yaml&lt;/code&gt;&lt;/p&gt;
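&lt;p&gt;The &lt;code&gt;virtual-host-ingress.yaml&lt;/code&gt; manifest is not shown in the original; a minimal name-based virtual-hosting sketch for the two hosts added to &lt;code&gt;/etc/hosts&lt;/code&gt; might look like this. The backend Service names and port are assumptions:&lt;/p&gt;

```yaml
# Illustrative virtual-host-ingress.yaml: routes by hostname to two backend Services
# Service names (blue-service, green-service) and port 80 are assumptions
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: virtual-host-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: blue.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: blue-service
                port:
                  number: 80
    - host: green.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: green-service
                port:
                  number: 80
```

&lt;p&gt;With this in place, requests to &lt;code&gt;blue.example.com&lt;/code&gt; and &lt;code&gt;green.example.com&lt;/code&gt; reach different Services through the same Ingress Controller.&lt;/p&gt;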
&lt;h3&gt;
  
  
  &lt;strong&gt;Configure Liveness, Readiness, and Startup Probes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Probes monitor and manage container health. If a container becomes unresponsive, the kubelet uses probes to take action (e.g., restart the container).&lt;/p&gt;

&lt;p&gt;Rather than restarting it manually, use a Liveness Probe: it checks the application's health, and if the check fails, the &lt;strong&gt;kubelet&lt;/strong&gt; restarts the affected container automatically.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Liveness Probes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use Liveness Probes to detect and recover from unresponsive applications. &lt;/p&gt;

&lt;p&gt;Supported types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exec command
&lt;/li&gt;
&lt;li&gt;HTTP GET
&lt;/li&gt;
&lt;li&gt;TCP Socket
&lt;/li&gt;
&lt;li&gt;gRPC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Liveness Probe (Exec)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;livenessProbe:  
  exec:  
    command: ["cat", "/tmp/healthy"]  
  initialDelaySeconds: 15  
  periodSeconds: 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Removing &lt;code&gt;/tmp/healthy&lt;/code&gt; after startup makes the probe fail, which triggers a container restart. Readiness Probes use the same configuration structure, but a failed check marks the container as not ready, so its Pod stops receiving traffic from a Service instead of being restarted.&lt;/p&gt;
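&lt;p&gt;For comparison with the Liveness Probe above, here is a Readiness Probe sketch using an HTTP check; the &lt;code&gt;/healthz&lt;/code&gt; path and port 8080 are assumptions:&lt;/p&gt;

```yaml
# Illustrative Readiness Probe (HTTP GET); path and port are assumptions
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
```

&lt;p&gt;Until this check passes, the Pod is excluded from the Service's endpoints rather than restarted.&lt;/p&gt;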

&lt;p&gt;&lt;strong&gt;Startup Probe (HTTP)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use Startup Probes for applications that take a long time to initialize. They prevent premature liveness and readiness checks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;startupProbe:  
  httpGet:  
    path: /startup  
    port: 8080  
  failureThreshold: 30  
  periodSeconds: 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>minikube</category>
      <category>kubernetes</category>
      <category>containers</category>
    </item>
    <item>
      <title>Integrating Low-Code &amp; High-Code Solutions Effectively</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Mon, 11 Nov 2024 10:39:38 +0000</pubDate>
      <link>https://dev.to/aws-builders/integrating-low-code-high-code-solutions-effectively-p6d</link>
      <guid>https://dev.to/aws-builders/integrating-low-code-high-code-solutions-effectively-p6d</guid>
      <description>&lt;p&gt;Low and high-code solutions address different needs in today's development landscape. Low-code platforms enable fast development with minimal coding, making them ideal for non-developers or projects with tight timelines. High code requires more technical expertise, offering greater customization and control over complex features. Combining low and high-code approaches can provide flexibility and speed, but successful integration and monitoring require a clear strategy.&lt;/p&gt;

&lt;p&gt;With over three years of experience in low-code solutions, I often get questions at meetups and hackathons like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why should I use low-code solutions?&lt;/li&gt;
&lt;li&gt;Can I integrate low-code and high-code solutions?&lt;/li&gt;
&lt;li&gt;How do I architect infrastructure that supports low- and high-code?&lt;/li&gt;
&lt;li&gt;How can I unify logs and traces to monitor my entire infrastructure?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This blog addresses these questions and helps you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decide when to choose low code or high code.&lt;/li&gt;
&lt;li&gt;Integrate both solutions seamlessly.&lt;/li&gt;
&lt;li&gt;Architect an infrastructure that supports both approaches.&lt;/li&gt;
&lt;li&gt;Enable unified logging and tracing with tools like SigNoz.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to Use Low-Code vs. High-Code
&lt;/h2&gt;

&lt;p&gt;Low-code platforms allow you to build applications with minimal coding, letting you focus on business logic. They offer visual tools, drag-and-drop interfaces, and pre-built components that simplify development. Low-code solution enables you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build and deploy applications quickly using reusable components.&lt;/li&gt;
&lt;li&gt;Adapt to changing business needs with flexible, scalable applications.&lt;/li&gt;
&lt;li&gt;Automate workflows, reducing the overhead of traditional software development.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Low-code is often the best choice for fast development or projects with limited technical resources. High-code, however, is ideal when customization, scalability, and control are critical. In many projects, combining both approaches leverages their respective strengths, balancing speed and control across components.&lt;/p&gt;

&lt;p&gt;For example, consider an employee onboarding app that collects employee details, generates tasks, assigns managers, and integrates with HR systems. You have two options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low-Code Approach: Use drag-and-drop components for the UI, integrate APIs, automate task generation, and set up notifications.&lt;/li&gt;
&lt;li&gt;High-Code Approach: Manually code the UI, APIs, and business logic in Java or Python for full customization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7x4ckkeiojdv4vyzzol9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7x4ckkeiojdv4vyzzol9.png" alt="When to choose Low or High code" width="800" height="645"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Benefits of Low-Code for Front-End Components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-built components: Forms, task management, and notifications are ready-made.&lt;/li&gt;
&lt;li&gt;Centralized dashboard: Easily monitor updates, reducing time on code changes and bug fixes.&lt;/li&gt;
&lt;li&gt;Built-in connectors: Common HR integrations like SAP reduce custom code requirements.&lt;/li&gt;
&lt;li&gt;Rapid prototyping: Build and adapt prototypes based on feedback.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Integrating Low Code and High Code Solutions
&lt;/h2&gt;

&lt;p&gt;Combining low-code and high-code solutions lets you leverage both rapid development and customization. This approach creates a flexible, scalable architecture supporting speed and control. Key benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flexibility: Design each component with the best approach for its needs. Use low-code for user interfaces and workflows and high-code for backend logic.&lt;/li&gt;
&lt;li&gt;Scalability: Quickly iterate on low-code components for front-end changes while handling complex business logic with high-code.&lt;/li&gt;
&lt;li&gt;Efficient collaboration: Non-developers can build front-end features in low-code while developers focus on high-code elements, maintaining functionality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite the benefits, integrating low and high-code solutions comes with challenges that require careful planning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compatibility: Low-code platforms may have limited interoperability with custom high-code solutions.&lt;/li&gt;
&lt;li&gt;Data synchronization: Ensure seamless data flow between components, maintaining consistent formats and structures.&lt;/li&gt;
&lt;li&gt;API limitations: Restricted APIs in low-code platforms can complicate consistent bidirectional integration with high-code systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The earlier employee onboarding example combined low-code for the UI with high-code for backend functions. In that system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low-code: Manages front-end tasks, onboarding workflows, and notifications.&lt;/li&gt;
&lt;li&gt;High-code: Handles backend processing, complex integrations, analyzing data to track metrics and generating documents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6g43z8kssp4seekmknj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6g43z8kssp4seekmknj.png" alt="An example of Employee Onboarding app" width="800" height="648"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach lets non-technical users manage the onboarding workflow while developers focus on implementing secure and efficient backend operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecting an Infrastructure with Low-Code and High-Code Solutions
&lt;/h3&gt;

&lt;p&gt;When architecting an infrastructure with both low-code and high-code, address compatibility, data flow, and scalability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separation of concerns: Use microservices or modular components to separate low-code and high-code tasks, allowing flexibility in updates and maintenance.&lt;/li&gt;
&lt;li&gt;Data flow and APIs: Establish clear data pathways using APIs, creating a bridge between low-code and high-code.&lt;/li&gt;
&lt;li&gt;Scalability: Design the infrastructure to scale specific components independently, supporting growth without overhauling the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To integrate both, you can use Middleware to manage communication between low- and high-code components using an API gateway. An API gateway centralizes communication, manages authentication, and routes requests to the appropriate services.&lt;/p&gt;

&lt;p&gt;The benefits of architecting an Infrastructure with Low-Code and High-Code Solutions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scalability: By separating low-code and high-code tasks, each component can be scaled based on specific usage requirements, optimizing resource allocation.&lt;/li&gt;
&lt;li&gt;Efficient Collaboration: Developers and non-developers can collaborate efficiently, with each team focusing on the parts that best align with their expertise.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bfwwr94qhg02npeluns.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bfwwr94qhg02npeluns.png" alt="Adding Middleware in the employee onboarding app" width="800" height="648"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, in the employee onboarding app, you add an extra component:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Middleware: An API gateway centralizes communication, manages authentication, and routes requests to the appropriate services.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Unified Logging and Tracing
&lt;/h3&gt;

&lt;p&gt;Combining low-code and high-code solutions adds flexibility but can complicate monitoring. Unified logging and tracing with observability tools like SigNoz provide a centralized view of application performance across both environments. SigNoz’s APM (Application Performance Monitoring) features enable monitoring and troubleshooting through a single dashboard.&lt;/p&gt;

&lt;p&gt;Unified logging and tracing across both approaches lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitor application performance: View metrics and logs across all components.&lt;/li&gt;
&lt;li&gt;Detect issues: Identify bottlenecks in both low-code and high-code elements.&lt;/li&gt;
&lt;li&gt;Simplify troubleshooting: Trace issues end-to-end for faster incident resolution.&lt;/li&gt;
&lt;li&gt;Optimize infrastructure: Use insights to allocate resources more effectively.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementing SigNoz for Unified Observability
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Instrument High Code Applications:

&lt;ol&gt;
&lt;li&gt;Integrate OpenTelemetry libraries in your high code applications to capture traces, metrics, and logs. This includes configuring custom traces for critical processes such as API calls, external integrations, and custom logic.&lt;/li&gt;
&lt;li&gt;High code applications can leverage SigNoz’s SDKs in languages like Python, Java, and Node.js to capture detailed telemetry data.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;Connect Low Code Platforms:

&lt;ol&gt;
&lt;li&gt;If your low-code platform supports custom logging or webhooks, configure it to send logs to SigNoz. Alternatively, integrate through REST APIs to stream logs, events, and metrics from the low-code environment into SigNoz.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;
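&lt;p&gt;As an illustrative sketch (the function name and field layout here are assumptions, not a SigNoz API), a webhook handler might package each low-code event as a structured log record carrying a shared trace ID before POSTing it to your collector's HTTP receiver:&lt;/p&gt;

```python
import json
import time
import uuid

def build_log_event(service, level, message, trace_id=None):
    """Build a structured log record for shipping to an observability backend."""
    return {
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "service": service,                    # which component emitted the record
        "level": level,
        "message": message,
        # A shared trace_id lets the backend correlate low-code and high-code events.
        "trace_id": trace_id or uuid.uuid4().hex,
    }

event = build_log_event("onboarding-webhook", "INFO", "New-hire form submitted")
payload = json.dumps(event)
# In a real integration you would POST `payload` to the collector's HTTP
# endpoint with any HTTP client; the payload itself is just JSON.
print(payload)
```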

&lt;p&gt;Use SigNoz’s UI to view real-time logs and traces across low- and high-code components. A unified dashboard allows you to correlate data from different sources, providing comprehensive insights and aiding in root-cause analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gogrex8tjgmveqi8193.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5gogrex8tjgmveqi8193.png" alt="Adding Data monitoring in the employee onboarding app" width="800" height="648"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Consider the employee onboarding app again: you add an observability layer to your infrastructure that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregates all telemetry data from the low-code and high-code layers, allowing centralized monitoring and troubleshooting.&lt;/li&gt;
&lt;li&gt;Presents real-time performance insights, helping teams monitor both low-code and high-code components.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Best Practices for Unified Observability&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Standardize log formats: Ensure consistent log formats across components for seamless parsing.&lt;/li&gt;
&lt;li&gt;Use trace IDs: Connect transactions across low-code and high-code environments for end-to-end traceability.&lt;/li&gt;
&lt;li&gt;Monitor and adjust metrics regularly: Track key metrics and adjust infrastructure as needed.&lt;/li&gt;
&lt;/ul&gt;
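&lt;p&gt;The trace-ID practice can be sketched with the W3C Trace Context header format, which most observability tools (SigNoz included) understand. A minimal, stdlib-only example:&lt;/p&gt;

```python
import re
import secrets

def make_traceparent():
    """Create a W3C traceparent header value: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex chars shared by the whole transaction
    span_id = secrets.token_hex(8)    # 16 hex chars identifying this particular hop
    return f"00-{trace_id}-{span_id}-01"

def extract_trace_id(traceparent):
    """Pull the trace ID out of an incoming traceparent header."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}", traceparent)
    if not m:
        raise ValueError("malformed traceparent")
    return m.group(1)

# The low-code platform forwards the header it received; the high-code service
# logs the same trace ID, so both halves of one transaction correlate.
header = make_traceparent()
print(header, "->", extract_trace_id(header))
```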

&lt;p&gt;Combining low-code and high-code solutions can empower your development strategy, providing the agility to build while retaining control over complex features. You can optimize performance and streamline collaboration across teams by architecting a unified infrastructure and leveraging robust observability tools like SigNoz. Whether you're building rapid prototypes or integrating with advanced systems, a thoughtful blend of low-code and high-code approaches offers the best of both worlds, making your applications adaptable, scalable, and resilient.&lt;/p&gt;

</description>
      <category>lowcode</category>
      <category>monitoring</category>
      <category>architecture</category>
      <category>data</category>
    </item>
    <item>
      <title>How to Effectively Store and Analyse Logs in AWS Cloud</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Wed, 22 Mar 2023 15:12:53 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-to-effectively-store-and-analyse-logs-in-aws-cloud-1k0d</link>
      <guid>https://dev.to/aws-builders/how-to-effectively-store-and-analyse-logs-in-aws-cloud-1k0d</guid>
      <description>&lt;p&gt;Services and applications typically create logs that contain significant amounts of information. This information is logged and stored on persistent storage, allowing it to be reviewed and analyzed at any time.&lt;/p&gt;

&lt;p&gt;By monitoring the data within your logs, you can quickly identify potential issues you want to be made aware of as soon as they occur. Resolving an incident as quickly as possible is paramount for developing real-life solutions.&lt;/p&gt;

&lt;p&gt;Having more data about how your environment is running far outweighs the disadvantage of needing more information, especially when it matters to your business in the case of incidents and security breaches.&lt;/p&gt;

&lt;p&gt;Some logs can be monitored in real-time, allowing automatic responses to be carried out depending on the data contents of the log. Logs often contain vast amounts of metadata, including date stamps and source information such as IP addresses or usernames.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This blog aims to help you understand how to store and analyze logs in AWS and some recommended practices.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Storing and analyzing logs in AWS can be done using various services and tools, depending on your requirements and use case. Here are some services you can consider:&lt;/p&gt;

&lt;h4&gt;A. AWS CloudWatch&lt;/h4&gt;

&lt;p&gt;AWS CloudWatch provides valuable insights into the health and performance of your applications and resources, which can help you optimize their performance, increase availability, and improve the overall customer experience. Various components of Amazon CloudWatch include:&lt;/p&gt;

&lt;p&gt;1. &lt;strong&gt;Dashboards: &lt;/strong&gt;You can quickly and easily design different dashboards to represent the data by building your views. For example, you can view all performance metrics and alarms from resources relating to a specific customer.&lt;/p&gt;

&lt;p&gt;Once you have built your Dashboards, you can easily share them with other users, even those who may not have access to your AWS account.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The resources within your customised dashboard can be from multiple different regions. &lt;/p&gt;

&lt;p&gt;2. &lt;strong&gt;Metrics: &lt;/strong&gt;You can monitor a specific element of an application or resource over time, for example, the number of DiskReads on an EC2 instance.  &lt;/p&gt;

&lt;p&gt;Anomaly detection allows CloudWatch to apply machine learning algorithms to your metric data to help detect activity outside the normal baseline parameters. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Different services will offer different metrics.&lt;/p&gt;

&lt;p&gt;3. &lt;strong&gt;Amazon CloudWatch Alarms&lt;/strong&gt;: You can implement automatic actions based on specific thresholds that you can configure relating to each metric.&lt;/p&gt;

&lt;p&gt;For example, you could set an alarm to activate an auto-scaling operation if your CPU utilization of an EC2 instance peaked at 75% for more than 2 minutes.&lt;/p&gt;

&lt;p&gt;There are three different states for any alarm associated with a metric,&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;OK – The metric is within the defined configured threshold.&lt;/li&gt;



&lt;li&gt;ALARM – The metric has exceeded the thresholds set.&lt;/li&gt;



&lt;li&gt;INSUFFICIENT_DATA – There is insufficient data for the metric to determine the alarm state.&lt;/li&gt;
&lt;/ol&gt;
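&lt;p&gt;The CPU example above maps onto CloudWatch's alarm parameters roughly as follows. The alarm name and instance ID are placeholders; with credentials configured, the same dictionary could be passed to boto3's &lt;code&gt;put_metric_alarm&lt;/code&gt;:&lt;/p&gt;

```python
# Alarm definition mirroring the example above: CPU above 75% for two
# consecutive 60-second periods moves the alarm into the ALARM state.
alarm_params = {
    "AlarmName": "ec2-high-cpu",                     # placeholder name
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    "Statistic": "Average",
    "Period": 60,                                    # seconds per evaluation period
    "EvaluationPeriods": 2,                          # 2 x 60s covers "more than 2 minutes"
    "Threshold": 75.0,
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": [],                              # e.g. a scaling policy or SNS topic ARN
}
print(alarm_params["AlarmName"])
# With AWS credentials configured, boto3 would register it:
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```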

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.aws.amazon.com%2Fimages%2FAmazonCloudWatch%2Flatest%2Fmonitoring%2Fimages%2FCW-Overview.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.aws.amazon.com%2Fimages%2FAmazonCloudWatch%2Flatest%2Fmonitoring%2Fimages%2FCW-Overview.png" alt="" width="604" height="385"&gt;&lt;/a&gt;Image credits: &lt;a href="https://docs.aws.amazon.com/images/AmazonCloudWatch/latest/monitoring/images/CW-Overview.png" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/images/AmazonCloudWatch/latest/monitoring/images/CW-Overview.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4. &lt;strong&gt;CloudWatch EventBridge:&lt;/strong&gt; You can connect applications to various targets, allowing you to implement real-time monitoring and respond to events in your application.  &lt;/p&gt;

&lt;p&gt;The&lt;strong&gt; significant benefit of using CloudWatch EventBridge is that it allows you to implement the event-driven architecture &lt;/strong&gt;in a real-time decoupled environment.  &lt;/p&gt;

&lt;p&gt;Various elements of this feature include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rules: A rule acts as a filter for incoming streams of event traffic and then routes these events to the appropriate target defined within the rule. The target must be in the same region. &lt;/li&gt;



&lt;li&gt;Targets: Targets are where the Rules direct events, such as AWS Lambda, SQS, Kinesis, or SNS. All events received are in JSON format.&lt;/li&gt;



&lt;li&gt; Event Buses: It receives the event from your applications, and your rules are associated with a specific event bus. &lt;/li&gt;
&lt;/ol&gt;
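&lt;p&gt;A rule's filtering behaviour can be sketched in a few lines: an EventBridge pattern lists the values a field may take, and an event matches only if every listed field matches. This simplified matcher ignores advanced operators such as prefix or numeric matching:&lt;/p&gt;

```python
def matches_rule(event, pattern):
    """Simplified EventBridge-style matching: every field named in the pattern
    must exist in the event, with a value from the pattern's allowed list."""
    for key, allowed in pattern.items():
        if isinstance(allowed, dict):  # nested fields, e.g. "detail"
            if not isinstance(event.get(key), dict) or not matches_rule(event[key], allowed):
                return False
        elif event.get(key) not in allowed:
            return False
    return True

# Route EC2 state-change events for stopped/terminated instances to a target.
rule = {"source": ["aws.ec2"], "detail": {"state": ["stopped", "terminated"]}}
event = {"source": "aws.ec2", "detail": {"state": "stopped"}}
print(matches_rule(event, rule))  # True
```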

&lt;p&gt;5. &lt;strong&gt;CloudWatch Logs:&lt;/strong&gt; You have a centralized location to store your logs from different AWS services that provide logs as an output, such as CloudTrail, EC2, VPC Flow logs, etc., in addition to your own applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An added advantage of CloudWatch logs comes with the installation of the Unified CloudWatch Agent, &lt;/strong&gt;which can collect logs and additional metric data from EC2 instances as well as from on-premise services running either a Linux or Windows operating system. This metric data is in addition to the default EC2 metrics that CloudWatch automatically configures for you. &lt;/p&gt;

&lt;p&gt;Various types of insights within CloudWatch include Log Insights, Container Insights, and Lambda Insights.&lt;/p&gt;

&lt;h4&gt;B. AWS CloudTrail&lt;/h4&gt;

&lt;p&gt;AWS CloudTrail records and tracks all AWS API requests. It captures an API request made by a user as an event and logs it to a file it stores on S3.&lt;/p&gt;

&lt;p&gt;CloudTrail captures additional identifying information for every event, including the requester's identity, the initiation timestamp, and the source IP address.&lt;/p&gt;

&lt;p&gt;You can use the data captured by CloudTrail to help you enhance your AWS environment in several ways.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Security analysis tool. &lt;/li&gt;



&lt;li&gt;Help resolve and manage day-to-day operational issues and problems. &lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;C. AWS Config&lt;/h4&gt;

&lt;p&gt;AWS Config records and captures resource changes within your environment, allowing you to perform several actions against the data that help optimize resource management in the cloud.&lt;/p&gt;

&lt;p&gt;AWS Config can track changes made to a resource and store the information, including metadata, in a Configuration Item (CI) file. This file can also serve as a resource inventory.&lt;/p&gt;

&lt;p&gt;It can provide information on who made the change and when through AWS CloudTrail integration. &lt;strong&gt;AWS CloudTrail is used with AWS Config to help you identify who made the change and when and with which API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: AWS Config is region specific.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.us-east-1.prod.workshops.aws%2Fpublic%2F5659531a-ebcf-42ca-bd3f-f5b15e64cda5%2Fstatic%2Fimages%2Flogsinsights%2FCloud-Watch-Insights-Query-Results.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.us-east-1.prod.workshops.aws%2Fpublic%2F5659531a-ebcf-42ca-bd3f-f5b15e64cda5%2Fstatic%2Fimages%2Flogsinsights%2FCloud-Watch-Insights-Query-Results.PNG" alt="" width="800" height="400"&gt;&lt;/a&gt;Image credits: &lt;a href="https://static.us-east-1.prod.workshops.aws/public/5659531a-ebcf-42ca-bd3f-f5b15e64cda5/static/images/logsinsights/Cloud-Watch-Insights-Query-Results.PNG" rel="noopener noreferrer"&gt;https://static.us-east-1.prod.workshops.aws/public/5659531a-ebcf-42ca-bd3f-f5b15e64cda5/static/images/logsinsights/Cloud-Watch-Insights-Query-Results.PNG&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;D. AWS CloudFront logs&lt;/h4&gt;

&lt;p&gt;Enabling CloudFront access logs allows you to track each user's request for accessing your website and distribution. These logs contain information about the requests made to your CloudFront distributions, such as the request's date and time, the requester's IP address, the URL path, and the response's status code.&lt;/p&gt;

&lt;p&gt;Amazon S3 stores these logs, similar to S3 access logs, providing a durable and persistent storage solution. &lt;strong&gt;Although enabling logging is free, S3 will charge you for the storage used.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;E. AWS VPC Flow logs&lt;/h4&gt;

&lt;p&gt;VPC Flow Logs allow you to capture IP traffic information between the network interfaces of resources within your VPC. This data aids in resolving network communication incidents and monitoring security by detecting prohibited traffic destinations.&lt;/p&gt;
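&lt;p&gt;Each default-format flow log record is a space-separated line; the field order below follows the default version-2 format. A small parser makes the REJECT entries, the ones that matter for spotting prohibited traffic, easy to pick out:&lt;/p&gt;

```python
# Field order of the default (version 2) VPC Flow Log format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_flow_log(line):
    """Split a default-format VPC Flow Log record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("unexpected flow log format")
    return dict(zip(FIELDS, values))

record = parse_flow_log(
    "2 123456789010 eni-1235b8ca 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 REJECT OK")
print(record["action"], record["dstport"])  # REJECT 22
```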

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: VPC Flow Logs do not store data in S3 but transmit data to CloudWatch logs.&lt;/p&gt;

&lt;p&gt;Before creating VPC Flow Logs, you should know some limitations that may affect their implementation or configuration. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If you have a VPC peered connection, you can only view flow logs of peered VPCs within the same account.  &lt;/li&gt;



&lt;li&gt;You cannot modify its configuration once you create a VPC Flow Log. Instead, you need to delete it and create a new one to make changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;You need an IAM role with the appropriate permissions to send your flow log data to a CloudWatch log group. &lt;/strong&gt;You select this role during the setup configuration of VPC Flow Logs. &lt;/p&gt;

&lt;h3&gt;Recommended practices for storing and analyzing logs in AWS:&lt;/h3&gt;

&lt;p&gt;By following these best practices, you can effectively store and analyze logs in AWS and improve the reliability and performance of your applications.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define a consistent logging format such as JSON or Apache Log Format.&lt;/li&gt;



&lt;li&gt;Use log rotation to prevent logs from taking up too much storage space. AWS Elastic Beanstalk can help automate log rotation. &lt;/li&gt;



&lt;li&gt;Set up alerts to notify you of critical issues in your logs so you can resolve them quickly. Amazon CloudWatch lets you set up alerts based on predefined thresholds or custom metrics.&lt;/li&gt;



&lt;li&gt;Use encryption to protect your log data in transit and at rest. Amazon S3 server-side encryption or Amazon CloudFront field-level encryption helps protect your data.&lt;/li&gt;



&lt;li&gt;You should regularly review and analyze your logs to help identify potential issues. AWS services like Amazon Athena, Amazon Elasticsearch Service, and AWS Glue help you gain insights.&lt;/li&gt;



&lt;li&gt;Use a log aggregation service to centralize your logs across many AWS services. AWS CloudTrail or Amazon CloudWatch Logs can help centralize your logs. &lt;/li&gt;



&lt;li&gt;Implement a data retention policy that specifies how long you need to keep your log data. This helps avoid unnecessary storage costs and ensures compliance with regulatory requirements. &lt;/li&gt;
&lt;/ol&gt;
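&lt;p&gt;As a sketch of the retention-policy practice above, an S3 lifecycle configuration can encode the policy directly. The rule ID, prefix, and day counts below are placeholder assumptions; tune them to your compliance window:&lt;/p&gt;

```python
# Each rule moves logs to cheaper storage over time, then expires them.
lifecycle_policy = {
    "Rules": [{
        "ID": "log-retention",          # placeholder rule name
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},  # apply only to objects under logs/
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after a month
            {"Days": 90, "StorageClass": "GLACIER"},      # archive after a quarter
        ],
        "Expiration": {"Days": 365},    # delete once the retention window ends
    }]
}
print(lifecycle_policy["Rules"][0]["Expiration"]["Days"])
# Applied with: boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-log-bucket", LifecycleConfiguration=lifecycle_policy)
```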

&lt;p&gt;By monitoring the data within your logs, you can quickly identify potential issues you want to be made aware of as soon as they occur. In addition, by combining this monitoring of logs with thresholds and alerts, you can receive automatic notifications of potential issues, threats, and incidents, before they become production issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;By logging what's happening within your applications, network, and other cloud infrastructure, you can build a performance baseline and establish what's routine and what isn't.&lt;/strong&gt;&lt;/p&gt;



</description>
      <category>aws</category>
      <category>database</category>
      <category>performance</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Things to know before Streaming data</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Mon, 05 Sep 2022 18:09:21 +0000</pubDate>
      <link>https://dev.to/aws-builders/things-to-know-before-streaming-data-4h5m</link>
      <guid>https://dev.to/aws-builders/things-to-know-before-streaming-data-4h5m</guid>
<description>&lt;p&gt;Consider times in your life when someone said something that left you speechless. It’s the ideal moment for a witty comeback, but you have nothing to say. You think of the perfect response after walking away, but it is too late. The moment has passed. This is an example of how some data loses value over time. &lt;/p&gt;

&lt;p&gt;Some data comes as an unending stream of events and is best analysed while in flight. Stream processing handles raw data in real time, so you save only the information and insight that is useful. Streaming data architecture enables developers to analyse time-sensitive data while its value is highest and respond in real time.&lt;/p&gt;

&lt;p&gt;This blog will cover an introduction to streaming data, the components of streaming data architecture, integrating batch processing with stream processing, and an in-depth look at Amazon Kinesis services such as Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics.&lt;/p&gt;

&lt;h5&gt;Q. What do you mean by stream processing?&lt;/h5&gt;

&lt;p&gt;Stream processing involves ingesting a continuous data stream and analysing, filtering, transforming, or improving the data in real time. &lt;strong&gt;This improves visibility into various areas of data activity,&lt;/strong&gt; such as service consumption, server usage, and device geolocation.&lt;/p&gt;

&lt;p&gt;Businesses, for example, can continuously analyse social media streams to watch changes in public attitude toward their brands and products and respond promptly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ff.hubspotusercontent10.net%2Fhubfs%2F4757017%2Fstream_processing_3-01.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ff.hubspotusercontent10.net%2Fhubfs%2F4757017%2Fstream_processing_3-01.jpg" alt="" width="800" height="788"&gt;&lt;/a&gt;Image credits: &lt;a href="https://f.hubspotusercontent10.net/hubfs/4757017/stream_processing_3-01.jpg" rel="noopener noreferrer"&gt;https://f.hubspotusercontent10.net/hubfs/4757017/stream_processing_3-01.jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stream processing services and architectures are becoming increasingly popular because they enable developers to mix data feed from multiple sources, and since not all data is produced equally and its value changes.&lt;/strong&gt;&lt;/p&gt;

&lt;h5&gt;Q. What is batch processing?&lt;/h5&gt;

&lt;p&gt;Before stream processing, vast amounts of data were often stored in a database and processed all at once. This approach is called batch processing because, as the name implies, the data is processed in one “batch.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Batch processing collects, stores, and analyses data in fixed-size pieces regularly. The schedule depends on the frequency of data gathering and the related value of the insight gained. &lt;/strong&gt;This value lies at the heart of stream processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There are two issues with batch processing that impact the value of data:  &lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Batch processing systems divide data into consistent and evenly spaced time intervals. This results in a consistent workload that is predictable but not intelligent. Sessions that begin in one batch may finish up in another. This complicates the examination of connected transactions.&lt;/li&gt;



&lt;li&gt;Batch architectures are optimised to handle enormous amounts of data at once. As a result, an analysis job may have to wait a long time because the queue must be full before processing can begin. While the batch job’s size is consistent, the time spanned by each batch of data is not.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Batch processing is built around a data-at-rest architecture. Before processing can begin, the collection has to be stopped and the data must be stored.&lt;/strong&gt; Subsequent batches of collected data bring the need to create an aggregate across multiple batches.&lt;strong&gt; In contrast, streaming architectures handle never-ending data flows naturally and with grace. &lt;/strong&gt;Using streams, patterns are detected and results inspected as they arrive, and multiple streams can be examined simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dbxs4wprmasjtti81vy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dbxs4wprmasjtti81vy.png" alt="Image description" width="800" height="223"&gt;&lt;/a&gt;&lt;br&gt;
Image credits: &lt;a href="https://www.researchgate.net/profile/Olawande-Daramola/publication/333653951/figure/tbl1/AS:767176877281282@1559920629763/Comparison-between-batch-processing-and-streaming-processing-82.png" rel="noopener noreferrer"&gt;https://www.researchgate.net/profile/Olawande-Daramola/publication/333653951/figure/tbl1/AS:767176877281282@1559920629763/Comparison-between-batch-processing-and-streaming-processing-82.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I believe it is crucial to emphasise that batch processing is still required. Stream processing is a supplement to batch computing&lt;/strong&gt;. Some forms of information require real-time data processing because the data has actionable value at the time it is collected, and that value diminishes rapidly. Stream processing was developed to solve latency, session boundaries, and unpredictable load.&lt;/p&gt;

&lt;h5&gt;Q. What are the &lt;strong&gt;components of a streaming application&lt;/strong&gt;?&lt;/h5&gt;

&lt;p&gt;Generally speaking, streaming data frameworks are described as having five layers; the &lt;strong&gt;Source&lt;/strong&gt;, &lt;strong&gt;Stream Ingestion&lt;/strong&gt;, &lt;strong&gt;Stream Storage&lt;/strong&gt;, &lt;strong&gt;Stream Processing&lt;/strong&gt;, and the &lt;strong&gt;Destination&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data is generated by one or more &lt;strong&gt;sources&lt;/strong&gt; or &lt;strong&gt;producers&lt;/strong&gt; including mobile devices, meters in smart homes, click streams, IoT sensors, or logs.  &lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Data is gathered at the Stream Ingestion Layer by one or more producers&lt;/strong&gt;, structured as Data Records, and placed in a data stream.
&lt;ol&gt;
&lt;li&gt;The ingestion layer converts the data to a common message format and streams it onward.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;


&lt;li&gt;The data is &lt;strong&gt;stored in the data stream. Before data can be evaluated with SQL-based analytics tools,&lt;/strong&gt; streams from one or more message brokers are gathered, converted, and formatted.

&lt;ol&gt;
&lt;li&gt;The outcome could be an API call, an action, a visualisation, an alert, or, in some situations, the creation of a new data stream.&lt;/li&gt;




&lt;li&gt;The Stream Processing Layer is managed by Consumers. &lt;strong&gt;Consumers access streams, read data, &lt;/strong&gt;then process data contained inside a stream. &lt;/li&gt;



&lt;li&gt;The Consumers &lt;strong&gt;deliver Data Records to the fifth layer, the destination&lt;/strong&gt;. Such as a Data Lake or Data Warehouse, durable storage, such as Amazon S3, or Amazon Redshift.&lt;/li&gt;



&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhazelcast.com%2Fwp-content%2Fuploads%2F2021%2F12%2F21_Streaming_1-400x281-1.png" alt="" width="400" height="281"&gt;Image credits: https://hazelcast.com/wp-content/uploads/2021/12/21_Streaming_1-400x281-1.png



&lt;h5&gt;Q. &lt;strong&gt;How important is stream processing?&lt;/strong&gt;&lt;/h5&gt;



&lt;p&gt;Perhaps a better question is how important it is to have immediate insight into how the business is operating, or how customers are feeling.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Stream processing is a natural fit for time-series data and pattern detection. For example, consider real-time trading in commodities such as stocks;&lt;/strong&gt; a fraction of a second advantage might translate into millions in profit or loss. What about huge consumer product companies conducting global product releases in which millions of individuals log in at the same moment to buy?&lt;/p&gt;



&lt;p&gt;Not every transaction necessitates an immediate response, but many do. The issue is that developers must be able to recognise when something significant occurs and act on it in a meaningful and immediate manner.&lt;/p&gt;



&lt;p&gt;Streaming lowers the need for big and expensive shared databases. Because each stream processing application keeps its own data and state when utilising a streaming framework, stream processing fits naturally inside a microservices architecture.&lt;/p&gt;



&lt;h3&gt;&lt;strong&gt;Q. What is Amazon Kinesis?&lt;/strong&gt;&lt;/h3&gt;



&lt;p&gt;Amazon Kinesis addresses the complexity and costs associated with streaming data into the AWS cloud. Kinesis makes it simple to gather, process, and analyse numerous sorts of data streams in real time or near real time, such as event logs, social media feeds, clickstream data, application data, and IoT sensor data.&lt;/p&gt;



&lt;p&gt;Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics are the four services offered by Amazon Kinesis.&lt;/p&gt;



&lt;ol&gt;
&lt;li&gt;Kinesis Video Streams is a stream processing tool for binary-encoded data including audio and video. &lt;/li&gt;



&lt;li&gt;Base64 text-encoded data is streamed via Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics.
&lt;ol&gt;
&lt;li&gt;This text-based data comes from sources such as logs, click-stream data, social media feeds, financial transactions, in-game player activity, geolocation services, and IoT device telemetry.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd1.awsstatic.com%2FProducts%2Fproduct-name%2Fdiagrams%2Fproduct-page-diagram_Amazon-Kinesis_Evolve-from-batch-to-real-time-Analytics.d7ed76be304a30be5720fd159469f157e7c09ede.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd1.awsstatic.com%2FProducts%2Fproduct-name%2Fdiagrams%2Fproduct-page-diagram_Amazon-Kinesis_Evolve-from-batch-to-real-time-Analytics.d7ed76be304a30be5720fd159469f157e7c09ede.png" alt="" width="800" height="193"&gt;&lt;/a&gt;Image credits: &lt;a href="https://d1.awsstatic.com/Products/product-name/diagrams/product-page-diagram_Amazon-Kinesis_Evolve-from-batch-to-real-time-Analytics.d7ed76be304a30be5720fd159469f157e7c09ede.png" rel="noopener noreferrer"&gt;https://d1.awsstatic.com/Products/product-name/diagrams/product-page-diagram_Amazon-Kinesis_Evolve-from-batch-to-real-time-Analytics.d7ed76be304a30be5720fd159469f157e7c09ede.png&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;A. Kinesis Video Streams:&lt;/h4&gt;

&lt;p&gt;Amazon Kinesis Video streams binary-encoded data into AWS from millions of sources. Traditionally, this is audio and video data, but it can be any type of binary-encoded time-series data. &lt;/p&gt;

&lt;p&gt;The AWS SDKs securely stream data to AWS for processing, such as playback, storage, analytics, machine learning, and other tasks.&lt;/p&gt;

&lt;p&gt;Kinesis Video Streams support WebRTC, an open-source initiative. This enables bi-directional, real-time media streaming between web browsers, mobile apps, and linked devices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kinesis Video Streams price is based on the amount of data imported, consumed, and stored across all video streams in an account.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;B. Kinesis Data Streams:&lt;/h4&gt;

&lt;p&gt;A Kinesis Data Stream is a collection of Shards. A shard is a collection of Data Records. Data Records comprise a Sequence Number, a Partition Key, and a Data Blob and are saved as an immutable series of bytes.&lt;/p&gt;
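&lt;p&gt;The partition key determines which shard receives a record: Kinesis takes the MD5 hash of the key as a 128-bit integer and routes the record to the shard whose hash-key range contains it. A simplified model, assuming evenly split ranges:&lt;/p&gt;

```python
import hashlib

def shard_for_key(partition_key, shard_count):
    """Pick a shard the way Kinesis does conceptually: MD5 of the partition
    key as a 128-bit integer, mapped into one of `shard_count` equal ranges."""
    hash_key = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    range_size = 2 ** 128 // shard_count
    return min(hash_key // range_size, shard_count - 1)

# The same key always hashes to the same shard, preserving per-key ordering.
print(shard_for_key("customer-42", 4))
```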

&lt;p&gt;&lt;strong&gt;The Kinesis Data Stream is a Stream Storage Layer and is a high-speed buffer that stores data. Inside Kinesis Data Streams, the Data Records are immutable. &lt;/strong&gt;We cannot erase data from the stream; instead, it can simply expire.&lt;/p&gt;

&lt;p&gt;When constructing a stream, you define all components related to stream processing up front; AWS provisions resources only when they are requested. &lt;strong&gt;One major point here is that Kinesis Data Streams does not support Auto Scaling.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Consumers can subscribe to a shard using Enhanced Fan-Out. As a result, data is pushed from the shard directly into the consumer application as it arrives.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing for Kinesis Data Streams is a little trickier. The number of shards in a Kinesis Data Stream determines the hourly cost&lt;/strong&gt;. This fee is assessed regardless of whether data is present in the stream.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When producers put data into the stream, there is a separate charge.&lt;/li&gt;



&lt;li&gt;When you enable the optional extended data retention, there is an hourly charge per shard for data saved in a stream.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;C. Kinesis Data Firehose:&lt;/h4&gt;

&lt;p&gt;Amazon Kinesis Data Firehose, like Kinesis Data Streams, is an AWS data streaming service. While Kinesis Data Streams is very customisable, Data Firehose is essentially a streaming data delivery service where ingested data may be dynamically processed, automatically scaled, and sent to a data store. As a result, Kinesis Data Firehose is not a streaming storage layer like Kinesis Data Streams.&lt;/p&gt;

&lt;p&gt;Kinesis Data Firehose uses producers to load data into streams in batches, after which the data is transferred to a data store. There is no need to create consumer applications or write proprietary code to process data from the Data Firehose stream. &lt;strong&gt;Unlike Kinesis Data Streams, Amazon Kinesis Data Firehose buffers incoming streaming data before delivering it to its destination.&lt;/strong&gt; You choose the buffer size and buffer interval when creating a delivery stream. &lt;/p&gt;

&lt;p&gt;Data is buffered within the stream and flushed when the buffer is full or the buffer interval elapses. As a result, Kinesis Data Firehose is considered a near-real-time streaming solution.&lt;/p&gt;
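&lt;p&gt;The size-or-interval flush behaviour can be modelled in a few lines. This is a toy sketch of the idea, not the Firehose implementation:&lt;/p&gt;

```python
import time

class DeliveryBuffer:
    """Toy model of Firehose buffering: records accumulate until either the
    size limit (bytes) or the time interval (seconds) is reached, then flush."""

    def __init__(self, size_limit, interval):
        self.size_limit = size_limit
        self.interval = interval
        self.records = []
        self.size = 0
        self.started = time.monotonic()

    def put(self, record):
        """Add one record; return the flushed batch if a limit was hit, else []."""
        self.records.append(record)
        self.size += len(record)
        hit_size = self.size >= self.size_limit
        hit_time = time.monotonic() - self.started >= self.interval
        if hit_size or hit_time:
            batch, self.records, self.size = self.records, [], 0
            self.started = time.monotonic()
            return batch  # in Firehose, this batch would be written to S3, etc.
        return []

buf = DeliveryBuffer(size_limit=10, interval=60.0)
print(buf.put(b"hello"))   # buffer not full yet: []
print(buf.put(b"world!"))  # 11 bytes >= 10, so both records flush together
```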

&lt;p&gt;&lt;strong&gt;Another distinction between Kinesis Data Streams and Kinesis Data Firehose is that Kinesis Data Firehose scales automatically.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F22568316%2F43876219-0d4d6d32-9b62-11e8-93a3-22c54a9eaf01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F22568316%2F43876219-0d4d6d32-9b62-11e8-93a3-22c54a9eaf01.png" alt="" width="800" height="580"&gt;&lt;/a&gt;Image credits: &lt;a href="https://user-images.githubusercontent.com/22568316/43876219-0d4d6d32-9b62-11e8-93a3-22c54a9eaf01.png" rel="noopener noreferrer"&gt;https://user-images.githubusercontent.com/22568316/43876219-0d4d6d32-9b62-11e8-93a3-22c54a9eaf01.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firehose charges depend on the quantity of data put into a delivery stream, the amount of data converted by Data Firehose, and the amount of data provided, as well as an hourly price per Availability Zone, if we send data to a VPC.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;D. Kinesis Data Analytics&lt;/h4&gt;

&lt;p&gt;Kinesis Data Analytics can read from the stream in real time, then aggregate and analyse the data as it moves.&lt;/p&gt;

&lt;p&gt;It accomplishes this through the use of SQL queries or Apache Flink in Java or Scala to execute time-series analytics, feed real-time dashboards, and generate real-time metrics. &lt;strong&gt;We can only query data records using SQL when using Kinesis Data Firehose with Kinesis Data Analytics. Only Kinesis Data Streams supports Apache Flink with Java and Scala programmes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To organise, transform, aggregate, and analyse data at scale, Kinesis Data Analytics includes built-in templates and operators for typical processing functions&lt;/strong&gt;. ETL, the creation of continuous metrics, and responsive real-time analytics are examples of use cases.&lt;/p&gt;

&lt;p&gt;Real-time analytics apps generate alarms or send notifications when specific metrics surpass predefined criteria, or, in more advanced scenarios, when an application discovers anomalies using machine learning techniques.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The number of Amazon Kinesis Processing Units (KPUs) utilised to execute a streaming application affects the hourly pricing charged by Amazon Kinesis Data Analytics.&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A KPU is a stream processing capacity unit. It has one virtual CPU and four gigabytes of memory.&lt;/li&gt;
&lt;/ol&gt;
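&lt;p&gt;Since billing is linear in KPUs, a back-of-the-envelope estimate is straightforward; the rate used below is a placeholder, not a quoted AWS price:&lt;/p&gt;

```python
def kda_hourly_cost(kpus: int, rate_per_kpu_hour: float) -> float:
    # One KPU = 1 vCPU + 4 GB of memory; the hourly charge scales
    # linearly with the number of KPUs the streaming application uses.
    return kpus * rate_per_kpu_hour

# e.g. a 4-KPU Flink application at a hypothetical rate per KPU-hour:
estimate = kda_hourly_cost(4, 0.11)
```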

&lt;h5&gt;&lt;strong&gt;For more information refer,&lt;/strong&gt;&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://programmerprodigy.code.blog/2021/09/10/basics_event-driven_architecture/" rel="noreferrer noopener"&gt;https://programmerprodigy.code.blog/2021/09/10/basics_event-driven_architecture/&lt;/a&gt;&lt;/li&gt;



&lt;li&gt;&lt;a rel="noreferrer noopener" href="https://telecom.altanai.com/2013/08/02/what-is-webrtc/"&gt;https://telecom.altanai.com/2013/08/02/what-is-webrtc/&lt;/a&gt;&lt;/li&gt;



&lt;li&gt;&lt;a href="https://github.com/ravsau/aws-exam-prep/issues/10" rel="noreferrer noopener"&gt;https://github.com/ravsau/aws-exam-prep/issues/10&lt;/a&gt;&lt;/li&gt;



&lt;li&gt;&lt;a href="https://www.confluent.io/learn/batch-vs-real-time-data-processing/" rel="noreferrer noopener"&gt;https://www.confluent.io/learn/batch-vs-real-time-data-processing/&lt;/a&gt;&lt;/li&gt;



&lt;li&gt;&lt;a href="https://aws.amazon.com/solutions/case-studies/netflix-kinesis-data-streams/" rel="noreferrer noopener"&gt;https://aws.amazon.com/solutions/case-studies/netflix-kinesis-data-streams/&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;/ol&gt;

&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>aws</category>
      <category>datascience</category>
      <category>data</category>
      <category>streaming</category>
    </item>
    <item>
      <title>Things to know about Machine learning(ML) models on cloud</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Thu, 31 Mar 2022 12:17:45 +0000</pubDate>
      <link>https://dev.to/aws-builders/things-to-know-about-machine-learningml-models-on-cloud-440e</link>
      <guid>https://dev.to/aws-builders/things-to-know-about-machine-learningml-models-on-cloud-440e</guid>
      <description>&lt;p&gt;The ability to make decisions is dependent on large volumes of historical data. And as machines are getting better at making decisions by understanding the data. Developers need to comprehend why and when to use Machine 

&lt;/p&gt;
&lt;p&gt;This blog explains what ML is and how Distributed ML works. We will be covering Amazon Rekognition, Amazon Lex, Amazon Sagemaker, and Amazon EMR.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;Q.What is Machine learning?&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalyticsinsight.b-cdn.net%2Fwp-content%2Fuploads%2F2021%2F08%2FML-System.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalyticsinsight.b-cdn.net%2Fwp-content%2Fuploads%2F2021%2F08%2FML-System.jpg" alt="" width="800" height="400"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://analyticsinsight.b-cdn.net/wp-content/uploads/2021/08/ML-System.jpg" rel="noopener noreferrer"&gt;https://analyticsinsight.b-cdn.net/wp-content/uploads/2021/08/ML-System.jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let us start with low-code Machine Learning solutions. &lt;strong&gt;When considering Machine Learning solutions, there are many use cases to consider&lt;/strong&gt;. Let us look at a few sample use cases.&lt;/p&gt;

&lt;h5&gt;A. &lt;strong&gt;Analyse images and videos&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;Amazon Rekognition makes it easy to add image and video analysis to your applications. You just provide an image or video to the Amazon Rekognition API, and the service can identify objects, people, text, scenes, and activities. It can detect any inappropriate content as well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Images are uploaded to the Rekognition service in one of two ways.&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Store the image file within an S3 bucket and then provide the S3 location of the image to the Rekognition service. &lt;/li&gt;
&lt;li&gt;Base64-encode the image data and supply this as an input parameter to the API operation.&lt;/li&gt;
&lt;/ol&gt;
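&lt;p&gt;A minimal sketch of both input styles using boto3 (bucket and key names are hypothetical; when calling through boto3 you supply raw bytes and the SDK handles the base64 encoding used by the underlying API):&lt;/p&gt;

```python
def s3_image_request(bucket: str, key: str) -> dict:
    # Option 1: point Rekognition at an object already stored in S3.
    return {"S3Object": {"Bucket": bucket, "Name": key}}

def bytes_image_request(path: str) -> dict:
    # Option 2: supply the image data directly as bytes.
    with open(path, "rb") as f:
        return {"Bytes": f.read()}

def detect_labels(image: dict, max_labels: int = 10):
    # Requires AWS credentials and access to the Rekognition service.
    import boto3
    rekognition = boto3.client("rekognition")
    return rekognition.detect_labels(Image=image, MaxLabels=max_labels)
```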

&lt;p&gt;Amazon Rekognition provides highly accurate facial analysis, face comparison, and face search capabilities.&lt;strong&gt; Some common use cases for using Amazon Rekognition include the following:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Face-based user verification&lt;/li&gt;
&lt;li&gt;Sentiment and demographic analysis&lt;/li&gt;
&lt;li&gt;Make images and stored videos searchable so you can discover objects and scenes that appear within them.&lt;/li&gt;
&lt;li&gt;Search images, stored videos, and streaming videos for faces that match those stored in a container known as a face collection.&lt;/li&gt;
&lt;li&gt;Detect adult and violent content in images and stored videos. Developers can filter inappropriate content based on their business needs, using metadata. &lt;/li&gt;
&lt;li&gt;Recognise and extract textual content (text and numbers) from images in different orientations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Ffc074d501302eb2b93e2554793fcaf50b3bf7291%2F2020%2F08%2F22%2FVideo-Redaction-1024x575.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Ffc074d501302eb2b93e2554793fcaf50b3bf7291%2F2020%2F08%2F22%2FVideo-Redaction-1024x575.png" alt="" width="800" height="449"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://aws.amazon.com/blogs/architecture/category/artificial-intelligence/amazon-rekognition/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/architecture/category/artificial-intelligence/amazon-rekognition/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rekognition provides a separate API for processing videos and video streams.&lt;/strong&gt; Because processing video files needs more compute, several of the Video API operations are asynchronous.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With the video processing APIs, you always host the video to be processed as a file within an S3 bucket.&lt;/strong&gt; You then supply the S3 file location as an input parameter to the respective start operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Custom Labels, you can identify the objects and scenes in images and videos that are specific to your business needs.&lt;/strong&gt; For more information, refer to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/what-is.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/what-is.html&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/rekognition/resources/?nc=sn&amp;amp;loc=6" rel="noopener noreferrer"&gt;https://aws.amazon.com/rekognition/resources/?nc=sn&amp;amp;loc=6&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;&lt;strong&gt;B. Create a Chatbot using Amazon Lex&lt;/strong&gt;&lt;/h5&gt;

&lt;p&gt;Amazon Lex can be used to create and embed chatbots into your applications. Internally, the Amazon Lex service uses the same deep learning engine that powers Amazon Alexa.&lt;/p&gt;

&lt;p&gt;Amazon Lex uses automatic speech recognition(ASR) for converting speech to text, and natural language understanding(NLU) to recognise the intent of the text.&lt;/p&gt;

&lt;p&gt;The unit of build and deployment within Amazon Lex is the bot itself. Developers can build and deploy multiple bots, each with its own set of skills and behaviours. An intent represents a kind of outcome or action that the bot may perform. A single bot can be composed of multiple intents. For each intent, you need to provide the following attributes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Intent name: A descriptive name for what the intent accomplishes. &lt;/li&gt;
&lt;li&gt;Utterances: One or several phrases that the user speaks or types to activate the intent. &lt;/li&gt;
&lt;li&gt;Fulfilment process: The method used to complete or fulfil the intent.&lt;/li&gt;
&lt;/ol&gt;
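&lt;p&gt;The three attributes above can be captured in a small definition. The field names below follow the style of the Lex model-building API, and the &lt;code&gt;OrderCoffee&lt;/code&gt; intent and its slot are hypothetical examples:&lt;/p&gt;

```python
def make_intent(name, utterances, slots=None):
    # Bundle the intent name, its activating utterances, and the
    # fulfilment method into one definition (Lex V1 API style).
    return {
        "name": name,
        "sampleUtterances": utterances,
        "slots": slots or [],
        # ReturnIntent hands the collected slot values back to the
        # client app; a Lambda function is the common alternative.
        "fulfillmentActivity": {"type": "ReturnIntent"},
    }

order_coffee = make_intent(
    "OrderCoffee",
    ["I would like a coffee", "Can I get a {CoffeeType}"],
    slots=[{"name": "CoffeeType", "slotType": "AMAZON.AlphaNumeric"}],
)
```

&lt;p&gt;A definition shaped like this could then be registered through the Lex model-building API, for example via boto3's &lt;code&gt;lex-models&lt;/code&gt; client.&lt;/p&gt;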

&lt;p&gt;&lt;strong&gt;Amazon Lex also provides several built-in intents that you can leverage&lt;/strong&gt;. An intent may need to request extra attributes, called slots, from the user to complete its intended outcome. &lt;strong&gt;Each slot you define requires you to specify a slot type&lt;/strong&gt;. You can define and create your own custom slot types, or leverage any of the built-in slot types.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Ff1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59%2F2020%2F10%2F22%2F1-SolutionArchitecture.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Ff1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59%2F2020%2F10%2F22%2F1-SolutionArchitecture.jpg" alt="" width="800" height="312"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://aws.amazon.com/blogs/machine-learning/building-a-real-time-conversational-analytics-platform-for-amazon-lex-bots/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/building-a-real-time-conversational-analytics-platform-for-amazon-lex-bots/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After you deploy your chatbot, Amazon Lex provides a feature to monitor and track so-called missed utterances:&lt;/strong&gt; phrases that Amazon Lex cannot match at runtime against any of the registered utterances.&lt;/p&gt;

&lt;p&gt;Amazon Lex can integrate into other messaging platforms using channels. All network connections established to Amazon Lex are done so only using HTTPS. Hence they are encrypted, and can thus be considered secure. Additionally, the Amazon Lex API requires a signature to be calculated and supplied with any API calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For more information on various machine learning general use case solutions refer to,&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;a href="https://aws.amazon.com/machine-learning/" rel="noopener noreferrer"&gt;https://aws.amazon.com/machine-learning/&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;If you prefer not to use a low-code solution, you should look into SageMaker.&lt;/strong&gt; It is well suited to training and deploying your ML models, as you get AWS compute servers.&lt;/p&gt;

&lt;h3&gt;Q.What is SageMaker?&lt;/h3&gt;

&lt;p&gt;At its core, SageMaker is a fully managed service that provides the tools to build, train, and deploy machine learning models. It includes components for managing notebooks, labelling data, training models, and deploying models with a variety of endpoint options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SageMaker algorithms are available via container images&lt;/strong&gt;. Each region that supports SageMaker has its own copy of the images. You begin by retrieving the URI of the container image for the current session's region. You can also use your own container images for specific ML algorithms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fprogrammerprodigycode.files.wordpress.com%2F2022%2F03%2F0bc53-1mfyty2swftpsulqybcgy-w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fprogrammerprodigycode.files.wordpress.com%2F2022%2F03%2F0bc53-1mfyty2swftpsulqybcgy-w.png" alt="" width="800" height="373"&gt;&lt;/a&gt;Image Credits: &lt;a href="http://programmerprodigycode.files.wordpress.com/2022/03/0bc53-1mfyty2swftpsulqybcgy-w.png" rel="noopener noreferrer"&gt;http://programmerprodigycode.files.wordpress.com/2022/03/0bc53-1mfyty2swftpsulqybcgy-w.png&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;&lt;strong&gt;Q.How can we host Sagemaker models?&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;SageMaker can host models through its hosting services. The model is accessible to the client through a SageMaker endpoint, which can be reached over HTTPS or via the SageMaker Python SDK.&lt;/p&gt;

&lt;p&gt;Another way is SageMaker Batch Transform. &lt;strong&gt;It manages the processing of large datasets within the limits of specified parameters&lt;/strong&gt;. When a batch transform job starts, SageMaker initialises compute instances and distributes the inference or pre-processing workload between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Batch Transform, you provide your inference data as an S3 URI, and SageMaker takes care of downloading it, running the prediction, and uploading the results back to S3.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Batch Transform partitions the Amazon S3 objects in the input by key and maps each object to an instance. To split input files into mini-batches when you create a batch transform job, set the SplitType parameter value to Line. You can control the size of the mini-batches by using the BatchStrategy and MaxPayloadInMB parameters.&lt;/p&gt;

&lt;p&gt;After processing, it creates an output file with the same name and the .out file extension. The batch transform job stores the output files in the specified location in Amazon S3, such as s3://awsexamplebucket/output/.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Ff1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59%2F2019%2F08%2F21%2Finference-with-tensorflow-1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Ff1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59%2F2019%2F08%2F21%2Finference-with-tensorflow-1.gif" alt="" width="717" height="261"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://aws.amazon.com/blogs/machine-learning/performing-batch-inference-with-tensorflow-serving-in-amazon-sagemaker/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/performing-batch-inference-with-tensorflow-serving-in-amazon-sagemaker/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The predictions in an output file are in the same order as the corresponding records in the input file. To combine the results of many output files into a single output file, set the AssembleWith parameter to Line.&lt;/p&gt;
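&lt;p&gt;Putting the parameters above together, a batch transform job request might look like the following sketch. The keys mirror the SageMaker CreateTransformJob API; the instance type, bucket paths, and names are illustrative:&lt;/p&gt;

```python
def transform_job_params(job_name, model_name, input_s3, output_s3):
    # SplitType=Line splits input files into mini-batches, and
    # AssembleWith=Line stitches per-record results back together
    # in input order; BatchStrategy/MaxPayloadInMB bound batch size.
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix", "S3Uri": input_s3}},
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3,
                            "AssembleWith": "Line"},
        "TransformResources": {"InstanceType": "ml.m5.xlarge",
                               "InstanceCount": 1},
        "BatchStrategy": "MultiRecord",
        "MaxPayloadInMB": 6,
    }

# boto3.client("sagemaker").create_transform_job(
#     **transform_job_params("job", "model", "s3://in/", "s3://out/"))
```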

&lt;p&gt;&lt;strong&gt;For more information refer,&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-model-deployment.html#ex1-batch-transform" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-model-deployment.html#ex1-batch-transform&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;&lt;strong&gt;Q. What is distributed machine learning?&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;Distributed machine learning is when machine learning processes are deployed across a cluster of computing resources. So why use distributed machine learning?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;To parallelise your machine learning processing requirements, allowing you to achieve quicker results. &lt;/li&gt;
&lt;li&gt;The complexity of the data set (features) may exceed the capabilities of a single node setup. &lt;/li&gt;
&lt;li&gt;The accuracy of a machine learning model can be enhanced by processing more data. This, in turn, is connected back to the large datasets point above.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.guavus.com%2Fwp-content%2Fuploads%2F2020%2F05%2Fcentralised-decentralised-1024x474.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.guavus.com%2Fwp-content%2Fuploads%2F2020%2F05%2Fcentralised-decentralised-1024x474.png" alt="" width="" height=""&gt;&lt;/a&gt;Image Credits: &lt;a href="https://www.guavus.com/wp-content/uploads/2020/05/centralised-decentralised-1024x474.png" rel="noopener noreferrer"&gt;https://www.guavus.com/wp-content/uploads/2020/05/centralised-decentralised-1024x474.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apache Spark can provision a cluster of machines, configured in a manner that provides a distributed computing engine&lt;/strong&gt;. Your datasets are partitioned and spread across the Spark cluster, allowing the cluster to process the data in parallel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q. Why Apache Spark?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spark provides Resilient Distributed Datasets (RDDs), which save time on reading and writing operations. &lt;/li&gt;
&lt;li&gt;In-memory computing: In Spark, data is stored in RAM, so it can be accessed quickly, which speeds up analytics. &lt;/li&gt;
&lt;li&gt;Flexible: Spark supports many languages and allows developers to write applications in Java, Scala, R, or Python. &lt;/li&gt;
&lt;li&gt;RDDs are designed to handle the failure of any worker node in the cluster. &lt;/li&gt;
&lt;li&gt;Better analytics: Spark has a rich set of SQL queries, machine learning algorithms, and complex analytics.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;MLlib is Apache Spark's scalable machine learning library.&lt;/strong&gt; It contains fast and scalable implementations of standard machine learning algorithms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q. Why Spark MLLib?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spark MLlib sits on top of Spark, which eases the development of efficient large-scale machine learning algorithms. &lt;/li&gt;
&lt;li&gt;MLlib is easy to deploy and does not need any pre-installation if a Hadoop cluster is already installed and running.&lt;/li&gt;
&lt;li&gt;MLlib provides significant performance gains (about 10 to 100 times faster than Hadoop and Apache Mahout).&lt;/li&gt;
&lt;/ol&gt;
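&lt;p&gt;A minimal MLlib sketch, assuming a Spark installation is available and a labelled CSV dataset whose target column is named &lt;code&gt;label&lt;/code&gt; (all names here are illustrative):&lt;/p&gt;

```python
def train_logistic_regression(csv_path):
    # Requires Spark (e.g. `pip install pyspark`); imports are kept
    # inside the function so the sketch loads without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # MLlib estimators expect a single vector column of features.
    assembler = VectorAssembler(
        inputCols=[c for c in df.columns if c != "label"],
        outputCol="features",
    )
    train = assembler.transform(df)

    # The fit itself runs distributed across the cluster's workers.
    return LogisticRegression(labelCol="label",
                              featuresCol="features").fit(train)
```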

&lt;p&gt;&lt;strong&gt;For more information on Apache Spark and MLLib, refer to,&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;https://programmerprodigy.code.blog/2022/01/09/introduction-to-apache-spark-sparkql-and-spark-mlib/&lt;/li&gt;&lt;/ol&gt;

&lt;h5&gt;Q. &lt;strong&gt;How to use Distributed ML on AWS Cloud?&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data. &lt;/p&gt;

&lt;p&gt;Amazon EMR can be used for log analysis, web indexing, ETL, financial forecasting, bioinformatics, and machine learning. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon EMR, together with Spark, simplifies the task of cluster and distributed job management, and we can use Amazon EMR at every stage of the machine learning pipeline. &lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd1.awsstatic.com%2Fproducts%2FEMR%2FProduct-Page-Diagram_Amazon-EMR.803d6adad956ba21ceb96311d15e5022c2b6722b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd1.awsstatic.com%2Fproducts%2FEMR%2FProduct-Page-Diagram_Amazon-EMR.803d6adad956ba21ceb96311d15e5022c2b6722b.png" alt="" width="800" height="371"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://d1.awsstatic.com/products/EMR/Product-Page-Diagram_Amazon-EMR.803d6adad956ba21ceb96311d15e5022c2b6722b.png" rel="noopener noreferrer"&gt;https://d1.awsstatic.com/products/EMR/Product-Page-Diagram_Amazon-EMR.803d6adad956ba21ceb96311d15e5022c2b6722b.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can customize the installation of applications that complement the core EMR Hadoop application. When you launch an EMR cluster, you need to define and allocate compute resources to three different nodes: Master, Core and Task.&lt;/p&gt;

&lt;h4&gt;Q. How to select the right compute instance?&lt;/h4&gt;

&lt;p&gt;Choosing a compute instance based purely on price or purely on compute power might not be a good option. &lt;strong&gt;Suppose a cheaper compute instance takes about 30 minutes for a job, while a better compute instance takes 10 minutes.&lt;/strong&gt; The second alternative may be better both economically and in terms of time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some points to remember while choosing CPU and GPU will be&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The CPU time grows proportional to the size of the matrix squared or cubed. &lt;/li&gt;
&lt;li&gt;The GPU time grows almost linearly with the size of the matrix for the sizes used in the experiment. It can add more compute cores to complete the computation in much shorter times than a CPU. &lt;/li&gt;
&lt;li&gt;Sometimes the CPU performs better than the GPU for these small sizes. In general, GPUs excel at large-scale problems. &lt;/li&gt;
&lt;li&gt;For larger problems, GPUs can offer speedups in the hundreds. For Example, an application used for facial or object detection in an image or a video will need more computing. Hence GPUs might be a better solution.&lt;/li&gt;
&lt;/ol&gt;
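&lt;p&gt;The growth rates above can be illustrated with a toy cost model. The constants are arbitrary, chosen only to show the crossover behaviour, not measured figures:&lt;/p&gt;

```python
def cpu_time(n, k=1e-9):
    # Illustrative model only: CPU time grows roughly with n**3
    # for dense matrix work such as matrix multiplication.
    return k * n**3

def gpu_time(n, k=1e-7, overhead=0.05):
    # GPU time grows almost linearly with n in this model, but carries
    # a fixed launch/transfer overhead that dominates for small n.
    return overhead + k * n
```

&lt;p&gt;With these constants the CPU wins for small matrices and the GPU wins for large ones, matching the crossover described above.&lt;/p&gt;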

&lt;p&gt;&lt;strong&gt;For more information, feel free to listen to my session on introduction to Algorithms and AWS Sagemaker:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;https://vimeo.com/586886985/7faddfb340&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;For more information on Sagemaker,&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;https://aws.amazon.com/blogs/aws/sagemaker/&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;After considering all the no/low-code solutions and coding solutions, &lt;strong&gt;let's consider a use case.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you have a relatively simple algorithm with a less diverse data set, I would recommend a no/low-code solution using a centralised compute instance. If your algorithm is complicated and your data set is diverse, I would recommend a distributed machine learning approach.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>cloud</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Things to know about Data-Driven Architecture on cloud</title>
      <dc:creator>hridyesh bisht</dc:creator>
      <pubDate>Sat, 19 Mar 2022 05:04:38 +0000</pubDate>
      <link>https://dev.to/aws-builders/things-to-know-about-data-driven-architecture-on-cloud-44c4</link>
      <guid>https://dev.to/aws-builders/things-to-know-about-data-driven-architecture-on-cloud-44c4</guid>
      <description>&lt;p&gt;As data becomes more diverse and valuable, we will see more emphasis on data-driven architecture . Developers need to understand the importance of accuracy, consistency, and quality of data. So they can develop quality data pipelines, and products to make sure we put the data first.&lt;/p&gt;

&lt;p&gt;This blog explains what data is, how we can enrich our data, how we can analyse it, and how to make the best use of it. We will be covering AWS Glue, Amazon QuickSight, and Amazon SageMaker.&lt;/p&gt;

&lt;p&gt;The inspiration for this blog came from reading the Forbes article &lt;strong&gt;"The Age Of Analytics And The Importance Of Data Quality"&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a rel="noreferrer noopener" href="https://www.forbes.com/sites/forbesagencycouncil/2019/10/01/the-age-of-analytics-and-the-importance-of-data-quality/?sh=76cca4fa5c3c"&gt;https://www.forbes.com/sites/forbesagencycouncil/2019/10/01/the-age-of-analytics-and-the-importance-of-data-quality/?sh=76cca4fa5c3c&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freecodecamp.org/news/is-data-important-to-your-business/" rel="noreferrer noopener"&gt;https://www.freecodecamp.org/news/is-data-important-to-your-business/&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Q.What is Data?&lt;/h3&gt;

&lt;p&gt;Data is raw information. For example, your daily consumption of coffee is raw information, but if you analyse it you can gain insights from it, such as:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The types of coffee beans or coffee flavours you prefer &lt;/li&gt;
&lt;li&gt;How much sugar you put into the coffee&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fciscocanada.files.wordpress.com%2F2013%2F09%2Fcisco_blog_canada_coffee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fciscocanada.files.wordpress.com%2F2013%2F09%2Fcisco_blog_canada_coffee.png" alt="" width="800" height="800"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://ciscocanada.files.wordpress.com/2013/09/cisco_blog_canada_coffee.png" rel="noopener noreferrer"&gt;https://ciscocanada.files.wordpress.com/2013/09/cisco_blog_canada_coffee.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we can differentiate between information and data, note that there are many formats to store and transfer data, and these formats depend on the type of data. For example: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write coffee ingredients on a piece of paper, i.e. unstructured &lt;/li&gt;
&lt;li&gt;Write them in a .csv file, i.e. structured &lt;/li&gt;
&lt;li&gt;A combination of both, i.e. semi-structured&lt;/li&gt;
&lt;/ol&gt;
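&lt;p&gt;For instance, the same coffee record can be represented in a structured and a semi-structured format (a short Python illustration):&lt;/p&gt;

```python
import csv, json, io

# The same coffee record as structured (CSV) and
# semi-structured (JSON) data.
csv_text = "beans,sugar_tsp\narabica,2\n"
row = next(csv.DictReader(io.StringIO(csv_text)))  # fixed, flat schema

semi = json.loads('{"beans": "arabica", "notes": {"flavour": "nutty"}}')
# JSON tolerates nesting and optional fields that CSV cannot express.
```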

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fprogrammerprodigycode.files.wordpress.com%2F2022%2F03%2Fecd9e-1sbcb7tf8jjwzchdtt_sodw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fprogrammerprodigycode.files.wordpress.com%2F2022%2F03%2Fecd9e-1sbcb7tf8jjwzchdtt_sodw.png" alt="" width="800" height="401"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://programmerprodigycode.files.wordpress.com/2022/03/ecd9e-1sbcb7tf8jjwzchdtt_sodw.png" rel="noopener noreferrer"&gt;https://programmerprodigycode.files.wordpress.com/2022/03/ecd9e-1sbcb7tf8jjwzchdtt_sodw.png&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Q.How to enrich our data?&lt;/h3&gt;

&lt;p&gt;As a data engineer, you would like to maximise the insights you could gather from your data. Some data formats are developer-friendly, and some are not. So we need to convert data to developer-friendly formats, there are many ways of doing it.&lt;/p&gt;

&lt;h5&gt;An example of no/low code could be AWS Glue,&lt;/h5&gt;

&lt;p&gt;AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorise your data, clean it, enrich it, and move it reliably between various data stores and data streams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can store your data using various AWS services and still maintain a unified view of your data using the AWS Glue Data Catalogue&lt;/strong&gt;. Use Data Catalogue to search and discover the datasets that you own, and maintain the relevant metadata in one central repository. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd1.awsstatic.com%2Faws-glue-graphics%2FProduct-page-diagram_AWS-Glue_Elixir%25402x.6511bc93abc20bb7bc8d03ebe2be1cbb7f2623fe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd1.awsstatic.com%2Faws-glue-graphics%2FProduct-page-diagram_AWS-Glue_Elixir%25402x.6511bc93abc20bb7bc8d03ebe2be1cbb7f2623fe.png" alt="" width="800" height="483"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://d1.awsstatic.com/aws-glue-graphics/Product-page-diagram_AWS-Glue_Elixir%402x.6511bc93abc20bb7bc8d03ebe2be1cbb7f2623fe.png" rel="noopener noreferrer"&gt;https://d1.awsstatic.com/aws-glue-graphics/Product-page-diagram_AWS-Glue_Elixir%402x.6511bc93abc20bb7bc8d03ebe2be1cbb7f2623fe.png&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;Q.How does AWS Glue work?&lt;/h5&gt;

&lt;p&gt;You define jobs in AWS Glue to do the work that's required to extract, transform, and load (ETL) data from a data source to a data target. You perform the following actions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For datastore sources, you define a crawler to populate your AWS Glue Data Catalogue with metadata table definitions. &lt;ol&gt;&lt;li&gt;Point your crawler at a data store, and the crawler creates table definitions in the Data Catalogue. &lt;/li&gt;&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;AWS Glue can generate a script to transform your data, or you can provide your own script in the AWS Glue console or API (currently Python and Scala scripts). &lt;/li&gt;
&lt;li&gt;You can run your job on-demand, or you can set it up to start when a specified trigger occurs. The trigger can be a time-based schedule or an event.&lt;/li&gt;
&lt;/ol&gt;
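&lt;p&gt;The time-based trigger in step 3 can be expressed in the shape used by the Glue CreateTrigger API; the trigger and job names below are hypothetical:&lt;/p&gt;

```python
def nightly_trigger(trigger_name, job_name):
    # A time-based trigger definition (cron expression in UTC) in the
    # shape accepted by the Glue CreateTrigger API.
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": "cron(0 2 * * ? *)",  # every day at 02:00 UTC
        "Actions": [{"JobName": job_name}],
    }

# boto3.client("glue").create_trigger(
#     **nightly_trigger("nightly-etl", "coffee-etl"))
```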

&lt;p&gt;&lt;strong&gt;You use the AWS Glue console to define and orchestrate your ETL workflow&lt;/strong&gt;. The console calls several API operations in the AWS Glue Data Catalogue and AWS Glue Jobs system to perform the following tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define AWS Glue objects such as jobs, tables, crawlers, and connections. &lt;/li&gt;
&lt;li&gt;Schedule when crawlers run. &lt;/li&gt;
&lt;li&gt;Define events or schedules for job triggers. &lt;/li&gt;
&lt;li&gt;Search and filter lists of AWS Glue objects. &lt;/li&gt;
&lt;li&gt;Edit transformation scripts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Fb6692ea5df920cad691c20319a6fffd7a4a766b8%2F2018%2F04%2F17%2FPartitionedData2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Fb6692ea5df920cad691c20319a6fffd7a4a766b8%2F2018%2F04%2F17%2FPartitionedData2.jpg" alt="" width="800" height="433"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2018/04/17/PartitionedData2.jpg" rel="noopener noreferrer"&gt;https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2018/04/17/PartitionedData2.jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you prefer a code-first approach over a no/low-code solution, try the Pandas library&lt;/strong&gt;. Pandas is great for data wrangling, and most data engineers will have experience with it.&lt;/p&gt;
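&lt;p&gt;A minimal Pandas sketch of the same kind of wrangling an ETL job performs; the sales data below is made up for illustration:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical raw sales extract, the kind of data a crawler might catalogue.
raw = pd.DataFrame({
    "order_date": ["2021-01-05", "2021-01-05", "2021-02-10", "2021-02-11"],
    "region": ["east", "west", "east", "west"],
    "amount": [120.0, None, 80.0, 200.0],
})

# Transform: parse dates, fill missing amounts with 0, aggregate per region.
raw["order_date"] = pd.to_datetime(raw["order_date"])
raw["amount"] = raw["amount"].fillna(0.0)
per_region = raw.groupby("region", as_index=False)["amount"].sum()

print(dict(zip(per_region["region"], per_region["amount"])))
# {'east': 200.0, 'west': 200.0}
```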

&lt;p&gt;&lt;strong&gt;For more information, feel free to listen to my session on introduction to AWS Glue where I compare No/Low code solutions to Pandas library:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=njxWiaqlErQ&amp;amp;t=963s" rel="noreferrer noopener"&gt;https://www.youtube.com/watch?v=njxWiaqlErQ&amp;amp;t=963s&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;

&lt;h3&gt;Q. What to do after enriching your data?&lt;/h3&gt;

&lt;p&gt;Data visualisation lets you present your data as maps or graphs and interact with them. This makes the data much easier for the human mind to digest, which helps you spot patterns and trends. You can do this with standard business analysis tools like Tableau, or with R or Python. A few key benefits are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identifying important trends: the right type of visualisation can help you determine trends over time in a data set.  &lt;/li&gt;
&lt;li&gt;Spotting relationships within your data is key; it can help you drive future business decisions in the right direction and take corrective actions elsewhere.  &lt;/li&gt;
&lt;li&gt;A quick visual reference makes it easy to share the data with many recipients. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.pinimg.com%2Foriginals%2F7a%2F42%2F8e%2F7a428e9a180bb7e4911d5eaab8297982.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.pinimg.com%2Foriginals%2F7a%2F42%2F8e%2F7a428e9a180bb7e4911d5eaab8297982.jpg" alt="" width="800" height="600"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://i.pinimg.com/originals/7a/42/8e/7a428e9a180bb7e4911d5eaab8297982.jpg" rel="noopener noreferrer"&gt;https://i.pinimg.com/originals/7a/42/8e/7a428e9a180bb7e4911d5eaab8297982.jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There are a variety of ways to present your data, depending on what type of data you are trying to show&lt;/strong&gt;. For each use case, there will be a specific type of chart, for example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;To present relationships between data points, use a scatter or bubble chart. &lt;/li&gt;
&lt;li&gt;To compare data between two or more data sets, use a bar, column, or line chart.&lt;/li&gt;
&lt;li&gt;To look at the distribution of data across an entire data set, use a histogram.&lt;/li&gt;
&lt;li&gt;To represent the part-to-whole relationship of a data set, use a pie chart, stacked column chart, 100% stacked column chart, or a treemap.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Faz801952.vo.msecnd.net%2Fuploads%2Fb9335f90-bb61-4773-899e-3927c923b9be.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Faz801952.vo.msecnd.net%2Fuploads%2Fb9335f90-bb61-4773-899e-3927c923b9be.png" alt="" width="800" height="400"&gt;&lt;/a&gt;Image Credits:  &lt;a href="https://az801952.vo.msecnd.net/uploads/b9335f90-bb61-4773-899e-3927c923b9be.png" rel="noopener noreferrer"&gt;https://az801952.vo.msecnd.net/uploads/b9335f90-bb61-4773-899e-3927c923b9be.png&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;An example of a no/low-code solution is Amazon QuickSight&lt;/h5&gt;

&lt;p&gt;Amazon QuickSight allows everyone to understand your data by asking questions in natural language, exploring through interactive dashboards, or looking for patterns and outliers powered by machine learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;QuickSight allows you to share dashboards, email reports, and embedded analytics.&lt;/strong&gt; By taking your data and visually displaying the questions you want to answer, you can gain relevant insights into your company data.&lt;/p&gt;

&lt;p&gt;It allows you to draw various graphs and charts using options in the user interface. There are a lot of different options to work with, so let's cover a few terms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fields: These reflect the columns of the table in the database. &lt;/li&gt;
&lt;li&gt;Visual Types: This is how your data will be represented. This can be from a simple sum to a chart/graph or even a heat map. &lt;/li&gt;
&lt;li&gt;Sheets: These allow for many visuals to be stored together on a single page. To keep things simple, we'll be working with only one sheet.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Try changing the Visual Type of this data and see how it's represented. You might need to add extra fields to the Field wells to make them populate correctly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Fb6692ea5df920cad691c20319a6fffd7a4a766b8%2F2017%2F09%2F21%2Fquicksight-sept-4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Fb6692ea5df920cad691c20319a6fffd7a4a766b8%2F2017%2F09%2F21%2Fquicksight-sept-4.gif" alt="" width="800" height="387"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2017/09/21/quicksight-sept-4.gif" rel="noopener noreferrer"&gt;https://d2908q01vomqb2.cloudfront.net/b6692ea5df920cad691c20319a6fffd7a4a766b8/2017/09/21/quicksight-sept-4.gif&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;QuickSight has an automatic save feature enabled by default for each analysis. Personally, the NFL's QuickSight case study is one of the more interesting use-case reads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you prefer a code-first approach over a no/low-code solution, try the Matplotlib, Seaborn, and Bokeh libraries&lt;/strong&gt;. They are great for data visualisation, and most data engineers will have experience with them.&lt;/p&gt;
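&lt;p&gt;A small Matplotlib sketch of the chart-selection guidance above, drawing one example of each type; the numbers are made up for illustration:&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Relationships between data points -> scatter chart
x = rng.normal(size=50)
axes[0, 0].scatter(x, 2 * x + rng.normal(size=50))
axes[0, 0].set_title("Scatter: relationships")

# Comparing data sets -> bar chart
axes[0, 1].bar(["Q1", "Q2", "Q3"], [10, 14, 9])
axes[0, 1].set_title("Bar: comparison")

# Distribution across a data set -> histogram
axes[1, 0].hist(rng.normal(size=500), bins=20)
axes[1, 0].set_title("Histogram: distribution")

# Part-to-whole relationship -> pie chart
axes[1, 1].pie([40, 35, 25], labels=["espresso", "latte", "mocha"])
axes[1, 1].set_title("Pie: part-to-whole")

fig.tight_layout()
fig.savefig("chart_types.png")
```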

&lt;h3&gt;Q. How can we predict an outcome using our data?&lt;/h3&gt;

&lt;p&gt;Data visualisation helps us understand patterns in data. The next step is to predict or classify an outcome based on historical data. &lt;/p&gt;

&lt;h5&gt;Q. What is Machine learning?&lt;/h5&gt;

&lt;p&gt;Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalyticsinsight.b-cdn.net%2Fwp-content%2Fuploads%2F2021%2F08%2FML-System.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalyticsinsight.b-cdn.net%2Fwp-content%2Fuploads%2F2021%2F08%2FML-System.jpg" alt="" width="800" height="400"&gt;&lt;/a&gt;Image Credits: &lt;a href="https://analyticsinsight.b-cdn.net/wp-content/uploads/2021/08/ML-System.jpg" rel="noopener noreferrer"&gt;https://analyticsinsight.b-cdn.net/wp-content/uploads/2021/08/ML-System.jpg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There are many low-code machine learning solutions, each suited to a different use case&lt;/strong&gt;. Let us consider a few:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;To extract text and data from documents, rather than building a model from scratch, you could use Amazon Textract. &lt;ol&gt;&lt;li&gt;Amazon Textract extracts text, handwriting, and data from scanned documents. &lt;/li&gt;&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;If you want to build chatbots, Amazon Lex can help.&lt;ol&gt;&lt;li&gt;Use it to design, build, test, and deploy conversational interfaces in applications using advanced natural language models.&lt;/li&gt;&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;If you want to automate speech recognition, use Amazon Transcribe. &lt;ol&gt;&lt;li&gt;It is an automatic speech recognition service that makes it easy to add speech-to-text capabilities to any application; consider the use case of Alexa.&lt;/li&gt;&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;For more information on general machine learning use-case solutions, refer to:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;a href="https://aws.amazon.com/machine-learning/" rel="noopener noreferrer"&gt;https://aws.amazon.com/machine-learning/&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;If you prefer a code-first approach over a low-code solution, look into SageMaker&lt;/strong&gt;. It is good for training and deploying your ML models, as you get AWS compute servers.&lt;/p&gt;

&lt;h4&gt;Q. What is SageMaker?&lt;/h4&gt;

&lt;p&gt;At its core, SageMaker is a fully managed service that provides the tools to build, train, and deploy machine learning models. Its components include managed notebooks, data labelling, model training, and model deployment with a variety of ways to use endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SageMaker algorithms are available via container images&lt;/strong&gt;. Each region that supports SageMaker has its own copy of the images. You begin by retrieving the URI of the container image for the current session's region. You can also use your own container images for specific ML algorithms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fprogrammerprodigycode.files.wordpress.com%2F2022%2F03%2F0bc53-1mfyty2swftpsulqybcgy-w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/http%3A%2F%2Fprogrammerprodigycode.files.wordpress.com%2F2022%2F03%2F0bc53-1mfyty2swftpsulqybcgy-w.png" alt="" width="" height=""&gt;&lt;/a&gt;Image Credits: &lt;a href="http://programmerprodigycode.files.wordpress.com/2022/03/0bc53-1mfyty2swftpsulqybcgy-w.png" rel="noopener noreferrer"&gt;http://programmerprodigycode.files.wordpress.com/2022/03/0bc53-1mfyty2swftpsulqybcgy-w.png&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;&lt;strong&gt;Q. How can we host SageMaker models?&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;SageMaker can host models through its hosting services. The model is accessible to clients through a SageMaker endpoint, which can be called over HTTPS or via the SageMaker Python SDK.&lt;/p&gt;
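&lt;p&gt;A minimal boto3 sketch of calling a hosted endpoint over the runtime API; the endpoint name and JSON payload shape below are hypothetical and depend on how the model was deployed:&lt;/p&gt;

```python
import json

def build_request(endpoint_name, features):
    """Build the invoke_endpoint request. The payload format
    ({"instances": [...]}) is an assumption; it varies by model."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"instances": [features]}),
    }

def predict(features):
    import boto3  # imported lazily: only needed when actually calling AWS
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        **build_request("demo-model-endpoint", features))
    return json.loads(response["Body"].read())
```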

&lt;p&gt;Another way is SageMaker Batch Transform. &lt;strong&gt;It manages the processing of large datasets within the limits of specified parameters&lt;/strong&gt;. When a batch transform job starts, SageMaker initialises compute instances and distributes the inference or pre-processing workload between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Batch Transform, you provide your inference data as an S3 URI and SageMaker takes care of downloading it, running the prediction, and uploading the results back to S3.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Batch Transform partitions the Amazon S3 objects in the input by key and maps each Amazon S3 object to an instance. To split input files into mini-batches when you create a batch transform job, set the SplitType parameter value to Line. You can control the size of the mini-batches by using the BatchStrategy and MaxPayloadInMB parameters. &lt;/p&gt;

&lt;p&gt;After processing, it creates an output file with the same name and the .out file extension. The batch transform job stores the output files in the specified location in Amazon S3, such as s3://awsexamplebucket/output/.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Ff1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59%2F2019%2F08%2F21%2Finference-with-tensorflow-1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd2908q01vomqb2.cloudfront.net%2Ff1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59%2F2019%2F08%2F21%2Finference-with-tensorflow-1.gif" alt="" width="717" height="261"&gt;&lt;/a&gt;Image credits: &lt;a href="https://aws.amazon.com/blogs/machine-learning/performing-batch-inference-with-tensorflow-serving-in-amazon-sagemaker/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/performing-batch-inference-with-tensorflow-serving-in-amazon-sagemaker/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The predictions in an output file are in the same order as the corresponding records in the input file. To combine the results of many output files into a single output file, set the AssembleWith parameter to Line.&lt;/p&gt;
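&lt;p&gt;The parameters above fit together in a create_transform_job request roughly like this boto3 sketch; the job name, model name, bucket paths, and instance type are hypothetical placeholders:&lt;/p&gt;

```python
def transform_job_definition(job_name, model_name, input_s3, output_s3):
    """Request body for sagemaker.create_transform_job, wiring up the
    SplitType, BatchStrategy, MaxPayloadInMB, and AssembleWith parameters."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix", "S3Uri": input_s3}},
            "SplitType": "Line",          # split input files into mini-batches
        },
        "TransformOutput": {
            "S3OutputPath": output_s3,    # results land here as <input-name>.out
            "AssembleWith": "Line",       # keep output lines in input order
        },
        "TransformResources": {"InstanceType": "ml.m5.large",
                               "InstanceCount": 1},
        "BatchStrategy": "MultiRecord",   # with MaxPayloadInMB, sets batch size
        "MaxPayloadInMB": 6,
    }

def start_job():
    import boto3  # imported lazily: only needed when actually calling AWS
    boto3.client("sagemaker").create_transform_job(
        **transform_job_definition("demo-transform", "demo-model",
                                   "s3://awsexamplebucket/input/",
                                   "s3://awsexamplebucket/output/"))
```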

&lt;p&gt;&lt;strong&gt;For more information, refer to:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-model-deployment.html#ex1-batch-transform" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-model-deployment.html#ex1-batch-transform&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;Q. How do you select the right compute instance?&lt;/h4&gt;

&lt;p&gt;Choosing a compute instance based purely on price or purely on compute power might not be a good idea. &lt;strong&gt;Suppose a cheaper instance takes about 30 minutes to finish a job, while a better instance takes 10 minutes.&lt;/strong&gt; The second alternative may be the better choice both economically and time-wise.&lt;/p&gt;
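&lt;p&gt;The trade-off above is easy to put in numbers. With illustrative hourly prices (not real AWS pricing), the faster instance can end up both quicker and cheaper overall:&lt;/p&gt;

```python
def job_cost(price_per_hour, minutes):
    """Total cost of a job = hourly price * run time in hours."""
    return price_per_hour * minutes / 60.0

# Illustrative prices only, not real AWS pricing.
slow_total = job_cost(price_per_hour=0.10, minutes=30)  # cheaper per hour, 30 min
fast_total = job_cost(price_per_hour=0.25, minutes=10)  # pricier per hour, 10 min

print(f"slow: ${slow_total:.4f}, fast: ${fast_total:.4f}")
# slow: $0.0500, fast: $0.0417
```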

&lt;p&gt;&lt;strong&gt;Some points to remember while choosing between CPU and GPU:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CPU time grows in proportion to the square or cube of the matrix size. &lt;/li&gt;
&lt;li&gt;GPU time grows almost linearly with the matrix size for the sizes used in the experiment; a GPU can add more compute cores to complete the computation in much less time than a CPU. &lt;/li&gt;
&lt;li&gt;Sometimes the CPU performs better than the GPU at these small sizes; in general, GPUs excel at large-scale problems. &lt;/li&gt;
&lt;li&gt;For larger problems, GPUs can offer speedups in the hundreds. For example, an application used for facial or object detection in images or video will need more compute, so a GPU might be a better fit.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;For more information, feel free to listen to my session on an introduction to algorithms and AWS SageMaker:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;a href="https://vimeo.com/586886985/7faddfb340" rel="noopener noreferrer"&gt;https://vimeo.com/586886985/7faddfb340&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;For more information on SageMaker:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/sagemaker/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/aws/sagemaker/&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;After considering all the no/low-code solutions and coding solutions, &lt;strong&gt;let's consider a use case.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you have a relatively small business without much need for customisation, a no/low-code solution is probably enough. But if you want to customise your application, you would have to use a coding solution. A point to remember: depending on your dataset's size, diversity, and quality, you could go for either CPU (less compute) or GPU (more compute).&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>cloud</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
