DEV Community: Abhiraj Adhikary

Apache Fluss: Architecting the Streaming-First Persistent Data Stack

Abhiraj Adhikary — Mon, 18 May 2026 08:33:52 +0000

Conventional modern data stacks are fragmented by design: streaming, batch processing, lakehouses, and AI systems each live in a separate silo. This fragmentation forces organizations to maintain multiple copies of data, increases operational complexity, and makes long retention expensive in systems like Kafka. Analytical freshness also suffers because data pipelines are often batch-oriented and disconnected from real-time applications.

As real-time AI systems, recommendation engines, fraud detection, and agentic workflows become business-critical, the industry is shifting from batch-first architectures toward streaming-first persistent systems, where streams and tables converge into a unified abstraction.

This new architectural paradigm enables organizations to process, store, query, and serve data continuously without maintaining separate infrastructures for streaming and analytics.

The Core Architecture: How It Works

This unified ecosystem moves away from traditional broker-centric designs and introduces a streaming-native analytical storage model.

1. Ingestion Layer (CDC, IoT, Logs)

Continuous event streams such as:

Change Data Capture (CDC)
IoT telemetry
Application logs
Clickstream events

are written directly into the storage core, bypassing heavy broker retention dependencies.

This reduces:

Kafka storage costs
Operational overhead
Data duplication across systems

2. Storage Core — Apache Fluss

At the center of the architecture sits Apache Fluss, which acts as a real-time streaming storage engine.

Responsibilities

Maintains streaming tables
Stores changelogs
Preserves low-latency hot data
Automatically tiers cold data into object storage (S3/OBS)

Key Innovation

Instead of treating streams as temporary transport layers, Fluss treats them as a persistent analytical substrate.

This enables:

Real-time reads/writes
Stateful streaming
Stream-table unification
Efficient historical access

3. Compute Layer — Apache Flink SQL

Apache Flink SQL performs stateful transformations and real-time analytics.

Major Capability: Union Reads

Flink can simultaneously query:

Hot data from Fluss
Historical cold data from the lakehouse

This creates a seamless analytical experience across real-time and historical datasets.
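
As a rough mental model of a union read, the sketch below merges a cold snapshot with newer changelog entries by primary key. It is a toy in plain Python: the function name, the `upsert`/`delete` operation labels, and the sample rows are all made up for illustration and are not part of the Fluss or Flink APIs.

```python
# Toy model only: dicts and tuples stand in for the lakehouse snapshot and
# the hot changelog. None of these names come from the Fluss or Flink APIs.

def union_read(cold_snapshot, hot_changelog):
    """Merge a cold snapshot with newer changelog entries, keyed by primary key."""
    merged = dict(cold_snapshot)            # start from the historical state
    for op, key, row in hot_changelog:      # replay recent changes in order
        if op == "upsert":
            merged[key] = row
        elif op == "delete":
            merged.pop(key, None)
    return merged

cold = {1: {"user": "a", "orders": 3}, 2: {"user": "b", "orders": 1}}
hot = [
    ("upsert", 2, {"user": "b", "orders": 2}),
    ("upsert", 3, {"user": "c", "orders": 1}),
    ("delete", 1, None),
]

current_view = union_read(cold, hot)
print(current_view)
```

The point of the sketch is only that a query sees one table, even though the rows come from two storage tiers.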

Typical Workloads

Sessionization
Fraud detection
Feature engineering
Aggregations
Real-time ETL

4. Persistence Layer — Apache Iceberg

Cold and immutable historical datasets are persisted into Apache Iceberg.

Benefits

ACID table guarantees
Schema evolution
Time travel
Partition optimization
Open table format interoperability

Catalogs such as:

Nessie
Polaris

manage metadata and versioning for Iceberg tables.

5. Query & OLAP Layer

Specialized analytical engines accelerate different workloads.

Databend

Optimized for:

High-throughput OLAP
Warehouse-scale analytical queries
Concurrent workloads

Dremio

Provides:

Semantic acceleration
BI query optimization
Lakehouse exploration

Trino

Enables:

Federated SQL querying
Cross-platform analytics
Distributed query execution

Together, these engines provide a flexible analytical ecosystem over the unified lakehouse.

6. AI & Vector Layer

Modern AI applications require real-time embeddings and semantic retrieval systems.

Vector Databases

Qdrant
Milvus

store embeddings generated from streaming pipelines.

Use Cases

Recommendation systems
Semantic search
Retrieval-Augmented Generation (RAG)
Real-time personalization
Agent memory systems

This enables AI systems to continuously consume fresh streaming data.

7. Infrastructure & Operations Layer

The entire ecosystem is deployed using cloud-native infrastructure.

Kubernetes

Provides:

Container orchestration
Horizontal scaling
Self-healing deployments

Terraform

Enables:

Infrastructure-as-Code (IaC)
Reproducible environments
Automated provisioning

Airflow

Handles:

Workflow orchestration
Batch coordination
Dependency management

Implementation & Practical Use Case

Real-Time E-Commerce Platform

Consider a large-scale e-commerce system.

Data Sources

CDC events from transactional databases
User clickstreams
Product interactions
Payment events
Inventory updates

Processing Flow

1. CDC and clickstream events continuously flow into Apache Fluss.
2. Apache Flink computes live sessions, fraud signals, user activity windows, and recommendation features.
3. Hot operational data remains in Fluss for low-latency access.
4. Historical data persists into Apache Iceberg.
5. Dremio accelerates BI dashboards over the lakehouse.
6. Databend powers heavy OLAP analytics workloads.
7. Qdrant stores vector embeddings for personalized recommendations.

Strategic Evaluation

Key Advantages

Reduced Costs

Minimizes Kafka retention overhead
Reduces unnecessary data duplication
Uses cheaper object storage for cold data

Unified Processing Logic

Stream-table unification enables Flink to seamlessly access both:

Real-time streaming data
Historical lakehouse data

without separate architectures.

AI-Ready Infrastructure

Native support for:

Vector databases
Real-time feature pipelines
Streaming embeddings
RAG architectures

makes the system ideal for modern AI workloads.

Cloud-Native Scalability

Designed for:

Kubernetes deployments
Remote object storage
Elastic compute scaling
Multi-cloud infrastructure

Conclusion

The rise of Apache Fluss signals a fundamental architectural shift in modern data engineering.

Streaming is no longer treated as a transient transport mechanism — it is becoming the primary abstraction for data persistence and analytics.

By collapsing the traditional boundaries between ingestion, storage, streaming, analytics, and AI, this architecture provides the low-latency foundation required for:

Real-time intelligence
Continuous feature freshness
AI-native applications
Agentic systems
Next-generation recommendation engines

This unified streaming-first ecosystem represents the future of modern data platforms.

Parallel & Concurrent Computing

Abhiraj Adhikary — Tue, 27 Jan 2026 08:30:14 +0000

Parallel and concurrent computing are no longer niche topics for high-performance researchers; they are essential for anyone wanting to squeeze real performance out of modern hardware.

1. Motivation: The End of "Free Lunch"

For decades, software got faster simply because hardware engineers increased CPU clock speeds. However, around 2004, we hit a "Power Wall." Increasing clock speeds further generated more heat than could be dissipated.

Clock Speed Stagnation: Instead of making one core faster (increasing GHz), manufacturers began adding more cores to a single chip.
The Shift: To gain performance now, developers must write code that can run across these multiple cores simultaneously.

2. Serial vs. Parallel Execution

The difference lies in how tasks are queued and processed.

Feature	Serial Execution	Parallel Execution
Workflow	One task must finish before the next begins.	Multiple tasks (or parts of a task) run at the same time.
Hardware	Uses a single processor core.	Uses multiple cores or multiple processors.
Analogy	A single grocery checkout line.	Multiple checkout lanes open at once.

3. Key Definitions

Concurrency vs. Parallelism

These terms are often used interchangeably, but they describe different concepts:

Concurrency: The art of dealing with many things at once. It’s about structure. A system is concurrent if it can handle multiple tasks by switching between them (interleaving).
Parallelism: The act of doing many things at once. It’s about execution. It requires hardware capable of running tasks at the exact same moment.

Deterministic vs. Non-deterministic Execution

Deterministic: Given the same input, the program always produces the same output and follows the same execution path.
Non-deterministic: The outcome or the order of execution can change between runs, even with the same input. This is common in parallel systems because the thread scheduler decides when each task runs, often leading to different interleaving.

4. Common Pitfalls

Writing parallel code is notoriously difficult because of the "bugs" that only appear when timing is just right (or wrong).

Race Conditions

A race condition occurs when the output depends on the sequence or timing of uncontrollable events.

Example: Two threads try to increment a counter simultaneously. If they both read "10," add 1, and write back "11," the counter only increases by 1 instead of 2.

Deadlocks

A deadlock is a "Mexican Standoff" in code. It happens when:

Thread A holds Resource 1 and waits for Resource 2.
Thread B holds Resource 2 and waits for Resource 1. Neither can proceed, and the program freezes.
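
One common way to avoid this standoff is to make every thread acquire the locks in the same order. The sketch below assumes two hypothetical locks, `lock_1` and `lock_2`, standing in for Resource 1 and Resource 2. It is one possible fix, not the only one.

```python
import threading

lock_1 = threading.Lock()   # stands in for Resource 1
lock_2 = threading.Lock()   # stands in for Resource 2
completed = []

def worker(name):
    # Every thread takes lock_1 before lock_2, so no thread can hold
    # lock_2 while waiting for lock_1.
    with lock_1:
        with lock_2:
            completed.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(sorted(completed))
```

If Thread B instead took `lock_2` first, the two threads could each grab one lock and wait on the other, which is exactly the situation described above.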

Synchronization Issues

To prevent race conditions, we use "locks" or "mutexes." However, over-synchronizing leads to problems:

Contention: Too many threads fighting for the same lock, which slows the system down to serial speeds.
Starvation: A thread is perpetually denied access to resources because other "greedier" threads keep taking them.

Understanding how memory is allocated is the "make or break" moment for designing parallel systems. It dictates how your workers (threads or processes) talk to each other and how much they’ll fight over resources.

2.1 Shared Memory Parallelism (Multithreading)

In this model, multiple threads live within a single process. Imagine a single kitchen (the memory) where multiple chefs (threads) are working at the same counter.

Shared Space: All threads can see and modify the same variables. This makes communication lightning-fast because you don't have to "send" data; it's already there.
The Synchronization Tax: Since everyone is touching the same "ingredients," you need strict rules (locks/mutexes) to prevent them from chopping the same carrot at the same time. This adds significant logic complexity.
The Python Catch (GIL): In standard Python (CPython), the Global Interpreter Lock (GIL) ensures only one thread executes Python bytecode at a time. Even on a 16-core machine, multithreading in Python won't give you a 16x speedup for CPU-heavy math; it’s mostly useful for I/O tasks like downloading files.

2.2 Distributed Memory Parallelism (Multiprocessing)

Here, you have multiple processes, each with its own private "kitchen." No process can peek into another's memory.

Independence: Since memory isn't shared, you don't have to worry about one process accidentally overwriting another’s variables. This eliminates many race conditions.
Message Passing: If Process A needs data from Process B, it must be explicitly "sent" over a communication channel (like a Pipe or Queue). This is called Message Passing.
True Parallelism: Because each process has its own memory and its own instance of the Python interpreter, the GIL is bypassed. This is the go-to method for compute-bound tasks (e.g., heavy data processing, image rendering).
The Overhead: Creating a new process is "heavier" and slower than creating a thread, and sending large amounts of data between processes can be a performance bottleneck.

Summary Comparison

Feature	Multithreading (Shared)	Multiprocessing (Distributed)
Memory	Shared among all threads	Private to each process
Communication	Fast (Shared variables)	Slower (Message passing)
Complexity	High (Needs locks/semaphores)	Lower (Isolation)
Python GIL	Restricted by GIL	Bypasses GIL
Best Use Case	I/O-bound (API calls, DB reads)	CPU-bound (Math, Data Science)

The Global Interpreter Lock (GIL) is perhaps the most famous (and infamous) technical detail of the Python programming language. It is essentially a "safety latch" that has shaped how the entire Python ecosystem handles performance.

3.1 What is the GIL and Why Does It Exist?

The GIL is a mutex (a lock) that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once.

The Reason: Python uses reference counting for memory management. If two threads increment or decrement the "use count" of an object simultaneously, it could lead to memory leaks or, worse, deleting an object that is still in use.
The Benefit: It makes the implementation of CPython (the standard Python version) much simpler and faster for single-threaded programs. It also makes integrating C libraries (which might not be thread-safe) much easier.

3.2 Impact on Python Multithreading

Because of the GIL, even if your computer has 32 CPU cores, a standard Python program using threading will only utilize one core at a time for execution.

The Illusion of Parallelism: To a human, it looks like threads are running in parallel because the GIL switches between them very quickly (every 5ms or so).
CPU-Bound Bottleneck: If your code is doing heavy math in pure Python (CPU-bound), multithreading gives no speedup and often makes it slower than a single-threaded program. The cause is lock overhead: time wasted while threads contend for the GIL and the interpreter switches between them.
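
The switching interval mentioned above can be inspected from Python itself. A minimal check, assuming a standard CPython build:

```python
import sys

# How long a thread may hold the GIL before the interpreter asks it to yield
interval = sys.getswitchinterval()
print(f"Switch interval: {interval * 1000:.1f} ms")
```

`sys.setswitchinterval()` can change this value, though tuning it rarely fixes a CPU-bound threading problem.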

3.3 How the GIL is Bypassed

The GIL isn't an impenetrable wall; it’s more like a gate that can be opened under specific conditions.

1. Native Extensions (The "C" Escape)

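Libraries like NumPy, SciPy, and Pandas implement their core routines in compiled code (mostly C, with some Fortran and Cython). For many long-running operations, such as a large matrix multiplication in NumPy, the library releases the GIL, does the heavy lifting in native code (often across multiple cores through a multithreaded BLAS backend), and reacquires the GIL only when it is done.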

Note: This is why Python is a powerhouse for Data Science despite the GIL.

2. I/O Operations

When a thread is waiting for something external—like a website to respond, a file to be read from a disk, or a database query—it voluntarily releases the GIL.

While Thread A waits for a download, Thread B can take the GIL and start working. This makes Python threads excellent for network-heavy tasks.

3. Multiprocessing

As we discussed earlier, the GIL is per-interpreter. By using the multiprocessing module, you launch entirely separate instances of the Python interpreter.

Each process has its own GIL.
Each process can sit on its own CPU core.
This is the standard way to achieve "True Parallelism" in Python for pure Python code.

Summary: Threading vs. Multiprocessing in Python

Task Type	Recommended Approach	Why?
CPU-Bound (Math, Compression)	`multiprocessing`	Bypasses GIL, uses all cores.
I/O-Bound (Web Scraping, API)	`threading`	Efficiently uses "waiting time."
Scientific Computing	`NumPy / Pandas`	Releases GIL internally in C code.

In Python, the threading module is the go-to choice for tasks where the bottleneck isn't your CPU's speed, but rather the latency of external systems.

4.1 Threading Use Cases

Threads are ideal for I/O-bound workloads. In these scenarios, the processor spends most of its time idle, waiting for a response from a device or network.

Network Requests: Fetching data from multiple APIs or web scraping. While Thread A waits for a server in New York to respond, the GIL is released, allowing Thread B to start a request to a server in London.
Disk Operations: Reading or writing multiple files. Since disk I/O is significantly slower than CPU cache, threads allow you to overlap the "wait time" of different file operations.
User Interfaces (GUIs): Keeping the interface responsive. One thread handles the "click" events while a background thread does the heavy lifting, preventing the window from freezing.

4.2 ThreadPoolExecutor

Modern Python development favors the concurrent.futures.ThreadPoolExecutor over the older threading.Thread class. It provides a higher-level interface for managing a "pool" of threads.

What is a Thread Pool?

Instead of creating and destroying a thread for every single task (which is expensive), you create a Pool of workers that stay alive and pick up tasks from a queue as they become available.

Key Methods: `map` vs. `submit`

The ThreadPoolExecutor offers two primary ways to run tasks:

map(func, *iterables):
- Works like the built-in map.
- Runs the function on every item of the iterable concurrently across the pool's threads.
- Pros: Very simple; returns results in the order they were submitted.
submit(func, *args):
- Schedules a single callable and returns a Future object.
- Pros: More flexible; allows you to handle individual task completion and different arguments for each task.
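
To make the difference concrete, here is a small sketch that runs the same tasks through both methods. The `fake_io` function is a made-up stand-in that sleeps instead of touching the network, and the delays are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_io(task_id, delay):
    """Hypothetical stand-in for an I/O call; sleeps instead of using the network."""
    time.sleep(delay)
    return task_id, delay

delays = [0.03, 0.01, 0.02]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map: results come back in input order, regardless of finish order
    mapped = list(executor.map(fake_io, range(3), delays))

    # submit: one Future per call; as_completed yields them as they finish
    futures = [executor.submit(fake_io, i, d) for i, d in enumerate(delays)]
    finished = [f.result() for f in as_completed(futures)]

print("map order:      ", mapped)
print("completion order:", finished)
```

With `map` the first result always belongs to the first input. With `submit` plus `as_completed` you typically see the shortest task first, which is useful when you want to react to results as soon as they arrive.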

Code Example: Efficiently Fetching Data

from concurrent.futures import ThreadPoolExecutor
import requests

urls = ["https://google.com", "https://python.org", "https://github.com"]

def fetch_status(url):
    # A timeout keeps one unresponsive server from blocking a worker forever
    response = requests.get(url, timeout=10)
    return f"{url}: {response.status_code}"

# Using a context manager ensures threads are cleaned up automatically
with ThreadPoolExecutor(max_workers=3) as executor:
    # 'map' handles the distribution of URLs to the 3 threads
    results = list(executor.map(fetch_status, urls))

for r in results:
    print(r)

Why use a Pool instead of manual Threads?

Resource Management: It prevents you from accidentally spawning 10,000 threads and crashing your system.
Cleanliness: Using the with statement (context manager) ensures that all threads are joined and resources are released even if an error occurs.
Future Objects: It provides "Futures," which are placeholders for results that haven't happened yet, allowing you to check whether a task is "done" or was "cancelled."

To master parallel computing, you must be able to diagnose the bottleneck. Is your code waiting for the "brain" (CPU) or the "delivery truck" (I/O)? Choosing the wrong tool for the workload can actually make your code slower.

6.1 Identifying Workload Characteristics

CPU-Bound (Compute-Heavy)

The speed is limited by the CPU's clock speed and core count.

Examples: Matrix multiplication, image processing, data compression, searching for prime numbers.
Significance: These tasks keep the cores they run on at or near 100% usage.

I/O-Bound (Wait-Heavy)

The speed is limited by Input/Output operations. The CPU often sits idle, waiting for data.

Examples: Web scraping (Network), reading thousands of small CSVs (Disk), waiting for a database query to return.
Significance: Processor usage is usually low; the system is waiting on external latency.

6.2 Performance Comparison Table

Here is how each execution style behaves under different pressures:

Workload Type	Serial Execution	Multithreading	Multiprocessing
I/O-Bound	Very Slow (Total wait time)	Fastest (Overlaps wait time)	Fast (But uses more memory)
CPU-Bound	Slow	Slowest (GIL overhead + context switching)	Fastest (Uses all cores)

6.3 Demonstrations (Mental Model)

I/O-Bound: The `sleep()` Test

Imagine a task that does nothing but time.sleep(1). This simulates waiting for a network response.

Serial: To do this 10 times, it takes 10 seconds.
Multithreading: You spawn 10 threads. They all start "sleeping" at the same time. The total time is roughly 1 second.
Why? The GIL is released during sleep(), letting threads wait in parallel.
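
The mental model above can be checked with a few lines. This sketch scales the sleep down from 1 second to 0.1 seconds so it finishes quickly. That scaling, and the `wait_task` name, are assumptions for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def wait_task(_):
    time.sleep(0.1)   # scaled down from the 1 second used in the text

# Serial: ten waits, one after another
start = time.perf_counter()
for i in range(10):
    wait_task(i)
serial_seconds = time.perf_counter() - start

# Threaded: ten waits that overlap
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    list(executor.map(wait_task, range(10)))
threaded_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.2f} s")
print(f"threaded: {threaded_seconds:.2f} s")
```

On a typical machine the serial loop takes about ten times as long as the threaded one, mirroring the 10 seconds versus 1 second described above.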

CPU-Bound: The Mathematical Loop

Imagine calculating the sum of squares for 50 million numbers.

Serial: Takes X seconds.
Multithreading: Takes X + overhead seconds. Because of the GIL, only one thread can do math at a time. The CPU is essentially "juggling" threads, which wastes time.
Multiprocessing: If you have 4 cores, it takes roughly X / 4 seconds. Each core handles a chunk of the numbers independently.
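
Here is a sketch of the serial and multithreaded cases for a sum of squares. It uses 2 million numbers instead of 50 million to keep the run short, and the helper name `sum_squares` is made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def sum_squares(bounds):
    """Sum i*i for i in [start, stop): pure Python, so it holds the GIL."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

N = 2_000_000   # far smaller than 50 million, to keep the demo quick
chunks = [(i * N // 4, (i + 1) * N // 4) for i in range(4)]

t0 = time.perf_counter()
serial_total = sum_squares((0, N))
serial_seconds = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded_total = sum(executor.map(sum_squares, chunks))
threaded_seconds = time.perf_counter() - t0

print(f"serial:   {serial_seconds:.3f} s")
print(f"threaded: {threaded_seconds:.3f} s")
```

Both versions produce the same total, and on a standard CPython build the threaded timing is usually no better than the serial one. Swapping in `ProcessPoolExecutor` (inside an `if __name__ == "__main__":` block) is what lets the four chunks run on separate cores.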

Summary: The Decision Tree

Is the CPU usage low while the program is running? $\rightarrow$ It's I/O-bound. Use threading or asyncio.
Is one core pegged at 100%? $\rightarrow$ It's CPU-bound. Use multiprocessing or a library like NumPy.
Are you limited by memory? $\rightarrow$ Be careful with multiprocessing: each process runs its own interpreter and holds its own copy of the data it works on.

While Multithreading is like having one chef with multiple hands, Multiprocessing is like hiring four chefs in four separate kitchens. This is the only way to achieve "true" parallelism for Python-native code.

7.1 The `multiprocessing` Module

This module bypasses the GIL by creating entirely new instances of the Python interpreter for each task.

Process-based parallelism: Each process has its own memory space and its own GIL.
Safety: Since memory isn't shared by default, one process can't accidentally corrupt another's data.

7.2 Pool, Map, and Starmap

The multiprocessing.Pool class is the workhorse for data-parallel tasks.

map(func, iterable): The simplest way to parallelize. It chops the iterable into chunks and sends them to the worker processes.
starmap(func, iterable_of_tuples): Used when your function requires multiple arguments.
- Example: If func(x, y) is your function, starmap takes [(1, 2), (3, 4)].
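
A minimal sketch of both methods follows. The functions `square` and `power` are illustrative names, and the pool size of 2 is arbitrary.

```python
from multiprocessing import Pool

def square(x):
    return x * x

def power(base, exponent):
    return base ** exponent

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # map: one argument per call
        squares = pool.map(square, [1, 2, 3, 4])
        # starmap: each tuple is unpacked into the function's arguments
        powers = pool.starmap(power, [(1, 2), (3, 4)])
    print(squares)   # [1, 4, 9, 16]
    print(powers)    # [1, 81]
```

The worker functions are defined at module level so that they can be pickled and found by the worker processes.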

`ProcessPoolExecutor`

Found in concurrent.futures, this exposes the same map and submit interface as the ThreadPoolExecutor we saw earlier, with the added requirement that functions and arguments be picklable. It is generally preferred in modern code for its consistency and better error handling.

7.3 Communication & Shared Memory

Sometimes processes do need to talk to each other. Since they don't share memory, we use special constructs:

Tool	Description	Best For...
`Value` / `Array`	Allocates a small piece of shared memory (C-style) that all processes can see.	Simple counters or flags.
`Queue`	A thread- and process-safe FIFO (First-In-First-Out) pipe.	Passing complex objects or results back to the main process.
`Pipe`	A direct connection between two processes.	Fast, two-way communication between exactly two workers.
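
As a sketch of the first two tools, the example below has two processes bump a shared `Value` counter while two others send results back through a `Queue`. The worker functions and the counts are made up for illustration.

```python
from multiprocessing import Process, Queue, Value

def add_many(counter, n):
    for _ in range(n):
        with counter.get_lock():    # a Value carries its own lock
            counter.value += 1

def report(queue, worker_id):
    queue.put((worker_id, worker_id * 10))

if __name__ == "__main__":
    counter = Value("i", 0)         # "i" = C signed int, initial value 0
    queue = Queue()

    workers = [Process(target=add_many, args=(counter, 1000)) for _ in range(2)]
    workers += [Process(target=report, args=(queue, i)) for i in range(2)]
    for p in workers:
        p.start()

    results = sorted(queue.get() for _ in range(2))   # drain before joining
    for p in workers:
        p.join()

    print(counter.value)   # 2000
    print(results)         # [(0, 0), (1, 10)]
```

Note that even the shared counter needs its lock. Shared memory brings back the same synchronization concerns that threads have.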

7.4 Limitations in Interactive Environments (Jupyter)

A common "gotcha" for data scientists is that the multiprocessing module often fails or behaves unpredictably in Jupyter Notebooks or the IPython console.

Serialization (Pickling): Python must "pickle" (serialize) your function and data to send it to the other process. If you define a function inside a notebook cell, the worker process might not be able to find its definition.
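The if __name__ == "__main__": block: On platforms that start workers with the spawn method (the default on Windows and macOS), each worker re-imports your main module, so you must wrap your multiprocessing code in this block to keep workers from re-running the process-creation code.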
- Jupyter doesn't always handle this entry point correctly.

Workaround: If you run into issues in Jupyter, move your functions into a separate .py file and import them into your notebook.

7.5 Summary: When to use Multiprocessing

YES: For "number crunching" (e.g., calculating $\pi$ to a billion digits).
YES: For heavy image/video processing.
NO: For simple I/O (it uses way more RAM than threads).
NO: When you need to share massive amounts of data (the "pickling" overhead will kill your performance).

When multiple threads or processes try to change the same piece of data at the same time, you enter the world of Race Conditions. This is the most common source of "heisenbugs"—bugs that seem to disappear when you try to look for them.

8.1 Shared State Modification Problems

A race condition occurs when the final outcome of a program depends on the timing or scheduling of the execution.

If two threads are incrementing a shared variable, the operation looks like one step in Python (x += 1), but the CPU sees three distinct steps:

Read the current value of $x$.
Add 1 to that value.
Write the new value back to memory.

If Thread A is interrupted after step 1, and Thread B finishes all three steps, Thread A will eventually overwrite Thread B's work with an outdated value.

8.2 Demonstration of Incorrect Results

In a perfectly synchronized world, if you have 10 threads each adding 1 to a counter 100,000 times, the result should be 1,000,000.

In a race condition scenario, the result might be 742,384. This happens because thousands of "updates" were lost when threads stomped on each other’s data.
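
The scenario can be sketched as follows, using 10,000 increments per thread instead of 100,000 to keep it quick. Whether the unlocked version actually loses updates depends on the interpreter version and on timing, so only the locked total is guaranteed.

```python
import threading

N_THREADS = 10
N_INCREMENTS = 10_000   # smaller than the 100,000 in the text

unsafe_counter = 0
safe_counter = 0
lock = threading.Lock()

def unsafe_add():
    global unsafe_counter
    for _ in range(N_INCREMENTS):
        unsafe_counter += 1      # read-modify-write without protection

def safe_add():
    global safe_counter
    for _ in range(N_INCREMENTS):
        with lock:               # only one thread updates at a time
            safe_counter += 1

def run(target):
    threads = [threading.Thread(target=target) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run(unsafe_add)
run(safe_add)

print("expected:", N_THREADS * N_INCREMENTS)
print("unsafe:  ", unsafe_counter)
print("safe:    ", safe_counter)
```

If the unsafe total happens to come out right on your machine, that is luck rather than correctness. The code is still a race and may fail under a different interpreter or load.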

8.3 Threading vs. Multiprocessing Behavior

The way these two handle "shared state" is fundamentally different, which changes how they fail.

In Multithreading: Race conditions are common and dangerous. Because all threads share the same memory, they can all "see" and "touch" the same variables globally.
In Multiprocessing: Race conditions are rare by default. Since each process has its own memory, incrementing x in Process A does nothing to x in Process B.
- Exception: You only face race conditions in multiprocessing if you explicitly use Shared Memory constructs (like Value or Array) or shared external resources (like a database or a file on disk).

8.4 Synchronization Primitives

To fix these issues, we use tools that force threads to "wait their turn."

1. The Lock (Mutex)

A Lock is the simplest tool. It has two states: locked and unlocked.

A thread must "acquire" the lock before touching the shared data.
If another thread holds the lock, everyone else must wait.
Analogy: The "talking stick" in a meeting. You can't speak unless you hold the stick.

2. The Semaphore

A Semaphore is like a Lock, but it allows a specific number of threads to enter.

Analogy: A restaurant with 10 tables. The first 10 groups get in; the 11th must wait until someone leaves.

3. The RLock (Re-entrant Lock)

A standard Lock can cause a thread to "deadlock itself" if it tries to acquire the same lock twice. An RLock allows the same thread to acquire the lock multiple times without freezing.
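
The Semaphore and RLock can be sketched together. The example reuses the restaurant analogy with 3 tables instead of 10, and every name in it (`dine`, `nested`, `peak_seated`) is made up for the demo.

```python
import threading
import time

MAX_TABLES = 3
tables = threading.Semaphore(MAX_TABLES)
state_lock = threading.Lock()
seated = 0
peak_seated = 0

def dine():
    global seated, peak_seated
    with tables:                         # at most MAX_TABLES threads inside
        with state_lock:
            seated += 1
            peak_seated = max(peak_seated, seated)
        time.sleep(0.01)                 # "eating"
        with state_lock:
            seated -= 1

guests = [threading.Thread(target=dine) for _ in range(10)]
for g in guests:
    g.start()
for g in guests:
    g.join()

rlock = threading.RLock()

def nested(depth):
    with rlock:                          # the same thread may re-acquire an RLock
        return 0 if depth == 0 else 1 + nested(depth - 1)

depth_reached = nested(3)
print(peak_seated, depth_reached)
```

Replacing `RLock` with a plain `Lock` in `nested` would make the first recursive call block forever, which is the self-deadlock described above.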

Summary: The Cost of Safety

While synchronization prevents data corruption, it comes with a performance price:

Overhead: Managing locks takes CPU time.
Serial Bottlenecks: If every thread is waiting for the same lock, your "parallel" program is actually running one-by-one (serial).

Numerical integration is a "perfect" parallel problem. It follows the embarrassingly parallel pattern, where a large task can be easily divided into independent sub-tasks that don't need to communicate with each other.

9.1 The Grid-Based Technique (Rectangle Rule)

To find the area under a curve f(x) between a and b, we divide the interval into $N$ small rectangles. The total area is the sum of the areas of these rectangles.

$$Area \approx \sum_{i=0}^{N-1} f(x_i) \Delta x$$

In a serial approach, a single CPU core calculates rectangle #1, then #2, then #3, all the way to N. If N is 100 million, this takes a significant amount of time.
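
A serial version of the sum might look like this. The sketch assumes left-endpoint rectangles, so $x_i = a + i\,\Delta x$ with $\Delta x = (b - a)/N$. The text does not say which sample point is used, so treat that choice as an assumption.

```python
def integrate_serial(f, a, b, n):
    """Left-endpoint rectangle rule: sum of f(x_i) * dx with x_i = a + i * dx."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

# Area under f(x) = x^2 on [0, 1]; the exact answer is 1/3
area = integrate_serial(lambda x: x * x, 0.0, 1.0, 100_000)
print(area)
```

Each term of the sum depends only on its own index `i`, which is what makes the loop easy to split across workers.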

9.2 Identifying Parallelizable Regions

The beauty of integration is that the calculation of "Rectangle #500" does not depend on the result of "Rectangle #499."

The Strategy: Split the total range $[a, b]$ into sub-intervals.
The Workers: If you have 4 cores, Core 1 handles the first 25%, Core 2 the second 25%, and so on.
The Reduction: Once all cores finish their local sums, you add those 4 sums together to get the final answer.

9.3 Implementation Strategies

Multithreading Approach

Performance: Low. Because integration is CPU-bound (pure math), the Python GIL will prevent the threads from running the math in parallel.
Use Case: Only beneficial if the function $f(x)$ involved an I/O wait (e.g., fetching a coordinate from a remote database), which is rare in pure math.

Multiprocessing Approach

Performance: High. This is the correct tool. By using a ProcessPoolExecutor, each core gets a chunk of the grid.
Efficiency: You get nearly "linear scaling." If 1 core takes 10 seconds, 4 cores should take roughly 2.5 seconds.
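
The split, work, and reduce steps can be sketched with a `ProcessPoolExecutor`. The integrand $f(x) = x^2$, the helper names, and the grid size are assumptions chosen for the example.

```python
from concurrent.futures import ProcessPoolExecutor

def f(x):
    return x * x

def partial_area(args):
    """Rectangle-rule sum over rectangles [i_start, i_stop) of the global grid."""
    a, dx, i_start, i_stop = args
    return sum(f(a + i * dx) for i in range(i_start, i_stop)) * dx

def make_chunks(a, b, n, workers):
    """Split the n rectangles into one contiguous index range per worker."""
    dx = (b - a) / n
    bounds = [n * w // workers for w in range(workers + 1)]
    return [(a, dx, bounds[w], bounds[w + 1]) for w in range(workers)]

if __name__ == "__main__":
    chunks = make_chunks(0.0, 1.0, 400_000, workers=4)
    with ProcessPoolExecutor(max_workers=4) as executor:
        total = sum(executor.map(partial_area, chunks))   # the reduction step
    print(total)   # close to 1/3
```

Only four small tuples travel to the workers and four floats come back, so the pickling cost is negligible next to the computation.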

9.4 Performance Measurement

To prove the speedup, we use the time module. It is vital to measure only the calculation, excluding the time it takes to set up the data.

import time

# perf_counter() is a monotonic, high-resolution clock intended for timing
start_time = time.perf_counter()

# ... Parallel Integration Logic ...

end_time = time.perf_counter()
print(f"Execution Time: {end_time - start_time:.4f} seconds")

Critical Metrics:

Speedup ($S$): $S = \frac{T_{serial}}{T_{parallel}}$
Efficiency ($E$): $E = \frac{S}{Number\ of\ Cores}$ (Ideally, this is close to 1.0 or 100%).
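
As a worked example, take the figures used earlier: 10 seconds on 1 core and 2.5 seconds on 4 cores. The function names are just for illustration.

```python
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cores):
    return speedup(t_serial, t_parallel) / cores

s = speedup(10.0, 2.5)
e = efficiency(10.0, 2.5, cores=4)
print(s, e)   # 4.0 1.0
```

Real runs rarely reach an efficiency of 1.0, because process startup and the final reduction add time that the serial version does not pay.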

9.5 Summary Table: Integration Performance

Method	Execution	Expected Speedup
Serial	One core, one by one.	1x (Baseline)
Multithreading	Context switching on one core.	~0.9x (Slower due to overhead)
Multiprocessing	Multiple cores simultaneously.	~3.8x (on a 4-core machine)
NumPy (Vectorized)	Optimized C-backend/SIMD.	Fastest (often 50x - 100x)

To wrap up our foundations, we look at the "low-hanging fruit" of the computing world. An Embarrassingly Parallel problem is one where little to no effort is needed to separate the problem into a number of parallel tasks.

10.1 Definition and Characteristics

A problem is embarrassingly parallel if there is no dependency (or very little) between the sub-tasks.

No Communication: Task A doesn't need to know what Task B is doing to finish its job.
No Shared State: Workers don't need to update a global variable constantly (which avoids those pesky race conditions).
High Scalability: These problems scale almost perfectly; doubling your CPU cores usually halves the execution time.

10.2 Core Examples

Monte Carlo Simulations

These simulations use repeated random sampling to obtain numerical results (like predicting stock market trends or calculating $\pi$). Since every "random trial" is independent, you can run a million trials on one core or divide them across a thousand cores with zero logic changes.
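
A small sketch of the $\pi$ estimate shows the independence. Each chunk gets its own seeded generator, and the chunk sizes and seeds are arbitrary choices for the demo.

```python
import math
import random

def count_hits(seed, trials):
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)        # independent generator per chunk
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

# Four independent chunks; no chunk needs anything from another
chunks = [(seed, 50_000) for seed in range(4)]
total_hits = sum(count_hits(seed, trials) for seed, trials in chunks)
total_trials = sum(trials for _, trials in chunks)

pi_estimate = 4.0 * total_hits / total_trials
print(pi_estimate, "vs", math.pi)
```

The chunks are evaluated one after another here, but the same list could be handed to a process pool's `starmap` without changing `count_hits`.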

Weather Ensemble Models

Meteorologists don't just run one weather forecast; they run dozens of "ensembles" with slightly different starting conditions. Since Forecast A doesn't affect Forecast B, they are computed in parallel across massive supercomputers.

Batch Data Processing

Imagine you have 10,000 high-resolution photos to resize. Resizing photo #1 has nothing to do with photo #100. This is a classic "Map" operation where a worker pool can chew through the pile of files as fast as the disk can provide them.

CNN (Convolutional Neural Network) Workloads

In Deep Learning, a Convolutional layer applies filters to an image. Each "pixel" calculation or each "filter" application can be done independently. This is why GPUs—which have thousands of tiny cores—are so much faster than CPUs for AI tasks.

FFT (Fast Fourier Transform)

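While the classic DFT is $O(N^2)$, the FFT reduces complexity to $O(N \log N)$. In many implementations, the data is split into "even" and "odd" parts whose transforms can be computed recursively in parallel, making it a staple of digital signal processing. Unlike the examples above, the two halves must be recombined at every level, so the FFT is only partly embarrassingly parallel.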
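
The even/odd split is easiest to see in a recursive radix-2 version. This is a teaching sketch that assumes the input length is a power of two. Production code would use an optimized library instead.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])     # independent sub-problem
    odd = fft(x[1::2])      # independent sub-problem
    out = [0j] * n
    for k in range(n // 2):
        # Combine step: needs results from both halves
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dft(x):
    """Direct O(N^2) transform, used here only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, 1.5]
spectrum = fft(signal)
print([round(abs(c), 3) for c in spectrum])
```

The two recursive calls share no data and could run on different workers, while the loop that follows them is the point where their results have to meet.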

Summary of the "Parallel Spectrum"

Type	Communication Needs	Difficulty to Parallelize
Embarrassingly Parallel	None	Very Easy
Coarse-Grained	Occasional	Moderate
Fine-Grained	Constant/Frequent	Hard (High risk of overhead)

While Python’s multiprocessing is great for a single machine, MPI (Message Passing Interface) is the gold standard for high-performance computing (HPC) across clusters of multiple computers. It is the language of supercomputers.

11.1 MPI Fundamentals

Unlike the shared-memory models we’ve discussed, MPI is built entirely on the Distributed-Memory Model.

Independent Processes: Each process has its own address space. There is no shared "global variable." If Process 0 has a variable x, Process 1 cannot see it unless Process 0 explicitly sends it.
The "Rank": Every process in an MPI job is assigned a unique ID called a Rank (starting from 0). You use this rank to tell each process what part of the work it should do.
The "Communicator": This is a group of processes that can talk to each other. The default group containing all your processes is called COMM_WORLD.

11.2 mpi4py: MPI for Python

The mpi4py library provides the Python bindings for the MPI standard. It allows Python scripts to communicate across a network.

Key Concepts

COMM_WORLD: The primary communicator.
Get_size(): Tells you the total number of processes running.
Get_rank(): Tells the current process its unique ID.
Point-to-Point Communication: Using send() and recv() to move data between specific ranks.
Collective Communication: Using bcast() (one-to-all) or reduce() (all-to-one) to synchronize data.

11.3 Running MPI Programs

Typing python script.py starts only a single process, so the script sees one rank and has nothing to communicate with. To run several ranks you use a process launcher, typically mpirun or mpiexec, which starts multiple instances across your CPU cores or network nodes.

The Command:
mpirun -n 4 python3 my_script.py
(This launches 4 independent instances of your script.)

Example: The "Who Am I?" Pattern

from mpi4py import MPI

# Get the communicator
comm = MPI.COMM_WORLD

# Get the size (total processes) and rank (my ID)
size = comm.Get_size()
rank = comm.Get_rank()

print(f"Hello! I am process {rank} out of {size} total processes.")

if rank == 0 and size > 1:
    # Guard: rank 1 only exists when at least two processes were launched
    data = {'key': 'value'}
    comm.send(data, dest=1)
    print("Process 0 sent data to Process 1.")
elif rank == 1:
    data = comm.recv(source=0)
    print(f"Process 1 received: {data}")

Summary: MPI vs. Multiprocessing

Feature	`multiprocessing`	`mpi4py` (MPI)
Scope	Single Machine (Multi-core)	Multi-Node (Clusters/Supercomputers)
Memory	Shared-memory constructs available	Strictly Distributed (Message Passing)
Launch	Standard Python interpreter	`mpirun` / `mpiexec`
Scaling	Limited by one motherboard	Scales to thousands of CPUs

This completes the technical foundation of Parallel & Concurrent Computing! We've traveled from CPU core stagnation all the way to distributed supercomputing.

In MPI, communication is how independent processes coordinate to solve a single problem. There are two primary ways processes "talk": one-to-one (Point-to-Point) or all-together (Collective).

12.1 Point-to-Point Communication

This is the most basic form of messaging, involving exactly two processes: a sender and a receiver.

send(obj, dest): The source process sends a Python object to a specific rank.
recv(source): The destination process waits to receive an object from a specific rank.
Blocking Communication: By default, these operations are "blocking." The sender waits until the message is safely in the transmission buffer, and the receiver waits (sleeps) until the message actually arrives. If a process calls recv() and no matching send() is ever issued, the program hangs forever.

12.2 Collective Communication

Collective operations involve all processes in a communicator (e.g., COMM_WORLD). These are highly optimized and usually much faster than writing multiple point-to-point loops.

Operation	Description	Analogy
Broadcast (`bcast`)	One process sends the same data to everyone else.	A teacher giving a handout to the whole class.
Scatter (`scatter`)	One process takes a list and gives one piece to each process.	Dealing a deck of cards to players.
Gather (`gather`)	One process collects a piece of data from everyone else into a list.	A teacher collecting homework from every student.
Reduce (`reduce`)	Everyone sends data to one process, which "crunches" it (e.g., Sum, Max).	Everyone votes, and the teller announces only the total count.
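
The data movement in the table can be modelled in a single process. The sketch below is not mpi4py: each "rank" is just a position in a Python list, and the function names are chosen to mirror the operations for illustration only.

```python
import operator
from functools import reduce as fold


def bcast(value, size):
    """Every rank ends up holding the root's value."""
    return [value for _ in range(size)]


def scatter(chunks, size):
    """Rank i receives chunks[i]; the root supplies exactly one chunk per rank."""
    if len(chunks) != size:
        raise ValueError("scatter needs one chunk per rank")
    return list(chunks)


def gather(per_rank_values):
    """The root receives a list ordered by rank."""
    return list(per_rank_values)


def reduce_to_root(per_rank_values, op=operator.add):
    """The root receives one combined value."""
    return fold(op, per_rank_values)


size = 4
work = scatter([[1, 2], [3, 4], [5, 6], [7, 8]], size)  # one chunk per rank
partial = [sum(chunk) for chunk in work]                 # each rank's local result
collected = gather(partial)                              # [3, 7, 11, 15] at the root
total = reduce_to_root(partial)                          # 36 at the root
```

The scatter, local compute, reduce sequence shown here is the usual shape of a collective-based program: split the input, work independently, then combine.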

12.3 Performance: The "Case" Matters

In mpi4py, there is a massive performance difference between lowercase methods (e.g., send) and uppercase methods (e.g., Send).

Lowercase Methods (`send`, `recv`, `bcast`)

Mechanism: Uses pickle to serialize Python objects.
Flexibility: Can send almost any Python object (dicts, lists, custom classes).
Performance: Slower. The overhead of pickling and unpickling large amounts of data can create a bottleneck.

Uppercase Methods (`Send`, `Recv`, `Bcast`)

Mechanism: Uses Buffer-based communication. It points directly to a contiguous block of memory.
Flexibility: Requires data to be in a buffer-like format, typically NumPy arrays.
Performance: Extremely fast. Throughput approaches C speed because the raw memory is communicated directly, with no pickling overhead.

Rule of Thumb: If you are moving NumPy arrays for math, always use the uppercase methods (e.g., comm.Send(my_array, dest=1)).
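
The difference between the two mechanisms can be seen without MPI at all. In this sketch the standard-library `pickle` stands in for the lowercase path and a `memoryview` over an `array` stands in for the buffer-based path; mpi4py itself is not used, so treat it as an illustration of the idea rather than of the library's internals.

```python
import pickle
from array import array

values = array("d", range(100_000))  # contiguous block of C doubles

# Lowercase-style path: the object is serialized into a brand-new bytes object
# and rebuilt on the other side.
payload = pickle.dumps(values)
restored = pickle.loads(payload)

# Uppercase-style path: expose the existing memory without copying it.
view = memoryview(values)

print(f"pickled payload: {len(payload)} bytes")
print(f"raw buffer:      {view.nbytes} bytes")

# Writing through the view changes the original array, showing it is the same memory.
view[0] = 42.0
```

Pickling always produces a copy that must be built and then decoded, while the buffer path hands over the memory that already exists. That extra work is where the lowercase methods lose time on large arrays.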

12.4 Summary: When to use what?

Use Point-to-Point for complex logic where specific workers need unique instructions.
Use Collective for mathematical synchronization (e.g., summing partial results of an integral).
Use Uppercase methods whenever you are doing heavy data lifting with NumPy.

Building a Complete DevOps Pipeline: Flask App with Docker, Jenkins, GitHub Actions, Prometheus, and Grafana

Abhiraj Adhikary — Sat, 01 Nov 2025 16:09:51 +0000

In today's fast-paced software development world, DevOps practices are essential for streamlining workflows, ensuring reliable deployments, and monitoring applications effectively. This tutorial walks you through a hands-on DevOps project using Flask as the web framework, Pytest and Playwright for testing, Docker for containerization, GitHub Actions and Jenkins for CI/CD, and Prometheus with Grafana for monitoring. Whether you're a beginner or an experienced engineer, this guide will help you build, test, deploy, and monitor a simple Flask app.

By the end, you'll have a production-ready setup that demonstrates key DevOps principles like automation, containerization, and observability. Let's dive in!

Project Overview: What We're Building

This DevOps project creates a basic Flask web application that serves a simple HTML template. We integrate testing, containerization, CI/CD pipelines, and monitoring to create a robust ecosystem. Key tools include:

Flask: For the backend web app.
Pytest & Playwright: For unit and end-to-end (E2E) testing.
Docker: For building and orchestrating containers.
GitHub Actions & Jenkins: For automated CI/CD.
Prometheus & Grafana: For metrics collection and visualization.

The full source code is available on GitHub.

Step 1: Setting Up the Flask Application

Start by creating a simple Flask app. Install dependencies like flask and prometheus_flask_exporter for metrics exposure.

Here's the core app.py code:

from flask import Flask, render_template
from prometheus_flask_exporter import PrometheusMetrics

app = Flask(__name__)
metrics = PrometheusMetrics(app)

@app.route('/')
def home():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

This app renders an index.html from the templates folder and exposes metrics at /metrics. The static folder holds CSS styles for a polished UI. Run it locally with python app.py and access at http://localhost:5000.

Step 2: Implementing Tests with Pytest and Playwright

Testing is crucial in DevOps. We use Pytest for backend unit tests and Playwright for E2E browser automation.

Install libraries: pip install pytest playwright, then run playwright install to download the browser binaries.

Backend Tests (tests/test_app.py): Verifies routes and responses.

Example test:

  from app import app

  def test_home():
      client = app.test_client()
      response = client.get('/')
      assert response.status_code == 200

E2E Tests (tests/test_e2e.py): Simulates browser interactions.

Run the app first, then pytest. All 4 tests (2 backend, 2 E2E) should pass. Note: Keep the app running on localhost:5000 for Playwright to test the UI.

This ensures code quality before deployment.

Step 3: Containerizing with Docker

Docker makes deployments consistent. We use a multistage Dockerfile for efficiency:

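# Builder stage: install dependencies into an isolated prefix
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: copy only the installed packages and the app code.
# Copying /app alone would leave the pip-installed packages behind in the builder.
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]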

Build and push: docker build -t yourusername/flask-app:latest . and docker push yourusername/flask-app:latest.

For orchestration, docker-compose.yml spins up the full stack:

version: '3'
services:
  app:
    build: .
    ports:
      - "5000:5000"
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
  jenkins:
    image: jenkins/jenkins:lts
    ports:
      - "8080:8080"
      - "50000:50000"
    volumes:
      - jenkins_home:/var/jenkins_home
volumes:
  jenkins_home:

Run with docker-compose up. Access services at:

Flask: http://localhost:5000
Prometheus: http://localhost:9090
Grafana: http://localhost:3000
Jenkins: http://localhost:8080

The prometheus.yml configures scraping:

scrape_configs:
  - job_name: 'flask'
    scrape_interval: 15s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['app:5000']

Step 4: CI/CD with GitHub Actions and Jenkins

Automation is the heart of DevOps.

GitHub Actions (.github/workflows/ci.yml): Triggers on push/PR to main. Tests, builds, and pushes Docker image if tests pass.

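  name: CI
  on:
    push:
      branches: [main]
    pull_request:
      branches: [main]
  jobs:
    build:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - name: Set up Python
          uses: actions/setup-python@v5
          with: { python-version: '3.12' }
        - run: pip install -r requirements.txt
        - run: pytest
        # Assumption: Docker Hub credentials are stored as repository secrets
        # named DOCKERHUB_USERNAME and DOCKERHUB_TOKEN (hypothetical names).
        - name: Log in to Docker Hub
          if: github.event_name == 'push'
          uses: docker/login-action@v3
          with:
            username: ${{ secrets.DOCKERHUB_USERNAME }}
            password: ${{ secrets.DOCKERHUB_TOKEN }}
        - name: Build and Push Docker
          if: github.event_name == 'push'
          uses: docker/build-push-action@v6
          with:
            push: true
            tags: yourusername/flask-app:latest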

Jenkins (Jenkinsfile): Pipeline for build, test, and deploy.

  pipeline {
      agent { docker { image 'python:3.12' } }
      stages {
          stage('Build') { steps { sh 'pip install -r requirements.txt' } }
          stage('Test') { steps { sh 'pytest' } }
          stage('Deploy') {
              steps {
                  withCredentials([usernamePassword(credentialsId: 'dockerhub', usernameVariable: 'USER', passwordVariable: 'PASS')]) {
                      sh 'docker build -t $USER/flask-app:latest .'
                      sh 'echo $PASS | docker login -u $USER --password-stdin'
                      sh 'docker push $USER/flask-app:latest'
                  }
              }
          }
      }
  }

Set up Jenkins: Install the Docker, Docker Pipeline, and Git plugins, restart via http://localhost:8080/restart, then create a pipeline that points at your GitHub repo.

Step 5: Monitoring with Prometheus and Grafana

Expose metrics via prometheus_flask_exporter. In Grafana, add Prometheus as a data source (http://prometheus:9090), then create dashboards for app metrics such as request counts and response times.

This setup provides real-time insights.

Conclusion: Why This DevOps Project Matters

This project showcases a full DevOps lifecycle: from coding and testing to deployment and monitoring. It's scalable, automated, and observable—perfect for modern apps. Fork the repo, experiment, and level up your skills!

For more repos, follow me on GitHub. Share your thoughts in the comments!

Building a YouTube Video Search App with Flask, Whisper, and RAG

Abhiraj Adhikary — Thu, 09 Oct 2025 14:26:35 +0000

Ever wanted to search for specific moments in a YouTube video by just typing a keyword? Imagine pinpointing that exact timestamp where someone explains "machine learning" in a 5-minute tutorial—without scrubbing through the whole thing. I built a Flask-based web app called video-rag-search that does exactly this, using Retrieval-Augmented Generation (RAG), OpenAI's Whisper, and a sprinkle of AI magic. In this post, I'll walk you through what it does, how it works, and why it's a fun project for developers to explore.

What Does It Do?

The video-rag-search app lets you:

Paste a YouTube video link (up to 5 minutes long).
Automatically download and transcribe the audio using OpenAI's Whisper.
Generate 5 key topics from the transcript using Groq's LLM.
Search for moments in the video by selecting a topic, with results linked to exact timestamps.
Cache results for speed and store data in a MariaDB database for persistence.

Think of it as a smart search engine for YouTube videos, powered by semantic search and AI transcription. Whether you're a student skimming lectures or a developer digging through tech talks, this tool saves time.

Why Build This?

I wanted to combine my love for Flask, AI, and video content into a practical tool. YouTube is a treasure trove of knowledge, but finding specific moments can be a pain. By leveraging RAG (Retrieval-Augmented Generation), we can make video content searchable in a way that's intuitive and developer-friendly. Plus, it's a great excuse to play with cutting-edge AI libraries like Whisper and SentenceTransformers!

Tech Stack

Here's the lineup of tools and libraries powering the app:

Flask: Lightweight Python web framework for the backend and UI.
OpenAI Whisper: Transcribes YouTube audio to text with timestamps.
Groq LLM: Generates meaningful keywords from transcripts.
SentenceTransformers: Creates semantic embeddings for search.
MariaDB: Stores transcripts and embeddings for persistence.
yt-dlp: Downloads YouTube audio efficiently.
Flask-Caching: Speeds up repeated searches.
pydub: Handles audio file processing.
NumPy: Computes similarity scores for search.

You'll also need a Groq API key (free tier available) and a MariaDB instance (local or cloud).

How It Works

Let's break down the app's workflow, from YouTube link to search results.

1. Input a YouTube Link

The user submits a YouTube URL via a simple form (index.html). The app validates it using a regex to ensure it's a proper YouTube link (e.g., youtube.com/watch?v=... or youtu.be/...). Whitespace and quotes are stripped for cleanliness.
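
A validation step like this can be sketched with the standard `re` module. The pattern below is an assumption for illustration, not the regex the app actually uses: it accepts the two link shapes mentioned above and expects an 11-character video ID. `extract_video_id` is a hypothetical helper name.

```python
import re

# Hypothetical pattern: watch?v= and youtu.be links with an 11-character ID.
YOUTUBE_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?"
    r"(?:youtube\.com/watch\?v=|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
    r"(?:[&?#].*)?$"
)


def extract_video_id(raw):
    """Return the video ID, or None if the input is not a recognised link."""
    # Strip surrounding whitespace and stray quotes before matching.
    cleaned = raw.strip().strip("'\"").strip()
    match = YOUTUBE_RE.match(cleaned)
    return match.group(1) if match else None


print(extract_video_id('  "https://www.youtube.com/watch?v=dQw4w9WgXcQ"  '))
print(extract_video_id("https://example.com/watch?v=dQw4w9WgXcQ"))
```

Returning the ID rather than a boolean is convenient here because the same value is later used as the database key and the cache key.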

2. Download Audio

Using yt-dlp, the app downloads the audio as an MP3 file. It checks the video's duration (via pydub) and enforces a 5-minute limit to keep processing manageable. If the video's too long, you get a friendly error message.

3. Transcribe with Whisper

OpenAI's Whisper (medium model) transcribes the audio, producing segments with text and timestamps (e.g., [10.2 - 12.5] "Welcome to AI basics"). Empty or invalid segments are filtered out to ensure quality.

4. Store in MariaDB

Each segment is saved in a MariaDB table (video_data) with:

Video ID (from the YouTube URL).
Segment text, start/end times, and a timestamped YouTube link.
Semantic embeddings (as JSON, generated later).

The table is created dynamically if it doesn't exist, with defensive migrations to handle schema changes.

5. Generate Keywords with Groq

The transcript is sent to Groq's LLM (model: openai/gpt-oss-20b) with a prompt to extract 5 relevant keywords. For example, a machine learning tutorial might yield:

Neural Networks
Backpropagation
Overfitting
Gradient Descent
Activation Functions

The app parses the LLM's response, prioritizing bold (**...**), numbered, or bulleted lists, and cleans up markdown artifacts.

6. Semantic Search with Embeddings

To enable smart searching, the app uses SentenceTransformers (all-MiniLM-L6-v2) to create embeddings for each transcript segment. These are stored as JSON in MariaDB. When a user selects a keyword (e.g., "Neural Networks"), the app:

Encodes the keyword into an embedding.
Computes cosine similarity against stored segment embeddings.
Returns the best-matching segment (if similarity ≥ 0.5) with its timestamp and a clickable link.
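
The three steps above can be sketched in plain Python. The vectors here are tiny made-up stand-ins for real SentenceTransformers embeddings, and `cosine` and `best_segment` are hypothetical names; the app itself uses NumPy for the similarity computation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_segment(query_vec, segments, threshold=0.5):
    """segments: list of (text, start_seconds, vector). Return the best match or None."""
    if not segments:
        return None
    scored = [(cosine(query_vec, vec), text, start) for text, start, vec in segments]
    score, text, start = max(scored, key=lambda item: item[0])
    # Only report a hit when it clears the similarity threshold.
    return (text, start, score) if score >= threshold else None


# Toy 3-dimensional "embeddings", invented for the example.
segments = [
    ("Welcome to AI basics", 0.0, [0.0, 1.0, 0.0]),
    ("Neural networks stack layers of units", 10.2, [0.9, 0.1, 0.0]),
]
hit = best_segment([1.0, 0.0, 0.0], segments)
miss = best_segment([0.0, 0.0, 1.0], segments)
```

The threshold is what lets the app say "no good match" instead of always returning the least-bad segment.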

7. Caching for Speed

Results are cached using Flask-Caching with the video ID as the key. If the same video is searched again within an hour, the app skips processing and loads from cache.
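
The one-hour expiry behaviour can be illustrated with a tiny in-memory cache. This is a stand-in written for the example, not Flask-Caching's implementation; the `TTLCache` class and its injectable `clock` parameter are assumptions that make the expiry easy to demonstrate.

```python
import time


class TTLCache:
    """Minimal in-memory cache with per-entry expiry."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


# A fake clock lets the example step through time without sleeping.
now = [0.0]
cache = TTLCache(ttl_seconds=3600, clock=lambda: now[0])
cache.set("dQw4w9WgXcQ", {"keywords": ["Neural Networks"]})
fresh = cache.get("dQw4w9WgXcQ")
now[0] = 3600.0
stale = cache.get("dQw4w9WgXcQ")
```

Keying on the video ID means two differently formatted links to the same video share one cache entry.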

8. User Interface

The UI (built with Jinja2 templates) guides users through three steps:

Input Link: Enter a YouTube URL.
Select Keyword: Choose from 5 auto-generated keywords.
View Results: See the matching timestamp, transcript snippet, and a link to jump to that moment in the video.

Errors (e.g., invalid URL, failed transcription) are logged and displayed as user-friendly messages.

Code Highlights

Here's a peek at some key functions (simplified for brevity):

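import os
import re
import subprocess

from flask import session
from sentence_transformers import SentenceTransformer

# `app` is the Flask application object, assumed to be created elsewhere.

def download_audio(youtube_link):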
    args = ["yt-dlp", "-x", "--audio-format", "mp3", "-o", "video.mp3", youtube_link]
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Download failed: {result.stderr}")
    if not os.path.exists("video.mp3"):
        raise FileNotFoundError("Audio file not found.")

def parse_keywords(text: str) -> list:
    bold = re.findall(r"\*\*(.+?)\*\*", text)
    candidates = bold if bold else re.findall(r"^\s*\d+\.\s*(.+)", text, re.M)
    return [re.sub(r"[\s\.,;:!]+$", "", item.strip()) for item in candidates][:5]

@app.route('/select_keyword/<int:index>', methods=['GET'])
def select_keyword(index):
    keywords = session.get('keywords', [])
    query = keywords[index - 1].lower()
    embedder = SentenceTransformer('all-MiniLM-L6-v2')
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    # ... (fetch segments, compute cosine similarity, return best match)

Getting Started

Want to try it yourself? Here's how to set it up:

Clone the Repo:

   git clone <your-repo>
   cd video-rag-search

Install Dependencies:

   pip install flask openai-whisper sentence-transformers groq pydub flask-caching mariadb numpy yt-dlp

Set Up Environment: Create a .env file:

   GROQ_API_KEY=your_groq_key
   DB_USER=root
   DB_PASSWORD=RootPass123!
   DB_HOST=localhost
   DB_PORT=3306
   DB_NAME=youtube_search

Set Up MariaDB:
Install MariaDB locally or use a cloud provider. Create a youtube_search database.
Run the App:

   python app.py

Visit http://localhost:5000 and paste a YouTube link (try a short tech tutorial!).

Challenges and Lessons

Whisper Load Time: The medium model is heavy. Preloading or using a smaller model (tiny) could speed things up, but I prioritized accuracy.
Embedding Storage: Storing embeddings as JSON in MariaDB works but isn't ideal for scale. A dedicated vector index or database such as FAISS or Pinecone would be better (planned for v2!).
LLM Parsing: Groq's output varies, so robust parsing (e.g., handling markdown) was key to consistent keywords.
Caching: Flask-Caching with a simple in-memory store is great for dev but needs Redis for production.

What's Next?

I'm excited to extend this project with:

Quiz Generation: Turn transcripts into MCQ quizzes for learning.
User Accounts: Add login/register to track search history.
Cloud DB: Move to Neon Postgres or Render for scalability.
Audio Readout: Use text-to-speech for accessibility.
Leaderboard: Rank users by search activity or quiz scores.

Try It Out!

The video-rag-search app is a fun blend of AI, web dev, and data science. It’s open-source, so fork it, tweak it, or add your own spin! Got ideas for features or hit a snag? Drop a comment on Dev.to or open an issue on the repo.

Happy coding, and let’s make YouTube videos searchable! This was built during the MariaDB hackathon by @anikchand461 and me.

GitHub Profile Summarizer with n8n and Bright Data

Abhiraj Adhikary — Sat, 30 Aug 2025 11:20:20 +0000

This is a submission for the AI Agents Challenge powered by n8n and Bright Data

Building a Chat with GitHub with n8n and Bright Data

What I Built

I created an AI-powered GitHub Profile Summarizer using n8n and Bright Data. This agent takes a GitHub username as input, scrapes the user's public profile, and generates a concise HTML summary of their bio, top repositories, and contributions. It leverages Bright Data's web scraping capabilities and Mistral AI's language model to deliver a polished, human-readable output. The workflow handles invalid usernames gracefully, ensuring a robust user experience.

Demo

Watch the demo video showcasing the workflow generating a summary for a sample GitHub profile.

n8n Workflow

The workflow JSON is available in this GitHub Gist.

Technical Implementation

The agent is built using an n8n workflow with the following components:

Webhook: Receives a GET request with a username query parameter.
Set Username: Extracts and sets the GitHub username, defaulting to abhirajadhikary06 if none is provided.
Validate Username: Uses regex (^[a-zA-Z0-9][a-zA-Z0-9-]{0,37}[a-zA-Z0-9]$) to ensure the username is valid.
Bright Data Scraper: Scrapes the GitHub profile using Bright Data's verified node.
Mistral AI: Uses the mistral-large-latest model with a prompt to summarize the scraped data into a 200-word Markdown summary.
Memory: Maintains a context window of 50 interactions for conversational continuity.
AI Agent: Configured as a conversational agent with a prefix: "You are a helpful assistant summarizing GitHub profiles."
Markdown to HTML: Converts the AI-generated Markdown summary to HTML.
Chat Trigger: Supports chat-based input for interactive use cases.
Error Handling: Returns a 400 status code with an error message for invalid usernames.
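
The validation step is easy to try outside n8n. The sketch below applies the same regex from the Validate Username node using Python's `re` module; the function name `is_valid_username` is made up for the example.

```python
import re

# Same pattern as the workflow's Validate Username node.
USERNAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9-]{0,37}[a-zA-Z0-9]$")


def is_valid_username(name):
    """True if the whole string matches the workflow's username pattern."""
    return USERNAME_RE.fullmatch(name) is not None


for candidate in ["abhirajadhikary06", "-leading", "trailing-", "a", "a" * 40]:
    print(f"{candidate[:20]!r}: {is_valid_username(candidate)}")
```

Two properties of the pattern as written are worth knowing: it needs at least two characters, so a one-character name is rejected, and it accepts consecutive hyphens in the middle of a name. It caps the length at 39 characters.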

Bright Data Verified Node

The Bright Data verified node is central to the workflow, scraping the GitHub profile page (https://github.com/{{ $json.githubUsername }}) using the dataset ID gd_lyrexgxc24b3d4imjt. It reliably extracts structured data (bio, repositories, contributions) without triggering GitHub's rate limits or CAPTCHAs, thanks to Bright Data's proxy management. The scraped data is passed to the AI Agent for summarization.

Journey

Building this agent was a rewarding challenge. Integrating Bright Data's scraper required fine-tuning the dataset configuration to extract relevant profile data consistently. The Mistral AI model needed a precise prompt to produce concise summaries, which I iterated on to balance detail and brevity. Handling invalid usernames robustly was another hurdle, solved by tightening the regex validation. Learning to chain n8n nodes with AI and web scraping tools deepened my understanding of automation and data processing. The biggest lesson was the power of combining reliable data extraction (Bright Data) with intelligent processing (Mistral AI) in a seamless n8n workflow.

Building EventStack – A Lightweight, Real-Time Doodle & Luma Clone Using Tornado

Abhiraj Adhikary — Thu, 19 Jun 2025 20:37:21 +0000

Have you ever struggled to coordinate a meeting time with a group? Tools like Doodle make scheduling easier — but I wanted to create something simpler, open-source, and custom-built with a modern stack. That’s how EventStack was born.

EventStack is a lightweight event scheduling app that allows users to propose time slots, vote on availability, and finalize meetings — all with a slick frontend and real-time updates.

Why I Built It

I wanted to explore Tornado, a powerful Python framework known for handling asynchronous and real-time web apps. Unlike Flask or Django, Tornado gives fine-grained control over sockets, routing, and performance. I also wanted to integrate:

GitHub OAuth for easy login
PostgreSQL as a robust backend
A beautiful frontend using Tailwind CSS
Potential for WebSocket-based real-time voting

This project was a perfect way to combine learning with utility.

Tech Stack

Backend: Tornado – asynchronous Python framework
Frontend: Tailwind CSS + custom HTML templates
Auth: GitHub OAuth2 (manual token exchange using requests)
Database: PostgreSQL (used NeonDB Postgres during initial dev, later moved to local)
Hosting: Runs locally and deployable to platforms like Railway, etc.

Authentication with GitHub

OAuth integration was handled manually — bypassing libraries like Authlib — to better understand the token exchange process. Users log in via GitHub, and their profile data is stored securely in the database.

# Get GitHub token manually using requests
response = requests.post(
    "https://github.com/login/oauth/access_token",
    data={...}, headers={"Accept": "application/json"}
)

Features

✅ Secure GitHub login
✅ Create events with multiple time slots
✅ Vote for available slots
✅ Real-time voting updates
✅ Auto-finalization and notifications (planned)

Frontend Preview

A clean dashboard for users to view and manage events
Interactive voting interface
Markdown-ready comment section (coming)

All templates are rendered server-side with Jinja2 and styled using Tailwind for responsiveness and polish.

Lessons Learned

Tornado requires more boilerplate than Flask, but it pays off for async control.
GitHub OAuth is surprisingly easy when broken down.
NeonDB's PostgreSQL is handy for prototyping — but local or cloud-managed Postgres is better for production.
Real-time updates will require integrating tornado.websocket.WebSocketHandler.

What's Next?

Email or GitHub notifications on finalization

Final Thoughts

EventStack is more than just a clone — it’s a showcase of how you can build something powerful, fast, and modern with minimal libraries. If you’re looking to build real-time apps in Python, give Tornado a try.

Want to contribute? The GitHub repo will be public soon. Drop a ⭐️ if you like the project!

My Contribution to Kharagpur Winter of Code 2024 (KWOC)

Abhiraj Adhikary — Sun, 19 Jan 2025 17:38:53 +0000

As Kharagpur Winter of Code (KWOC) 2024 draws to a close, I am thrilled to share my journey, contributions, and learnings. KWOC provided me with a platform to contribute to open-source projects, hone my technical skills, and collaborate with a vibrant community of developers. Here’s an overview of the work I accomplished during this enriching experience:

Projects I Worked On

1. Beautiify

Beautiify is a dynamic project focused on enhancing web design components. I contributed to multiple features:

Infinite Scroll Emoji Background
- PR Link: #1391
- Description: Implemented a visually appealing background with infinitely scrolling emojis. Users can add custom components to it via HTML.
Responsive Feedback Form-2
- PR Link: #1392
- Description: Made the feedback form responsive across devices. Enhanced the design with a gradient background, green borders for placeholders, and star animations.
Swag Shipment Form
- PR Link: #1397
- Description: Designed a comprehensive swag shipment form with all essential placeholders.
Error Pages Category and Component
- PR Link: #1405
- Description: Introduced a category for error pages and added a reusable error component for contributors to build upon.
Spooky Themed Hero Component Responsiveness
- PR Link: #1439
- Description: Made the spooky-themed hero component fully responsive, ensuring images adapt seamlessly across devices.

2. Eventica

Eventica is a platform designed for managing events efficiently. My contribution included:

Home Page Design Enhancement
- PR Link: #45
- Description: Revamped the home page with changing images, adding a vibrant and engaging touch to the design.

3. MindDrive

MindDrive is an innovative project aimed at improving user experiences. My contribution was:

Arrow Visibility in Dark Mode
- PR Link: #65
- Description: Ensured arrows are clearly visible in dark mode, enhancing accessibility and user experience.

Summary of My Work and Learnings

During KWOC 2024, I explored various aspects of front-end development, including:

Responsive Design: I learned the importance of making components adaptable to different devices and screen sizes.
Enhanced Aesthetics: Implementing gradients, animations, and dynamic backgrounds sharpened my design sensibilities.
Collaboration: Working with mentors and fellow contributors taught me effective communication and the value of feedback.
Version Control: Gained deeper insights into Git and GitHub workflows, managing multiple branches, and resolving conflicts.

Conclusion

KWOC 2024 has been an incredible learning journey. Each project challenged me to push my boundaries and equipped me with skills that I will carry forward in my development journey. I am grateful for the opportunity to contribute to impactful projects and collaborate with talented individuals.

To future contributors: Open source is not just about code—it's about community, learning, and growth. Dive in, explore, and enjoy the process!

Leveraging DEM using Daytona

Abhiraj Adhikary — Sat, 28 Dec 2024 21:29:42 +0000

In this blog, we'll dive 🌊🤿 into building a Streamlit-based dashboard for analyzing Spotify User Sentiment using Airbyte for data extraction, Motherduck (DuckDB) for storage and querying, and Daytona for streamlined development environments. This project explores how these technologies integrate with Streamlit to create an interactive and insightful data analysis application.

📁 Folder Structure Overview

SPOTIFY-REVIEWS-ANALYSIS
├── .devcontainer
│   ├── devcontainer.json
├── .streamlit
│   ├── config.toml
├── assets
│   ├── main.png
├── src
│   ├── config
│   │   ├── __init__.py
│   │   ├── config.py
│   ├── utils
│   │   ├── __init__.py
│   │   ├── database.py
│   ├── app.py
├── .env
├── venv
├── .gitignore
├── LICENSE.md
├── README.md
├── requirements.txt

.devcontainer/devcontainer.json: Configures development environment.
.streamlit/config.toml: Streamlit's UI style and configuration.
assets: Stores static assets like images.
src/config/config.py: Handles environment variables.
src/utils/database.py: Queries data from Motherduck.
src/app.py: Streamlit dashboard and logic.
.env: Stores environment variables securely.

👉 Tips On Folder Structure

SPOTIFY-REVIEWS-ANALYSIS: This is the outer folder of the repository.
src: This folder contains config and utils for project logic.

☀️ Daytona Integration

Daytona is an open-source Development Environment Manager (DEM) designed to simplify and streamline the process of setting up development environments.

🛠️ Why Daytona?

Consistency: Ensures uniform development environments across all team members.
Scalability: Manages multiple environments seamlessly.
Security: Isolates and secures environments.
Efficiency: Reduces overhead during setup and switching between environments.

📚 Daytona Setup

Installation: Follow the official installation guide.
Configuration: Create a daytona.yaml file with dependencies and environment configurations.
Environment Initialization: Run Daytona commands to set up your development environment.

environment:
  name: spotify-reviews-analysis
  dependencies:
    - python
    - pip
  scripts:
    start: "streamlit run src/app.py"

Daytona ensures every developer has an identical and functional environment for running the project seamlessly.

🎏 Streamlit Setup

Streamlit is an open-source Python library that enables developers to create interactive web apps for data science and machine learning projects.

📜 Code Snippet: Streamlit Core Structure

import streamlit as st
import plotly.express as px
from utils.database import get_reviews_for_sentiment

st.set_page_config(page_title="Spotify Analysis", page_icon="🗳️", layout="wide")

# Title
st.markdown("## 🗳️ Spotify Sentiment Analysis")

# Sidebar
sentiment_type = st.sidebar.selectbox("Sentiment Analysis Type", ["Polarity", "Subjectivity"])

# Fetch and Display Data
reviews_df = get_reviews_for_sentiment()
st.dataframe(reviews_df)

When you run app.py with streamlit run src/app.py, the dashboard launches at http://localhost:8501.

📊 Core Logic of Sentiment Analysis

Sentiment analysis is powered by TextBlob to determine the polarity (positive/negative sentiment) and subjectivity (factual/opinionated content) of reviews.

🧠 Sentiment Analysis Function

from textblob import TextBlob

def get_sentiment(text):
    blob = TextBlob(str(text))
    return blob.sentiment.polarity if sentiment_type == "Polarity" else blob.sentiment.subjectivity

📈 Visualization Example

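# Score each review first, then plot the distribution of the selected metric
reviews_df['sentiment'] = reviews_df['content'].apply(get_sentiment)
fig = px.histogram(reviews_df, x='sentiment', title='Sentiment Distribution')
st.plotly_chart(fig)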

🦆 Database Integration with Motherduck

🔗 database.py

import duckdb
from config.config import MOTHERDUCK_TOKEN

def get_connection():
    return duckdb.connect(f"md:?token={MOTHERDUCK_TOKEN}")

def get_reviews_for_sentiment():
    conn = get_connection()
    query = """
    SELECT content, score FROM spotify_reviews WHERE content IS NOT NULL
    """
    return conn.execute(query).fetch_df()

This code fetches Spotify review data using the MOTHERDUCK_TOKEN that config.py loads from .env, which keeps the token out of the source code.

🗂️ config.py

import os
from dotenv import load_dotenv

load_dotenv()
MOTHERDUCK_TOKEN = os.getenv("MOTHERDUCK_TOKEN")

🔄 Connection Between `app.py` and `database.py`

app.py imports get_reviews_for_sentiment from database.py, so the dashboard gets its data through a single function call.

🎯 Conclusion

We successfully built a Spotify Reviews Sentiment Analysis Dashboard using Airbyte, Motherduck, Streamlit, and Daytona. This project demonstrates the power of consistent development environments, robust data storage, and insightful visualization.

👨‍💻 Check out the complete code on GitHub.
📺 Live PROJECT https://spotify-sentiment-analysis.streamlit.app

Sentiment Analysis #Happy Coding! #Daytona 🚀🦆

📊 Dropbox User Sentiment Analysis using Airbyte 🪼 and Motherduck 🦆

Abhiraj Adhikary — Sat, 28 Dec 2024 11:58:59 +0000

In this blog, we'll dive 🌊🤿 into building a Streamlit-based dashboard for analyzing Dropbox User Sentiment using Airbyte for data extraction and Motherduck (DuckDB) for storage and querying. This post continues from our previous discussion in "Leveraging Airbyte 🪼 and Motherduck 🦆 for Sentiment Analysis" and explores how these technologies integrate with Streamlit to create an interactive and insightful data analysis application.

📁 Folder Structure Overview

DROPBOX-REVIEWS-ANALYSIS
├── .devcontainer
│   ├── devcontainer.json
├── .streamlit
│   ├── config.toml
├── assets
│   ├── main.png
├── dropbox-reviews-analytics
│   ├── src
│   │   ├── config
│   │   │   ├── __init__.py
│   │   │   ├── config.py
│   │   ├── utils
│   │   │   ├── __init__.py
│   │   │   ├── database.py
│   │   ├── app.py
├── .env
├── venv
├── .gitignore
├── LICENSE.md
├── README.md
├── requirements.txt

.devcontainer/devcontainer.json: Configures development environment.
.streamlit/config.toml: Streamlit's UI style and configuration.
assets: Stores static assets like images.
src/config/config.py: Handles environment variables.
src/utils/database.py: Queries data from Motherduck.
src/app.py: Streamlit dashboard and logic.
.env: Stores environment variables securely.

👉 Tips On Folder Structure

DROPBOX-REVIEWS-ANALYSIS: This is the root folder of the repository on GitHub. If you are building the project yourself, you don't need to create it.
dropbox-reviews-analytics: This is the main project folder. Create src inside it, then config and utils inside src.

🎏 Streamlit Setup

Streamlit is an open-source Python library that enables developers to create interactive web apps for data science and machine learning projects.

📜 Code Snippet: Streamlit Core Structure

import streamlit as st
import plotly.express as px
from utils.database import get_reviews_for_sentiment

st.set_page_config(page_title="Dropbox Analysis", page_icon="🗳️", layout="wide")

# Title
st.markdown("## 🗳️ Dropbox Sentiment Analysis")

# Sidebar
sentiment_type = st.sidebar.selectbox("Sentiment Analysis Type", ["Polarity", "Subjectivity"])

# Fetch and Display Data
reviews_df = get_reviews_for_sentiment()
st.dataframe(reviews_df)

When you run app.py with streamlit run src/app.py, the dashboard launches at http://localhost:8501 by default.

📊 Core Logic of Sentiment Analysis

Sentiment analysis is powered by TextBlob, which scores each review for polarity (from -1.0 for negative to 1.0 for positive) and subjectivity (from 0.0 for factual to 1.0 for opinionated).

🧠 Sentiment Analysis Function

from textblob import TextBlob

def get_sentiment(text):
    blob = TextBlob(str(text))
    return blob.sentiment.polarity if sentiment_type == "Polarity" else blob.sentiment.subjectivity

📈 Visualization Example

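# The query returns only content and score, so compute the sentiment column first
reviews_df['sentiment'] = reviews_df['content'].apply(get_sentiment)
fig = px.histogram(reviews_df, x='sentiment', title='Sentiment Distribution')
st.plotly_chart(fig)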
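
Since the query also returns the star score of each review, one quick sanity check is to compare the average sentiment across ratings. The sketch below is an assumption about how that could be done in plain Python; the helper name mean_sentiment_by_score and the sample pairs are hypothetical and are not taken from the project.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment_by_score(rows):
    """Group (score, sentiment) pairs by score and average the sentiment."""
    grouped = defaultdict(list)
    for score, sentiment in rows:
        grouped[score].append(sentiment)
    # Sort by score so the result reads from lowest to highest rating.
    return {score: mean(values) for score, values in sorted(grouped.items())}


# Made-up (score, polarity) pairs standing in for real review data.
rows = [(5, 0.5), (5, 0.75), (1, -0.5), (1, -0.25), (3, 0.0)]
averages = mean_sentiment_by_score(rows)
print(averages)
```

If low ratings do not show a lower average polarity than high ratings, that is a hint the text scoring deserves a closer look.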

🦆 Database Integration with Motherduck

🔗 database.py

import duckdb
from config.config import MOTHERDUCK_TOKEN

def get_connection():
    return duckdb.connect(f"md:?token={MOTHERDUCK_TOKEN}")

def get_reviews_for_sentiment():
    conn = get_connection()
    query = """
    SELECT content, score FROM dropbox_reviews WHERE content IS NOT NULL
    """
    return conn.execute(query).fetch_df()

This code fetches the Dropbox review data from Motherduck. The MOTHERDUCK_TOKEN stays out of the source code: it is stored in .env and loaded through config.py.

🗂️ config.py

import os
from dotenv import load_dotenv

load_dotenv()
MOTHERDUCK_TOKEN = os.getenv("MOTHERDUCK_TOKEN")

🔄 Connection Between `app.py` and `database.py`

app.py imports get_reviews_for_sentiment from database.py, so the dashboard gets its data through a single function call.

⚙️ Why `devcontainer.json` and `config.toml`?

devcontainer.json: Provides a consistent development environment for anyone who wants to work in a Docker-based container.
config.toml: Controls Streamlit UI customization (e.g., colors, fonts, themes).

Example config.toml:

[theme]
primaryColor="#0061FE"
backgroundColor="#0E1117"
secondaryBackgroundColor="#262730"
textColor="#FAFAFA"
font="Monospace"

⚠️ Deployment Challenges

Avoid pinning exact library versions in requirements.txt. For example, write plotly instead of plotly==5.24.1.
Ensure .env is configured correctly in deployment environments.
Validate database connection tokens during runtime.
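
The last two points can be covered with a small startup check. This is a minimal sketch, assuming a helper called require_env (a hypothetical name, not part of the project) that fails early with a readable message when the token is missing. The demo uses a placeholder variable, DEMO_MOTHERDUCK_TOKEN, so it does not touch a real token.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or to the deployment's secrets."
        )
    return value


# Demo with a placeholder variable; a real token would come from .env.
os.environ["DEMO_MOTHERDUCK_TOKEN"] = "example-token"
token = require_env("DEMO_MOTHERDUCK_TOKEN")
print("token loaded:", bool(token))
```

Calling a check like this before opening the database connection turns a confusing connection failure into an explicit configuration error.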

Backend Deployment Flow:

Load environment variables from .env.
Establish connection with Motherduck DB.
Fetch and process data.
Render dashboard in Streamlit.

🎯 Conclusion

We successfully built a Dropbox Reviews Sentiment Analysis Dashboard using Airbyte, Motherduck, and Streamlit. This project demonstrates the power of data analysis and visualization.

👨‍💻 Check out the complete code on GitHub.
📺 Live PROJECT https://airbyte-motherduck-hackathon-sentiment-analysis.streamlit.app

Sentiment Analysis #Happy Coding! #Airbyte🪼🦆

Leveraging Airbyte 🪼 and Motherduck 🦆 for Sentiment Analysis

Abhiraj Adhikary — Fri, 27 Dec 2024 09:46:31 +0000

This blog is part of the Airbyte + Motherduck Hackathon, and in it I’ll demonstrate how to connect Google Sheets with Motherduck using Airbyte. This setup forms the backbone of my Dropbox Sentiment Analysis Dashboard, enabling seamless data integration and storage for analysis. The walkthrough should make it easy to build your first Airbyte connection between a source and a destination; I recommend going through the official documentation afterwards. Let’s dive in! 🤿🌊

Overview of the Project 🗺️

The goal is to analyze user reviews of the Dropbox app using sentiment analysis techniques. Here's a breakdown of the workflow:

Dataset Source: A CSV dataset of Dropbox app user reviews, downloaded from Kaggle.
Preprocessing: Uploaded the CSV to Google Sheets for basic formatting (e.g., converting ratings from text to integers).
Airbyte Integration: Used Airbyte to connect Google Sheets (source) with Motherduck (destination).
Destination Setup: Motherduck stores the data in DuckDB, an SQL database, so it can be queried with standard SQL.
Analysis: Built a sentiment analysis dashboard using Python and Streamlit.
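
The preprocessing step was done by hand in Google Sheets. For readers who prefer to script it, the sketch below shows one way the rating conversion could look in Python. It is an alternative under stated assumptions, not what the project used: the function name clean_ratings and the sample rows are hypothetical, and the column name score is taken from the example query later in this post.

```python
import csv
import io


def clean_ratings(csv_text: str, column: str = "score"):
    """Parse CSV text and convert the rating column from text to int.

    Rows whose rating cannot be read as a whole number are skipped.
    """
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        try:
            row[column] = int(raw)
        except ValueError:
            continue  # e.g. "one" or an empty cell
        cleaned.append(row)
    return cleaned


# Made-up sample rows standing in for the Kaggle CSV.
sample = "reviewId,content,score\na1,Great app,5\na2,Keeps crashing,one\na3,Okay, 3\n"
rows = clean_ratings(sample)
print([(row["reviewId"], row["score"]) for row in rows])
```

Skipping unreadable ratings keeps the destination column purely numeric; whether to drop or repair such rows depends on the dataset.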

Let me walk you through the setup process for Airbyte and Motherduck. 🎮

What is Airbyte? 🧐

Airbyte is an open-source data integration platform that helps synchronize data between different sources and destinations. It provides a wide range of connectors and a user-friendly interface to automate data workflows.

What is Motherduck? 🤔

Motherduck is a cloud-based platform built on DuckDB, a fast and lightweight SQL engine. It allows efficient data analysis and management, making it an excellent choice for scalable and real-time data handling.

Setting Up Airbyte 🪼

Step 1: Go to Airbyte and log in.

You’ll land in the Airbyte workspace. Follow these steps:

Create a New Connection

Click on New Connection and choose Google Sheets as your source.
Share your dataset on Google Sheets and copy the link.
Paste the shared link into the placeholder in Airbyte.
Authenticate your Google account (ensure it's the same account linked to the Google Sheet).

Select Destination

Under the Marketplace, search for and select Motherduck.
Authenticate Motherduck as the destination (the process is described below).

Configuring Motherduck 🦆

Step 2: Head over to Motherduck and sign up.

After signing up, delete the sample workspace (not needed for this setup).
Navigate to Settings under your profile.
In the General tab, generate a Motherduck token (API Key).
Copy the token and paste it into Airbyte when prompted.

Schedule the Sync 🎗️

Configure the sync schedule to keep your Motherduck database updated with any changes in the Google Sheet.
Click Next to finalize the connection.

Validating the Connection 🔄

After completing the setup, check if the source data has successfully transferred to the destination:

On the left panel of your Motherduck page, find Attached Databases.
Under my_db, navigate to main, where you’ll see your dataset (e.g., dropbox_reviews).
Start a new notebook and run queries to confirm the data transfer.

Example query:

from my_db.main.dropbox_reviews
select
    score,
    content,
    reviewId,
    _airbyte_raw_id,
    _airbyte_extracted_at
limit 100

What’s Next? 🚞

This blog covers the setup of Airbyte and Motherduck for seamless data integration. In my next post, I’ll dive into:

Project Structure: A detailed walkthrough of the Dropbox Sentiment Analysis project.
Coding Logic: Explanation of Python libraries used for sentiment analysis.
Dashboard Deployment: How to deploy the application on Streamlit.

PROJECT 📊 : https://airbyte-motherduck-hack-dropbox-sentiment-analysis.streamlit.app

Stay tuned for an exciting journey into sentiment analysis of Dropbox User Reviews! 🚀🌕🪼🦆

Edit: The blog on "Dropbox User Review Sentiment Analysis" is out today, 28th December.

AirbyteHQ #Motherduck #HappyConnecting

Unlock Your Creativity: 6 End-to-End Python Projects Using Open-Source APIs

Abhiraj Adhikary — Thu, 19 Dec 2024 05:41:28 +0000

Are you looking to build impactful projects with Python and open-source APIs? Whether you're an aspiring developer or a seasoned coder, crafting end-to-end applications can showcase your skills and enhance your portfolio. This blog explores six innovative project ideas that leverage Python as the main language and integrate different open-source tools, with features like GitHub OAuth using Supabase. Let’s dive in!

1. Personalized Job Finder Platform

Description: Create a platform where users can find jobs tailored to their skills and location, track applications, and save resumes.

Features:

GitHub OAuth login using Supabase.
Job recommendations based on user preferences.
Application tracking system.

Open-Source Tools:

Supabase: For user authentication and database management.
FastAPI: To develop a robust backend.
BeautifulSoup: For web scraping job listings.
Streamlit: To create an interactive front end.
PDFPlumber: For parsing uploaded resumes.

2. AI-Powered Recipe Generator

Description: Develop a tool that generates recipes based on available ingredients and analyzes their nutritional value.

Features:

Save recipes via Supabase.
AI-generated recipes using text models.
Nutrition analysis of recipes.

Open-Source Tools:

Supabase: For recipe storage and user authentication.
Hugging Face Transformers: For generating recipe suggestions.
Spoonacular API: For nutrition analysis.
FastAPI: To handle backend operations.
Streamlit: For a seamless UI experience.

3. Collaborative Study Platform

Description: Build a platform where users can collaborate on notes in real time and participate in gamified study challenges.

Features:

Real-time collaborative document editing.
Gamification with leaderboards.
GitHub OAuth for login.

Open-Source Tools:

Supabase: For managing users and storing notes.
Socket.IO: For real-time collaboration.
Quill.js: To integrate a rich text editor.
MongoDB: For storing documents.
FastAPI: Backend development.

4. Eco-Friendly Shopping Assistant

Description: A web app that helps users evaluate products for eco-friendliness and calculates the carbon footprint of their shopping habits.

Features:

Barcode scanner for product lookup.
Eco-friendliness ratings of products.
Carbon footprint calculations.

Open-Source Tools:

Supabase: For user authentication and data storage.
ZXing API: To scan barcodes.
Open Food Facts API: For product information.
Pandas: To calculate and analyze data.
Streamlit: For visualizing the insights.

5. Fitness Tracker with Social Features

Description: A fitness tracker that lets users monitor their progress and share achievements with friends.

Features:

Track fitness goals and daily activity.
Social sharing of fitness achievements.
GitHub OAuth for login.

Open-Source Tools:

Supabase: For managing user data and achievements.
Google Fit API: To sync fitness data.
Matplotlib: For creating visualizations of progress.
Dash: Interactive dashboards for users.
FastAPI: Backend services.

6. AI-Powered Code Review Assistant

Description: Develop a tool that integrates with GitHub to perform automated code reviews and provide suggestions.

Features:

GitHub OAuth for authentication.
Automated code analysis with actionable insights.
Integration with pull requests for seamless code reviews.

Open-Source Tools:

Supabase: Authentication and user management.
GitHub API: To fetch and manage pull requests.
Hugging Face Transformers: For analyzing and improving code.
FastAPI: Backend for handling requests.
Streamlit: UI to display review results.

Conclusion

These projects are excellent for mastering Python and open-source tools while building real-world applications. Whether it’s a job finder, recipe generator, or code review assistant, the possibilities are endless. By integrating APIs like Supabase, Hugging Face, or Open Food Facts, you’ll learn to create efficient, scalable solutions.

Start building today, and let your creativity shine!

UI Card Library

Abhiraj Adhikary — Sat, 14 Dec 2024 19:57:37 +0000

Participating in the Frontend Challenge - December Edition, CSS Art: December has been an inspiring journey into the world of CSS art.

Inspiration

A curated collection of beautifully designed UI cards with direct access to their Figma designs. Each card includes creator details with links to their LinkedIn and Twitter profiles. Perfect for inspiration and collaboration!

Demo

You can view the CSS art piece I created for this challenge below:

Github Repo: https://github.com/abhirajadhikary06/UI-Card-Library
Live Preview: https://ui-card-library.vercel.app/

Journey

Embarking on this project allowed me to delve deeper into CSS techniques, such as positioning, transformations, and animations. I learned how to manipulate simple HTML elements to create intricate designs, enhancing both my technical skills and artistic expression.

One of the key takeaways was understanding the importance of planning and sketching designs before coding, which streamlined the development process. I'm particularly proud of how the final piece reflects the initial concept, demonstrating the potential of CSS in creating art.

Looking ahead, I aim to experiment with more complex animations and interactive elements, further blending art with functionality in web design.

Note: The code for this project is open-source and available under the MIT License.

Thank you for viewing my submission!
