<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: maryu0</title>
    <description>The latest articles on DEV Community by maryu0 (@maryu0).</description>
    <link>https://dev.to/maryu0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1902798%2F745433e7-d6aa-4f0d-973d-fb5bcacf2b4b.png</url>
      <title>DEV Community: maryu0</title>
      <link>https://dev.to/maryu0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maryu0"/>
    <language>en</language>
    <item>
      <title>I built an AI debugging assistant with Llama 3.3 — here's what actually worked</title>
      <dc:creator>maryu0</dc:creator>
      <pubDate>Fri, 15 May 2026 19:13:56 +0000</pubDate>
      <link>https://dev.to/maryu0/i-built-an-ai-debugging-assistant-with-llama-33-heres-what-actually-worked-ind</link>
      <guid>https://dev.to/maryu0/i-built-an-ai-debugging-assistant-with-llama-33-heres-what-actually-worked-ind</guid>
      <description>&lt;p&gt;Every developer has been there. It's 2am, your CI pipeline is red, and you're staring at a wall of error logs trying to figure out which of the 47 things that could be wrong is actually wrong.&lt;/p&gt;

&lt;p&gt;That pain is what made me build &lt;strong&gt;FailSense&lt;/strong&gt; — an AI debugging assistant that ingests error logs and returns ranked, actionable fixes using Llama 3.3. Here's an honest breakdown of what I built, the mistakes I made, and what I'd do differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;~40% reduction in debugging time · ~99% uptime · two services, one pipeline&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with debugging + LLMs
&lt;/h2&gt;

&lt;p&gt;The naive approach is obvious: dump the error into ChatGPT and hope for the best. It kind of works. But it breaks down quickly when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your error spans multiple files and stack frames&lt;/li&gt;
&lt;li&gt;The root cause is buried 3 levels deep in a dependency&lt;/li&gt;
&lt;li&gt;You need ranked fixes, not a monologue&lt;/li&gt;
&lt;li&gt;You want this in your own pipeline, not a chat UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I decided to build something purpose-built for error log analysis — with structured output, confidence-ranked fixes, and a real deployment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture: keep it boring
&lt;/h2&gt;

&lt;p&gt;The stack is deliberately simple. Two services. One job each.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Next.js (Frontend) → FastAPI (Backend) → Llama 3.3 via Groq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Next.js frontend handles log input and renders ranked fixes. The FastAPI backend owns all the prompt logic, output parsing, and error handling. Llama 3.3 runs on Groq for low-latency inference — this matters more than you'd think when users are already frustrated.&lt;/p&gt;
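
&lt;p&gt;To make the flow concrete, here's a minimal sketch of what that backend wiring can look like. This is a sketch, not the actual FailSense source: it assumes the &lt;code&gt;groq&lt;/code&gt; Python SDK, the &lt;code&gt;system_prompt&lt;/code&gt; and &lt;code&gt;parse_fixes&lt;/code&gt; pieces shown later in this post, and an illustrative &lt;code&gt;/analyze&lt;/code&gt; route name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from fastapi import FastAPI
from pydantic import BaseModel
from groq import Groq

app = FastAPI()
client = Groq()  # reads GROQ_API_KEY from the environment

class LogInput(BaseModel):
    log: str

@app.post("/analyze")
def analyze(body: LogInput) -&amp;gt; list:
    # system_prompt and parse_fixes are defined later in this post
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # Groq's Llama 3.3 model id
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": body.log},
        ],
    )
    return parse_fixes(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;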

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; Don't add a third service just because you can. Every hop between services is a new failure point, a new auth layer, and a new thing to monitor at 2am.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The prompt that actually works
&lt;/h2&gt;

&lt;p&gt;This took the most iteration. The first version just said "here's an error, fix it." The output was verbose, unstructured, and hard to parse programmatically. Here's the version that works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a senior software engineer debugging production errors.
Given an error log, return ONLY a JSON array of fixes, ranked by likelihood.
Each fix must have:
  - rank (int): 1 = most likely cause
  - cause (str): one sentence root cause
  - fix (str): exact steps to resolve
  - confidence (float): 0.0 to 1.0

Return nothing else. No preamble. No markdown. Raw JSON only.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things made this work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explicit output format&lt;/strong&gt; — telling the model to return raw JSON (not markdown-wrapped JSON) saved me a ton of parsing headaches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Role framing&lt;/strong&gt; — "senior software engineer" shifts the model toward precise, opinionated output over safe hedging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ranked by likelihood&lt;/strong&gt; — forcing a ranking means the most actionable fix is always first, which is what a tired developer actually wants&lt;/li&gt;
&lt;/ol&gt;
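
&lt;p&gt;For reference, the raw model output for a typical log ends up shaped like this (the values here are illustrative, not real output):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {"rank": 1, "cause": "DATABASE_URL is unset in the CI environment", "fix": "Add DATABASE_URL to the pipeline secrets and re-run", "confidence": 0.85},
  {"rank": 2, "cause": "The Postgres service container failed to start", "fix": "Check the service health logs and pin the image version", "confidence": 0.4}
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;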

&lt;h2&gt;
  
  
  Parsing LLM output without going insane
&lt;/h2&gt;

&lt;p&gt;LLMs are not deterministic JSON machines. Sometimes Llama 3.3 returns perfect JSON. Sometimes it adds a sentence before it. Sometimes the confidence is a string instead of a float. Here's the defensive parsing layer I built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_fixes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Strip markdown fences if present
&lt;/span&gt;    &lt;span class="n"&gt;clean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```(?:json)?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;fixes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Try to extract the JSON array from within a larger string
&lt;/span&gt;        &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\[.*\]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fixes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Normalize confidence to float
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fixes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
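
&lt;p&gt;A quick usage sketch of the worst case this handles: a preamble sentence, markdown fences, and a stringly-typed confidence, all in one response (the input below is made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;raw = """Here are the fixes:
```json
[{"rank": 1, "cause": "Lockfile is out of sync", "fix": "Run npm ci", "confidence": "0.9"}]
```"""

fixes = parse_fixes(raw)
# The preamble defeats json.loads, so the regex fallback extracts the array,
# and the string confidence is coerced to a float.
assert fixes[0]["confidence"] == 0.9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;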



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hot take:&lt;/strong&gt; If you're not writing a fallback parser for LLM output, you're writing a bug. Models drift, prompts drift, and what works today breaks next month.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Deployment: boring is good
&lt;/h2&gt;

&lt;p&gt;Next.js on Vercel. FastAPI on Railway. Both wired up with GitHub Actions for CI/CD. Every push to main triggers a deploy. The whole thing costs under $5/month to run.&lt;/p&gt;

&lt;p&gt;The ~99% uptime wasn't magic — it was just not doing anything clever. No custom load balancers, no exotic infra. Just two managed services that restart themselves when they crash.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add evals from day one.&lt;/strong&gt; I had no systematic way to know if a prompt change made things better or worse. I was eyeballing it. Don't eyeball it (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream the response.&lt;/strong&gt; Waiting 3-4 seconds for the full JSON response feels slow. Streaming partial results — even just a loading state with intermediate tokens — makes it feel snappy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log everything.&lt;/strong&gt; What errors are users pasting in? What fixes are they ignoring? This data is gold for improving the prompt, and I threw it away by not logging it.&lt;/li&gt;
&lt;/ul&gt;
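
&lt;p&gt;Here's the kind of minimal eval harness I mean: a handful of known logs paired with a keyword the top-ranked cause should contain. &lt;code&gt;get_fixes&lt;/code&gt; is a hypothetical stand-in for the full prompt-and-parse pipeline, and the cases are made up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical: get_fixes(log) wraps the full prompt + parse pipeline.
EVAL_CASES = [
    ("ModuleNotFoundError: No module named 'requests'", "module"),
    ("ECONNREFUSED 127.0.0.1:5432", "connection"),
]

def run_evals(get_fixes) -&amp;gt; float:
    passed = 0
    for log, keyword in EVAL_CASES:
        fixes = get_fixes(log)
        # Pass if the top-ranked cause mentions the expected keyword
        if fixes and keyword in fixes[0]["cause"].lower():
            passed += 1
    return passed / len(EVAL_CASES)

# Re-run after every prompt change; a score drop means the change hurt.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;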

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Building production AI tools is less about the model and more about the scaffolding around it. The prompt, the output parser, the fallback handling, the latency — that's where the real engineering happens.&lt;/p&gt;

&lt;p&gt;FailSense isn't magic. It's a well-prompted LLM with a defensive parser and a boring deployment. That's enough to cut debugging time by ~40% and actually ship something people use.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Check out the full source on &lt;a href="https://github.com/maryu0/FailSense.git" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · Built with Next.js, FastAPI, Groq, and Llama 3.3&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
