Microservices need to talk to each other, and how they talk matters. The most popular option is REST: JSON over HTTP. That works, but it carries a lot of weight. Your services have to parse text, agree on field names, and handle errors by hand. gRPC changes that. It uses Protocol Buffers to define a contract: you write a .proto file, and from that file you generate code for both the server and the client. The messages are binary, which makes them smaller and faster to process. The framework also gives you streaming, strong typing, and configurable retries. I have built several services this way, and the difference in speed and clarity is immediate.
Let me start with the first technique. You define a shared contract in a .proto file. This file lives in a central repository, or you copy it to each service. Inside, you describe the messages and the services. A message is like a struct with typed fields. A service lists the RPC methods. For example, a user service might have a method GetUser that takes a UserRequest and returns a UserResponse. Here is a simple proto file I use:
syntax = "proto3";

package userservice;

service UserService {
  rpc GetUser (UserRequest) returns (UserResponse);
  rpc ListUsers (ListRequest) returns (stream UserResponse);
  rpc CreateUsers (stream CreateRequest) returns (CreateSummary);
  rpc Chat (stream ChatMessage) returns (stream ChatMessage);
}

message UserRequest {
  string user_id = 1;
}

message UserResponse {
  string user_id = 1;
  string name = 2;
  string email = 3;
  int32 age = 4;
}

message ListRequest {
  int32 page_size = 1;
  string page_token = 2;
}

message CreateRequest {
  string name = 1;
  string email = 2;
  int32 age = 3;
}

message CreateSummary {
  int32 success_count = 1;
  repeated string failed_ids = 2;
}

message ChatMessage {
  string user_id = 1;
  string message = 2;
  int64 timestamp = 3;
}
After writing that file, you run a tool to generate the Python code. The command looks like this:
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. users.proto
That gives you two files: users_pb2.py for the message classes and users_pb2_grpc.py for the server and client stubs. This generated code is what you import in your actual service; you never write the serialization logic by hand. The contract lives in the generated code: rename a field or change its type, regenerate, and any code that used the old name fails immediately instead of misbehaving at runtime. Field numbers are the one thing you must never change, because they define the wire format. That alone saves hours of debugging.
Now, the second technique is implementing a unary RPC with async support. I use Python’s asyncio heavily because it lets me handle many requests without blocking. The gRPC async server works with an event loop. You create a class that inherits from the generated servicer base class. Inside, you write an async method that matches the RPC name. Here is a minimal example for GetUser:
import asyncio
from concurrent import futures

import grpc

import users_pb2
import users_pb2_grpc


class UserServiceServicer(users_pb2_grpc.UserServiceServicer):
    async def GetUser(self, request, context):
        await asyncio.sleep(0.01)  # simulate DB call
        return users_pb2.UserResponse(
            user_id=request.user_id,
            name="Alice",
            email="alice@example.com",
            age=30,
        )


async def serve():
    server = grpc.aio.server(
        futures.ThreadPoolExecutor(max_workers=10),
        options=[('grpc.max_send_message_length', 4 * 1024 * 1024)],
    )
    users_pb2_grpc.add_UserServiceServicer_to_server(UserServiceServicer(), server)
    server.add_insecure_port('[::]:50051')
    await server.start()
    await server.wait_for_termination()


if __name__ == '__main__':
    asyncio.run(serve())
The grpc.aio.server accepts a thread pool for any synchronous handlers, but the async methods themselves run on the event loop without blocking. While one request waits on a database call, another can be processed. The options list of (key, value) tuples lets you tune message sizes and other channel parameters; here I raise max_send_message_length to 4 MB so large responses don't fail.
The third technique is server‑side streaming. Instead of returning a single response, you yield multiple responses. This is perfect for listing users or sending a feed of events. The client gets a stream of objects. On the server side, you write an async generator that yields each result. I have used this for paginated API endpoints where the client doesn’t want to call multiple times. Here is ListUsers:
async def ListUsers(self, request, context):
    users = [
        {"id": "1", "name": "Bob", "email": "bob@example.com", "age": 25},
        {"id": "2", "name": "Carol", "email": "carol@example.com", "age": 28},
    ]
    for user in users:
        yield users_pb2.UserResponse(
            user_id=user["id"],
            name=user["name"],
            email=user["email"],
            age=user["age"],
        )
        await asyncio.sleep(0.005)  # simulate per-item processing delay
Notice that I use yield inside an async function. The client reads each message as it arrives. If the client disconnects, the gRPC runtime cancels the handler automatically, so the server doesn't keep producing into a dead stream.
The fourth technique is client‑side streaming. Here, the client sends a stream of requests, and the server returns a single response. This is useful for batch operations like uploading many records. The server receives an async iterator – you loop over it with async for. In my example CreateUsers, I collect the names that succeeded and those that failed:
async def CreateUsers(self, request_iterator, context):
    success_count = 0
    failed_ids = []
    async for create_req in request_iterator:
        try:
            # insert into database
            await asyncio.sleep(0.001)
            success_count += 1
        except Exception:  # a bare except would also swallow cancellation
            failed_ids.append(create_req.name)
    return users_pb2.CreateSummary(
        success_count=success_count,
        failed_ids=failed_ids,
    )
The client calls this by repeatedly sending CreateRequest messages and then waiting for the summary. This pattern reduces the number of round trips compared to sending each request separately.
The fifth technique is bidirectional streaming. Both sides send and receive messages in any order. I used this for a chat system. The server reads incoming messages and yields responses. The crucial point is that each side’s stream is independent. The server can process a message and immediately send a reply, even while the client is still sending. Here is a simple echo server:
async def Chat(self, request_iterator, context):
    async for chat_msg in request_iterator:
        yield users_pb2.ChatMessage(
            user_id=chat_msg.user_id,
            message=f"Echo: {chat_msg.message}",
            timestamp=1234567890,  # placeholder; use a real clock in production
        )
On the client side, you can open the stream, send a few messages, and then read the responses. The async generator ensures that the loop doesn’t block while waiting for the next input.
The sixth technique is proper error handling. gRPC provides a set of status codes that are more expressive than raw HTTP statuses. You can set the code and a detail string on the context object, or call context.abort() to end the call immediately; extra information can travel in trailing metadata via context.set_trailing_metadata(). I always check for missing or invalid arguments and return INVALID_ARGUMENT. For missing resources, NOT_FOUND is appropriate. Here is an example:
from grpc import StatusCode


async def GetUser(self, request, context):
    if not request.user_id:
        context.set_code(StatusCode.INVALID_ARGUMENT)
        context.set_details("user_id must be provided")
        return users_pb2.UserResponse()
    if request.user_id == "missing":
        context.set_code(StatusCode.NOT_FOUND)
        context.set_details("User not found")
        return users_pb2.UserResponse()
    # normal flow
    return await self._fetch_user(request.user_id)
On the client side, you catch grpc.RpcError and inspect its code and details. This gives you a structured way to handle failures without guessing HTTP codes.
import grpc


async def get_user():
    async with grpc.aio.insecure_channel('localhost:50051') as channel:
        stub = users_pb2_grpc.UserServiceStub(channel)
        try:
            response = await stub.GetUser(users_pb2.UserRequest(user_id="missing"))
            print(response.name)
        except grpc.RpcError as e:
            print(f"gRPC error: {e.code()} - {e.details()}")
The seventh technique is interceptors. You can wrap every RPC with logic that logs, authenticates, or records metrics. The async server interceptor has a single method, intercept_service. You create a class that inherits from grpc.aio.ServerInterceptor and override that method. Inside, you can inspect handler_call_details, which carries the method name and invocation metadata, and then await continuation to obtain the handler that will actually serve the call. Here is a logging interceptor:
class LoggingInterceptor(grpc.aio.ServerInterceptor):
    async def intercept_service(self, continuation, handler_call_details):
        print(f"Incoming RPC: {handler_call_details.method}")
        # continuation returns the RpcMethodHandler; the handler itself runs
        # later, so this logs arrival rather than completion
        return await continuation(handler_call_details)
You attach it to the server like this:
server = grpc.aio.server(interceptors=[LoggingInterceptor()])
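Interceptors exist on the client side too. As a sketch, here is a logging interceptor for unary calls; grpc.aio accepts interceptors when the channel is created. The target address is the placeholder used throughout this article.

```python
import grpc


class ClientLoggingInterceptor(grpc.aio.UnaryUnaryClientInterceptor):
    async def intercept_unary_unary(self, continuation, client_call_details, request):
        print(f"Outgoing call: {client_call_details.method}")
        # Hand off to the next interceptor, or to the transport itself
        return await continuation(client_call_details, request)

# channel = grpc.aio.insecure_channel(
#     "localhost:50051", interceptors=[ClientLoggingInterceptor()]
# )
```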
The eighth technique is health checking. In a microservice world, you need a way to know if a service is alive and ready. gRPC defines a standard health checking protocol. You add the health service to your server and set the status for each service you provide. I use the grpc_health package. Here is how to set it up:
from grpc_health.v1 import health, health_pb2, health_pb2_grpc
health_servicer = health.aio.HealthServicer()
health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)
# Mark your service as serving
await health_servicer.set("userservice.UserService", health_pb2.HealthCheckResponse.SERVING)
Clients can then call the Check method to see if the service is healthy. This integrates with container orchestration tools that run periodic health checks.
async def check_health():
    async with grpc.aio.insecure_channel('localhost:50051') as channel:
        health_stub = health_pb2_grpc.HealthStub(channel)
        response = await health_stub.Check(
            health_pb2.HealthCheckRequest(service="userservice.UserService")
        )
        return response.status == health_pb2.HealthCheckResponse.SERVING
The ninth technique is load balancing. When you have multiple instances of the same service, you want to distribute the load. gRPC supports client‑side load balancing. You give the channel a DNS name that resolves to multiple IP addresses. Then you set the load balancing policy. The round_robin policy sends each request to a different server. Here is an example:
channel = grpc.aio.insecure_channel(
    'dns:///server-cluster.example.com:50051',
    options=[
        ('grpc.lb_policy_name', 'round_robin'),
        ('grpc.dns_min_time_between_resolutions_ms', 10000),
    ],
)
stub = users_pb2_grpc.UserServiceStub(channel)
You can also use pick_first for simple failover. The DNS resolver re‑resolves periodically, so you don’t need a separate service discovery layer.
The tenth technique is performance optimization and benchmarking. gRPC can be fast, but only if you configure it correctly. On the server side, I set grpc.so_reuseport to allow multiple worker processes to share the same port. I increase max_concurrent_streams to handle many simultaneous connections. I also tune keepalive parameters to detect dead connections quickly.
server = grpc.aio.server(
    futures.ThreadPoolExecutor(max_workers=100),
    options=[
        ('grpc.so_reuseport', 1),
        ('grpc.max_concurrent_streams', 1000),
        ('grpc.keepalive_time_ms', 30000),
        ('grpc.keepalive_timeout_ms', 10000),
        ('grpc.http2.min_time_between_pings_ms', 10000),
    ],
)
On the client side, I enable compression with the grpc.default_compression_algorithm channel option; the value 2 corresponds to gzip (grpc.Compression.Gzip). I also set grpc.enable_retries and cap max_reconnect_backoff_ms to avoid hammering a server that is down. Note that the retries flag only permits retrying; to actually retry failed calls you also need a retry policy in the channel's service config.
channel = grpc.aio.insecure_channel(
    'localhost:50051',
    options=[
        ('grpc.default_compression_algorithm', 2),  # 2 = gzip
        ('grpc.enable_retries', 1),
        ('grpc.max_reconnect_backoff_ms', 5000),
    ],
)
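A retry policy is delivered through the grpc.service_config channel option as a JSON string. Here is a sketch for the UserService from this article; the specific attempt counts and backoff values are illustrative, not recommendations.

```python
import json

# A retry policy expressed as a gRPC service config. The service name
# matches the proto in this article; tune the numbers for your own service.
service_config = json.dumps({
    "methodConfig": [{
        "name": [{"service": "userservice.UserService"}],
        "retryPolicy": {
            "maxAttempts": 4,
            "initialBackoff": "0.1s",
            "maxBackoff": "2s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["UNAVAILABLE"],
        },
    }]
})

options = [
    ("grpc.enable_retries", 1),
    ("grpc.service_config", service_config),
]
# channel = grpc.aio.insecure_channel("localhost:50051", options=options)
```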
Testing the performance is important. I write simple benchmarks that measure latency and throughput. I use the timeit module or a profiling tool. The goal is to find bottlenecks – often the database, not the gRPC layer. But by tuning these knobs, I have seen my services handle thousands of requests per second without breaking a sweat.
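As a sketch of such a benchmark, here is a self-contained harness that measures latency percentiles and throughput. The fake_rpc coroutine is a stand-in I made up so the script runs anywhere; swap in a real stub call such as stub.GetUser(...) against a live server.

```python
import asyncio
import statistics
import time


async def fake_rpc():
    # Stand-in for a real RPC; replace with an actual stub call.
    await asyncio.sleep(0.001)


async def benchmark(n: int = 200, concurrency: int = 20):
    latencies = []
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests

    async def one_call():
        async with sem:
            start = time.perf_counter()
            await fake_rpc()
            latencies.append((time.perf_counter() - start) * 1000)

    t0 = time.perf_counter()
    await asyncio.gather(*(one_call() for _ in range(n)))
    elapsed = time.perf_counter() - t0

    print(f"p50 = {statistics.median(latencies):.2f} ms")
    print(f"p95 = {sorted(latencies)[int(0.95 * n)]:.2f} ms")
    print(f"throughput = {n / elapsed:.0f} req/s")
    return latencies


if __name__ == "__main__":
    asyncio.run(benchmark())
```

Running the harness against the fake call gives a baseline for the event loop itself; the gap between that and a real server run tells you how much the network and the handler cost.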
These ten techniques form the backbone of how I build microservices. I start with a clear proto contract, implement async servers and clients, choose the right streaming pattern for each use case, handle errors explicitly, add interceptors for observability, make sure the service is healthy, spread the load, and then squeeze every bit of performance out of the system. The code examples I gave are ready to be copied and adapted. You can run them with a simple virtual environment and the grpcio and grpcio-tools packages. The beauty of gRPC with Python is that you get all the power of a modern RPC framework without leaving the comfort of your favorite language.
I remember the first time I switched from REST to gRPC. The response times dropped by half. The code became cleaner because I no longer had to validate JSON fields manually. And the streaming patterns opened up possibilities that were hard with HTTP – real‑time notifications, progressive loading, and efficient bulk operations. It takes a little more setup at the beginning, but the payoff in reliability and speed is worth every line of the proto file.
If you are new to this, start small. Define one service with one unary method. Get it working on your local machine. Then add streaming, then error handling, then interceptors. You will see how each piece fits together. The documentation for gRPC Python is good, but the examples here are the ones I use daily. I hope they help you build services that are simple to maintain and fast to run.