Cem AKAN

Posted on May 29

Stop Overloading Your REST APIs: A Practical Guide to gRPC

#api #architecture #backend #microservices

Why traditional REST isn’t always enough and how gRPC is changing the way our microservices talk to each other.

If you are building microservices, mobile backends, or real-time data streams, you have probably hit a wall with traditional API communication. The web has grown, our systems have scaled, and the old ways of doing things are starting to show their limits.

Recently, I had the privilege of presenting a webinar to the Cloud Native Addis Ababa community about one of the most powerful technologies in modern backend development: gRPC. I wanted to adapt that session into a comprehensive article series because understanding gRPC is rapidly becoming an essential skill for any backend engineer.

Before we start looking at lines of code or complex architecture diagrams, we need to set the stage. At its core, modern backend development is entirely about communication. We are not just talking about how we collaborate as humans, but how we enable efficient, strictly typed, and incredibly fast communication between our isolated software services. Finding the right way to manage that constant server-to-server chatter is exactly what solves our biggest performance bottlenecks today.

In this guide, we are going to embark on a deep dive into the world of gRPC. My goal is that by the end of this read, you will not just know what the acronym stands for, but you will know exactly how to architect and build high-performance APIs with it.

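Here is our roadmap: the fundamentals of RPC, why REST falls short for internal services, how gRPC builds on HTTP/2 and Protocol Buffers, a head-to-head comparison with REST, a hands-on Route Guide service, advanced features such as streaming, security, and load balancing, and finally best practices and tooling.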

Just to give you a bit of context on who is taking you on this journey: I am Cem, a Computer Engineering senior at Çukurova University. My daily focus is heavily rooted in backend development and system architecture. I spend a lot of my time working on internal automation tools, primarily utilizing Go and Node.js.

Beyond writing code and exploring cloud native tools, I am incredibly passionate about community building. I serve as the founder and leader of the Yazılım Çukurova community because I firmly believe that the absolute best way to learn is to teach and build things together.

Now that the introductions are out of the way, let us jump right into the basics.

By the time you finish reading, you will not just have heard of gRPC. You will understand why it is a preferred framework for modern cloud-native applications, you will know how to define strict interfaces and how to choose between gRPC and REST, and you will understand the power of bidirectional streaming. We are aiming for practical, production-ready knowledge.

Let us start with the absolute basics. What exactly is RPC?

It stands for Remote Procedure Call. The concept is beautifully simple: I want to write code on my machine that calls a function on your machine, but I want it to feel exactly like I am calling a local function in my own memory space. I want to tell my code to “calculate the result” and have the magic happen somewhere else on the network without me having to manually open sockets, parse TCP packets, or serialize data. The ultimate goal is transparency.

The heart of RPC is the stub. When your client code calls a remote function, it is actually talking to a local piece of code called a Client Stub. Think of this stub as a packer. It takes your parameters, say the numbers 5 and 10, and marshals them into a network message. The RPC Runtime then sends this message across the wire.

On the receiving end, the server has a similar stub that unpacks the message, finds the actual function, runs it, and then reverses the whole process for the return value. To your code, it looked like a single, simple function call. To the system, it was a complex round-trip journey.

One of the biggest hidden heavy lifters in this process is Marshalling. Your computer stores objects in a very specific, complex way in its memory. Network wires, however, only understand flat streams of bytes. Marshalling is the translation process of flattening that complex in-memory object into a byte stream that can travel. If your client is written in Python and your server is in Java, this translation gets incredibly complex. This exact complexity is where many older RPC systems stumbled.
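To make the stub and marshalling ideas concrete, here is a minimal sketch in Python. The wire format (two big-endian 32-bit integers), the function names, and the in-process "transport" are all assumptions made for illustration; a real RPC runtime sends the bytes over a network and uses a much richer encoding.

```python
import struct

# Hypothetical wire format for this sketch: signed 32-bit integers in
# big-endian ("network") byte order. Real frameworks use richer formats.
REQUEST_FORMAT = "!ii"
RESPONSE_FORMAT = "!i"


def add(a: int, b: int) -> int:
    """The real function, which lives on the server."""
    return a + b


def server_stub(request_bytes: bytes) -> bytes:
    """Unmarshal the request, call the real function, marshal the result."""
    a, b = struct.unpack(REQUEST_FORMAT, request_bytes)
    return struct.pack(RESPONSE_FORMAT, add(a, b))


def client_stub_add(a: int, b: int, transport=server_stub) -> int:
    """Looks like a local call, but packs bytes and hands them to a transport."""
    request_bytes = struct.pack(REQUEST_FORMAT, a, b)           # marshal
    response_bytes = transport(request_bytes)                   # simulated network hop
    (result,) = struct.unpack(RESPONSE_FORMAT, response_bytes)  # unmarshal
    return result


wire_request = struct.pack(REQUEST_FORMAT, 5, 10)
print(wire_request.hex())       # the flat byte stream that would cross the wire
print(client_stub_add(5, 10))   # the caller only ever sees a normal return value
```

The caller never touches `struct` or bytes directly, which is exactly the transparency the stub is meant to provide.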

The Ghosts of Architectures Past

Traditionally, RPC was designed to mimic a standard function call. Standard function calls are synchronous. You call a function, and your thread freezes until the answer comes back. This is perfectly fine for a simple desktop app. However, in a massive microservices architecture, if Service A waits for Service B, and Service B is waiting for Service C, we suddenly have a cascading latency nightmare.

We did not arrive at our current solutions overnight. In the 90s, the industry tried CORBA. It attempted to solve everything but was famously painful to configure. Then came SOAP, which provided great structure but drowned developers in heavy XML tags. For the last decade or so, REST has been the undisputed king. It leveraged the native language of the web using HTTP and JSON. It was incredibly easy to use. But as our systems grew to massive scales, easy was no longer enough. We needed fast and efficient.

Remember earlier when I said RPC tries to make remote calls look local? That is actually its biggest trap. It is known as the Local Call Fallacy. When you call a local function, it almost never fails. When you call a function over a network, a router might crash, the server might be rebooting, or the Wi-Fi might drop. Traditional RPC frameworks often tried to hide these network errors to maintain that illusion of a local call, which inevitably led to catastrophic application crashes. A truly modern framework needs to acknowledge that the network is a chaotic, unpredictable place.

If REST works, why change?

At this point, you might be asking yourself: If REST is so incredibly popular, why change anything?

Let me be clear, REST is amazing for public APIs where you have no idea who the client is or what language they are using. But for internal microservices? It is horribly inefficient.

Sending JSON means sending repetitive, heavy text over the wire over and over again. Your expensive cloud servers burn valuable CPU cycles just turning plain text into numbers. Furthermore, REST is loosely typed. If I change a field name on the server, the client will not know until it crashes at runtime in production. We desperately need something stricter and lighter.

Another massive challenge is maintenance in a polyglot company. Today, your backend might be in Go, your data science team uses Python, and your frontend is in React. If you update a REST API, you often have to manually rewrite and update the client libraries for all those different languages. This creates an integration hell that severely slows down development cycles.

So, as an industry, we created a wishlist for a modern solution. We realized we needed the strict structural contracts of the old SOAP days, the ease of use of REST, and the raw performance of binary protocols. We needed it to work seamlessly whether we were coding in C++, Go, or Ruby. Critically, we needed it to handle streaming so we could send continuous data without constantly opening and closing network connections.

This was the massive gap that existed in the market around 2015.

This brings us to the star of the show. You see, Google was experiencing these exact scaling and performance problems over a decade ago. To connect their massive data centers, they built an internal tool called Stubby. It worked brilliantly, and in 2015, they rewrote it and open-sourced it to the world as gRPC.

What does the ‘g’ stand for? It originally stood for Google, but today the ‘g’ stands for something different in every single release version. What truly matters is what it represents: the next generation of RPC, designed specifically from the ground up for the incredible scale and speed required by the cloud-native world.

Let us peel back the layers and look at the architecture. gRPC is not just a simple library; it is an entire stack. At the very bottom, doing all the heavy lifting, is HTTP/2. Sitting directly on top of that is the ‘Core’, a shared engine written in C and C++ to provide raw speed. Finally, at the top, we have the language bindings. This is the part you interact with. Languages such as Python, Ruby, and C++ drive that same C-core engine underneath, while others, notably Go and Java, ship their own native implementations of the same protocol. Either way, every implementation speaks the same wire format, which is what gives you consistent behavior and high performance across platforms.

The Engine: Why HTTP/2 Changes Everything

If gRPC is a high-performance sports car, HTTP/2 is its engine.

Most traditional REST APIs still run on HTTP/1.1. In the HTTP/1.1 world, if you want to send ten concurrent requests, you often need to open multiple TCP connections or wait in line for each to finish. It is essentially a single-lane road.

HTTP/2 introduces a concept called Multiplexing. It takes your data, breaks it into tiny binary frames, and sends them all at once over a single, long-lived connection. It allows gRPC to handle thousands of concurrent calls without the heavy overhead of constantly opening and closing connections.

This is an absolute killer feature for microservices. Imagine your application needs to fetch a user profile, their recent orders, and their recommendations. In gRPC, these are three separate API calls, but they travel over the exact same network wire at the exact same time. If the ‘orders’ database query is running slow, the profile data is not blocked. It just zips right past it on the multiplexed highway. This efficiency is exactly why gRPC dominates in environments where services chatter constantly.
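The scheduling idea behind that example can be sketched with `asyncio`. This is not real HTTP/2; the call names and delays are hypothetical, and `asyncio.sleep` stands in for the time each backend takes. It only shows that three in-flight calls can complete independently, so the slow one does not hold up the others.

```python
import asyncio


async def fake_call(name: str, delay: float, finished: list) -> str:
    """Stand-in for one RPC; `delay` simulates how long the backend takes."""
    await asyncio.sleep(delay)
    finished.append(name)  # record completion order
    return f"{name} ready"


async def load_dashboard():
    finished = []
    # All three calls are in flight at once, like streams on one connection.
    results = await asyncio.gather(
        fake_call("profile", 0.01, finished),
        fake_call("orders", 0.20, finished),           # the slow query
        fake_call("recommendations", 0.05, finished),
    )
    return results, finished


results, finished = asyncio.run(load_dashboard())
print(finished)  # the slow 'orders' call completes last without delaying the rest
print(results)   # results still come back in the order the calls were issued
```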

Now, let us talk about the data payload itself. By default, gRPC uses Protocol Buffers, commonly known as Protobuf. You can think of Protobuf as two distinct things: First, it is a language used to strictly define your API contracts. Second, it is a mechanism to serialize your data into bytes.

People always ask, “Why not just use JSON?” JSON is fantastic for human readability, but it is terrible for machine efficiency. Every time you send a payload like {"name": "Cem"}, you are sending the actual characters 'n', 'a', 'm', 'e' over the network.

In Protobuf, the field ‘name’ is assigned a numeric tag, like ‘1’. This makes the payload significantly smaller on the wire. More importantly, computers can parse binary data drastically faster than they can parse text. When your company is processing billions of internal requests, this efficiency translates directly to saved money and lower CPU usage.
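The size difference is easy to verify by hand. The sketch below encodes a single string field the way the Protobuf wire format does (a key built from the field number and wire type, then a length, then the UTF-8 bytes) and compares it with the JSON text. The helper names are my own; in a real project the generated code does this for you.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: key varint, length varint, UTF-8 payload."""
    wire_type_len = 2  # length-delimited (strings, bytes, nested messages)
    key = (field_number << 3) | wire_type_len
    payload = text.encode("utf-8")
    return encode_varint(key) + encode_varint(len(payload)) + payload


as_json = json.dumps({"name": "Cem"}).encode("utf-8")
as_protobuf = encode_string_field(1, "Cem")

print(len(as_json), as_json)          # 15 bytes of text
print(len(as_protobuf), as_protobuf)  # 5 bytes: key 0x0a, length 3, then "Cem"
```

The field name never appears in the binary form; only the tag `1` does, which is why both sides need the same .proto contract to make sense of the bytes.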

A .proto file is highly readable but incredibly strict. We define the syntax version, a package name to prevent naming conflicts, and our Messages, which are simply data structures. Every field is followed by a number, such as = 1 or = 2. Those are the unique numeric tags that replace your field names on the network wire. Finally, we define the Service. We state exactly what methods exist, what inputs they take, and what they return. This file becomes your ironclad contract.

Here is the absolute best part of this entire system: You write that .proto file exactly once. Then, you hand it over to a tool called the protoc compiler. If your backend team uses Go, it generates Go code. If the mobile team uses Swift, it generates native Swift code. It creates all the data classes, the complex networking logic, and the stubs for you automatically. You never write manual HTTP requests or JSON parsers again. You just call methods on these auto-generated objects, which massively speeds up your development cycle.

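Because gRPC enforces a Contract First approach, you get type safety from the generated code. In statically typed languages such as Go or Java, if a developer tries to pass a string into a field that the contract defines as an integer, the code simply will not compile. In dynamic languages such as Python, the generated message class rejects the value with a type error as soon as you assign it. Either way, you catch your errors before you deploy, not at 3 AM when production goes down.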

Traditional APIs limit you to a simple request and response. You ask for something, and you wait for the answer. gRPC shatters this limitation by supporting four distinct modes of communication.

The first is Unary, which is your classic request and response. But then it gets incredibly interesting with Streaming.

With Server Streaming, the client sends a single request, such as search for logs, and the server pushes back a continuous stream of results over time.

With Client Streaming, the client uploads a massive amount of continuous data, and the server replies just once at the end. This is absolutely perfect for large file uploads or sensor telemetry.

Finally, we have the holy grail: Bi-directional Streaming. Both sides can send messages whenever they want, entirely independently. The server can send a message while it is still reading requests from the client. Imagine building a live multiplayer video game server or a real-time chat application. You cannot easily do this with standard REST. This unmatched flexibility is what makes gRPC an absolute powerhouse.

Now, let us address the elephant in the room. If REST works so well, why should anyone switch?

It is crucial to understand that this is not about one technology being universally better than the other. It is about choosing the right tool for the job. REST has been the undisputed champion of the web for over a decade because it is flexible and runs absolutely everywhere. gRPC is the modern challenger, built specifically for speed, scale, and strict structure.

Let us compare them head to head across a few key categories to see exactly where each one shines.

The first major difference is entirely philosophical. REST is resource oriented. You are forced to think in terms of Nouns like Books, Users, or Orders, and you manipulate them with standard HTTP verbs like GET or POST. This is very elegant for simple data management. gRPC, on the other hand, is action oriented. You think in Verbs. You define exact methods like GetBook or TriggerWorkflow. If your backend performs complex operations that do not fit neatly into a simple PUT or DELETE, gRPC will feel much more natural to your engineering team.

We touched on the network differences earlier, but it bears repeating. REST typically rides on HTTP/1.1 and sends heavy text payloads. Even if you compress it, it is still text. gRPC is binary all the way down. It uses HTTP/2 advanced features like header compression and binary framing by default. You do not have to spend days configuring these optimizations; they come straight out of the box.

However, here is where REST wins hands down: Browser Support. Every single web browser in the world speaks HTTP and JSON natively. You can open your developer console right now and type a fetch request. gRPC is much stricter. Because modern browsers do not give developers fine-grained control over HTTP/2 frames, you cannot call a gRPC service directly from Chrome or Safari today. You need the gRPC-Web client library in the browser plus a proxy, such as Envoy, that translates between gRPC-Web and regular gRPC. If you are building a public website, this definitely adds a layer of complexity.

Similarly, debugging REST is easy. You can literally read the network traffic with your own eyes. Debugging gRPC requires a slight mindset shift. Since the wire data is binary, it looks like absolute garbage to human eyes. You need specialized tools that have access to your .proto files to decode the messages back into something readable.

But when we talk about pure performance, the numbers can be striking. For internal microservices communication, some published benchmarks report gRPC being roughly 7 to 10 times faster than REST with JSON, although the exact gain depends heavily on payload size and workload, so measure your own traffic. This is not just network speed; it is CPU speed. Your servers spend significantly less time parsing text and more time processing actual business logic. If you are paying for cloud compute by the millisecond, gRPC can lower your infrastructure bill.

So, what is the final verdict?

Stick with REST for your Public APIs. If you are building the next Stripe or Twitter API for the entire world to consume, stick to REST and JSON because it is universal. Use it for browser heavy applications or quick prototypes where you do not want to manage strict contracts.

Choose gRPC for your Internal Microservices. It is the de facto standard for a reason. If you have Service A talking to Service B inside your Kubernetes cluster, use gRPC. It is perfect for polyglot environments and fantastic for mobile apps where saving battery and bandwidth is crucial.

And remember, you do not always have to choose just one. A very common pattern is the Hybrid Approach. You expose a friendly REST API to the outside world for browsers and public clients. But as soon as that request hits your API Gateway, it is translated into gRPC for all internal server communication.

Enough theory. Let us write some actual code. In this section, we are going to build a real gRPC service from scratch.

We will build a Route Guide system, which is a classic example where clients can send their coordinates and get information about points of interest.

It all starts with the contract. You open a simple text file and name it route_guide.proto. Here, we define our RouteGuide service. We will define a simple Unary method called GetFeature. It takes a Point containing latitude and longitude, and it returns a Feature containing a name and a location.
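A contract matching that description could look like the sketch below. It is modeled on the Route Guide example from the official gRPC tutorials; the package name and the choice of int32 for the coordinates are assumptions.

```proto
syntax = "proto3";

package routeguide;

// The service: one unary method for now.
service RouteGuide {
  rpc GetFeature(Point) returns (Feature) {}
}

// A position on the map.
message Point {
  int32 latitude = 1;
  int32 longitude = 2;
}

// A named point of interest at a given location.
message Feature {
  string name = 1;
  Point location = 2;
}
```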

Notice how incredibly clean this is? We are not worrying about JSON structures or URL paths. We are just defining clear data and behavior.

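Next, we run the magic command: protoc. This compiler tool reads our proto file and generates all the heavy lifting code for us automatically. If we are using Python, where protoc is usually invoked through the grpcio-tools package, it creates two files. One, route_guide_pb2.py, contains our data structures, and the other, route_guide_pb2_grpc.py, contains the client stub, the server base class, and the networking glue.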

There is a golden rule in gRPC development: You never touch these generated files. If you need to change your API, you change the .proto file and simply regenerate them.

Now we switch over to our own Python file to write the server. The generator provided a base class called RouteGuideServicer. All we have to do is inherit from it and write the business logic for GetFeature.

Look how beautifully simple the implementation is. The framework hands us the request object ready to use. We do our business logic, like looking up a database, and then we just return a Feature object. We do not have to serialize anything to JSON. We do not have to set HTTP status codes manually. We just return the Python object, and gRPC handles the rest.

To actually start the server, we create a gRPC server object, define a thread pool to handle concurrent requests, register our class, bind it to a port like 50051, and hit start. That is it. Our high performance binary server is officially running.
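Here is a rough sketch of the servicer. So that it runs without grpcio or generated code, `Point` and `Feature` are plain dataclasses standing in for the generated message classes, and the feature database is a hypothetical in-memory dict. The commented lines show how registration and startup would typically look with the real library.

```python
from dataclasses import dataclass


# Stand-ins for the generated message classes (assumptions, not protoc output).
@dataclass(frozen=True)
class Point:
    latitude: int
    longitude: int


@dataclass(frozen=True)
class Feature:
    name: str
    location: Point


# Hypothetical in-memory "database" keyed by (latitude, longitude).
FEATURES = {(10, 20): "Hypothetical Lighthouse"}


class RouteGuideServicer:
    """In a real project this would inherit from the generated
    route_guide_pb2_grpc.RouteGuideServicer base class."""

    def GetFeature(self, request: Point, context) -> Feature:
        # Business logic only: look the point up and return a plain object.
        name = FEATURES.get((request.latitude, request.longitude), "")
        return Feature(name=name, location=request)


# With grpcio, registration and startup would look roughly like this:
#   server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
#   route_guide_pb2_grpc.add_RouteGuideServicer_to_server(RouteGuideServicer(), server)
#   server.add_insecure_port("[::]:50051")
#   server.start()
#   server.wait_for_termination()

servicer = RouteGuideServicer()
found = servicer.GetFeature(Point(10, 20), context=None)
missing = servicer.GetFeature(Point(0, 0), context=None)
print(found.name)
print(repr(missing.name))  # unknown points come back with an empty name
```

Because the handler is an ordinary method that takes an object and returns an object, it can be unit tested like this without starting a server at all.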

Building the client is just as easy. We need two things: a Channel and a Stub. The Channel is the active connection to the server. It handles the TCP connection and security. The Stub is the object we actually interact with.

We open a channel to localhost on port 50051. Then, we just call feature = stub.GetFeature(my_point). This is the true magic of RPC. It looks exactly like a local Python function call. But behind the scenes, gRPC serialized that point, fired it over HTTP/2, waited for the server, and unpacked the resulting Feature. All of that network complexity is completely hidden from you.

What happens if the server is down? In REST, you check for an HTTP 404 or 500 status code. In gRPC, we catch exceptions. If the call fails, the Python client raises grpc.RpcError, and you can inspect its canonical status code to see whether the server was unreachable (UNAVAILABLE), the call timed out (DEADLINE_EXCEEDED), the caller lacked permission (PERMISSION_DENIED), or the record was missing (NOT_FOUND).

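The development loop, in summary: define the contract in a .proto file, generate the code with protoc, implement the server by inheriting from the generated servicer class, and call it from the client through a channel and a stub.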

We have built a simple service together. That is a great start, but the real world is rarely simple. Sometimes data does not come in a single chunk; it flows continuously like water. Sometimes we need to secure that data against prying eyes. And eventually, we need to scale our application to handle millions of concurrent users.

In this section, we are going to unlock the advanced capabilities that make gRPC a true superhero in the microservices world.

Let us start with Server-Side Streaming. Imagine you want to show a live feed of stock prices or a real time server log. In the REST world, you would have to constantly poll the server every single second, asking if there is new data. It is incredibly wasteful. With gRPC, the client sends a single request saying “I want to subscribe.” The server then keeps the channel open and pushes data continuously whenever an event occurs. You simply add the stream keyword to the return type in your proto file.

Implementing this is deceptively simple. On the server side, instead of returning a static list, we use the Python yield keyword. Every time we yield, a message is flushed to the network immediately. On the client side, it looks exactly like iterating over a list in memory. The loop just blocks and waits for the next item to arrive. It feels entirely intuitive.
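The shape of that code can be shown with a plain generator. This sketch does not use grpcio; the price feed, the symbol, and the message layout are made up for illustration. In a real server-streaming handler the generator would be a servicer method, and each `yield` would send one message on the stream.

```python
import time
from typing import Iterator


def stream_prices(symbol: str, ticks: int) -> Iterator[dict]:
    """Server side: a generator. Each yield corresponds to one streamed message."""
    price = 100.0
    for tick in range(ticks):
        time.sleep(0.01)   # pretend we are waiting for the next market event
        price += 0.5       # hypothetical price movement
        yield {"symbol": symbol, "tick": tick, "price": price}


# Client side: a plain for loop that blocks until the next message arrives.
received = []
for update in stream_prices("ACME", ticks=3):
    received.append(update["price"])
    print(update)
```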

Now let us flip that scenario around. What if the client is generating the massive flood of data? Maybe you are uploading a 1GB video file or you have a temperature sensor sending readings every millisecond. You absolutely do not want to open a new HTTP connection for every single reading. With Client-Side streaming, the client writes to the stream continuously. The server listens, aggregates the data, and when the client finally says it is done, the server sends back a single summary response.
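In Python terms, the handler receives an iterator of requests and returns exactly one response. The following grpc-free sketch uses hypothetical sensor readings and a made-up summary format to show that shape.

```python
from typing import Iterable, Iterator


def record_temperatures(readings: Iterable[float]) -> dict:
    """Server side of a client-streaming call: consume every message,
    then reply once with a summary."""
    count = 0
    total = 0.0
    highest = float("-inf")
    for reading in readings:  # waits for the client to send the next reading
        count += 1
        total += reading
        highest = max(highest, reading)
    if count == 0:
        return {"count": 0, "average": None, "highest": None}
    return {"count": count, "average": total / count, "highest": highest}


def sensor() -> Iterator[float]:
    """Client side: a generator that produces readings over time."""
    yield from [20.0, 21.0, 22.0]  # hypothetical readings


summary = record_temperatures(sensor())
print(summary)
```

The server never holds the whole upload in memory; it only keeps the running totals it needs for the final reply.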

Then we have the crown jewel: Bi-directional streaming. In this mode, both the client and the server can send messages whenever they want. They do not have to take turns. The server can send a message while it is still actively reading requests from the client. This is essential for things like a live chat application. I can type a message to you, but I also need to receive messages from others at the exact same time. It enables truly reactive architectures.
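A bidirectional handler combines the two previous shapes: it takes an iterator and is itself a generator. The sketch below is a simplified, single-threaded illustration with invented messages; a real chat service would typically use threads or asyncio so that both directions are fully independent.

```python
from typing import Iterable, Iterator


def chat(incoming: Iterable[str]) -> Iterator[str]:
    """Reads a stream and yields replies as it goes, without waiting
    for the incoming stream to finish."""
    yield "server: welcome"  # sent before any client message has been read
    for message in incoming:
        yield f"server: got {message!r}"
        if message == "ping":
            yield "server: pong"  # one request may produce several replies


def client_messages() -> Iterator[str]:
    for text in ["hello", "ping"]:
        print(f"client sends {text!r}")
        yield text


transcript = list(chat(client_messages()))
for line in transcript:
    print(line)
```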

Let us switch gears to Security. By default, when we develop locally on our machines, we use insecure channels. But in a production environment, you cannot send unencrypted data. gRPC makes enabling encryption straightforward because it uses standard TLS. On the client, you swap insecure_channel for secure_channel and pass in credentials built from your certificates. On the server, you bind a secure port with your certificate and private key instead of an insecure one. Now your entire connection is encrypted.

Encryption protects the wire itself, but how do we know who is actually calling our API? In REST, we use HTTP Headers for authentication tokens. In gRPC, we call this concept Metadata. Metadata is just key and value pairs that travel alongside your request. You can attach things like Authorization Tokens, Trace IDs, or User IDs. The server can then inspect this metadata to decide whether to allow the request or return a Permission Denied error.

You definitely do not want to write “Check Authentication” code inside every single function you write. That gets messy fast. In web frameworks like Express or Django, we use Middleware. In gRPC, we use Interceptors. An interceptor sits right in front of your service. You can use it to log traffic, validate tokens, or measure performance metrics globally.
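The pattern itself is just a wrapper around a handler. The sketch below is a grpc-free illustration: the handler signature, the `PermissionDenied` exception, and the token are all hypothetical. In grpcio, metadata is sent as key-value pairs with the call and read on the server through the context, and server interceptors are written by subclassing `grpc.ServerInterceptor`.

```python
from typing import Callable, Dict

Handler = Callable[[dict, Dict[str, str]], dict]


class PermissionDenied(Exception):
    """Stand-in for a PERMISSION_DENIED status; not a grpc class."""


def auth_interceptor(handler: Handler, valid_token: str) -> Handler:
    """Wrap a handler so the token check runs before every call."""
    def intercepted(request: dict, metadata: Dict[str, str]) -> dict:
        if metadata.get("authorization") != f"Bearer {valid_token}":
            raise PermissionDenied("missing or invalid token")
        return handler(request, metadata)
    return intercepted


def get_feature(request: dict, metadata: Dict[str, str]) -> dict:
    # Pure business logic: no authentication code in here.
    return {"name": "Hypothetical Lighthouse", "location": request}


secured_get_feature = auth_interceptor(get_feature, valid_token="s3cret")

point = {"latitude": 10, "longitude": 20}
ok = secured_get_feature(point, {"authorization": "Bearer s3cret", "trace-id": "abc123"})
try:
    secured_get_feature(point, {})  # no metadata at all
    denied = False
except PermissionDenied:
    denied = True
print(ok["name"], denied)
```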

Finally, we need to talk about scaling. This is historically the trickiest part of gRPC. In the old HTTP/1.1 world, connections are short-lived: you make a request, get a response, and the connection is soon closed. A standard connection-level (Layer 4) Load Balancer simply distributes these new connections. But remember that gRPC uses HTTP/2, which keeps a single long-lived connection open and sends every request over it. If you start a client, it connects to Server A and stays there. Even if you spin up Server B and C to help with the load, the client's existing connection never reaches them. This is known as the Sticky Connection problem.

How do we fix this? We have two main strategies. First is Client-Side Load Balancing, where the client becomes smart, queries a registry, and rotates its own requests. The second and much more popular approach in Kubernetes environments is using a Layer 7 Proxy like Envoy or Nginx. The proxy sits in the middle, terminates the long lived connection, and distributes the individual internal requests fairly across your backend servers.
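The difference between a sticky connection and per-request balancing can be simulated in a few lines. The backend addresses below are hypothetical, and round robin is just one possible policy, chosen here because it is the simplest to follow.

```python
import itertools
from collections import Counter

BACKENDS = ["server-a:50051", "server-b:50051", "server-c:50051"]  # hypothetical


class RoundRobinPicker:
    """Hands out backends in rotation, one per request."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


def sticky(requests: int) -> Counter:
    """One long-lived connection: every request lands on the first backend."""
    return Counter(BACKENDS[0] for _ in range(requests))


def balanced(requests: int) -> Counter:
    """Per-request balancing: each request may go to a different backend."""
    picker = RoundRobinPicker(BACKENDS)
    return Counter(picker.pick() for _ in range(requests))


print(sticky(9))    # all nine requests hit server-a
print(balanced(9))  # three requests per backend
```

Whether the picker lives in the client or in a proxy, the key point is the same: balancing has to happen per request, not per connection.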

We know how it works. Now let us talk about where it works and how to avoid shooting yourself in the foot.

gRPC is the backbone of companies like Netflix, Uber, and Cisco. It is the ultimate Microservices Backbone.

It is also perfect for IoT and Mobile devices because the tiny binary payloads save crucial bandwidth and battery life.

It is the ultimate Polyglot Bridge, allowing your Python Data Science team to talk flawlessly to your Java backend team.

If you are going to use it, you need to follow the best practices. I call these the Golden Rules.

Rule number one: Always set a deadline. Default gRPC calls can wait forever. If a server hangs, your client hangs. You must set a timeout on every call to force the system to fail fast and recover.
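The fail-fast idea can be sketched without grpcio by giving the caller a time budget. Everything here is a stand-in: `slow_backend` simulates a hanging server, and the returned string mimics the DEADLINE_EXCEEDED status. With the Python gRPC library you would instead pass a `timeout` argument to the stub call and catch `grpc.RpcError`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def slow_backend(seconds: float) -> str:
    """Stand-in for a remote call that takes `seconds` to answer."""
    time.sleep(seconds)
    return "done"


def call_with_deadline(seconds_of_work: float, timeout: float) -> str:
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_backend, seconds_of_work)
    try:
        # Stop waiting once the budget is spent instead of hanging forever.
        return future.result(timeout=timeout)
    except FutureTimeout:
        return "DEADLINE_EXCEEDED"
    finally:
        pool.shutdown(wait=False)  # do not block on the abandoned work


fast = call_with_deadline(seconds_of_work=0.01, timeout=1.0)
slow = call_with_deadline(seconds_of_work=0.3, timeout=0.05)
print(fast, slow)
```

One difference worth noting: in this sketch the abandoned worker keeps running in the background, whereas gRPC propagates the deadline to the server so it can stop working on a call nobody is waiting for.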

Rule number two: Respect the schema. The true power of Protobuf is backward compatibility. You can add new fields, but you must never change the numeric tag of an existing field, and you should never reuse the tag of a field you have deleted. If ‘Name’ is field number 1, it stays field number 1 forever.
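A check like this is easy to automate. The sketch below is a hypothetical helper, not part of any Protobuf tooling: it takes two versions of a message described as plain name-to-tag dicts and reports any field whose tag changed.

```python
def changed_tags(old_fields: dict, new_fields: dict) -> list:
    """Compare {field_name: tag} maps from two versions of one message
    and report every existing field whose numeric tag was changed."""
    problems = []
    for name, old_tag in old_fields.items():
        new_tag = new_fields.get(name)
        if new_tag is not None and new_tag != old_tag:
            problems.append(f"{name}: tag changed {old_tag} -> {new_tag}")
    return problems


v1 = {"name": 1, "location": 2}
v2_safe = {"name": 1, "location": 2, "rating": 3}   # only adds a new field
v2_broken = {"name": 2, "location": 1}              # swaps existing tags

print(changed_tags(v1, v2_safe))
print(changed_tags(v1, v2_broken))
```

Running a check of this kind in CI against the previous version of each .proto file catches breaking changes before any client ever sees them.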

Rule number three: Centralize your Protos. Do not copy and paste your proto files into every single project. Keep them in a Central Repository, use a CI/CD pipeline to generate the code, and publish them as versioned libraries.

Rule number four: Use Rich Error Handling. Standard error codes are rarely enough. Attach detailed metadata to your errors so the client developer knows exactly which field failed validation.

Rule number five: Mind the size. gRPC is heavily optimized for small and incredibly fast messages. By default, most implementations will reject an incoming message larger than 4 megabytes. If you hit this limit, do not just increase the configuration. Use streaming to break your large object into chunks.
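Chunking itself is only a few lines. In this sketch the 64 KiB chunk size is an arbitrary assumption, and the 10 MiB payload is just zeros; in a real service each chunk would be wrapped in a message and sent over a client or server stream.

```python
from typing import Iterable, Iterator

MAX_MESSAGE_BYTES = 4 * 1024 * 1024  # the 4 MB default limit
CHUNK_BYTES = 64 * 1024              # hypothetical chunk size, well below the limit


def chunk(payload: bytes, size: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of the payload, each at most `size` bytes."""
    for start in range(0, len(payload), size):
        yield payload[start:start + size]


def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Receiver side: join the chunks back together in arrival order."""
    return b"".join(chunks)


big_object = bytes(10 * 1024 * 1024)  # 10 MiB, too large for one default message
pieces = list(chunk(big_object))
print(len(pieces), max(len(piece) for piece in pieces))
```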

Working with binary protocols can be intimidating at first because you cannot just read the network traffic like you do with JSON. Fortunately, the ecosystem has exploded with incredible tools.

If you love the command line, you need grpcurl. It allows you to talk to a server right from your terminal, automatically translating your JSON input into binary Protobuf behind the scenes.

If you prefer a visual interface, Postman now supports gRPC natively. For heavy duty streaming debugging, tools like Kreya or BloomRPC are absolutely fantastic.

The technology is not standing still either. The next frontier is HTTP/3, moving to UDP with the QUIC protocol to handle packet loss on unreliable mobile networks even better. By investing time in gRPC today, you are learning a technology that is heavily future proofed.

We have covered a massive amount of ground today.

If you want to dive deeper, I highly recommend bookmarking the official docs at grpc.io. The Awesome gRPC repository on GitHub is a goldmine, and O’Reilly’s ‘gRPC: Up and Running’ is the standard textbook for mastering this stack.

Also, if you want to revisit any of the topics we discussed today or see the live Q&A session, you can catch the full replay of the stream below:

Thank you so much for joining me on this deep dive. Writing this out after my session with the Cloud Native Addis Ababa community has reminded me just how powerful our modern tooling has become. I hope you feel empowered to go open your code editor and build something significantly faster and stronger using gRPC.

If you build something cool, or if you just want to talk about system architecture, please connect with me on LinkedIn or Twitter. I absolutely love seeing what the community builds.

Happy coding ❤
