
Aisalkyn Aidarova


Confluent Cloud

Confluent is a company that provides commercial support for Kafka. When enterprises use open-source software, they often look for paid product support: if the software has bugs or security issues, they need tech support within a guaranteed time frame. Confluent is one of the companies that provide such support.

Kafka is a stream processing framework. There are many of them out there, both proprietary and open-source.

Kafka is popular because it’s open-source, highly performant and flexible. I’m not going to go into lengthy comparisons with other frameworks. Instead, I’ll try to explain why you should use stream processing in the first place.

Why stream processing?

If you’re building a system that handles large volumes of data, which is increasingly common these days, you need to take advantage of distributed computing to horizontally scale your processing across multiple servers. Vertical scaling can only take you so far.

In order to achieve this, you should aim for small, stateless services. Each service takes an input and produces an output without depending on storage. This way, you can run the same process on many servers, processing events in parallel. You can think of each service as a simple input/output system, or a function.
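The "service as a function" idea above can be sketched in a few lines. This is a minimal, hypothetical example (the event fields and function name are invented for illustration): a stateless transform that depends only on its input, so identical copies can run on any number of servers in parallel.

```python
import json

def enrich_order(raw_event: bytes) -> bytes:
    """Stateless transform: parse an order event, add a derived field,
    and emit the result. No external storage is read or written, so the
    output depends only on the input."""
    event = json.loads(raw_event)
    event["total_cents"] = event["quantity"] * event["unit_price_cents"]
    return json.dumps(event).encode("utf-8")

# Same input always yields the same output, on any worker.
result = enrich_order(b'{"quantity": 3, "unit_price_cents": 250}')
```

Because the function holds no state, scaling out is just a matter of running more copies and routing events to them.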

Such a system would have decentralized orchestration. Instead of having a centralized agent orchestrate the work to be done, the services communicate with messages. The output of one service becomes a message, which can trigger other services that consume it.

There are many advantages to this approach. One is that you rely less on storage, which is notoriously tricky to scale horizontally. Another is that you can scale dynamically based on messaging load. By using cloud services and on-demand computing, you can also greatly reduce cost, since you only pay for CPU time.

To ingest data into this system and send messages between the services, you need a stream processing framework. In principle, it works like this:

The event stream, which is a distributed publish/subscribe message queue, exchanges messages between all the different parts of the system. Ideally, each sub-component is a well-defined function that takes an input and produces a consistent output, without depending on state. It’s like functional programming, except on a higher abstraction level.
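To make the chaining concrete, here is a toy in-memory publish/subscribe broker. This is purely conceptual (the `MiniBroker` class and topic names are invented for illustration; a real deployment would use Kafka topics, partitions, and consumer groups): the output of one stateless service becomes a message on a topic, which triggers the next service.

```python
from collections import defaultdict

class MiniBroker:
    """Toy pub/sub hub: handlers subscribe to topics, publishing a
    message invokes every handler subscribed to that topic."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

broker = MiniBroker()
results = []

# Service A: consumes "raw" events, normalizes them, emits to "clean".
broker.subscribe("raw", lambda m: broker.publish("clean", m.strip().lower()))
# Service B: consumes "clean" events (here it just collects them).
broker.subscribe("clean", results.append)

broker.publish("raw", "  Hello World  ")
# results == ["hello world"]
```

Note that neither service knows about the other; they are coupled only through topic names, which is exactly the decentralized orchestration described above.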

The messages from the stream can be queued up and re-processed at any time, which means the services, or functions, are easily testable.
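Testing such a service amounts to replaying recorded messages through the function and asserting on the outputs. A minimal sketch, with a hypothetical transform:

```python
def uppercase_name(event: dict) -> dict:
    # Deterministic, stateless transform: same input, same output.
    return {**event, "name": event["name"].upper()}

# Messages previously captured from the stream can be replayed at will.
recorded = [{"name": "alice"}, {"name": "bob"}]
replayed = [uppercase_name(e) for e in recorded]
assert replayed == [{"name": "ALICE"}, {"name": "BOB"}]
```

No broker, mocks, or fixtures are needed, because the function has no dependencies to stub out.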

Stream processing works beautifully for highly event-based scenarios with high-velocity data, but it also works well for a number of other scenarios.

Who are the biggest adopters of Confluent Kafka and Apache Kafka?

Apache Kafka is now adopted across virtually every industry. "Big" can mean different things here:

Biggest workloads: Early adopters from Silicon Valley such as LinkedIn, Uber, and Netflix regularly report on their massive (and still growing) volumes.
IoT solutions naturally generate massive data volumes, too. Tesla is a famous example, processing trillions of messages from its IoT fleet (cars, energy products, etc.) with Kafka.
Global banks have some of the biggest deployments in terms of geographic spread, across many regions and continents. These global deployments usually contain hundreds of Kafka clusters provided via a self-service API.
Another perspective on big deployments is event streaming as a platform strategy. Some enterprises use Kafka heavily while also running other solutions such as RabbitMQ, Pulsar, or MSK. A big Kafka deployment in this sense is when an enterprise strategically standardizes on Kafka (and a single vendor behind it) for all its platforms (where it makes sense to use Kafka!).
