Elasticsearch Overview（2）- Cluster & Terminology

Richard Zhang — Fri, 15 Nov 2024 08:10:44 +0000

What is an Elasticsearch Cluster?

1. Introduction

We discussed Elasticsearch earlier. What is an Elasticsearch cluster? A cluster is a pool of nodes that together provide Elasticsearch functionality. The nodes may be separate physical machines or Docker containers, and they may be located in the same or different geographical locations.

2. Cluster models

Clusters can be laid out in many ways. Two popular models are the all-in-one cluster and the role-based multi-node cluster. On the left, you can see a multi-node cluster: three master nodes, four data nodes (green), one coordinator node, and two gray nodes.

3. Node roles

Master nodes have a special purpose: they manage the cluster, exchanging information about the cluster with the other nodes to keep it stable. They do not do any data processing. Data nodes are where logs or other data are stored; they are the actual storage nodes. The coordinator node acts as a client: it accepts requests and fulfills them by gathering results from the data nodes. Ingest nodes help get data into the Elasticsearch data nodes. Coordinator and ingest nodes are optional but useful.
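The roles above map onto the node.roles setting in each node's elasticsearch.yml. The following is a minimal Python sketch that only renders that setting as text. The node names are hypothetical and the helper render_node_roles is an illustration, not part of Elasticsearch. The role names master, data, and ingest, and the empty list for a coordinating-only node, follow the node.roles setting.

```python
# Hypothetical node names mapped to their roles. An empty role list
# makes the node coordinating-only.
NODE_ROLES = {
    "master-1": ["master"],
    "data-1": ["data"],
    "ingest-1": ["ingest"],
    "coord-1": [],
    # An all-in-one node carries several roles at once.
    "all-in-one-1": ["master", "data", "ingest"],
}


def render_node_roles(roles):
    """Return the elasticsearch.yml line for a list of role names."""
    if not roles:
        return "node.roles: [ ]"
    return "node.roles: [ " + ", ".join(roles) + " ]"


for name, roles in NODE_ROLES.items():
    print(f"{name}: {render_node_roles(roles)}")
```

Every node can coordinate requests regardless of its roles, which is why a dedicated coordinator is expressed as a node with no roles rather than with a role of its own.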

4. Best Practices

A three-node all-in-one cluster means that each of the three nodes acts as master, data node, and coordinator at the same time. However, for easier scaling and a more robust cluster, a role-based multi-node cluster is the better choice.

Multi-Node Cluster

3 Node Cluster

Terminology & How it works

1. Raising the question

We have been discussing the Elasticsearch cluster. Now, let's think about what is inside the cluster. Specifically, how is the data stored, and how does data flow through Elasticsearch?

2. Internal structure of the cluster

The cluster we have discussed is made up of nodes. The data nodes inside the cluster are where the data is stored. The data is organized into indexes, which are logical groupings of data. An index can be split into multiple shards, and the shards are where the data is actually stored. Documents are the individual data records in Elasticsearch.

3. The relationship between document storage and indexing

All documents are in JSON format and consist of key-value pairs. They are stored in shards. Inside a shard, the data is physically held in another structure called segments: a group of segments constitutes a shard. When we combine all the shards together, we get an index. The index is the logical entity that you search for data.
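To make the document-to-shard relationship concrete, here is a minimal Python sketch of how documents could be spread across the shards of one index. By default Elasticsearch picks the shard by hashing a routing value (the document ID unless one is supplied) and taking it modulo the number of primary shards. The sketch assumes zlib.crc32 as a stand-in hash, which is not the hash Elasticsearch uses internally, and the function name pick_shard and the sample documents are hypothetical.

```python
import json
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number.

    zlib.crc32 is only a stable stand-in hash for this sketch; it is an
    assumption, not the hash function Elasticsearch uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Hypothetical JSON documents (key-value pairs), keyed by document ID.
docs = {
    "1": {"host": "web-01", "status": 200},
    "2": {"host": "web-02", "status": 404},
    "3": {"host": "db-01", "status": 500},
}

NUM_PRIMARY_SHARDS = 3
shards = {n: [] for n in range(NUM_PRIMARY_SHARDS)}

for doc_id, doc in docs.items():
    shards[pick_shard(doc_id, NUM_PRIMARY_SHARDS)].append(json.dumps(doc))

# The index is the union of all its shards.
for shard_no, stored in shards.items():
    print(shard_no, stored)
```

Because the shard is derived from the number of primary shards, the same document ID always lands on the same shard as long as that number stays fixed.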

4. Data Inflow

In the picture below, you can see two colors. Blue represents how data flows into Elasticsearch. Data sources can be various entities, and Elasticsearch supports different data types, including unstructured data. Data sources may be logs, caches, or services and infrastructure directly, such as Windows and Linux hosts and web servers such as Nginx and Apache. These data sources send data to Elasticsearch, where it is first evaluated and then processed.

5. Data Characteristics and Processing

So, what does your data look like? What key-value pairs does it have? What are its data types? All this metadata is extracted and processed into the Elasticsearch index. The index is the logical entity we search. On the other side, green represents the search side of Elasticsearch. For example, Kibana is a tool provided by the Elastic Stack. You can also connect through the API or integrate enterprise search into a web application. Elasticsearch provides clients for multiple languages. These clients connect to the Elasticsearch cluster, the cluster queries the index, and you get the results you expect. This is how the data flow works.
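The step of working out a document's keys and data types can be sketched in a few lines of Python. This is a simplified illustration loosely modeled on Elasticsearch's dynamic field mapping, not its actual implementation: the function names are hypothetical, strings are always reported as text (date and numeric detection and the keyword sub-field are left out), and "unknown" is a placeholder used only by this sketch.

```python
def infer_field_type(value):
    """Guess a field type from a parsed JSON value (simplified sketch)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "text"
    return "unknown"  # placeholder for cases this sketch does not cover


def infer_mapping(document):
    """Return a field-name to field-type mapping for one document."""
    return {key: infer_field_type(value) for key, value in document.items()}


# Hypothetical log document arriving from a data source.
log_line = {"message": "GET /index.html", "status": 200, "took": 0.42, "cached": False}
print(infer_mapping(log_line))
```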

Kafka Overview（1）- Topic

Richard Zhang — Wed, 06 Nov 2024 03:05:41 +0000

Agenda

Kafka Architecture
- Basic concepts (topic/partition/consumer group/commit log/offset)
Delve into the process of sending a message to the broker
Technical Highlights

Architecture - Overview

Architecture - Detail

Architecture - Basic Concepts

Topics

A Kafka topic defines a channel through which data is streamed. Producers publish messages to topics, and consumers read messages from the topics they subscribe to.

Topics organize and structure messages, with particular types of messages published to particular topics. Topics are identified by unique names within a Kafka cluster, and there is no limit on the number of topics that can be created.
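The publish and subscribe relationship can be illustrated with a toy in-memory model. This is a minimal Python sketch, not the Kafka client API: the class ToyBroker, its methods, and the topic names are all hypothetical, and partitions, consumer groups, and persistence are deliberately left out.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only list per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to a topic and return its offset in that topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return all messages in a topic starting at the given offset."""
        return self._topics.get(topic, [])[offset:]


broker = ToyBroker()

# Producers publish particular kinds of messages to particular topics.
broker.publish("orders", "order-created:1001")
broker.publish("orders", "order-created:1002")
broker.publish("payments", "payment-received:1001")

# A consumer subscribed to "orders" only sees messages from that topic.
for message in broker.read("orders"):
    print(message)
```

Keeping each topic as its own append-only list mirrors the idea that a topic name identifies one channel, and that consumers read from a position in that channel rather than removing messages from it.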

DEV Community: Richard Zhang