<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Evaldas Buinauskas</title>
    <description>The latest articles on DEV Community by Evaldas Buinauskas (@buinauskas).</description>
    <link>https://dev.to/buinauskas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F38132%2F185ef9c8-8948-4057-a1d6-23819249a158.jpg</url>
      <title>DEV Community: Evaldas Buinauskas</title>
      <link>https://dev.to/buinauskas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/buinauskas"/>
    <language>en</language>
    <item>
      <title>Vinted Search Scaling Chapter 6: 4th generation of Elasticsearch metrics</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Wed, 22 Dec 2021 06:39:59 +0000</pubDate>
      <link>https://dev.to/vinted/vinted-search-scaling-chapter-6-4th-generation-of-elasticsearch-metrics-2ga7</link>
      <guid>https://dev.to/vinted/vinted-search-scaling-chapter-6-4th-generation-of-elasticsearch-metrics-2ga7</guid>
      <description>&lt;p&gt;An old fact from 2014 December 12th was significant for Vinted: the company switched from &lt;a href="http://sphinxsearch.com/" rel="noopener noreferrer"&gt;Sphinx&lt;/a&gt; search engine to Elasticsearch 1.4.1. At the time of writing this post, we use Elasticsearch 7.15. Without a doubt, a lot has happened in between. This chapter will focus on Elasticsearch metrics, we will share our accumulated experiences from four generations of collecting metrics.&lt;/p&gt;

&lt;p&gt;Managing search requires both product and engineering efforts – two complementary parts. At Vinted, search engineers work with product, maintain a sound infrastructure, set up a &lt;a href="https://dev.to/vinted/vinted-search-scaling-chapter-1-indexing-4ii1"&gt;scalable indexing pipeline&lt;/a&gt; and scale &lt;a href="https://dev.to/vinted/vinted-search-scaling-chapter-5-divide-and-conquer-m7f"&gt;search throughput&lt;/a&gt;; none of this is attainable without proper metrics.&lt;/p&gt;

&lt;p&gt;The first generation of metrics was collected by parsing the &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/cat.html" rel="noopener noreferrer"&gt;/_cat APIs&lt;/a&gt; with Ruby scripts. Nothing sophisticated: we ran Elasticsearch 1.x–2.x, and monitoring was done on demand with Elasticsearch plugins such as ElasticHQ.&lt;/p&gt;

&lt;p&gt;The second generation ran on the &lt;a href="https://sensu.io/" rel="noopener noreferrer"&gt;Sensu&lt;/a&gt; observability pipeline, with Graphite as the persistence layer for the collected Elasticsearch metrics. We used it to collect metrics for indices, shards and segments. We ran Elasticsearch 2.x–5.x at the time, and monitoring segments was important because &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/1.7/index-modules-merge.html" rel="noopener noreferrer"&gt;fine-tuning&lt;/a&gt; the merge policy improved read performance. The drawback was the storage layer: Graphite downsampled metrics spanning longer periods, and that sampling was the main reason for moving to another storage engine.&lt;/p&gt;

&lt;p&gt;The third generation of metrics ran on &lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;, a &lt;a href="https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push" rel="noopener noreferrer"&gt;pull-based&lt;/a&gt; metrics collection system. Prometheus was not yet established as the de facto monitoring system; we ran Elasticsearch 5.x, just as open-source Elasticsearch exporters were emerging. We tried out numerous open-source exporters and settled on one maintained by a video streaming company. As our infrastructure grew, that exporter failed to deliver: it occasionally ran out of memory or timed out on &lt;code&gt;/metrics&lt;/code&gt; endpoint requests. It lacked fine-grained configuration, such as limiting unnecessary metrics or configuring polling time per subsystem. Metrics were static, and their naming was inconsistent. The authors did not accept code changes from the OSS community, development was inactive, and metrics from recent Elasticsearch versions weren’t being added. We branched an in-house fork from the upstream exporter repository, but it soon became apparent that the fork was turning into a complete rewrite. Since the rewriting effort was already so significant, we decided to write a completely new exporter instead.&lt;/p&gt;

&lt;p&gt;The fourth-generation Elasticsearch exporter solves multiple problems. It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Work with large clusters (400+ nodes)&lt;/li&gt;
&lt;li&gt;Handle large amounts of metrics without crashing (the exporter has exported up to 940,272 metrics)&lt;/li&gt;
&lt;li&gt;Inject custom metric labels&lt;/li&gt;
&lt;li&gt;Handle ephemeral states of nodes, indices and shards&lt;/li&gt;
&lt;li&gt;Automatically generate new metrics and remove stale ones based on user-configurable lifetime&lt;/li&gt;
&lt;li&gt;Keep track of cluster-state metadata: node changes, cluster version and shard relocations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new Elasticsearch exporter is written in the Rust programming language and is open-sourced on GitHub: &lt;a href="https://github.com/vinted/elasticsearch-exporter-rs" rel="noopener noreferrer"&gt;github.com/vinted/elasticsearch-exporter-rs&lt;/a&gt;. The exporter uses the asynchronous &lt;a href="https://tokio.rs" rel="noopener noreferrer"&gt;Tokio runtime&lt;/a&gt;, the &lt;a href="https://github.com/tikv/rust-prometheus" rel="noopener noreferrer"&gt;Rust Prometheus instrumentation library&lt;/a&gt; and the official &lt;a href="https://github.com/elastic/elasticsearch-rs" rel="noopener noreferrer"&gt;Elasticsearch client library&lt;/a&gt;. Metrics collection is decoupled from serving the &lt;code&gt;/metrics&lt;/code&gt; endpoint. In addition, Elasticsearch time-based metrics in milliseconds are converted into seconds to comply with &lt;a href="https://prometheus.io/docs/practices/naming/" rel="noopener noreferrer"&gt;Prometheus best practices&lt;/a&gt; (“millis” suffixes are replaced by “seconds”, and “_bytes” and “_seconds” postfixes are added where appropriate).&lt;/p&gt;

&lt;p&gt;The subsystem namespace (&lt;code&gt;/_cat/{aliases,shards,nodes,..}&lt;/code&gt;, &lt;code&gt;/_nodes/{stats,info,usage}&lt;/code&gt;) is preserved down to the last leaf of the JSON tree. For example, the &lt;code&gt;jvm mem heap_max&lt;/code&gt; leaf of the Elasticsearch &lt;code&gt;/_nodes/info&lt;/code&gt; response is converted to the Prometheus metric &lt;code&gt;elasticsearch_nodes_info_jvm_mem_heap_max_in_bytes&lt;/code&gt;; this makes metric names intuitive to use.&lt;/p&gt;
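&lt;p&gt;As a rough illustration of this naming scheme, here is a minimal Rust sketch – not the exporter’s actual code, the function and its details are hypothetical – that joins a subsystem path and a JSON leaf path into a Prometheus-style metric name and applies the “millis” to “seconds” rename:&lt;/p&gt;

```rust
/// Hypothetical sketch of the naming scheme described above: the subsystem
/// namespace plus the path to a JSON leaf becomes one Prometheus metric name.
/// This only renames; the real exporter also converts the values themselves.
fn metric_name(subsystem: &str, json_path: &[&str]) -> String {
    // "/_nodes/info" -> ["nodes", "info"]
    let mut parts: Vec<String> = subsystem
        .trim_start_matches('/')
        .split(|c: char| c == '/' || c == '_')
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .collect();
    parts.insert(0, "elasticsearch".to_string());
    parts.extend(json_path.iter().map(|s| s.to_string()));
    let name = parts.join("_");
    // Rewrite Elasticsearch millisecond metrics to Prometheus-style seconds.
    name.replace("_in_millis", "_seconds").replace("_millis", "_seconds")
}

fn main() {
    assert_eq!(
        metric_name("/_nodes/info", &["jvm", "mem", "heap_max_in_bytes"]),
        "elasticsearch_nodes_info_jvm_mem_heap_max_in_bytes"
    );
    println!("{}", metric_name("/_cat/recovery", &["time_in_millis"]));
}
```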

&lt;p&gt;Custom labels are injected into associated subsystems. For example, when the cluster version changes, the injected, namespaced label &lt;code&gt;vin_cluster_version&lt;/code&gt; lets us distinguish metrics by version. This allows comparing Elasticsearch performance between clusters and nodes during a rolling version upgrade. Cluster metadata is updated every 5 minutes by default; the interval is user-configurable. Metadata updates are useful when a shard is relocated to another node or a node IP changes.&lt;/p&gt;

&lt;p&gt;We &lt;a href="https://dev.to/vinted/vinted-search-scaling-chapter-3-elasticsearch-index-management-3jb5"&gt;reindex content completely&lt;/a&gt; and frequently (up to 2–3 times per day). Reindexing leaves a trace of state changes that exporters collect: nodes being added, removed or drained, indices created and removed, shard relocations, re-sharding and alias changes all leave ephemeral state behind. The exporter handles this by keeping track of a custom lifetime per subsystem: a stale metric is deleted when the last heartbeat of its unique label set is older than the user-defined lifetime. Metric lifetimes help eliminate outdated metrics, save storage engine space and keep dashboards up to date.&lt;/p&gt;
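&lt;p&gt;The lifetime mechanism can be pictured with a small Rust sketch (hypothetical types and names, not the exporter’s implementation): remember a heartbeat per unique label set and evict entries whose heartbeat is older than the configured lifetime:&lt;/p&gt;

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical sketch of per-subsystem stale-metric cleanup: each unique
/// label set gets a heartbeat, and metrics unseen for longer than the
/// user-configured lifetime are dropped.
struct MetricTracker {
    lifetime: Duration,
    heartbeats: HashMap<String, Instant>, // label set -> last time seen
}

impl MetricTracker {
    fn new(lifetime: Duration) -> Self {
        Self { lifetime, heartbeats: HashMap::new() }
    }

    /// Record that a metric with this label set was just scraped.
    fn touch(&mut self, labels: &str, now: Instant) {
        self.heartbeats.insert(labels.to_string(), now);
    }

    /// Drop metrics not seen within the lifetime; returns how many were removed.
    fn evict_stale(&mut self, now: Instant) -> usize {
        let before = self.heartbeats.len();
        let lifetime = self.lifetime;
        self.heartbeats.retain(|_, last| now.duration_since(*last) <= lifetime);
        before - self.heartbeats.len()
    }
}

fn main() {
    let t0 = Instant::now();
    let mut tracker = MetricTracker::new(Duration::from_secs(300));
    tracker.touch("index=listings,shard=0", t0);
    // 400 simulated seconds later the metric has not been seen again,
    // so one entry is evicted.
    assert_eq!(tracker.evict_stale(t0 + Duration::from_secs(400)), 1);
}
```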

&lt;p&gt;Extensive configuration is possible via CLI flags. For instance, one can include or skip labels, skip metrics, define polling intervals between subsystems, enable/disable different subsystems and control metadata refresh intervals.&lt;/p&gt;

&lt;p&gt;The best part of the exporter is that it does not define static metrics. Instead, metrics are dynamic, based on &lt;a href="https://doc.rust-lang.org/std/sync/atomic" rel="noopener noreferrer"&gt;atomic&lt;/a&gt; structures, so subsystem metrics are constructed on every request. Because metrics are dynamic, we don’t have to define new metrics when the Elasticsearch version changes or remove deprecated ones, so no metrics maintenance is needed. Adding new metric subsystems is easy, but that rarely happens.&lt;/p&gt;

&lt;p&gt;At the moment, the exporter supports the following subsystems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=^.^=
/_cat/allocation
/_cat/shards
/_cat/indices
/_cat/segments
/_cat/nodes
/_cat/recovery
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/nodeattrs
/_cat/repositories
/_cat/templates
/_cat/transforms

/_cluster/health
/_cluster/stats

/_nodes/stats
/_nodes/usage
/_nodes/info

/_stats
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The metrics generation code is a little over 1,600 lines, split across 36 files.&lt;/p&gt;

&lt;p&gt;You can try running the exporter from a docker container by using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ docker run --network=host -it vinted/elasticsearch_exporter --elasticsearch_url=http://IP:PORT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exporter also exposes metrics about itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-scrape-times.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-scrape-times.png%3Fstyle%3Dcentered" alt="Vinted Elasticsearch exporter scrape times"&gt;&lt;/a&gt;&lt;br&gt;Subsystem scraping times
  &lt;/p&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-rss.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-rss.png%3Fstyle%3Dcentered" alt="Vinted Elasticsearch resident memory"&gt;&lt;/a&gt;&lt;br&gt;Exporters resident memory
  &lt;/p&gt;



&lt;p&gt;This is a sneak peek of Vinted’s Elasticsearch metrics infrastructure topology. The search infrastructure spans 3 data centers, and we run one Elasticsearch cluster per data center. Each exporter is triplicated for high availability between data centers and between racks inside a data center. Exporters generally serve a single purpose: scraping either a full subsystem or part of one, as in the case of a rich subsystem such as &lt;code&gt;/_nodes/stats&lt;/code&gt;. Exporters run side by side with Elasticsearch nodes in Docker containers and are configured using &lt;a href="https://www.chef.io" rel="noopener noreferrer"&gt;Chef&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-list.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-list.png%3Fstyle%3Dcentered" alt="Vinted Elasticsearch exporter list of used exports"&gt;&lt;/a&gt;&lt;br&gt;This is a production subsystems list of used exporters
  &lt;/p&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-master-nodes.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-master-nodes.png%3Fstyle%3Dcentered" alt="Vinted Elasticsearch exporter master nodes"&gt;&lt;/a&gt;&lt;br&gt;Exporter scrapes cluster health and aliases in master nodes
  &lt;/p&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-client-nodes.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-client-nodes.png%3Fstyle%3Dcentered" alt="Vinted Elasticsearch exporter client nodes"&gt;&lt;/a&gt;&lt;br&gt;Exporter scrapes thread pools in client nodes
  &lt;/p&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-nodes-stats.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-nodes-stats.png%3Fstyle%3Dcentered" alt="Vinted Elasticsearch exporter for nodes stats"&gt;&lt;/a&gt;&lt;br&gt;Exporter scrapes nodes stats partitioned per path parameters in data nodes
  &lt;/p&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-nodes-usage-and-info.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-nodes-usage-and-info.png%3Fstyle%3Dcentered" alt="Vinted Elasticsearch exporter for nodes info and usage"&gt;&lt;/a&gt;&lt;br&gt;Exporter scrapes nodes usage and info in data nodes
  &lt;/p&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-cat.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-exporter-cat.png%3Fstyle%3Dcentered" alt="Vinted Elasticsearch exporter for /_cat subsystems"&gt;&lt;/a&gt;&lt;br&gt;Exporter scrapes /_cat subsystems in data nodes
  &lt;/p&gt;



&lt;p&gt;We are also &lt;a href="https://github.com/vinted/elasticsearch-exporter-rs" rel="noopener noreferrer"&gt;open-sourcing&lt;/a&gt; 13 dashboards with a total of about 323 panels, covering the various exporter subsystems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-dashboards.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2Fscaling-elasticsearch%2Felasticsearch-dashboards.png%3Fstyle%3Dcentered" alt="Vinted Elasticsearch resident memory"&gt;&lt;/a&gt;&lt;br&gt;Available exporter dashboards
  &lt;/p&gt;



&lt;p&gt;Enjoy.&lt;/p&gt;




&lt;p&gt;If you liked what you just read, we are hiring Search engineers! Find out more on our &lt;a href="https://vinted.com/jobs" rel="noopener noreferrer"&gt;Jobs page&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>vinted</category>
      <category>elasticsearch</category>
      <category>scaling</category>
      <category>prometheus</category>
    </item>
    <item>
      <title>Vinted Search Scaling Chapter 5: Divide and Conquer</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Wed, 22 Dec 2021 06:34:10 +0000</pubDate>
      <link>https://dev.to/vinted/vinted-search-scaling-chapter-5-divide-and-conquer-m7f</link>
      <guid>https://dev.to/vinted/vinted-search-scaling-chapter-5-divide-and-conquer-m7f</guid>
      <description>&lt;p&gt;Elasticsearch is really fast out of the box, but it doesn’t know your data model or access patterns. It’s no secret that Elasticsearch powers our catalogue, but with the ever-increasing amount of listings and queries, we’ve had a number of problems to solve.&lt;/p&gt;

&lt;p&gt;In the middle of 2020, we noticed an increasing number of failing search requests during peak hours. Node-exporter metrics in Grafana indicated that nothing was really wrong and that the cluster load was fairly balanced – but it clearly wasn’t. Only after logging in to the actual servers did we realise that the load was distributed unevenly across the Elasticsearch nodes running there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Fserver-load-distribution.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Fserver-load-distribution.png%3Fstyle%3Dcentered" alt="Server load distribution"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following documented guidelines for optimising cluster and JVM settings, we experimented with shard sizes and the number of replicas while observing the &lt;code&gt;number_of_shards * (number_of_replicas + 1) &amp;lt; #data_nodes&lt;/code&gt; recommendation. This gave us no significant improvement, so we had to approach the issue with a different strategy.&lt;/p&gt;
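&lt;p&gt;One reading of that recommendation is simply to keep the total number of shard copies below the number of data nodes. A tiny Rust sketch with made-up example numbers (not Vinted’s actual setup):&lt;/p&gt;

```rust
/// The sizing rule quoted above: total shard copies (primaries times
/// replicas plus one) should stay below the number of data nodes.
fn fits_on_cluster(shards: u32, replicas: u32, data_nodes: u32) -> bool {
    shards * (replicas + 1) < data_nodes
}

fn main() {
    // 6 primaries with 2 replicas = 18 shard copies: fine on 24 data nodes,
    // too many for 16 (illustrative numbers only).
    assert!(fits_on_cluster(6, 2, 24));
    assert!(!fits_on_cluster(6, 2, 16));
}
```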

&lt;p&gt;We decided to apply the ‘divide and conquer’ strategy by creating separate indices for specific types of queries. These could live in a different cluster, have different numbers of shards or replicas, or use different index-time sorting. We started by redirecting all of our &lt;code&gt;/_count&lt;/code&gt; queries to another cluster, which significantly reduced the portion of failing requests. We then realised we had discovered something important:&lt;/p&gt;

&lt;p&gt;Vinted supports multilingual search and uses an index-per-language strategy. We also allow users to browse our catalogue by applying filters, without requiring any search text. We created a version of the listings index to support these kinds of queries: all listings go into a single index, textual fields are no longer analysed, and all browsing queries are redirected to it.&lt;/p&gt;

&lt;p&gt;Surprisingly, this helped distribute cluster load more evenly, despite the new indices being in the same cluster. The extra disk space and increased complexity were a worthy tradeoff. We decided to go further and partition this index by date, to control data growth more easily, improve query performance and reduce refresh times. Splitting the index by date also let us leverage the shard-skipping technique: for example, if a search query covers only the last 2 weeks of data, Elasticsearch can query just the latest monthly partitions and skip the remaining shards.&lt;/p&gt;

&lt;p&gt;You may already know that we use &lt;a href="https://dev.to/vinted/vinted-search-scaling-chapter-1-indexing-4ii1"&gt;Kafka Connect&lt;/a&gt; for our data ingestion and it comes with Single Message Transforms (SMTs), allowing us to transform messages as they flow through Connect. Two of these SMTs were particularly interesting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/platform/current/connect/transforms/timestamprouter.html" rel="noopener noreferrer"&gt;Timestamp Router&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/platform/current/connect/transforms/regexrouter.html" rel="noopener noreferrer"&gt;Regex Router&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Timestamp Router allows the construction of a new topic field using the record’s topic name and timestamp field with the ability to format it.&lt;/p&gt;

&lt;p&gt;Regex Router allows updating the record’s topic using the configured regular expression and replacement string.&lt;/p&gt;

&lt;p&gt;To use these routers, our records – and tombstone messages – must contain a fixed timestamp; otherwise, we would never be able to ingest data into Elasticsearch consistently. By default, each new record is produced with the current timestamp, but &lt;a href="https://www.rubydoc.info/gems/ruby-kafka/Kafka/Producer#produce-instance_method" rel="noopener noreferrer"&gt;ruby-kafka allows it to be overwritten&lt;/a&gt; with any value, and we believed that overwriting it with the listing’s creation time would work. We also considered &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html" rel="noopener noreferrer"&gt;ILM&lt;/a&gt; and &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/data-streams.html" rel="noopener noreferrer"&gt;data streams&lt;/a&gt; but quickly rejected them, as both techniques target append-only data.&lt;/p&gt;

&lt;p&gt;We built a small application that mimics our ingestion pipeline, but with a couple of changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Overrides &lt;code&gt;create_time&lt;/code&gt; with a fixed value for each key&lt;/li&gt;
&lt;li&gt;Uses Timestamp Router to partition records into monthly partitions
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TimestampRouter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms.TimestampRouter.timestamp.format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"yyyy.MM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms.TimestampRouter.topic.format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${topic}-${timestamp}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms.TimestampRouter.type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"org.apache.kafka.connect.transforms.TimestampRouter"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The config change is fairly simple but very powerful – it’ll suffix the record’s topic with the year and month using its timestamp.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Ftimestamp-router-1.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Ftimestamp-router-1.png%3Fstyle%3Dcentered" alt="Timestamp Router results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is great, but we have lots of listings from the past, and creating a monthly bucket for very few records would be expensive. To combat this, we used the Regex Router to merge all the partitions before 2019 into a single &lt;code&gt;old&lt;/code&gt; one, leaving newer items untouched.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TimestampRouter,MergeOldListings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms.TimestampRouter.timestamp.format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"yyyy.MM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms.TimestampRouter.topic.format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${topic}-${timestamp}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms.TimestampRouter.type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"org.apache.kafka.connect.transforms.TimestampRouter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms.MergeOldListings.regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"(?-mix:(.+)-((2010|2011|2012|2013|2014|2015|2016|2017|2018).*))"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms.MergeOldListings.replacement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$1-old"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transforms.MergeOldListings.type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"org.apache.kafka.connect.transforms.RegexRouter"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
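&lt;p&gt;The combined effect of the two routers above can be sketched in Rust. This is a hypothetical helper that mirrors the SMT configuration, not Kafka Connect itself: TimestampRouter appends the &lt;code&gt;yyyy.MM&lt;/code&gt; suffix, then MergeOldListings collapses every partition before 2019 into the single &lt;code&gt;old&lt;/code&gt; partition:&lt;/p&gt;

```rust
/// Hypothetical mirror of the SMT chain configured above:
/// TimestampRouter ("${topic}-${timestamp}" with format "yyyy.MM"),
/// then RegexRouter rewriting 2010-2018 partitions to "<topic>-old".
fn route_topic(topic: &str, year: u32, month: u32) -> String {
    if (2010..=2018).contains(&year) {
        // MergeOldListings: collapse pre-2019 partitions into one.
        format!("{}-old", topic)
    } else {
        // TimestampRouter alone: suffix with year and month.
        format!("{}-{:04}.{:02}", topic, year, month)
    }
}

fn main() {
    assert_eq!(route_topic("listings", 2021, 7), "listings-2021.07");
    assert_eq!(route_topic("listings", 2015, 12), "listings-old");
}
```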



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Ftimestamp-router-2.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Ftimestamp-router-2.png%3Fstyle%3Dcentered" alt="Timestamp Router partitioned results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Ftimestamp-router-3.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Ftimestamp-router-3.png%3Fstyle%3Dcentered" alt="Timestamp Router partitioned results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After confirming this works as expected, we implemented this into our application, which was fairly simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create new topics for our listings (we don’t want to affect the current ingestion pipeline)&lt;/li&gt;
&lt;li&gt;Write to both old and new topics at the same time&lt;/li&gt;
&lt;li&gt;Use the listing’s creation time when sending records to the new topics&lt;/li&gt;
&lt;li&gt;Test ingestion with the new topics and routing strategy&lt;/li&gt;
&lt;li&gt;Switch to new topics once we confirm that ingestion works as expected&lt;/li&gt;
&lt;/ol&gt;



&lt;p&gt;The initial switch to partitioned indices didn’t work out, so we had to tailor our queries to them and tweak the numbers of shards and replicas. After plenty of unsuccessful attempts, we finally arrived at a version that works great.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Fmetrics-after-implementation.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvinted.engineering%2Fstatic%2F2021%2F12%2F14%2Fmetrics-after-implementation.png%3Fstyle%3Dcentered" alt="Metrics after implementation"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;We’ve managed to not only shed 20ms off the P99 but have had other great results too:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Query and request cache hit ratio is above 90% due to older indices refreshing less often&lt;/li&gt;
&lt;li&gt;Having load split between multiple indices results in a more even load across the whole cluster, so we no longer have hot nodes&lt;/li&gt;
&lt;li&gt;We hit fewer index shards when querying for the top-k latest items&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In retrospect, this abnormal behaviour was most likely caused by a &lt;a href="https://github.com/elastic/elasticsearch/pull/70283" rel="noopener noreferrer"&gt;bug&lt;/a&gt; in the Adaptive Replica Selection formula. At the time, though, we mitigated it by creatively applying the divide and conquer strategy, which helps us to this day.&lt;/p&gt;

</description>
      <category>vinted</category>
      <category>elasticsearch</category>
      <category>scaling</category>
    </item>
    <item>
      <title>Validating JSON input in Rust web services</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Mon, 15 Feb 2021 06:46:34 +0000</pubDate>
      <link>https://dev.to/vinted/validating-json-input-in-rust-web-services-5gp0</link>
      <guid>https://dev.to/vinted/validating-json-input-in-rust-web-services-5gp0</guid>
      <description>&lt;p&gt;One of the key features of a good web service is user input validation. External input cannot be trusted and the application must prevent malicious data from being processed.&lt;/p&gt;

&lt;p&gt;It's not only a matter of security but also of service usability and developer experience. When something goes wrong, a service should at the very least respond with &lt;code&gt;400 Bad Request&lt;/code&gt;, but a good service will also respond with the exact details of what went wrong.&lt;/p&gt;

&lt;p&gt;Rather than inventing its own error response format, a service should use &lt;a href="https://tools.ietf.org/html/rfc7807"&gt;RFC 7807&lt;/a&gt;, which provides a standard format for returning problem details from HTTP APIs.&lt;/p&gt;
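&lt;p&gt;For illustration, a problem details body for a failed validation might look like the following sketch. The &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt; and &lt;code&gt;detail&lt;/code&gt; members come straight from the RFC; the concrete values here are only an assumption of what our service could return:&lt;/p&gt;

```json
{
  "type": "https://httpstatuses.com/400",
  "title": "One or more validation errors occurred",
  "status": 400,
  "detail": "Please refer to the errors property for additional details"
}
```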

&lt;p&gt;In this tutorial, we'll implement a web service in Rust using the &lt;a href="https://github.com/seanmonstar/warp"&gt;warp&lt;/a&gt; web framework and add request validation using &lt;a href="https://github.com/Keats/validator"&gt;validator&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Warp is a super-easy, composable, web server framework for warp speeds.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Validator is a simple validation library for Rust structs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Project creation
&lt;/h2&gt;

&lt;p&gt;For starters, create a new project using &lt;code&gt;cargo&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo new warp-validation &lt;span class="nt"&gt;--bin&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;warp-validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Dependencies
&lt;/h2&gt;

&lt;p&gt;Then edit &lt;code&gt;Cargo.toml&lt;/code&gt; file and add these dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="nn"&gt;tokio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["full"]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;warp&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.3"&lt;/span&gt;
&lt;span class="nn"&gt;serde&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["derive"]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;serde_json&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0"&lt;/span&gt;
&lt;span class="nn"&gt;validator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.12"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["derive"]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nn"&gt;http-api-problem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;  &lt;span class="s"&gt;"0.21.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"hyper"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"warp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/tokio-rs/tokio"&gt;tokio&lt;/a&gt; is an asynchronous runtime for Rust applications&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/seanmonstar/warp"&gt;warp&lt;/a&gt; is the web framework of choice&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/serde-rs/serde"&gt;serde&lt;/a&gt; is a serialization and deserialization framework&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Keats/validator"&gt;validator&lt;/a&gt; is a validation library&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/chridou/http-api-problem"&gt;http-api-problem&lt;/a&gt; implements &lt;a href="https://tools.ietf.org/html/rfc7807"&gt;RFC 7807 Problem Details for HTTP APIs&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that all dependencies are in place, we can start by implementing the most basic service.&lt;/p&gt;

&lt;h1&gt;
  
  
  Implementing the web service
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Basic web service
&lt;/h2&gt;

&lt;p&gt;This is the most basic web service you can build in Warp. All it does is respond to &lt;code&gt;GET /hello/:name&lt;/code&gt; requests with &lt;code&gt;Hello, :name&lt;/code&gt;, where &lt;code&gt;:name&lt;/code&gt; is a path variable. We'll build on top of that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;path!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"hello"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello, {}!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.run&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;3030&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So if we called our endpoint using cURL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:3030/hello/evaldas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This would be the response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hello, evaldas!.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Accepting &lt;code&gt;POST&lt;/code&gt; requests
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The fundamental building block of &lt;code&gt;warp&lt;/code&gt; is the &lt;code&gt;Filter&lt;/code&gt;: they can be combined and composed to express rich requirements on requests.&lt;/p&gt;

&lt;p&gt;Thanks to its &lt;code&gt;Filter&lt;/code&gt; system, warp provides these out of the box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Path routing and parameter extraction&lt;/li&gt;
&lt;li&gt;Header requirements and extraction&lt;/li&gt;
&lt;li&gt;Query string deserialization&lt;/li&gt;
&lt;li&gt;JSON and Form bodies&lt;/li&gt;
&lt;li&gt;Multipart form data&lt;/li&gt;
&lt;li&gt;Static Files and Directories&lt;/li&gt;
&lt;li&gt;Websockets&lt;/li&gt;
&lt;li&gt;Access logging&lt;/li&gt;
&lt;li&gt;Gzip, Deflate, and Brotli compression&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;A list of prebuilt filters can be found &lt;a href="https://docs.rs/warp/0.3.0/warp/filters/index.html"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In order to accept &lt;code&gt;POST&lt;/code&gt; requests with JSON payloads, the following pieces are necessary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
A &lt;code&gt;Request&lt;/code&gt; struct is needed for strongly typed requests. Notice that it derives the &lt;code&gt;Deserialize&lt;/code&gt; trait, meaning it can be deserialized from &lt;code&gt;JSON&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;warp::post()&lt;/code&gt; filter to accept only &lt;code&gt;POST&lt;/code&gt; requests&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;warp::body::json()&lt;/code&gt; to read JSON payload&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.and(..)&lt;/code&gt; filter to chain them together and ensure that a successful request must match all of the required filters
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;serde&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Deserialize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nd"&gt;#[derive(Deserialize,&lt;/span&gt; &lt;span class="nd"&gt;Debug)]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;path!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"hello"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;.and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;body&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello, {}! This is request: {:?}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.run&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;3030&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See what happens when we send a valid request&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:3030/hello/evaldas &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"name":"name","email":"email","age":1}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We'd get this response back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hello, evaldas! This is request: Request { name: "name", email: "email", age: 1 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Validating requests
&lt;/h2&gt;

&lt;p&gt;Validating requests using the &lt;code&gt;validator&lt;/code&gt; library is a breeze. All we really need to do is derive the &lt;code&gt;Validate&lt;/code&gt; trait (bringing it into scope with &lt;code&gt;use validator::Validate;&lt;/code&gt;) and add &lt;code&gt;#[validate]&lt;/code&gt; attributes to our fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[derive(Deserialize,&lt;/span&gt; &lt;span class="nd"&gt;Debug,&lt;/span&gt; &lt;span class="nd"&gt;Validate)]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;#[validate(length(min&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;#[validate(email)]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;#[validate(range(min&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;max&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then we can call the &lt;code&gt;.validate()&lt;/code&gt; method on our request to validate it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;path!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"hello"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;.and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;body&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="nf"&gt;.validate&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

            &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"Hello, {}! This is request: {:?} and its validation result: {:?}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.run&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;3030&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sending exactly the same request would result in a response with a validation failure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hello, evaldas! This is request: Request { name: "name", email: "email", age: 1 } and its validation result: Err(ValidationErrors({"email": Field([ValidationError { code: "email", message: None, params: {"value": String("email")} }]), "age": Field([ValidationError { code: "range", message: None, params: {"value": Number(1), "max": Number(100.0), "min": Number(18.0)} }])}))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is already great, but to fully implement validation, all we need to do is make it part of the request chain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing a custom validation filter
&lt;/h2&gt;

&lt;p&gt;To achieve that, we'll start by creating a custom &lt;code&gt;Error&lt;/code&gt; enum type that contains only &lt;code&gt;ValidationErrors&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[derive(Debug)]&lt;/span&gt;
&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;Validation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ValidationErrors&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will also implement &lt;code&gt;warp::reject::Reject&lt;/code&gt;; this is needed to turn an &lt;code&gt;Error&lt;/code&gt; into a custom warp &lt;code&gt;Rejection&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Reject&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we'll create a custom function that accepts a generic value constrained to implement the &lt;code&gt;Validate&lt;/code&gt; trait. It validates the value and returns either the value itself or the validation errors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;validate&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
    &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Validate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="nf"&gt;.validate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.map_err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Validation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally we'll create a custom &lt;code&gt;with_validated_json&lt;/code&gt; filter which does a couple of things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limits payload sizes to &lt;code&gt;16KiB&lt;/code&gt; using &lt;code&gt;warp::body::content_length_limit&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Uses built-in &lt;code&gt;warp::body::json()&lt;/code&gt; filter to deserialize JSON into a struct&lt;/li&gt;
&lt;li&gt;And then passes the deserialized value to the &lt;code&gt;validate&lt;/code&gt; function we just built
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;with_validated_json&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Extract&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Rejection&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Clone&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
    &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DeserializeOwned&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Validate&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;Send&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;body&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;content_length_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;body&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;.and_then&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.map_err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;custom&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's now enough to replace the &lt;code&gt;warp::body::json()&lt;/code&gt; filter in our request chain with &lt;code&gt;with_validated_json&lt;/code&gt;. We can also remove the manual validation from the handler.&lt;/p&gt;

&lt;p&gt;Sending that exact same request results in a validation failure, but this time it doesn't even reach the handler and returns early.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Unhandled rejection: Validation(ValidationErrors({"email": Field([ValidationError { code: "email", message: None, params: {"value": String("email")} }]), "age": Field([ValidationError { code: "range", message: None, params: {"value": Number(1), "max": Number(100.0), "min": Number(18.0)} }])}))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that it says the rejection is unhandled - this is the case when warp doesn't know what to do with the &lt;code&gt;Rejection&lt;/code&gt; and simply returns it. Luckily for us, there's an easy way to handle rejections and return appropriate responses: the warp repository contains an &lt;a href="https://github.com/seanmonstar/warp/blob/master/examples/rejections.rs"&gt;excellent rejection example&lt;/a&gt;. We'll improve on it by making sure that rejections return an instance of Problem Details instead of only a status code and plain error text.&lt;/p&gt;

&lt;p&gt;We'll use the &lt;a href="https://docs.rs/warp/0.3.0/warp/reject/struct.Rejection.html#method.find"&gt;&lt;code&gt;Rejection::find&lt;/code&gt;&lt;/a&gt; method to find out whether our custom &lt;code&gt;Error&lt;/code&gt; is the reason the request failed and convert it into Problem Details; in every other case we return an internal server error.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;handle_rejection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Rejection&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;Reply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Infallible&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="py"&gt;.find&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;handle_crate_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nn"&gt;HttpApiProblem&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;with_title_and_type_from_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;INTERNAL_SERVER_ERROR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="nf"&gt;.to_hyper_response&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;handle_crate_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;HttpApiProblem&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nn"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Validation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;problem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
                &lt;span class="nn"&gt;HttpApiProblem&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;with_title_and_type_from_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BAD_REQUEST&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="nf"&gt;.set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"One or more validation errors occurred"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="nf"&gt;.set_detail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Please refer to the errors property for additional details"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="mi"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="nf"&gt;.set_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"errors"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="nf"&gt;.errors&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

            &lt;span class="n"&gt;problem&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we need to use &lt;a href="https://docs.rs/warp/0.3.0/warp/trait.Filter.html#method.recover"&gt;&lt;code&gt;Filter::recover&lt;/code&gt;&lt;/a&gt; to recover from the rejection, passing &lt;code&gt;handle_rejection&lt;/code&gt; to it as an argument.&lt;/p&gt;

&lt;p&gt;If we send exactly the same request again, it now returns a problem details instance with the appropriate status code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://httpstatuses.com/400"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"One or more validation errors occurred"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"detail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Please refer to the errors property for additional details"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"errors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"range"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"max"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;100.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"min"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;18.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The complete application looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;http_api_problem&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;HttpApiProblem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;serde&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;de&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;DeserializeOwned&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;serde&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Deserialize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Infallible&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Validate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ValidationErrors&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Rejection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Reply&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nd"&gt;#[derive(Deserialize,&lt;/span&gt; &lt;span class="nd"&gt;Debug,&lt;/span&gt; &lt;span class="nd"&gt;Validate)]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;#[validate(length(min&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;#[validate(email)]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;#[validate(range(min&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;max&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;path!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"hello"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;.and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;with_validated_json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello, {}! This is request: {:?}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="nf"&gt;.recover&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle_rejection&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.run&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;3030&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;with_validated_json&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Extract&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Rejection&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Clone&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
    &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DeserializeOwned&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Validate&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;Send&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;body&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;content_length_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;body&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;.and_then&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.map_err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;custom&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;validate&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
    &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Validate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="nf"&gt;.validate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.map_err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Validation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[derive(Debug)]&lt;/span&gt;
&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;Validation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ValidationErrors&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nn"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Reject&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;handle_rejection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Rejection&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;Reply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Infallible&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="py"&gt;.find&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;handle_crate_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nn"&gt;HttpApiProblem&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;with_title_and_type_from_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;INTERNAL_SERVER_ERROR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="nf"&gt;.to_hyper_response&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;handle_crate_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;HttpApiProblem&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nn"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Validation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;problem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
                &lt;span class="nn"&gt;HttpApiProblem&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;with_title_and_type_from_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BAD_REQUEST&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="nf"&gt;.set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"One or more validation errors occurred"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="nf"&gt;.set_detail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Please refer to the errors property for additional details"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="mi"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;problem&lt;/span&gt;&lt;span class="nf"&gt;.set_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"errors"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="nf"&gt;.errors&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

            &lt;span class="n"&gt;problem&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>rust</category>
      <category>warp</category>
      <category>tokio</category>
    </item>
    <item>
      <title>Vinted Search Scaling Chapter 3: Elasticsearch Index Management</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Mon, 08 Feb 2021 08:31:00 +0000</pubDate>
      <link>https://dev.to/vinted/vinted-search-scaling-chapter-3-elasticsearch-index-management-3jb5</link>
      <guid>https://dev.to/vinted/vinted-search-scaling-chapter-3-elasticsearch-index-management-3jb5</guid>
      <description>&lt;p&gt;Managing Elasticsearch indices at scale isn’t easy.&lt;/p&gt;

&lt;p&gt;Index settings were part of our core application monolith logic. Making the slightest change to them meant redeploying the whole application and running lots of unrelated tests. It was also built to be aware of only a single language. We needed to be more flexible and have fine-grained control over our Elasticsearch clusters.&lt;/p&gt;

&lt;p&gt;At the same time, we were planning to break our single Elasticsearch cluster into multiple ones for better load distribution and the ability to upgrade smaller clusters. Managing all of that from a single application was difficult. See how we route search traffic across multiple Elasticsearch clusters in the &lt;a href="https://dev.to/vinted/vinted-search-scaling-chapter-2-routing-elasticsearch-requests-with-srouter-2jgn"&gt;second chapter&lt;/a&gt; of this series.&lt;/p&gt;

&lt;p&gt;We decided to decouple search setup and ingestion logic into a separate application, rather obviously called Elasticsearch Index Management, or EIM for short. It turned out to be a set of CLI utilities, automated jobs, and webhook listeners that automate our routine work. I’ll share how we solved these problems and what we learned.&lt;/p&gt;

&lt;h1&gt;
  
  
  Goals
&lt;/h1&gt;

&lt;p&gt;We had no well-defined goals when we started this project; all we really wanted was to solve our pain points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manage indices and their changes with ease&lt;/li&gt;
&lt;li&gt;Control the data indexing pipeline&lt;/li&gt;
&lt;li&gt;Detect Elasticsearch version inconsistencies&lt;/li&gt;
&lt;li&gt;Automate the search infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Index management
&lt;/h2&gt;

&lt;p&gt;Vinted operates in 13 countries, and we support search for our feed, promotions, forums, FAQs, and more. Naturally, we need to support multiple languages.&lt;/p&gt;

&lt;p&gt;Elasticsearch templates are well suited for this use case. They’re composable, flexible, and allow dynamic field detection and mapping.&lt;br&gt;
We identified language-agnostic analyzers, language analyzers, index settings, index mappings, and ingestion pipelines as the key elements for constructing a template. Indices created from templates are consistent and guaranteed to match the spec. To keep templates correct and up to date, we test them during each CI build and refresh them on clusters during every release.&lt;br&gt;
The way we construct templates is also reflected in our code structure: analysis, indices, pipelines, and other parts are grouped together and later merged into a single template.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;elasticsearch/
├── analysis/
│   ├── common.rb
│   ├── english.rb
│   ├── french.rb
│   ├── lithuanian.rb
├── indices/
│   ├── faq.rb
│   ├── feed.rb
│   ├── forum.rb
│   └── promotions.rb
└── pipelines/
    └── brands.rb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
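&lt;p&gt;To make the merging concrete, here is a minimal, hypothetical sketch of how such parts could be combined into a single template body. The part hashes and the &lt;code&gt;build_template&lt;/code&gt;/&lt;code&gt;deep_merge&lt;/code&gt; helpers are invented for illustration and are not Vinted’s actual code:&lt;/p&gt;

```ruby
# Hypothetical sketch: compose shared analysis, a language analyzer, and an
# index definition into one Elasticsearch index template body.
require 'json'

# Pieces, as they might live under analysis/ and indices/ in the tree above.
COMMON_ANALYSIS = { 'char_filter' => { 'strip_html' => { 'type' => 'html_strip' } } }
LITHUANIAN_ANALYSIS = { 'analyzer' => { 'lithuanian' => { 'type' => 'lithuanian' } } }
FEED_INDEX = {
  'settings' => { 'number_of_shards' => 3, 'number_of_replicas' => 1 },
  'mappings' => { 'properties' => { 'title' => { 'type' => 'text' } } }
}

# Recursively merge nested hashes; later values win on leaf conflicts.
def deep_merge(a, b)
  a.merge(b) { |_key, old, new| old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new }
end

# Fold all analysis parts together, then nest them under settings.analysis.
def build_template(index, *analysis_parts)
  analysis = analysis_parts.reduce({}) { |acc, part| deep_merge(acc, part) }
  deep_merge(index, 'settings' => { 'analysis' => analysis })
end

template = build_template(FEED_INDEX, COMMON_ANALYSIS, LITHUANIAN_ANALYSIS)
puts JSON.pretty_generate(template)
```

&lt;p&gt;In a layout like the tree above, adding a new language would then amount to adding one analysis file and including it in the merge.&lt;/p&gt;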



&lt;p&gt;To prevent human errors and catch issues before deployment, we heavily test all of the template pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Field index and search time analyzers&lt;/li&gt;
&lt;li&gt;Field settings, such as &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;norms&lt;/code&gt;, &lt;code&gt;doc_values&lt;/code&gt;, or &lt;code&gt;copy_to&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Index settings, such as the number of shards and replicas, and refresh intervals&lt;/li&gt;
&lt;li&gt;Ingestion pipelines, which we test by simulating them and comparing before and after payloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We harness the results of the Elasticsearch &lt;code&gt;/_analyze&lt;/code&gt; and &lt;code&gt;/_ingest/pipeline/_simulate&lt;/code&gt; APIs by creating custom &lt;a href="https://rspec.info"&gt;RSpec&lt;/a&gt; matchers. They allow us to write domain-specific tests that are elegant, short, and intention-revealing.&lt;/p&gt;

&lt;p&gt;The following matcher extracts the tokens produced by the &lt;code&gt;/_analyze&lt;/code&gt; API and allows comparing them to an expectation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;RSpec&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Matchers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;define&lt;/span&gt; &lt;span class="ss"&gt;:contain_tokens&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="n"&gt;failure_message&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="s2"&gt;"expected: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="p"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"     got: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="p"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="p"&gt;\&lt;/span&gt;
    &lt;span class="s1"&gt;'(compared using ==)'&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'tokens'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'token'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this is an example of how it is being used in specs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="s1"&gt;'with special letters'&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;let&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s1"&gt;'šermukšnis'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;is_expected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt; &lt;span class="n"&gt;contain_tokens&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'sermuksnis'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Data ingestion
&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://dev.to/vinted/vinted-search-scaling-chapter-1-indexing-4ii1"&gt;chapter 1&lt;/a&gt;, you already know that we leverage Kafka and Kafka Connect for data ingestion, meaning they are both an integral part of Elasticsearch Index Management. We also make heavy use of the &lt;a href="https://docs.confluent.io/kafka-connect-elasticsearch/current/configuration_options.html"&gt;Elasticsearch Sink Connector&lt;/a&gt; and use it to reindex data whenever needed.&lt;/p&gt;

&lt;p&gt;We built a set of abstractions that help us achieve that. To reindex data, we'd begin by creating connectors for a specific type of index. The following command would generate a collection of connectors with automatically generated names and configurations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eim connector create &lt;span class="nt"&gt;--indices&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Promotion Feed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
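&lt;p&gt;Under the hood, each generated connector amounts to a configuration submitted to the Kafka Connect REST API. The following Ruby sketch builds such a configuration; the topic name, URL, and values are hypothetical, but the keys are documented Elasticsearch Sink Connector options:&lt;/p&gt;

```ruby
require 'json'

# Build a minimal Elasticsearch Sink Connector configuration.
# The keys are documented connector options; the values here are examples.
def connector_config(topic:, es_url:)
  {
    'connector.class' => 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
    'topics'          => topic,
    'connection.url'  => es_url,
    'key.ignore'      => 'false', # use the Kafka record key as the document _id
    'batch.size'      => '2000',  # documents per bulk request
    'linger.ms'       => '1000'   # how long to buffer records before sending
  }
end

puts JSON.pretty_generate(connector_config(topic: 'feed', es_url: 'http://localhost:9200'))
```

&lt;p&gt;Such a configuration would typically be submitted with a &lt;code&gt;PUT&lt;/code&gt; to the Kafka Connect REST API at &lt;code&gt;/connectors/{name}/config&lt;/code&gt;.&lt;/p&gt;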



&lt;p&gt;Once the connector has caught up with indexing, it can be tuned for idle indexing to reduce indexing pressure on the Elasticsearch cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eim connector update cluster-core-feed_20201217140000 &lt;span class="nt"&gt;--idle-indexing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In case of an emergency, a connector can be paused.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eim connector pause cluster-core-feed_20201217140000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use the &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html"&gt;Elasticsearch aliases API&lt;/a&gt; to atomically switch aliases between indices. After carefully observing metrics and making sure that everything's in place, we'd run the following command to assign aliases to the new indices and remove them from the old ones.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eim &lt;span class="nb"&gt;alias &lt;/span&gt;assign &lt;span class="nt"&gt;--indices&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Promotion Feed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
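&lt;p&gt;Behind the eim command, the swap boils down to a single request to the &lt;code&gt;_aliases&lt;/code&gt; endpoint, where removing the alias from the old index and adding it to the new one happens atomically. A sketch of the payload (the index and alias names are hypothetical):&lt;/p&gt;

```ruby
require 'json'

# Build the _aliases payload for an atomic alias swap: both actions are
# applied in a single request, so the alias never points to zero indices.
def alias_swap_actions(alias_name:, old_index:, new_index:)
  {
    'actions' => [
      { 'remove' => { 'index' => old_index, 'alias' => alias_name } },
      { 'add'    => { 'index' => new_index, 'alias' => alias_name } }
    ]
  }
end

payload = alias_swap_actions(
  alias_name: 'feed',
  old_index:  'feed_20201101120000',
  new_index:  'feed_20201217140000'
)
puts JSON.generate(payload) # body for POST /_aliases
```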



&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;We have first-hand experience that Elasticsearch upgrades are not always successful and rolling back might not be possible at all. One of the key reasons Elasticsearch Index Management exists is the ability to easily test cluster and index settings against different Elasticsearch versions.&lt;br&gt;
To avoid the hassle of setting up multiple Elasticsearch versions locally, testing environments are containerized and consist of shared &lt;code&gt;docker-compose&lt;/code&gt; files. This setup allows us to create environments for development, testing, and experiments with very little effort. For instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; stack-7 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-f&lt;/span&gt; docker/docker-compose.es-7-base.yml &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-f&lt;/span&gt; docker/docker-compose.es-7.yml &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-f&lt;/span&gt; docker/docker-compose.kafka-base.yml &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-f&lt;/span&gt; docker/docker-compose.kafka.yml &lt;span class="se"&gt;\&lt;/span&gt;
    up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At some point, we had to support Elasticsearch 5.x, 6.x, and 7.x. The following diagram shows what a single environment looks like. The CI build would create an isolated environment for every Elasticsearch version we support and run over 600 tests against each environment in parallel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uHWlpdHg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://vinted.engineering/static/2021/02/08/testing-environment.png%3Fstyle%3Dcentered%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uHWlpdHg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://vinted.engineering/static/2021/02/08/testing-environment.png%3Fstyle%3Dcentered%3Fstyle%3Dcentered" alt="Testing environment"&gt;&lt;/a&gt;&lt;br&gt;Figure 1. Testing environment
  &lt;/p&gt;

&lt;h2&gt;
  
  
  Webhook listeners
&lt;/h2&gt;

&lt;p&gt;Kafka Connect connector tasks occasionally fail for various reasons, such as a network hiccup or an out-of-memory exception. Initially, we would restart them manually; however, that wasn't sustainable and we needed a better solution.&lt;/p&gt;

&lt;p&gt;Vinted uses &lt;a href="https://prometheus.io/docs/alerting/latest/alertmanager/"&gt;Alertmanager&lt;/a&gt; for infrastructure monitoring. Out of the box, it can send alerts to email and Slack; additionally, it allows configuring &lt;a href="https://prometheus.io/docs/alerting/latest/configuration/#webhook_config"&gt;generic webhook receivers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We built such a receiver on top of Elasticsearch Index Management, deployed it, and made it part of the Alertmanager configuration. Now, when a connector task fails, Alertmanager sends an HTTP request to restart it.&lt;/p&gt;
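&lt;p&gt;The core of such a receiver can be sketched as follows: given an Alertmanager webhook payload, collect the restart requests to send to the Kafka Connect REST API. The &lt;code&gt;connector&lt;/code&gt; and &lt;code&gt;task&lt;/code&gt; label names are hypothetical; the payload shape follows Alertmanager's webhook format and the restart endpoint follows the Kafka Connect REST API:&lt;/p&gt;

```ruby
require 'json'

# Given an Alertmanager webhook payload, build the Kafka Connect REST API
# URLs of the failed connector tasks that should be restarted.
# The 'connector' and 'task' alert labels are assumptions, not standard.
def restart_urls(payload, connect_url:)
  JSON.parse(payload)
      .fetch('alerts', [])
      .select { |alert| alert['status'] == 'firing' }
      .map do |alert|
        labels = alert.fetch('labels', {})
        "#{connect_url}/connectors/#{labels['connector']}/tasks/#{labels['task']}/restart"
      end
end

payload = JSON.generate(
  'alerts' => [
    { 'status' => 'firing',
      'labels' => { 'connector' => 'cluster-core-feed_20201217140000', 'task' => '3' } },
    { 'status' => 'resolved',
      'labels' => { 'connector' => 'cluster-core-promotion_20201217140000', 'task' => '0' } }
  ]
)
puts restart_urls(payload, connect_url: 'http://connect.example.com:8083')
```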

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;One year later, Elasticsearch Index Management has become a vital part of our job. Before that, everything from cluster upgrades to a simple change of index configuration was an adventure. Now it's predictable, automated, and boring. During one year of development, we caught most of the edge cases, ironed out the most annoying bugs, and built ourselves a companion that lets us stay productive and look after our Elasticsearch estate with ease.&lt;/p&gt;

</description>
      <category>vinted</category>
      <category>elasticsearch</category>
      <category>scaling</category>
    </item>
    <item>
      <title>Vinted Search Scaling Chapter 2: Routing Elasticsearch requests with srouter</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Fri, 29 Jan 2021 09:02:15 +0000</pubDate>
      <link>https://dev.to/vinted/vinted-search-scaling-chapter-2-routing-elasticsearch-requests-with-srouter-2jgn</link>
      <guid>https://dev.to/vinted/vinted-search-scaling-chapter-2-routing-elasticsearch-requests-with-srouter-2jgn</guid>
      <description>&lt;p&gt;It began with Elasticsearch cluster upgrades. Every major Elasticsearch version brings new features, free performance, compatibility issues, and uncertainty.&lt;br&gt;
In addition, we had an itch to dissect requests, join each request with its response for analysis, record queries for stress testing, and account for every request per second.&lt;br&gt;
With these requirements in mind, we built a &lt;a href="https://docs.microsoft.com/en-us/azure/architecture/patterns/sidecar"&gt;sidecar&lt;/a&gt; component next to the primary application.&lt;/p&gt;

&lt;p&gt;The leftmost component is the core application, which is not aware that search requests are being handled by multiple Elasticsearch clusters.&lt;br&gt;
The next component is a &lt;a href="https://docs.microsoft.com/en-us/azure/architecture/patterns/sidecar"&gt;sidecar&lt;/a&gt; service written in the &lt;a href="https://www.rust-lang.org/"&gt;Rust&lt;/a&gt; programming language, which we call srouter.&lt;/p&gt;




  &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sLd1O-Pv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://vinted.engineering/static/scaling-elasticsearch/srouter.png%3Fstyle%3Dcentered" alt="srouter routing scheme"&gt;Figure 1. srouter
  




&lt;p&gt;The srouter sidecar gives us fine-grained control over routing search requests to multiple Elasticsearch clusters, which allows us to test and compare different Elasticsearch versions using the real-time production workload without impacting production.&lt;br&gt;
The service is configured in a simple and readable way; configuration may be granular down to a specific HTTP request path,&lt;br&gt;
or &lt;a href="https://www.regular-expressions.info/repeat.html"&gt;greedy&lt;/a&gt;, as required by the use case.&lt;br&gt;
A simplified configuration sample can be viewed below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;--------&lt;/span&gt;
&lt;span class="na"&gt;whitelist_ends_with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/_count&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/_search&lt;/span&gt;
&lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s"&gt;main-items/_search&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http://127.0.0.1:1000&lt;/span&gt;
    &lt;span class="na"&gt;mirror&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http://127.0.0.1:1010&lt;/span&gt;
    &lt;span class="na"&gt;amplify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;311&lt;/span&gt;

  &lt;span class="s"&gt;lt-main-items/_count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http://127.0.0.1:1100&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
    &lt;span class="na"&gt;sampling&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
    &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http://dc1-host.example.com:8082/topics/storage&lt;/span&gt;

  &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http://127.0.0.1:2000&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the time of writing, we have 10 custom routes defined in production; it takes up to 3 minutes to change any route, and most of that time is spent in deployment.&lt;br&gt;
Configuration is kept in a git repository to keep track of changes. In the configuration sample, we have a &lt;a href="https://www.regular-expressions.info/repeat.html"&gt;greedy&lt;/a&gt; &lt;code&gt;main-items/_search&lt;/code&gt; route with mirror traffic amplified 4x.&lt;br&gt;
The second route, &lt;code&gt;lt-main-items/_count&lt;/code&gt;, has a timeout of 100ms, and 10 percent of its requests are sampled to &lt;a href="https://docs.confluent.io/platform/current/kafka-rest/index.html"&gt;Kafka&lt;/a&gt;.&lt;br&gt;
Unmatched requests fall back to the &lt;code&gt;default&lt;/code&gt; route. Srouter is picky about the HTTP method used: the service allows HTTP GET requests on any path, while HTTP POST requests are white-listed; in any other case the service returns a &lt;a href="https://httpstatuses.com/405"&gt;405: method not allowed&lt;/a&gt; status.&lt;br&gt;
The HTTP method limits protect against accidental index removal and direct indexing, and serve as a gateway for engineers to "safely" access Elasticsearch.&lt;br&gt;
This configuration flexibility lets us tame search scale by shaping Elasticsearch clusters dedicated to handling specific types of requests.&lt;/p&gt;
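&lt;p&gt;The routing rules described above can be sketched as a simplified model (srouter itself is written in Rust; this Ruby sketch assumes exact-path route keys and ignores mirroring, amplification, and sampling):&lt;/p&gt;

```ruby
# Simplified model of srouter's route resolution: a request is matched
# against the configured routes and falls back to the default route;
# POST requests are only allowed on white-listed path suffixes.
ROUTES = {
  'main-items/_search'   => { main: ['http://127.0.0.1:1000'], timeout: 311 },
  'lt-main-items/_count' => { main: ['http://127.0.0.1:1100'], timeout: 100 },
  'default'              => { main: ['http://127.0.0.1:2000'], timeout: 200 }
}.freeze

WHITELIST_ENDS_WITH = ['/_count', '/_search'].freeze

def resolve(method, path)
  if method == 'POST' && WHITELIST_ENDS_WITH.none? { |suffix| path.end_with?(suffix) }
    return :method_not_allowed # 405 for POST outside the whitelist
  end

  ROUTES.fetch(path, ROUTES['default'])
end

puts resolve('POST', 'main-items/_search')[:timeout] # dedicated route
puts resolve('GET', 'other-items/_search')[:timeout] # default route
puts resolve('POST', 'main-items/_delete_by_query')  # rejected method
```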

&lt;p&gt;Route configuration is updated seamlessly, without service interruption, using the &lt;a href="https://en.wikipedia.org/wiki/SIGHUP"&gt;SIGHUP&lt;/a&gt; signal.&lt;br&gt;
After receiving the signal, the service applies the update through an &lt;a href="https://doc.rust-lang.org/std/sync/atomic/struct.AtomicPtr.html"&gt;atomic pointer&lt;/a&gt;:&lt;br&gt;
the pointer is swapped with &lt;a href="https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering"&gt;relaxed memory ordering&lt;/a&gt; to point to the newly built &lt;a href="https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html"&gt;HashMap&lt;/a&gt; of routes.&lt;br&gt;
Srouter is an &lt;a href="https://en.wikipedia.org/wiki/OSI_model"&gt;OSI layer 7&lt;/a&gt; router built on top of the low-level &lt;a href="https://hyper.rs"&gt;hyper&lt;/a&gt; server and the &lt;a href="https://github.com/tokio-rs/tokio"&gt;tokio&lt;/a&gt; asynchronous runtime.&lt;br&gt;
As Rust is famous for performance, this service is no different: it can serve more than 500k requests per second while handling 200 route updates every 10ms.&lt;br&gt;
A single srouter instance's production resource footprint is one tokio core thread per CPU and a total of 124 megabytes of virtual memory; a large part of that memory (97%) is used by the &lt;a href="https://prometheus.io/"&gt;Prometheus metrics&lt;/a&gt; the service exposes.&lt;/p&gt;

&lt;p&gt;Metrics provided by the service are &lt;a href="https://prometheus.io/docs/practices/naming/"&gt;labeled&lt;/a&gt; per request path.&lt;br&gt;
Centralizing the labeling in the sidecar service makes the primary application simpler.&lt;br&gt;
Labeled metrics provide a base understanding of the frequency and volume of queries; they help to detect bad requests before production, measure latency percentiles per request path, and track the exact number of requests.&lt;br&gt;
In addition to metrics, we sample a portion of queries and all unsuccessful responses.&lt;br&gt;
Sampled requests are enriched with the request and response headers, the time it took srouter to complete the request, and the Elasticsearch response.&lt;br&gt;
The stored request/response combinations allowed us to deprecate Elasticsearch &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-slowlog.html"&gt;slow query logs&lt;/a&gt;; the change reduced I/O activity and improved our population sample of queries.&lt;br&gt;
The sample is improved because we now uniformly capture not only "slow" queries (where slow is defined by an internal ES measure that depends on many factors, such as search request queuing, indexing load, GC, etc.) but also fast and even failed queries.&lt;/p&gt;

&lt;p&gt;Srouter proved to be a critical service to our planned scaling strategy. &lt;cite&gt;Anthony Williams in C++ Concurrency in Action[1]&lt;/cite&gt; defines scalability as "reducing the time it takes to perform an action (reduce time) or increasing the amount of data that can be processed in a given time (increase throughput) as more processors are added."&lt;br&gt;
We have summarized this in two simple &lt;a href="https://en.wikipedia.org/wiki/Scale_cube"&gt;scaling cubes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Lfq007Yg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://vinted.engineering/static/scaling-elasticsearch/scaling_cubes.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Lfq007Yg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://vinted.engineering/static/scaling-elasticsearch/scaling_cubes.png%3Fstyle%3Dcentered" alt="scaling cubes"&gt;&lt;/a&gt;&lt;br&gt;Figure 2. Scaling cubes
  &lt;/p&gt;



&lt;p&gt;Future chapters will explore how we are using "scaling cubes" to scale Elasticsearch.&lt;br&gt;
Liked what you have read? We’re currently hiring Search Engineers, find out more &lt;a href="http://bit.ly/3mGXnBE"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Reference:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[1] C++ Concurrency in Action by Anthony Williams, &lt;a href="https://www.manning.com/books/c-plus-plus-concurrency-in-action"&gt;https://www.manning.com/books/c-plus-plus-concurrency-in-action&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Hands-On Concurrency with Rust: Confidently build memory-safe, parallel, and efficient software in Rust by Brian L. Troutwine &lt;a href="https://www.amazon.com/Hands-Concurrency-Rust-Confidently-memory-safe/dp/1788399978"&gt;https://www.amazon.com/Hands-Concurrency-Rust-Confidently-memory-safe/dp/1788399978&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Tokio &lt;a href="https://github.com/tokio-rs/tokio"&gt;https://github.com/tokio-rs/tokio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Hyper &lt;a href="https://hyper.rs"&gt;https://hyper.rs&lt;/a&gt;, &lt;a href="https://github.com/hyperium/hyper"&gt;https://github.com/hyperium/hyper&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Scaling cubes: &lt;a href="https://en.wikipedia.org/wiki/Scale_cube"&gt;https://en.wikipedia.org/wiki/Scale_cube&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Greedy regular expressions: &lt;a href="https://www.regular-expressions.info/repeat.html"&gt;https://www.regular-expressions.info/repeat.html&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>vinted</category>
      <category>elasticsearch</category>
      <category>rust</category>
    </item>
    <item>
      <title>Vinted Search Scaling Chapter 1: Indexing</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Mon, 18 Jan 2021 15:19:27 +0000</pubDate>
      <link>https://dev.to/vinted/vinted-search-scaling-chapter-1-indexing-4ii1</link>
      <guid>https://dev.to/vinted/vinted-search-scaling-chapter-1-indexing-4ii1</guid>
      <description>&lt;p&gt;Data ingestion into Elasticsearch at scale is hard. In this post, I’m going to share a success story of leveraging the Kafka Connect Elasticsearch Sink Connector to support the ever-increasing usage of the Vinted platform and to help our search engineers to ship features quickly all while having an operational and reliable system.&lt;/p&gt;

&lt;h1&gt;
  
  
  Prehistory
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Several times postponed and endless times patched with magic workarounds like future jobs, delta jobs, deferred deduplication, delta indexing, etc. - it is time to rethink this solution.&lt;/p&gt;

&lt;p&gt;--- &lt;cite&gt;Ernestas Poškus, Search SRE Team Lead&lt;/cite&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Scope of the Post
&lt;/h1&gt;

&lt;p&gt;The job of the indexing pipeline is to bridge the gap between the primary datastore that is &lt;a href="https://www.mysql.com/"&gt;MySQL&lt;/a&gt; and the search indices backed by &lt;a href="https://www.elastic.co/what-is/elasticsearch"&gt;Elasticsearch&lt;/a&gt;. For the building blocks of the indexing pipeline, we picked &lt;a href="https://kafka.apache.org/"&gt;Apache Kafka&lt;/a&gt; and &lt;a href="https://kafka.apache.org/documentation/#connect"&gt;Kafka Connect&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wujTFj5f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://engineering.vinted.com/static/2021/01/12/KC.png%3Fstyle%3Dcentered" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wujTFj5f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://engineering.vinted.com/static/2021/01/12/KC.png%3Fstyle%3Dcentered" alt="architecture of the indexing pipeline"&gt;&lt;/a&gt;&lt;br&gt;Figure 1. Architecture of the indexing pipeline
  &lt;/p&gt;



&lt;p&gt;The term &lt;strong&gt;Async Jobs&lt;/strong&gt; in the chart above refers to a collection of Ruby scripts that periodically check the database for updates that happened during the last several minutes. Updates are then sent to the appropriate Kafka topics.&lt;/p&gt;

&lt;p&gt;We have a separate Kafka cluster for handling the search use cases. Kafka serves as a high-performance, durable, and fault-tolerant buffer.&lt;/p&gt;

&lt;p&gt;Kafka Connect is a scalable and reliable tool for streaming data between Apache Kafka and other systems. It allows you to quickly define connectors that move data into and out of Kafka. Luckily for us, there is an open-source &lt;a href="https://github.com/confluentinc/kafka-connect-elasticsearch"&gt;connector&lt;/a&gt; that sends data from Kafka topics to Elasticsearch indices.&lt;/p&gt;

&lt;p&gt;From now on, this post will focus on the details of our Kafka Connect deployment: how we use the system, and its properties. Also, we'll share the battle stories and lessons we learned while implementing the solution.&lt;/p&gt;

&lt;h1&gt;
  
  
  Requirements of the Data Indexing Pipeline
&lt;/h1&gt;

&lt;p&gt;We started the project with a list of requirements. The following list captures the most important ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capturing the ordered set of data indexing operations (to act as a source-of-truth);&lt;/li&gt;
&lt;li&gt;Scalability and high-availability;&lt;/li&gt;
&lt;li&gt;Ensuring the delivery and consistency of the data;&lt;/li&gt;
&lt;li&gt;Indexing data to multiple Elasticsearch clusters that are of different versions;&lt;/li&gt;
&lt;li&gt;A programmable interface that allows to control the indexing pipeline;&lt;/li&gt;
&lt;li&gt;Support of manual throttling and pausing of the indexing;&lt;/li&gt;
&lt;li&gt;Monitoring and alerting;&lt;/li&gt;
&lt;li&gt;Handling indexing errors and data inconsistencies;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We researched several available open-source tools for the indexing pipeline. We also considered implementing such a system ourselves from scratch. After the evaluation, Kafka Connect seemed to be the way to go.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capturing the ordered set of updates to the Vinted Catalogue
&lt;/h2&gt;

&lt;p&gt;We expect to store the data indexing requests for as long as needed. To achieve that, we set &lt;a href="https://kafka.apache.org/documentation/#topicconfigs_retention.ms"&gt;&lt;code&gt;retention.ms&lt;/code&gt;&lt;/a&gt; to &lt;code&gt;-1&lt;/code&gt; for all the relevant topics, i.e. we instructed Kafka not to delete old records. But what about the storage requirements of the Kafka cluster if we don't delete anything? We leverage Kafka's &lt;em&gt;log compaction&lt;/em&gt;&lt;sup id="fnref1"&gt;1&lt;/sup&gt; feature. Log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition. In other words, Kafka will delete old records with the same key in the background. This strategy allows us to "compact" all the updates of a catalog item into one update request.&lt;/p&gt;

&lt;p&gt;Kafka topics contain records where the record key is an ID of the Elasticsearch document, and the record body is the &lt;code&gt;_source&lt;/code&gt; of the Elasticsearch document. The body is stored as a JSON string. Note that the record body is always a &lt;em&gt;full document&lt;/em&gt;, meaning that we don't support partial updates to the Elasticsearch documents. There is a special case when the Kafka record body is &lt;code&gt;null&lt;/code&gt;. Such messages are called &lt;strong&gt;tombstone&lt;/strong&gt; messages. Tombstone messages are used to delete documents from the Elasticsearch indices.&lt;/p&gt;
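&lt;p&gt;The record model above can be sketched as follows: replaying a stream of (key, body) records, where the latest body for a key wins and a &lt;code&gt;null&lt;/code&gt; body (a tombstone) deletes the document. This mirrors what log compaction preserves, since only the last record per key matters (the records here are made up):&lt;/p&gt;

```ruby
# Replay a stream of (id, body) records into a document store: each body
# is a full document (no partial updates) and a nil body is a tombstone
# that deletes the document.
def apply_records(records)
  records.each_with_object({}) do |(id, body), index|
    if body.nil?
      index.delete(id) # tombstone: remove the document
    else
      index[id] = body # full overwrite: the last record per key wins
    end
  end
end

records = [
  [1, { 'title' => 'blue jacket' }],
  [2, { 'title' => 'red scarf' }],
  [1, { 'title' => 'blue denim jacket' }], # later update replaces item 1
  [2, nil]                                 # tombstone deletes item 2
]
p apply_records(records)
```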

&lt;h2&gt;
  
  
  Scalability and High-availability
&lt;/h2&gt;

&lt;p&gt;Kafka Connect is scalable. It has a concept of a &lt;code&gt;worker&lt;/code&gt;, a JVM process connected to the Kafka cluster, and a Kafka Connect cluster can have many workers. Each connector is backed by a Kafka consumer group, which can have multiple consumer instances. These consumer instances are called &lt;code&gt;tasks&lt;/code&gt;, and tasks are spread across the workers in the Kafka Connect cluster. However, concurrency is limited by the partition count of the source topics: it is not recommended to have more tasks per connector than the source topics have partitions.&lt;/p&gt;

&lt;p&gt;On top of that, the Elasticsearch connector supports multiple in-flight requests to Elasticsearch to increase concurrency. The amount of data sent can be configured with the &lt;code&gt;batch.size&lt;/code&gt; parameter. How often the data is sent can be configured with the &lt;code&gt;linger.ms&lt;/code&gt; parameter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fJyxN9gx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://engineering.vinted.com/static/2021/01/12/kc-scalability.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fJyxN9gx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://engineering.vinted.com/static/2021/01/12/kc-scalability.png" alt="scalibility of kafka connect"&gt;&lt;/a&gt;&lt;br&gt;Figure 2. Scalability schema of Kafka Connect
  &lt;/p&gt;



&lt;p&gt;It is worth mentioning that a single connector can only index data into one Elasticsearch cluster. To support multiple Elasticsearch clusters, you simply create multiple connectors.&lt;/p&gt;

&lt;p&gt;High availability is ensured by deploying Kafka Connect workers over multiple servers, so that each task of the connector fails independently. An advantage of this is that failed tasks can be restarted one-by-one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Elasticsearch Client Node Bug
&lt;/h3&gt;

&lt;p&gt;We used Elasticsearch client nodes to load-balance indexing requests into the Elasticsearch clusters. However, we occasionally had to kill and restart client nodes because they failed with an error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"root_cause"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"circuit_breaking_exception"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[parent] Data too large, data for [&amp;lt;http_request&amp;gt;] would be [33132540088/30.8gb], which is larger than the limit of [31653573427/29.4gb], real usage: [33132540088/30.8gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=739583758/705.3mb, accounting=0/0b]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"bytes_wanted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;33132540088&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"bytes_limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;31653573427&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"durability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TRANSIENT"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"circuit_breaking_exception"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[parent] Data too large, data for [&amp;lt;http_request&amp;gt;] would be [33132540088/30.8gb], which is larger than the limit of [31653573427/29.4gb], real usage: [33132540088/30.8gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=739583758/705.3mb, accounting=0/0b]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bytes_wanted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;33132540088&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bytes_limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;31653573427&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"durability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TRANSIENT"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The error above looks as though we simply sent too much data to Elasticsearch at once. Normally, this can be easily mitigated, for example by reducing the &lt;code&gt;batch.size&lt;/code&gt; parameter in the connector or decreasing indexing concurrency. However, the error still didn't disappear. After some investigation we discovered that the problem was being caused by a &lt;a href="https://github.com/elastic/elasticsearch/pull/55404"&gt;bug in Elasticsearch client nodes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our workaround was to send indexing requests directly to the data nodes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Data Consistency
&lt;/h2&gt;

&lt;p&gt;To ensure that Elasticsearch contains correct data, we leveraged Elasticsearch's &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/optimistic-concurrency-control.html"&gt;optimistic concurrency control mechanism&lt;/a&gt; combined with the &lt;a href="https://docs.confluent.io/kafka-connect-elasticsearch/current/index.html#features"&gt;at-least-once&lt;/a&gt; delivery guarantees of Kafka Connect and the fact that &lt;a href="https://kafka.apache.org/documentation/#intro_concepts_and_terms"&gt;Kafka ensures message ordering per topic partition&lt;/a&gt; when records have an ID. The trick is to use the Kafka record offset as the &lt;code&gt;_version&lt;/code&gt; of the Elasticsearch document and set the &lt;code&gt;version_type&lt;/code&gt; parameter to &lt;code&gt;external&lt;/code&gt; for indexing operations. Handling of the details falls on Kafka Connect and Elasticsearch.&lt;/p&gt;
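&lt;p&gt;A minimal sketch of the idea, in plain Python rather than the connector's actual code: treating the Kafka offset as an external document version makes redelivered or out-of-order writes idempotent, because a write carrying a version lower than or equal to the stored one is rejected.&lt;/p&gt;

```python
# Illustrative simulation of Elasticsearch's external versioning check.
# Each write carries the Kafka record offset as the document _version;
# a write is accepted only if its version is higher than the stored one.

def apply_write(index, doc_id, doc, offset):
    """Apply a write only if `offset` is newer than the stored version."""
    current = index.get(doc_id)
    if current is not None and offset <= current["version"]:
        # Stale write (e.g. an at-least-once redelivery) is rejected.
        return False
    index[doc_id] = {"doc": doc, "version": offset}
    return True

index = {}
assert apply_write(index, "item-1", {"title": "coat"}, offset=10)
# A redelivered duplicate with the same offset is a no-op:
assert not apply_write(index, "item-1", {"title": "coat"}, offset=10)
# A newer record wins:
assert apply_write(index, "item-1", {"title": "wool coat"}, offset=11)
```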

&lt;h3&gt;
  
  
  Elasticsearch Connector Bug
&lt;/h3&gt;

&lt;p&gt;While testing Kafka Connect we discovered that it is still possible to end up with inconsistent data in Elasticsearch. The problem was that the connector did not use the record offset as the &lt;code&gt;_version&lt;/code&gt; for delete operations. We reported the bug in the &lt;a href="https://github.com/confluentinc/kafka-connect-elasticsearch/pull/422"&gt;Elasticsearch sink connector repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The bug affected setups where Kafka records have non-null keys and multiple indexing requests with data from the same partition are sent in parallel, each representing either an index or a delete operation. In such cases, Elasticsearch can end up either having data that it should not have (e.g. an item is sold but still discoverable) or missing data that it should have (e.g. an item is not in the catalogue when it should be).&lt;/p&gt;

&lt;p&gt;We patched the connector and contributed the fix upstream. As of this writing, we are maintaining and deploying a fork of the connector ourselves while we wait for the Elasticsearch connector developers to accept our pull request and release a new version with the fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UPDATE&lt;/strong&gt;: the issue seems to be fixed, but the fix is not yet released. However, it was done not by accepting our pull request, but by switching to a different library to handle the &lt;a href="https://github.com/confluentinc/kafka-connect-elasticsearch/pull/468"&gt;Elasticsearch indexing&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Corner Case Adventure
&lt;/h3&gt;

&lt;p&gt;On a regular day, the described setup is worry-free from the data consistency point of view. But what happens when an engineer increases the number of partitions of Kafka topics for the sake of better throughput in the Kafka cluster? Data inconsistencies happen. The reason is that the routing of records to partitions changes: records (most likely) end up in a different partition and, even more importantly, with smaller offsets. From the Elasticsearch perspective the new records are "older" than the actually older records, i.e. updates to the index are not applied.&lt;/p&gt;
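&lt;p&gt;An illustrative sketch of why this happens (using a toy hash, not Kafka's real murmur2 partitioner): keyed records are routed with &lt;code&gt;hash(key) % num_partitions&lt;/code&gt;, so changing the partition count changes the routing, and the record's offsets restart in the new partition.&lt;/p&gt;

```python
# Toy stand-in for Kafka's keyed partitioner (the default uses murmur2,
# but any deterministic hash shows the effect).

def partition_for(key, num_partitions):
    return sum(key.encode()) % num_partitions

key = "item-42"
before = partition_for(key, 6)   # partition with 6 partitions
after = partition_for(key, 10)   # partition after scaling to 10

# With a different partition count the record can land on a different
# partition, whose offsets start again from a lower number, so the new
# "version" may be smaller than the one already stored in Elasticsearch
# and the update is silently rejected.
assert before != after
```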

&lt;p&gt;Unfortunately, the described situation happened in our setup. Fixing the issue required reingesting &lt;strong&gt;all&lt;/strong&gt; the data into new topics (because the current topics were "corrupted") and reindexing the data into Elasticsearch. Instead of going down this path, we used this "opportunity" to upgrade the Kafka cluster to a newer version. The benefit of this approach was that the topic names could remain unchanged, which required no changes in our indexing management code. On the flip side, we had to maintain two Kafka clusters during the transition period.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Failures
&lt;/h2&gt;

&lt;p&gt;Out of the various &lt;a href="https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/"&gt;options to handle errors with Kafka Connect&lt;/a&gt;, we opted to continue processing despite errors, while manually investigating them asynchronously in the background, to minimize the impact on the indexing pipeline.&lt;/p&gt;

&lt;p&gt;In case of an error, the idea is to send the faulty Kafka record to a separate &lt;a href="https://en.wikipedia.org/wiki/Dead_letter_queue"&gt;Dead Letter Queue (DLQ) topic&lt;/a&gt; with all the details in the Kafka message record&lt;sup id="fnref2"&gt;2&lt;/sup&gt; header, log the problem, and, when more than a couple of errors happen within a couple of minutes, send an error message to Slack. It is said that &lt;a href="https://en.wikipedia.org/wiki/A_picture_is_worth_a_thousand_words"&gt;a picture is worth a thousand words&lt;/a&gt;, so let's visualize that longish sentence with a neat diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BKsRNRXY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://engineering.vinted.com/static/2021/01/12/kc-error-handling.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BKsRNRXY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://engineering.vinted.com/static/2021/01/12/kc-error-handling.png" alt="error handling with kafka connect"&gt;&lt;/a&gt;&lt;br&gt;Figure 2. Error handling
  &lt;/p&gt;



&lt;p&gt;Which errors are handled by the process above? Two cases: (1) when the data is in the wrong format (e.g. the record body is not valid JSON) and (2) when &lt;a href="https://www.confluent.io/blog/kafka-connect-single-message-transformation-tutorial-with-examples/"&gt;single message transforms (SMT)&lt;/a&gt; are failing. A notable exception is that Kafka Connect doesn't handle errors that happen when sending requests to the &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html"&gt;Elasticsearch bulk API&lt;/a&gt;, i.e. if Elasticsearch rejects a request, the errors are not handled as described above, only logged. Kafka Connect logging is set up to send logs to the common logging infrastructure that is described in excellent detail in the post &lt;a href="https://engineering.vinted.com/2020/12/07/log-management-overview/"&gt;"One Year of Log Management at Vinted"&lt;/a&gt;.&lt;/p&gt;
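&lt;p&gt;The handled flow can be sketched roughly as follows (a simplified Python sketch with assumed names and a plain error-count threshold instead of a time window, not Kafka Connect internals):&lt;/p&gt;

```python
# Illustrative sketch of tolerate-and-route error handling: process each
# record; on failure, forward it to a DLQ with the error details attached
# as a header, and alert once the error count crosses a threshold.

def process_records(records, transform, dlq, alert, max_errors=2):
    errors = 0
    for record in records:
        try:
            transform(record)  # e.g. parsing and SMTs
        except Exception as exc:
            errors += 1
            # Keep the failure context together with the record itself.
            dlq.append({"record": record, "headers": {"error": str(exc)}})
            if errors > max_errors:
                alert(f"{errors} indexing errors, see DLQ")

def transform(record):
    if record.get("body") is None:
        raise ValueError("record body is not valid JSON")

dlq, alerts = [], []
records = [{"body": "{}"}, {"body": None}, {"body": "{}"}]
process_records(records, transform, dlq, alerts.append)
assert len(dlq) == 1 and "not valid JSON" in dlq[0]["headers"]["error"]
```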

&lt;p&gt;A snippet of relevant settings from the connector configuration&lt;sup id="fnref3"&gt;3&lt;/sup&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.replication.factor": "3",
"errors.deadletterqueue.topic.name": "dlq_kafka_connect",
"errors.deadletterqueue.context.headers.enable": "true",
"behavior.on.malformed.documents": "warn",
"drop.invalid.message": "true",
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Future tasks
&lt;/h1&gt;

&lt;p&gt;The solution to synchronize data between MySQL and Kafka begs to be improved. For example, we could get rid of the Ruby scripts by leveraging Change Data Capture (CDC) and stream changes directly from MySQL to Kafka using another Kafka Connect connector called &lt;a href="https://debezium.io/"&gt;Debezium&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Running Elasticsearch, Kafka, Kafka Connect, etc. on a workstation requires a lot of CPU and RAM, and Kafka Connect is the most resource-intensive of the bunch. Resource consumption gets problematic when the components run on a MacBook in Docker containers: the battery drains fast, the machine runs hot, the fans get loud, etc. To mitigate this, we will try to run Kafka Connect compiled to a native image using &lt;a href="https://www.graalvm.org/"&gt;GraalVM&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Wrapping Up
&lt;/h1&gt;

&lt;p&gt;As of now, the data indexing pipeline is one of the most boring and least problematic parts of the search stack at Vinted. And it should be. The reliability and scalability of Kafka Connect allow our engineers to be agile and focus on working on Vinted's mission of making second-hand fashion the first choice.&lt;/p&gt;

&lt;h1&gt;
  
  
  Footnotes
&lt;/h1&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://kafka.apache.org/documentation/#compaction"&gt;Kafka log compaction&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;We also use &lt;a href="https://github.com/confluentinc/kafka-rest"&gt;Kafka REST Proxy&lt;/a&gt; and, unfortunately, headers with failure details can't be fetched through it because &lt;a href="https://github.com/confluentinc/kafka-rest/issues/698"&gt;Kafka REST Proxy does not support Record headers&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;For a detailed description of the settings, consult the documentation &lt;a href="https://docs.confluent.io/current/installation/configuration/connect/sink-connect-configs.html"&gt;here&lt;/a&gt; and &lt;a href="https://docs.confluent.io/current/connect/kafka-connect-elasticsearch/configuration_options.html"&gt;here&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>vinted</category>
      <category>elasticsearch</category>
      <category>kafka</category>
      <category>kafkaconnect</category>
    </item>
    <item>
      <title>Unbearably slow API gateway calls between Azure App Services</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Tue, 17 Sep 2019 19:08:36 +0000</pubDate>
      <link>https://dev.to/buinauskas/unbearably-slow-api-gateway-calls-on-azure-app-service-3c2g</link>
      <guid>https://dev.to/buinauskas/unbearably-slow-api-gateway-calls-on-azure-app-service-3c2g</guid>
      <description>&lt;p&gt;Hello everyone,&lt;/p&gt;

&lt;p&gt;I'd like to ask for help debugging ASP.NET Core Web API.&lt;/p&gt;

&lt;p&gt;That's the situation:&lt;/p&gt;

&lt;p&gt;We've got an authorization service that exposes multiple RESTful APIs to check whether an authenticated user can access certain business objects. In my case the business object is a market, and markets are authorized by project. So an API call looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET https://authorization.service.com/markets?project=123456
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A successful response will return an array of markets; an unauthorized response will return a 403 status code.&lt;/p&gt;

&lt;p&gt;It all works great when I call it through Swagger or from Postman: responses are quick and come back in tens of milliseconds.&lt;/p&gt;

&lt;p&gt;Now my other service calls this service in an authorization filter and, based on the response, either continues or responds with Forbidden.&lt;/p&gt;

&lt;p&gt;I've implemented a reusable client for HTTP calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ApiClient&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IApiClient&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;HttpClient&lt;/span&gt; &lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt; &lt;span class="n"&gt;_jsonSerializer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;ApiClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HttpClient&lt;/span&gt; &lt;span class="n"&gt;httpClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt; &lt;span class="n"&gt;jsonSerializer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpClient&lt;/span&gt; &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ArgumentNullException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;nameof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;httpClient&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="n"&gt;_jsonSerializer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jsonSerializer&lt;/span&gt; &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ArgumentNullException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;nameof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonSerializer&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;GetDataAsync&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;accessToken&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;
    &lt;span class="err"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;var&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;CreateRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;accessToken&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SendAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HttpCompletionOption&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseHeadersRead&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsSuccessStatusCode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;responseStream&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ReadAsStreamAsync&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;streamReader&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;StreamReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responseStream&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;jsonTextReader&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;JsonTextReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;streamReader&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_jsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Deserialize&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonTextReader&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="n"&gt;HttpRequestMessage&lt;/span&gt; &lt;span class="nf"&gt;CreateRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;accessToken&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;locationUri&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;HttpRequestMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HttpMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;locationUri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Authorization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;$"Bearer &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;accessToken&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My authorization filter looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AuthorizeProjectFilter&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IAsyncActionFilter&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;IApiClient&lt;/span&gt; &lt;span class="n"&gt;_apiClient&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;IMarketsContext&lt;/span&gt; &lt;span class="n"&gt;_marketsContext&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;AuthorizeProjectFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IApiClient&lt;/span&gt; &lt;span class="n"&gt;apiClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IMarketsContext&lt;/span&gt; &lt;span class="n"&gt;marketsContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_apiClient&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apiClient&lt;/span&gt; &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ArgumentNullException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;nameof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;apiClient&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="n"&gt;_marketsContext&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;marketsContext&lt;/span&gt; &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ArgumentNullException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;nameof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marketsContext&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;OnActionExecutionAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ActionExecutingContext&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ActionExecutionDelegate&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HttpContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TryGetValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;nameof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RequestBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Project&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="k"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HttpContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TryGetValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Authorization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;authorizationHeader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;accessToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authorizationHeader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;Split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;markets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_apiClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetDataAsync&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IEnumerable&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Market&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;_authorizationServiceOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/markets?project=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;accessToken&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;authorized&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;markets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;authorized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ForbidResult&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;_marketsContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Markets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;markets&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this works great locally: requests are quick when I call the service. However, when I deploy it to an Azure App Service, responses take 20+ seconds, sometimes more than a minute. The App Service plan also isn't the cheapest one, so it should be fast.&lt;/p&gt;

&lt;p&gt;That doesn't seem like a cold start issue because I can keep on calling the service - it still remains slow.&lt;/p&gt;

&lt;p&gt;I've run a trace on Azure diagnostics and got the following:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PJWcZefs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/5f3xh1ith5nty218am1f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PJWcZefs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/5f3xh1ith5nty218am1f.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Seems like most of the time is spent on &lt;code&gt;BLOCKED_TIME&lt;/code&gt; within &lt;code&gt;system.net.http&lt;/code&gt; and I'm literally lost here.&lt;/p&gt;

&lt;p&gt;Tracing has also caught some exceptions with the following message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;An attempt was made to access a socket in a way forbidden by its access permissions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also my &lt;code&gt;HttpClient&lt;/code&gt; is added using &lt;code&gt;IHttpClientFactory&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;services&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddHttpClient&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IApiClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ApiClient&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DefaultRequestHeaders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Accept"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ConfigurePrimaryHttpMessageHandler&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;HttpClientHandler&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;UseProxy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm literally lost now; I couldn't find anything on Google that would've helped me.&lt;/p&gt;

&lt;p&gt;Please help :)&lt;/p&gt;

</description>
      <category>help</category>
      <category>azure</category>
      <category>dotnet</category>
      <category>csharp</category>
    </item>
    <item>
      <title>What are the most common tools for data pre-calculation and aggregation?</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Sun, 21 Oct 2018 16:41:10 +0000</pubDate>
      <link>https://dev.to/buinauskas/what-are-the-most-common-tools-for-data-pre-calculation-and-aggregation-3dkj</link>
      <guid>https://dev.to/buinauskas/what-are-the-most-common-tools-for-data-pre-calculation-and-aggregation-3dkj</guid>
      <description>&lt;p&gt;Company, I work at, does data research and scraping which is later aggregated and published to our clients. We also try to denormalize data in order to provide faster data lookup in web applications.&lt;/p&gt;

&lt;p&gt;Until now, we used mechanisms within SQL Server to do these aggregations. But recently this has become a bottleneck: processes take too much time to execute and overlap into business hours.&lt;/p&gt;

&lt;p&gt;What other tools does the market use to perform aggregations and pre-calculations outside of a relational database? My discoveries include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache Hadoop MapReduce&lt;/li&gt;
&lt;li&gt;Apache Pig&lt;/li&gt;
&lt;li&gt;Apache Spark&lt;/li&gt;
&lt;/ul&gt;
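&lt;p&gt;To make the question concrete, here's a rough sketch (hypothetical data, made-up names) of the kind of group-by pre-aggregation I mean; frameworks like the ones above distribute this same map/reduce shape across many machines:&lt;/p&gt;

```typescript
// A rough sketch, on hypothetical sales rows, of the group-by aggregation
// that tools like MapReduce or Spark run in parallel over large datasets.
interface SaleRow {
  region: string;
  amount: number;
}

const rows: SaleRow[] = [
  { region: "EU", amount: 10 },
  { region: "EU", amount: 5 },
  { region: "US", amount: 7 },
];

// "Map" each row to a (key, value) pair, then "reduce" by summing per key.
const totals: { [region: string]: number } = {};
for (const row of rows) {
  totals[row.region] = (totals[row.region] ?? 0) + row.amount;
}

console.log(totals);
```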

</description>
      <category>sql</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Hiring for the first time</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Thu, 06 Sep 2018 18:28:06 +0000</pubDate>
      <link>https://dev.to/buinauskas/hiring-for-the-first-time-19d</link>
      <guid>https://dev.to/buinauskas/hiring-for-the-first-time-19d</guid>
      <description>&lt;p&gt;With a recent promotion, one of my tasks will be filling a team with a couple of junior developers. This is going to be the first time I'll interview someone. Could anyone share posts, books, videos, personal experience, basically anything I could consume to prepare myself better and make it as stress-free as possible?&lt;/p&gt;

&lt;p&gt;I should also mention that it's going to be internal recruitment, and I'll have to interview people I work with daily.&lt;/p&gt;

&lt;p&gt;Any kind of comments and questions are welcome!&lt;/p&gt;

</description>
      <category>interview</category>
      <category>help</category>
    </item>
    <item>
      <title>Help designing web application UI</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Tue, 07 Aug 2018 18:57:16 +0000</pubDate>
      <link>https://dev.to/buinauskas/help-designing-web-application-ui-1o01</link>
      <guid>https://dev.to/buinauskas/help-designing-web-application-ui-1o01</guid>
      <description>&lt;p&gt;We're a small team of backend developers building our first web application (a really small one) and we don't really have a dedicated UI person to help us out.&lt;/p&gt;

&lt;p&gt;We're using Angular together with Angular Material and try to follow it strictly. Is there anyone who'd be willing to chat and give some basic guidance?&lt;/p&gt;

</description>
      <category>help</category>
      <category>material</category>
      <category>angular</category>
    </item>
    <item>
      <title>What kind of peripherals do you use?</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Wed, 04 Apr 2018 15:32:42 +0000</pubDate>
      <link>https://dev.to/buinauskas/what-kind-of-peripherals-do-you-use-3k43</link>
      <guid>https://dev.to/buinauskas/what-kind-of-peripherals-do-you-use-3k43</guid>
      <description>&lt;p&gt;How much attention do you pay to your input devices, typically a keyboard and a mouse?&lt;/p&gt;

&lt;p&gt;I've always loved mechanical keyboards, the clack sound they make, and the lighter keystrokes, but I never really cared about mice. However, I noticed that my wrist was starting to hurt a bit, so I started looking into more ergonomic ones.&lt;/p&gt;

&lt;p&gt;The Logitech MX Master 2S seemed like a good candidate, but there are some complaints about its build quality.&lt;/p&gt;

&lt;p&gt;The Logitech MX ERGO is praised, but I'm not really sure about the trackball.&lt;/p&gt;

&lt;p&gt;Vertical mice?&lt;/p&gt;

&lt;p&gt;What do you use in your day-to-day job? And if you're a fan of ergonomic mice built for productivity, what would you recommend?&lt;/p&gt;

</description>
      <category>discuss</category>
    </item>
    <item>
      <title>Maximum line length in your code</title>
      <dc:creator>Evaldas Buinauskas</dc:creator>
      <pubDate>Thu, 29 Mar 2018 09:02:13 +0000</pubDate>
      <link>https://dev.to/buinauskas/maximum-line-length-in-your-code-2ia4</link>
      <guid>https://dev.to/buinauskas/maximum-line-length-in-your-code-2ia4</guid>
      <description>&lt;p&gt;Until now I've always tried to stick to 80 characters per line, and it worked just fine in plain ol' JavaScript. However, after adopting TypeScript, it feels like that's not enough with all these type declarations; sometimes the code becomes harder to read. An example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;setCommunity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ActionCreator&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;SetCommunityAction&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@@community/SET_COMMUNITY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whereas if the limit were 100, this becomes more readable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;setCommunity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ActionCreator&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;SetCommunityAction&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@@community/SET_COMMUNITY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
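&lt;p&gt;For what it's worth, a limit like this doesn't have to be enforced by hand; a formatter can do it. For example, Prettier's &lt;code&gt;printWidth&lt;/code&gt; option (a real setting; the config below is just a hypothetical project setup):&lt;/p&gt;

```json
{
  "printWidth": 100,
  "singleQuote": true
}
```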



&lt;p&gt;Of course, this is just a single example that was frustrating me.&lt;/p&gt;

&lt;p&gt;What max line length do you use and what are the reasons behind it?&lt;/p&gt;

</description>
      <category>discuss</category>
    </item>
  </channel>
</rss>
