EnqueueZero Techshack 2018-51


Kia ora! I am the creator of Enqueue Zero (https://enqueuezero.com), a site that explains code principles to developers, DevOps engineers, and sysadmins. The site has been maintained by a one-person team since 2018. The EnqueueZero Updates ${YEAR}.${WEEK} series collects interesting posts I wrote or found on the Internet. I hope you enjoy them.

To keep me motivated and this website ad-free, your kind donation is appreciated!

Donate this project using Patreon

Enjoy reading this week's posts.

Status Site

enqueuezero.com

A status site is a standalone website listing the statuses of the individual components that make up the product.

It shows two kinds of information:

  • The status of each function of the business.
  • A list of incidents organized on a daily basis. If nothing happened, it shows "No incidents reported."; otherwise it shows the details of the incidents, such as when they were detected, how they were handled, and when they were resolved.

It has the following goals:

  • Show how reliable the platform is.
  • Show how well the platform is recovering from failure.
  • Show how performance evolves over time.

The status site shows the statuses of the product's components. The point is transparency: users know exactly where to look when there is downtime, and staff act on information they know is up to date. A minimal data model sketch follows.
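
Here is a minimal data model sketch for such a page, written in Go as my own illustration; the type and field names are assumptions, not something from the post:

package status

import "time"

// Component is one function of the business shown on the status page.
type Component struct {
	Name   string
	Status string // e.g. "operational", "degraded", "outage"
}

// Incident is a dated entry; days with no entries render as
// "No incidents reported."
type Incident struct {
	DetectedAt time.Time
	ResolvedAt *time.Time // nil while the incident is still open
	Updates    []string   // how the incident was handled, in order
}

// Page groups components and incidents by day for display.
type Page struct {
	Components []Component
	Incidents  map[string][]Incident // keyed by date, e.g. "2018-12-17"
}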

Availability

enqueuezero.com

Availability defines how well the system succeeds in providing service to its customers. Improving the availability of a service is hard. Note that the important thing is not to deliver the agreed SLA but to keep customers happy. If the numbers are good but customers are not satisfied, you need to reconsider the whole approach, set the right measurement target, and validate it again.
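
As a rough aside (not from the original post), availability is commonly computed as the fraction of a measurement window during which the service was up; a tiny Go sketch:

package main

import (
	"fmt"
	"time"
)

// availability returns the fraction of the measurement window during which
// the service was up. This is an illustrative formula, not the post's code.
func availability(window, downtime time.Duration) float64 {
	if window <= 0 {
		return 0
	}
	return float64(window-downtime) / float64(window)
}

func main() {
	// Example: ~43 minutes of downtime in a 30-day month is roughly 99.9% ("three nines").
	window := 30 * 24 * time.Hour
	downtime := 43 * time.Minute
	fmt.Printf("availability: %.4f%%\n", availability(window, downtime)*100)
}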

5 ways Facebook improved compression at scale with Zstandard

code.fb.com

  • Use Case: Compress hundreds of gigabytes of data for backup. Combining -T, --long, and --adapt is perfect for server backups.
    • Multi-threaded Compression (-T). Use the -T0 option to use all available cores.
    • Long Range Mode (--long). Long range mode is a serial preprocessor which finds large matches and passes the rest to a parallel zstd backend.
    • Automatic Level Determination (--adapt). Adaptive mode measures the speed of the input and output files, or pipes, and adjusts the compression level to match the bottleneck.
  • Use Case: Compress data in the warehouse.
    • Increasing the block size to 256 KB instead of the default 32 KB generates significant incremental compression benefits.
  • Use Case: Compressed filesystems.
    • Added zstd support to SquashFS in read-only cases.
    • Added zstd support to Btrfs in copy-on-write cases.
  • Use Case: Databases.
    • Hybrid compression: MyRocks is MySQL on RocksDB, Rocksandra is Cassandra on RocksDB, and ZippyDB is an internal database built on top of RocksDB.
    • Dictionary Compression: Analyze sample data and presume that the rest of the data to compress will be similar. Once the patterns have been captured, the dictionary assumes future data will be similar and compression can begin immediately.
  • Use Case: Managed Compression
    • To deal with dictionaries making compression stateful, zstd --train is introduced for pre-generating dictionaries. It keeps the interface simple by sticking to compress() / decompress(). (Figure: Application Dictionary Compression Life-Cycle.) A hedged Go sketch of dictionary compression follows this list.
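
The post itself is about the zstd CLI and Facebook's internal systems; purely as a hedged illustration, here is how dictionary compression might look from Go using the third-party github.com/klauspost/compress/zstd package (my assumption, not something the post mentions):

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// A dictionary pre-generated offline, e.g. with `zstd --train samples/* -o app.dict`.
	dict, err := os.ReadFile("app.dict")
	if err != nil {
		log.Fatal(err)
	}

	// Encoder configured with the shared dictionary; small records compress
	// much better because the common patterns live in the dictionary.
	enc, err := zstd.NewWriter(nil, zstd.WithEncoderDict(dict))
	if err != nil {
		log.Fatal(err)
	}
	defer enc.Close()

	compressed := enc.EncodeAll([]byte(`{"user":"alice","event":"login"}`), nil)

	// The decoder must be given the same dictionary.
	dec, err := zstd.NewReader(nil, zstd.WithDecoderDicts(dict))
	if err != nil {
		log.Fatal(err)
	}
	defer dec.Close()

	plain, err := dec.DecodeAll(compressed, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(plain))
}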

How Does setState Know What to Do?

overreacted.io

Q: When you call setState in a component, what do you think happens?
A: Every renderer sets a special field on the created class. This field is called updater.

// Inside React DOM
const inst = new YourComponent();
inst.props = props;
inst.updater = ReactDOMUpdater;

// Inside React DOM Server
const inst = new YourComponent();
inst.props = props;
inst.updater = ReactDOMServerUpdater;

// Inside React Native
const inst = new YourComponent();
inst.props = props;
inst.updater = ReactNativeUpdater;

// A bit simplified
setState(partialState, callback) {
  // Use the `updater` field to talk back to the renderer!
  this.updater.enqueueSetState(this, partialState, callback);
}

Scheduler Algorithm in Kubernetes

github.com

This document explains how a node is selected for an unscheduled Pod. The two steps are:

  1. Filter out nodes that do not meet the Pod's requirements. Reasons include:
    • No disk, no volume zone, or maximum volumes count reached.
    • No CPU, no memory.
    • HostPort already occupied.
    • HostName not match, Label not match.
    • Node is reporting memory/disk pressure.
  2. Rank the remaining nodes to find the best fit.
    • The prioritization is performed by a set of priority functions.
    • Each priority function returns a score from 0 to 10, from least preferred to most preferred.
    • finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) + ... (a small Go sketch of this follows the list)
    • Algorithms:
      • Least requested
      • Balance resource
      • Minimizing the number of Pods on the same node.
      • Prefer nodes with image pulled already.
      • Node affinity
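
A minimal Go sketch of the two steps (predicate filtering, then the weighted score above); the types here are hypothetical stand-ins, not the real kube-scheduler code:

package main

import "fmt"

// Node and Pod are hypothetical stand-ins for the real Kubernetes objects.
type Node struct {
	Name             string
	FreeCPU, FreeMem int64
}

type Pod struct {
	CPU, Mem int64
}

// predicate filters out nodes that cannot host the pod at all.
type predicate func(Node, Pod) bool

// priorityFunc ranks a feasible node from 0 (least preferred) to 10 (most preferred).
type priorityFunc func(Node, Pod) int

func schedule(nodes []Node, pod Pod, preds []predicate, prios []priorityFunc, weights []int) (Node, bool) {
	var best Node
	bestScore, found := -1, false
	for _, n := range nodes {
		feasible := true
		for _, p := range preds {
			if !p(n, pod) {
				feasible = false
				break
			}
		}
		if !feasible {
			continue
		}
		// finalScore = weight1*priorityFunc1 + weight2*priorityFunc2 + ...
		score := 0
		for i, prio := range prios {
			score += weights[i] * prio(n, pod)
		}
		if score > bestScore {
			best, bestScore, found = n, score, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{{"node1", 2000, 4 << 30}, {"node2", 4000, 8 << 30}, {"node3", 100, 1 << 30}}
	pod := Pod{CPU: 500, Mem: 2 << 30}
	fits := func(n Node, p Pod) bool { return n.FreeCPU >= p.CPU && n.FreeMem >= p.Mem }
	leastRequested := func(n Node, p Pod) int { return int(10 * (n.FreeCPU - p.CPU) / n.FreeCPU) }
	node, ok := schedule(nodes, pod, []predicate{fits}, []priorityFunc{leastRequested}, []int{1})
	fmt.Println(node.Name, ok)
}

In the real scheduler, the predicates and priority functions are the named policies listed above (least requested, node affinity, and so on).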

The Kubernetes Scheduler

github.com

The Kubernetes scheduler runs as a process alongside the other master components such as the API server.

Predicates are a set of policies applied one by one to filter out inappropriate nodes.

Priorities are a set of policies applied one by one to rank nodes (that made it through the filter of the predicates).

For a given pod:

    +---------------------------------------------+
    |               Schedulable nodes:            |
    |                                             |
    | +--------+    +--------+      +--------+    |
    | | node 1 |    | node 2 |      | node 3 |    |
    | +--------+    +--------+      +--------+    |
    |                                             |
    +-------------------+-------------------------+
                        |
                        |
                        v
    +-------------------+-------------------------+

    Pred. filters: node 3 doesn't have enough resource

    +-------------------+-------------------------+
                        |
                        |
                        v
    +-------------------+-------------------------+
    |             remaining nodes:                |
    |   +--------+                 +--------+     |
    |   | node 1 |                 | node 2 |     |
    |   +--------+                 +--------+     |
    |                                             |
    +-------------------+-------------------------+
                        |
                        |
                        v
    +-------------------+-------------------------+

    Priority function:    node 1: p=2
                          node 2: p=5

    +-------------------+-------------------------+
                        |
                        |
                        v
            select max{node priority} = node 2

The DRY Principle: Its Cost Explained with Examples

web-techno.net

  • Knowledge duplication is always a DRY principle violation.
  • Code duplication doesn’t necessarily mean violation of the DRY principle.
    • Trying to apply DRY everywhere can have two results: Unnecessary coupling & Unnecessary complexity.
  • DRY states that you shouldn't duplicate knowledge, not that you should write code so that everything can be reused.
  • DRY is a principle, not a rule. Everything has a cost in development, and DRY is no exception. Obviously you shouldn't repeat your business logic all over the place, but neither should you tightly couple everything just because you don't want to repeat your code.
  • Be careful not to extract your duplicate code and make everything depend on it (a short sketch of this trap follows below).
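
As an illustration of my own (not from the article), two functions can share the same code while encoding different knowledge; merging them would couple two business rules that change for different reasons:

package pricing

// These two functions currently share the same formula, but they encode
// different business knowledge: VAT policy and membership discount policy
// (both rates are hypothetical). Merging them into one "applyPercentage"
// helper would couple two rules that are likely to change independently.

// VATAmount computes the value-added tax owed on a sale.
func VATAmount(total float64) float64 {
	return total * 0.20
}

// MemberDiscount computes the discount for a loyalty member.
func MemberDiscount(total float64) float64 {
	return total * 0.20
}

Keeping them separate repeats a line of code but avoids the unnecessary coupling the article warns about.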

Events, the DNA of Kubernetes

mgasch.com

  • Kubernetes: Autonomous processes reacting to events from the API server.
  • Processes: Control loops (observe -> analyze -> act), or producers and/or consumers of events (consumers can be producers as well, and vice versa), including the scheduler, deployment controller, endpoint controller, Kubelet, etc. A minimal control-loop sketch follows below.
  • Event: an immutable fact that happened, e.g. "pod created".
  • API server: an immutable (replicated) log (or set of queues), each representing a stream of events.
  • Object & Namespace: a dedicated (virtual) event queue the API server handles.

API Server Event Queues
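
A minimal sketch of the observe -> analyze -> act loop in Go, with hypothetical types rather than client-go:

package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical immutable fact, e.g. "pod created".
type Event struct {
	Kind   string // "pod created", "pod deleted", ...
	Object string
}

// controlLoop observes events, compares desired vs. actual state, and acts.
// Real controllers watch the API server via client-go informers; this is only
// an illustration of the pattern described above.
func controlLoop(events <-chan Event, desiredReplicas int) {
	actual := desiredReplicas // assume the cluster starts in the desired state
	for {
		select {
		case ev := <-events:
			// Observe: update our view of the actual state from the event stream.
			switch ev.Kind {
			case "pod created":
				actual++
			case "pod deleted":
				actual--
			}
		case <-time.After(time.Second):
			// Analyze + act: reconcile actual state toward desired state.
			if actual < desiredReplicas {
				fmt.Printf("creating %d pod(s)\n", desiredReplicas-actual)
				actual = desiredReplicas // pretend the action succeeded
			}
		}
	}
}

func main() {
	events := make(chan Event, 1)
	events <- Event{Kind: "pod deleted", Object: "web-1"}
	go controlLoop(events, 3)
	time.Sleep(3 * time.Second)
}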

What is HTTP/3 ?

dev.to

HTTP/3 = HTTP-over-QUIC.

QUIC (Quick UDP Internet Connections) is a new transport that reduces latency compared to TCP.
It uses fewer handshakes, improves forward error correction and congestion control, and avoids HoL blocking.

In the old days the Internet was unreliable, so TCP was designed with large timeouts and windows.
Everything has changed since, yet the timeout is still around 20 seconds today.
If you kill a TCP socket that hangs for 5 seconds, your OS will still keep it open until the 20-second timeout expires, consuming system resources.

Microservices are for humans, not machines

medium.com

This post explores the organizational and cultural benefits of moving to a microservices architecture.

The advantage is the ability to scale each microservice in isolation, based on its specific load requirements.
The disadvantage is distributed systems are hard.

From an organizational perspective, it leads to smaller (and more maintainable) code repos, clearer boundaries between teams, etc.

One of the goals of microservices architecture is to let development teams achieve better results.

The author suggests starting with a monolith and moving to microservices when it makes sense.

Does the Service Mesh spell the end for Middleware?

www.cloudops.com

The short answer is: not until a framework of DevOps tooling and practices is put in place.

  • Middleware works by funneling messages from various applications into centralized pools of communication. These messages are then carried through a pipeline of functions, ending with the user registry service. Messages are transported on enterprise service buses (ESBs).
  • Within the service mesh, messages are still transported, but the messaging functions are performed alongside the services receiving the messages. Each instance is attached to a proxy that relays messages to and from the service mesh.

The service mesh is getting more mature, while middleware can still be ideal for transporting information between monolithic applications. Migrating from middleware to a service mesh may require teams to uproot the way their applications and infrastructure work.

Visualizing Systems with Statemaps

Slide

Some interesting points:

  • Systems go wrong silently and invisibly.
  • We amplify the signals, e.g. via latencies.
  • Observing the system is necessary.
  • Aggregating the instrumentation data is essential for scaling.
  • There is not one way to visualize the data for a system, but many, and visualization is an essential facet of system observability.
  • Statemap: a visualization of entity state over time. It operates on a payload of JSON where each line corresponds to a state transition for an entity. For example:
  {time: 1545005553, entity: 1, state: 0}
  {time: 1545005554, entity: 1, state: 1}

States are described in a JSON metadata header. For example:

  {
    title: "",
    states: {
        state1: {value: 0, color: "#fff"},
        state2: {value: 1, color: "#000"}
    }
  }

Airbnb SOA Migration

Presentation and Transcript

  • Because of tangled modules, the growth of the business, and the increasing number of incidents, they decided to migrate to SOA.
  • Why SOA? SOA is a network of loosely coupled services. It's well proven in many organizations.
  • Two design principles:
    • Services should own both reads and writes to their own data.
    • Data mutations should be published via standard events.
  • Getting started:
    • Customize an ActiveRecord adaptor.
    • Re-route queries to services.
    • Organize services into data services, middle-tier services, etc, and design their calling directions.
    • Incremental migration.
    • Let the framework auto-generate code, and do standardized testing and deployment that replays production traffic via the utility Diffy.
    • Write Thrift IDL for the stability of the interfaces of the services.
  • Thoughts:
    • SOA comes with a high investment cost.
    • Service orchestration becomes complex. Kubernetes comes in for scaling services.
    • It's a long commitment and journey. Ensure you don't break functionality along the way by comparing slowly, carefully, and piecemeal.

Evolution of Application Data Caching: From RAM to SSD

The Netflix Tech Blog

  • Context:
    • Moving the data swiftly and securely across all regions came with a considerable increase in cost and complexity.
  • Goal:
    • Provide a global caching solution which was not only fast but was also cost-effective.
  • Solution:
    • SSDs for Caching. The cost to store 1 TB of data on SSD is much lower than storing the same amount in RAM.
    • Store all the data on SSD (RocksDB) and the active/hot data in RAM (Memcached). The problem was poor performance on large clusters when overwriting existing data in RocksDB (due to compactions).
    • Memcached External Storage (extstore): all the metadata (keys and other metadata) is stored in RAM, whereas the actual data is stored on flash. A toy sketch of this split follows the list.
    • Move some of the large RAM-based memcached clusters to considerably smaller extstore clusters.
  • Conclusions:
    • All production clusters are running extstore.
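
A toy Go sketch of the extstore split described above — my own illustration, not Netflix's or memcached's code:

package main

import (
	"fmt"
	"os"
)

// itemMeta is what stays in RAM: the location and size of the value on flash.
type itemMeta struct {
	offset int64
	length int
}

// flashCache is a hypothetical illustration: keys and metadata in RAM,
// values appended to a file standing in for flash storage.
type flashCache struct {
	index map[string]itemMeta // metadata in RAM
	file  *os.File            // values on "flash"
	end   int64
}

func (c *flashCache) Set(key string, value []byte) error {
	if _, err := c.file.WriteAt(value, c.end); err != nil {
		return err
	}
	c.index[key] = itemMeta{offset: c.end, length: len(value)}
	c.end += int64(len(value))
	return nil
}

func (c *flashCache) Get(key string) ([]byte, bool) {
	meta, ok := c.index[key]
	if !ok {
		return nil, false
	}
	buf := make([]byte, meta.length)
	if _, err := c.file.ReadAt(buf, meta.offset); err != nil {
		return nil, false
	}
	return buf, true
}

func main() {
	f, _ := os.CreateTemp("", "extstore-demo")
	defer os.Remove(f.Name())
	cache := &flashCache{index: map[string]itemMeta{}, file: f}
	_ = cache.Set("movie:42", []byte("metadata blob"))
	v, _ := cache.Get("movie:42")
	fmt.Println(string(v))
}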

Implementing the Netflix Media Database

The Netflix Tech Blog

  • Overview:
    • This post discusses the key traits of NMDB and how the design aims to address them.
  • System requirements:
    • Support for Structured Data
    • Multi-tenancy and Access Control
    • Integration with the legacy system
    • SLA Guarantees
    • Flexibility of Queries

Loki

GitHub Project |
The design document.

  • Context:
    • An extra context switch is needed when engineers jump between the metrics (time series) dashboard and the logs.
  • How it solves the problem:
    • Install the agent promtail on each server or pod; it gathers logs and sends them to Loki.
    • The central server Loki indexes the metadata of log streams and stores them.
    • Grafana serves as the dashboard, querying logs from Loki with Prometheus-like queries, for example {app="mysql",name="mysql-backup"}.
  • The challenges:
    • Obtain reliable metadata that is consistent with the metadata associated with the time series / metrics.
    • Solution: Use same service discovery and label relabeling libraries as Prometheus.
    • Choosing the chunk size: is it going to be 4KB/s, 8KB/s, or more?
    • Solution: It depends on operational cost vs. storage cost, chunk index load, the memory cost of building chunks vs. the risk of loss, and compression effectiveness.
    • Chunks are many orders of magnitude larger than Prometheus/Cortex chunks.
    • Solution: Support streaming and iterating over them.

Docker App

Introduction |
GitHub Project

  • Context:
    • You have several environments where you want to deploy the application, with small configuration differences.
    • You have lots of similar applications.
    • Compose files are not easy to share.
  • Solution:
    • Make your Docker Compose applications reusable, and share them on Docker Hub.
  • Usage:
  # create a `docker-compose.yaml` and `hello.dockerapp`
  $ docker-app init --single-file hello

  # generate final docker-compose.yaml
  $ docker-app render | docker-compose -f - up

  # share the .dockerapp
  $ docker-app push --namespace ${myNamespace} --tag latest

Read the source code:

Loader

# pseudo code: split the .dockerapp file into its three YAML documents
metadata, compose, settings = readFile(filePath).Split("---")
return NewApp([metadata, compose, settings], options)

Render:

# pseudo code
composeConfigFiles = Load(app.MetadataRaw(), settings.WithPrefix("app"))
    .FromFlatten(env)
    .Merge(app.Settings(), metaPrefixed, envSettings)

render(composeConfigFiles)

Schema Check

gojsonschema.Validate(
    gojsonschema.NewStringLoader(string(schemaData)),
    gojsonschema.NewGoLoader(config),
)

What is CNAB?

www.morethanseven.net

  • Formalisation of a pattern we’ve seen for multi-toolchain installers. The multi-toolchain here especially means a matrix of DSL/Cloud-specific/YAML choices.
  • Commoditization of the packaging part of the infrastructure-as-code stack. Think Puppet and the Forge, Chef and the Supermarket, Ansible and Ansible Galaxy, Terraform and the Module Registry, and so on. CNAB commoditizes this layer and provides tools for anyone implementing something similar.

CNAB Spec

Specification

Goal: Bundling, installing, and managing distributed applications that are, by design, cloud agnostic.

Concepts:

  • CNAB: Cloud Native Application Bundles, a format.
  • bundle.json: It defines the metadata of the package.
  • bundle.cnab: A bundle.json with signing information.
  • thin bundle: A package with only the definitions of the package.
  • thick bundle: A package with the definitions of the package, and the decoded data of all images.
  • invocation image: The installer of the package. It installs dependencies to the host environment or platform.
  • execution images: The dependent images of the application.
  • run tool: the entry point of the invocation image.
  • claim: a history of the installation of the bundle.
  • repository: a warehouse for storing bundles.

rakanalh/scheduler

GitHub Project URL

This is a task scheduler for Go. It executes callbacks (as goroutines) after a pre-defined amount of time.

Basic usage:

// Instantiate with an in-memory (no-op) store
noOpStorage := storage.NewNoOpStorage()
s := scheduler.New(noOpStorage)

// Define your own function.
func MyFunc(arg1 string, arg2 string) { /* do the work */ }

// Run after 5 seconds
taskID := s.RunAfter(5*time.Second, MyFunc, "Hello", "World")

// Run every minute.
taskID = s.RunEvery(1*time.Minute, MyFunc, "Hello", "World")

Read the source code:

The data structure Task defines what will be executed. From the task, we can tell how to run it and derive when it will run next.

type Schedule struct {
    IsRecurring bool
    LastRun     time.Time
    NextRun     time.Time
    Duration    time.Duration
}

type Task struct {
    Schedule
    Func   FunctionMeta
    Params []Param
}

The package storage adds, fetches, and removes tasks against a particular backing store; NoOp means keeping everything in memory.
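
The store boils down to an interface along these lines (a paraphrase with assumed method names, not the exact source):

// A paraphrased sketch of the storage layer; the real package may name
// these methods differently. task.Task is the struct shown above.
type TaskStore interface {
	Add(t *task.Task) error
	Fetch() ([]*task.Task, error)
	Remove(t *task.Task) error
}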

The data structure Scheduler is the runtime that schedules all the tasks.

type Scheduler struct {
    funcRegistry *task.FuncRegistry
    stopChan     chan bool
    tasks        map[task.ID]*task.Task
    taskStore    storeBridge
}

The entry point of Scheduler is a loop that runs forever. Every second it tries to run all pending jobs, i.e. the tasks that the storage says are due to run within that second.

// entryPoint:
go func() {
    ticker := time.NewTicker(1 * time.Second)
    for {
        select {
        case <-ticker.C:
            scheduler.runPending()
        case <-sigChan:
            // an OS signal asks the scheduler to stop
            scheduler.stopChan <- true
        case <-scheduler.stopChan:
            close(scheduler.stopChan)
        }
    }
}()

...

// runPending:
for _, task := range scheduler.tasks {
    if task.IsDue() {
        go task.Run()
        // one-shot tasks are removed from the store once started
        if !task.IsRecurring {
            _ = scheduler.taskStore.Remove(task)
            delete(scheduler.tasks, task.Hash())
        }
    }
}

Refactor Large Functions

robert.muth.org

Strategies For Breaking Up Functions:

  • Closures and loops are good candidates to be extracted into their own functions.
  • Blocks delimited by comments like // do x ... // do y can be extracted into do_x(), do_y():
  // do x
  ...

  // do y
  ...
  • Long Parameter Lists: Bundle pass-through parameters by creating structs (see the sketch after this list).
  • Complex if: move the predicate to a function.
  • Early Out Programming.
  if !xxx:
      return

  if !yyy:
      return

  // main code
  • Refactoring is a good occasion to make sure that all the code inside the function ends up at roughly the same level of abstraction.
  • Make small changes each time.
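
A small Go sketch of two of these strategies — bundling pass-through parameters into a struct and moving a complex predicate into a named function; all names here are hypothetical:

package report

import "time"

// RenderOptions bundles parameters that used to be passed through
// several layers of function calls individually.
type RenderOptions struct {
	Locale     string
	Timezone   *time.Location
	PageSize   int
	ShowDrafts bool
}

// isPublishable replaces a complex inline `if` with a named predicate.
func isPublishable(status string, publishAt, now time.Time) bool {
	return status == "approved" && !publishAt.After(now)
}

func Render(items []string, opts RenderOptions) []string {
	// Early out: handle the trivial case first and keep the main path unindented.
	if len(items) == 0 {
		return nil
	}
	if opts.PageSize > 0 && len(items) > opts.PageSize {
		items = items[:opts.PageSize]
	}
	return items
}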

Original post: https://enqueuezero.com/readings/2018-51.html
