<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex Dzyoba</title>
    <description>The latest articles on DEV Community by Alex Dzyoba (@dzeban).</description>
    <link>https://dev.to/dzeban</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F83314%2F24678d60-d25c-4970-9c7b-9caad4f83f37.jpeg</url>
      <title>DEV Community: Alex Dzyoba</title>
      <link>https://dev.to/dzeban</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dzeban"/>
    <language>en</language>
    <item>
      <title>Envoy first impression</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Fri, 25 Jan 2019 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/envoy-first-impression-3871</link>
      <guid>https://dev.to/dzeban/envoy-first-impression-3871</guid>
      <description>&lt;p&gt;When I was doing &lt;a href="https://alex.dzyoba.com/blog/nginx-mirror/"&gt;traffic mirroring with nginx&lt;/a&gt; I’ve stumbled upon a surprising problem – nginx was delaying original request if mirror backend was slow. This is really bad because you expect that mirroring is “fire and forget”. Anyway, I’ve solved this by mirroring only part of the traffic but this drove me to find another proxy that could have mirror traffic without such problems. This is when I finally found time and energy to look into &lt;a href="https://www.envoyproxy.io/"&gt;Envoy&lt;/a&gt; – I’ve heard a lot of great things about it and always wanted to get my hands dirty with it.&lt;/p&gt;

&lt;p&gt;Just in case you’ve never heard of it – Envoy is a proxy server that is most commonly used in a service mesh scenario, but it can also be an edge proxy.&lt;/p&gt;

&lt;p&gt;In this post, I will look only at the &lt;strong&gt;edge proxy scenario&lt;/strong&gt; because I’ve never maintained a service mesh. Keep that use case in mind. Also, I will inevitably compare Envoy to nginx because that’s what I know and use.&lt;/p&gt;

&lt;h2&gt;What’s great about Envoy&lt;/h2&gt;

&lt;p&gt;The main reason I wanted to try Envoy was a handful of compelling features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Advanced load balancing policies&lt;/li&gt;
&lt;li&gt;Active checks&lt;/li&gt;
&lt;li&gt;Extensibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s unpack that list!&lt;/p&gt;

&lt;h3&gt;Observability&lt;/h3&gt;

&lt;p&gt;Observability is one of the most thorough features in Envoy. One of its design principles is to provide transparency in network communication, given how complex modern systems have become with all this microservices madness.&lt;/p&gt;

&lt;p&gt;Out of the box it provides &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/statistics"&gt;lots of metrics&lt;/a&gt; for various metrics systems, including Prometheus.&lt;/p&gt;
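&lt;p&gt;For reference, exposing the stats only takes an &lt;code&gt;admin&lt;/code&gt; block in the config – a minimal sketch (the bind address and port are my choice):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Expose the Envoy admin interface; its /stats/prometheus
# endpoint serves metrics in the Prometheus text format.
admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Then point Prometheus (or just &lt;code&gt;curl&lt;/code&gt;) at &lt;code&gt;http://localhost:9901/stats/prometheus&lt;/code&gt;.&lt;/p&gt;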

&lt;p&gt;To get that kind of insight in nginx you have to buy &lt;a href="https://www.nginx.com/products/nginx/live-activity-monitoring"&gt;nginx plus&lt;/a&gt; or use the VTS module, which means compiling nginx on your own. Hopefully, my project &lt;a href="https://github.com/alexdzyoba/nginx-vts-build"&gt;nginx-vts-build&lt;/a&gt; will help – I’m building nginx with the VTS module as a drop-in replacement for stock nginx, with a systemd service and basic configs. Think of it as an nginx distro. Currently, it has only one release, for Debian 9, but I’m open to suggestions. If you have a feature request, please let me know. But let’s get back to Envoy.&lt;/p&gt;

&lt;p&gt;In addition to metrics, Envoy &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/tracing"&gt;can be integrated with distributed tracing systems&lt;/a&gt; like Jaeger.&lt;/p&gt;

&lt;p&gt;And finally, it can &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/operations/traffic_capture"&gt;capture the traffic&lt;/a&gt; for further analysis with Wireshark.&lt;/p&gt;

&lt;p&gt;I’ve only looked at Prometheus metrics and they are quite nice!&lt;/p&gt;

&lt;h3&gt;Advanced load balancing&lt;/h3&gt;

&lt;p&gt;Load balancing in Envoy is &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/load_balancing/load_balancers"&gt;very feature-rich&lt;/a&gt;. Not only does it support round-robin, weighted and random policies, but also load balancing with consistent hashing algorithms like ketama and maglev. The point of the latter is fewer changes in traffic patterns when the upstream cluster is rebalanced.&lt;/p&gt;
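&lt;p&gt;The policy is picked per cluster via &lt;code&gt;lb_policy&lt;/code&gt; – a hedged sketch (the cluster name and endpoint are made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Consistent-hashing load balancing: RING_HASH is the
# ketama-style option, MAGLEV the other consistent hash.
clusters:
- name: backend
  type: STRICT_DNS
  connect_timeout: 1s
  lb_policy: RING_HASH   # or MAGLEV, ROUND_ROBIN, LEAST_REQUEST, RANDOM
  hosts:
  - socket_address:
      address: backend.local
      port_value: 10000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note that for consistent hashing to actually kick in, the route also needs a hash policy (e.g. on a header or cookie) so Envoy knows what to hash on.&lt;/p&gt;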

&lt;p&gt;Again, you can get the same &lt;a href="https://www.nginx.com/products/nginx/load-balancing"&gt;advanced features in nginx&lt;/a&gt;, but only if you pay for nginx plus.&lt;/p&gt;
&lt;h3&gt;Active checks&lt;/h3&gt;

&lt;p&gt;To &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/operations/traffic_capture"&gt;check the health&lt;/a&gt; of the upstream endpoints, Envoy actively sends requests and expects valid answers – only endpoints that answer stay in the upstream cluster. This is a very nice feature that open source nginx lacks (but &lt;a href="https://docs.nginx.com/nginx/admin-guide/load-balancer/http-health-check/#active-health-checks"&gt;nginx plus has&lt;/a&gt;).&lt;/p&gt;
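&lt;p&gt;A hedged sketch of what an active HTTP check looks like in a cluster definition (the path and thresholds are arbitrary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Active HTTP health checking for a cluster: an endpoint failing
# 3 checks in a row is ejected until it passes 2 in a row.
clusters:
- name: backend
  # (type, connect_timeout, hosts as in the configs below)
  health_checks:
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 3
    healthy_threshold: 2
    http_health_check:
      path: /healthz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;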
&lt;h3&gt;Extensibility&lt;/h3&gt;

&lt;p&gt;You can configure Envoy as a &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/redis"&gt;Redis proxy&lt;/a&gt;, &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/dynamo"&gt;DynamoDB filter&lt;/a&gt;, &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/mongo"&gt;MongoDB filter&lt;/a&gt;, &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/grpc"&gt;grpc proxy&lt;/a&gt;, &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/mysql_proxy_filter"&gt;MySQL filter&lt;/a&gt;, &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/thrift_proxy_filter"&gt;Thrift filter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is not a killer feature, imho, given that support for most of these protocols is experimental, but it’s nice to have and shows that Envoy is extensible.&lt;/p&gt;

&lt;p&gt;It also supports &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/lua_filter#config-http-filters-lua"&gt;Lua scripting out of the box&lt;/a&gt;. For nginx you have to use &lt;a href="https://openresty.org/"&gt;OpenResty&lt;/a&gt;.&lt;/p&gt;
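&lt;p&gt;A tiny hedged example of the Lua filter (the header name is made up) – the function runs for every request before the router filter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Inline Lua that stamps an extra header onto each request.
http_filters:
- name: envoy.lua
  config:
    inline_code: |
      function envoy_on_request(request_handle)
        request_handle:headers():add("x-proxied-by", "envoy")
      end
- name: envoy.router
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;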
&lt;h2&gt;What’s not so great about Envoy&lt;/h2&gt;

&lt;p&gt;The features above alone make a very good case for using Envoy. However, I found a few things that keep me from switching to it from nginx:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No caching&lt;/li&gt;
&lt;li&gt;No static content serving&lt;/li&gt;
&lt;li&gt;Lack of flexible configuration&lt;/li&gt;
&lt;li&gt;Docker-only packaging&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;No caching&lt;/h3&gt;

&lt;p&gt;Envoy &lt;a href="https://github.com/envoyproxy/envoy/issues/868"&gt;doesn’t support caching of responses&lt;/a&gt;. This is a must-have feature for an edge proxy, and nginx implements it really well.&lt;/p&gt;
&lt;h3&gt;No static content serving&lt;/h3&gt;

&lt;p&gt;While Envoy does networking really well, it doesn’t touch the filesystem apart from initial config file loading and &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/configuration/runtime#config-runtime"&gt;runtime configuration handling&lt;/a&gt;. If you were thinking of serving static frontend files (js, html, css), you’re out of luck – Envoy doesn’t support that. Nginx, again, does it very well.&lt;/p&gt;
&lt;h3&gt;Lack of flexible configuration&lt;/h3&gt;

&lt;p&gt;Envoy is configured via YAML, and its configuration feels very explicit. I think that’s actually a good thing – explicit is better than implicit. But I feel that Envoy configuration is bounded by the features specifically implemented in Envoy. &lt;em&gt;Maybe it’s a lack of experience with Envoy and old habits&lt;/em&gt;, but in nginx, with maps, the rewrite module (with the &lt;code&gt;if&lt;/code&gt; directive) and other nice modules, I have a very flexible config system that lets me implement almost anything. The cost of this flexibility is, of course, a good portion of complexity – nginx configuration requires some learning and practice, but in my opinion it’s worth it.&lt;/p&gt;
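&lt;p&gt;To show what I mean by flexibility, here is a hedged nginx sketch (all names are made up) that routes requests to a different upstream based on a header – one &lt;code&gt;map&lt;/code&gt; plus a variable in &lt;code&gt;proxy_pass&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pick an upstream group by the X-Canary request header.
map $http_x_canary $pool {
    default canary_off;
    "1"     canary_on;
}

upstream canary_off { server backend.local:10000; }
upstream canary_on  { server canary.local:10001; }

server {
    listen 8000;

    location / {
        proxy_pass http://$pool;
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;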

&lt;p&gt;Nevertheless, Envoy supports dynamic configuration, though it’s not like you can change some part of the configuration via a REST call – it’s about the discovery of configuration settings. That’s what the whole XDS protocol is about, with its EDS, CDS, RDS and what-not-DS.&lt;/p&gt;

&lt;p&gt;Citing &lt;a href="https://github.com/envoyproxy/data-plane-api/blob/master/XDS_PROTOCOL.md"&gt;docs&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Envoy discovers its various dynamic resources via the filesystem or by &lt;em&gt;querying&lt;/em&gt; one or more management servers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The emphasis is mine – I wanted to note that you have to provide a server that responds to Envoy’s discovery (XDS) requests.&lt;/p&gt;

&lt;p&gt;However, there is no ready-made solution that implements Envoy’s XDS protocol. There was &lt;a href="https://github.com/turbinelabs/rotor"&gt;rotor&lt;/a&gt;, but the company behind it shut down, so the project is mostly dead.&lt;/p&gt;

&lt;p&gt;There is Istio, but it’s a monster I don’t want to touch right now. Also, if you’re on Kubernetes there is &lt;a href="https://github.com/heptio/contour"&gt;Heptio Contour&lt;/a&gt;, but not everybody needs and uses Kubernetes.&lt;/p&gt;

&lt;p&gt;In the end, you could implement your own XDS service using &lt;a href="https://github.com/envoyproxy/go-control-plane"&gt;go-control-plane stubs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But that doesn’t seem to be widely used. What I saw most people do is use DNS for EDS and CDS. Especially remembering that Consul has a DNS interface, it seems we can use Consul to dynamically provide the list of hosts to Envoy. This isn’t big news, because I can (and do) use Consul to provide the list of backends for nginx by using a DNS name in &lt;code&gt;proxy_pass&lt;/code&gt; together with the &lt;code&gt;resolver&lt;/code&gt; directive.&lt;/p&gt;
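&lt;p&gt;A hedged sketch of that DNS-based approach on the Envoy side (the Consul service name is an example): a &lt;code&gt;STRICT_DNS&lt;/code&gt; cluster keeps re-resolving the name and updates its endpoints accordingly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Cluster whose members come from Consul's DNS interface.
clusters:
- name: backend
  type: STRICT_DNS          # re-resolve and track all returned IPs
  connect_timeout: 1s
  dns_refresh_rate: 5s
  hosts:
  - socket_address:
      address: backend.service.consul
      port_value: 10000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;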

&lt;p&gt;Also, &lt;a href="https://www.consul.io/docs/connect/proxies/envoy.html"&gt;Consul Connect supports Envoy&lt;/a&gt; for proxying requests but this is not about Envoy – this is about how awesome Consul is!&lt;/p&gt;

&lt;p&gt;So this whole dynamic configuration thing in Envoy is really confusing and hard to follow, because whenever you try to google it you get bombarded with posts about Istio, which is distracting.&lt;/p&gt;
&lt;h3&gt;Docker-only packaging&lt;/h3&gt;

&lt;p&gt;Envoy is officially distributed only as Docker images. This is a minor thing but it just annoys me. Also, I don’t like that the Docker images don’t have version tags. Maybe it’s intended so you always run the latest version, but it seems very strange.&lt;/p&gt;
&lt;h3&gt;Conclusion on not-so-great parts&lt;/h3&gt;

&lt;p&gt;In the end, I’m not saying Envoy is bad in any way – from my point of view it just has a different focus: advanced proxying and an out-of-process service mesh data plane. The edge proxy part is just a bonus that is suitable in some, but not many, situations.&lt;/p&gt;
&lt;h2&gt;What about mirroring?&lt;/h2&gt;

&lt;p&gt;With that being said, let’s see Envoy in practice and repeat the mirroring experiments from my previous post.&lt;/p&gt;

&lt;p&gt;Here are 2 minimal configs – one for nginx and the other for Envoy. Both do the same thing – simply proxy requests to some backend service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# nginx proxy config

upstream backend {
    server backend.local:10000;
}

server {
    server_name proxy.local;
    listen 8000;

    location / {
        proxy_pass http://backend;
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Envoy proxy config&lt;/span&gt;
&lt;span class="na"&gt;static_resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;listeners&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;listener_0&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;socket_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
        &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0&lt;/span&gt;
        &lt;span class="na"&gt;port_value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8001&lt;/span&gt;
    &lt;span class="na"&gt;filter_chains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;envoy.http_connection_manager&lt;/span&gt;
        &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;stat_prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress_http&lt;/span&gt;
          &lt;span class="na"&gt;route_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;virtual_hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local_service&lt;/span&gt;
              &lt;span class="na"&gt;domains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
              &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/"&lt;/span&gt;
                &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
          &lt;span class="na"&gt;http_filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;envoy.router&lt;/span&gt;
  &lt;span class="na"&gt;clusters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STATIC&lt;/span&gt;
    &lt;span class="na"&gt;connect_timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1s&lt;/span&gt;
    &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;socket_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;127.0.0.1&lt;/span&gt;
          &lt;span class="na"&gt;port_value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;They perform identically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ # Load test nginx
$ hey -z 10s -q 1000 -c 1 -t 1 http://proxy.local:8000

Summary:
  Total:    10.0006 secs
  Slowest:  0.0229 secs
  Fastest:  0.0002 secs
  Average:  0.0004 secs
  Requests/sec: 996.7418

  Total data:   36881600 bytes
  Size/request: 3700 bytes

Response time histogram:
  0.000 [1] |
  0.002 [9963]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.005 [3] |
  0.007 [0] |
  0.009 [0] |
  0.012 [0] |
  0.014 [0] |
  0.016 [0] |
  0.018 [0] |
  0.021 [0] |
  0.023 [1] |

...

Status code distribution:
  [200] 9968 responses


$ # Load test Envoy
$ hey -z 10s -q 1000 -c 1 -t 1 http://proxy.local:8001

Summary:
  Total:    10.0006 secs
  Slowest:  0.0307 secs
  Fastest:  0.0003 secs
  Average:  0.0007 secs
  Requests/sec: 996.1445

  Total data:   36859400 bytes
  Size/request: 3700 bytes

Response time histogram:
  0.000 [1] |
  0.003 [9960]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.006 [0] |
  0.009 [0] |
  0.012 [0] |
  0.015 [0] |
  0.019 [0] |
  0.022 [0] |
  0.025 [0] |
  0.028 [0] |
  0.031 [1] |

...

Status code distribution:
  [200] 9962 responses

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Anyway, let’s check the crucial part – mirroring to a backend with a delay. A quick reminder – nginx, in that case, throttles the original request, thus affecting your production users.&lt;/p&gt;

&lt;p&gt;Here is the mirroring config for Envoy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Envoy mirroring config&lt;/span&gt;
&lt;span class="na"&gt;static_resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;listeners&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;listener_0&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;socket_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
        &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0&lt;/span&gt;
        &lt;span class="na"&gt;port_value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8001&lt;/span&gt;
    &lt;span class="na"&gt;filter_chains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;envoy.http_connection_manager&lt;/span&gt;
        &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;stat_prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress_http&lt;/span&gt;
          &lt;span class="na"&gt;route_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;virtual_hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local_service&lt;/span&gt;
              &lt;span class="na"&gt;domains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
              &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/"&lt;/span&gt;
                &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
                  &lt;span class="na"&gt;request_mirror_policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                    &lt;span class="na"&gt;cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mirror&lt;/span&gt;
          &lt;span class="na"&gt;http_filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;envoy.router&lt;/span&gt;
  &lt;span class="na"&gt;clusters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STATIC&lt;/span&gt;
    &lt;span class="na"&gt;connect_timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1s&lt;/span&gt;
    &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;socket_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;127.0.0.1&lt;/span&gt;
          &lt;span class="na"&gt;port_value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mirror&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STATIC&lt;/span&gt;
    &lt;span class="na"&gt;connect_timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1s&lt;/span&gt;
    &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;socket_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;127.0.0.1&lt;/span&gt;
          &lt;span class="na"&gt;port_value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Basically, we’ve added &lt;code&gt;request_mirror_policy&lt;/code&gt; to the main route and defined the cluster for mirroring. Let’s load test it!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ hey -z 10s -q 1000 -c 1 -t 1 http://proxy.local:8001

Summary:
  Total:    10.0012 secs
  Slowest:  0.0046 secs
  Fastest:  0.0003 secs
  Average:  0.0008 secs
  Requests/sec: 997.6801

  Total data:   36918600 bytes
  Size/request: 3700 bytes

Response time histogram:
  0.000 [1] |
  0.001 [2983]  |■■■■■■■■■■■■■■■■■
  0.001 [6916]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.002 [72]    |
  0.002 [2] |
  0.002 [0] |
  0.003 [0] |
  0.003 [3] |
  0.004 [0] |
  0.004 [0] |
  0.005 [1] |

...

Status code distribution:
  [200] 9978 responses

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Zero errors and amazing latency! This is a victory, and it proves that Envoy’s mirroring is truly “fire and forget”!&lt;/p&gt;
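&lt;p&gt;By the way, if mirroring 100% of traffic to the test backend is too much, the same policy accepts a runtime key for mirroring only a fraction of requests – a hedged sketch (the key name is arbitrary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Mirror only a runtime-controlled share of requests.
route:
  cluster: backend
  request_mirror_policy:
    cluster: mirror
    runtime_key: routing.mirror_percent
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When the runtime key is set, Envoy looks up the percentage of requests to mirror there; without a key, everything is mirrored as in the config above.&lt;/p&gt;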

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Envoy’s networking is of exceptional quality – its mirroring is well thought out, its load balancing is very advanced, and I like the active health check feature.&lt;/p&gt;

&lt;p&gt;I’m not convinced to use it in the edge proxy scenario, because there you might need web server features like caching, static content serving and flexible configuration.&lt;/p&gt;

&lt;p&gt;As for the service mesh – I’ll surely evaluate Envoy for that when the opportunity arises, so stay tuned – subscribe to the &lt;a href="https://alex.dzyoba.com/feed"&gt;blog Atom feed&lt;/a&gt; and check &lt;a href="https://twitter.com/AlexDzyoba/"&gt;my twitter @AlexDzyoba&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That’s it for now, till the next time!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>nginx mirroring tips and tricks</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Mon, 14 Jan 2019 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/nginx-mirroring-tips-and-tricks-17c2</link>
      <guid>https://dev.to/dzeban/nginx-mirroring-tips-and-tricks-17c2</guid>
      <description>&lt;p&gt;Lately, I’ve been playing with nginx and its relatively new &lt;a href="http://nginx.org/en/docs/http/ngx_http_mirror_module.htm"&gt;&lt;strong&gt;mirror&lt;/strong&gt; module&lt;/a&gt; which appeared in 1.13.4. The mirror module allows you to copy requests to another backend while ignoring answers from it. The example use cases for this are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-production testing by observing how your new system handles real production traffic&lt;/li&gt;
&lt;li&gt;Logging of requests for security analysis. This is &lt;a href="https://docs.wallarm.com/en/admin-en/mirror-traffic-en.htm"&gt;what the Wallarm tool does&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Copying requests for data science research&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ve used it for pre-production testing of a newly rewritten system to see how well (if at all ;-) it can handle the production workload. There are some non-obvious problems and tips that I didn’t find when I started this journey, so now I want to share them.&lt;/p&gt;

&lt;h2&gt;Basic setup&lt;/h2&gt;

&lt;p&gt;Let’s begin with a simple setup. Say, we have some backend that handles production workload and we put a proxy in front of it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yVb-uywo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://alex.dzyoba.com/img/nginx-mirror-basic-setup.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yVb-uywo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://alex.dzyoba.com/img/nginx-mirror-basic-setup.png" alt="nginx basic setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the nginx config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;upstream backend {
    server backend.local:10000;
}

server {
    server_name proxy.local;
    listen 8000;

    location / {
        proxy_pass http://backend;
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;There are 2 parts – a backend and a proxy. The proxy (nginx) listens on port 8000 and just passes requests to the backend on port 10000. Nothing fancy, but let’s do a quick load test to see how it performs. I’m using the &lt;a href="https://github.com/rakyll/hey"&gt;&lt;code&gt;hey&lt;/code&gt; tool&lt;/a&gt; because it’s simple and allows generating a constant load instead of bombarding as hard as possible like many other tools do (wrk, apache benchmark, siege).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ hey -z 10s -q 1000 -n 100000 -c 1 -t 1 http://proxy.local:8000

Summary:
  Total:    10.0016 secs
  Slowest:  0.0225 secs
  Fastest:  0.0003 secs
  Average:  0.0005 secs
  Requests/sec: 995.8393

  Total data:   6095520 bytes
  Size/request: 612 bytes

Response time histogram:
  0.000 [1] |
  0.003 [9954]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.005 [4] |
  0.007 [0] |
  0.009 [0] |
  0.011 [0] |
  0.014 [0] |
  0.016 [0] |
  0.018 [0] |
  0.020 [0] |
  0.022 [1] |

Latency distribution:
  10% in 0.0003 secs
  25% in 0.0004 secs
  50% in 0.0005 secs
  75% in 0.0006 secs
  90% in 0.0007 secs
  95% in 0.0007 secs
  99% in 0.0009 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0003 secs, 0.0225 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0008 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0003 secs
  resp wait:    0.0004 secs, 0.0002 secs, 0.0198 secs
  resp read:    0.0001 secs, 0.0000 secs, 0.0012 secs

Status code distribution:
  [200] 9960 responses
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Good, most of the requests are handled in less than a millisecond and there are no errors – that’s our baseline.&lt;/p&gt;

&lt;h2&gt;Basic mirroring&lt;/h2&gt;

&lt;p&gt;Now, let’s add a test backend and mirror traffic to it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DUI8Ed7x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://alex.dzyoba.com/img/nginx-mirror-mirror-setup.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DUI8Ed7x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://alex.dzyoba.com/img/nginx-mirror-mirror-setup.png" alt="nginx mirror setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The basic mirroring is configured like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;upstream backend {
    server backend.local:10000;
}

upstream test_backend {
    server test.local:20000;
}

server {
    server_name proxy.local;
    listen 8000;

    location / {
        mirror /mirror;
        proxy_pass http://backend;
    }

    location = /mirror {
        internal;
        proxy_pass http://test_backend$request_uri;
    }

}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We add the &lt;code&gt;mirror&lt;/code&gt; directive to mirror requests to an internal location, and then define that internal location. Inside it we can do whatever nginx allows, but for now we simply proxy pass all requests.&lt;/p&gt;
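&lt;p&gt;Since the internal location is ordinary nginx config, you can, for example, mirror only a share of the traffic – a hedged sketch using &lt;code&gt;split_clients&lt;/code&gt; (the percentage and variable name are arbitrary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# In the http block: flag ~10% of requests for mirroring.
split_clients $request_id $mirror {
    10%     on;
    *       "";
}

# Replaces the internal location from the config above.
location = /mirror {
    internal;
    if ($mirror = "") {
        return 204;
    }
    proxy_pass http://test_backend$request_uri;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;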

&lt;p&gt;Let’s load test it again to check how mirroring affects the performance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ hey -z 10s -q 1000 -n 100000 -c 1 -t 1 http://proxy.local:8000

Summary:
  Total:    10.0010 secs
  Slowest:  0.0042 secs
  Fastest:  0.0003 secs
  Average:  0.0005 secs
  Requests/sec: 997.3967

  Total data:   6104700 bytes
  Size/request: 612 bytes

Response time histogram:
  0.000 [1] |
  0.001 [9132]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [792]   |■■■
  0.001 [43]    |
  0.002 [3] |
  0.002 [0] |
  0.003 [2] |
  0.003 [0] |
  0.003 [0] |
  0.004 [1] |
  0.004 [1] |

Latency distribution:
  10% in 0.0003 secs
  25% in 0.0004 secs
  50% in 0.0005 secs
  75% in 0.0006 secs
  90% in 0.0007 secs
  95% in 0.0008 secs
  99% in 0.0010 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0003 secs, 0.0042 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0009 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0002 secs
  resp wait:    0.0004 secs, 0.0002 secs, 0.0041 secs
  resp read:    0.0001 secs, 0.0000 secs, 0.0021 secs

Status code distribution:
  [200] 9975 responses
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;It’s pretty much the same – millisecond latency and no errors. And that’s good because it proves that mirroring itself doesn’t affect original requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mirroring to buggy backend
&lt;/h2&gt;

&lt;p&gt;That’s all nice and dandy but what if mirror backend has some bugs and sometimes replies with errors? What would happen to the original requests?&lt;/p&gt;

&lt;p&gt;To test this I’ve made a &lt;a href="https://github.com/dzeban/mirror-backend"&gt;trivial Go service&lt;/a&gt; that can inject errors randomly. Let’s launch it&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ mirror-backend -errors
2019/01/13 14:43:12 Listening on port 20000, delay is 0, error injecting is true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;and see what load testing will show:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ hey -z 10s -q 1000 -n 100000 -c 1 -t 1 http://proxy.local:8000

Summary:
  Total:    10.0008 secs
  Slowest:  0.0027 secs
  Fastest:  0.0003 secs
  Average:  0.0005 secs
  Requests/sec: 998.7205

  Total data:   6112656 bytes
  Size/request: 612 bytes

Response time histogram:
  0.000 [1] |
  0.001 [7388]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [2232]  |■■■■■■■■■■■■
  0.001 [324]   |■■
  0.001 [27]    |
  0.002 [6] |
  0.002 [2] |
  0.002 [3] |
  0.002 [2] |
  0.002 [0] |
  0.003 [3] |

Latency distribution:
  10% in 0.0003 secs
  25% in 0.0003 secs
  50% in 0.0004 secs
  75% in 0.0006 secs
  90% in 0.0007 secs
  95% in 0.0008 secs
  99% in 0.0009 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0003 secs, 0.0027 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0008 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0001 secs
  resp wait:    0.0004 secs, 0.0002 secs, 0.0026 secs
  resp read:    0.0001 secs, 0.0000 secs, 0.0006 secs

Status code distribution:
  [200] 9988 responses
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Nothing changed at all! And that’s great because errors in the mirror backend don’t affect the main backend. The nginx mirror module ignores responses to mirror subrequests, so this behavior is intended.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mirroring to a slow backend
&lt;/h2&gt;

&lt;p&gt;But what if our mirror backend is not returning errors but is just plain slow? How will the original requests behave? Let’s find out!&lt;/p&gt;

&lt;p&gt;My mirror backend has an option to delay every request by a configured number of seconds. Here I’m launching it with a 1 second delay:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ mirror-backend -delay 1
2019/01/13 14:50:39 Listening on port 20000, delay is 1, error injecting is false
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;So let’s see what the load test shows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ hey -z 10s -q 1000 -n 100000 -c 1 -t 1 http://proxy.local:8000

Summary:
  Total:    10.0290 secs
  Slowest:  0.0023 secs
  Fastest:  0.0018 secs
  Average:  0.0021 secs
  Requests/sec: 1.9942

  Total data:   6120 bytes
  Size/request: 612 bytes

Response time histogram:
  0.002 [1] |■■■■■■■■■■
  0.002 [0] |
  0.002 [1] |■■■■■■■■■■
  0.002 [0] |
  0.002 [0] |
  0.002 [0] |
  0.002 [1] |■■■■■■■■■■
  0.002 [1] |■■■■■■■■■■
  0.002 [0] |
  0.002 [4] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.002 [2] |■■■■■■■■■■■■■■■■■■■■

Latency distribution:
  10% in 0.0018 secs
  25% in 0.0021 secs
  50% in 0.0022 secs
  75% in 0.0023 secs
  90% in 0.0023 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0007 secs, 0.0018 secs, 0.0023 secs
  DNS-lookup:   0.0003 secs, 0.0002 secs, 0.0006 secs
  req write:    0.0001 secs, 0.0001 secs, 0.0002 secs
  resp wait:    0.0011 secs, 0.0007 secs, 0.0013 secs
  resp read:    0.0002 secs, 0.0001 secs, 0.0002 secs

Status code distribution:
  [200] 10 responses

Error distribution:
  [10]  Get http://proxy.local:8000: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;What? 1.9 rps? Where is my 1000 rps? We’ve got errors? What’s happening?&lt;/p&gt;

&lt;p&gt;Let me explain how mirroring in nginx works.&lt;/p&gt;

&lt;h3&gt;
  
  
  How mirroring in nginx works
&lt;/h3&gt;

&lt;p&gt;When the request is coming to nginx and if mirroring is enabled, nginx will create a mirror subrequest and do what mirror location specifies – in our case, it will send it to the mirror backend.&lt;/p&gt;

&lt;p&gt;But the thing is that the subrequest is linked to the original request, so &lt;em&gt;as far as I understand&lt;/em&gt;, until that mirror subrequest is finished, the original request is throttled.&lt;/p&gt;

&lt;p&gt;That’s why we get ~2 rps in the previous test – &lt;code&gt;hey&lt;/code&gt; sent 10 requests and got responses, then sent the next 10 requests, but those stalled because the previous mirror subrequests were still delayed; then the timeout kicked in and errored the last 10 requests.&lt;/p&gt;

&lt;p&gt;If we increase the timeout in hey to, say, 10 seconds, we will receive no errors but only 1 rps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ hey -z 10s -q 1000 -n 100000 -c 1 -t 10 http://proxy.local:8000

Summary:
  Total:    10.0197 secs
  Slowest:  1.0018 secs
  Fastest:  0.0020 secs
  Average:  0.9105 secs
  Requests/sec: 1.0978

  Total data:   6732 bytes
  Size/request: 612 bytes

Response time histogram:
  0.002 [1] |■■■■
  0.102 [0] |
  0.202 [0] |
  0.302 [0] |
  0.402 [0] |
  0.502 [0] |
  0.602 [0] |
  0.702 [0] |
  0.802 [0] |
  0.902 [0] |
  1.002 [10]    |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■

Latency distribution:
  10% in 1.0011 secs
  25% in 1.0012 secs
  50% in 1.0016 secs
  75% in 1.0016 secs
  90% in 1.0018 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0001 secs, 0.0020 secs, 1.0018 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0005 secs
  req write:    0.0001 secs, 0.0000 secs, 0.0002 secs
  resp wait:    0.9101 secs, 0.0008 secs, 1.0015 secs
  resp read:    0.0002 secs, 0.0001 secs, 0.0003 secs

Status code distribution:
  [200] 11 responses
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;So the point here is that &lt;strong&gt;if mirror subrequests are slow then the original requests will be throttled&lt;/strong&gt;. I don’t know how to fix this, but I know a workaround – mirror only part of the traffic. Let me show you how.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mirroring part of the traffic
&lt;/h2&gt;

&lt;p&gt;If you’re not sure that mirror backend can handle the original load you can mirror only some part of the traffic – for example, 10%.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;mirror&lt;/code&gt; directive is not configurable and replicates all requests to the mirror location, so it’s not obvious how to do this. The key is the internal mirror location – as I’ve said, you can do anything with mirrored requests there. So here is how I did it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt; 1  upstream backend {
 2      server backend.local:10000;
 3  }
 4  
 5  upstream test_backend {
 6      server test.local:20000;
 7  }
 8  
 9  split_clients $remote_addr $mirror_backend {
10      50% test_backend;
11      *   "";
12  }
13  
14  server {
15      server_name proxy.local;
16      listen 8000;
17  
18      access_log /var/log/nginx/proxy.log;
19      error_log /var/log/nginx/proxy.error.log info;
20  
21      location / {
22          mirror /mirror;
23          proxy_pass http://backend;
24      }
25  
26      location = /mirror {
27          internal;
28          if ($mirror_backend = "") {
29              return 400;
30          }
31  
32          proxy_pass http://$mirror_backend$request_uri;
33      }
34  
35  }
36  
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;First of all, in the mirror location we proxy pass to the upstream taken from the &lt;code&gt;$mirror_backend&lt;/code&gt; variable (line 32). This variable is set in the &lt;code&gt;split_clients&lt;/code&gt; block (lines 9-12) based on the client remote address. What &lt;code&gt;split_clients&lt;/code&gt; does is set the right-hand variable according to the distribution of the left-hand key. In our case, we look at the request’s remote address (the &lt;code&gt;$remote_addr&lt;/code&gt; variable) and for 50% of remote addresses we set &lt;code&gt;$mirror_backend&lt;/code&gt; to &lt;code&gt;test_backend&lt;/code&gt;; for the rest it’s set to an empty string. Finally, the partial mirroring itself happens in the mirror location – if the &lt;code&gt;$mirror_backend&lt;/code&gt; variable is empty we reject the mirror subrequest, otherwise we &lt;code&gt;proxy_pass&lt;/code&gt; it. Remember that a failure in a mirror subrequest doesn’t affect the original request, so it’s safe to drop it with an error status.&lt;/p&gt;

&lt;p&gt;The beauty of this solution is that you can split traffic for mirroring based on any variable or combination of variables. If you want to really differentiate your users then the remote address may not be the best split key – a user may use many IPs or change them. In that case, you’re better off using a user-sticky key like an API key. To mirror 50% of traffic based on the &lt;code&gt;apikey&lt;/code&gt; query parameter we just change the key in &lt;code&gt;split_clients&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;split_clients $arg_apikey $mirror_backend {
    50% test_backend;
    * "";
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;When we query apikeys from 1 to 20, only about half of them (11) will be mirrored. Here is the curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ for i in {1..20};do curl -i "proxy.local:8000/?apikey=${i}" ;done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;and here is the log of mirror backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
2019/01/13 22:34:34 addr=127.0.0.1:47224 host=test_backend uri="/?apikey=1"
2019/01/13 22:34:34 addr=127.0.0.1:47230 host=test_backend uri="/?apikey=2"
2019/01/13 22:34:34 addr=127.0.0.1:47240 host=test_backend uri="/?apikey=4"
2019/01/13 22:34:34 addr=127.0.0.1:47246 host=test_backend uri="/?apikey=5"
2019/01/13 22:34:34 addr=127.0.0.1:47252 host=test_backend uri="/?apikey=6"
2019/01/13 22:34:34 addr=127.0.0.1:47262 host=test_backend uri="/?apikey=8"
2019/01/13 22:34:34 addr=127.0.0.1:47272 host=test_backend uri="/?apikey=10"
2019/01/13 22:34:34 addr=127.0.0.1:47278 host=test_backend uri="/?apikey=11"
2019/01/13 22:34:34 addr=127.0.0.1:47288 host=test_backend uri="/?apikey=13"
2019/01/13 22:34:34 addr=127.0.0.1:47298 host=test_backend uri="/?apikey=15"
2019/01/13 22:34:34 addr=127.0.0.1:47308 host=test_backend uri="/?apikey=17"
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And the most awesome thing is that the partitioning in &lt;code&gt;split_clients&lt;/code&gt; is consistent – requests with &lt;code&gt;apikey=1&lt;/code&gt; will always be mirrored.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So this was my experience with the nginx mirror module so far. I’ve shown you how to mirror all of the traffic, and how to mirror part of it with the help of the &lt;code&gt;split_clients&lt;/code&gt; module. I’ve also covered error handling and the non-obvious problem where normal requests are throttled by a slow mirror backend.&lt;/p&gt;

&lt;p&gt;Hope you’ve enjoyed it! Subscribe to the &lt;a href="https://alex.dzyoba.com/feed"&gt;Atom feed&lt;/a&gt;. I also post &lt;a href="https://twitter.com/AlexDzyoba/"&gt;on twitter @AlexDzyoba&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That’s it for now, till the next time!&lt;/p&gt;

</description>
      <category>nginx</category>
      <category>devops</category>
    </item>
    <item>
      <title>tzconv - convert time between timezones</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Wed, 15 Aug 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/tzconv---convert-time-between-timezones-53d</link>
      <guid>https://dev.to/dzeban/tzconv---convert-time-between-timezones-53d</guid>
      <description>&lt;p&gt;I made a nice little thing called &lt;code&gt;tzconv&lt;/code&gt; – &lt;a href="https://github.com/alexdzyoba/tzconv"&gt;https://github.com/alexdzyoba/tzconv&lt;/a&gt;. It’s a CLI tool that converts time between timezones and it’s useful (at least for me) when you investigate done incident and need to match times.&lt;/p&gt;

&lt;p&gt;Imagine, you had an incident that happened at 11:45 your local time but your logs in ELK or Splunk are in UTC. So, what time was 11:45 in UTC?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ tzconv utc 11:45
08:45
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Boom! You got it!&lt;/p&gt;

&lt;p&gt;You can add a third parameter to convert time from a specific timezone instead of your local one. For instance, your alert system sent you an email with a central European time and your server log timestamps are in Eastern time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ tzconv neyork 20:20 cet
14:20

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that I’ve mistyped New York and it still worked. That’s because locations are not matched exactly but fuzzy searched!&lt;/p&gt;
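&lt;p&gt;I won’t claim this is the algorithm &lt;code&gt;tzconv&lt;/code&gt; actually uses, but the effect can be sketched with a simple in-order character match – enough for &lt;code&gt;neyork&lt;/code&gt; to find &lt;code&gt;America/New_York&lt;/code&gt;:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// fuzzyMatch reports whether every character of the query
// appears, in order, inside the candidate. Byte-indexed slicing
// is fine here because tz database names are ASCII.
func fuzzyMatch(query, candidate string) bool {
	q := strings.ToLower(query)
	c := strings.ToLower(candidate)
	for _, r := range q {
		i := strings.IndexRune(c, r)
		if i == -1 {
			return false // this character never shows up
		}
		c = c[i+1:] // keep scanning after the match
	}
	return true
}

func main() {
	fmt.Println(fuzzyMatch("neyork", "America/New_York")) // true
	fmt.Println(fuzzyMatch("neyork", "Europe/Paris"))     // false
}
```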

&lt;p&gt;You can find more examples in the &lt;a href="https://github.com/alexdzyoba/tzconv/blob/master/README.md#examples"&gt;project README&lt;/a&gt;. Feel free to contribute, I’ve got a couple of things I would like to see implemented – check the &lt;a href="https://github.com/alexdzyoba/tzconv/issues"&gt;issues page&lt;/a&gt;. The tool itself is written in Go and quite simple yet useful.&lt;/p&gt;

&lt;p&gt;That’s it for now, till the next time!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Peculiarities of c10k client</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Wed, 04 Jul 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/peculiarities-of-c10k-client-1ain</link>
      <guid>https://dev.to/dzeban/peculiarities-of-c10k-client-1ain</guid>
      <description>&lt;p&gt;There is a well-known problem called &lt;a href="https://en.wikipedia.org/wiki/C10k_problem"&gt;c10k&lt;/a&gt;. The essence of it is to handle 10000 concurrent clients on a single server. This problem was conceived &lt;a href="http://www.kegel.com/c10k.html"&gt;in 1999 by Dan Kegel&lt;/a&gt; and at that time it made the industry to rethink the way the web servers were handling connections. Then-state-of-the-art solution to allocate a thread for each client started to leak facing the upcoming web scale. Nginx was born to solve this problem by embracing event-driven I/O model provided by a shiny new &lt;a href="https://en.wikipedia.org/wiki/Epoll"&gt;epoll&lt;/a&gt; system call (in Linux).&lt;/p&gt;

&lt;p&gt;Times were different back then, and now we can have a really beefy server with a 10G network, 32 cores and 256 GiB RAM that can easily handle that amount of clients, so c10k is not much of a problem even with threaded I/O. But, anyway, I wanted to check how various solutions like threads and non-blocking async I/O would handle it, so I started to write some silly servers in my &lt;a href="https://github.com/dzeban/c10k"&gt;c10k repo&lt;/a&gt; – and then I got stuck because I needed some tools to test my implementations.&lt;/p&gt;

&lt;p&gt;Basically, I needed a &lt;strong&gt;c10k client&lt;/strong&gt;. And I actually wrote a couple – one in Go and the other in C with &lt;em&gt;libuv&lt;/em&gt;. I’m going to also write the one in Python 3 with &lt;em&gt;asyncio&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;While writing each client I found 2 peculiarities – how to make it bad and how to make it slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to make it bad
&lt;/h2&gt;

&lt;p&gt;By making it bad I mean making it really c10k – creating a lot of connections to the server, thus saturating its resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Go client
&lt;/h3&gt;

&lt;p&gt;I started with the client in Go and quickly stumbled upon the first roadblock. When I was making 10 concurrent HTTP requests with simple &lt;code&gt;"net/http"&lt;/code&gt; requests there were only 2 TCP connections&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ lsof -p $(pgrep go-client) -n -P
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
go-client 11959 avd cwd DIR 253,0 4096 1183846 /home/avd/go/src/github.com/dzeban/c10k
go-client 11959 avd rtd DIR 253,0 4096 2 /
go-client 11959 avd txt REG 253,0 6240125 1186984 /home/avd/go/src/github.com/dzeban/c10k/go-client
go-client 11959 avd mem REG 253,0 2066456 3151328 /usr/lib64/libc-2.26.so
go-client 11959 avd mem REG 253,0 149360 3152802 /usr/lib64/libpthread-2.26.so
go-client 11959 avd mem REG 253,0 178464 3151302 /usr/lib64/ld-2.26.so
go-client 11959 avd 0u CHR 136,0 0t0 3 /dev/pts/0
go-client 11959 avd 1u CHR 136,0 0t0 3 /dev/pts/0
go-client 11959 avd 2u CHR 136,0 0t0 3 /dev/pts/0
go-client 11959 avd 4u a_inode 0,13 0 12735 [eventpoll]
go-client 11959 avd 8u IPv4 68232 0t0 TCP 127.0.0.1:55224-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 11959 avd 10u IPv4 68235 0t0 TCP 127.0.0.1:55230-&amp;gt;127.0.0.1:80 (ESTABLISHED)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same with &lt;code&gt;ss&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ss -tnp dst 127.0.0.1:80
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.1:55224 127.0.0.1:80 users:(("go-client",pid=11959,fd=8))
ESTAB 0 0 127.0.0.1:55230 127.0.0.1:80 users:(("go-client",pid=11959,fd=10))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason for this is quite simple – HTTP 1.1 is using persistent connections with TCP keepalive for clients to avoid the overhead of TCP handshake on each HTTP request. Go’s &lt;code&gt;"net/http"&lt;/code&gt; fully implements this logic – it multiplexes multiple requests over a handful of TCP connections. It can be tuned via &lt;a href="https://golang.org/pkg/net/http/#Transport"&gt;&lt;code&gt;Transport&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But I don’t need to tune it, I need to avoid it. And we can avoid it by explicitly creating a TCP connection via &lt;code&gt;net.Dial&lt;/code&gt; and then sending a single request over this connection. Here is the function that does it; it runs concurrently inside a dedicated goroutine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delay&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"dial error "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/index.html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"failed to create http request"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"localhost"&lt;/span&gt;

    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"failed to send http request"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bufio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'\n'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"read error "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s check that it’s working&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ lsof -p $(pgrep go-client) -n -P
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
go-client 12231 avd cwd DIR 253,0 4096 1183846 /home/avd/go/src/github.com/dzeban/c10k
go-client 12231 avd rtd DIR 253,0 4096 2 /
go-client 12231 avd txt REG 253,0 6167884 1186984 /home/avd/go/src/github.com/dzeban/c10k/go-client
go-client 12231 avd mem REG 253,0 2066456 3151328 /usr/lib64/libc-2.26.so
go-client 12231 avd mem REG 253,0 149360 3152802 /usr/lib64/libpthread-2.26.so
go-client 12231 avd mem REG 253,0 178464 3151302 /usr/lib64/ld-2.26.so
go-client 12231 avd 0u CHR 136,0 0t0 3 /dev/pts/0
go-client 12231 avd 1u CHR 136,0 0t0 3 /dev/pts/0
go-client 12231 avd 2u CHR 136,0 0t0 3 /dev/pts/0
go-client 12231 avd 3u IPv4 71768 0t0 TCP 127.0.0.1:55256-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 12231 avd 4u a_inode 0,13 0 12735 [eventpoll]
go-client 12231 avd 5u IPv4 73753 0t0 TCP 127.0.0.1:55258-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 12231 avd 6u IPv4 71769 0t0 TCP 127.0.0.1:55266-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 12231 avd 7u IPv4 71770 0t0 TCP 127.0.0.1:55264-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 12231 avd 8u IPv4 73754 0t0 TCP 127.0.0.1:55260-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 12231 avd 9u IPv4 71771 0t0 TCP 127.0.0.1:55262-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 12231 avd 10u IPv4 71774 0t0 TCP 127.0.0.1:55268-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 12231 avd 11u IPv4 73755 0t0 TCP 127.0.0.1:55270-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 12231 avd 12u IPv4 71775 0t0 TCP 127.0.0.1:55272-&amp;gt;127.0.0.1:80 (ESTABLISHED)
go-client 12231 avd 13u IPv4 73758 0t0 TCP 127.0.0.1:55274-&amp;gt;127.0.0.1:80 (ESTABLISHED)

$ ss -tnp dst 127.0.0.1:80
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.1:55260 127.0.0.1:80 users:(("go-client",pid=12231,fd=8))
ESTAB 0 0 127.0.0.1:55262 127.0.0.1:80 users:(("go-client",pid=12231,fd=9))
ESTAB 0 0 127.0.0.1:55270 127.0.0.1:80 users:(("go-client",pid=12231,fd=11))
ESTAB 0 0 127.0.0.1:55266 127.0.0.1:80 users:(("go-client",pid=12231,fd=6))
ESTAB 0 0 127.0.0.1:55256 127.0.0.1:80 users:(("go-client",pid=12231,fd=3))
ESTAB 0 0 127.0.0.1:55272 127.0.0.1:80 users:(("go-client",pid=12231,fd=12))
ESTAB 0 0 127.0.0.1:55258 127.0.0.1:80 users:(("go-client",pid=12231,fd=5))
ESTAB 0 0 127.0.0.1:55268 127.0.0.1:80 users:(("go-client",pid=12231,fd=10))
ESTAB 0 0 127.0.0.1:55264 127.0.0.1:80 users:(("go-client",pid=12231,fd=7))
ESTAB 0 0 127.0.0.1:55274 127.0.0.1:80 users:(("go-client",pid=12231,fd=13))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  C client
&lt;/h3&gt;

&lt;p&gt;I also decided to make a C client built on top of libuv for a convenient event loop.&lt;/p&gt;

&lt;p&gt;In my C client, there is no HTTP library so we’re making TCP connections from the start. It works well, creating a connection for each request, so it doesn’t have the problem (more like a feature :-) of the Go client. But when it finishes reading the response it gets stuck and doesn’t return control to the event loop until a very long timeout.&lt;/p&gt;

&lt;p&gt;Here is the response reading callback that seems stuck:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;on_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uv_stream_t&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ssize_t&lt;/span&gt; &lt;span class="n"&gt;nread&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;uv_buf_t&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nread&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nread&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;UV_EOF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"close stream"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;uv_connect_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uv_handle_get_data&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;uv_handle_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;uv_close&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;uv_handle_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;free_close_cb&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;free&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;return_uv_err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nread&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;free&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It appears that we’re stuck here and wait for some (quite long) time until we finally get EOF.&lt;/p&gt;

&lt;p&gt;This &lt;em&gt;“quite long time”&lt;/em&gt; is actually HTTP keepalive timeout set &lt;a href="https://nginx.ru/en/docs/http/ngx_http_core_module.html#keepalive_timeout"&gt;in nginx and by default it’s 75 seconds&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We can control it on the client though with the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Connection"&gt;&lt;code&gt;Connection&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Keep-Alive"&gt;&lt;code&gt;Keep-Alive&lt;/code&gt;&lt;/a&gt; HTTP headers which are part of HTTP 1.1.&lt;/p&gt;

&lt;p&gt;And that’s the only sane solution, because on the libuv side I had no way to close the connection – I don’t receive EOF because it is sent only when the connection is actually closed.&lt;/p&gt;

&lt;p&gt;So what is happening is that my client creates a connection and sends a request, nginx replies, and then nginx keeps the connection open because it waits for subsequent requests. Tinkering with libuv showed me that, and that’s why I love making things in C – you have to dig really deep and really understand how things work.&lt;/p&gt;

&lt;p&gt;So to solve these hanging requests, I’ve just set the &lt;code&gt;Connection: close&lt;/code&gt; header to force a new connection for each request from the same client and to disable HTTP keepalive. As an alternative, I could just insist on HTTP/1.0, where there is no keepalive.&lt;/p&gt;

&lt;p&gt;Now that it’s creating lots of connections, let’s make the server keep those connections open for a client-specified delay so that the client appears slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to make it slow
&lt;/h2&gt;

&lt;p&gt;I needed to make it slow because I wanted my server to spend some time handling the requests while avoiding putting sleeps in the server code.&lt;/p&gt;

&lt;p&gt;Initially, I thought to make reading on the client side slow, i.e. reading one byte at a time or delaying reading the server response. Interestingly, none of these solutions worked.&lt;/p&gt;

&lt;p&gt;I tested my client with nginx by watching the access log with the &lt;a href="https://nginx.ru/en/docs/http/ngx_http_core_module.html#var_request_time"&gt;&lt;code&gt;$request_time&lt;/code&gt;&lt;/a&gt; variable. Needless to say, all of my requests were served in 0.000 seconds. Whatever delay I inserted, nginx seemed to ignore it.&lt;/p&gt;

&lt;p&gt;I started to figure out why by tweaking various parts of the request-response pipeline like the number of connections, response size, etc.&lt;/p&gt;

&lt;p&gt;Finally, I was able to see my delay only when nginx was serving a really big file, like 30 MB, and that’s when it clicked.&lt;/p&gt;

&lt;p&gt;The whole reason for this delay-ignoring behavior was socket buffers. A socket buffer is the piece of memory where the Linux kernel buffers network requests and responses for performance reasons – to send data in big chunks over the network and to mitigate slow clients – and also for other things like TCP retransmission. Socket buffers are like the page cache – all network I/O (with the page cache it’s disk I/O) goes through them unless they are explicitly bypassed.&lt;/p&gt;

&lt;p&gt;So in my case, when nginx received a request, the response written by the send/write syscall was merely stored in the socket buffer, but from nginx’s point of view, it was done. Only when the response was large enough to not fit in the socket buffer would nginx block in the syscall and wait until the client delay elapsed and the socket buffer was read and freed for the next portion of data.&lt;/p&gt;

&lt;p&gt;You can check and tune the size of the socket buffers in &lt;code&gt;/proc/sys/net/ipv4/tcp_rmem&lt;/code&gt; and &lt;code&gt;/proc/sys/net/ipv4/tcp_wmem&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So after figuring this out, I’ve inserted the delay after establishing the connection and before sending the request.&lt;/p&gt;

&lt;p&gt;This way the server will keep client connections around (yay, c10k!) for a client-specified delay.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recap
&lt;/h2&gt;

&lt;p&gt;So in the end, I have 2 c10k clients – one written &lt;a href="https://github.com/dzeban/c10k/blob/master/go-client.go"&gt;in Go&lt;/a&gt; and the other written &lt;a href="https://github.com/dzeban/c10k/blob/master/libuv-client.c"&gt;in C with libuv&lt;/a&gt;. The Python 3 client is on its way.&lt;/p&gt;

&lt;p&gt;All of these clients connect to the HTTP server, wait for a specified delay and then send a GET request with the &lt;code&gt;Connection: close&lt;/code&gt; header.&lt;/p&gt;

&lt;p&gt;This makes the HTTP server keep a dedicated connection for each request and spend some time waiting to emulate I/O.&lt;/p&gt;

&lt;p&gt;That’s how my c10k clients work.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Configuring JMX exporter for Kafka and Zookeeper</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Sat, 12 May 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/configuring-jmx-exporter-for-kafka-and-zookeeper-2440</link>
      <guid>https://dev.to/dzeban/configuring-jmx-exporter-for-kafka-and-zookeeper-2440</guid>
<description>&lt;p&gt;I’ve been using Prometheus for quite some time and really enjoying it. Most things are quite simple – installing and configuring Prometheus is easy, setting up exporters is launch and forget, &lt;a href="https://dev.to/blog/go-prometheus-service/"&gt;instrumenting your code&lt;/a&gt; is bliss. But there are 2 things that I’ve really struggled with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Grokking data model and PromQL to get meaningful insights.&lt;/li&gt;
&lt;li&gt;Configuring jmx-exporter.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this post, I’ll share the JMX part because I don’t feel that I’ve fully understood the data model and PromQL. So let’s dive into that jmx-exporter thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is jmx-exporter
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/prometheus/jmx_exporter"&gt;jmx-exporter&lt;/a&gt; is a program that reads JMX data from JVM based applications (e.g. Java and Scala) and exposes it via HTTP in a simple text format that Prometheus understand and can scrape.&lt;/p&gt;

&lt;p&gt;JMX is a common technology in the Java world for exporting statistics of a running application and also for controlling it (you can trigger GC with JMX, for example).&lt;/p&gt;

&lt;p&gt;jmx-exporter is a Java application that uses JMX APIs to collect the app and JVM metrics. It is a Java agent, which means it runs inside the same JVM. This gives you the nice benefit of not exposing JMX remotely – jmx-exporter will just collect the metrics and expose them over HTTP in read-only mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing jmx-exporter
&lt;/h2&gt;

&lt;p&gt;Because it’s written in Java, jmx-exporter is distributed as a jar, so you just need to download it &lt;a href="https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.0/jmx_prometheus_javaagent-0.3.0.jar"&gt;from maven&lt;/a&gt; and put it somewhere on your target host.&lt;/p&gt;

&lt;p&gt;I have an Ansible role for this – &lt;a href="https://github.com/alexdzyoba/ansible-jmx-exporter"&gt;https://github.com/alexdzyoba/ansible-jmx-exporter&lt;/a&gt;. Besides downloading the jar, it’ll also put the configuration file for jmx-exporter in place.&lt;/p&gt;

&lt;p&gt;This configuration file contains rules for rewriting JMX MBeans into Prometheus exposition format metrics. Basically, it’s a collection of regexps that convert MBean strings to Prometheus strings.&lt;/p&gt;
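&lt;p&gt;To give a taste of what these rules look like, here is an illustrative fragment in the style of the example configs (the pattern and metric name below are just a demonstration, not a complete Kafka config):&lt;/p&gt;

```yaml
lowercaseOutputName: true
rules:
  # Rewrite MBeans like kafka.server<type=ReplicaManager, name=PartitionCount><>Value
  # into a metric like kafka_server_replicamanager_partitioncount
  - pattern: 'kafka.server<type=(.+), name=(.+)><>Value'
    name: kafka_server_$1_$2
    type: GAUGE
```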

&lt;p&gt;The &lt;a href="https://github.com/prometheus/jmx_exporter/tree/master/example_configs"&gt;example_configs directory&lt;/a&gt; in jmx-exporter sources contains examples for many popular Java apps including Kafka and Zookeeper.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring Zookeeper with jmx-exporter
&lt;/h2&gt;

&lt;p&gt;As I’ve said, jmx-exporter runs inside another JVM as a Java agent to collect JMX metrics. To demonstrate how it all works, let’s run it within Zookeeper.&lt;/p&gt;

&lt;p&gt;Zookeeper is a crucial part of many production systems including Hadoop, Kafka and Clickhouse, so you really want to monitor it. Despite the fact that you can do this with &lt;a href="https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands"&gt;4lw commands&lt;/a&gt; (&lt;code&gt;mntr&lt;/code&gt;, &lt;code&gt;stat&lt;/code&gt;, etc.) and that there &lt;a href="https://github.com/dln/zookeeper_exporter"&gt;are&lt;/a&gt; &lt;a href="https://github.com/lucianjon/zk-exporter"&gt;dedicated&lt;/a&gt; &lt;a href="https://github.com/dabealu/zookeeper-exporter"&gt;exporters&lt;/a&gt;, I prefer to use JMX to avoid constantly querying Zookeeper (those queries add noise to the metrics because 4lw commands are counted as normal Zookeeper requests).&lt;/p&gt;

&lt;p&gt;To scrape Zookeeper JMX metrics with jmx-exporter, you have to pass the following argument when launching Zookeeper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-javaagent:/opt/jmx-exporter/jmx-exporter.jar=7070:/etc/jmx-exporter/zookeeper.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use the Zookeeper that is distributed with Kafka (you shouldn’t) then pass it via &lt;code&gt;EXTRA_ARGS&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ export EXTRA_ARGS="-javaagent:/opt/jmx-exporter/jmx-exporter.jar=7070:/etc/jmx-exporter/zookeeper.yml"
$ /opt/kafka_2.11-0.10.1.0/bin/zookeeper-server-start.sh /opt/kafka_2.11-0.10.1.0/config/zookeeper.properties
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use a standalone Zookeeper distribution, then add it as &lt;code&gt;SERVER_JVMFLAGS&lt;/code&gt; in &lt;code&gt;zookeeper-env.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# zookeeper-env.sh
SERVER_JVMFLAGS="-javaagent:/opt/jmx-exporter/jmx-exporter.jar=7070:/etc/jmx-exporter/zookeeper.yml"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Anyway, when you launch Zookeeper you should see the process listening on the specified port (7070 in my case) and responding to &lt;code&gt;/metrics&lt;/code&gt; queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ netstat -tlnp | grep 7070
tcp 0 0 0.0.0.0:7070 0.0.0.0:* LISTEN 892/java

$ curl -s localhost:7070/metrics | head
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 16.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 12.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 16.0
# HELP jvm_threads_started_total Started thread count of a JVM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
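&lt;p&gt;On the Prometheus side, this endpoint is scraped like any other target – a minimal scrape config could look like this (the hostname is a placeholder):&lt;/p&gt;

```yaml
scrape_configs:
  - job_name: 'zookeeper'
    static_configs:
      - targets: ['zk1.example.com:7070']
```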



&lt;h2&gt;
  
  
  Configuring Kafka with jmx-exporter
&lt;/h2&gt;

&lt;p&gt;Kafka is a message broker written in Scala, so it runs in the JVM, which in turn means that we can use jmx-exporter for its metrics.&lt;/p&gt;

&lt;p&gt;To run jmx-exporter within Kafka, you should set &lt;code&gt;KAFKA_OPTS&lt;/code&gt; environment variable like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ export KAFKA_OPTS='-javaagent:/opt/jmx-exporter/jmx-exporter.jar=7071:/etc/jmx-exporter/kafka.yml'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then launch Kafka (I assume that Zookeeper is already launched as it’s required by Kafka):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ /opt/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh /opt/kafka_2.11-0.10.1.0/conf/server.properties
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check that jmx-exporter HTTP server is listening:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ netstap -tlnp | grep 7071
tcp6 0 0 :::7071 :::* LISTEN 19288/java
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And scrape the metrics!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ curl -s localhost:7071 | grep -i kafka | head
# HELP kafka_server_replicafetchermanager_minfetchrate Attribute exposed for management (kafka.server&amp;lt;type=ReplicaFetcherManager, name=MinFetchRate, clientId=Replica&amp;gt;&amp;lt;&amp;gt;Value)
# TYPE kafka_server_replicafetchermanager_minfetchrate untyped
kafka_server_replicafetchermanager_minfetchrate{clientId="Replica",} 0.0
# HELP kafka_network_requestmetrics_totaltimems Attribute exposed for management (kafka.network&amp;lt;type=RequestMetrics, name=TotalTimeMs, request=OffsetFetch&amp;gt;&amp;lt;&amp;gt;Count)
# TYPE kafka_network_requestmetrics_totaltimems untyped
kafka_network_requestmetrics_totaltimems{request="OffsetFetch",} 0.0
kafka_network_requestmetrics_totaltimems{request="JoinGroup",} 0.0
kafka_network_requestmetrics_totaltimems{request="DescribeGroups",} 0.0
kafka_network_requestmetrics_totaltimems{request="LeaveGroup",} 0.0
kafka_network_requestmetrics_totaltimems{request="GroupCoordinator",} 0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is how to run the jmx-exporter Java agent if you are running Kafka under systemd:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
[Service]
Restart=on-failure
Environment=KAFKA_OPTS=-javaagent:/opt/jmx-exporter/jmx-exporter.jar=7071:/etc/jmx-exporter/kafka.yml
ExecStart=/opt/kafka/bin/kafka-server-start.sh /etc/kafka/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
TimeoutStopSec=600
User=kafka
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Recap
&lt;/h2&gt;

&lt;p&gt;With jmx-exporter you can scrape the metrics of running JVM applications. jmx-exporter runs as a Java agent (inside the target JVM), collects JMX metrics, rewrites them according to the config rules and exposes them in the Prometheus exposition format.&lt;/p&gt;

&lt;p&gt;For a quick setup, check my Ansible &lt;a href="https://github.com/alexdzyoba/ansible-jmx-exporter"&gt;role for jmx-exporter&lt;/a&gt; – &lt;a href="https://galaxy.ansible.com/alexdzyoba/jmx-exporter/"&gt;alexdzyoba.jmx-exporter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That’s all for now, stay tuned by &lt;a href="https://alex.dzyoba.com/feed"&gt;subscribing to the RSS&lt;/a&gt; or follow me on &lt;a href="https://twitter.com/alexdzyoba"&gt;Twitter @AlexDzyoba&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Redis cluster with cross replication</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Sat, 21 Apr 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/redis-cluster-with-cross-replication-2h31</link>
      <guid>https://dev.to/dzeban/redis-cluster-with-cross-replication-2h31</guid>
      <description>&lt;p&gt;In my previous post on &lt;a href="https://dev.to/blog/redis-ha/"&gt;Redis high availability&lt;/a&gt;, I’ve said that Redis cluster has some sharp corners and promised to tell about it.&lt;/p&gt;

&lt;p&gt;This post will cover tricky cases with a cross-replicated cluster only, because that’s what I use. If you have a plain flat topology with single Redis instances on dedicated nodes, you’ll be fine. But that’s not my case.&lt;/p&gt;

&lt;p&gt;So let’s dive in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;First, let’s define some terms so we understand each other.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node – physical &lt;strong&gt;server&lt;/strong&gt; or VM where you will run the Redis instance.&lt;/li&gt;
&lt;li&gt;Instance – Redis server &lt;strong&gt;process&lt;/strong&gt; in a cluster mode.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Second, let me describe what my Redis cluster topology looks like and what cross-replication is.&lt;/p&gt;

&lt;p&gt;A Redis cluster is built from multiple Redis instances that run in cluster mode. Each instance is isolated because it serves a particular subset of keys in a master or slave &lt;strong&gt;role&lt;/strong&gt;. The emphasis on the role is intentional – there is a separate Redis instance for every shard master and every shard replica, e.g. if you have 3 shards with replication factor 3 (2 additional replicas) you have to run 9 Redis instances. This was my first naive attempt to create a cluster on 3 nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-trib create --replicas 2 10.135.78.153:7000 10.135.78.196:7000 10.135.64.55:7000
&amp;gt;&amp;gt;&amp;gt; Creating cluster
*** ERROR: Invalid configuration for cluster creation.
*** Redis Cluster requires at least 3 master nodes.
*** This is not possible with 3 nodes and 2 replicas per node.
*** At least 9 nodes are required.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(&lt;code&gt;redis-trib&lt;/code&gt; is an “official” tool to create a Redis cluster)&lt;/p&gt;

&lt;p&gt;The important point here is that all of the Redis tools operate with Redis instances, not nodes, so it’s your responsibility to put the instances in the right redundant topology.&lt;/p&gt;

&lt;h2&gt;
  
  
  The motivation for cross replication
&lt;/h2&gt;

&lt;p&gt;A Redis cluster requires at least 3 nodes because to survive a network partition it needs a majority of masters (like in Sentinel). If you want 1 replica, then add another 3 nodes and boom! Now you have a 6-node cluster to operate.&lt;/p&gt;

&lt;p&gt;It’s fine if you work in the cloud where you can just spin up a dozen small nodes that cost you little. Unfortunately, not everyone has joined the cloud party – some have to operate real metal nodes, and server hardware usually starts with something like 32 GiB of RAM and an 8-core CPU, which is real overkill for a Redis node.&lt;/p&gt;

&lt;p&gt;So to save on hardware, we can make a trick and run several instances on a single node (and probably colocate them with other services). But remember that in that case, you have to distribute the masters among the nodes manually and configure &lt;strong&gt;cross-replication&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Cross-replication simply means that you don’t have dedicated nodes for replicas – you just replicate the data to the next node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fredis-cluster-cross-replication.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fredis-cluster-cross-replication.png" alt="Redis cluster with cross replication"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This way you save on the cluster size – you can make a Redis cluster with 2 replicas on 3 nodes instead of 9. So you have fewer things to operate and the nodes are better utilized – instead of one single-threaded lightweight Redis process on each of 9 nodes, you’ll have 3 such processes on each of 3 nodes.&lt;/p&gt;

&lt;p&gt;To create a cluster you have to run &lt;code&gt;redis-server&lt;/code&gt; with the &lt;code&gt;cluster-enabled yes&lt;/code&gt; parameter. With a cross-replicated cluster you run multiple Redis instances on a node, so you have to run them on separate ports. You can check these &lt;a href="https://linode.com/docs/applications/big-data/how-to-install-and-configure-a-redis-cluster-on-ubuntu-1604/" rel="noopener noreferrer"&gt;two&lt;/a&gt; &lt;a href="http://codeflex.co/configuring-redis-cluster-on-linux/" rel="noopener noreferrer"&gt;manuals&lt;/a&gt; for details, but the essential part is the configs. This is the config file I’m using:&lt;br&gt;
&lt;/p&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;protected-mode no
port {{ redis_port }}
daemonize no
loglevel notice
logfile ""
cluster-enabled yes
cluster-config-file nodes-{{ redis_port }}.conf
cluster-node-timeout 5000
cluster-require-full-coverage no
cluster-slave-validity-factor 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;redis_port&lt;/code&gt; variable takes the values 7000, 7001 and 7002 for each shard. Launch 3 instances of Redis server on ports 7000, 7001 and 7002 on each of the 3 nodes, so you’ll have 9 instances total, and let’s continue.&lt;/p&gt;
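&lt;p&gt;For example, on each node you could start the shards like this (assuming you’ve rendered one config file per port – the paths here are hypothetical):&lt;/p&gt;

```shell
# on each of the 3 nodes
for port in 7000 7001 7002; do
    redis-server /etc/redis/redis-${port}.conf &
done
```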

&lt;h2&gt;
  
  
  Building a cross-replicated cluster
&lt;/h2&gt;

&lt;p&gt;The first surprise may hit you when you build the cluster. If you invoke &lt;code&gt;redis-trib&lt;/code&gt; like this&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-trib create --replicas 2 10.135.78.153:7000 10.135.78.196:7000 10.135.64.55:7000 10.135.78.153:7001 10.135.78.196:7001 10.135.64.55:7001 10.135.78.153:7002 10.135.78.196:7002 10.135.64.55:7002
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then it may put all your master instances on a single node. This is happening because, again, it assumes that each instance lives on a separate node.&lt;/p&gt;

&lt;p&gt;So you have to distribute masters and slaves by hand. To do so, first, create a cluster from masters and then add slaves for each master.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create a cluster with masters
$ redis-trib create 10.135.78.153:7000 10.135.78.196:7001 10.135.64.55:7002
&amp;gt;&amp;gt;&amp;gt; Creating cluster
&amp;gt;&amp;gt;&amp;gt; Performing hash slots allocation on 3 nodes...
Using 3 masters:
10.135.78.153:7000
10.135.78.196:7001
10.135.64.55:7002
M: 763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000
slots:0-5460 (5461 slots) master
M: f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001
slots:5461-10922 (5462 slots) master
M: 5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002
slots:10923-16383 (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
&amp;gt;&amp;gt;&amp;gt; Nodes configuration updated
&amp;gt;&amp;gt;&amp;gt; Assign a different config epoch to each node
&amp;gt;&amp;gt;&amp;gt; Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join.
&amp;gt;&amp;gt;&amp;gt; Performing Cluster Check (using node 10.135.78.153:7000)
M: 763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000
slots:0-5460 (5461 slots) master
0 additional replica(s)
M: 5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002
slots:10923-16383 (5461 slots) master
0 additional replica(s)
M: f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001
slots:5461-10922 (5462 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
&amp;gt;&amp;gt;&amp;gt; Check for open slots...
&amp;gt;&amp;gt;&amp;gt; Check slots coverage...
[OK] All 16384 slots covered.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is our cluster now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;127.0.0.1:7000&amp;gt; CLUSTER NODES                
763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000@17000 myself,master - 0 1524041299000 1 connected 0-5460
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master - 0 1524041299426 2 connected 5461-10922
5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002@17002 master - 0 1524041298408 3 connected 10923-16383
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now add 2 replicas for each master:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-trib add-node --slave --master-id 763646767dd5492366c3c9f2978faa022833b7af 10.135.78.196:7000 10.135.78.153:7000
$ redis-trib add-node --slave --master-id 763646767dd5492366c3c9f2978faa022833b7af 10.135.64.55:7000 10.135.78.153:7000

$ redis-trib add-node --slave --master-id f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.153:7001 10.135.78.153:7000
$ redis-trib add-node --slave --master-id f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.64.55:7001 10.135.78.153:7000

$ redis-trib add-node --slave --master-id 5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.78.153:7002 10.135.78.153:7000
$ redis-trib add-node --slave --master-id 5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.78.196:7002 10.135.78.153:7000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, this is our brand new cross-replicated cluster with 2 replicas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-cli -c -p 7000 cluster nodes
763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000@17000 myself,master - 0 1524041947000 1 connected 0-5460
216a5ea51af1faed7fa42b0c153c91855f769321 10.135.78.196:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524041948515 1 connected
0441f7534aed16123bb3476124506251dab80747 10.135.64.55:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524041947094 1 connected
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master - 0 1524043602115 2 connected 5461-10922
f90c932d5cf435c75697dc984b0cbb94c130f115 10.135.78.153:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524043601595 2 connected
00eb2402fc1868763a393ae2c9843c47cd7d49da 10.135.64.55:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524043600057 2 connected
5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002@17002 master - 0 1524041948515 3 connected 10923-16383
af75fc17e552279e5939bfe2df68075b3b6f9b29 10.135.78.153:7002@17002 slave 5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 0 1524041948000 3 connected
19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.78.196:7002@17002 slave 5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 0 1524041947094 3 connected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Failover of a cluster node
&lt;/h2&gt;

&lt;p&gt;If we fail our third node (10.135.64.55) with the &lt;code&gt;DEBUG SEGFAULT&lt;/code&gt; command, the cluster will continue to work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;127.0.0.1:7000&amp;gt; CLUSTER NODES
763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000@17000 myself,master - 0 1524043923000 1 connected 0-5460
216a5ea51af1faed7fa42b0c153c91855f769321 10.135.78.196:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524043924569 1 connected
0441f7534aed16123bb3476124506251dab80747 10.135.64.55:7000@17000 slave,fail 763646767dd5492366c3c9f2978faa022833b7af 1524043857000 1524043856593 1 disconnected
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master - 0 1524043924874 2 connected 5461-10922
f90c932d5cf435c75697dc984b0cbb94c130f115 10.135.78.153:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524043924000 2 connected
00eb2402fc1868763a393ae2c9843c47cd7d49da 10.135.64.55:7001@17001 slave,fail f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 1524043862669 1524043862000 2 disconnected
5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002@17002 master,fail - 1524043864490 1524043862567 3 disconnected
af75fc17e552279e5939bfe2df68075b3b6f9b29 10.135.78.153:7002@17002 slave 19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 0 1524043924568 4 connected
19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.78.196:7002@17002 master - 0 1524043924000 4 connected 10923-16383
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that the replica on 10.135.78.196:7002 took over the slot range 10923-16383 and is now the master:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;127.0.0.1:7000&amp;gt; set a 2
-&amp;gt; Redirected to slot [15495] located at 10.135.78.196:7002
OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we restore the Redis instances on the third node, the cluster will recover:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;127.0.0.1:7000&amp;gt; CLUSTER nodes
763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000@17000 myself,master - 0 1524044130000 1 connected 0-5460
216a5ea51af1faed7fa42b0c153c91855f769321 10.135.78.196:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524044131572 1 connected
0441f7534aed16123bb3476124506251dab80747 10.135.64.55:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524044131367 1 connected
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master - 0 1524044130334 2 connected 5461-10922
f90c932d5cf435c75697dc984b0cbb94c130f115 10.135.78.153:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524044131876 2 connected
00eb2402fc1868763a393ae2c9843c47cd7d49da 10.135.64.55:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524044131877 2 connected
19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.78.196:7002@17002 master - 0 1524044131572 4 connected 10923-16383
af75fc17e552279e5939bfe2df68075b3b6f9b29 10.135.78.153:7002@17002 slave 19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 0 1524044131000 4 connected
5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002@17002 slave 19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 0 1524044131572 4 connected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, the master was &lt;strong&gt;not restored back&lt;/strong&gt; to the original node – it’s still on the second node (10.135.78.196). After the reboot, the third node contains only slave instances&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-cli -c -p 7000 cluster nodes | grep 10.135.64.55
0441f7534aed16123bb3476124506251dab80747 10.135.64.55:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524044294347 1 connected
00eb2402fc1868763a393ae2c9843c47cd7d49da 10.135.64.55:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524044293138 2 connected
5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002@17002 slave 19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 0 1524044294553 4 connected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and the second node serves 2 master instances.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-cli -c -p 7000 cluster nodes | grep 10.135.78.196
216a5ea51af1faed7fa42b0c153c91855f769321 10.135.78.196:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524044345000 1 connected
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master - 0 1524044345000 2 connected 5461-10922
19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.78.196:7002@17002 master - 0 1524044345000 4 connected 10923-16383
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, what is interesting is that if the second node fails in this state, we’ll lose 2 out of 3 masters and we’ll &lt;strong&gt;lose the whole cluster&lt;/strong&gt; because there is no master quorum.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-cli -c -p 7000 cluster nodes
763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000@17000 myself,master - 0 1524046655000 1 connected 0-5460
216a5ea51af1faed7fa42b0c153c91855f769321 10.135.78.196:7000@17000 slave,fail 763646767dd5492366c3c9f2978faa022833b7af 1524046544940 1524046544000 1 disconnected
0441f7534aed16123bb3476124506251dab80747 10.135.64.55:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524046654010 1 connected
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master,fail? - 1524046602511 1524046601582 2 disconnected 5461-10922
f90c932d5cf435c75697dc984b0cbb94c130f115 10.135.78.153:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524046655039 2 connected
00eb2402fc1868763a393ae2c9843c47cd7d49da 10.135.64.55:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524046656075 2 connected
19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.78.196:7002@17002 master,fail? - 1524046605581 1524046603746 4 disconnected 10923-16383
af75fc17e552279e5939bfe2df68075b3b6f9b29 10.135.78.153:7002@17002 slave 19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 0 1524046654623 4 connected
5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002@17002 slave 19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 0 1524046654515 4 connected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me reiterate that – with a cross-replicated cluster you may lose the whole cluster after 2 consecutive reboots of single nodes. This is the reason why you’re better off with a dedicated node for each Redis instance; otherwise, with cross-replication, you should really watch the distribution of masters.&lt;/p&gt;

&lt;p&gt;To avoid the situation above, we should manually fail over one of the slaves on the third node to make it a master.&lt;/p&gt;

&lt;p&gt;To do this, we connect to 10.135.64.55:7002, which is now a replica, and issue the &lt;code&gt;CLUSTER FAILOVER&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;127.0.0.1:7002&amp;gt; CLUSTER FAILOVER
OK

127.0.0.1:7002&amp;gt; CLUSTER NODES
763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000@17000 master - 0 1524047703000 1 connected 0-5460
216a5ea51af1faed7fa42b0c153c91855f769321 10.135.78.196:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524047703512 1 connected
0441f7534aed16123bb3476124506251dab80747 10.135.64.55:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524047703512 1 connected
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master - 0 1524047703000 2 connected 5461-10922
f90c932d5cf435c75697dc984b0cbb94c130f115 10.135.78.153:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524047703000 2 connected
00eb2402fc1868763a393ae2c9843c47cd7d49da 10.135.64.55:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524047703110 2 connected
5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002@17002 myself,master - 0 1524047703000 5 connected 10923-16383
af75fc17e552279e5939bfe2df68075b3b6f9b29 10.135.78.153:7002@17002 slave 5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 0 1524047702510 5 connected
19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.78.196:7002@17002 slave 5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 0 1524047702009 5 connected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Replacing a failed node
&lt;/h2&gt;

&lt;p&gt;Now, suppose we’ve lost our third node entirely and want to replace it with a brand new one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-cli -c -p 7000 cluster nodes
763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000@17000 myself,master - 0 1524047906000 1 connected 0-5460
216a5ea51af1faed7fa42b0c153c91855f769321 10.135.78.196:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524047906811 1 connected
0441f7534aed16123bb3476124506251dab80747 10.135.64.55:7000@17000 slave,fail 763646767dd5492366c3c9f2978faa022833b7af 1524047871538 1524047869000 1 connected
f90c932d5cf435c75697dc984b0cbb94c130f115 10.135.78.153:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524047908000 2 connected
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master - 0 1524047907318 2 connected 5461-10922
00eb2402fc1868763a393ae2c9843c47cd7d49da 10.135.64.55:7001@17001 slave,fail f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 1524047872042 1524047869515 2 connected
19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.78.196:7002@17002 master - 0 1524047907000 6 connected 10923-16383
af75fc17e552279e5939bfe2df68075b3b6f9b29 10.135.78.153:7002@17002 slave 19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 0 1524047908336 6 connected
5f4bb09230ca016e7ffe2e6a4e5a32470175fb66 10.135.64.55:7002@17002 master,fail - 1524047871840 1524047869314 5 connected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, we have to make the cluster forget the lost node by issuing &lt;code&gt;CLUSTER FORGET &amp;lt;node-id&amp;gt;&lt;/code&gt; on &lt;strong&gt;every single node&lt;/strong&gt; of the cluster (even slaves). Note that a forgotten node is only banned for 60 seconds, so the command has to reach all nodes within that window.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for id in 0441f7534aed16123bb3476124506251dab80747 00eb2402fc1868763a393ae2c9843c47cd7d49da 5f4bb09230ca016e7ffe2e6a4e5a32470175fb66; do 
    for port in 7000 7001 7002; do 
        redis-cli -c -p ${port} CLUSTER FORGET ${id}
    done
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check that we’ve forgotten the failed node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-cli -c -p 7000 cluster nodes
763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000@17000 myself,master - 0 1524048240000 1 connected 0-5460
216a5ea51af1faed7fa42b0c153c91855f769321 10.135.78.196:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524048241342 1 connected
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master - 0 1524048240332 2 connected 5461-10922
f90c932d5cf435c75697dc984b0cbb94c130f115 10.135.78.153:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524048240000 2 connected
19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.78.196:7002@17002 master - 0 1524048241000 6 connected 10923-16383
af75fc17e552279e5939bfe2df68075b3b6f9b29 10.135.78.153:7002@17002 slave 19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 0 1524048241845 6 connected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now spin up a new node, install Redis on it and launch 3 new instances with our cluster configuration.&lt;/p&gt;
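&lt;p&gt;By “our cluster configuration” I mean one config file per instance, along these lines (a sketch with typical cluster settings – your exact values may differ):&lt;/p&gt;

```
# redis-7000.conf - repeat for 7001 and 7002 with adjusted values
port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 5000
appendonly yes
```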

&lt;p&gt;These 3 new instances don’t know anything about the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@redis-replaced ~]# redis-cli -c -p 7000 cluster nodes
9a9c19e24e04df35ad54a8aff750475e707c8367 :7000@17000 myself,master - 0 0 0 connected
[root@redis-replaced ~]# redis-cli -c -p 7001 cluster nodes
3a35ebbb6160232d36984e7a5b97d430077e7eb0 :7001@17001 myself,master - 0 0 0 connected
[root@redis-replaced ~]# redis-cli -c -p 7002 cluster nodes
df701f8b24ae3c68ca6f9e1015d7362edccbb0ab :7002@17002 myself,master - 0 0 0 connected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;so we have to add these Redis instances to the cluster as replicas of the existing masters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-trib add-node --slave --master-id 763646767dd5492366c3c9f2978faa022833b7af 10.135.82.90:7000 10.135.78.153:7000
$ redis-trib add-node --slave --master-id f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.82.90:7001 10.135.78.153:7000
$ redis-trib add-node --slave --master-id 19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.82.90:7002 10.135.78.153:7000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we should fail over the third shard to restore the one-master-per-node distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@redis-replaced ~]# redis-cli -c -p 7002 cluster failover
OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aaaand, it’s done!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ redis-cli -c -p 7000 cluster nodes
763646767dd5492366c3c9f2978faa022833b7af 10.135.78.153:7000@17000 myself,master - 0 1524049388000 1 connected 0-5460
f90c932d5cf435c75697dc984b0cbb94c130f115 10.135.78.153:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524049389000 2 connected
af75fc17e552279e5939bfe2df68075b3b6f9b29 10.135.78.153:7002@17002 slave df701f8b24ae3c68ca6f9e1015d7362edccbb0ab 0 1524049388000 7 connected
216a5ea51af1faed7fa42b0c153c91855f769321 10.135.78.196:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524049389579 1 connected
f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 10.135.78.196:7001@17001 master - 0 1524049389579 2 connected 5461-10922
19b8c9f7ac472ecfedd109e6bb7a4b932905c4fd 10.135.78.196:7002@17002 slave df701f8b24ae3c68ca6f9e1015d7362edccbb0ab 0 1524049388565 7 connected
9a9c19e24e04df35ad54a8aff750475e707c8367 10.135.82.90:7000@17000 slave 763646767dd5492366c3c9f2978faa022833b7af 0 1524049389880 1 connected
3a35ebbb6160232d36984e7a5b97d430077e7eb0 10.135.82.90:7001@17001 slave f63c210b13d68fa5dc97ca078af6d9c167f8c6ec 0 1524049389579 2 connected
df701f8b24ae3c68ca6f9e1015d7362edccbb0ab 10.135.82.90:7002@17002 master - 0 1524049389579 7 connected 10923-16383
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Recap
&lt;/h2&gt;

&lt;p&gt;If you have to deal with bare-metal servers, want a highly available Redis setup and need to utilize your hardware effectively, building a cross-replicated Redis cluster topology is a good option.&lt;/p&gt;

&lt;p&gt;This will work great but there are 2 caveats:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cluster building is a manual process because you have to put masters on separate nodes.&lt;/li&gt;
&lt;li&gt;You have to monitor your masters’ distribution to avoid cluster failure after a single node failure.&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Redis high availability</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Wed, 28 Mar 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/redis-high-availability-5b7b</link>
      <guid>https://dev.to/dzeban/redis-high-availability-5b7b</guid>
<description>&lt;p&gt;Recently, at the place where I work, we started to use Redis for session-like objects storage. Despite these objects being small and short-lived, our service would stop working without them, so the question of Redis high availability arose. It turns out there is no ready-made solution for Redis – there are multiple options with different tradeoffs, and the information is sometimes scarce and scattered across documentation and blog posts. Hence I’m writing this in the shy hope of helping another poor soul like myself solve this problem. I’m by no means a Redis guru but I wanted to share my experience anyway because, after all, it’s my personal blog.&lt;/p&gt;

&lt;p&gt;I’m going to describe high availability in terms of node failure and not persistence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Redis high availability options
&lt;/h2&gt;

&lt;p&gt;Standalone Redis, the good old &lt;code&gt;redis-server&lt;/code&gt; you launch after installation, is easy to set up and use, but it’s not resilient to the failure of the node it’s running on. It doesn’t matter whether you use RDB or AOF – as long as the node is unavailable, you are in trouble.&lt;/p&gt;

&lt;p&gt;Over the years, the Redis community came up with a few high availability options – most of them are built into Redis itself, though some are 3rd party tools. Let’s dive in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple Redis replication
&lt;/h2&gt;

&lt;p&gt;Redis has had replication support since, like, forever and it works great – just put &lt;code&gt;slaveof &amp;lt;addr&amp;gt; &amp;lt;port&amp;gt;&lt;/code&gt; in your config file and the instance will start receiving the stream of data from the master.&lt;/p&gt;

&lt;p&gt;You can configure multiple slaves for a master, you can configure a slave of a slave, you can enable slave-only persistence, you can make replication synchronous (it’s async by default) – the list of what you can do with Redis seems bounded only by your imagination. Just read the &lt;a href="https://redis.io/topics/replication"&gt;replication docs&lt;/a&gt; – they’re really great.&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick and simple to setup&lt;/li&gt;
&lt;li&gt;Could be automated via configuration management tools&lt;/li&gt;
&lt;li&gt;Continue to work as long as a single master instance is available - it can survive failures of all of the slave instances.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writes must go to the master&lt;/li&gt;
&lt;li&gt;Slaves may serve reads but because replication is asynchronous you may get stale reads&lt;/li&gt;
&lt;li&gt;It doesn’t shard data, so master and slaves will have unbalanced utilization&lt;/li&gt;
&lt;li&gt;In the case of master failure, you have to elect the new master manually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last thing is, IMHO, a major downside and that’s where the Redis Sentinel helps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Redis replication with Sentinel
&lt;/h2&gt;

&lt;p&gt;Nobody wants to wake up in the middle of the night just to issue &lt;code&gt;SLAVEOF NO ONE&lt;/code&gt; to elect a new master – it’s pretty silly and should be automated, right? Right. That’s why Redis Sentinel exists.&lt;/p&gt;

&lt;p&gt;Redis Sentinel is a tool that monitors Redis masters and slaves and automatically elects a new master from one of the slaves. This is a really critical task, so you’re better off making Sentinel highly available itself. Luckily, it has built-in clustering which makes it a distributed system.&lt;/p&gt;

&lt;p&gt;Sentinel is a quorum system, meaning that to agree on a new master there must be a majority of Sentinel nodes alive. This has a huge implication for how to deploy Sentinel. There are basically 2 options here – colocate it with the Redis servers or deploy it on a separate cluster. Colocating makes sense because Sentinel is a very lightweight process, so why pay for additional nodes? But in this case we lose resilience: if you colocate Redis server and Sentinel on, say, 3 nodes, you can only lose 1 node, because Sentinel needs 2 nodes to elect the new Redis master. Without Sentinel, we could lose 2 slave nodes. So maybe you should think about a dedicated Sentinel cluster. If you’re in the cloud you could deploy it on some sort of nano instances, but maybe that’s not your case. Tradeoffs, tradeoffs, I know.&lt;/p&gt;
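&lt;p&gt;For reference, the Sentinel side of the 3-node colocated setup above needs surprisingly little configuration – something along these lines (the master name and address are placeholders):&lt;/p&gt;

```
# sentinel.conf - the quorum of 2 matches the 3-node example above
port 26379
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```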

&lt;p&gt;Besides maintaining one more distributed system, with Sentinel you have to change the way your clients work with Redis, because the master can now move. Your application should first go to Sentinel, ask it for the current master and only then work with it. Alternatively, you can build a clever hack with HAProxy – instead of going to Sentinel, put HAProxy in front of the Redis servers and detect the new master with TCP checks. See the example &lt;a href="https://www.haproxy.com/blog/haproxy-advanced-redis-health-check"&gt;at the HAProxy blog&lt;/a&gt;.&lt;/p&gt;
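&lt;p&gt;The gist of that HAProxy trick is a TCP check that marks a server as up only if it reports &lt;code&gt;role:master&lt;/code&gt; – a sketch along the lines of the linked example (addresses are placeholders):&lt;/p&gt;

```
backend redis
    mode tcp
    option tcp-check
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis1 10.0.0.1:6379 check inter 1s
    server redis2 10.0.0.2:6379 check inter 1s
    server redis3 10.0.0.3:6379 check inter 1s
```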

&lt;p&gt;Nevertheless, Sentinel colocated with Redis servers is a really common solution for Redis high availability, for example, Gitlab recommends it in &lt;a href="https://docs.gitlab.com/ee/administration/high_availability/redis.html"&gt;its admin guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically elects a new master when the current one fails. Yay!&lt;/li&gt;
&lt;li&gt;Easy to setup, (seems) easy to maintain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yet another distributed system to maintain&lt;/li&gt;
&lt;li&gt;May require a dedicated cluster if not colocated with Redis server&lt;/li&gt;
&lt;li&gt;Still doesn’t shard data, so the master will be overutilized compared to the slaves&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Redis cluster
&lt;/h2&gt;

&lt;p&gt;All of the solutions above seem, IMHO, half-assed because they add more moving parts, and these parts are not obvious, at least at first sight. I don’t know any other system that solves an availability problem by adding yet another cluster that must be available itself. It’s just annoying.&lt;/p&gt;

&lt;p&gt;So recent versions of Redis came with the Cluster – a builtin feature that adds sharding, replication and high availability to the known and loved Redis. Within a cluster, you have multiple master instances, each serving a subset of the keyspace. Clients may send requests to any of the masters, which will redirect them to the instance that owns the given key. Masters may have as many replicas as they want, and these replicas will be promoted to master automatically, even without a quorum. Note, though, that a quorum of master instances is required for the whole cluster to work, while no quorum is required for a single shard to keep working, including the new master election.&lt;/p&gt;
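&lt;p&gt;The redirects are driven by hash slots: every key maps to one of 16384 slots via CRC16, and each master owns a range of slots. Here is a sketch of the slot computation (CRC-16/XMODEM written with plain arithmetic; Redis implements the same thing in C):&lt;/p&gt;

```python
def crc16_xmodem(data):
    """CRC-16/XMODEM - the CRC variant Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc = (crc ^ (byte * 256)) % 0x10000
        for _ in range(8):
            if crc >= 0x8000:
                crc = ((crc * 2) ^ 0x1021) % 0x10000
            else:
                crc = (crc * 2) % 0x10000
    return crc

def hash_slot(key):
    """Redis Cluster slot for a key: CRC16(key) mod 16384.
    Honors {hash tags} so related keys can land in the same slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# hash_slot("123456789") == 12739, so that key lives on the master
# owning the 10923-16383 slot range in a default 3-master cluster
```

&lt;p&gt;The hash-tag rule is what makes multi-key operations possible in a cluster: keys sharing the same &lt;code&gt;{tag}&lt;/code&gt; are guaranteed to be in the same slot.&lt;/p&gt;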

&lt;p&gt;Each instance in the Redis cluster (master or slave) should be deployed on a dedicated node but you can configure cross replication where each node will contain multiple instances. There are sharp corners here, though, that I’ll illustrate in the next post, so stay tuned!&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shards data across multiple nodes&lt;/li&gt;
&lt;li&gt;Has replication support&lt;/li&gt;
&lt;li&gt;Has builtin failover of the master&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not every library supports it&lt;/li&gt;
&lt;li&gt;May not be as robust (yet) as standalone Redis or Sentinel&lt;/li&gt;
&lt;li&gt;Tooling is wack – building and maintaining a cluster (e.g. replacing a node) is a manual process&lt;/li&gt;
&lt;li&gt;Introduces an extra network hop when the client picks the wrong shard.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Twemproxy
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/twitter/twemproxy"&gt;Twemproxy&lt;/a&gt; is a special proxy for in-memory databases – namely, memcached and Redis – that was built by Twitter. It adds sharding with consistent hashing, so resharding is not that painful, and also maintains persistent connections and enables requests/response pipelining.&lt;/p&gt;

&lt;p&gt;I haven’t tried it because in the era of Redis Cluster it doesn’t seem relevant to me anymore, so I can’t tell the pros and cons, but YMMV.&lt;/p&gt;

&lt;h2&gt;
  
  
  Redis Enterprise
&lt;/h2&gt;

&lt;p&gt;After the initial post, quite a few people reached out to tell me that they have great success with Redis Enterprise from Redis Labs. Check out &lt;a href="https://www.reddit.com/r/devops/comments/86tcry/redis_high_availability/dw9lulm/"&gt;this one from Reddit&lt;/a&gt;. The point is: if you have a really heavy workload, your data is critical and you can afford it, then you should consider their solution.&lt;/p&gt;

&lt;p&gt;You may also check their guide on &lt;a href="https://redislabs.com/redis-features/high-availability"&gt;Redis High Availability&lt;/a&gt; – it is also well written and illustrated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Choosing the right solution for Redis high availability is full of tradeoffs. Nobody knows your situation better than you, so get to know how Redis works – there is no magic here – because in the end you’ll have to maintain the solution. In my case, we chose a Redis cluster with cross replication, after lots of testing and writing a doc with instructions on how to deal with failures.&lt;/p&gt;

&lt;p&gt;That’s all for now, stay tuned for the dedicated Redis cluster post!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to use Ansible with Terraform</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Fri, 09 Mar 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/how-to-use-ansible-with-terraform-4k89</link>
      <guid>https://dev.to/dzeban/how-to-use-ansible-with-terraform-4k89</guid>
<description>&lt;p&gt;Recently, I’ve started using Terraform for creating a cloud test rig and it’s pretty dope. In a matter of a few days, I went from “never used AWS” to “I have a declarative way to create an isolated infrastructure in the cloud”. I’m spinning up a couple of instances in a dedicated subnet inside a VPC, with a security group and a dedicated SSH keypair, and all of this is coded in a mere few hundred lines.&lt;/p&gt;

&lt;p&gt;It’s all nice and dandy, but after creating an instance from some basic AMI I need to provision it. My go-to tool for this is Ansible but, unfortunately, Terraform doesn’t support it natively as it does Chef and Salt. This is unlike &lt;a href="https://www.packer.io/"&gt;Packer&lt;/a&gt;, which has &lt;code&gt;ansible&lt;/code&gt; (remote) and &lt;code&gt;ansible-local&lt;/code&gt; provisioners that &lt;a href="https://dev.to/blog/packer-for-docker/"&gt;I’ve used for creating a Docker image&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So I’ve spent some time and found a few ways to marry Terraform with Ansible that I’ll describe hereafter. But first, let’s talk about provisioning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do we really need provisioning in the cloud?
&lt;/h2&gt;

&lt;p&gt;Instead of using empty AMIs, you could bake your own AMI and skip the provisioning part completely, but I see a giant flaw in this setup: every change, even a small one, requires recreating the whole instance. If the change is somewhere in the base layer, you’ll need to recreate your whole fleet. This quickly becomes unusable for deployment, security patching, adding/removing users, changing configs and other simple things.&lt;/p&gt;

&lt;p&gt;Even more so, if you bake your own AMIs then you still have to provision them somehow, and that’s where tools like Ansible appear again. My recommendation here is, again, to &lt;a href="https://dev.to/blog/packer-for-docker/"&gt;use Packer with Ansible&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So in most cases, I’m strongly for the provisioning because it’s unavoidable anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to use Ansible with Terraform
&lt;/h2&gt;

&lt;p&gt;Now, returning to the actual provisioning I found 3 ways to use Ansible with Terraform after reading the heated discussion at &lt;a href="https://github.com/hashicorp/terraform/issues/2661"&gt;this GitHub issue&lt;/a&gt;. Read on to find the one that’s most suitable for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inline inventory with instance IP
&lt;/h3&gt;

&lt;p&gt;One of the most obvious yet hacky solutions is to invoke Ansible within the &lt;code&gt;local-exec&lt;/code&gt; provisioner. Here is how it looks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provisioner "local-exec" {
    command = "ansible-playbook -i '${self.public_ip},' --private-key ${var.ssh_key_private} provision.yml"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nice and simple, but there is a problem here. The &lt;code&gt;local-exec&lt;/code&gt; provisioner starts without waiting for the instance to launch, so in most cases it will fail because, by the time it tries to connect, nobody is listening.&lt;/p&gt;

&lt;p&gt;As a workaround, you can use a preliminary &lt;code&gt;remote-exec&lt;/code&gt; provisioner that will wait until the connection to the instance is established, and only then invoke the &lt;code&gt;local-exec&lt;/code&gt; provisioner.&lt;/p&gt;

&lt;p&gt;As a result, I have this thingy that plays the role of an “Ansible provisioner”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provisioner "remote-exec" {
    inline = ["sudo dnf -y install python"]

    connection {
      type = "ssh"
      user = "fedora"
      private_key = "${file(var.ssh_key_private)}"
    }
  }

  provisioner "local-exec" {
    command = "ansible-playbook -u fedora -i '${self.public_ip},' --private-key ${var.ssh_key_private} provision.yml" 
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To make &lt;code&gt;ansible-playbook&lt;/code&gt; work, you have to keep the Ansible code in the same directory as the Terraform code, like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ll infra
drwxrwxr-x. 3 avd avd 4.0K Mar 5 15:54 roles/
-rw-rw-r--. 1 avd avd 367 Mar 5 15:19 ansible.cfg
-rw-rw-r--. 1 avd avd 2.5K Mar 7 18:54 main.tf
-rw-rw-r--. 1 avd avd 454 Mar 5 15:27 variables.tf
-rw-rw-r--. 1 avd avd 38 Mar 5 15:54 provision.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This inline inventory will work in most cases, except when you need multiple hosts in the inventory. For example, when you set up a Consul agent you need a list of Consul servers to render its config, and that list usually comes from the inventory. This won’t work here because the inventory contains a single host.&lt;/p&gt;

&lt;p&gt;Anyway, I’m using this approach for the basic things like setting up users and installing some basic packages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dynamic inventory after Terraform
&lt;/h3&gt;

&lt;p&gt;Another simple solution for provisioning infrastructure created by Terraform is to simply not tie Terraform and Ansible together. Create the infrastructure with Terraform and then use Ansible with a dynamic inventory, regardless of how your instances were created.&lt;/p&gt;

&lt;p&gt;So you first create an infra with &lt;code&gt;terraform apply&lt;/code&gt; and then you invoke &lt;code&gt;ansible-playbook -i inventory site.yml&lt;/code&gt;, where &lt;code&gt;inventory&lt;/code&gt; dir contains dynamic inventory scripts.&lt;/p&gt;

&lt;p&gt;This will work great but has a little drawback – if you need to increase the number of instances you must remember to launch Ansible after Terraform.&lt;/p&gt;

&lt;p&gt;That’s what I use to complement the previous approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inventory from Terraform state
&lt;/h3&gt;

&lt;p&gt;There is another interesting thing that might work for you – generate static inventory from Terraform state.&lt;/p&gt;

&lt;p&gt;When you work with Terraform, it maintains the state of your infrastructure, which contains everything including your instances. With a &lt;a href="https://www.terraform.io/docs/backends/index.html"&gt;local backend&lt;/a&gt;, this state is stored in a JSON file that can be easily parsed and converted to an Ansible inventory.&lt;/p&gt;
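&lt;p&gt;Here’s a minimal sketch of that conversion (assuming the legacy pre-0.12 state layout with a &lt;code&gt;modules&lt;/code&gt; list; the instance entry below is made up for illustration):&lt;/p&gt;

```python
import json

# A trimmed-down terraform.tfstate in the legacy (pre-0.12) layout
STATE = json.loads("""
{
  "modules": [{
    "resources": {
      "aws_instance.server": {
        "type": "aws_instance",
        "primary": {"attributes": {"public_ip": "52.51.215.84"}}
      }
    }
  }]
}
""")

def inventory_from_state(state):
    """Render a static Ansible INI inventory, one group per resource name."""
    lines = []
    for module in state.get("modules", []):
        for name, res in sorted(module.get("resources", {}).items()):
            if res.get("type") == "aws_instance":
                group = name.split(".")[1]
                ip = res["primary"]["attributes"]["public_ip"]
                lines.append("[" + group + "]")
                lines.append(ip)
    return "\n".join(lines)

print(inventory_from_state(STATE))  # a [server] group with 52.51.215.84
```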

&lt;p&gt;Here are 2 projects with examples that you can use if you want to go this way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/adammck/terraform-inventory"&gt;https://github.com/adammck/terraform-inventory&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ terraform-inventory -inventory terraform.tfstate
[all]
52.51.215.84

[all:vars]

[server]
52.51.215.84

[server.0]
52.51.215.84

[type_aws_instance]
52.51.215.84

[name_c10k server]
52.51.215.84

[%_1]
52.51.215.84
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/express42/terraform-ansible-example/blob/master/ansible/terraform.py"&gt;https://github.com/express42/terraform-ansible-example/blob/master/ansible/terraform.py&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ~/soft/terraform.py --root . --hostfile
## begin hosts generated by terraform.py ##
52.51.215.84 C10K Server
## end hosts generated by terraform.py ##
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Personally, though, I don’t see the point of this approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ansible plugin for Terraform that didn’t work for me
&lt;/h3&gt;

&lt;p&gt;Finally, there are a few projects that try to make a native-looking Ansible provisioner for Terraform, like the builtin Chef provisioner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/jonmorehouse/terraform-provisioner-ansible"&gt;https://github.com/jonmorehouse/terraform-provisioner-ansible&lt;/a&gt; – this was the first attempt to make such plugin but, unfortunately, it’s not currently maintained and moreover it’s not supported by the current Terraform plugin system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/radekg/terraform-provisioner-ansible"&gt;https://github.com/radekg/terraform-provisioner-ansible&lt;/a&gt; – this one is more recent and currently maintained. It enables this kind of provisioning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
provisioner "ansible" {
    plays {
        playbook = "./provision.yml"
        hosts = ["${self.public_ip}"]
    }
    become = "yes"
    local = "yes"
}
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, I wasn’t able to make it work, so I blew it off because the first 2 solutions cover all of my cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Terraform and Ansible is a powerful combo that I use for provisioning cloud infrastructure. For basic cloud instance setup, I invoke Ansible with &lt;code&gt;local-exec&lt;/code&gt;, and later I invoke Ansible separately with a dynamic inventory.&lt;/p&gt;

&lt;p&gt;You can find an example of how I do it at &lt;a href="https://github.com/dzeban/c10k/tree/master/infrastructure"&gt;c10k/infrastructure&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks! Until next time!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Instrumenting a Go service for Prometheus</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Sat, 03 Feb 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/instrumenting-a-go-service-for-prometheus-khp</link>
      <guid>https://dev.to/dzeban/instrumenting-a-go-service-for-prometheus-khp</guid>
<description>&lt;p&gt;I’m a big proponent of DevOps practices and have always been keen to operate the things I’ve developed. That’s why I’m really excited about DevOps, SRE, Observability, Service Discovery and other great things which, I believe, will transform our industry into true software &lt;strong&gt;engineering&lt;/strong&gt;. In this blog I’m trying (among other cool stuff I’m doing) to share examples of how you can help yourself or your grumpy Ops guys operate your service. &lt;a href="https://dev.to/blog/go-consul-service/"&gt;Last time&lt;/a&gt; we developed a typical web service, serving data from key-value storage, and added Consul integration for Service Discovery. This time we are going to instrument our code for monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why instrument?
&lt;/h2&gt;

&lt;p&gt;At first, you may wonder why we should instrument our code at all – why not collect the metrics needed for monitoring from the outside, e.g. by installing a Zabbix agent or setting up Nagios checks? There is nothing really wrong with that solution, where you treat monitoring targets as black boxes. But there is another way – white-box monitoring – where your services provide metrics themselves as a result of instrumentation. It’s not about choosing only one way of doing things – these approaches may, and should, supplement each other. For example, you may treat your database servers as a black box providing metrics such as available memory, while instrumenting your database access layer to measure DB request latency.&lt;/p&gt;

&lt;p&gt;It’s all about different points of view and it was discussed &lt;a href="https://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html" rel="noopener noreferrer"&gt;in Google’s SRE book&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The simplest way to think about black-box monitoring versus white-box monitoring is that black-box monitoring is symptom-oriented and represents active—not predicted—problems: “The system isn’t working correctly, right now.” White-box monitoring depends on the ability to inspect the innards of the system, such as logs or HTTP endpoints, with instrumentation. White-box monitoring, therefore, allows detection of imminent problems, failures masked by retries, and so forth. … When collecting telemetry for debugging, white-box monitoring is essential. If web servers seem slow on database-heavy requests, you need to know both how fast the web server perceives the database to be, and how fast the database believes itself to be. Otherwise, you can’t distinguish an actually slow database server from a network problem between your web server and your database.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My point is that to gain a real observability of your system you should supplement your existing black-box monitoring with a white-box by instrumenting your services.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to instrument
&lt;/h2&gt;

&lt;p&gt;Now that we’re convinced instrumenting is a good thing, let’s think about what to monitor. A lot of people say that you should instrument everything you can, but I think that’s over-engineering – you should instrument only the things that really matter, to avoid codebase complexity and unnecessary CPU cycles spent collecting a bloat of metrics.&lt;/p&gt;

&lt;p&gt;So what are those &lt;em&gt;things that really matter&lt;/em&gt; that we should instrument for? Well, the same SRE book defines the so-called &lt;strong&gt;four golden signals&lt;/strong&gt; of monitoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic or Request Rate&lt;/li&gt;
&lt;li&gt;Errors&lt;/li&gt;
&lt;li&gt;Latency or Duration of the requests&lt;/li&gt;
&lt;li&gt;Saturation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Out of these 4 signals, saturation is the most confusing because it’s not clear how to measure it, or whether that’s even possible in a software system. Saturation mostly applies to hardware resources, which I’m not going to cover here – check &lt;a href="http://www.brendangregg.com/usemethod.html" rel="noopener noreferrer"&gt;Brendan Gregg’s USE method&lt;/a&gt; for that.&lt;/p&gt;

&lt;p&gt;Because saturation is hard to measure in a software system, there is a service-tailored version of the 4 golden signals called &lt;a href="https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/" rel="noopener noreferrer"&gt;“the RED method”&lt;/a&gt;, which lists 3 metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;R&lt;/strong&gt;equest rate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E&lt;/strong&gt;rrors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;D&lt;/strong&gt;uration (latency) distribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s what we’ll instrument for in the &lt;code&gt;webkv&lt;/code&gt; service.&lt;/p&gt;

&lt;p&gt;We will use Prometheus to monitor our service because it’s the go-to tool for monitoring these days – it’s simple, easy to set up and fast. We will need the &lt;a href="https://godoc.org/github.com/prometheus/client_golang/prometheus" rel="noopener noreferrer"&gt;Prometheus Go client library&lt;/a&gt; for instrumenting our code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Instrumenting HTTP handlers
&lt;/h2&gt;

&lt;p&gt;Prometheus works by pulling data from a &lt;code&gt;/metrics&lt;/code&gt; HTTP handler that serves metrics in a simple text-based exposition format, so we need to calculate the RED metrics and export them via a dedicated endpoint.&lt;/p&gt;

&lt;p&gt;Luckily, all of these metrics can be easily exported with an &lt;code&gt;InstrumentHandler&lt;/code&gt; helper.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gh"&gt;diff --git a/webkv.go b/webkv.go
index 94bd025..f43534f 100644
&lt;/span&gt;&lt;span class="gd"&gt;--- a/webkv.go
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/webkv.go
&lt;/span&gt;&lt;span class="p"&gt;@@ -9,6 +9,7 @@&lt;/span&gt; import (
        "strings"
        "time"
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="gi"&gt;+       "github.com/prometheus/client_golang/prometheus"
&lt;/span&gt;        "github.com/prometheus/client_golang/prometheus/promhttp"
&lt;span class="err"&gt;
&lt;/span&gt;        "github.com/alexdzyoba/webkv/service"
&lt;span class="p"&gt;@@ -32,7 +33,7 @@&lt;/span&gt; func main() {
        if err != nil {
                log.Fatal(err)
        }
&lt;span class="gd"&gt;-       http.Handle("/", s)
&lt;/span&gt;&lt;span class="gi"&gt;+       http.Handle("/", prometheus.InstrumentHandler("webkv", s))
&lt;/span&gt;        http.Handle("/metrics", promhttp.Handler())
&lt;span class="err"&gt;
&lt;/span&gt;        l := fmt.Sprintf(":%d", *port)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And to export the metrics via the &lt;code&gt;/metrics&lt;/code&gt; endpoint, just add another 2 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gh"&gt;diff --git a/webkv.go b/webkv.go
index 1b2a9d7..94bd025 100644
&lt;/span&gt;&lt;span class="gd"&gt;--- a/webkv.go
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/webkv.go
&lt;/span&gt;&lt;span class="p"&gt;@@ -9,6 +9,8 @@&lt;/span&gt; import (
        "strings"
        "time"
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="gi"&gt;+       "github.com/prometheus/client_golang/prometheus/promhttp"
+
&lt;/span&gt;        "github.com/alexdzyoba/webkv/service"
 )
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;@@ -31,6 +33,7 @@&lt;/span&gt; func main() {
                log.Fatal(err)
        }
        http.Handle("/", s)
&lt;span class="gi"&gt;+       http.Handle("/metrics", promhttp.Handler())
&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;        l := fmt.Sprintf(":%d", *port)
        log.Print("Listening on ", l)
&lt;span class="err"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that’s it!&lt;/p&gt;

&lt;p&gt;No, seriously, that’s all you need to do to make your service observable. It’s so nice and easy that you don’t have excuses for not doing it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;InstrumentHandler&lt;/code&gt; conveniently wraps your handler and exports the following metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;http_request_duration_microseconds&lt;/code&gt; summary with 50, 90 and 99 percentiles&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;http_request_size_bytes&lt;/code&gt; summary with 50, 90 and 99 percentiles&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;http_response_size_bytes&lt;/code&gt; summary with 50, 90 and 99 percentiles&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;http_requests_total&lt;/code&gt; counter labeled by status code and handler&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;promhttp.Handler&lt;/code&gt; also exports Go runtime information like the number of goroutines and memory stats.&lt;/p&gt;

&lt;p&gt;The point is that you export simple metrics that are cheap to calculate in the service, and everything else is done by Prometheus and its powerful query language, PromQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scraping metrics with Prometheus
&lt;/h2&gt;

&lt;p&gt;Now you need to tell Prometheus about your services so it will start scraping them. We could’ve hardcoded our endpoint with &lt;a href="https://prometheus.io/docs/prometheus/latest/configuration/configuration/#&amp;lt;static_config" rel="noopener noreferrer"&gt;&lt;code&gt;static_configs&lt;/code&gt;&lt;/a&gt; pointing to &lt;code&gt;localhost:8080&lt;/code&gt;. But remember how &lt;a href="https://dev.to/blog/go-consul-service/"&gt;we previously registered our service in Consul&lt;/a&gt;? Prometheus can discover scrape targets from Consul – for our service and any other services – with a single job definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;job_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;consul'&lt;/span&gt;
  &lt;span class="na"&gt;consul_sd_configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;localhost:8500'&lt;/span&gt;
  &lt;span class="na"&gt;relabel_configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;source_labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;__meta_consul_service&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;target_label&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;job&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the pure awesomeness of Service Discovery! Your ops buddy will thank you for that :-)&lt;/p&gt;

&lt;p&gt;(&lt;code&gt;relabel_configs&lt;/code&gt; is needed because otherwise all services would be scraped as &lt;code&gt;consul&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;Check that Prometheus recognized new targets:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fprometheus-consul-discovery.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fprometheus-consul-discovery.png" alt="Consul services in Prometheus"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yay!&lt;/p&gt;

&lt;h2&gt;
  
  
  The RED method metrics
&lt;/h2&gt;

&lt;p&gt;Now let’s calculate the metrics for the RED method. The first one is the request rate, which can be calculated from the &lt;code&gt;http_requests_total&lt;/code&gt; metric like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rate(http_requests_total{job="webkv",code=~"^2.*"}[1m])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We filter the HTTP request counter for the &lt;code&gt;webkv&lt;/code&gt; job and successful HTTP status codes, take the vector of values for the last 1 minute and then apply &lt;code&gt;rate&lt;/code&gt;, which gives the per-second rate of increase averaged over that window. This tells us how many requests per second were successfully handled over the last minute. And because the counter is accumulating, we won’t lose values even if some scrape fails.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fwebkv-request-rate.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fwebkv-request-rate.png" alt="Request rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second one is errors. We can calculate an error rate from the same metric, but what we actually want is the percentage of errors. This is how I calculate it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sum(rate(http_requests_total{job=“webkv”,code!~“^2.*“}[1m])) 
/ sum(rate(http_requests_total{job=“webkv”}[1m])) 
* 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this error query, we take the rate of error requests, that is, the ones with a non-2xx status code. This gives us a separate series for each status code, like 404 or 500, so we need to &lt;code&gt;sum&lt;/code&gt; them. Next, we do the same &lt;code&gt;sum&lt;/code&gt; and &lt;code&gt;rate&lt;/code&gt; for all of the requests regardless of status to get the overall request rate. Finally, we divide the two and multiply by 100 to get a percentage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fwebkv-errors.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fwebkv-errors.png" alt="Errors"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, the latency distribution is available directly in the &lt;code&gt;http_request_duration_microseconds&lt;/code&gt; metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http_request_duration_microseconds{job="webkv"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fwebkv-latency.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fwebkv-latency.png" alt="Latency"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That was easy, and it’s more than enough for my simple service.&lt;/p&gt;

&lt;p&gt;If you want to instrument for some custom metrics, you can do it just as easily. I’ll show you how to do the same for the Redis requests made from the &lt;code&gt;webkv&lt;/code&gt; handler. It’s not of much use in practice because there is a dedicated &lt;a href="https://github.com/oliver006/redis_exporter" rel="noopener noreferrer"&gt;Redis exporter&lt;/a&gt; for Prometheus, but it works well as an illustration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Instrumenting for the custom metrics (Redis requests)
&lt;/h2&gt;

&lt;p&gt;As you can see from the previous sections, all we need for meaningful monitoring is just 2 metrics – a plain counter for HTTP requests, partitioned by status code, and a &lt;a href="https://prometheus.io/docs/concepts/metric_types/" rel="noopener noreferrer"&gt;summary&lt;/a&gt; for request durations.&lt;/p&gt;

&lt;p&gt;Let’s start with the counter. First, to make things nice, we define a new type &lt;code&gt;Metrics&lt;/code&gt; with Prometheus &lt;code&gt;CounterVec&lt;/code&gt; and add it to the &lt;code&gt;Service&lt;/code&gt; struct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;--- a/service/service.go
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/service/service.go
&lt;/span&gt;&lt;span class="p"&gt;@@ -13,6 +14,7 @@&lt;/span&gt; type Service struct {
        Port        int
        RedisClient redis.UniversalClient
        ConsulAgent *consul.Agent
&lt;span class="gi"&gt;+       Metrics     Metrics
&lt;/span&gt; }
&lt;span class="gi"&gt;+
+type Metrics struct {
+       RedisRequests *prometheus.CounterVec
+}
+
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we must register our metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;--- a/service/service.go
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/service/service.go
&lt;/span&gt;&lt;span class="p"&gt;@@ -28,6 +30,15 @@&lt;/span&gt; func New(addrs []string, ttl time.Duration, port int) (*Service, error) {
                Addrs: addrs,
        })
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="gi"&gt;+       s.Metrics.RedisRequests = prometheus.NewCounterVec(
+               prometheus.CounterOpts{
+                       Name: "redis_requests_total",
+                       Help: "How many Redis requests processed, partitioned by status",
+               },
+               []string{"status"},
+       )
+       prometheus.MustRegister(s.Metrics.RedisRequests)
+
&lt;/span&gt;        ok, err := s.Check()
        if !ok {
                return nil, err
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have created a variable of the &lt;code&gt;CounterVec&lt;/code&gt; type because a plain &lt;code&gt;Counter&lt;/code&gt; represents a single time series, while our status label makes it a vector of time series.&lt;/p&gt;

&lt;p&gt;Finally, we need to increment the counter depending on the status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;--- a/service/redis.go
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/service/redis.go
&lt;/span&gt;&lt;span class="p"&gt;@@ -15,7 +15,9 @@&lt;/span&gt; func (s *Service) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        if err != nil {
                http.Error(w, "Key not found", http.StatusNotFound)
                status = 404
&lt;span class="gi"&gt;+               s.Metrics.RedisRequests.WithLabelValues("fail").Inc()
&lt;/span&gt;        }
&lt;span class="gi"&gt;+       s.Metrics.RedisRequests.WithLabelValues("success").Inc()
&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;        fmt.Fprint(w, val)
        log.Printf("url=\"%s\" remote=\"%s\" key=\"%s\" status=%d\n",
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check that it’s working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ curl -s 'localhost:8080/metrics' | grep redis
# HELP redis_requests_total How many Redis requests processed, partitioned by status
# TYPE redis_requests_total counter
redis_requests_total{status="fail"} 904
redis_requests_total{status="success"} 5433
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nice!&lt;/p&gt;

&lt;p&gt;Calculating the latency distribution is a little bit more involved because we have to time our requests and put the measurements into distribution buckets. Fortunately, there is a very nice &lt;code&gt;prometheus.Timer&lt;/code&gt; helper for measuring time. As for the distribution buckets, Prometheus has a &lt;code&gt;Summary&lt;/code&gt; type that handles them automatically.&lt;/p&gt;

&lt;p&gt;Ok, so first we have to register our new metric (adding it to our &lt;code&gt;Metrics&lt;/code&gt; type):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;--- a/service/service.go
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/service/service.go
&lt;/span&gt;&lt;span class="p"&gt;@@ -18,7 +18,8 @@&lt;/span&gt; type Service struct {
 }
&lt;span class="err"&gt;
&lt;/span&gt; type Metrics struct {
        RedisRequests  *prometheus.CounterVec
&lt;span class="gi"&gt;+       RedisDurations prometheus.Summary
&lt;/span&gt; }
&lt;span class="err"&gt;
&lt;/span&gt; func New(addrs []string, ttl time.Duration, port int) (*Service, error) {
&lt;span class="p"&gt;@@ -39,6 +40,14 @@&lt;/span&gt; func New(addrs []string, ttl time.Duration, port int) (*Service, error) {
        )
        prometheus.MustRegister(s.Metrics.RedisRequests)
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="gi"&gt;+       s.Metrics.RedisDurations = prometheus.NewSummary(
+               prometheus.SummaryOpts{
+                       Name:       "redis_request_durations",
+                       Help:       "Redis requests latencies in seconds",
+                       Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
+               })
+       prometheus.MustRegister(s.Metrics.RedisDurations)
+
&lt;/span&gt;        ok, err := s.Check()
        if !ok {
                return nil, err
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our new metric is just a &lt;code&gt;Summary&lt;/code&gt;, not a &lt;code&gt;SummaryVec&lt;/code&gt;, because we have no labels. We defined 3 “objectives” – the quantiles to track for the distribution (the 50th, 90th and 99th percentiles), each paired with its allowed error.&lt;/p&gt;

&lt;p&gt;Here is how we measure request latency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;--- a/service/redis.go
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/service/redis.go
&lt;/span&gt;&lt;span class="p"&gt;@@ -5,12 +5,18 @@&lt;/span&gt; import (
        "log"
        "net/http"
        "strings"
&lt;span class="gi"&gt;+
+       "github.com/prometheus/client_golang/prometheus"
&lt;/span&gt; )
&lt;span class="err"&gt;
&lt;/span&gt; func (s *Service) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    status := 200
&lt;span class="err"&gt;
&lt;/span&gt;    key := strings.Trim(r.URL.Path, "/")
&lt;span class="gi"&gt;+
+   timer := prometheus.NewTimer(s.Metrics.RedisDurations)
+   defer timer.ObserveDuration()
+
&lt;/span&gt;    val, err := s.RedisClient.Get(key).Result()
    if err != nil {
            http.Error(w, "Key not found", http.StatusNotFound)
            status = 404
            s.Metrics.RedisRequests.WithLabelValues("fail").Inc()
        }
    s.Metrics.RedisRequests.WithLabelValues("success").Inc()
&lt;span class="err"&gt;
&lt;/span&gt;    fmt.Fprint(w, val)
    log.Printf("url=\"%s\" remote=\"%s\" key=\"%s\" status=%d\n",
        r.URL, r.RemoteAddr, key, status)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, it’s that easy. You just create a new timer and defer its observation so it fires on function exit. Although the measurement will also include the time spent on logging, I’m okay with that.&lt;/p&gt;

&lt;p&gt;By default, this timer measures time in seconds. To mimic &lt;code&gt;http_request_duration_microseconds&lt;/code&gt;, we can pass &lt;code&gt;NewTimer&lt;/code&gt; an implementation of the &lt;code&gt;Observer&lt;/code&gt; interface that does the conversion our way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;--- a/service/redis.go
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/service/redis.go
&lt;/span&gt;&lt;span class="p"&gt;@@ -14,7 +14,10 @@&lt;/span&gt; func (s *Service) ServeHTTP(w http.ResponseWriter, r *http.Request) {
&lt;span class="err"&gt;
&lt;/span&gt;        key := strings.Trim(r.URL.Path, "/")
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="gd"&gt;-       timer := prometheus.NewTimer(s.Metrics.RedisDurations)
&lt;/span&gt;&lt;span class="gi"&gt;+       timer := prometheus.NewTimer(prometheus.ObserverFunc(func(v float64) {
+               us := v * 1000000 // make microseconds
+               s.Metrics.RedisDurations.Observe(us)
+       }))
&lt;/span&gt;        defer timer.ObserveDuration()
&lt;span class="err"&gt;
&lt;/span&gt;        val, err := s.RedisClient.Get(key).Result()
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="gd"&gt;--- a/service/service.go
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/service/service.go
&lt;/span&gt;&lt;span class="p"&gt;@@ -43,7 +43,7 @@&lt;/span&gt; func New(addrs []string, ttl time.Duration, port int) (*Service, error) {
        s.Metrics.RedisDurations = prometheus.NewSummary(
                prometheus.SummaryOpts{
                        Name:       "redis_request_durations",
&lt;span class="gd"&gt;-                       Help:       "Redis requests latencies in seconds",
&lt;/span&gt;&lt;span class="gi"&gt;+                       Help:       "Redis requests latencies in microseconds",
&lt;/span&gt;                        Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
                })
        prometheus.MustRegister(s.Metrics.RedisDurations)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ curl -s 'localhost:8080/metrics' | grep -P '(redis.*durations)'
# HELP redis_request_durations Redis requests latencies in microseconds
# TYPE redis_request_durations summary
redis_request_durations{quantile="0.5"} 207.17399999999998
redis_request_durations{quantile="0.9"} 230.399
redis_request_durations{quantile="0.99"} 298.585
redis_request_durations_sum 3.290851703000006e+06
redis_request_durations_count 15728
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now that we have beautiful metrics, let’s make a dashboard for them!&lt;/p&gt;

&lt;h2&gt;
  
  
  Grafana dashboard
&lt;/h2&gt;

&lt;p&gt;It’s no secret that once you have Prometheus, you will eventually have Grafana showing dashboards for your metrics, because Grafana has built-in support for Prometheus as a data source.&lt;/p&gt;

&lt;p&gt;In my dashboard, I’ve just put our RED metrics and sprinkled some colors. Here is the final dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fwebkv-dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Falex.dzyoba.com%2Fimg%2Fwebkv-dashboard.png" alt="webkv dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that for the latency graph, I’ve created 3 series, one for each of the 0.5, 0.9 and 0.99 quantiles, and divided them by 1000 to get millisecond values.&lt;/p&gt;
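&lt;p&gt;For example, a single quantile series scaled to milliseconds looks like this in PromQL:&lt;/p&gt;

```
http_request_duration_microseconds{job="webkv", quantile="0.9"} / 1000
```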

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;There is no magic here: monitoring the four golden signals or the RED metrics is easy with modern tools like Prometheus and Grafana, and you really need it – without it you’re flying blind. So the next time you develop a service, just add some instrumentation – be nice and cultivate at least some operational sympathy, for great good.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Hitchhiker's guide to the Python imports</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Sat, 13 Jan 2018 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/hitchhikers-guide-to-the-python-imports-1215</link>
      <guid>https://dev.to/dzeban/hitchhikers-guide-to-the-python-imports-1215</guid>
      <description>&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt; : If you write Python on a daily basis you will find nothing new in this post. It’s for people who &lt;strong&gt;occasionally use Python&lt;/strong&gt; like Ops guys and forget/misuse its import system. Nonetheless, the code is written with Python 3.6 type annotations to entertain an experienced Python reader. As usual, if you find any mistakes, please let me know!&lt;/p&gt;

&lt;h2&gt;
  
  
  Modules
&lt;/h2&gt;

&lt;p&gt;Let’s start with a common Python stanza of&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;invoke_the_real_code&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A lot of people, and I’m no exception, write it as a ritual without trying to understand it. We vaguely know that this snippet makes a difference when you invoke your code from the CLI versus importing it. But let’s try to understand why we really need it.&lt;/p&gt;

&lt;p&gt;For illustration, assume that we’re writing some pizza shop software. It’s &lt;a href="https://github.com/dzeban/python-imports"&gt;on Github&lt;/a&gt;. Here is the &lt;code&gt;pizza.py&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pizza.py file
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;math&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Pizza&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;
    &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;area&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;awesomeness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'Carbonara'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;9000&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'pizza.py module name is %s'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Carbonara is the most awesome pizza.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’ve added printing of the magical &lt;code&gt;__name__&lt;/code&gt; variable to see how it may change.&lt;/p&gt;

&lt;p&gt;OK, first, let’s run it as a script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python3 pizza.py
pizza.py module name is __main__
Carbonara is the most awesome pizza.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Indeed, the &lt;code&gt;__name__&lt;/code&gt; global variable is set to &lt;code&gt;__main__&lt;/code&gt; when we invoke the file from the CLI.&lt;/p&gt;

&lt;p&gt;But what if we import it from another file? Here is the &lt;code&gt;menu.py&lt;/code&gt; source code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# menu.py file
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pizza&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pizza&lt;/span&gt;

&lt;span class="n"&gt;MENU&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Pizza&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;Pizza&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Margherita'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Pizza&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Carbonara'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;14.99&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Pizza&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Marinara'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;16.99&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MENU&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;menu.py&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python3 menu.py
pizza.py module name is pizza
[&amp;lt;pizza.Pizza object at 0x7fbbc1045470&amp;gt;, &amp;lt;pizza.Pizza object at 0x7fbbc10454e0&amp;gt;, &amp;lt;pizza.Pizza object at 0x7fbbc1045b38&amp;gt;]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now we see 2 things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The top-level &lt;code&gt;print&lt;/code&gt; statement from &lt;code&gt;pizza.py&lt;/code&gt; was executed on import&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;__name__&lt;/code&gt; in &lt;code&gt;pizza.py&lt;/code&gt; is now set to the filename without the &lt;code&gt;.py&lt;/code&gt; suffix.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So, the thing is, &lt;code&gt;__name__&lt;/code&gt; is a global variable that holds the name of the current Python module.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The module name is set by the interpreter in the &lt;code&gt;__name__&lt;/code&gt; variable&lt;/li&gt;
&lt;li&gt;When a module is invoked from the CLI, its name is set to &lt;code&gt;__main__&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what is a module, after all? It’s really simple – a module is a file containing Python code that you can execute with the interpreter (the &lt;code&gt;python&lt;/code&gt; program) or import from other modules.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python module is just a file with Python code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just like when it’s executed, when a module is imported its top-level statements are run. Be aware, though, that they run only once, even if you import the module several times, even from different files.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you import a module, it’s executed&lt;/li&gt;
&lt;/ul&gt;
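&lt;p&gt;Here is a minimal sketch of that import-once behavior, using a made-up throwaway module named &lt;code&gt;greeting&lt;/code&gt; written to a temp directory:&lt;/p&gt;

```python
# Sketch: a module's top-level code runs only on the first import;
# later imports reuse the cached module object from sys.modules.
# The "greeting" module name is made up for this demo.
import importlib
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "greeting.py"), "w") as f:
    f.write("LOAD_COUNT = globals().get('LOAD_COUNT', 0) + 1\n"
            "print('greeting module loaded')\n")
sys.path.insert(0, tmpdir)

first = importlib.import_module("greeting")   # top-level code runs, prints once
second = importlib.import_module("greeting")  # cache hit: nothing printed
print(first is second)  # True: the very same module object
```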

&lt;p&gt;Because modules are just plain files, there is a simple way to import them: take the filename, remove the &lt;code&gt;.py&lt;/code&gt; extension and put it in the &lt;code&gt;import&lt;/code&gt; statement.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To import a module you use the filename without the &lt;code&gt;.py&lt;/code&gt; extension&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What is interesting is that &lt;code&gt;__name__&lt;/code&gt; is set to the filename regardless of how you import it – with &lt;code&gt;import pizza as broccoli&lt;/code&gt;, &lt;code&gt;__name__&lt;/code&gt; will still be &lt;code&gt;pizza&lt;/code&gt;. So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When imported, the module name is set to the filename without the &lt;code&gt;.py&lt;/code&gt; extension, even if it’s renamed with &lt;code&gt;import module as othername&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
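&lt;p&gt;A quick way to see this without our pizza files is to alias a stdlib module – the alias only binds a new local name, while the module object keeps its real &lt;code&gt;__name__&lt;/code&gt;:&lt;/p&gt;

```python
# The alias is just a different variable pointing at the same module
# object; the module still knows its real name.
import json as broccoli

print(broccoli.__name__)  # json
```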

&lt;p&gt;But what if the module we want to import is not located in the same directory – how can we import it then? The answer lies in the module search path, which we’ll eventually discover while discussing packages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Packages
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A package is a &lt;strong&gt;namespace&lt;/strong&gt; for a collection of modules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The namespace part is important because by itself a package doesn’t provide any functionality – it only gives you a way to group a bunch of your modules.&lt;/p&gt;

&lt;p&gt;There are 2 cases where you really want to put modules into a package. The first is to isolate the definitions of one module from another. In our &lt;code&gt;pizza&lt;/code&gt; module, we have a &lt;code&gt;Pizza&lt;/code&gt; class that might conflict with other people’s Pizza packages (&lt;a href="https://pypi.org/search/?q=pizza"&gt;and we do have some pizza packages on PyPI&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The second case is if you want to distribute your code because&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A package is the minimal unit of code &lt;em&gt;distribution&lt;/em&gt; in Python&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything that you see on PyPI and install via &lt;code&gt;pip&lt;/code&gt; is a package, so in order to share your awesome stuff, you have to make a package out of it.&lt;/p&gt;

&lt;p&gt;Alright, assume we’re convinced and want to convert our 2 modules into a nice package. To do this we need to create a directory with an empty &lt;code&gt;__init__.py&lt;/code&gt; file and move our files into it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pizzapy/
├── __init__.py
├── menu.py
└── pizza.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that’s it – now you have a &lt;code&gt;pizzapy&lt;/code&gt; package!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To make a package, create a directory with an &lt;code&gt;__init__.py&lt;/code&gt; file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remember that a package is a &lt;strong&gt;namespace&lt;/strong&gt; for modules, so you don’t import the package itself, you import a module from a package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import pizzapy.menu
pizza.py module name is pizza
&amp;gt;&amp;gt;&amp;gt; pizzapy.menu.MENU
[&amp;lt;pizza.Pizza object at 0x7fa065291160&amp;gt;, &amp;lt;pizza.Pizza object at 0x7fa065291198&amp;gt;, &amp;lt;pizza.Pizza object at 0x7fa065291a20&amp;gt;]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you do the import that way, it may seem too verbose because you need to use the fully qualified name. I guess that’s intentional behavior, because one of the Zen of Python items is “explicit is better than implicit”.&lt;/p&gt;

&lt;p&gt;Anyway, you can always use a &lt;code&gt;from package import module&lt;/code&gt; form to shorten names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from pizzapy import menu
pizza.py module name is pizza
&amp;gt;&amp;gt;&amp;gt; menu.MENU
[&amp;lt;pizza.Pizza object at 0x7fa065291160&amp;gt;, &amp;lt;pizza.Pizza object at 0x7fa065291198&amp;gt;, &amp;lt;pizza.Pizza object at 0x7fa065291a20&amp;gt;]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Package init
&lt;/h3&gt;

&lt;p&gt;Remember how we put an &lt;code&gt;__init__.py&lt;/code&gt; file in a directory and it magically became a package? That’s a great example of convention over configuration – we don’t need to describe any configuration or register anything. Any directory with an &lt;code&gt;__init__.py&lt;/code&gt; file is, by convention, a Python package.&lt;/p&gt;

&lt;p&gt;Besides making a package, &lt;code&gt;__init__.py&lt;/code&gt; serves one more purpose – package initialization. That’s why it’s called init, after all! Initialization is triggered on package import; in other words, importing a package invokes &lt;code&gt;__init__.py&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you import a package, the &lt;code&gt;__init__.py&lt;/code&gt; module of the package is executed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the &lt;code&gt;__init__&lt;/code&gt; module you can do anything you want, but most commonly it’s used for some package initialization or setting the special &lt;code&gt;__all__&lt;/code&gt; variable. The latter controls star import – &lt;code&gt;from package import *&lt;/code&gt;.&lt;/p&gt;
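&lt;p&gt;Here is a small sketch of how &lt;code&gt;__all__&lt;/code&gt; limits the star import, using a made-up throwaway &lt;code&gt;toppings&lt;/code&gt; module written to a temp directory:&lt;/p&gt;

```python
# Sketch: __all__ is the explicit export list for "from module import *".
# The "toppings" module and its names are made up for this demo.
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "toppings.py"), "w") as f:
    f.write(
        "__all__ = ['basil']\n"   # only 'basil' is exported on star import
        "basil = 'green'\n"
        "anchovy = 'salty'\n"     # public-looking, but not exported
    )
sys.path.insert(0, tmpdir)

# Star imports are only allowed at module level, so exercise one via exec
# with a fresh module-level namespace.
ns = {}
exec("from toppings import *", ns)
print("basil" in ns, "anchovy" in ns)  # True False
```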

&lt;p&gt;And because Python is awesome we can do pretty much anything in the &lt;code&gt;__init__&lt;/code&gt; module, even really strange things. Suppose we don’t like the explicitness of imports and want to drag all of the modules’ symbols up to the package level, so we don’t have to remember the actual module names.&lt;/p&gt;

&lt;p&gt;To do that we can import everything from the &lt;code&gt;menu&lt;/code&gt; and &lt;code&gt;pizza&lt;/code&gt; modules in &lt;code&gt;__init__.py&lt;/code&gt; like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pizzapy/__init__.py

from pizzapy.pizza import *
from pizzapy.menu import *
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import pizzapy
pizza.py module name is pizzapy.pizza
pizza.py module name is pizza
&amp;gt;&amp;gt;&amp;gt; pizzapy.MENU
[&amp;lt;pizza.Pizza object at 0x7f1bf03b8828&amp;gt;, &amp;lt;pizza.Pizza object at 0x7f1bf03b8860&amp;gt;, &amp;lt;pizza.Pizza object at 0x7f1bf03b8908&amp;gt;]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No more &lt;code&gt;pizzapy.menu.MENU&lt;/code&gt; or &lt;code&gt;menu.MENU&lt;/code&gt; :-) That way it kinda works like packages in Go. Note, though, that this is discouraged because you’re abusing Python, and if you check in such code you’re going to have a bad time at code review. I’m showing you this just as an illustration, don’t blame me!&lt;/p&gt;

&lt;p&gt;You could rewrite the import more succinctly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pizzapy/__init__.py

from .pizza import *
from .menu import *
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is just another syntax for doing the same thing, called a relative import. Let’s look at it closer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Absolute and relative imports
&lt;/h3&gt;

&lt;p&gt;The 2 code pieces above are the only ways of doing a so-called relative import, because since Python 3 all imports are absolute by default (as in &lt;a href="https://docs.python.org/2.5/whatsnew/pep-328.html"&gt;PEP 328&lt;/a&gt;), meaning that a bare &lt;code&gt;import&lt;/code&gt; looks up modules in the search path and not in the current package. This avoids shadowing of standard modules: otherwise, creating your own &lt;code&gt;sys.py&lt;/code&gt; module and doing &lt;code&gt;import sys&lt;/code&gt; could override the standard library &lt;code&gt;sys&lt;/code&gt; module.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Since Python 3 all imports are absolute by default – &lt;code&gt;import&lt;/code&gt; will look for a system package first&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But if your package has a module called &lt;code&gt;sys&lt;/code&gt; and you want to import it into another module of the same package, you have to make a &lt;strong&gt;relative import&lt;/strong&gt;. To do it you have to be explicit again and write &lt;code&gt;from package.module import somesymbol&lt;/code&gt; or &lt;code&gt;from .module import somesymbol&lt;/code&gt;. That funny single dot before the module name is read as “current package”.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To make a relative import, prefix the module with the package name or a dot&lt;/li&gt;
&lt;/ul&gt;
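&lt;p&gt;The shadowing case above can be sketched with a made-up package (here called &lt;code&gt;shadowpkg&lt;/code&gt;) that ships its own &lt;code&gt;sys&lt;/code&gt; module, built in a temp directory:&lt;/p&gt;

```python
# Sketch: inside a package, a bare "import sys" still means the stdlib,
# while "from .sys import ..." reaches the package-local sys module.
# The "shadowpkg" package name and MARKER symbol are made up for the demo.
import importlib
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
pkg = os.path.join(tmpdir, "shadowpkg")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "sys.py"), "w") as f:
    f.write("MARKER = 'package-local sys'\n")
with open(os.path.join(pkg, "user.py"), "w") as f:
    f.write(
        "import sys                  # absolute: the stdlib sys\n"
        "from .sys import MARKER     # relative: our own sys module\n"
        "IS_STDLIB = hasattr(sys, 'path')\n"
    )
sys.path.insert(0, tmpdir)

user = importlib.import_module("shadowpkg.user")
print(user.MARKER, user.IS_STDLIB)  # package-local sys True
```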
&lt;h3&gt;
  
  
  Executable package
&lt;/h3&gt;

&lt;p&gt;In Python you can invoke a module with the &lt;code&gt;python3 -m &amp;lt;module&amp;gt;&lt;/code&gt; construction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python3 -m pizza
pizza.py module name is __main__
Carbonara is the most awesome pizza.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But packages can also be invoked this way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python3 -m pizzapy
/usr/bin/python3: No module named pizzapy.__main__; 'pizzapy' is a package and cannot be directly executed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, it needs a &lt;code&gt;__main__&lt;/code&gt; module, so let’s implement it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pizzapy/__main__.py

from pizzapy.menu import MENU

print('Awesomeness of pizzas:')
for pizza in MENU:
    print(pizza.name, pizza.awesomeness())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python3 -m pizzapy
pizza.py module name is pizza
Awesomeness of pizzas:
Margherita 300
Carbonara 9000
Marinara 200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Adding &lt;code&gt;__main__.py&lt;/code&gt; makes a package executable (invoke it with &lt;code&gt;python3 -m package&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Import sibling packages
&lt;/h3&gt;

&lt;p&gt;And the last thing I want to cover is the import of sibling packages. Suppose we have a sibling package &lt;code&gt;pizzashop&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
├── pizzapy
│   ├── __init__.py
│   ├── __main__.py
│   ├── menu.py
│   └── pizza.py
└── pizzashop
    ├── __init__.py
    └── shop.py

# pizzashop/shop.py
import pizzapy.menu
print(pizzapy.menu.MENU)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, sitting in the top level directory, if we try to invoke shop.py like this&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python3 pizzashop/shop.py
Traceback (most recent call last):
  File "pizzashop/shop.py", line 1, in &amp;lt;module&amp;gt;
    import pizzapy.menu
ModuleNotFoundError: No module named 'pizzapy'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we get an error saying that our &lt;code&gt;pizzapy&lt;/code&gt; package is not found. But if we invoke it as a module of the &lt;code&gt;pizzashop&lt;/code&gt; package&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python3 -m pizzashop.shop
pizza.py module name is pizza
[&amp;lt;pizza.Pizza object at 0x7f372b59ccc0&amp;gt;, &amp;lt;pizza.Pizza object at 0x7f372b59ccf8&amp;gt;, &amp;lt;pizza.Pizza object at 0x7f372b59cda0&amp;gt;]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;it suddenly works. What the hell is going on here?&lt;/p&gt;

&lt;p&gt;The explanation for this lies in the Python module search path and it’s greatly described in &lt;a href="https://docs.python.org/3/tutorial/modules.html#the-module-search-path"&gt;the documentation on modules&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The module search path is a list of directories (available at runtime as &lt;code&gt;sys.path&lt;/code&gt;) that the interpreter uses to locate modules. It is initialized with the path to the Python standard modules (&lt;code&gt;/usr/lib64/python3.6&lt;/code&gt;), the &lt;code&gt;site-packages&lt;/code&gt; directories where &lt;code&gt;pip&lt;/code&gt; puts everything you install globally, and also a directory that depends on how you run the module. If you run a module as a file, like &lt;code&gt;python3 pizzashop/shop.py&lt;/code&gt;, the path to the containing directory (&lt;code&gt;pizzashop&lt;/code&gt;) is added to &lt;code&gt;sys.path&lt;/code&gt;. Otherwise, including when running with the &lt;code&gt;-m&lt;/code&gt; option, the current directory (as in &lt;code&gt;pwd&lt;/code&gt;) is added to the module search path. We can check it by printing &lt;code&gt;sys.path&lt;/code&gt; in &lt;code&gt;pizzashop/shop.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ pwd
/home/avd/dev/python-imports

$ tree
.
├── pizzapy
│   ├── __init__.py
│   ├── __main__.py
│   ├── menu.py
│   └── pizza.py
└── pizzashop
    ├── __init__.py
    └── shop.py

$ python3 pizzashop/shop.py
['/home/avd/dev/python-imports/pizzashop',
 '/usr/lib64/python36.zip',
 '/usr/lib64/python3.6',
 '/usr/lib64/python3.6/lib-dynload',
 '/usr/local/lib64/python3.6/site-packages',
 '/usr/local/lib/python3.6/site-packages',
 '/usr/lib64/python3.6/site-packages',
 '/usr/lib/python3.6/site-packages']
Traceback (most recent call last):
  File "pizzashop/shop.py", line 5, in &amp;lt;module&amp;gt;
    import pizzapy.menu
ModuleNotFoundError: No module named 'pizzapy'

$ python3 -m pizzashop.shop
['',
 '/usr/lib64/python36.zip',
 '/usr/lib64/python3.6',
 '/usr/lib64/python3.6/lib-dynload',
 '/usr/local/lib64/python3.6/site-packages',
 '/usr/local/lib/python3.6/site-packages',
 '/usr/lib64/python3.6/site-packages',
 '/usr/lib/python3.6/site-packages']
pizza.py module name is pizza
[&amp;lt;pizza.Pizza object at 0x7f2f75747f28&amp;gt;, &amp;lt;pizza.Pizza object at 0x7f2f75747f60&amp;gt;, &amp;lt;pizza.Pizza object at 0x7f2f75747fd0&amp;gt;]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, in the first case we have the &lt;code&gt;pizzashop&lt;/code&gt; dir in our path, so we cannot find the sibling &lt;code&gt;pizzapy&lt;/code&gt; package, while in the second case the current dir (denoted as &lt;code&gt;''&lt;/code&gt;) is in &lt;code&gt;sys.path&lt;/code&gt; and it contains both packages.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python has module search path available at runtime as &lt;code&gt;sys.path&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;If you run a module as a script file, the containing directory is added to &lt;code&gt;sys.path&lt;/code&gt;; otherwise, the current directory is added to it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This problem of importing a sibling package often arises when people put a bunch of test or example scripts in a directory or package next to the main package. Here are a couple of StackOverflow questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/q/6323860"&gt;https://stackoverflow.com/q/6323860&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/q/6670275"&gt;https://stackoverflow.com/q/6670275&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The good solution is to avoid the problem – put tests or examples in the package itself and use relative imports. The dirty solution is to modify &lt;code&gt;sys.path&lt;/code&gt; at runtime (yay, dynamic!) by adding the parent directory of the needed package. People actually do this even though it’s an awful hack.&lt;/p&gt;
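&lt;p&gt;For illustration only, here is a sketch of that hack with throwaway sibling packages in a temp directory (the package names are made up; in real code people usually derive the inserted path from &lt;code&gt;__file__&lt;/code&gt;):&lt;/p&gt;

```python
# Sketch of the sys.path hack: a sibling package becomes importable only
# after its parent directory is pushed onto the search path at runtime.
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
for name in ("pizzapy_demo", "shop_demo"):   # made-up sibling packages
    os.makedirs(os.path.join(root, name))
    open(os.path.join(root, name, "__init__.py"), "w").close()

# Without the parent directory on sys.path, the sibling is invisible...
try:
    importlib.import_module("pizzapy_demo")
except ModuleNotFoundError:
    print("not found")

sys.path.insert(0, root)   # ...the hack: add the parent dir at runtime
print(importlib.import_module("pizzapy_demo").__name__)  # pizzapy_demo
```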

&lt;h2&gt;
  
  
  The End!
&lt;/h2&gt;

&lt;p&gt;I hope that after reading this post you’ll have a better understanding of Python imports and can finally decompose that giant script you have in your toolbox without fear. In the end, everything in Python is really simple, and even when it’s not sufficient for your case, you can always monkey patch anything at runtime.&lt;/p&gt;

&lt;p&gt;And on that note, I would like to stop and thank you for your attention. Until next time!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Write your own diff for fun</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Wed, 27 Dec 2017 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/write-your-own-diff-for-fun-3ckl</link>
      <guid>https://dev.to/dzeban/write-your-own-diff-for-fun-3ckl</guid>
      <description>&lt;p&gt;On the other day, when I was looking at &lt;code&gt;git diff&lt;/code&gt;, I thought “How does it work?“. Brute-force idea of comparing all possible pairs of lines doesn’t seem efficient and indeed it has exponential algorithmic complexity. There must be a better way, right?&lt;/p&gt;

&lt;p&gt;As it turned out, &lt;code&gt;git diff&lt;/code&gt;, like the usual &lt;code&gt;diff&lt;/code&gt; tool, is modeled as a solution to a problem called Longest Common Subsequence (LCS). The idea is really ingenious – when we diff 2 files, we view them as 2 sequences of lines and find their longest common subsequence. Then anything that is not in that subsequence is our diff. Sounds neat, but how can one implement it efficiently (without that exponential complexity)?&lt;/p&gt;

&lt;p&gt;The LCS problem is a classic problem that is best solved with dynamic programming – a somewhat advanced technique in algorithm design that roughly means iteration with memoization.&lt;/p&gt;

&lt;p&gt;I’ve always struggled with dynamic programming because it’s mostly presented through some (in my opinion) artificial problems that are hard for me to work on. But now, when I see something so useful that can help me write a diff, I just can’t resist.&lt;/p&gt;

&lt;p&gt;I used a &lt;a href="https://en.wikipedia.org/wiki/Longest_common_subsequence_problem"&gt;Wikipedia article on LCS&lt;/a&gt; as my guide, so if you want to check the algorithm nitty-gritty, go ahead to the link. I’m going to show you my implementation (that is, of course, &lt;a href="https://github.com/alexdzyoba/diff"&gt;available on GitHub&lt;/a&gt;) to demonstrate how easily you can solve such a seemingly hard problem.&lt;/p&gt;

&lt;p&gt;I’ve chosen Python to implement it and immediately felt grateful because you can copy-paste pseudocode and use it with minimal changes. Here is the diff printing function from Wikipedia article in pseudocode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function printDiff(C[0..m,0..n], X[1..m], Y[1..n], i, j)
    if i &amp;gt; 0 and j &amp;gt; 0 and X[i] = Y[j]
        printDiff(C, X, Y, i-1, j-1)
        print " " + X[i]
    else if j &amp;gt; 0 and (i = 0 or C[i,j-1] ≥ C[i-1,j])
        printDiff(C, X, Y, i, j-1)
        print "+ " + Y[j]
    else if i &amp;gt; 0 and (j = 0 or C[i,j-1] &amp;lt; C[i-1,j])
        printDiff(C, X, Y, i-1, j)
        print "- " + X[i]
    else
        print ""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""Print the diff using LCS length matrix by backtracking it"""&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"+ "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"- "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not the actual function for my diff printing because it doesn’t handle a few corner cases – it’s just to illustrate Python awesomeness.&lt;/p&gt;

&lt;p&gt;The essence of diffing is building the matrix &lt;code&gt;C&lt;/code&gt; which contains lengths for all subsequences. Building it may seem daunting until you start looking at the simple cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LCS of “A” and “A” is “A”.&lt;/li&gt;
&lt;li&gt;LCS of “AA” and “AB” is “A”.&lt;/li&gt;
&lt;li&gt;LCS of “AAA” and “ABA” is “AA”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building it iteratively, we can define the LCS function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LCS of 2 empty sequences is the empty sequence.&lt;/li&gt;
&lt;li&gt;LCS of “${prefix1}A” and “${prefix2}A” is LCS(${prefix1}, ${prefix2}) + A&lt;/li&gt;
&lt;li&gt;LCS of “${prefix1}A” and “${prefix2}B” is the longest of LCS(${prefix1}A, ${prefix2}) and LCS(${prefix1}, ${prefix2}B)&lt;/li&gt;
&lt;/ul&gt;
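&lt;p&gt;These three rules translate almost directly into a naive memoized recursion – a sketch for intuition, not the matrix-based version the actual diff uses:&lt;/p&gt;

```python
from functools import lru_cache

def lcs(x, y):
    """Longest common subsequence, following the three rules above."""
    @lru_cache(maxsize=None)
    def go(i, j):
        if i == 0 or j == 0:           # LCS with an empty sequence is empty
            return ""
        if x[i - 1] == y[j - 1]:       # matching tails extend the LCS
            return go(i - 1, j - 1) + x[i - 1]
        a, b = go(i - 1, j), go(i, j - 1)
        return a if len(a) >= len(b) else b  # otherwise take the longer one
    return go(len(x), len(y))

print(lcs("AAA", "ABA"))  # AA
```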

&lt;p&gt;That’s basically the core of dynamic programming – building the solution iteratively, starting from the simple base cases. Note, though, that it works only when the problem has so-called “optimal substructure”, meaning that the solution can be built by reusing previous memoized steps.&lt;/p&gt;

&lt;p&gt;Here is the Python function that builds that length matrix for all subsequences:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lcslen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""Build a matrix of LCS length.

    This matrix will be used later to backtrack the real LCS.
    """&lt;/span&gt;

    &lt;span class="c1"&gt;# This is our matrix comprised of list of lists.
&lt;/span&gt;    &lt;span class="c1"&gt;# We allocate extra row and column with zeroes for the base case of empty
&lt;/span&gt;    &lt;span class="c1"&gt;# sequence. Extra row and column is appended to the end and exploit
&lt;/span&gt;    &lt;span class="c1"&gt;# Python's ability of negative indices: x[-1] is the last elem.
&lt;/span&gt;    &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;yj&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;xi&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;yj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
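&lt;p&gt;As a quick sanity check, here is a standalone sketch that runs lcslen on the classic textbook pair of strings (the function is repeated so the snippet runs on its own):&lt;/p&gt;

```python
def lcslen(x, y):
    """Build the LCS length matrix (same function as above)."""
    # The extra row and column of zeroes at the end handle the
    # empty-sequence base case via negative indexing: c[-1] is the last row.
    c = [[0 for _ in range(len(y) + 1)] for _ in range(len(x) + 1)]
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            if xi == yj:
                c[i][j] = 1 + c[i - 1][j - 1]
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c

# "ABCBDAB" vs "BDCABA" is the classic example: their LCS has length 4.
c = lcslen("ABCBDAB", "BDCABA")
print(c[-2][-2])  # 4 -- the bottom-right cell of the "real" matrix
```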



&lt;p&gt;Having the matrix of LCS lengths, we can now reconstruct the actual LCS by backtracking through it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;backtrack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""Backtrack the LCS length matrix to get the actual LCS"""&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;backtrack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;backtrack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;backtrack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
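&lt;p&gt;To see the backtracking in action, here is a standalone sketch (both functions are repeated so it runs on its own); for this textbook pair the tie-breaking above happens to produce "BCBA":&lt;/p&gt;

```python
def lcslen(x, y):
    """Build the LCS length matrix (same function as above)."""
    c = [[0 for _ in range(len(y) + 1)] for _ in range(len(x) + 1)]
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            if xi == yj:
                c[i][j] = 1 + c[i - 1][j - 1]
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c

def backtrack(c, x, y, i, j):
    """Backtrack the LCS length matrix to get the actual LCS."""
    if i == -1 or j == -1:
        return ""
    elif x[i] == y[j]:
        # Matching elements are part of the LCS; collect and step diagonally.
        return backtrack(c, x, y, i - 1, j - 1) + x[i]
    else:
        # Follow the direction of the larger subproblem.
        if c[i][j - 1] > c[i - 1][j]:
            return backtrack(c, x, y, i, j - 1)
        else:
            return backtrack(c, x, y, i - 1, j)

x, y = "ABCBDAB", "BDCABA"
print(backtrack(lcslen(x, y), x, y, len(x) - 1, len(y) - 1))  # BCBA
```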



&lt;p&gt;But for a diff we don’t need the actual LCS – we need the opposite, the lines that are not common. So diff printing is a slightly modified backtrack function with two additional cases for changes at the head of a sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""Print the diff using LCS length matrix by backtracking it"""&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"+ "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"- "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"+ "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"- "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
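&lt;p&gt;Here is a standalone sketch of print_diff on two tiny "files" of one-word lines. The functions are repeated so the snippet runs on its own; the out-of-range checks are spelled as equality with -1 (equivalent here, since the indices never drop below -1), and the final branch is a plain else:&lt;/p&gt;

```python
import contextlib
import io

def lcslen(x, y):
    """Build the LCS length matrix (same function as above)."""
    c = [[0 for _ in range(len(y) + 1)] for _ in range(len(x) + 1)]
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            if xi == yj:
                c[i][j] = 1 + c[i - 1][j - 1]
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c

def print_diff(c, x, y, i, j):
    """Print the diff using the LCS length matrix by backtracking it."""
    if i == -1 and j == -1:
        return
    elif i == -1:
        # x is exhausted: everything left in y was added.
        print_diff(c, x, y, i, j - 1)
        print("+ " + y[j])
    elif j == -1:
        # y is exhausted: everything left in x was removed.
        print_diff(c, x, y, i - 1, j)
        print("- " + x[i])
    elif x[i] == y[j]:
        print_diff(c, x, y, i - 1, j - 1)
        print("  " + x[i])
    elif c[i][j - 1] >= c[i - 1][j]:
        print_diff(c, x, y, i, j - 1)
        print("+ " + y[j])
    else:
        print_diff(c, x, y, i - 1, j)
        print("- " + x[i])

x = ["a", "b", "c"]
y = ["a", "c", "d"]
c = lcslen(x, y)
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print_diff(c, x, y, len(x) - 1, len(y) - 1)
diff_lines = buf.getvalue().splitlines()
print(diff_lines)  # ['  a', '- b', '  c', '+ d']
```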



&lt;p&gt;To invoke it, we read the input files into Python lists of strings and pass them to our diff function. We also add the usual Python boilerplate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lcslen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;print_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Usage: {} &amp;lt;file1&amp;gt; &amp;lt;file2&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;'r'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;'r'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;f2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And there you go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python3 diff.py f1 f2
+ """Simple diff based on LCS solution"""
+ 
+ import sys
  from lcs import lcslen

  def print_diff(c, x, y, i, j):
+ """Print the diff using LCS length matrix by backtracking it"""
+ 
       if i &amp;gt;= 0 and j &amp;gt;= 0 and x[i] == y[j]:
           print_diff(c, x, y, i-1, j-1)
           print(" " + x[i])
       elif j &amp;gt;= 0 and (i == 0 or c[i][j-1] &amp;gt;= c[i-1][j]):
           print_diff(c, x, y, i, j-1)
- print("+ " + y[j])
+ print("+ " + y[j])
       elif i &amp;gt;= 0 and (j == 0 or c[i][j-1] &amp;lt; c[i-1][j]):
           print_diff(c, x, y, i-1, j)
           print("- " + x[i])
       else:
- print("")
- 
+ print("") # pass?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can check out the full source code at &lt;a href="https://github.com/alexdzyoba/diff"&gt;https://github.com/alexdzyoba/diff&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That’s it. Until next time!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Go service with Consul integration</title>
      <dc:creator>Alex Dzyoba</dc:creator>
      <pubDate>Thu, 14 Dec 2017 00:00:00 +0000</pubDate>
      <link>https://dev.to/dzeban/go-service-with-consul-integration-36mn</link>
      <guid>https://dev.to/dzeban/go-service-with-consul-integration-36mn</guid>
<description>&lt;p&gt;In the world of stateless microservices, which are usually written in Go, we need a way to discover them. This is where Hashicorp’s Consul helps: services register within Consul so other services can discover them via simple DNS or HTTP queries.&lt;/p&gt;

&lt;p&gt;Go has a Consul client library but, alas, I haven’t seen any real examples of how to integrate it into your services. So here I’m going to show you how to do exactly that.&lt;/p&gt;

&lt;p&gt;I’m going to write a service that will serve at some HTTP endpoint and will serve key-value data – I believe this resembles a lot of existing microservices that people write these days. Ours is called &lt;code&gt;webkv&lt;/code&gt; and it’s &lt;a href="https://github.com/alexdzyoba/webkv"&gt;on Github&lt;/a&gt;. Choose the &lt;a href="https://github.com/alexdzyoba/webkv/tree/v1"&gt;“v1” tag&lt;/a&gt; and you’re good to go.&lt;/p&gt;

&lt;p&gt;This service will register itself in Consul with a TTL check that will, well, check the internal health status and send heartbeat-like signals to Consul. Should Consul not receive a signal from our service within the TTL interval, it will mark the service as failed and remove it from query results.&lt;/p&gt;

&lt;p&gt;Side note: Consul also has simple port checks, where the Consul agent judges the health of the service based on port availability. While this is much simpler – you don’t have to add anything to your code – it’s not as powerful as a TTL check. With TTL checks you can inspect the internal state of your service, which is a huge advantage over plain availability – a service may accept queries while its data is stale or invalid. Also, with TTL checks the service status is not limited to a binary good/bad state – it can also be in a warning state.&lt;/p&gt;

&lt;p&gt;All right, to the point! The “v1” version of &lt;code&gt;webkv&lt;/code&gt; uses only the standard library and a bare minimum of dependencies – the Redis client and the Consul API library. Later I’m going to extend it with other niceties like Prometheus integration, structured logging, and sane configuration management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Web service
&lt;/h2&gt;

&lt;p&gt;Let’s start with a basic web service that will serve key-value data from Redis.&lt;/p&gt;

&lt;p&gt;First, we parse the &lt;code&gt;port&lt;/code&gt;, &lt;code&gt;ttl&lt;/code&gt;, and &lt;code&gt;addrs&lt;/code&gt; command-line flags. The last one is a list of Redis addresses separated by &lt;code&gt;;&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"port"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Port to listen on"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addrsStr&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"addrs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"(Required) Redis addrs (may be delimited by ;)"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ttl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Service TTL check duration"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;addrsStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fprintln&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"addrs argument is required"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PrintDefaults&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;addrs&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;addrsStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;";"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we create a service that implements the &lt;a href="https://golang.org/pkg/net/http/#Handler"&gt;&lt;code&gt;Handler&lt;/code&gt;&lt;/a&gt; interface and launch it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addrs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":%d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Listening on "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing fancy here. Now let’s look at the service itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/go-redis/redis"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt;        &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;TTL&lt;/span&gt;         &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;
    &lt;span class="n"&gt;RedisClient&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UniversalClient&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Service&lt;/code&gt; type holds a name, a TTL, and a Redis client handle. It’s instantiated like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addrs&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"webkv"&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RedisClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewUniversalClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UniversalOptions&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Addrs&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;addrs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Check&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Check&lt;/code&gt; method issues the Redis &lt;code&gt;PING&lt;/code&gt; command to check that we’re ok. This will be used later for the Consul registration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Check&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RedisClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ping&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now the implementation of the &lt;code&gt;ServeHTTP&lt;/code&gt; method that is invoked to process requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;

    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Trim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RedisClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Key not found"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusNotFound&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;404&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"url=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;%s&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt; remote=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;%s&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt; key=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;%s&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt; status=%d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RemoteAddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basically, we take the URL path from the request and use it as the key for a Redis &lt;code&gt;GET&lt;/code&gt; command. We then return the value, or 404 in case of an error. Finally, we log the request with a quick and dirty structured logging message in &lt;a href="https://brandur.org/logfmt"&gt;logfmt format&lt;/a&gt;.&lt;/p&gt;
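&lt;p&gt;logfmt is nothing more than space-separated &lt;code&gt;key="value"&lt;/code&gt; pairs, so it’s trivial to emit by hand. Here is a tiny helper (hypothetical, not part of webkv) that builds such a line deterministically:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// logfmtLine renders fields as space-separated key="value" pairs.
// Keys are sorted so the output is deterministic.
func logfmtLine(fields map[string]interface{}) string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, fmt.Sprint(fields[k])))
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(logfmtLine(map[string]interface{}{
		"url":    "/blink",
		"status": 200,
	}))
	// prints: status="200" url="/blink"
}
```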

&lt;p&gt;Launch it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./webkv -addrs 'localhost:6379'
2017/12/13 21:44:15 Listening on :8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ curl 'localhost:8080/blink'
182
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And see the log message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2017/12/13 21:44:29 url="/blink" remote="[::1]:35020" key="blink" status=200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Consul integration
&lt;/h2&gt;

&lt;p&gt;Now let’s make our service discoverable via Consul. Consul has a simple HTTP API to register services that you could use directly via &lt;code&gt;net/http&lt;/code&gt;, but we will use its &lt;a href="https://godoc.org/github.com/hashicorp/consul/api"&gt;Go library&lt;/a&gt;.&lt;/p&gt;
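&lt;p&gt;For reference, the raw HTTP API variant is a single &lt;code&gt;PUT&lt;/code&gt; to &lt;code&gt;/v1/agent/service/register&lt;/code&gt;. A minimal sketch with plain &lt;code&gt;net/http&lt;/code&gt; (helper names are mine, not from the article):&lt;/p&gt;

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// buildRegistration marshals a minimal service definition for the
// PUT /v1/agent/service/register endpoint of the Consul HTTP API.
func buildRegistration(name, ttl string) ([]byte, error) {
	def := map[string]interface{}{
		"Name":  name,
		"Check": map[string]string{"TTL": ttl},
	}
	return json.Marshal(def)
}

// register sends the definition to a Consul agent.
func register(agent string, payload []byte) error {
	url := fmt.Sprintf("http://%s/v1/agent/service/register", agent)
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(payload))
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("registration failed: %s", resp.Status)
	}
	return nil
}

func main() {
	payload, err := buildRegistration("webkv", "10s")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))
	// With a local agent running this would be:
	// register("localhost:8500", payload)
}
```

&lt;p&gt;The Go library essentially wraps this same HTTP API in typed structs.&lt;/p&gt;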

&lt;p&gt;The Consul Go library doesn’t have examples, BUT, it has tests! Tests are nice because they not only give you confidence in your lib and validate the sanity of your code structure and API, but also provide a set of usage examples. &lt;a href="https://github.com/hashicorp/consul/blob/9f2989424e75ecbcaecb990cf7616ea8ad128adf/api/agent_test.go#L383"&gt;Here is an example&lt;/a&gt; from the Consul API test suite for service registration and TTL checks.&lt;/p&gt;

&lt;p&gt;Looking at these tests, we can tell that we interact with Consul by creating a &lt;code&gt;Client&lt;/code&gt; and then getting a handle for a particular &lt;a href="https://www.consul.io/api/index.html"&gt;endpoint&lt;/a&gt; like &lt;code&gt;/agent&lt;/code&gt; or &lt;code&gt;/kv&lt;/code&gt;. Each endpoint has a corresponding Go type. The agent endpoint is responsible for service registration and for sending health checks. To store an &lt;code&gt;Agent&lt;/code&gt; handle, we extend our &lt;code&gt;Service&lt;/code&gt; type with a new pointer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;consul&lt;/span&gt; &lt;span class="s"&gt;"github.com/hashicorp/consul/api"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt;        &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;TTL&lt;/span&gt;         &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;
    &lt;span class="n"&gt;RedisClient&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UniversalClient&lt;/span&gt;
    &lt;span class="n"&gt;ConsulAgent&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;consul&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, in the &lt;code&gt;Service&lt;/code&gt; “constructor”, we add the creation of a Consul agent handle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addrs&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;consul&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consul&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DefaultConfig&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConsulAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we use the agent to register our service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;    &lt;span class="n"&gt;serviceDef&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;consul&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AgentServiceRegistration&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Check&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;consul&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AgentServiceCheck&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;TTL&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TTL&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConsulAgent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServiceRegister&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serviceDef&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key thing here is the &lt;code&gt;Check&lt;/code&gt; part, where we tell Consul how it should check our service. In our case, we say that we ourselves will send heartbeat-like signals to Consul, and if no signal arrives within the TTL, Consul marks our service as failed. A failed service is not returned in DNS or HTTP API query results.&lt;/p&gt;

&lt;p&gt;After the service is registered, we have to periodically send a TTL check signal of the Pass, Fail or Warn type. It must arrive before the TTL expires to keep the service from being marked as failed. We’ll do it in a separate goroutine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UpdateTTL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Check&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;UpdateTTL&lt;/code&gt; method uses &lt;a href="https://golang.org/pkg/time/#Ticker"&gt;&lt;code&gt;time.Ticker&lt;/code&gt;&lt;/a&gt; to periodically invoke the actual update function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;UpdateTTL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;check&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ticker&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewTicker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;check&lt;/code&gt; argument is a function that returns the service status. Based on its result, we send either a pass or a fail check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (s *Service) update(check func() (bool, error)) {
    ok, err := check()
    if !ok {
        log.Printf("err=\"Check failed\" msg=\"%s\"", err.Error())
        if agentErr := s.ConsulAgent.FailTTL("service:"+s.Name, err.Error()); agentErr != nil {
            log.Print(agentErr)
        }
    } else {
        if agentErr := s.ConsulAgent.PassTTL("service:"+s.Name, ""); agentErr != nil {
            log.Print(agentErr)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The check function that we pass to the goroutine is the one we used earlier when creating the service; it simply returns the boolean status of the Redis &lt;code&gt;PING&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;And that’s it! This is how it all works together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We launch the &lt;code&gt;webkv&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;It connects to Redis and starts serving on the given port&lt;/li&gt;
&lt;li&gt;It connects to the Consul agent and registers the service with a TTL check&lt;/li&gt;
&lt;li&gt;Every TTL/2 seconds we check the service status by PINGing Redis and send a Pass check&lt;/li&gt;
&lt;li&gt;Should Redis connectivity fail, we detect it and send a Fail check that removes our service instance from DNS and HTTP API query results, so clients don’t get errors or invalid data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To see it in action, you need to launch Consul and Redis. You can launch Consul with &lt;code&gt;consul agent -dev&lt;/code&gt; or start a normal cluster. How to launch Redis depends on your distro; on my Fedora it’s just &lt;code&gt;systemctl start redis&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now launch the &lt;code&gt;webkv&lt;/code&gt; like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./webkv -addrs localhost:6379 -port 8888
2017/12/14 19:00:29 Listening on :8888
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Query the Consul for services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ dig +noall +answer @127.0.0.1 -p 8600 webkv.service.dc1.consul
webkv.service.dc1.consul. 0 IN A 127.0.0.1

$ curl localhost:8500/v1/health/service/webkv?passing
[
    {
        "Node": {
            "ID": "a4618035-c73d-9e9e-2b83-24ece7c24f45",
            "Node": "alien",
            "Address": "127.0.0.1",
            "Datacenter": "dc1",
            "TaggedAddresses": {
                "lan": "127.0.0.1",
                "wan": "127.0.0.1"
            },
            "Meta": {
                "consul-network-segment": ""
            },
            "CreateIndex": 5,
            "ModifyIndex": 6
        },
        "Service": {
            "ID": "webkv",
            "Service": "webkv",
            "Tags": [],
            "Address": "",
            "Port": 0,
            "EnableTagOverride": false,
            "CreateIndex": 15,
            "ModifyIndex": 37
        },
        "Checks": [
            {
                "Node": "alien",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": "",
                "ServiceTags": [],
                "Definition": {},
                "CreateIndex": 5,
                "ModifyIndex": 5
            },
            {
                "Node": "alien",
                "CheckID": "service:webkv",
                "Name": "Service 'webkv' check",
                "Status": "passing",
                "Notes": "",
                "Output": "",
                "ServiceID": "webkv",
                "ServiceName": "webkv",
                "ServiceTags": [],
                "Definition": {},
                "CreateIndex": 15,
                "ModifyIndex": 141
            }
        ]
    }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now if we stop Redis, we’ll see the log messages&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
2017/12/14 19:29:19 err="Check failed" msg="EOF"
2017/12/14 19:29:27 err="Check failed" msg="dial tcp [::1]:6379: getsockopt: connection refused"
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and that Consul doesn’t return our service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ dig +noall +answer @127.0.0.1 -p 8600 webkv.service.dc1.consul
$ # empty reply

$ curl localhost:8500/v1/health/service/webkv?passing
[]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Starting Redis again brings the service back to healthy.&lt;/p&gt;

&lt;p&gt;So, basically, this is it: a basic web service with Consul integration for service discovery and health checking. Check out the full source code at &lt;a href="https://github.com/alexdzyoba/webkv"&gt;github.com/alexdzyoba/webkv&lt;/a&gt;. Next time we’ll add metrics export to monitor our service with Prometheus.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
