<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chris Hunt</title>
    <description>The latest articles on DEV Community by Chris Hunt (@chrishunt).</description>
    <link>https://dev.to/chrishunt</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F60787%2Fd3c39303-ddde-414e-ac9c-242ac3e16738.jpg</url>
      <title>DEV Community: Chris Hunt</title>
      <link>https://dev.to/chrishunt</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chrishunt"/>
    <language>en</language>
    <item>
      <title>Rancher Personal Server Setup</title>
      <dc:creator>Chris Hunt</dc:creator>
      <pubDate>Mon, 28 Oct 2019 21:58:54 +0000</pubDate>
      <link>https://dev.to/chrishunt/rancher-simple-setup-1a0k</link>
      <guid>https://dev.to/chrishunt/rancher-simple-setup-1a0k</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;By profession, I'm a Software Engineer. Like many others in the same line of work, I've accumulated a few low-traffic sites and apps which I've put together for friends and myself. The problem isn't the sites and apps themselves; it's the hosting. I want a simple, cheap hosting solution that I don't have to worry about. Something that's easily maintainable and upgradable, but also easy to use.&lt;/p&gt;

&lt;p&gt;A few years ago, I discovered Rancher, a Docker orchestration stack. It used its own orchestration engine, called Cattle, topped with a user-friendly UI that allowed me to host my sites and apps.&lt;/p&gt;

&lt;p&gt;Rancher v1 on an EC2 instance has served me well for nearly three years, but over that time my server accumulated a number of little hacks and quirks around routing and certificates which were not easily replicable. Rancher v1 has also not been maintained since v2 was released.&lt;/p&gt;

&lt;p&gt;It was time to build a new server. I'm not a sysadmin or a network guy, so I really want a simple solution that just works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;p&gt;My requirements haven't really changed from my previous setup with Rancher v1. Let's look at those requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A cheap, cloud based server&lt;/li&gt;
&lt;li&gt;A Docker orchestration system with project separation&lt;/li&gt;
&lt;li&gt;An administration UI&lt;/li&gt;
&lt;li&gt;Access to ECR&lt;/li&gt;
&lt;li&gt;Ability to host under 10 low traffic websites and a couple of long running Node apps&lt;/li&gt;
&lt;li&gt;A way of distributing HTTP requests to containers&lt;/li&gt;
&lt;li&gt;Certificate generation and management&lt;/li&gt;
&lt;li&gt;Ability to scale if required&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step up to Rancher v2
&lt;/h2&gt;

&lt;p&gt;The documentation for Rancher v2 promised to solve the routing and certificate hacks with out-of-the-box functionality. Rancher v2 also uses Kubernetes as its orchestration engine, which is better documented than Cattle was. This seemed a good place to start, but the migration docs (&lt;a href="https://rancher.com/docs/rancher/v2.x/en/v1.6-migration/" rel="noopener noreferrer"&gt;https://rancher.com/docs/rancher/v2.x/en/v1.6-migration/&lt;/a&gt;) seemed quite a faff. I decided to start from a clean install.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Spoiler alert:&lt;/em&gt; Rancher v2 lived up to the billing and gave me exactly what I required. However, there were a few setup hoops to jump through to get there - hence this article, in the hope it may help others.&lt;/p&gt;

&lt;h2&gt;
  
  
  Server setup
&lt;/h2&gt;

&lt;p&gt;I started by setting up a &lt;code&gt;t2.medium&lt;/code&gt; EC2 instance using Amazon Linux 2 AMI with 20GB EBS storage and an Elastic IP along with my key pair.&lt;/p&gt;

&lt;p&gt;Ensure that the Security Group of the instance allows inbound access on ports 80, 8080, 443 and 8443. This will allow requests to both the Rancher UI (via 8080 and 8443) and to our hosted sites (via 80 and 443).&lt;/p&gt;

&lt;p&gt;Use a suitable key pair to secure your access. This is beyond the scope of this article but full details can be found at &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next thing is to ensure our server has a fixed IP. By default, an IP is allocated when the instance boots, which means that a restarted server may come up with a different IP. From the AWS console, in the EC2 Dashboard, select Elastic IPs from the left-hand menu. We can allocate a new IP and then associate it with our instance. It's important to release IP addresses which are not in use, as you are charged for IPs that are allocated but not associated with an instance; you do not pay for IPs which are associated with an instance.&lt;/p&gt;

&lt;p&gt;Once the instance is fired up, we can access it using our key through the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -i ~/.ssh/key.pem 52.16.31.9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and you're greeted with the following prompt.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbj4gcw69mkdan1volcro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbj4gcw69mkdan1volcro.png" alt="Our server"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start by updating the software on the server with &lt;code&gt;sudo yum update&lt;/code&gt;. This may take a minute or so to get everything up to date.&lt;/p&gt;

&lt;p&gt;The only software required on the server to run Rancher is Docker. This is a simple case of following the tutorial at &lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html#install_docker" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html#install_docker&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One additional step is to ensure that Docker starts when the server boots using this command...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl enable docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So far, so good!&lt;/p&gt;

&lt;h2&gt;
  
  
  Install and access Rancher
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installing Rancher server and SSL certificate
&lt;/h3&gt;

&lt;p&gt;Next we install Rancher and access the UI. Hoorah for online docs. Rancher's single node installation guide covered everything I needed to know - &lt;a href="https://rancher.com/docs/rancher/v2.x/en/installation/single-node/" rel="noopener noreferrer"&gt;https://rancher.com/docs/rancher/v2.x/en/installation/single-node/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wanted the Rancher data to be persisted so that, if my container had issues, I could rescue that data.&lt;/p&gt;

&lt;p&gt;I also wanted to use Let's Encrypt to provide a certificate for my Rancher UI access. &lt;/p&gt;

&lt;p&gt;In order to get the certificate, we need to start the server container bound to ports 80 and 443:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  -v /opt/rancher:/var/lib/rancher \
  rancher/rancher:latest --acme-domain mydomain.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The container will now fire up and grab an SSL certificate from Let's Encrypt. You should now be able to connect to your server at &lt;a href="https://mydomain.com" rel="noopener noreferrer"&gt;https://mydomain.com&lt;/a&gt; (or whatever you've set up to point at your IP). You'll be asked to set a password for your &lt;code&gt;admin&lt;/code&gt; account and to confirm the domain your Rancher UI is going to run on. We'll shortly amend this URL to include port &lt;code&gt;:8443&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As the agent will be running on the same node (server) as the Rancher server, we need to bind ports 80 and 443 to different ports (8080 and 8443 respectively).&lt;/p&gt;

&lt;p&gt;This means we have to do a little juggling with our containers. We can do this as we've already got our SSL certificate. We need to remove our current container and then fire it up on the new ports.&lt;/p&gt;

&lt;p&gt;To do this, list the Docker containers with &lt;code&gt;docker ps&lt;/code&gt; and then remove the running container with &lt;code&gt;docker rm -fv 637&lt;/code&gt; where 637 is the first few characters of the container ID. See example below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ec2-user@ip-172-31-12-141 /]$ docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                                      NAMES
6372f4c2ae95        rancher/rancher:latest   "entrypoint.sh --acm…"   18 minutes ago      Up 18 minutes       0.0.0.0:80-&amp;gt;80/tcp, 0.0.0.0:443-&amp;gt;443/tcp   clever_pike
[ec2-user@ip-172-31-12-141 /]$ docker rm -fv 637
637
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
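&lt;p&gt;As an aside, the removal can be scripted rather than copying the ID by hand. The sketch below assumes the image name is &lt;code&gt;rancher/rancher:latest&lt;/code&gt;, and demonstrates the first-column extraction on a captured sample line.&lt;/p&gt;

```shell
# One-liner alternative (requires a running Docker daemon):
#   docker rm -fv "$(docker ps -q --filter ancestor=rancher/rancher:latest)"
# The same first-column extraction, shown on a sample `docker ps` output line:
ps_line='6372f4c2ae95        rancher/rancher:latest   "entrypoint.sh --acm"'
cid=$(echo "$ps_line" | awk '{print $1}')
echo "$cid"
```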



&lt;p&gt;Fire it up again with the new port mappings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d --restart=unless-stopped \
  -p 8080:80 -p 8443:443 \
  -v /opt/rancher:/var/lib/rancher \
  rancher/rancher:latest --acme-domain mydomain.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because we've stored our data on a host volume, all of our settings and our certificate have been persisted.&lt;/p&gt;

&lt;p&gt;Within a few seconds, I could hit &lt;code&gt;mydomain.com:8443&lt;/code&gt; in my browser and Rancher was up and running.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding a Cluster
&lt;/h3&gt;

&lt;p&gt;We have Rancher Server set up. We now need a cluster (which, as we've previously discussed, in this case will be a single node) to run our applications on.&lt;/p&gt;

&lt;p&gt;From the main menu, click on Clusters &amp;gt; Add Cluster.&lt;/p&gt;

&lt;p&gt;Rancher offers a lot of options to add a cluster from different providers. It will provision the resources for you. We're going to add a cluster from an existing node (server).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fd0iiojbr3u2dzz6u9yfo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fd0iiojbr3u2dzz6u9yfo.png" alt="Add Cluster options"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Adding a cluster has a lot of options, but we'll concentrate on the basics to get up and running. The most basic is to just give the cluster a name. Click &lt;code&gt;Next&lt;/code&gt; and we're presented with a Docker command.&lt;/p&gt;

&lt;p&gt;We want our agent running with all the roles: &lt;code&gt;etcd&lt;/code&gt;, &lt;code&gt;Control Plane&lt;/code&gt; and &lt;code&gt;Worker&lt;/code&gt;. Check all these boxes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo docker run -d --privileged --restart=unless-stopped \
   --net=host -v /etc/kubernetes:/etc/kubernetes \
   -v /var/run:/var/run rancher/rancher-agent:v2.3.0 \
   --server https://mydomain.com:8443 \
   --token cp6jcp9lcvw8b279brstp92bvfkg8xgv8b6dkkp9xz7n6ktxqsctzq \
   --etcd --controlplane --worker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy and paste that command into our server terminal to fire up our worker. Rancher also starts Kubernetes services behind the scenes. If you want to see what Rancher has set up for us, run &lt;code&gt;docker ps&lt;/code&gt; to list the running containers. At the bottom, we can see the Rancher Server with our external mapped ports; the remaining containers are managing our agent.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F87x8sy3cewb5026cf4wp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F87x8sy3cewb5026cf4wp.png" alt="Rancher containers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back in the UI, we're informed of the status of the agent coming up. This takes a few minutes as each agent image needs to be downloaded and started.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F3ryv3kpgvwhgbfk9r83n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F3ryv3kpgvwhgbfk9r83n.png" alt="Up and running cluster"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fukp8lpyala9cewj2vwuu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fukp8lpyala9cewj2vwuu.png" alt="Our cluster info"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Take some time to have a look around the UI. Many of the features are self-explanatory and a bit of exploring will uncover them.&lt;/p&gt;

&lt;p&gt;A few things worth a look at this point (but beyond the scope of this article):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Core Rancher settings&lt;/li&gt;
&lt;li&gt;Cluster settings&lt;/li&gt;
&lt;li&gt;Change your default security provider (I went for Github) and add a user&lt;/li&gt;
&lt;li&gt;Add namespaces. Our apps will later be placed under a namespace. Namespaces help us separate our sites and apps.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Accessing ECR
&lt;/h2&gt;

&lt;p&gt;I use ECR as a registry for my Docker images so we need to allow access to pull images. This can be set up using an instance profile on EC2 with access to the ECR registry.&lt;/p&gt;

&lt;p&gt;From the EC2 console, select the instance and then Actions &amp;gt; Instance Settings &amp;gt; Attach/Replace IAM Role. From here, we can create an IAM role through the screen prompts and attach the following permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ECRGetImage",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability"
            ],
            "Resource": "*"
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives our server full read permissions to all of our images on ECR. This ensures that our server has access and Kubernetes manages login to the repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up SSL for sites
&lt;/h2&gt;

&lt;p&gt;The last stage before we start adding our applications is setting up SSL.&lt;/p&gt;

&lt;p&gt;This is achieved by adding the Let's Encrypt Certificate Manager application. In the context of Rancher, an application is a preconfigured image which we can launch directly from the Rancher UI.&lt;/p&gt;

&lt;p&gt;Open up our cluster and click on Apps from the menu.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fgcbvtmfav613tdgozkb2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fgcbvtmfav613tdgozkb2.png" alt="Add App menu"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Launch and select the Let's Encrypt Certificate Manager. We are now presented with several options to get started. We need to change the issuer from the stage issuer to the prod issuer using the &lt;code&gt;Let's Encrypt Cluster Issuer&lt;/code&gt; option. We also need to enter our email address.&lt;/p&gt;
&lt;h2&gt;
  
  
  Provisioning our first workload
&lt;/h2&gt;

&lt;p&gt;A workload is a containerised application. Both our sites and our apps are workloads.&lt;/p&gt;

&lt;p&gt;Now that we've done all of the server setup, deploying a workload is little more than an exercise in completing a UI form. From our cluster, click the Deploy button. We're presented with an intuitive form where most of the options will be familiar to Docker users.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F4ku3ogwmrz3u3d5en2h8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F4ku3ogwmrz3u3d5en2h8.png" alt="Deploy workload options"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Give your workload a name, enter the full ECR image name, complete any other options as required and click the &lt;code&gt;Launch&lt;/code&gt; button. Our workload should fire up.&lt;/p&gt;

&lt;p&gt;If your app doesn't require connecting to the outside world, you're done. Our web app needs a couple more steps however.&lt;/p&gt;

&lt;p&gt;We want to set up a load balancer to direct the traffic coming in to our server to the correct workload determined by the host name. From the cluster menu, select the Load Balancers tab followed by the Add Ingress button. Add a name for the load balancer and then ensure that it's on the same namespace as the website workload that you set up.&lt;/p&gt;

&lt;p&gt;We then need to set up the rule which will direct traffic to our workload. The form looks as below. Enter the host name. To direct all traffic (rather than only a sub-path), enter &lt;code&gt;/&lt;/code&gt; in the path input box. Select the web app workload that we've just set up and the port that the workload accepts requests on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxfa32f9mb9bs39cq7ee4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxfa32f9mb9bs39cq7ee4.png" alt="Load balancer rules"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We now have requests on port 80 directed to our workload. The final step is to ensure that we can also accept secure requests on port 443. The instructions to complete this are in the introduction of the Certificate Manager app.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F4790u0mleizbh0ydnfan.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F4790u0mleizbh0ydnfan.png" alt="Cert manager instructions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back on our Load Balancers tab, we can use the menu for our load balancer and select View/Edit YAML. The first thing we need to add is in the &lt;code&gt;metadata.annotations&lt;/code&gt; section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubernetes.io/tls-acme: "true"
certmanager.k8s.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/secure-backends: "true"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a one time addition. We then need to add the following for each of the sites which we're setting up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  tls:
  - hosts:
    - www.mydomain.com
    - mydomain.com
    secretName: mydomain-crt

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that we can add multiple domains to a certificate. The certificate will be saved in Kubernetes secrets and the secret name is defined here.&lt;/p&gt;

&lt;p&gt;Upon saving the YAML, &lt;code&gt;cert-manager&lt;/code&gt; should kick in and find this config and obtain a certificate from Let's Encrypt. Assuming we've pointed our host name at our server, the site should now be available on port 443 too.&lt;/p&gt;
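&lt;p&gt;Putting the annotations, TLS section and rule together, the complete ingress YAML ends up shaped roughly like this. The names, namespace and domains below are hypothetical placeholders, and the API version is the one current for Kubernetes of the Rancher v2.3 era:&lt;/p&gt;

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-load-balancer        # hypothetical load balancer name
  namespace: mysite             # must match the workload's namespace
  annotations:
    kubernetes.io/tls-acme: "true"
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/secure-backends: "true"
spec:
  tls:
  - hosts:
    - www.mydomain.com
    - mydomain.com
    secretName: mydomain-crt
  rules:
  - host: mydomain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: mysite-workload   # hypothetical workload/service name
          servicePort: 80
```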

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;I got to the situation above through a little trial and error, and I noted a few issues which popped up along the way. Here are a few places to find information to help you debug.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pod logs
&lt;/h3&gt;

&lt;p&gt;Assuming you've only set up one pod for your workload, you can access the log of that pod by selecting the workload. When presented with a list of pods, you can use the dropdown menu to view the log of the specific pod.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fsk0aggmfm2a5p4kc4hl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fsk0aggmfm2a5p4kc4hl8.png" alt="View logs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will hopefully help us determine whether requests are reaching the pods.&lt;/p&gt;

&lt;p&gt;If you have more than one pod per workload, it may be worth reducing deployed pods to one so that you know where requests should be headed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cert Manager logs
&lt;/h3&gt;

&lt;p&gt;If you can't access your site on port 443 then it's worth checking that your certificate was created correctly. This can be done through the certificate manager logs.&lt;/p&gt;

&lt;p&gt;Click on your certificate manager workload within your cluster and then from the pod, select View Logs. &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fuhc8hj86tvl12sgkcgl6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fuhc8hj86tvl12sgkcgl6.png" alt="Cert manager logs"&gt;&lt;/a&gt;&lt;br&gt;
In the logs, try searching for your host name. Hopefully there will be a message which will help you. This may be something like port 80 wasn't accessible or that your configuration wasn't correct. The couple of errors that I've had in here were well worded and finding the solution wasn't a problem.&lt;/p&gt;
&lt;h3&gt;
  
  
  ECR Permissions
&lt;/h3&gt;

&lt;p&gt;On one occasion, I had an issue where a Docker image could not be pulled from ECR. For some reason, the EC2 Profile wasn't present. Restarting the server fixed the issue.&lt;/p&gt;

&lt;p&gt;If you wanted to check the user and role within the EC2 instance, you can run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
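&lt;p&gt;With an instance profile attached, the output should show an assumed-role ARN rather than an IAM user. The shape is along these lines (all values below are hypothetical):&lt;/p&gt;

```json
{
    "UserId": "AROAEXAMPLEID:i-0123456789abcdef0",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/my-ecr-role/i-0123456789abcdef0"
}
```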



&lt;h2&gt;
  
  
  What next?
&lt;/h2&gt;

&lt;p&gt;Walking through the steps above achieves the goals I set out at the start. We have a server with user-friendly administration, and we can easily set up other services through the UI.&lt;/p&gt;

&lt;p&gt;Use Rancher namespaces and projects to separate and organise your workloads. While it's pretty easy to manage a couple of workloads, once you add database apps, logging apps and admin UIs connecting to your workloads, you'll certainly appreciate a way of organising them.&lt;/p&gt;

&lt;p&gt;I'd recommend getting familiar with the Secrets functionality in Rancher and how to link secrets into your applications. This will help make your server more secure.&lt;/p&gt;

&lt;p&gt;Try scaling the cluster with another node by firing up another server, installing Docker and then going through the same process of adding a node as we looked at earlier (but with only the &lt;code&gt;worker&lt;/code&gt; role).&lt;/p&gt;

&lt;h2&gt;
  
  
  Final points
&lt;/h2&gt;

&lt;p&gt;I pulled all the points above from a number of tutorials, articles and Stack Overflow questions. Putting that together with a decent amount of trial and error, I got my personal server set up in a user-friendly and maintainable state. I'm not a "server guy" and don't have in-depth knowledge of a lot of the concepts I've dabbled in, so I welcome feedback and thoughts on improving the process and the article.&lt;/p&gt;

</description>
      <category>rancher</category>
      <category>server</category>
      <category>hosting</category>
    </item>
    <item>
      <title>Code-free machine learning with Ludwig</title>
      <dc:creator>Chris Hunt</dc:creator>
      <pubDate>Sat, 09 Mar 2019 19:35:10 +0000</pubDate>
      <link>https://dev.to/chrishunt/code-free-machine-learning-with-ludwig-2gap</link>
      <guid>https://dev.to/chrishunt/code-free-machine-learning-with-ludwig-2gap</guid>
      <description>&lt;h2&gt;
  
  
  Intro and Ludwig
&lt;/h2&gt;

&lt;p&gt;At the start of February 2019, Uber made their code-free machine learning toolbox, Ludwig, open-source.&lt;/p&gt;

&lt;p&gt;Website - &lt;a href="https://uber.github.io/ludwig/" rel="noopener noreferrer"&gt;https://uber.github.io/ludwig/&lt;/a&gt;&lt;br&gt;
User guide - &lt;a href="https://uber.github.io/ludwig/user_guide/" rel="noopener noreferrer"&gt;https://uber.github.io/ludwig/user_guide/&lt;/a&gt;&lt;br&gt;
Github repo - &lt;a href="https://github.com/uber/ludwig/" rel="noopener noreferrer"&gt;https://github.com/uber/ludwig/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ludwig runs on top of the popular and powerful TensorFlow library and offers CLI access to experiment with and train machine learning models, and to predict using TensorFlow models.&lt;/p&gt;

&lt;p&gt;As an engineer, I'm absolutely not a data scientist. I know enough about TensorFlow to build the most basic of models using tutorials, but really couldn't create anything from scratch. Ludwig offered that opportunity.&lt;/p&gt;
&lt;h2&gt;
  
  
  Our first experiment
&lt;/h2&gt;

&lt;p&gt;Let's dive in and run through a basic example. We're going to try to recreate the Keras tutorial at &lt;a href="https://www.tensorflow.org/tutorials/keras/basic_regression" rel="noopener noreferrer"&gt;https://www.tensorflow.org/tutorials/keras/basic_regression&lt;/a&gt; with zero lines of code.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://archive.ics.uci.edu/ml/datasets/auto+mpg" rel="noopener noreferrer"&gt;Auto MPG dataset&lt;/a&gt; contains basic data about cars. Our task is to predict the MPG from the features provided. I've grabbed this and converted it to a CSV file for use in this example.&lt;/p&gt;

&lt;p&gt;Ludwig uses a model definition file to determine the parameters for building the model. Ludwig's internals deal with your data: it creates train, test and validation datasets, and transforms the data into the best format for training depending on the data type you've specified.&lt;/p&gt;
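&lt;p&gt;That split behaviour can itself be configured in the model definition. If I've read the user guide correctly, the defaults look something like this - check the preprocessing section of the docs before relying on it:&lt;/p&gt;

```yaml
preprocessing:
  force_split: true                      # ignore any existing split column in the CSV
  split_probabilities: [0.7, 0.1, 0.2]   # train / validation / test
```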

&lt;p&gt;The Keras example needs us to manipulate the data in order to train and test the model. Ludwig does all of this for us. It allows us to train the model immediately by setting up a model definition file at &lt;code&gt;modeldef.yaml&lt;/code&gt;. Here we define the input features and their data type. There are a number of other parameters against each feature which can be set for more complex models. We also define the output feature and its parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;input_features&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; 
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Cylinders&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;numerical&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; 
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Displacement&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;numerical&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; 
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Horsepower&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;numerical&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; 
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Weight&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;numerical&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; 
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Acceleration&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;numerical&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; 
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ModelYear&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;numerical&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; 
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Origin&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;category&lt;/span&gt;
&lt;span class="na"&gt;output_features&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; 
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MPG&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;numerical&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  First run
&lt;/h3&gt;

&lt;p&gt;Our first experiment can now be run with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ludwig experiment  --data_csv cars.csv --model_definition_file modeldef.yaml --output_directory results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives the following results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;===== MPG =====
loss: 52.9658573971519
mean_absolute_error: 6.3724554520619066
mean_squared_error: 52.9658573971519
r2: 9.58827477467211e-05
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After 200 epochs complete, I have a mean absolute error (MAE) of 6.4 (yours may vary slightly depending on the random train/test split). This means that, on average, the predicted MPG for a car is 6.4MPG away from the actual value. Bearing in mind that values generally fall between 10MPG and 47MPG, 6.4MPG represents quite a large error.&lt;/p&gt;
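&lt;p&gt;For clarity, MAE is just the average absolute difference between predictions and actual values. A quick sketch with made-up numbers (not our model's output):&lt;/p&gt;

```python
# Mean absolute error: the average absolute difference between predicted
# and actual values. Numbers here are illustrative, not our model's output.
actual    = [18.0, 31.0, 24.0, 14.0]   # true MPG values
predicted = [24.0, 27.0, 30.0, 18.0]   # model predictions

mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
print(mae)  # 5.0 - on average, each prediction is 5 MPG off
```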

&lt;h3&gt;
  
  
  Refinement
&lt;/h3&gt;

&lt;p&gt;If you were watching the log scrolling as Ludwig was running, you'd have seen the MAE against the validation set reducing with each epoch.&lt;/p&gt;

&lt;p&gt;The Keras example suggested a final MAE of ~2, so we may need a bit of tweaking to get closer. There was a fair indication that the MAE was still decreasing as the run ended. We can increase the number of epochs with a simple addition to the model definition&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;training&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;epochs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;400&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and continue from the previous training model with the command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ludwig experiment  --data_csv cars.csv --model_definition_file modeldef.yaml --output_directory results -mrp ./results/experiment_run_0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our MAE only comes down to 5.3MPG. Still not that close.&lt;/p&gt;

&lt;h3&gt;
  
  
  Further refinement
&lt;/h3&gt;

&lt;p&gt;In a real-life example, we'd amend the hyperparameters, retrain, then amend and retrain again for as long as our target MAE keeps falling. &lt;/p&gt;

&lt;p&gt;We'll skip this step by replicating the hyperparameters from the Keras tutorial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;training&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;32&lt;/span&gt;
  &lt;span class="na"&gt;epochs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;400&lt;/span&gt;
  &lt;span class="na"&gt;early_stop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
  &lt;span class="na"&gt;learning_rate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.001&lt;/span&gt;
  &lt;span class="na"&gt;optimizer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rmsprop&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In addition, we set early stop at 50 epochs - this means that our model will stop training if our validation curve doesn't improve for 50 epochs. The experiment is fired off in the same way as before. It produces these results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Last improvement of loss on combined happened 50 epochs ago

EARLY STOPPING due to lack of validation improvement, it has been 50 epochs since last validation accuracy improvement

Best validation model epoch: 67

loss: 10.848812248133406
mean_absolute_error: 2.3642308198952975
mean_squared_error: 10.848812248133406
r2: 0.026479910446118703
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get a message that our model has stopped training at 132 epochs because it's hit the early stop limit. &lt;/p&gt;

&lt;p&gt;MAE is down to 2.36MPG without writing a line of code, and our example has reached similar results to the Keras tutorial.&lt;/p&gt;
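&lt;p&gt;The patience-based early stopping rule we configured can be sketched in a few lines of Python. This is a hypothetical training loop for illustration, not Ludwig's internals:&lt;/p&gt;

```python
# Patience-based early stopping (a hypothetical training loop,
# not Ludwig's actual implementation).
def train_with_early_stopping(run_epoch, max_epochs=400, patience=50):
    """Stop once validation loss hasn't improved for `patience` epochs."""
    best_loss = float("inf")
    stale = 0
    for epoch in range(1, max_epochs + 1):
        val_loss = run_epoch(epoch)  # trains one epoch, returns validation loss
        if best_loss > val_loss:
            best_loss, stale = val_loss, 0
        else:
            stale += 1
        if stale == patience:
            return epoch, best_loss  # early stop triggered
    return max_epochs, best_loss

# Toy loss curve: improves until epoch 30, then plateaus.
toy_loss = lambda e: 100 - e if 31 > e else 71
stop_epoch, best = train_with_early_stopping(toy_loss, patience=20)
print(stop_epoch, best)  # 50 70
```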

&lt;h3&gt;
  
  
  Visualising our training
&lt;/h3&gt;

&lt;p&gt;Now we'd like to confirm that our training and validation loss curves are tracking each other closely without showing overfitting. Ludwig continues to deliver on its promise of a no-code solution. We can view our learning curves with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ludwig visualize -v learning_curves -ts results/experiment_run_0/training_statistics.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fmfpzv0dwf0xt77pxkjns.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fmfpzv0dwf0xt77pxkjns.png" alt="Learning curves"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The curves continue to follow a similar trajectory. Should the validation curve start heading upwards while the training curve remains on course, it would suggest that overfitting is occurring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real life validation
&lt;/h2&gt;

&lt;p&gt;Ok, this is all well and good but tutorials notoriously pick and choose data so the output "just works". Let's try our model out with some real data.&lt;/p&gt;

&lt;p&gt;With a bit of investigation, I've dug out the required stats of the DeLorean DMC-12 (&lt;a href="https://en.wikipedia.org/wiki/DMC_DeLorean" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/DMC_DeLorean&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cylinders:     6
Displacement:  2849cc (174 cubic inches)
Horsepower:    130hp
Weight:        1230 kg (2712 lb)
Acceleration:  10.5s
Year:          1981
Origin:        US
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and converted it to the same CSV format as the training data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cylinders,Displacement,Horsepower,Weight,Acceleration,ModelYear,Origin
6,174,130,2712,10.5,81,1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
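&lt;p&gt;Note that the Wikipedia figures are metric while the training CSV uses cubic inches and pounds, so the displacement and weight need converting first. A quick check of the arithmetic:&lt;/p&gt;

```python
# Convert the DeLorean's metric specs to the units used in cars.csv.
CC_PER_CUBIC_INCH = 16.387
LB_PER_KG = 2.20462

displacement_ci = 2849 / CC_PER_CUBIC_INCH
weight_lb = 1230 * LB_PER_KG
print(round(displacement_ci), round(weight_lb))  # 174 2712
```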



&lt;p&gt;Now, to predict the fuel economy of this, we run the &lt;code&gt;predict&lt;/code&gt; command through Ludwig:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ludwig predict --data_csv delorean.csv -m results/experiment_run_0/model -op
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We specify the &lt;code&gt;-op&lt;/code&gt; flag to tell Ludwig that we only want predictions. Inputting a CSV file with an MPG column and omitting this flag will run the predictions but also provide statistics comparing them against the actual values supplied in the file.&lt;/p&gt;

&lt;p&gt;The result given by my model is 23.53405mpg. How good is this? Unfortunately our Wikipedia article doesn't show the published fuel economy but I did manage to find it in this &lt;a href="https://www.telegraph.co.uk/cars/classic/back-to-the-future-day-is-the-delorean-as-bad-as-we-think/" rel="noopener noreferrer"&gt;fantastic article&lt;/a&gt; about the amazing car - 22.8mpg. A pretty decent real life test!&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;I appreciate that the data scientists out there are screaming that we didn't run through any analysis on the input features to create a meaningful feature set and that we didn't run specific analysis on the test data predictions. I also appreciate that MAE isn't necessarily the ultimate measure of accuracy as it may be skewed heavily by outliers which we could have validated through further analysis.&lt;/p&gt;

&lt;p&gt;What we have shown is that using Ludwig, we can experiment and train a machine learning model and then predict using the model we've trained.&lt;/p&gt;

&lt;p&gt;Machine learning is becoming more and more accessible. Ludwig seems to be a big step forward in that regard.&lt;/p&gt;

</description>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Why you need a basic ML understanding</title>
      <dc:creator>Chris Hunt</dc:creator>
      <pubDate>Fri, 11 Jan 2019 17:23:52 +0000</pubDate>
      <link>https://dev.to/chrishunt/why-you-need-a-basic-ml-understanding-1ngi</link>
      <guid>https://dev.to/chrishunt/why-you-need-a-basic-ml-understanding-1ngi</guid>
      <description>&lt;p&gt;There are points in your career where you discover a piece of software or a library or a technique that you wish you'd known about years ago. You can see how it could have saved you hours and made your life so much easier in a previous job or project. &lt;/p&gt;

&lt;p&gt;For me, Docker and MongoDB both fall into this category. Although I was relatively early to the party on both of those, the impact they had on my day to day work and ability to deliver rapidly changed the way I work.&lt;/p&gt;

&lt;p&gt;Machine learning is the latest. Until relatively recently, I may have been a little naive in believing that ML was something that was exclusive to monster corporations with huge budgets and specialised ML staff who have access to mega servers. This is the situation that most tech news stories related to ML portray and, while at the top end of the spectrum this may be true, it &lt;em&gt;is&lt;/em&gt; a spectrum and the basics are not that hard to grasp and access.&lt;/p&gt;

&lt;p&gt;To make it clear, machine learning (and all its subcategories) is a massive subject and I'm not suggesting that in a few hours you can train your car to go driverless. I am suggesting that, as a developer, you should be aware of the kind of jobs it can and can't do, how it can benefit you (and your company) and, should you wish to implement a basic ML solution, the route you'd need to take to get there. Or, to put it another way, you should be able to hold a conversation around ML when your manager/director/project manager inevitably drops "I've heard so much about machine learning - I think we should be using it" into the conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  "We don't need Machine Learning"
&lt;/h2&gt;

&lt;p&gt;Maybe you work in a company in which you believe ML couldn't offer any value. You don't make driverless cars and you don't run petabyte-scale search engines. There's a good chance, however, that if your company works with any significant amount of data, ML may be able to benefit you.&lt;/p&gt;

&lt;p&gt;Let me throw an example to you and explain how ML could open up some other ideas. &lt;/p&gt;

&lt;p&gt;Let's say the sales department of your e-commerce company asks for an alert when a new customer makes their first purchase, with a flag suggesting whether that customer is likely to become a long-term customer so that the sales team can follow up personally. All we have to go on is the existing data of a few tens of thousands of existing customers.&lt;/p&gt;

&lt;p&gt;A year ago, as an experienced engineer with no ML knowledge, I'd have embarked upon looking for patterns in the existing customers, first purchases, potentially their location and their further history. I'd have looked to write a function with multiple &lt;code&gt;if&lt;/code&gt; statements which determined their status. Job done.&lt;/p&gt;

&lt;p&gt;The ML approach also starts with looking at the data. We analyse the data and clean it into a form that our model will be able to read ("features" is the term used in ML), along with the outcome ("label") of whether they became a long-term customer, which we already know for existing customers. We could then train a machine learning model to associate the features with the labels. With a few tens of thousands of customers, this training wouldn't be too expensive on a decent PC (probably contrary to the impression we may have been given by ML articles in the press). We can then test this model to find its accuracy. If it's sufficient, we are in a position where our model can predict whether new customers are likely to become long-term customers or not.&lt;/p&gt;
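&lt;p&gt;To make the idea concrete, the train-and-predict step could look something like the sketch below. The feature names and data are entirely made up, and scikit-learn is just one library choice among many:&lt;/p&gt;

```python
# Sketch: train a classifier to flag likely long-term customers.
# Features and data are invented for illustration; in practice these
# would be engineered from the existing customer history.
from sklearn.linear_model import LogisticRegression

# [first_order_value, items_in_first_order, days_until_second_visit]
X = [
    [250.0, 5, 2], [180.0, 4, 3], [300.0, 7, 1], [220.0, 6, 2],   # stayed
    [15.0, 1, 60], [30.0, 1, 45], [10.0, 1, 90], [25.0, 2, 70],   # churned
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # label: 1 means became a long-term customer

model = LogisticRegression().fit(X, y)

# Predict for a new customer at the moment of their first purchase.
new_customer = [[200.0, 5, 4]]
print(model.predict(new_customer)[0])  # 1
```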

&lt;p&gt;At this point we've approached the same problem in two different ways and potentially produced two workable solutions. However, six months later the sales department inform you that the flag isn't as accurate as it used to be. Something has changed in the market, and so have the habits of the customers, and it needs fixing. With our engineering method, we'd potentially have to start from the beginning and rewrite our function. With our ML solution, we may only need to retrain our model on the new, more recent data to improve its performance. In time, we could retrain the model with new data as it arrives and it would always be up to date.&lt;/p&gt;

&lt;p&gt;So that's easy for me to say and just give an example but I know what you're thinking....&lt;/p&gt;

&lt;h2&gt;
  
  
  "I don't have the skills"
&lt;/h2&gt;

&lt;p&gt;And neither do I. I'm not a data scientist, and my experience of ML doesn't go beyond working with data scientists on a daily basis and a &lt;a href="https://udemy.com/python-for-data-science-and-machine-learning-bootcamp/learn/v4/overview"&gt;basic ML course in Python on Udemy&lt;/a&gt; which I'd highly recommend if you want a basic start. What I do have, however, is knowledge of the kind of problems ML can solve. This is the tool I suggest all developers can benefit from. Having it will allow you to recognise when an ML solution may be useful and potentially more efficient for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;If I've convinced you to have a look into ML, I'd suggest &lt;a href="https://www.youtube.com/watch?v=FtKUj0icUz4&amp;amp;t=179s"&gt;this video&lt;/a&gt; as a good follow on which will give you some very basic code examples. Laurence does a great job of introducing ML without going into heavy mathematical formulae. He also explains the crux of ML versus "traditional programming" really well.&lt;/p&gt;

&lt;p&gt;Ultimately, traditional programming takes in rules and data and produces answers. Machine learning takes in answers and data and produces the rules (the ML model).&lt;/p&gt;

&lt;p&gt;This approach is particularly useful as requirements become more complex or obscure (as examined in the brief e-commerce example).&lt;/p&gt;
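&lt;p&gt;That rules-versus-data distinction can be shown in a few lines. Here a least-squares line fit stands in for training a model (a deliberate simplification - a real model would rarely be a straight line):&lt;/p&gt;

```python
import numpy as np

# Traditional programming: we write the rule ourselves.
def fahrenheit(celsius):
    return celsius * 1.8 + 32

# Machine learning: given data and answers, recover the rule.
# A least-squares line fit stands in here for training a model.
data = np.array([0.0, 10.0, 20.0, 30.0, 40.0])     # inputs
answers = np.array([fahrenheit(c) for c in data])  # known outputs

slope, intercept = np.polyfit(data, answers, 1)
print(round(slope, 2), round(intercept, 2))  # 1.8 32.0 - the rule, recovered
```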

&lt;p&gt;Putting all of this together, I truly believe that the developer who can hold a solid conversation around machine learning, because they understand what it can do, will have a significant advantage over those who can't, both within most workplaces and further out in the job market. It's a fast-moving industry. &lt;/p&gt;

&lt;p&gt;Give it a go!&lt;/p&gt;

</description>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Elasticsearch is different</title>
      <dc:creator>Chris Hunt</dc:creator>
      <pubDate>Sun, 11 Mar 2018 23:42:38 +0000</pubDate>
      <link>https://dev.to/chrishunt/elasticsearch-is-different--1abl</link>
      <guid>https://dev.to/chrishunt/elasticsearch-is-different--1abl</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I'll warn you now, I may sound like an Elasticsearch salesperson in this article but I can assure you that I'm completely impartial.&lt;/p&gt;

&lt;p&gt;Until six months ago, I'd used Elasticsearch for little more than basic document retrieval on simple search terms. I almost resented it as another query "language" to learn. That all changed when I moved into a position which used ES extensively and I needed to get up to speed quickly.&lt;/p&gt;

&lt;p&gt;I read the book Relevant Search (&lt;a href="https://www.manning.com/books/relevant-search"&gt;https://www.manning.com/books/relevant-search&lt;/a&gt;) in two weeks. A superb read which really highlighted the difference between basic document retrieval and real relevant search. I began to see the real power in Elasticsearch. It fired up my imagination into ways I could have used it to great effect in previous projects.&lt;/p&gt;

&lt;p&gt;I wish I'd taken some time to understand the features it offers sooner, and I believe it could be very valuable for many developers to at least have an overview, which is what I hope to give here. It may enable you to make a better decision on software selection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Notable features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Search engine
&lt;/h3&gt;

&lt;p&gt;Ok, so it's a search engine, but your company doesn't require a "search engine", right? You run some queries against your NoSQL/relational database which do a decent job?&lt;/p&gt;

&lt;p&gt;That's what I thought. However, the key here is &lt;strong&gt;relevance&lt;/strong&gt;. It's about getting the user where they want/need to be as quickly and as simply as possible. Yes, you can write a database query which puts some data in front of a user that roughly relates to what they put in the search box. And you can tinker with that query, but it soon becomes very complex and hard to maintain. &lt;/p&gt;

&lt;p&gt;Consider search results specific to a user weighted by their previous searches, their favourites, their age, their gender. I don't envy you having to write that SQL query!&lt;/p&gt;

&lt;p&gt;Elasticsearch will allow you to query on any of your indexed fields. You can filter your results to create a context for your search. For example, if the user had specifically said they are searching for a book in your multi-department store, you would filter much as you would in the &lt;code&gt;WHERE&lt;/code&gt; clause of a SQL statement. &lt;/p&gt;

&lt;p&gt;The good stuff comes when you start telling Elasticsearch that a result &lt;strong&gt;may&lt;/strong&gt; have one term or another, but that one term should be boosted above the other by a certain factor. You may want to boost more highly rated products, or maybe you have some stock that you need to move which could be boosted in the results.&lt;/p&gt;

&lt;p&gt;If you had a library of articles, you may want the most recent ones to be boosted above older articles. If your results were location based, you could boost on distance from the user.&lt;/p&gt;
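&lt;p&gt;As a flavour of what this looks like, here's a sketch of a bool query expressed as a Python dict: the filter sets the context (books only), the must clause is the user's search, and the should clauses boost certain matches above others. The field names are my own invention:&lt;/p&gt;

```python
# Sketch of an Elasticsearch bool query as a Python dict. Field names
# are illustrative, not from any real index.
query = {
    "query": {
        "bool": {
            "filter": [{"term": {"department": "books"}}],   # context, not scored
            "must": [{"match": {"title": "gardening"}}],     # the user's search
            "should": [
                # results containing "organic" score twice as high
                {"match": {"title": {"query": "organic", "boost": 2.0}}},
                # well-rated products get a smaller boost
                {"range": {"rating": {"gte": 4, "boost": 1.5}}},
            ],
        }
    }
}
print(sorted(query["query"]["bool"]))  # ['filter', 'must', 'should']
```

A dict like this would typically be sent with the official Python client's search method (client and index setup assumed).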

&lt;p&gt;I hope this starts giving you an idea of what &lt;strong&gt;relevance&lt;/strong&gt; is and how a search engine could be useful after all. But that's just the start...&lt;/p&gt;

&lt;h3&gt;
  
  
  Auto-suggest
&lt;/h3&gt;

&lt;p&gt;ES has built in functionality for auto-suggest. When the user is typing in the search box, you can preempt their request and put it in front of them as they type. Once again, this may have been something you could do with your current database. &lt;/p&gt;

&lt;p&gt;ES can be "smarter" though. You can build your indexes so that it looks in any part of words or phrases across multiple fields. If indexed thoughtfully, ES can return its results very quickly and with minimal overhead on the server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Aggregation and Facets
&lt;/h3&gt;

&lt;p&gt;ES can return aggregation results in the same result set as the query results. This alone is an appealing feature, and it makes real sense: when searching for something, if users can't find what they're looking for in the results returned, they want an easy way to filter their result set further. Say I've just searched for TVs on Amazon and it returns thousands of results - how do I get closer to what I want? Facet filters: these are the properties that Amazon displays on the left-hand side of their desktop page, allowing me to filter further on brand, size, price range etc. ES can easily return these facets and counts for the result set returned.&lt;/p&gt;

&lt;p&gt;Aggregations follow what you'd expect from any other database, and allow a pipeline to really dig into your results.&lt;/p&gt;
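&lt;p&gt;A sketch of what such a facet request could look like: one request returns both the matching TVs and the counts per brand and price bucket. Again, the field names are invented for illustration:&lt;/p&gt;

```python
# Sketch of facet counts via aggregations: the query returns results,
# the aggs return counts per brand and per price bucket in one round trip.
# Field names are illustrative.
body = {
    "query": {"match": {"title": "tv"}},
    "aggs": {
        "by_brand": {"terms": {"field": "brand"}},
        "by_price": {
            "range": {
                "field": "price",
                "ranges": [{"to": 200}, {"from": 200, "to": 500}, {"from": 500}],
            }
        },
    },
}
print(sorted(body["aggs"]))  # ['by_brand', 'by_price']
```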

&lt;h3&gt;
  
  
  More like this
&lt;/h3&gt;

&lt;p&gt;A very simple feature to implement but really powerful. It does exactly as it suggests. It allows you to point to a document (or more) and tell ES that you want other results similar to the document you've suggested. This can be very useful for a user to potentially continue their journey on your site or an alternative product similar to the one they're viewing in a shop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kibana and visualisations
&lt;/h3&gt;

&lt;p&gt;Kibana is a web UI allowing you to easily build visualisations for your data. It's relatively simple to use and really helps you dig deep into your data.&lt;/p&gt;

&lt;p&gt;It offers all the standard graphing types but also some quite imaginative alternatives. All of these can be put together in dashboards which can be shared and displayed in other pages or on a big screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;I've only touched the surface with this overview of Elasticsearch, but I recommend following up with this slightly more in-depth overview: &lt;a href="https://www.elastic.co/blog/found-uses-of-elasticsearch"&gt;https://www.elastic.co/blog/found-uses-of-elasticsearch&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;To make it clear: while I make several comparisons to databases, I'm not suggesting that Elasticsearch should replace your database. In many cases, however, I believe it can complement it.&lt;/p&gt;

&lt;p&gt;It could open up functionality to your product that you maybe didn't consider possible. It's very, very fast so can really improve your user experience in places and potentially reduce the load on your database server.&lt;/p&gt;

&lt;p&gt;Even if you can't think of a use case within your products, it could potentially open up a whole new world for your logs and for analysis of your product performance. With the ability to analyse your data as time series and dig deep, it could really revolutionise the way you look at your product.&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>beginners</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Developer job application assessments are a two way thing</title>
      <dc:creator>Chris Hunt</dc:creator>
      <pubDate>Sun, 11 Mar 2018 21:05:04 +0000</pubDate>
      <link>https://dev.to/chrishunt/developer-job-application-assessments-are-a-two-way-thing--4m20</link>
      <guid>https://dev.to/chrishunt/developer-job-application-assessments-are-a-two-way-thing--4m20</guid>
      <description>&lt;p&gt;When looking for a new developer job, a fair amount of positions require some kind of assessment by the company in order that developers can justify their claimed skills.&lt;/p&gt;

&lt;p&gt;In my opinion, these assessments reflect on the company too. As a developer, this is an early chance to see what a company values in their developers. Having been invited to complete a number of these tests, I've seen both ends of the spectrum which adjusted my views on the company I was applying to.&lt;/p&gt;

&lt;p&gt;These assessments can be time consuming and both sides (applicant and company) should value the time put in. The applicant should be able to get a feel for the company in what they're being asked to do. It should also be challenging so that the applicant can feel some reward in completing the assessment.&lt;/p&gt;

&lt;p&gt;Very recently, I was asked to complete a 50-minute multiple choice test. Questions were generally around the syntax of code printed in a non-monospaced font without syntax highlighting. I lasted about five minutes and even fewer questions before I sent an email back to the company explaining that their job spec highlighted the need for imaginative and versatile developers but their test promoted some kind of robot-like code monkey, and that I wanted to withdraw my application. This response may have appeared arrogant to the company in question; however, if taken in the spirit it was intended, it may highlight the need for a review of their hiring process.&lt;/p&gt;

&lt;p&gt;I believe that in attempting to assess the ability of an applicant, a company can and should also impress the "spirit" of the job in that process.&lt;/p&gt;

&lt;p&gt;My preferred method is a project-style problem which allows the applicant to be imaginative and to show why they should be the top candidate for a role. It also allows for discussion in any further interviews around thought processes and techniques. These can often be lengthy (3 or 4 hours), but a quality submission can almost secure the role even before further interviewing.&lt;/p&gt;

&lt;p&gt;What do you think is the ideal way for a company to assess the standard of an applicant while promoting the position and the company?&lt;/p&gt;

</description>
      <category>recruitment</category>
      <category>discuss</category>
      <category>career</category>
    </item>
    <item>
      <title>Sharpening out of date skills</title>
      <dc:creator>Chris Hunt</dc:creator>
      <pubDate>Wed, 07 Mar 2018 18:55:50 +0000</pubDate>
      <link>https://dev.to/chrishunt/sharpening-out-of-date-skills--2e99</link>
      <guid>https://dev.to/chrishunt/sharpening-out-of-date-skills--2e99</guid>
      <description>&lt;p&gt;Following a set of unfortunate circumstances I found myself out of work for the first time in nearly 20 years. In itself this isn't a large concern however what I noticed when updating my CV, was more so - my core skills had been sorely neglected.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happened?
&lt;/h3&gt;

&lt;p&gt;Working for a company generally gives you a limited set of projects to work on and if there's no business reason to upgrade/refactor code/rewrite tests to use the latest and greatest software or library versions, that learning may not be available in the work environment. &lt;/p&gt;

&lt;p&gt;How about outside work? I've always used my personal projects to teach myself new skills, technologies and techniques. Ah yes - but those funky libraries, niche techniques and third party services which I spent hours implementing in a made-up project which was never going to see the light of day aren't what gets you employed. It's the core skills that do that and they needed some polishing.&lt;/p&gt;

&lt;p&gt;I guess that fun side projects aren't considered fun if you're using the same technologies that you use in the office every day.&lt;/p&gt;

&lt;h3&gt;
  
  
  The knowledge gaps
&lt;/h3&gt;

&lt;p&gt;I identified several areas which really concerned me, as they were the headline skills on my CV. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MongoDB was two point releases ahead of any version I'd used in projects&lt;/li&gt;
&lt;li&gt;While my PHP projects were running on PHP7, I hadn't upgraded any of the code to take advantage of PHP7 functionality&lt;/li&gt;
&lt;li&gt;Most of my front end work had been built using Bootstrap 3. Bootstrap 4 has since moved through Alpha and Beta and is now released.&lt;/li&gt;
&lt;li&gt;My projects were running in a Docker environment on EC2, but the "deployment" was still very manual&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Time for action
&lt;/h3&gt;

&lt;p&gt;I'd read plenty about PHP7, MongoDB 3.6 and Bootstrap 4 - enough to have a good idea of what was available and of the new functionality - but I'd never used it. I knew for a fact that reading alone doesn't embed knowledge. I learnt that from reading a lot about Promises, await/async functions and generators in Javascript. When I came to use them for the first time, I was floundering.&lt;/p&gt;

&lt;p&gt;I considered how to approach this knowledge upgrade and decided I needed to structure my actions in order that there was value beyond just learning for the sake of it. It was also very important to try to make this process assist my job searching. &lt;/p&gt;

&lt;p&gt;I considered the following approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identify areas for concern and improvement&lt;/li&gt;
&lt;li&gt;Prioritise these areas&lt;/li&gt;
&lt;li&gt;Read and understand each area&lt;/li&gt;
&lt;li&gt;Identify a use case for each area &lt;/li&gt;
&lt;li&gt;Action&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I won't go into my findings and discoveries for each step as they're very specific to me, but generally the process started to fall into place quite well, with some nice discoveries along the way. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I found a course on MongoDB University about the new features in MongoDB 3.6 (&lt;a href="https://university.mongodb.com/courses/M036/about" rel="noopener noreferrer"&gt;https://university.mongodb.com/courses/M036/about&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;A couple of the 3.6 features made complete sense to implement in one of my personal projects&lt;/li&gt;
&lt;li&gt;I discovered Bitbucket pipelines which allowed me to build and deploy my Docker containers straight to my server on &lt;code&gt;git tag&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Within a couple of weeks, I felt my core skills were back in good shape and I could speak with confidence about new features, not just having read about them, but having implemented solutions using many of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lessons learnt
&lt;/h3&gt;

&lt;p&gt;What's become clear to me as a developer is that I can't neglect my core skills again. Development is a fast-moving field and, as a developer, I owe it to myself to keep up to speed. &lt;/p&gt;

&lt;p&gt;Obviously, a work-life balance is important, so we can't get home from work each night and spend hours on training. I therefore think it's important to push harder in the work environment to enable learning. This obviously needs to be "sold" to the company. As I mentioned right at the start, if there's no business requirement or value in upgrading your internal products/software, it often doesn't get done.&lt;/p&gt;

&lt;p&gt;I believe there &lt;strong&gt;is&lt;/strong&gt; value to most companies to allow developers to keep up to date though:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers will generally be happier that they are using newer versions of products and that they are being allowed to learn on the job.&lt;/li&gt;
&lt;li&gt;The company is likely to be more attractive when hiring developers.&lt;/li&gt;
&lt;li&gt;New features in upgraded software/libraries may enable the possibility of increasing the feature set of the company product in which it's being used.&lt;/li&gt;
&lt;li&gt;...and obviously newer versions of products often have security and speed improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of this is "hidden" value, which will rely on the developer selling the idea to the company and on the company being forward-thinking enough to see the value.&lt;/p&gt;

&lt;p&gt;They are the kind of companies I'll be looking at as the job search continues.&lt;/p&gt;

</description>
      <category>training</category>
      <category>skills</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
