<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tryolabs</title>
    <description>The latest articles on DEV Community by Tryolabs (@tryolabs).</description>
    <link>https://dev.to/tryolabs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F74572%2F2c5840af-e13e-430b-bd51-90db0b8fa2cc.jpg</url>
      <title>DEV Community: Tryolabs</title>
      <link>https://dev.to/tryolabs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tryolabs"/>
    <language>en</language>
    <item>
      <title>Benchmarking Machine Learning Edge Devices</title>
      <dc:creator>Tryolabs</dc:creator>
      <pubDate>Tue, 15 Oct 2019 20:32:11 +0000</pubDate>
      <link>https://dev.to/tryolabs/benchmarking-machine-learning-edge-devices-2fha</link>
      <guid>https://dev.to/tryolabs/benchmarking-machine-learning-edge-devices-2fha</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published &lt;a href="https://tryolabs.com/blog/machine-learning-on-edge-devices-benchmark-report"&gt;here&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;Why edge computing?&lt;/h2&gt;

&lt;p&gt;Humans are generating and &lt;strong&gt;collecting more data than ever&lt;/strong&gt;. We have devices in our pockets that facilitate the creation of huge amounts of data, such as photos, GPS coordinates, audio, and all kinds of personal information we consciously and unconsciously reveal.&lt;/p&gt;

&lt;p&gt;Moreover, it’s not only individuals generating data for personal use; data is also being collected, often unbeknownst to us, by traffic and mobility control systems, video surveillance units, satellites, smart cars, and an ever-growing array of smart devices.&lt;/p&gt;

&lt;p&gt;This trend is here to stay and will continue to rise exponentially. In terms of data points, the &lt;a href="https://www.forbes.com/sites/tomcoughlin/2018/11/27/175-zettabytes-by-2025/#6c72772d5459"&gt;International Data Corporation (IDC)&lt;/a&gt; predicts that the collective sum of the world’s data will grow from 33 zettabytes (ZB) in 2019 to 175 ZB by 2025, an &lt;strong&gt;annual growth rate of 61%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;While we’ve been processing data, first in data centers and then in the cloud, these solutions are not suitable for highly demanding tasks with large data volumes. &lt;strong&gt;Network capacity and speed are pushed to the limit&lt;/strong&gt; and new solutions are required. This is the beginning of the era of edge computing and &lt;strong&gt;edge devices&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://tryolabs.com/blog/machine-learning-on-edge-devices-benchmark-report/#results-analysis"&gt;this report&lt;/a&gt;, we'll &lt;strong&gt;benchmark five novel edge devices&lt;/strong&gt;, using different frameworks and models, to see which combinations perform best. In particular, we'll focus on performance outcomes for &lt;strong&gt;machine learning on the edge&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Want to jump directly to the performance outcomes? See the results in the interactive dashboard of &lt;a href="https://tryolabs.com/blog/machine-learning-on-edge-devices-benchmark-report/#results-analysis"&gt;the benchmark report&lt;/a&gt;!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;What is edge computing?&lt;/h2&gt;

&lt;p&gt;Edge computing consists of delegating data processing tasks to devices on the edge of the network, as close as possible to the data sources. This enables real-time data processing at a very high speed, which is a must for complex IoT solutions with machine learning capabilities. On top of that, it mitigates network limitations, reduces energy consumption, increases security, and improves data privacy.&lt;/p&gt;

&lt;p&gt;Under this new paradigm, the combination of specialized hardware and software libraries optimized for machine learning on the edge results in cutting-edge applications and products ready for mass deployment.&lt;/p&gt;

&lt;p&gt;The biggest challenges to building these amazing applications are posed by audio, video, and &lt;a href="https://tryolabs.com/resources/introductory-guide-computer-vision/#distinguishing-computer-vision-from-related-fields"&gt;image processing&lt;/a&gt; tasks. &lt;a href="https://tryolabs.com/blog/2018/12/19/major-advancements-deep-learning-2018/"&gt;Deep learning techniques&lt;/a&gt; have proven to be highly successful in overcoming these difficulties.&lt;/p&gt;

&lt;h2&gt;Enabling deep learning on the edge&lt;/h2&gt;

&lt;p&gt;As an example, let’s take self-driving cars. Here, you need to quickly and consistently analyze incoming data, in order to &lt;a href="https://blogs.nvidia.com/blog/2019/05/07/self-driving-cars-make-decisions/"&gt;decipher the world around you and take action within a few milliseconds&lt;/a&gt;. Addressing that time constraint is why we cannot rely on the cloud to process the stream of data but instead must do it locally.&lt;/p&gt;

&lt;p&gt;The downside of doing it locally is that the hardware is not as powerful as a supercomputer in the cloud, yet we cannot compromise on accuracy or speed.&lt;/p&gt;

&lt;p&gt;The solution to this is either stronger, more efficient hardware, or less complex deep neural networks. To obtain the best results, a balance of the two is essential.&lt;/p&gt;

&lt;p&gt;Therefore, the real question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which edge hardware and what type of network should we bring together in order to maximize the accuracy and speed of deep learning algorithms?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our quest to identify the optimal combination of the two, we compared several state-of-the-art edge devices in combination with different deep neural network models.&lt;/p&gt;

&lt;h2&gt;Benchmarking novel edge devices&lt;/h2&gt;

&lt;p&gt;Based on what we think is the most innovative use case, we set out to measure inference throughput in real-time via a one-at-a-time image classification task, so as to get an approximate frames-per-second score.&lt;/p&gt;

&lt;p&gt;To accomplish this, we evaluated top-1 inference accuracy on a specific subset of &lt;a href="https://github.com/modestyachts/ImageNetV2"&gt;ImageNet V2&lt;/a&gt;, comparing several &lt;a href="http://cs231n.github.io/convolutional-networks/"&gt;ConvNet&lt;/a&gt; models and, where possible, different frameworks and optimized versions.&lt;/p&gt;
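&lt;p&gt;As a rough illustration of the measurement above, the per-image timing loop can be sketched as follows. This is an illustrative outline only, not the actual benchmark harness; the function names and data layout are assumptions.&lt;/p&gt;

```python
import time

def benchmark(model_fn, images):
    """Time single-image inference and report mean latency and FPS.

    model_fn is any callable that classifies one image; it stands in
    for a real model's predict function.
    """
    latencies = []
    for image in images:
        start = time.perf_counter()
        model_fn(image)  # one-at-a-time inference, as in the benchmark
        latencies.append(time.perf_counter() - start)
    mean_latency = sum(latencies) / len(latencies)
    return {"mean_latency_s": mean_latency, "fps": 1.0 / mean_latency}
```

&lt;p&gt;Averaging single-image latencies, rather than batching, matches the real-time, frame-by-frame use case the benchmark targets.&lt;/p&gt;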

&lt;h3&gt;Hardware accelerators&lt;/h3&gt;

&lt;p&gt;Much effort has been invested over the last few years in improving existing edge hardware. We chose to experiment with these edge devices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developer.nvidia.com/embedded/jetson-nano-developer-kit"&gt;Nvidia Jetson Nano&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coral.withgoogle.com/products/dev-board/"&gt;Google Coral Dev Board&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://software.intel.com/en-us/neural-compute-stick"&gt;Intel Neural Compute Stick&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.raspberrypi.org/products/raspberry-pi-4-model-b/"&gt;Raspberry Pi&lt;/a&gt; (upper bound reference)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nvidia.com/en-us/geforce/graphics-cards/rtx-2080-ti/"&gt;2080ti NVIDIA GPU&lt;/a&gt; (lower bound reference)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We included the Raspberry Pi and the Nvidia 2080ti so as to be able to compare the tested hardware against well-known systems, one edge-based and one cloud-based.&lt;/p&gt;

&lt;p&gt;The lower bound was a no-brainer. At &lt;a href="https://tryolabs.com/"&gt;Tryolabs&lt;/a&gt;, we design and train our own deep learning models. Because of this, we have a lot of computing power at our disposal. So, we used it. To set this lower bound on inference times, we ran the tests on a 2080ti NVIDIA GPU. However, because we were only going to use it as a reference point, we ran the tests using basic models, with no optimizations.&lt;/p&gt;

&lt;p&gt;For the upper bound, we went with the defending champion, the most popular single-board computer: the &lt;a href="https://www.raspberrypi.org/products/raspberry-pi-3-model-b/"&gt;Raspberry Pi 3B&lt;/a&gt;. &lt;/p&gt;

&lt;h3&gt;Neural network models&lt;/h3&gt;

&lt;p&gt;There are two main networks we wanted to include in this benchmark: the old, well-known, seasoned &lt;a href="https://arxiv.org/abs/1512.03385"&gt;ResNet-50&lt;/a&gt; and the novel &lt;a href="https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html"&gt;EfficientNets&lt;/a&gt; released by Google this year.&lt;/p&gt;

&lt;p&gt;For all benchmarks, we used publicly available pre-trained models, which we ran with different frameworks. On the Nvidia Jetson, we tried the &lt;a href="https://developer.nvidia.com/tensorrt"&gt;TensorRT&lt;/a&gt; optimization; for the Raspberry Pi, we used &lt;a href="https://www.tensorflow.org"&gt;TensorFlow&lt;/a&gt; and &lt;a href="https://pytorch.org"&gt;PyTorch&lt;/a&gt; variants; for Coral devices, we implemented the Edge TPU engine versions of the S, M, and L EfficientNet models; and for Intel devices, we used ResNet-50 compiled with the &lt;a href="https://docs.openvinotoolkit.org"&gt;OpenVINO&lt;/a&gt; Toolkit.&lt;/p&gt;

&lt;h3&gt;The dataset&lt;/h3&gt;

&lt;p&gt;Because all models were trained on the &lt;a href="http://people.csail.mit.edu/ludwigs/papers/imagenet.pdf"&gt;ImageNet&lt;/a&gt; dataset, we used &lt;a href="https://github.com/modestyachts/ImageNetV2"&gt;ImageNet V2&lt;/a&gt; MatchedFrequency, which consists of 10,000 images across 1,000 categories.&lt;/p&gt;

&lt;p&gt;We ran the inference on each image once, saved the inference time, and then found the average. We calculated the &lt;em&gt;top-1 accuracy&lt;/em&gt; from all tests, as well as the &lt;em&gt;top-5 accuracy&lt;/em&gt; for certain models.&lt;/p&gt;

&lt;p&gt;Top-1 accuracy: this is conventional accuracy, meaning that the model’s answer (the one with the highest probability) must equal the exact expected answer.&lt;/p&gt;

&lt;p&gt;Top-5 accuracy: means that &lt;em&gt;any&lt;/em&gt; one of the model’s top five highest-probability answers must match the expected answer.&lt;/p&gt;
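&lt;p&gt;The two metrics above can be computed from the raw predictions as follows. This sketch is for illustration and is not the benchmark's actual code; the function name and data layout are assumptions.&lt;/p&gt;

```python
def top_k_accuracy(probabilities, labels, k=1):
    """Fraction of samples whose true label appears among the k
    highest-scoring predictions.

    probabilities: one list of per-class scores per sample.
    labels: the true class index for each sample.
    """
    hits = 0
    for scores, label in zip(probabilities, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

&lt;p&gt;With &lt;code&gt;k=1&lt;/code&gt; this reduces to conventional accuracy; with &lt;code&gt;k=5&lt;/code&gt; it gives the top-5 metric described above.&lt;/p&gt;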

&lt;p&gt;Something to keep in mind when comparing the results: for fast device-model combinations, we ran the tests on the entire dataset, whereas for the slower combinations we used only part of it.&lt;/p&gt;

&lt;h2&gt;Results &amp;amp; analysis&lt;/h2&gt;

&lt;p&gt;The interactive dashboards display the metrics obtained from the experiments. Due to the large difference in inference times across models and devices, the metrics are shown on a logarithmic scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tryolabs.com/blog/machine-learning-on-edge-devices-benchmark-report/#results-analysis"&gt;GO TO DASHBOARDS&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>edgecomputing</category>
      <category>benchmark</category>
    </item>
    <item>
      <title>How we used IoT and computer vision to build a stand-in robot for remote workers</title>
      <dc:creator>Tryolabs</dc:creator>
      <pubDate>Thu, 13 Dec 2018 15:52:18 +0000</pubDate>
      <link>https://dev.to/tryolabs/how-we-built-a-stand-in-robot-for-remote-workers-using-iot-and-computer-vision-350b</link>
      <guid>https://dev.to/tryolabs/how-we-built-a-stand-in-robot-for-remote-workers-using-iot-and-computer-vision-350b</guid>
      <description>&lt;p&gt;&lt;em&gt;By Lucas Micol. The article has originally been published &lt;a href="https://tryolabs.com/blog/hackathon-robot-remote-work-iot-computer-vision/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With clients and partners located around the globe, we've always had a culture of remote collaboration here at Tryolabs. We are used to joining meetings no matter where we are, using tools such as &lt;a href="https://slack.com/" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://hangouts.google.com/webchat/start" rel="noopener noreferrer"&gt;Google Hangouts&lt;/a&gt;, and &lt;a href="https://zoom.us/" rel="noopener noreferrer"&gt;Zoom&lt;/a&gt;. A sweet consequence of this is a generous work-from-home policy, which lets us work remotely whenever we want.&lt;/p&gt;

&lt;p&gt;Trouble is, working remotely leads to us missing out on all the fun that takes place outside of meetings when we’re not connected. Wouldn’t it be cool to have a robot representing us at the office, showing us what goes on while we’re not there?&lt;/p&gt;

&lt;p&gt;As a group of computer vision, IoT, and full-stack specialists, we got really enthusiastic about the idea and went on a mission to create a robot that could be remotely controlled from home and show us what takes place at the office.&lt;/p&gt;

&lt;p&gt;We spent the 2018 edition of the Tryolabs Hackathon facing that challenge with only three rules in place:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;⏰ Hours to complete the project: 48&lt;/li&gt;
&lt;li&gt;👫 Number of team members: 4&lt;/li&gt;
&lt;li&gt;☕️ Coffee provided: Unlimited&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fid9qrvvxuhmu5c689gvp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fid9qrvvxuhmu5c689gvp.jpg" alt="Hackathon team: Joaquín, Braulio, Javier and Lucas."&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Hackathon team: Joaquín, Braulio, Javier and Lucas.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Building the hardware of a mini-robot&lt;/h2&gt;

&lt;p&gt;It all started with the design of the mini-robot’s mechanical structure, which would stand in at the office while the remote worker is away.&lt;/p&gt;

&lt;p&gt;To move around the office easily, the robot had to be mobile, stable, small enough to pass through doors, and big enough not to be overlooked and trampled on by the team working at the office.&lt;/p&gt;

&lt;p&gt;We went through several iterations of its structural design before settling on this one:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F5zsada03krxrn07lbe1b.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F5zsada03krxrn07lbe1b.jpg" alt="Sketch drawn during hackathon to define main hardware components of the robot and its communication with the remote worker."&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Sketch drawn during hackathon to define main hardware components of the robot and its communication with the remote worker.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We chose aluminum as the main material for the components since it's light, robust, and cheap.&lt;/p&gt;

&lt;p&gt;Once we defined the design and selected the materials, we cut the aluminum parts and put them together with small screws. Since we had to work with the tools available at the office and from the store around the corner, this was a rather improvised and humorous process. We used heavy books to shape the aluminum and sunglasses as safety glasses while drilling into the components, just to give you an idea. 🙈&lt;/p&gt;

&lt;p&gt;The main hardware components we settled on were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Layers of aluminum sheets to build the structural backbone&lt;/li&gt;
&lt;li&gt;Screws&lt;/li&gt;
&lt;li&gt;RaspberryPi&lt;/li&gt;
&lt;li&gt;PiCamera&lt;/li&gt;
&lt;li&gt;1 Servo motor SG90&lt;/li&gt;
&lt;li&gt;H-bridge to control the motors&lt;/li&gt;
&lt;li&gt;2 DC motors&lt;/li&gt;
&lt;li&gt;2 wheels&lt;/li&gt;
&lt;li&gt;Swivel Casters&lt;/li&gt;
&lt;li&gt;Wire&lt;/li&gt;
&lt;li&gt;PowerBank&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Enabling the robot to communicate in real-time&lt;/h2&gt;

&lt;p&gt;While some of us continued working on the hardware and assembling the pieces, the rest of the team started building the software that would control all the components mentioned above.&lt;/p&gt;

&lt;h3&gt;Implementing WebRTC&lt;/h3&gt;

&lt;p&gt;The aim of the robot's software was to enable real-time communication between the remote workers and the teams at the office. In other words, the robot needed to be able to transmit video and audio from the office to the people working remotely and vice versa.&lt;/p&gt;

&lt;p&gt;While evaluating various approaches to solving this problem, we came across &lt;a href="https://webrtc.org/" rel="noopener noreferrer"&gt;WebRTC&lt;/a&gt;, which promised to be the tool we were looking for: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;WebRTC is ideal for telepresence, intercom, VoIP software in general as it has a very powerful standard and modern protocol which has a number of features and is compatible with various browsers, including Firefox, Chrome, Opera, etc.&lt;/p&gt;

&lt;p&gt;The WebRTC extension for the UV4L Streaming Server allows for streaming of multimedia content from audio, video, and data sources in real-time as defined by the WebRTC protocol.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Specifically, we used the &lt;a href="https://www.linux-projects.org/uv4l/webrtc-extension/" rel="noopener noreferrer"&gt;WebRTC extension&lt;/a&gt; included in &lt;a href="https://www.linux-projects.org/uv4l/" rel="noopener noreferrer"&gt;UV4L&lt;/a&gt;. This tool allowed us to create bidirectional communication with extremely low latency between the robot and the remote worker’s computer.&lt;/p&gt;

&lt;p&gt;Running the UV4L server with the WebRTC extension enabled, we were able to serve a web app from the RaspberryPi and simply access it from the remote worker’s browser, establishing real-time bidirectional communication. Amazing!&lt;/p&gt;

&lt;p&gt;This allowed us to set up a unidirectional channel for the video from the PiCamera to the browser, a bidirectional channel for the audio, and an extra unidirectional channel to send the commands from the browser to the robot.&lt;/p&gt;
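&lt;p&gt;To make the command channel concrete, here is a minimal, hypothetical message codec. The post doesn't show the actual wire format, so the JSON scheme and command names below are assumptions for illustration.&lt;/p&gt;

```python
import json

# Hypothetical command vocabulary; the real protocol may differ.
VALID_COMMANDS = {"forward", "backward", "left", "right", "stop",
                  "camera_up", "camera_down"}

def encode_command(name):
    """Serialize a command for the browser-to-robot data channel."""
    if name not in VALID_COMMANDS:
        raise ValueError("unknown command: %s" % name)
    return json.dumps({"cmd": name})

def decode_command(message):
    """Parse and validate a message received on the robot side."""
    cmd = json.loads(message)["cmd"]
    if cmd not in VALID_COMMANDS:
        raise ValueError("unknown command: %s" % cmd)
    return cmd
```

&lt;p&gt;Validating on both ends keeps a typo in the front-end from silently reaching the motor controller.&lt;/p&gt;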

&lt;h3&gt;Building a UI to manage communication&lt;/h3&gt;

&lt;p&gt;To let the remote worker see the data and send commands in a user-friendly way, we researched how to integrate those functionalities into an accessible and practical front-end.&lt;/p&gt;

&lt;p&gt;Inspired by the &lt;a href="https://www.linux-projects.org/uv4l/tutorials/custom-webapp-with-face-detection/" rel="noopener noreferrer"&gt;web app example&lt;/a&gt; from the UV4L project, we integrated the data channels mentioned above into a basic but functional front-end, including the following components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;index.html:&lt;/em&gt; the HTML5 page, which contains the UI elements (mainly the &lt;em&gt;video&lt;/em&gt;) to show the incoming stream and the &lt;em&gt;canvas&lt;/em&gt; to show the pose estimation key points&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;main.js:&lt;/em&gt; defines the callbacks triggered by user actions like “start streaming”, "load net", "toggle pose estimation", etc&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;signalling.js:&lt;/em&gt; implements the &lt;a href="https://www.linux-projects.org/webrtc-signalling/" rel="noopener noreferrer"&gt;WebRTC signaling protocol over WebSocket&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/ecI1tK6ILns"&gt;
&lt;/iframe&gt;
&lt;br&gt;
&lt;em&gt;Time lapse shot during the hackathon.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Remotely control the robot's movements&lt;/h2&gt;

&lt;p&gt;To handle the movement commands the robot would receive from the remote worker, we developed a controller, written in Python, that runs as a system service. This service translates incoming commands into motor actions by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setting the pins, connected to the H-bridge wheel motors, to &lt;a href="https://learn.sparkfun.com/tutorials/raspberry-gpio/python-rpigpio-api" rel="noopener noreferrer"&gt;high or low&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Establishing the &lt;a href="http://abyz.me.uk/rpi/pigpio/python.html#hardware_PWM" rel="noopener noreferrer"&gt;PWM frequency and duty-cycle&lt;/a&gt; for the Servo, which adjusts the PiCamera's orientation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's a snippet of the controller classes: &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class MotorsWheels:

    def __init__(
            self, r_wheel_forward=6, r_wheel_backward=13, l_wheel_forward=19, l_wheel_backward=26):
                self.r_wheel_forward = r_wheel_forward
                ...
                GPIO.setmode(GPIO.BCM)
        GPIO.setup(r_wheel_forward, GPIO.OUT)
                GPIO.setup(r_wheel_backward, GPIO.OUT)
                ...
                # Turn all motors off
        GPIO.output(r_wheel_forward, GPIO.LOW)
        GPIO.output(r_wheel_backward, GPIO.LOW)

    def _spin_right_wheel_forward(self):
        GPIO.output(self.r_wheel_forward, GPIO.HIGH)
        GPIO.output(self.r_wheel_backward, GPIO.LOW)

    def _stop_right_wheel(self):
        GPIO.output(self.r_wheel_backward, GPIO.LOW)
        GPIO.output(self.r_wheel_forward, GPIO.LOW)

    def go_fw(self):
        self._spin_left_wheel_forward()
        self._spin_right_wheel_forward()

class ServoCamera:
    CENTER = 40000
    UP_LIMIT = 80000
    DOWN_LIMIT = 30000
    STEP = 5000

    def __init__(self, servo=18, freq=50):
        self.servo = servo
        self.freq = freq
        self.pi = pigpio.pi()

        self.angle = self.CENTER
        self._set_angle()

    def _set_angle(self):
        self.pi.hardware_PWM(self.servo, self.freq, self.angle)

    def up(self):
        if self.angle + self.STEP &amp;lt; self.UP_LIMIT:
            self.angle += self.STEP
            self._set_angle()

    def down(self):
        if self.angle - self.STEP &amp;gt; self.DOWN_LIMIT:
            self.angle -= self.STEP
            self._set_angle()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
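&lt;p&gt;On the receiving end, the service can route incoming commands to the controller classes above with a simple dispatch table. The mapping below is an illustrative assumption, not the original service code.&lt;/p&gt;

```python
class CommandDispatcher:
    """Map command strings to controller method calls.

    wheels and camera are expected to expose methods like those in the
    snippet above (go_fw, up, down); the dispatch table itself is a
    hypothetical sketch.
    """
    def __init__(self, wheels, camera):
        self.handlers = {
            "forward": wheels.go_fw,
            "camera_up": camera.up,
            "camera_down": camera.down,
        }

    def handle(self, command):
        handler = self.handlers.get(command)
        if handler is None:
            raise ValueError("unsupported command: %s" % command)
        handler()
```

&lt;p&gt;Keeping the mapping in one place makes it easy to add new commands without touching the GPIO code.&lt;/p&gt;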

&lt;p&gt;As a result, we were able to control the robot, “walk” it through the office, and enable remote workers to see their teams and approach them via the robot.&lt;/p&gt;

&lt;p&gt;However, this wasn’t enough for our enthusiastic team, and we continued to pursue the ultimate goal: an autonomous robot.&lt;/p&gt;

&lt;h2&gt;Adding computer vision to the robot&lt;/h2&gt;

&lt;p&gt;We thought, wouldn't it be awesome if the robot could recognize people and react to their gestures and actions (and in this way have a certain amount of personality)?&lt;/p&gt;

&lt;p&gt;A recently released project called &lt;a href="https://github.com/tensorflow/tfjs-models/tree/master/posenet" rel="noopener noreferrer"&gt;PoseNet&lt;/a&gt; quickly surfaced. It’s presented as a &lt;a href="https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5" rel="noopener noreferrer"&gt;"machine learning model, which allows for real-time human pose estimation in the browser"&lt;/a&gt;. So, we dug deeper into that.&lt;/p&gt;

&lt;p&gt;The performance of that neural net was astounding, and running it with TensorFlow.js in the browser was really attractive: we got higher accuracy and a higher FPS rate than by running it on the RaspberryPi, and lower latency than if we had run it on a third server.&lt;/p&gt;

&lt;p&gt;Rushed by the parameters of the hackathon, we skimmed the project’s documentation and &lt;a href="https://storage.googleapis.com/tfjs-models/demos/posenet/camera.html" rel="noopener noreferrer"&gt;demo web app&lt;/a&gt; source code. Once we identified which files we were going to need, we imported them and immediately jumped to integrating these functionalities into our web app.&lt;/p&gt;

&lt;p&gt;We wrote a basic &lt;code&gt;detectBody&lt;/code&gt; function to infer the pose estimation key points, which invoked &lt;code&gt;net.estimateMultiplePoses&lt;/code&gt; with these params:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function detectBody(canvas, net) {
    if (net){
        var ctx = canvas.getContext('2d');
        var imageElement = ctx.getImageData(0, 0, canvas.width, canvas.height);

        var imageScaleFactor = 0.3;
        var flipHorizontal = false;
        var outputStride = 16;
        var maxPoseDetections = 2;
        var poses = await net.estimateMultiplePoses(
            imageElement,
            imageScaleFactor,
            flipHorizontal,
            outputStride,
            maxPoseDetections
        )
        return poses;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This &lt;code&gt;detectBody&lt;/code&gt; function was invoked three times per second to refresh the pose estimation key points.&lt;/p&gt;

&lt;p&gt;Then, we adapted some utility functions to draw the detected body key points and overlay the skeleton on the video, arriving at a demo like this:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/3DhSR67Uj4Q"&gt;
&lt;/iframe&gt;
&lt;br&gt;
&lt;em&gt;Soledad and Lucas showing off pose detection algorithm.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This was a very quick proof of concept which added a wonderful feature and hugely expanded the potential capabilities of our robot.&lt;/p&gt;

&lt;p&gt;If you’d like to know how this model works under the hood, you can read more &lt;a href="https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;48 hours and an unknown amount of coffee led to the construction of a mini-robot with the ability to walk through the office, enable real-time communication between the remote worker and their office mates, and even transport an LP. 😜&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/ujTBNP5BuRQ"&gt;
&lt;/iframe&gt;
&lt;br&gt;
&lt;em&gt;Stand-in robot walking through the office, controlled by a remote worker.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/lKbGat8Bfus"&gt;
&lt;/iframe&gt;
&lt;br&gt;
&lt;em&gt;Interface that shows the remote worker in a browser how the robot is controlled.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We managed to build the hardware, implement the communication software, and build a PoC for an additional feature using computer vision, which facilitates the robot's interaction with people. Future enhancements could include &lt;a href="https://tryolabs.com/blog/2017/08/30/object-detection-an-overview-in-the-age-of-deep-learning/" rel="noopener noreferrer"&gt;object detection&lt;/a&gt; features, built for example with &lt;a href="https://github.com/tryolabs/luminoth" rel="noopener noreferrer"&gt;Luminoth&lt;/a&gt;, our open source computer vision toolkit, that would allow the robot to recognize objects and interact with them without human help.&lt;/p&gt;

&lt;p&gt;Though we normally prototype for longer than two days, this hackathon project reflects how we work at Tryolabs. We often build prototypes and solutions with state-of-the-art technologies to enhance operational and organizational processes.&lt;/p&gt;

&lt;p&gt;Thinking of a robot for your business? &lt;a href="https://tryolabs.com/" rel="noopener noreferrer"&gt;Get in touch with us!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>computervision</category>
      <category>iot</category>
      <category>remotework</category>
    </item>
    <item>
      <title>Getting Started with AWS: Open Source Workshop</title>
      <dc:creator>Tryolabs</dc:creator>
      <pubDate>Tue, 10 Jul 2018 15:11:00 +0000</pubDate>
      <link>https://dev.to/tryolabs/getting-started-with-aws-open-source-workshop-757</link>
      <guid>https://dev.to/tryolabs/getting-started-with-aws-open-source-workshop-757</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;One of our strengths at &lt;a href="https://tryolabs.com/"&gt;Tryolabs&lt;/a&gt; is that we have people coming from diverse technological backgrounds. To make sure that everyone who joins the company, no matter their previous experience, can get up to speed developing apps with the stack we usually use, we have an extensive onboarding process that involves building a real application (frontend, backend, and some data science), with a coach, code reviews, and iterative improvements.&lt;/p&gt;

&lt;h2&gt;Why use cloud services?&lt;/h2&gt;

&lt;p&gt;An interesting skill that is sometimes neglected (especially among junior devs) is putting apps into production in a robust and scalable way. We want our apps to be highly available (meaning they are very rarely &lt;em&gt;down&lt;/em&gt;) and to have mechanisms to tolerate the load of concurrent users without crashing or slowing down.&lt;/p&gt;

&lt;p&gt;To do this, we use cloud providers, such as &lt;a href="https://aws.amazon.com/"&gt;Amazon Web Services (AWS)&lt;/a&gt;, &lt;a href="https://cloud.google.com/"&gt;Google Cloud Platform&lt;/a&gt;, or &lt;a href="https://azure.microsoft.com/"&gt;Microsoft Azure&lt;/a&gt;. Each provider offers its own services to make the deployment of our apps easier. Some, for example, offer different types of databases as a service, so we don't need to handle database administration or scaling. Scalable file storage solutions are very common, too.&lt;/p&gt;

&lt;p&gt;It is interesting to note that the services offered by these cloud providers can affect your app's design (e.g., the database you use). Highly scalable apps that were very difficult to develop just a few years ago are now almost trivial to build.&lt;/p&gt;

&lt;h2&gt;How to get started with AWS?&lt;/h2&gt;

&lt;p&gt;So, given the importance of cloud providers and AWS in particular, we've created – and are now open sourcing – a guide we call &lt;a href="https://github.com/tryolabs/aws-workshop"&gt;AWS Workshop&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this workshop, a developer learns how to deploy a demo application using several services available in the AWS stack. As the demo app to be deployed, we chose Conduit, an open source test application that is handy for learning new frameworks because the same app is implemented in multiple backend and frontend frameworks. In particular, we use the version built with a &lt;a href="https://reactjs.org/"&gt;React&lt;/a&gt; frontend and a &lt;a href="https://www.djangoproject.com/"&gt;Django&lt;/a&gt; + &lt;a href="http://www.django-rest-framework.org/"&gt;Django-Rest-Framework&lt;/a&gt; backend, which most closely resembles the technologies we use in several of our projects.&lt;/p&gt;

&lt;p&gt;In the workshop, developers learn how to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set up users in an AWS account.&lt;/li&gt;
&lt;li&gt;Deploy a website on &lt;a href="https://aws.amazon.com/s3/"&gt;S3&lt;/a&gt;, a backend on &lt;a href="https://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt;, and a database on &lt;a href="https://aws.amazon.com/rds/"&gt;RDS&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Set up &lt;a href="https://aws.amazon.com/elasticloadbalancing/"&gt;Load Balancing&lt;/a&gt; and &lt;a href="https://aws.amazon.com/autoscaling/"&gt;Auto Scaling Groups&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Set up a &lt;a href="https://aws.amazon.com/vpc/"&gt;VPC&lt;/a&gt; and a &lt;a href="https://docs.aws.amazon.com/quickstart/latest/linux-bastion/architecture.html"&gt;bastion instance&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Deploy the application using &lt;a href="https://aws.amazon.com/elasticbeanstalk/"&gt;Elastic Beanstalk&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Are you new to AWS and want to use its essential services to deploy your app? Get started with the open source workshop right &lt;a href="https://github.com/tryolabs/aws-workshop"&gt;here&lt;/a&gt;!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the future, we'll add other services, such as a deployment using &lt;a href="https://aws.amazon.com/lambda/"&gt;AWS Lambda&lt;/a&gt; (pull requests are welcome!).&lt;/p&gt;

&lt;p&gt;This article was originally published &lt;a href="https://tryolabs.com/blog/2018/06/21/getting-started-with-aws-open-source-workshop/"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>deployment</category>
      <category>workshop</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Luminoth 0.1: Open source Computer Vision toolkit</title>
      <dc:creator>Tryolabs</dc:creator>
      <pubDate>Mon, 28 May 2018 21:00:51 +0000</pubDate>
      <link>https://dev.to/tryolabs/luminoth-open-source-computer-vision-toolkit-4dhe</link>
      <guid>https://dev.to/tryolabs/luminoth-open-source-computer-vision-toolkit-4dhe</guid>
<description>&lt;p&gt;&lt;a href="https://github.com/tryolabs/luminoth" rel="noopener noreferrer"&gt;Luminoth&lt;/a&gt; is an open-source computer vision toolkit, built upon Tensorflow and Sonnet. We just released a new version, so this is as good a time as any to dive into it!&lt;/p&gt;

&lt;p&gt;Version 0.1 brings several very exciting improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An &lt;strong&gt;implementation of the &lt;a href="https://arxiv.org/abs/1512.02325" rel="noopener noreferrer"&gt;Single Shot Multibox Detector (SSD)&lt;/a&gt; model&lt;/strong&gt; was added, a much faster (although less accurate) object detector than the already-included &lt;a href="https://arxiv.org/abs/1506.01497" rel="noopener noreferrer"&gt;Faster R-CNN&lt;/a&gt;. This makes it possible to perform object detection in real time on most modern GPUs, enabling the processing of, for instance, video streams.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some &lt;strong&gt;tweaks to the Faster R-CNN model&lt;/strong&gt;, as well as a new base configuration, making it reach results comparable to other existing implementations when training on the COCO and Pascal datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Checkpoints for both SSD and Faster R-CNN models&lt;/strong&gt; are now provided, trained on the Pascal and COCO datasets, respectively, and providing state-of-the-art results. This makes performing object detection in an image extremely straightforward, as these checkpoints will be downloaded automatically by the library, even when just using the command-line interface.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;General &lt;strong&gt;usability improvements&lt;/strong&gt;, such as a cleaner command-line interface for most commands, as well as supporting videos on prediction, and a redesign of the included web frontend to easily play around with the models.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll now explore each of these features through examples, by incrementally building our own detector.&lt;/p&gt;
&lt;h2&gt;
  
  
  First things first: testing it out
&lt;/h2&gt;

&lt;p&gt;First of all, of course, we should install Luminoth. Inside your virtualenv, run:&lt;/p&gt;

&lt;pre&gt;&lt;span&gt;&lt;code&gt;$ pip install luminoth&lt;/code&gt;&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;(N.B.: If you have a GPU available and want to use it, run &lt;code&gt;pip install tensorflow-gpu&lt;/code&gt; first, and then the above command.)&lt;/p&gt;
&lt;p&gt;
Since the addition of the checkpoint functionality, we now offer pre-trained models for both Faster R-CNN and SSD out of the box. Effectively, this means that by issuing a couple of commands, you can download a fully-trained object detection model for your use. Let’s start by refreshing the checkpoint repository using Luminoth’s CLI tool, &lt;code&gt;lumi&lt;/code&gt;:
&lt;/p&gt;
&lt;pre&gt;$ lumi checkpoint refresh
Retrieving remote index... done.
2 new remote checkpoints added.
$ lumi checkpoint list
================================================================================
|           id |                  name |       alias | source |         status |
================================================================================
| 48ed2350f5b2 |   Faster R-CNN w/COCO |    accurate | remote | NOT_DOWNLOADED |
| e3256ffb7e29 |      SSD w/Pascal VOC |        fast | remote | NOT_DOWNLOADED |
================================================================================
&lt;/pre&gt;
&lt;p&gt;The output shows all the available pre-trained checkpoints. Each checkpoint is identified with the &lt;code&gt;id&lt;/code&gt; field (in this example, &lt;code&gt;48ed2350f5b2&lt;/code&gt; and &lt;code&gt;e3256ffb7e29&lt;/code&gt;) and with a possible &lt;code&gt;alias&lt;/code&gt; (e.g., &lt;code&gt;accurate&lt;/code&gt; and &lt;code&gt;fast&lt;/code&gt;). You can check other information with the command &lt;code&gt;lumi checkpoint detail &amp;lt;checkpoint_id_or_alias&amp;gt;&lt;/code&gt;. We're going to try out the Faster R-CNN checkpoint, so we'll download it (by using the alias instead of the ID) and then use the &lt;code&gt;lumi predict&lt;/code&gt; command:&lt;/p&gt;

&lt;pre&gt;$ lumi checkpoint download accurate
Downloading checkpoint...  [####################################]  100%
Importing checkpoint... done.
Checkpoint imported successfully.
$ lumi predict image.jpg
Found 1 files to predict.
Neither checkpoint nor config specified, assuming `accurate`.
Predicting image.jpg... done.
{
  "file": "image.jpg",
  "objects": [
    {"bbox": [294, 231, 468, 536], "label": "person", "prob": 0.9997},
    {"bbox": [494, 289, 578, 439], "label": "person", "prob": 0.9971},
    {"bbox": [727, 303, 800, 465], "label": "person", "prob": 0.997},
    {"bbox": [555, 315, 652, 560], "label": "person", "prob": 0.9965},
    {"bbox": [569, 425, 636, 600], "label": "bicycle", "prob": 0.9934},
    {"bbox": [326, 410, 426, 582], "label": "bicycle", "prob": 0.9933},
    {"bbox": [744, 380, 784, 482], "label": "bicycle", "prob": 0.9334},
    {"bbox": [506, 360, 565, 480], "label": "bicycle", "prob": 0.8724},
    {"bbox": [848, 319, 858, 342], "label": "person", "prob": 0.8142},
    {"bbox": [534, 298, 633, 473], "label": "person", "prob": 0.4089}
  ]
}&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;lumi predict&lt;/code&gt; command defaults to using the checkpoint with alias &lt;code&gt;accurate&lt;/code&gt;, but we could specify otherwise with the option &lt;code&gt;--checkpoint=&amp;lt;alias_or_id&amp;gt;&lt;/code&gt;. The prediction above took about 30 seconds on a modern CPU; drawn over the image, the detections look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Faqtczuhu3lgjoanj2gya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Faqtczuhu3lgjoanj2gya.png" title="People and bikes detected with the Faster R-CNN model."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also write the JSON output to a file (through the &lt;code&gt;--output&lt;/code&gt; or &lt;code&gt;-f&lt;/code&gt; option) and make Luminoth store the image with the bounding boxes drawn (through the &lt;code&gt;--save-media-to&lt;/code&gt; or the &lt;code&gt;-d&lt;/code&gt; option).&lt;/p&gt;
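&lt;p&gt;Since the predictions are plain JSON, they are easy to post-process with a few lines of Python. As a small sketch (the sample prediction below is hypothetical, in the same shape as the output above), this filters out low-confidence detections and counts objects per label:&lt;/p&gt;

```python
import json
from collections import Counter

# Hypothetical prediction in the same shape that `lumi predict` writes.
prediction = json.loads("""
{
  "file": "image.jpg",
  "objects": [
    {"bbox": [294, 231, 468, 536], "label": "person", "prob": 0.9997},
    {"bbox": [569, 425, 636, 600], "label": "bicycle", "prob": 0.9934},
    {"bbox": [534, 298, 633, 473], "label": "person", "prob": 0.4089}
  ]
}
""")

# Keep only detections above a confidence threshold.
confident = [o for o in prediction["objects"] if o["prob"] >= 0.5]
counts = Counter(o["label"] for o in confident)
print(counts)  # Counter({'person': 1, 'bicycle': 1})
```

&lt;p&gt;The same approach works on files written with &lt;code&gt;--output&lt;/code&gt;: load them with &lt;code&gt;json.load&lt;/code&gt; and filter at whatever threshold suits your application.&lt;/p&gt;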
&lt;h2&gt;
  
  
  Now in real-time!
&lt;/h2&gt;

&lt;p&gt;Unless you’re reading this several years into the future (hello from the past!), you probably noticed that Faster R-CNN took quite a while to detect the objects in the image. That is because this model favors prediction accuracy over computational efficiency, so it’s not really feasible to use it for, e.g., real-time processing of videos (especially if you’re not in possession of modern hardware): even on a pretty fast GPU, Faster R-CNN won’t do more than 2-5 images per second.&lt;/p&gt;

&lt;p&gt;Enter SSD, the single-shot multibox detector. This model trades some accuracy (a gap that widens the more classes you want to detect) for speed. On the same GPU where Faster R-CNN yields a couple of images per second, SSD will achieve around 60, making it much more suitable for running over video streams, or videos in general.&lt;/p&gt;

&lt;p&gt;Let’s do just that, then! Run &lt;code&gt;lumi predict&lt;/code&gt; again, but this time using the &lt;code&gt;fast&lt;/code&gt; checkpoint. Also, notice that we didn’t download it beforehand; the CLI will detect that and fetch it from the remote repository.&lt;/p&gt;

&lt;pre&gt;$ lumi predict video.mp4 --checkpoint=fast --save-media-to=.
Found 1 files to predict.
Predicting video.mp4  [####################################]  100%     fps: 45.9&lt;/pre&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxowsrbzctqaqoiqkkun4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxowsrbzctqaqoiqkkun4.gif" alt="SSD model applied to dog playing fetch."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's much faster! The command will generate a video by running SSD on a frame-by-frame basis, so no fancy temporal-prediction models (at least for now). In practice, this means you’ll probably see some jittering in the boxes, as well as some predictions appearing and disappearing out of nowhere, but nothing some post-processing can’t fix.&lt;/p&gt;
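&lt;p&gt;As a taste of what such post-processing might look like (a minimal sketch, not part of Luminoth; a real pipeline would also need to associate boxes across frames), an exponential moving average over the coordinates of a tracked box already dampens jitter considerably:&lt;/p&gt;

```python
def smooth_boxes(boxes, alpha=0.5):
    """Exponentially smooth per-frame [x_min, y_min, x_max, y_max] boxes
    of a single tracked object. Smaller alpha gives smoother (but laggier) boxes."""
    smoothed = [list(boxes[0])]
    for box in boxes[1:]:
        prev = smoothed[-1]
        # Blend the current detection with the previous smoothed box.
        smoothed.append([alpha * c + (1 - alpha) * p for c, p in zip(box, prev)])
    return smoothed

# Hypothetical jittery detections of one object over four frames:
raw = [[100, 50, 200, 150], [110, 48, 210, 152],
       [98, 52, 198, 148], [112, 49, 212, 151]]
print(smooth_boxes(raw)[1])  # [105.0, 49.0, 205.0, 151.0]
```

&lt;p&gt;Tuning &lt;code&gt;alpha&lt;/code&gt; is a trade-off: lower values suppress more jitter but make the box lag behind fast-moving objects.&lt;/p&gt;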

&lt;h2&gt;
  
  
  And of course, train your own
&lt;/h2&gt;

&lt;p&gt;Say you just want to detect cars outside your window, and you aren’t interested in the 80 classes present in COCO. Training your model to detect a smaller number of classes may improve detection quality, so let’s do just that. Note, however, that training on a CPU may take quite a while, so be sure to use a GPU or a cloud service such as Google’s ML Engine (read more about Luminoth’s integration with it &lt;a href="https://luminoth.readthedocs.io/en/latest/usage/training.html#google-cloud" rel="noopener noreferrer"&gt;here&lt;/a&gt;), or just skip this section altogether and look at the pretty pictures instead.&lt;/p&gt;

&lt;p&gt;Luminoth contains tools to prepare and build a custom dataset from standard formats, such as the ones used by COCO or Pascal VOC. You can also build your own dataset transformer to support your own format, but that’s for another blog post. For now, we’ll use the &lt;code&gt;lumi dataset&lt;/code&gt; CLI tool to build a dataset containing only cars, taken from both COCO and Pascal (2007 and 2012).&lt;/p&gt;

&lt;p&gt;Start by downloading the datasets from &lt;a href="http://host.robots.ox.ac.uk/pascal/VOC/voc2007/" rel="noopener noreferrer"&gt;here&lt;/a&gt;, &lt;a href="http://host.robots.ox.ac.uk/pascal/VOC/voc2012/" rel="noopener noreferrer"&gt;here&lt;/a&gt; and &lt;a href="http://cocodataset.org/#download" rel="noopener noreferrer"&gt;here&lt;/a&gt;, and storing them in a &lt;code&gt;datasets/&lt;/code&gt; directory created in your working directory (specifically, in &lt;code&gt;datasets/pascal/2007/&lt;/code&gt;, &lt;code&gt;datasets/pascal/2012/&lt;/code&gt; and &lt;code&gt;datasets/coco/&lt;/code&gt;). Then merge all the data into a single &lt;code&gt;.tfrecords&lt;/code&gt; file ready to be consumed by Luminoth by running the following commands:&lt;/p&gt;

&lt;pre&gt;$ lumi dataset transform \
        --type pascal \
        --data-dir datasets/pascal/VOCdevkit/VOC2007/ \
        --output-dir datasets/pascal/tf/2007/ \
        --split train --split val --split test \
        --only-classes=car
$ lumi dataset transform \
        --type pascal \
        --data-dir datasets/pascal/VOCdevkit/VOC2012/ \
        --output-dir datasets/pascal/tf/2012/ \
        --split train --split val \
        --only-classes=car
$ lumi dataset transform \
        --type coco \
        --data-dir datasets/coco/ \
        --output-dir datasets/coco/tf/ \
        --split train --split val \
        --only-classes=car
$ lumi dataset merge \
        datasets/pascal/tf/2007/classes-car/train.tfrecords \
        datasets/pascal/tf/2012/classes-car/train.tfrecords \
        datasets/coco/tf/classes-car/train.tfrecords \
        datasets/tf/train.tfrecords
$ lumi dataset merge \
        datasets/pascal/tf/2007/classes-car/val.tfrecords \
        datasets/pascal/tf/2012/classes-car/val.tfrecords \
        datasets/coco/tf/classes-car/val.tfrecords \
        datasets/tf/val.tfrecords&lt;/pre&gt;
&lt;p&gt;

Now we’re ready to start training. In order to train a model using Luminoth, you must create a configuration file specifying some required information (such as a run name, the dataset location and the model to use, as well as a battery of model-dependent hyperparameters). Since we provide base configuration files already, something like this will be enough:
&lt;/p&gt;
&lt;pre&gt;train:
  run_name: ssd-cars
  # Directory in which model checkpoints &amp;amp; summaries (for Tensorboard) will be saved.
  job_dir: jobs/
  # Specify the learning rate schedule to use. These defaults should be good enough.
  learning_rate:
    decay_method: piecewise_constant
    boundaries: [1000000, 1200000]
    values: [0.0003, 0.0001, 0.00001]
dataset:
  type: object_detection
  # Directory from which to read the dataset.
  dir: datasets/tf/
model:
  type: ssd
  network:
    # Total number of classes to predict. One, in this case.
    num_classes: 1&lt;/pre&gt;
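&lt;p&gt;The &lt;code&gt;piecewise_constant&lt;/code&gt; schedule above simply keeps the learning rate at &lt;code&gt;values[i]&lt;/code&gt; until the global step crosses the i-th boundary. In plain Python (a sketch of the semantics, not Luminoth's actual implementation), it amounts to:&lt;/p&gt;

```python
from bisect import bisect_left

def piecewise_constant(step, boundaries, values):
    """Pick the learning rate for the interval the global step falls in.
    A step exactly equal to a boundary still uses the earlier value."""
    return values[bisect_left(boundaries, step)]

# The boundaries and values from the config above:
boundaries, values = [1000000, 1200000], [0.0003, 0.0001, 0.00001]
print(piecewise_constant(500000, boundaries, values))   # 0.0003
print(piecewise_constant(1100000, boundaries, values))  # 0.0001
print(piecewise_constant(1500000, boundaries, values))  # 1e-05
```

&lt;p&gt;In other words, the defaults above train at 0.0003 for the first million steps, then drop the rate twice as training converges.&lt;/p&gt;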
&lt;p&gt;

Store it in your working directory (same place where &lt;code&gt;datasets/&lt;/code&gt; is located) as &lt;code&gt;config.yml&lt;/code&gt;. As you can see, we’re going to train an SSD model. You can start the training run as follows:&lt;/p&gt;
&lt;pre&gt;$ lumi train -c config.yml
INFO:tensorflow:Starting training for SSD
INFO:tensorflow:Constructing op to load 32 variables from pretrained checkpoint
INFO:tensorflow:ImageVisHook was created with mode = "debug"
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into jobs/ssd-cars/model.ckpt.
INFO:tensorflow:step: 1, file: b'000004.jpg', train_loss: 20.626895904541016, in 0.07s
INFO:tensorflow:step: 2, file: b'000082.jpg', train_loss: 12.471542358398438, in 0.07s
INFO:tensorflow:step: 3, file: b'000074.jpg', train_loss: 7.3356428146362305, in 0.06s
INFO:tensorflow:step: 4, file: b'000137.jpg', train_loss: 8.618950843811035, in 0.07s
(ad infinitum)&lt;/pre&gt;
&lt;p&gt;

Many hours later, the model should have some reasonable results (you can just stop it when it goes beyond one million or so steps). You can test it right away using the built-in web interface, by running the following command and visiting the URL it prints:
&lt;/p&gt;
&lt;pre&gt;$ lumi server web -c config.yml
Neither checkpoint nor config specified, assuming 'accurate'.
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)&lt;/pre&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwuv53da8pord9z1e89xk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwuv53da8pord9z1e89xk.jpg" alt="Luminoth's frontend with cars being detected."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since Luminoth is built upon Tensorflow, you can also leverage &lt;a href="https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard" rel="noopener noreferrer"&gt;Tensorboard&lt;/a&gt; by running it on the &lt;code&gt;job_dir&lt;/code&gt; specified in the config, in order to monitor training progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;And that’s it! This concludes our overview of the new (and old) features of Luminoth: we’ve detected objects in images and videos using pre-trained models, and even trained our own in a couple of commands. We limited ourselves to the CLI tool and didn’t even get to mention the Python API, from which you can use your trained models as part of a larger system. Next time!&lt;/p&gt;

&lt;p&gt;This is the most feature-packed release of Luminoth yet, so we hope you get to try it out. Since we’re still at 0.1, you may hit some rough edges here and there. Please feel free to file issues on our GitHub, or even contribute! All feedback is more than welcome on our road to making Luminoth better. You can also check out the &lt;strong&gt;documentation&lt;/strong&gt; &lt;a href="http://luminoth.readthedocs.io/" rel="noopener noreferrer"&gt;here&lt;/a&gt;, which contains some more usage examples.&lt;/p&gt;

&lt;p&gt;And again, if you hit any roadblocks, hit us up!&lt;/p&gt;

&lt;p&gt;This article was originally published &lt;a href="https://tryolabs.com/blog/2018/04/17/announcing-luminoth-0-1/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
