Tryolabs

Posted on Dec 13, 2018

How we used IoT and computer vision to build a stand-in robot for remote workers

#machinelearning #computervision #iot #remotework

By Lucas Micol. This article was originally published here.

With clients and partners located around the globe, we've always had a culture of remote collaboration here at Tryolabs. We are used to joining meetings no matter where we are, using tools such as Slack, Google Hangouts, and Zoom. A sweet consequence of this is a generous work-from-home policy that lets us work remotely whenever we want.

Trouble is, working remotely leads to us missing out on all the fun that takes place outside of meetings when we’re not connected. Wouldn’t it be cool to have a robot representing us at the office, showing us what goes on while we’re not there?

As a group of computer vision, IoT, and full-stack specialists, we got really enthusiastic about the idea and went on a mission to create a robot that could be remotely controlled from home and show us what takes place at the office.

We spent the 2018 edition of the Tryolabs Hackathon facing that challenge with only three rules in place:

⏰ Hours to complete the project: 48
👫 Number of team members: 4
☕️ Coffee provided: Unlimited

Hackathon team: Joaquín, Braulio, Javier and Lucas.

Building the hardware of a mini-robot

It all started with the design of the mini-robot’s mechanical structure, which is present at the office while the remote worker isn't.

To move around the office easily, the robot must be mobile, stable, small enough to pass through doors, and big enough not to be overlooked and trampled on by the team working at the office.

We went through several iterations of its structural design before settling on this one:

Sketch drawn during hackathon to define main hardware components of the robot and its communication with the remote worker.

We chose aluminum as the main material for the components since it's light, robust, and cheap.

Once we defined the design and selected the materials, we cut the aluminum parts and put them together with small screws. Since we had to work with the tools available at the office and from the store around the corner, this was a rather improvised and humorous process. We used heavy books to shape the aluminum and sunglasses as safety glasses while drilling into the components, just to give you an idea. 🙈

The main hardware components we settled on were:

Layers of aluminum sheets to build the structural backbone
Screws
RaspberryPi
PiCamera
1 Servo motor SG90
H-bridge to control the motors
2 DC motors
2 wheels
Swivel Casters
Wire
PowerBank

Enabling the robot to communicate in real-time

While some of us continued working on the hardware and assembling the pieces, the rest of the team started building the software that would control all the components mentioned above.

Implementing WebRTC

The aim of the robot's software was to enable real-time communication between the remote workers and the teams at the office. In other words, the robot needed to be able to transmit video and audio from the office to the people working remotely and vice versa.

While evaluating various approaches to solving this problem, we came across WebRTC, which promised to be the tool we were looking for:

WebRTC is ideal for telepresence, intercom, VoIP software in general as it has a very powerful standard and modern protocol which has a number of features and is compatible with various browsers, including Firefox, Chrome, Opera, etc.

The WebRTC extension for the UV4L Streaming Server allows for streaming of multimedia content from audio, video, and data sources in real-time as defined by the WebRTC protocol.

Specifically, we used the WebRTC extension included in UV4L. This tool allowed us to create bidirectional communication with extremely low latency between the robot and the remote worker’s computer.

Running the UV4L server with the WebRTC extension enabled, we were able to serve a web app from the RaspberryPi and then simply access it from the remote worker’s browser, establishing real-time bidirectional communication. Amazing!

This allowed us to set up a unidirectional channel for the video from the PiCamera to the browser, a bidirectional channel for the audio, and an extra unidirectional channel to send the commands from the browser to the robot.

Building a UI to manage communication

To be able to see the data and send the commands in a user-friendly way for the remote worker, we researched how to integrate those functionalities into an accessible and practical front-end.

Inspired by the web app example from the UV4L project, we integrated the data channels mentioned above into a basic but functional front-end, including the following components:

index.html: the HTML5 page, which contains the UI elements (mainly video) to show the incoming streaming and the canvas to show the pose estimation key points
main.js: defines the callbacks triggered by user actions like "start streaming", "load net", "toggle pose estimation", etc.
signalling.js: implements the WebRTC signaling protocol over WebSocket

Time lapse shot during the hackathon.

Remotely control the robot's movements

To handle the movement commands the robot would receive from the remote worker, we developed a controller written in Python that runs as a system service. This service translates the commands into motor actions by:

Setting the pins, connected to the H-bridge wheel motors, to high or low
Establishing the PWM frequency and duty-cycle for the Servo, which adjusts the PiCamera's orientation
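
To make the second point more concrete, here is a small sketch of the arithmetic behind a servo duty cycle. It assumes the pigpio convention in which hardware_PWM takes the duty cycle as an integer out of 1,000,000, and a 50 Hz signal (a 20 ms period), as in the controller snippet that follows. The helper names are made up for illustration and are not part of the robot's code.

```python
PWM_RANGE = 1_000_000  # pigpio's hardware_PWM expresses duty cycle in parts per million


def duty_to_pulse_ms(duty, freq_hz=50):
    """Convert a pigpio-style duty value into a pulse width in milliseconds."""
    period_ms = 1000.0 / freq_hz
    return period_ms * duty / PWM_RANGE


def pulse_ms_to_duty(pulse_ms, freq_hz=50):
    """Convert a pulse width in milliseconds into a pigpio-style duty value."""
    period_ms = 1000.0 / freq_hz
    return round(pulse_ms / period_ms * PWM_RANGE)


if __name__ == "__main__":
    # Example values taken from the ServoCamera constants in the snippet.
    for name, duty in [("down limit", 30000), ("center", 40000), ("up limit", 80000)]:
        print(f"{name}: duty={duty} -> {duty_to_pulse_ms(duty):.1f} ms pulse")
```

Under these assumptions, the duty-cycle values used for the camera correspond to pulses of roughly 0.6 ms to 1.6 ms, centered at 0.8 ms.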

Here's a snippet of the controller classes:

import pigpio
import RPi.GPIO as GPIO

class MotorsWheels:

    def __init__(
            self, r_wheel_forward=6, r_wheel_backward=13, l_wheel_forward=19, l_wheel_backward=26):
        self.r_wheel_forward = r_wheel_forward
        ...
        # Use Broadcom (BCM) pin numbering and configure the pins as outputs
        GPIO.setmode(GPIO.BCM)
        GPIO.setup(r_wheel_forward, GPIO.OUT)
        GPIO.setup(r_wheel_backward, GPIO.OUT)
        ...
        # Turn all motors off
        GPIO.output(r_wheel_forward, GPIO.LOW)
        GPIO.output(r_wheel_backward, GPIO.LOW)

    def _spin_right_wheel_forward(self):
        GPIO.output(self.r_wheel_forward, GPIO.HIGH)
        GPIO.output(self.r_wheel_backward, GPIO.LOW)

    def _stop_right_wheel(self):
        GPIO.output(self.r_wheel_backward, GPIO.LOW)
        GPIO.output(self.r_wheel_forward, GPIO.LOW)

    def go_fw(self):
        self._spin_left_wheel_forward()
        self._spin_right_wheel_forward()

class ServoCamera:
    # Duty-cycle values passed to pigpio's hardware_PWM (range 0 to 1,000,000)
    CENTER = 40000
    UP_LIMIT = 80000
    DOWN_LIMIT = 30000
    STEP = 5000

    def __init__(self, servo=18, freq=50):
        self.servo = servo
        self.freq = freq
        self.pi = pigpio.pi()

        self.angle = self.CENTER
        self._set_angle()

    def _set_angle(self):
        self.pi.hardware_PWM(self.servo, self.freq, self.angle)

    def up(self):
        if self.angle + self.STEP < self.UP_LIMIT:
            self.angle += self.STEP
            self._set_angle()

    def down(self):
        if self.angle - self.STEP > self.DOWN_LIMIT:
            self.angle -= self.STEP
            self._set_angle()
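
How commands reach these classes is not shown in the snippet. As a rough sketch, a service could map each incoming command string to one controller method. The command names ("forward", "camera_up", "camera_down"), the dispatch helpers, and the stand-in classes are all hypothetical; the stand-ins only record calls so that the example runs without a Raspberry Pi.

```python
class FakeWheels:
    """Stand-in for MotorsWheels that records calls instead of driving GPIO pins."""

    def __init__(self):
        self.calls = []

    def go_fw(self):
        self.calls.append("go_fw")


class FakeCamera:
    """Stand-in for ServoCamera that records calls instead of driving the servo."""

    def __init__(self):
        self.calls = []

    def up(self):
        self.calls.append("up")

    def down(self):
        self.calls.append("down")


def build_handlers(wheels, camera):
    """Map hypothetical command names to bound controller methods."""
    return {
        "forward": wheels.go_fw,
        "camera_up": camera.up,
        "camera_down": camera.down,
    }


def dispatch(handlers, command):
    """Run the handler for a command; return False for unknown commands."""
    handler = handlers.get(command.strip().lower())
    if handler is None:
        return False
    handler()
    return True


if __name__ == "__main__":
    wheels, camera = FakeWheels(), FakeCamera()
    handlers = build_handlers(wheels, camera)
    for cmd in ["forward", "camera_up", "jump"]:
        print(cmd, "->", "ok" if dispatch(handlers, cmd) else "ignored")
```

Ignoring unknown commands, rather than raising, keeps a malformed message from taking down a long-running service.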

As a result, we were able to control the robot, “walk” it through the office, and enable remote workers to see their teams and approach them via the robot.

However, it wasn’t enough for our enthusiastic team and we continued to pursue the ultimate goal: having an autonomous robot.

Adding computer vision to the robot

We thought, wouldn't it be awesome if the robot could recognize people and react to their gestures and actions (and in this way have a certain amount of personality)?

A recently released project called PoseNet surfaced fast. It’s presented as a "machine learning model, which allows for real-time human pose estimation in the browser". So, we dug deeper into that.

The performance of that neural net was astounding, and it was especially attractive because we ran it with TensorFlow.js in the browser. This way, we got higher accuracy and a higher frame rate than by running it on the RaspberryPi, and lower latency than if we had run it on a separate server.

Rushed by the parameters of the hackathon, we skimmed the project’s documentation and demo web app source code. Once we identified which files we were going to need, we imported them and immediately jumped to integrating these functionalities into our web app.

We wrote a basic detectBody function that infers the pose estimation key points by invoking net.estimateMultiplePoses with these params:

async function detectBody(canvas, net) {
    if (net){
        // Grab the current frame from the canvas
        var ctx = canvas.getContext('2d');
        var imageElement = ctx.getImageData(0, 0, canvas.width, canvas.height);

        // Smaller scale factor: faster inference, lower accuracy
        var imageScaleFactor = 0.3;
        var flipHorizontal = false;
        // Larger output stride: faster inference, lower accuracy
        var outputStride = 16;
        // Detect at most two people per frame
        var maxPoseDetections = 2;
        var poses = await net.estimateMultiplePoses(
            imageElement,
            imageScaleFactor,
            flipHorizontal,
            outputStride,
            maxPoseDetections
        );
        return poses;
    }
}

We invoked detectBody three times per second to refresh the pose estimation key points.

Then, we adapted some utility functions to draw the detected body key points and plot the skeleton over the video, arriving at a demo like this:

Soledad and Lucas showing off pose detection algorithm.
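
To illustrate what drawing key points and a skeleton involves, here is a sketch of the filtering step that typically comes before drawing. It is written in Python for consistency with the controller code, although in the web app this logic lives in JavaScript. The pose layout follows PoseNet's output format (a score plus a list of key points, each with a part name, a score, and a position). The function names, the 0.5 threshold, the sample pose, and the shortened list of connected parts are assumptions made for this example.

```python
# A few pairs of body parts joined by a skeleton segment (illustrative subset).
CONNECTED_PARTS = [
    ("leftShoulder", "rightShoulder"),
    ("leftShoulder", "leftElbow"),
    ("leftElbow", "leftWrist"),
]


def confident_keypoints(pose, min_score=0.5):
    """Return {part: (x, y)} for key points whose score reaches min_score."""
    return {
        kp["part"]: (kp["position"]["x"], kp["position"]["y"])
        for kp in pose["keypoints"]
        if kp["score"] >= min_score
    }


def skeleton_segments(pose, min_score=0.5):
    """Return the line segments whose two end points are both confident."""
    points = confident_keypoints(pose, min_score)
    return [
        (points[a], points[b])
        for a, b in CONNECTED_PARTS
        if a in points and b in points
    ]


if __name__ == "__main__":
    # Made-up pose in the shape PoseNet returns.
    pose = {
        "score": 0.8,
        "keypoints": [
            {"part": "leftShoulder", "score": 0.9, "position": {"x": 100, "y": 80}},
            {"part": "rightShoulder", "score": 0.9, "position": {"x": 160, "y": 82}},
            {"part": "leftElbow", "score": 0.2, "position": {"x": 90, "y": 130}},
        ],
    }
    print(confident_keypoints(pose))
    print(skeleton_segments(pose))
```

Dropping low-confidence key points before drawing avoids segments that jump around when a body part is hidden from the camera.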

This was a very quick proof of concept which added a wonderful feature and hugely expanded the potential capabilities of our robot.

If you’d like to know how this model works under the hood, you can read more here.

Results

48 hours and an unknown amount of coffee led to the construction of a mini-robot with the ability to walk through the office, enable real-time communication between the remote worker and their office mates, and even transport an LP. 😜

Stand-in robot walking through the office, controlled by a remote worker.

Interface that shows the remote worker in a browser how the robot is controlled.

We managed to build the hardware, implement the communication software, and build a PoC for an additional feature using computer vision, which facilitates the robot's interaction with people. Future enhancements could include object detection, for example with Luminoth, our open source toolkit for computer vision, which would let the robot recognize objects and interact with them without human help.

Though we normally prototype for longer than two days, this hackathon project reflects how we work at Tryolabs. We often build prototypes and solutions with state-of-the-art technologies to enhance operational and organizational processes.

Thinking of a robot for your business? Get in touch with us!