Applying Computer Vision to build great UI/UX for applications

Bruno Muniz ・4 min read

Last year, during a brainstorm to come up with new ideas for TotalCross, we started thinking about the app development process and which part was the most painful for developers. Building an application involves five main stages (Plan, Design, Develop, Test, Deploy), so we formed some hypotheses about which one was the most time-consuming and set out to validate them with clients and experienced mobile developers.

One of these hypotheses concerned the time developers spend "translating" the prototypes/mockups from the Design phase into real source code. After talking to several developers, we found a pattern: in the worst cases, they were spending more than 30% of development time just building the UI/UX of the application.

Clearly, there was a problem to be solved.

After digging into the existing solutions, we found two main approaches to this issue:

  1. A new tool for designers: the design tool would "generate" the source code automatically and make things "easier" for the developer;

  2. A new tool for developers: the developer "just needs" to drag and drop some components to build the UI again, making a copy of the prototype.

However, both approaches bring a lot of new issues, such as:

  • Both designers and developers would need to learn a new tool. They are already used to their favorite tools, and most professionals simply don’t want to change them;

  • Adopting a new tool is likely to be extremely hard: there are several well-established tools for both personas, such as Figma, Adobe XD, Photoshop, Xcode, and Android Studio (as startup people say: your tech should be 10x better than your competitor's);

  • Most design tools that generate source code don't produce a clean, decent implementation.

After that research phase, we came up with THE crazy idea:

Why not use Computer Vision, the same technology people use to detect objects, animals, etc., to detect UI components in prototype images and draw the UI for the developers?

That way, we wouldn’t be adding a new tool for anyone: designers could keep using their favorite design tool, developers could keep using their IDE and, most importantly, we could cut up to 30% of the development time and cost of a mobile project. Looks like a win-win solution =)

We decided to start a Proof of Concept to check the viability of this idea and avoid spending a lot of time, money, and energy with no guarantee of success.

When we started, some research with the same objective had already been published, and the most promising was pix2code. We began the technology validation with that research and trained a neural network on some basic components such as TextBox, Label, and ComboBox. The first results were absolutely amazing.

The first validation was finished, but we still had one problem to solve: as with other technologies, we needed to translate the detected components into source code or into an intermediate language/format to make them easier for the developer to use. But, again, unlike other solutions, we wanted to use the best technology that already exists. The question in our minds was: "What is the most used dev tool to build UI for applications?"

The answer was obvious: mobile developers can already provide an amazing UX for applications using tools like Android Studio. Why not generate an Android XML with the drawn UI? And so we did.
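To make that last step more concrete, here is a minimal Python sketch of turning detected components into an Android XML layout. It is an illustration only, not the actual TotalCross implementation. It assumes a detector has already produced a list of components with a class name and a position; the `Detection` type, the `WIDGETS` mapping, and the `to_android_xml` function are hypothetical names, and stacking everything top to bottom in a vertical LinearLayout is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

ANDROID_NS = "http://schemas.android.com/apk/res/android"
ET.register_namespace("android", ANDROID_NS)  # serialize attributes as android:...

# Hypothetical mapping from detector class names to standard Android widgets.
WIDGETS = {"Label": "TextView", "TextBox": "EditText", "ComboBox": "Spinner"}


@dataclass
class Detection:
    """One UI component found in a prototype image (assumed detector output)."""
    kind: str       # class name predicted by the detector, e.g. "Label"
    x: int          # left edge of the bounding box, in pixels
    y: int          # top edge of the bounding box, in pixels
    text: str = ""  # optional text read from the component


def attr(name: str) -> str:
    """Qualify an attribute name with the Android XML namespace."""
    return f"{{{ANDROID_NS}}}{name}"


def to_android_xml(detections: List[Detection]) -> str:
    """Emit a vertical LinearLayout with one widget per known detection."""
    root = ET.Element("LinearLayout", {
        attr("layout_width"): "match_parent",
        attr("layout_height"): "match_parent",
        attr("orientation"): "vertical",
    })
    # Read the screen top to bottom, then left to right.
    for det in sorted(detections, key=lambda d: (d.y, d.x)):
        widget = WIDGETS.get(det.kind)
        if widget is None:
            continue  # skip classes this sketch does not know how to map
        child = ET.SubElement(root, widget, {
            attr("layout_width"): "match_parent",
            attr("layout_height"): "wrap_content",
        })
        if det.text and det.kind == "Label":
            child.set(attr("text"), det.text)   # labels display their text
        elif det.text and det.kind == "TextBox":
            child.set(attr("hint"), det.text)   # inputs use it as a placeholder
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    screen = [
        Detection("TextBox", x=10, y=60, text="Email"),
        Detection("Label", x=10, y=20, text="Login"),
        Detection("ComboBox", x=10, y=110),
    ]
    print(to_android_xml(screen))
```

A real pipeline would also need to infer nesting, sizes, and margins from the bounding boxes, which this sketch deliberately leaves out.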

We gathered a team including a Ph.D. and a Master's in Computer Graphics, along with Computer Science students from the University of Fortaleza in Brazil, to take on this task and build this technology. We also raised some money from Banco do Nordeste do Brasil, one of the biggest public banks in Brazil, to support the creation of this MVP.

Results so far:

  • We are still working to improve the neural network, but we have achieved 70% accuracy with the Computer Vision algorithm;

  • Android Layouts are 80% supported right now;

  • Tests are being run on platforms and devices such as Android, Raspberry Pi, Toradex modules, and iOS. The applications are really fluid, the UX is amazing, and the footprint is extremely low (as we are not using Android to render on the device). You can take a look at the first sample here;

  • The engine that renders the Android XML on the device is now open source and available on GitHub =). You can check the source code here.


What started as a brainstorm with the team is becoming a really cool and promising technology that can make the app development process easier for developers and designers without adding a new, complex tool for them.

If you want to take a look at the first sample, it is available on GitHub, and we wrote a short tutorial showing how to use the technology.

Computer Vision and Neural Network algorithms, Android XML, TotalCross, and pix2code are some of the technologies we are using to "read" prototype images and translate them into real applications that can run on mobile, desktop, and embedded devices. The work is not done yet, but the results so far are really exciting =)

What do you think about this technology? Leave your comments!

cover image from: https://medium.com/@mou.abdelhamid/learning-computer-vision-machine-learning-c1521ee6ed08

Posted on Jun 2 by:

Bruno Muniz


Entrepreneur @TotalCross, noob at #opensource world, always learning.

