What I built
Speakr is a web app that lets you write in the air, using your mobile phone as a pen: gesture-to-speech translation.
Category Submission:
Random Roulette
App Link
Speakr Web App: for mobile phones with an onboard IMU. As of writing, the Sensor API works only on Android.
Screenshots
Description
Speakr is a web app that lets you write in the air, using your mobile as a pen. It utilises the onboard IMU to record movements, renders them as an image, and then runs handwriting recognition on that image to determine the written text. The text is then read aloud via text-to-speech.
Link to Source Code
Permissive License
MIT
Background
I wanted to explore the possibility of using a mobile phone as a pen, with handwriting in the air being an interesting opportunity. Almost all phones have an onboard inertial measurement unit (IMU), so the idea was feasible.
How I built it
Speakr is a React web app served by a simple NodeJS server, hosted on the DigitalOcean App Platform.
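The server's main job on the frontend side is simply to serve the compiled React bundle. A minimal sketch of such a server is below, assuming Express (the post doesn't name the framework, so that choice is an assumption):

```javascript
// server.js: a minimal sketch of a NodeJS server that serves the React build.
// Express is an assumption; Speakr's actual framework isn't stated above.
const express = require('express');
const path = require('path');

const app = express();

// Serve the compiled React bundle.
app.use(express.static(path.join(__dirname, 'build')));

// Fall back to index.html for any other route (client-side routing).
app.use((req, res) => {
  res.sendFile(path.join(__dirname, 'build', 'index.html'));
});

// The DigitalOcean App Platform injects the port via the PORT env variable.
app.listen(process.env.PORT || 8080);
```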
Interfacing with the Chrome Sensor API, the app records orientation sensor readings whilst "draw" is held down, and relative distance from the initial orientation is calculated with simple trigonometry. This method allows for greater flexibility, as letters can be drawn in the air at varying sizes, yet all are scaled down to the same size. Rendering the letters accurately was the bulk of the problem.
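As a rough illustration, the recording loop might look like the sketch below. It assumes the Generic Sensor API's `RelativeOrientationSensor`; the variable names, the axis convention, and the arm-length constant are all illustrative rather than Speakr's actual code:

```javascript
// Requires a secure context and sensor permissions (Chrome on Android).
const sensor = new RelativeOrientationSensor({ frequency: 60 });

let isDrawing = false;     // toggled by the "draw" button press/release
let initialAngles = null;  // yaw/pitch captured when drawing starts
const strokePoints = [];   // accumulated [x, y] samples

// Extract yaw and pitch from the sensor's [x, y, z, w] quaternion.
function yawPitch([x, y, z, w]) {
  const yaw = Math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z));
  const pitch = Math.asin(Math.max(-1, Math.min(1, 2 * (w * y - z * x))));
  return [yaw, pitch];
}

sensor.addEventListener('reading', () => {
  if (!isDrawing) return;
  const [yaw, pitch] = yawPitch(sensor.quaternion);
  if (!initialAngles) initialAngles = [yaw, pitch];

  // Treat the phone as a pointer held at a fixed arm's length: angular
  // deltas map to planar distances via tan(delta), i.e. the "simple
  // trigonometry" above. The constant cancels out later, because the
  // drawing is rescaled to a common size before rendering.
  const ARM_LENGTH = 1;
  strokePoints.push([
    ARM_LENGTH * Math.tan(yaw - initialAngles[0]),
    ARM_LENGTH * Math.tan(pitch - initialAngles[1]),
  ]);
});

sensor.start();
```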
When "speak" is pressed, the sensor readings are processed through a combination of scaling and offsets before being rendered on to a canvas element.
An image is generated from the canvas and sent to a backend REST API for translation: a NodeJS server running on the DigitalOcean App Platform. Handwriting recognition is done via the Google Vision API, and the returned text is converted to speech with the Google TTS API. The resulting audio is sent back to the app and played on the mobile phone.
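The backend route could be sketched as follows, assuming Express plus Google's official NodeJS client libraries (@google-cloud/vision and @google-cloud/text-to-speech); the route name and payload shape are assumptions, not Speakr's actual API:

```javascript
const express = require('express');
const vision = require('@google-cloud/vision');
const textToSpeech = require('@google-cloud/text-to-speech');

const app = express();
app.use(express.json({ limit: '5mb' }));

const visionClient = new vision.ImageAnnotatorClient();
const ttsClient = new textToSpeech.TextToSpeechClient();

app.post('/api/translate', async (req, res) => {
  // The client sends the canvas contents as a base64-encoded PNG string.
  const image = { content: req.body.image };

  // documentTextDetection is the Vision mode suited to handwritten text.
  const [result] = await visionClient.documentTextDetection({ image });
  const text = (result.fullTextAnnotation && result.fullTextAnnotation.text) || '';

  // Synthesise the recognised text to MP3 with the Google TTS API.
  const [tts] = await ttsClient.synthesizeSpeech({
    input: { text },
    voice: { languageCode: 'en-US' },
    audioConfig: { audioEncoding: 'MP3' },
  });

  // Return both the text and the audio for playback on the phone.
  res.json({ text, audio: Buffer.from(tts.audioContent).toString('base64') });
});

app.listen(process.env.PORT || 8080);
```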
Initially I started with the accelerometer; however, it was too noisy and unreliable, so I quickly switched to the orientation fusion sensor.
During the hacking process, I learnt how to use the DigitalOcean App Platform, as well as how to interface with the Google Cloud Vision API for handwriting recognition. I had this concept as an idea for quite some time; finally diving in and building it feels like a great accomplishment, especially as it took a lot of trial and error to get the Google Vision API to recognise the rendered text.
What's next for Speakr
I would like to develop the gesture-to-text concept further and apply it to novel applications, though this requires more research. Regarding the app itself, I would like to train custom ML models with TensorFlow.js to recognise shapes, arrows, and other special gestures, and hook the app up to integrations such as IFTTT for more flexibility. Custom gestures could enable intuitive control of smart home devices, or perhaps a way to signal an emergency. Perhaps a new mode of communication?