AI on Mobile Devices

Mikkel

Introduction

We’ve all heard about the major advancements in AI, including large language models, image recognition, and video generation. These are relatively demanding models, requiring significant computational power and hardware capabilities. But what about smaller devices, like smartphones, smartwatches, or Internet of Things (IoT) devices? Is it possible to take a picture on your phone and, within seconds, determine if your plant is dying due to neglect or a rare disease—letting you off the hook for the blame? In this post, we’ll take a closer look at that possibility.

Computing Power on Mobile Devices

When it comes to AI on mobile devices, why can’t we just download ChatGPT and use it as we please?

The primary reason is the sheer number of mathematical operations involved in running an AI model. Mobile devices are tightly constrained in how many calculations they can perform per second, how much data they can hold in memory at once, and how much power they can draw for a single task.

Let’s use OpenAI’s GPT-3 as an example to understand these challenges. GPT-3 has 175 billion parameters, the numerical values used to generate responses to your questions. Stored as standard 32-bit floats, at four bytes each, those parameters alone require about 700 GB of memory. Even the most expensive smartphones, typically equipped with 16–24 GB of RAM, fall far short of what’s needed to handle such a model.

Now, let’s compare this to a specialized mobile model like MobileNet, which is optimized for on-device performance. MobileNet has only 4 million parameters and requires approximately 16 MB of RAM—a massive reduction in computational and memory demands.
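The figures above follow from simple arithmetic: memory for the weights is just parameter count times bytes per parameter. A quick sketch of that calculation, assuming 32-bit weights:

```python
# Rough memory footprint of a model: parameters x bytes per parameter.
def model_memory_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Memory needed just to hold the weights, assuming 32-bit floats."""
    return num_params * bytes_per_param

print(f"GPT-3:     {model_memory_bytes(175_000_000_000) / 1e9:,.0f} GB")  # ~700 GB
print(f"MobileNet: {model_memory_bytes(4_000_000) / 1e6:,.0f} MB")        # ~16 MB
```

Note that this counts the weights only; activations, the runtime, and the operating system all need memory on top of that.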

While this comparison is somewhat exaggerated (as ChatGPT runs on specialized servers, meaning your phone or computer doesn’t actually handle the heavy computations), it effectively illustrates the constraints of mobile devices. If you want an AI model to run directly on your phone without relying on an internet connection, it’s crucial to design and optimize the model specifically for mobile use.
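One common way to do that optimization is to convert a trained model to TensorFlow Lite and quantize its weights. A minimal sketch, using a stock pretrained MobileNetV2 as a stand-in for whatever model you actually want to ship:

```python
import tensorflow as tf

# A stock pretrained model stands in for your own trained model.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

# Convert to TensorFlow Lite; Optimize.DEFAULT applies post-training
# quantization, shrinking 32-bit weights to 8-bit for on-device use.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("mobilenet_v2.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file can then be bundled with an Android or iOS app and executed by the TensorFlow Lite interpreter, with no internet connection required.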

Mobile Sensors

With nearly everyone owning a smartphone these days, it’s only natural to make use of the many sensors they contain. Most people are familiar with the microphone, camera, and GPS, but smartphones also come equipped with accelerometers and gyroscopes.

Audio, images, and video are the most intuitive data sources. They can be used to identify anomalies or classify events. For example, audio can be used for diagnostics in the healthcare sector or to detect animal sounds in environmental monitoring.

Where accelerometers and gyroscopes have proven particularly useful is in activity tracking. Common applications include step counters or running apps. Beyond these relatively simple uses, smartphones have the potential to play a much larger role in public health.
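To make the step counter concrete: each step shows up as a peak in the magnitude of the acceleration signal. Here is a minimal sketch of that idea; the sampling rate, peak height, and minimum spacing are illustrative values, not tuned ones:

```python
import numpy as np
from scipy.signal import find_peaks

def count_steps(accel_xyz: np.ndarray, fs: float = 50.0):
    """Count steps in an (N, 3) accelerometer recording sampled at fs Hz."""
    magnitude = np.linalg.norm(accel_xyz, axis=1)
    magnitude -= magnitude.mean()  # remove the constant pull of gravity

    # Each step is a peak: require some strength and at least 0.3 s
    # between peaks, since walking rarely exceeds ~3 steps per second.
    peaks, _ = find_peaks(magnitude, height=1.0, distance=int(0.3 * fs))

    # The intervals between peaks are raw material for gait analysis,
    # the topic of the next paragraph.
    step_intervals_s = np.diff(peaks) / fs
    return len(peaks), step_intervals_s
```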

Gait analysis has been shown to be a valuable tool for diagnosing several cognitive disorders, such as Alzheimer’s and other forms of dementia, where the brain undergoes degeneration. Even in the early stages, deviations in a person’s walking pattern can be detected, allowing for early intervention. With specialized AI models, we could capture these changes without the need for expensive and bulky gait analysis equipment.

A simpler but still impactful application related to cognitive disorders is monitoring how often a patient is active. By tracking when the phone, and consequently the patient, is in motion, we can estimate daily activity levels. This is particularly valuable for individuals who may struggle to remember their day-to-day activities. Since physical activity is a crucial factor in slowing the progression of cognitive disorders, this type of monitoring can provide meaningful insights and support for both patients and healthcare providers.
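A rough way to estimate those activity levels is to split the day into short windows and flag the ones where the phone was moving. A minimal sketch, with an arbitrary stillness threshold:

```python
import numpy as np

def active_minutes(accel_xyz: np.ndarray, fs: float = 50.0,
                   threshold: float = 0.5) -> int:
    """Count one-minute windows where the acceleration magnitude varies
    more than a stillness threshold (in m/s^2; the value is illustrative)."""
    magnitude = np.linalg.norm(accel_xyz, axis=1)
    win = int(60 * fs)                 # samples per one-minute window
    n = len(magnitude) // win
    windows = magnitude[: n * win].reshape(n, win)
    return int(np.sum(windows.std(axis=1) > threshold))
```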

Existing Mobile Models

There are already several AI models optimized for mobile devices and smaller hardware. A few examples include Gemini, YAMNet, and MobileNet.

Gemini is Google’s flagship AI model. It comes in a range of sizes: the largest can be used much like ChatGPT, while smaller, optimized variants are practical for mobile use as an enhanced Google Assistant. Over the course of 2024, the mobile version is expected to function as an AI assistant capable of actively planning tasks for you, using inputs such as emails and calendar data.

YAMNet is an audio classification model trained on AudioSet, a large dataset drawn from YouTube videos. It can classify sounds into 521 different categories. Available on mobile, it can be used for classification directly, but also for converting audio into spectrograms or embeddings that other models can build on.
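Outside of an app, YAMNet is easy to try via TensorFlow Hub. It takes mono 16 kHz audio and returns per-frame class scores, embeddings, and the log-mel spectrogram it computed internally (a second of random noise stands in for a real recording here):

```python
import csv
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

# YAMNet expects mono 16 kHz audio as float32 values in [-1.0, 1.0].
waveform = np.random.uniform(-1.0, 1.0, 16000).astype(np.float32)

scores, embeddings, log_mel_spectrogram = yamnet(waveform)

# Map the highest-scoring class index to a human-readable name.
with tf.io.gfile.GFile(yamnet.class_map_path().numpy().decode("utf-8")) as f:
    class_names = [row["display_name"] for row in csv.DictReader(f)]
print(class_names[scores.numpy().mean(axis=0).argmax()])
```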

MobileNet is an image classification model optimized for mobile devices, as its name suggests. The model is versatile and can be applied to tasks like object detection, facial recognition, and geolocation.
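Via Keras, running a pretrained MobileNet takes only a few lines. Returning to the dying-plant example from the introduction, classifying a photo might look like the sketch below; the file name is hypothetical, and the stock ImageNet classes are far coarser than real plant-disease labels:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import mobilenet_v2

model = mobilenet_v2.MobileNetV2(weights="imagenet")  # 1000 ImageNet classes

# Load a photo and resize it to the 224x224 input the model expects.
img = tf.keras.utils.load_img("plant.jpg", target_size=(224, 224))
x = mobilenet_v2.preprocess_input(
    np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))

for _, label, prob in mobilenet_v2.decode_predictions(model.predict(x), top=3)[0]:
    print(f"{label}: {prob:.2%}")
```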

Use Cases

At Convai, we have worked on several mobile apps that either integrate AI or are in the process of doing so.

As mentioned in the post Sound Event Detection, we collaborated with Sonohaler, a company whose mobile app uses AI to classify audio events and measure their quality. In some of our experiments we used YAMNet, as it offers a straightforward way to extract spectrograms from audio data.

Another project we’re working on is Taptics, an app designed to help football clubs train and test their players’ tactical understanding. One of the simpler ways we use AI here is as a quiz assistant. The assistant can generate new questions or answer options by analyzing existing quizzes. Additionally, it can evaluate players' free-text answers by comparing them to the coach's correct responses, saving the coach from having to manually review every answer. This allows for more complex answer structures compared to multiple-choice questions, without adding an extra feedback burden on the coach.
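One common way to implement that kind of free-text comparison (not necessarily what Taptics uses under the hood) is to embed both answers and measure their semantic similarity. A minimal sketch using the sentence-transformers library; the example answers and the 0.6 acceptance threshold are made up:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

coach_answer = "Press high and force the buildup toward the sideline."
player_answer = "Push up aggressively and steer their passing out wide."

embeddings = model.encode([coach_answer, player_answer], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()

# Accept answers that are semantically close to the coach's; flag the rest.
verdict = "accept" if similarity > 0.6 else "flag for coach review"
print(f"similarity = {similarity:.2f} -> {verdict}")
```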
