RockAndNull

Originally published at paleblueapps.com

Implementing Live Camera OCR with Jetpack Compose

Building apps that can seamlessly interpret real-world data is becoming increasingly essential, especially with the rise of AI and machine learning.

Integrating features like Optical Character Recognition (OCR) directly into mobile apps allows users to extract and process text from images or camera feeds, enhancing the app's interactivity and usefulness.

In this post, we’ll explore how to implement a live camera view with OCR in Jetpack Compose. Leveraging Compose's modern UI toolkit, Jetpack's CameraX, and the power of ML Kit, we’ll create a streamlined, intuitive experience for real-time text detection. Whether you’re building a document scanner, a data entry helper, or just want to experiment with cool tech, this guide provides a practical, step-by-step approach to integrating these features into your app.
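
Before diving in, a quick note on setup: the snippets below assume CameraX, ML Kit Text Recognition, and Accompanist Permissions are on the classpath, and that the camera permission (`<uses-permission android:name="android.permission.CAMERA" />`) is declared in AndroidManifest.xml. A sketch of the module-level Gradle dependencies (the artifact coordinates are real; the versions are left as placeholders, so check the latest releases):

```kotlin
// build.gradle.kts (module level) — replace <version> with current releases.
dependencies {
    implementation("androidx.camera:camera-camera2:<version>")
    implementation("androidx.camera:camera-lifecycle:<version>")
    implementation("androidx.camera:camera-view:<version>")
    implementation("com.google.mlkit:text-recognition:<version>")
    implementation("com.google.accompanist:accompanist-permissions:<version>")
}
```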

The Composable

@OptIn(ExperimentalPermissionsApi::class)
@Composable
private fun CameraView(
    modifier: Modifier,
    onTextDetected: (Text) -> Unit = {},
) {
    val context = LocalContext.current
    val lifecycleOwner = LocalLifecycleOwner.current
    val permissionState = rememberPermissionState(permission = Manifest.permission.CAMERA) // 1.

    val cameraController = remember {
        LifecycleCameraController(context).apply {
            setEnabledUseCases(CameraController.IMAGE_ANALYSIS)
            setImageAnalysisAnalyzer(
                ContextCompat.getMainExecutor(context),
                TextRecognitionAnalyzer(onTextDetected = onTextDetected), // 2.
            )
        }
    }

    Box(
        modifier = modifier.fillMaxWidth(),
        contentAlignment = Alignment.Center,
    ) {
        AndroidView(
            modifier = Modifier
                .fillMaxSize()
                .clip(RoundedCornerShape(12.dp)),
            factory = { context ->
                PreviewView(context).apply { // 3.
                    scaleType = PreviewView.ScaleType.FILL_CENTER
                    layoutParams = ViewGroup.LayoutParams(
                        ViewGroup.LayoutParams.MATCH_PARENT,
                        ViewGroup.LayoutParams.MATCH_PARENT,
                    )
                    this.controller = cameraController
                    cameraController.bindToLifecycle(lifecycleOwner) // 4.
                }
            },
        )

        if (!permissionState.status.isGranted) { // 5.
            Column(
                horizontalAlignment = Alignment.CenterHorizontally,
            ) {
                Text(
                    text = "Needs camera permission",
                )
                Spacer(modifier = Modifier.size(8.dp))
                Button(
                    onClick = {
                        permissionState.launchPermissionRequest()
                    },
                ) {
                    Text(text = "Request permission")
                }
            }
        }
    }
}
  1. Accessing the camera requires the appropriate runtime permission. This example handles permission requests and grants using the Google Accompanist Permissions library.
  2. A custom analyzer that takes an image as input and produces text as output. This will be covered in the next section.
  3. The PreviewView from Jetpack's CameraX handles the live preview from the camera. Unfortunately, it is not Compose-native, so we need to wrap the classic Android View in an AndroidView composable.
  4. The controller must know when the app moves to the foreground/background so it can allocate and release camera resources; binding it to the lifecycle conveniently takes care of that.
  5. The UI shown while the camera permission has not been granted, with a button to trigger the permission request.
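
In practice, onTextDetected fires many times per second, often with identical results, so callers of CameraView usually want to react only when the recognized text actually changes. A minimal, framework-free sketch of such a guard (TextDeduplicator is a hypothetical helper, not part of CameraX or ML Kit, and it compares the plain string rather than ML Kit's Text object):

```kotlin
// Hypothetical helper: forwards recognized text only when it differs
// from the previously reported value.
class TextDeduplicator(private val onNewText: (String) -> Unit) {
    private var lastText: String? = null

    fun onTextDetected(text: String) {
        if (text != lastText) {
            lastText = text
            onNewText(text)
        }
    }
}

fun main() {
    val seen = mutableListOf<String>()
    val dedup = TextDeduplicator { seen.add(it) }
    dedup.onTextDetected("hello")
    dedup.onTextDetected("hello") // duplicate frame result, ignored
    dedup.onTextDetected("world")
    println(seen) // [hello, world]
}
```

Inside CameraView you would call it as `onTextDetected = { dedup.onTextDetected(it.text) }`, keeping the downstream business logic from re-running on every frame.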

The Analyzer

internal class TextRecognitionAnalyzer(
    private val onTextDetected: (Text) -> Unit,
) : ImageAnalysis.Analyzer {

    private val scope: CoroutineScope = CoroutineScope(Dispatchers.IO + SupervisorJob())
    private val textRecognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS) // 1.

    @OptIn(ExperimentalGetImage::class)
    override fun analyze(imageProxy: ImageProxy) {
        scope.launch { // 2.
            val mediaImage = imageProxy.image ?: run {
                imageProxy.close()
                return@launch
            }

            val inputImage =
                InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)

            suspendCoroutine { continuation ->
                textRecognizer.process(inputImage)
                    .addOnSuccessListener { visionText: Text ->
                        if (visionText.text.isNotBlank()) {
                            onTextDetected(visionText) // 3.
                        }
                    }
                    .addOnCompleteListener {
                        continuation.resume(Unit)
                    }
            }
            delay(100) // throttle: wait briefly before releasing the frame for the next analysis
        }.invokeOnCompletion { exception ->
            exception?.printStackTrace()
            imageProxy.close()
        }
    }
}
  1. This is Google's ML Kit Text Recognition client for doing OCR (Optical Character Recognition) using machine learning. Essentially, it analyzes images and returns the recognized text.
  2. We use coroutines to handle the processing off the main thread; the delay at the end throttles how often frames are analyzed.
  3. This is called when the OCR processing completes with non-blank text, and we can proceed with the business logic for the extracted text.
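
The suspendCoroutine bridge used above is worth isolating: it converts a callback-based API (ML Kit's listeners) into a suspending call, so analyze can wait for one frame's result before moving on. A self-contained sketch of the same pattern against a stand-in callback API (FakeRecognizer is hypothetical, invented for this example; ML Kit's real client returns a Task with listener methods):

```kotlin
import kotlin.coroutines.*

// Stand-in for a callback-based recognizer such as ML Kit's TextRecognizer.
class FakeRecognizer {
    fun process(input: String, onSuccess: (String) -> Unit, onComplete: () -> Unit) {
        onSuccess(input.uppercase()) // pretend this is the "recognized" text
        onComplete()
    }
}

// Bridge the callbacks into a suspend function, mirroring the
// suspendCoroutine block in TextRecognitionAnalyzer: capture the result
// on success, resume the coroutine on completion.
suspend fun FakeRecognizer.recognize(input: String): String =
    suspendCoroutine { continuation ->
        var detected = ""
        process(
            input,
            onSuccess = { detected = it },
            onComplete = { continuation.resume(detected) },
        )
    }

fun main() {
    var result: String? = null
    // Start the suspend function with a bare stdlib continuation; since the
    // fake recognizer completes synchronously, result is set immediately.
    val body: suspend () -> Unit = { result = FakeRecognizer().recognize("hello") }
    body.startCoroutine(Continuation(EmptyCoroutineContext) { it.getOrThrow() })
    check(result == "HELLO")
}
```

Resuming in the completion callback (rather than the success callback) matches the analyzer above: the coroutine continues whether or not recognition produced text, so the image proxy is always released.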

By integrating a live camera view with OCR in Jetpack Compose, you unlock powerful capabilities to process real-world data in real-time, adding immense value to your app's user experience. This combination of CameraX, ML Kit, and Compose makes it possible to create seamless, modern, and efficient interfaces while leveraging cutting-edge AI tools.

Happy coding!
