DEV Community: Antonio Feregrino

Understanding (a bit of) the Gradle Kotlin DSL

Antonio Feregrino — Sun, 12 Jan 2025 18:22:09 +0000

Ever since I started learning Kotlin I have been intrigued by what is going on behind the scenes in the build.gradle.kts file. At first glance, the file looks confusing, and for a newbie like me, it is hard to comprehend how the snippet below is even valid Kotlin.

plugins {
  kotlin("jvm") version "1.9.25"
  kotlin("plugin.spring") version "1.9.25"
  id("org.springframework.boot") version "3.4.1"
}

To see how this is valid Kotlin, you need to understand two mechanisms at play here:

Lambdas with receivers
Infix functions

What is `plugins`?

Internally, plugins is a function whose signature looks somewhat like this:

fun plugins(block: PluginDependenciesSpecScope.() -> Unit): Unit

This tells us that plugins is a function that takes a single argument named block, whose type is what is known as a lambda with receiver.

A lambda with receiver is a special kind of lambda that allows you to call methods on a specific object (the receiver) within the lambda's body without explicitly referencing it.

For example, in our function signature above, we can see that the receiving type is PluginDependenciesSpecScope from the package org.gradle.kotlin.dsl. This lambda function does not receive any arguments, as indicated by the empty parentheses after the receiver type specification .(), and it does not return anything meaningful, as indicated by the -> Unit.

A potential implementation of the plugins function could be:

fun plugins(block: PluginDependenciesSpecScope.() -> Unit): Unit {
    val scope = PluginDependenciesSpecScope()
    scope.block()
}

Gradle's real implementation is far more complex; this is just a very simple example of how the function could be implemented. The key part is that behind the scenes, an instance of PluginDependenciesSpecScope is created for you, and block, being a function with a receiver, can be called on this instance as if it were an extension method.

What is `kotlin` and `id`?

Both kotlin and id are methods available on PluginDependenciesSpecScope. You can call them without specifying the receiver because the receiver is implicit inside the lambda, but if you wanted to, you could reference it explicitly using the this keyword:

plugins {
  this.kotlin("jvm") version "1.9.25"
  this.kotlin("plugin.spring") version "1.9.25"
  this.id("org.springframework.boot") version "3.4.1"
}

Both these methods follow the fluent pattern, meaning they return an object on which further calls can be chained (here a PluginDependencySpec, as the signature shows). For example, the signature of the kotlin method looks like this:

fun PluginDependenciesSpec.kotlin(module: String): PluginDependencySpec

The difference between kotlin and id

Behind the scenes, kotlin is just a shorthand for id. When you call kotlin("jvm"), this method calls id but prepends the prefix org.jetbrains.kotlin. to whatever string you passed, so kotlin("jvm") is equal to id("org.jetbrains.kotlin.jvm").

So, what is `version`?

The missing piece in my understanding of the Kotlin Gradle DSL was the version in the plugin definition. At first I thought this was a keyword. However, it turns out it is an extension method, and not just any kind of method but an infix one.

The signature of the version method looks like:

infix fun PluginDependencySpec.version(version: String): PluginDependencySpec

As you can see, it operates on a PluginDependencySpec, takes in a String, and returns a PluginDependencySpec.

An infix function can be called without using dot notation and parentheses. It's declared using the infix keyword and must have a single parameter. This allows for more readable, natural language-like expressions in code.

So we could further change our plugin block definition to:

plugins {
  this.kotlin("jvm").version("1.9.25")
  this.kotlin("plugin.spring").version("1.9.25")
  this.id("org.springframework.boot").version("3.4.1")
}

Conclusion

On Lambdas with receivers

While the approach of using lambdas with receivers is powerful for creating DSLs, it can initially be confusing for those new to Kotlin or Gradle. This technique allows for a more natural, declarative syntax, enabling code that looks like special language constructs when it's actually just method calls on an implicit receiver.

Once grasped, this results in more readable and expressive code that closely resembles the domain it's describing - in this case, Gradle build configurations. However, the implicit nature of the receiver can be a source of confusion for beginners trying to understand what's happening "behind the scenes."

On fluent interfaces and infix methods

The fluent pattern and infix methods aim to create a chain of method calls that read almost like natural language. In theory, this should enhance readability and make the build script more intuitive. However, for those not deeply familiar with Gradle, Kotlin, or the concept of DSLs, this syntax can initially seem magical or confusing.

It's not immediately obvious that version is a method, for instance, or why we can omit parentheses in some places but not others. While the pattern allows for a clear separation of concerns, with each method in the chain responsible for a specific aspect of plugin configuration, it requires some learning and adjustment to fully appreciate and utilise effectively.

As for me…

While I'm still getting accustomed to the Gradle Kotlin DSL and occasionally find it confusing, I've come to appreciate its power and elegance. Building DSLs is indeed one of Kotlin's strengths, and understanding these concepts opens up exciting possibilities for creating expressive and maintainable code.

By exploring the mechanics behind Gradle's Kotlin DSL, I feel I have demystified a bit of its syntax and gained insights into advanced Kotlin features that can be applied in various contexts. When mastered, these patterns and techniques can significantly enhance our ability to write clean, readable, and powerful code.

I hope this dive into the Gradle Kotlin DSL has been as enlightening for you as it has been for me. Whether you're a seasoned Kotlin developer or just starting out, understanding these concepts can greatly improve your grasp of Gradle and Kotlin's capabilities.

Creating a notebook with Jupyter and Kotlin

Antonio Feregrino — Mon, 06 Jan 2025 21:48:34 +0000

Introduction

I recently started diving into the world of Kotlin, a modern and versatile programming language that has caught my attention. However, as someone used to the interactive environment of Jupyter, which allows quick iterations and fluid exploration of code, I wondered whether something similar existed for Kotlin.

To my pleasant surprise, I discovered that there is a Jupyter kernel for Kotlin. This tool combines the power and elegance of Kotlin with the interactivity and ease of use of Jupyter, creating an ideal development environment for learning and experimenting with the language.

In this post, I will share my experience setting up a Jupyter environment with Kotlin support, and I will even go one step further by creating a notebook that lets you work with multiple languages at the same time.

Creating a container with Kotlin

Installing the Kotlin kernel for Jupyter is relatively simple, especially if we use Docker to create a controlled and reproducible environment. Let's look at the Dockerfile I created for this purpose; each step is explained as we go:

Dockerfile

We start with an official Jupyter image pulled from quay.io. We pin a specific version to ensure reproducibility and name the build stage kotlin-kernel so it is easy to identify.

FROM quay.io/jupyter/base-notebook:2024-12-31 AS kotlin-kernel

We install OpenJDK 21, which is needed to run Kotlin. The installation runs as root to avoid permission problems, and then we switch back to the non-root user to keep the image secure.

USER root

RUN apt-get update && apt-get -y install openjdk-21-jdk

USER jovyan

We install the Kotlin kernel for Jupyter, which will let us run Kotlin code in our notebook.

RUN pip install --user \
    kotlin-jupyter-kernel==0.12.0.322

We create a directory to store the notebooks.

RUN mkdir -p /home/jovyan/notebooks

Finally, we set the NOTEBOOK_ARGS environment variable, which lets us configure the notebook with the options we need. In this case, we don't want a browser to open automatically, and we want the notebook directory to be /home/jovyan/notebooks.

ENV NOTEBOOK_ARGS="--no-browser --notebook-dir=/home/jovyan/notebooks"

To build the Docker image, we run:

docker build --target kotlin-kernel -t kotlin-kernel .

This command builds the Docker image and tags it as kotlin-kernel.

To run the container:

docker run \
    -it \
    -p 8888:8888 \
    -v $(pwd)/notebooks:/home/jovyan/notebooks \
    kotlin-kernel

This command:

Runs the container in interactive mode (-it).
Maps port 8888 of the container to port 8888 of the host (-p 8888:8888).
Mounts the local notebooks directory at /home/jovyan/notebooks in the container (-v $(pwd)/notebooks:/home/jovyan/notebooks).

Once it is running, you can open JupyterLab in your browser, and you will see that the Launcher already has two kernels available: Python and Kotlin.

And indeed, we can now create notebooks with Kotlin!

The next step in interactivity

As I dug deeper into Kotlin, I noticed some interesting similarities with Python. This made me want to visualise those similarities in more detail by creating side-by-side comparisons between the two languages. I wondered whether it would be possible to run Python and Kotlin code in the same notebook, and it turns out that it is.

I discovered an extension (and Jupyter kernel) called SoS (Script of Scripts) that enables this. I decided to add it to my container with the Kotlin kernel. Here are the additions to the Dockerfile:

Updating the Dockerfile

We install SoS, which will let us run Python and Kotlin code in the same notebook.

RUN pip install --user \
    sos-notebook==0.24.4 \
    jupyterlab-sos==0.11.0 \
    sos==0.25.1 && \
    python -m sos_notebook.install

With these additions, we can now build and run our improved container:

docker build -t jupyter-kotlin .

docker run \
    -it \
    -p 8888:8888 \
    -v $(pwd)/notebooks:/home/jovyan/notebooks \
    jupyter-kotlin

When you open JupyterLab now, you will see three kernels available: Python, Kotlin and SoS.

And now we can run Python and Kotlin code in the same notebook:

Extra customisation

To improve the visual experience and easily tell apart the cells of different languages, I decided to customise the appearance of the cells.

Jupyter Notebook supports custom CSS, which lets us add a gradient to the left of each cell depending on its language.

Here is the CSS I used:

div[class*="sos_lan__python"] { 
    background: linear-gradient(90deg, rgba(255,222,87,1) 10px, rgba(69,132,182,1) 10px, rgba(69,132,182,1) 20px, rgba(254,254,254,1) 20px);
}
div[class*="sos_lan__kotlin"] {
    background: linear-gradient(90deg, rgba(180,140,252,1) 0px, rgba(196,22,224,1) 6px, rgba(223,73,107,1) 16px, rgba(223,73,107,1) 20px, rgba(255,255,255,1) 20px)
}

To apply this customisation, I saved the CSS in a file called custom.css and added it to the Dockerfile:

# Copy the custom.css file
COPY custom.css ${HOME}/.jupyter/custom/custom.css

We also need to tell the jupyter lab command that we want to use this custom CSS, by adding the --custom-css flag to the launch command.

ENV NOTEBOOK_ARGS="${NOTEBOOK_ARGS} --custom-css"

Errors and how to hide them

While using the multi-language kernel, an error occasionally appears when running a Kotlin cell. The error shows up at random and, although I have not yet managed to identify its cause or how to fix it for good, I have found a temporary workaround that improves the user experience.

To hide this annoying error, I decided to use CSS. I added the following rule to the custom.css file mentioned earlier:

div[class*="sos_lan__kotlin"] div[data-mime-type="application/vnd.jupyter.stderr"] { 
    display: none; 
}

This CSS rule hides the error output (stderr) of Kotlin cells in the notebook. It is not an ideal solution, since it could hide important errors, but it significantly improves the visual experience when working with Kotlin notebooks, especially when dealing with this recurring and apparently harmless error.

Conclusion

In this post, we explored how to create an interactive development environment for Kotlin using Jupyter Notebooks.

We started with the basic setup of a Docker container with Kotlin support, then moved on to a more sophisticated environment that allows running code in multiple languages within the same notebook.

We also saw how to customise the appearance of our notebooks to improve the visual experience and readability, and how to "hide" some common errors that can come up while using these notebooks.

This not only makes learning Kotlin easier, it also allows direct comparisons with other languages such as Python, which can be extremely useful for developers who are transitioning to Kotlin or who regularly work with multiple programming languages.

Additional resources

For those interested in exploring further or replicating this environment, I have made all the code used in this project available in my GitHub repository.

I hope this guide is useful on your learning journey with Kotlin and Jupyter.

Beyond the pickle: the real output of a machine learning team

Antonio Feregrino — Mon, 16 Sep 2024 19:06:46 +0000

Picture this: A data scientist is handed a problem, disappears into the depths of the data warehouse for months, and emerges triumphant with a pickle file. What is in that file? A model that boasts impressive accuracy and is ready for the software engineers to put into production.

We did it! We did machine learning!

Er... no, not really.

In the early days of ML (and let's be honest, this is still the case in many companies today) everything revolved around this mythical creature known as "The Data Scientist". They were expected to do everything, from managing the data to building models and deploying them to production. It is certainly a seductive idea, but one full of problems.

The harsh reality

Some of the problems that arise from this approach and others like it are:

Skills misallocation: Data scientists are brilliant, but their skills are often better used elsewhere. Their coding skills may not be the strongest when it comes to optimised code, and I am fairly sure that for most of them, writing thousands of lines of SQL is not something they find fascinating.
Disconnect from production: This approach keeps data scientists in the dark about the realities of production deployments, and while that may sound like a good idea to some of them, it actually takes away learning opportunities that could shape their future job opportunities.
Productionisation problems: When deployment time comes, MLOps, ML and software engineers often find themselves scratching their heads, trying to decipher the data scientist's code and understand how it is supposed to solve the problem. It is like receiving a jigsaw puzzle with half the pieces missing.
Feature discrepancy: In a data warehouse, it is easy to use every feature we can find. The model may end up being built with features that are not even available when it is time to run inference in the real world. Imagine training for a marathon at home on a treadmill and then being surprised to find that the race is in the Himalayas.
Lack of scalability: When every project depends on a single person or a small team that handles everything end to end, it becomes almost impossible to scale operations or take on several projects at once.
Maintenance difficulty: Once the model is deployed, who monitors its performance? Who updates it when it inevitably starts to degrade? Often, by the time the model needs retraining, the data scientist is already working on the next project.

The result of all this? Frustrated stakeholders tapping their feet, wondering why everything takes so long or falls apart, sometimes slowly, sometimes with a loud crash. The promise of ML turns into a waiting game.

I am not here to point fingers. Data scientists are amazing (I even tried to be one, but I had neither their patience nor their skills). What I am getting at is the importance of collaboration, tooling and culture for ML projects to succeed.

From model factories to factories of factories

Here is my proposal: Instead of seeing the end product as a trained model in production, what if we aimed for something bigger? What if our goal were to create code that creates, retrains and monitors models?

In other words, let's stop being a model factory and become a factory of model factories.

This is not just a semantic difference: it is a fundamental shift in how we approach machine learning projects.

Advantages of the factory-of-factories approach

Improved reproducibility: We do not depend on luck when training a model, or on the intuition of a single data scientist. Every step of the process is documented and repeatable.
Better error handling: With monitoring and training automated, we can often fix problems without having to call in the data scientist every time something goes wrong.
Improved scalability: Once we have the infrastructure needed to create and manage models automatically, we can handle several projects at once without a linear increase in workload.
Faster time to market: With automated pipelines for data preparation, model training and deployment, we can go from idea to production much more quickly.
Better resource allocation: Data scientists can focus on high-value tasks, such as feature engineering and model architecture, instead of getting bogged down in operational work.
Greater collaboration: Clearly defined processes and shared interfaces make it easier for team members with different specialities to work together effectively.
Continuous improvement: With automated retraining and monitoring, models can continuously adapt to new data, maintaining their performance over time without manual intervention.
Better governance and compliance: Automated pipelines make it easier to implement and enforce standards for data handling, model validation and deployment.

Making it happen: The road to success

Becoming a factory of factories does not happen overnight. It requires a fundamental change in both tooling and culture.
Here is a non-exhaustive list of what is needed:

Essential tools

Well-documented data stores: Whether it is a data lake, a warehouse or a feature store, the data must be ready for data scientists to consume. The documentation should include feature lineage and availability (that is, is this feature available in real time?). Data stores should also be easy to modify and extend with new features as needed.
Experimentation platforms: Data scientists need environments where they can test ideas quickly without worrying about breaking production systems.
ML-specific infrastructure: Whether it is a cloud platform or an on-premises solution, the infrastructure must be optimised for the demands of machine learning workloads.
Experiment, artefact and model tracking: Tools that let teams keep track of different experiments, compare model versions and understand what works and what does not. This could also serve as a model catalogue, a place where everyone can find out what is in production.
Development and deployment frameworks: Tools that work seamlessly from development to production, minimising the need to rewrite code or perform complex code handovers.
Automated monitoring solutions: Systems that can track model performance, data drift and other key metrics without constant human supervision, but that can raise alerts when things go wrong.
Version control system: It is not enough to track models and experiments; we also need to track the code that creates them, which is our main product.
Collaboration platforms: Tools that facilitate communication and knowledge sharing between the different roles inside and outside the ML team.

Necessary cultural changes

More engaged data scientists: They need to feel comfortable with a wider range of tools. We are not looking for a data scientist who knows everything, but for one who understands and makes the most of the tools at their disposal.
Investment in infrastructure: Companies must understand that investing in the right tools and platforms pays dividends in the long run.
Inclusive planning: Everyone, including technical staff, should have a seat at the table from the start of a project, including when its scope and stages are defined.
Stakeholder engagement: There has to be a willingness to try out and check intermediate results. Machine learning is not a magic solution; it needs early and constant feedback. Teams must be comfortable with continuous improvement and iteration.
Long-term thinking: Organisations must plan for the ongoing maintenance and improvement of their ML systems. Software degrades quickly, and machine learning degrades even faster because it is software PLUS messy data.
Improved communication: Regular meetings and clear communication channels between data teams, ML teams and other parts of the organisation are essential. By its nature, ML tends to sit at the end of the data food chain and, as such, is often forgotten by those who generate the data.
Cross-functional teams: Breaking down the silos between data scientists, ML engineers, software developers and domain experts leads to more robust and practical solutions.

A note on AutoML

Some people see AutoML as a solution for adopting machine learning in their organisations. These platforms try to automate many aspects of the ML process and offer advantages such as tight integration with the data, democratisation of ML, simple monitoring and a faster time to market.

However, AutoML tools have potential drawbacks: the risk of vendor lock-in, limited customisation for specific business needs, possible deployment restrictions and difficulty understanding what is happening under the hood.

Although valuable, AutoML tools cannot replace the need for custom ML pipelines as operations become more complex; they are just one tool in the ML toolbox, and they do not remove the need for change. You can treat them as a starting point, but be prepared to complement or replace them with custom solutions as your ML operations mature.

The true measure of success

This does not mean you need to have every piece in place right now to consider yourself a success. Getting there takes time and is a process. Every step you take towards this vision is progress. Start small, focus on one part of your workflow and keep building from there.

And if you have to start somewhere, start with the culture. Tools come and go, but a solid cultural foundation will serve you regardless of the specific technologies you use.

As this culture takes root, you will find it easier to identify which tools you need and how to roll them out effectively. You will also be better placed to use those tools in a way that genuinely improves your ML capabilities rather than simply adding complexity.

The real goal of a machine learning team should never be to create a single perfect model or to have the most elegant technology stack. It is about building a system (and a team) that can consistently produce, improve and maintain effective models. That is how we will truly harness the power of machine learning in a way that is sustainable, scalable and genuinely useful.

Remember: factories of factories.

[Thanks to Ned Webster and Lorraine D'Almeida for their comments on this article.]

Face detection in movie trailers

Antonio Feregrino — Thu, 29 Aug 2024 19:28:06 +0000

I recently started a new YouTube channel where I review movies and TV shows. To make my videos a little bit more interesting, I show scenes from the trailers and the original movies instead of just my talking head.

In particular, if I speak of a particular character, I like to show the scenes where that character appears. This is a process that I started doing manually by scrolling through the video and carefully selecting the scenes where the character appears, but then I thought, "Wait a minute, this could be a job for a computer."

The following is a step-by-step guide on how I did it and how you can do it too.

Overall architecture

I started by thinking of this process as a data pipeline: starting with the full-length video, detecting scene changes, and then finding the faces in those scenes. With the faces extracted, I can perform clustering to group the faces belonging to the same individual. Once I have the clusters, I can just re-stitch the video with the scenes where the faces are from the same cluster.

A pipeline that looks like this:

Everything starts with a video

I will be using a movie from the Malayalam film industry that I recently reviewed on my channel. The movie is called Ullozhukku, and this is its trailer:

I downloaded the video and saved it in my movies folder, and I will be using the path in my local machine:

original_video_path = 'Ullozhukku-Trailer.mp4'

Scene detection

Thankfully, there are libraries to help us out with scene change detection, such as scenedetect.

You can install it via pip:

pip install scenedetect[opencv]

It could not be easier to use the detect function along with a ContentDetector to detect the scenes in the video:

from scenedetect import detect, ContentDetector

scenes = detect(original_video_path, ContentDetector())

If needed you can further customize the scene detection by passing arguments to the ContentDetector constructor.

The return value of the detect function is a list of tuples, where each tuple contains a scene's start and end time in the shape of a FrameTimecode.

FrameTimecode has methods to get the exact frame number and seconds.

In my case, I'll be introducing a dataclass to store the scene data more conveniently:

from dataclasses import dataclass


@dataclass
class Scene:
    start_time: float
    end_time: float
    start_frame: int
    end_frame: int

    @property
    def duration(self):
        return self.end_time - self.start_time


detected_scenes = [
    Scene(
        scene[0].get_seconds(),
        scene[1].get_seconds(),
        scene[0].get_frames(),
        scene[1].get_frames(),
    )
    for scene in scenes
]
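As a usage sketch, the duration property makes it easy to skip scenes that are too short to be useful. The sample scenes and the 0.5-second threshold below are assumptions made up for illustration; they are not values from my pipeline.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    start_time: float
    end_time: float
    start_frame: int
    end_frame: int

    @property
    def duration(self):
        return self.end_time - self.start_time


# Hypothetical scenes, standing in for the output of the detection step.
sample_scenes = [
    Scene(0.0, 2.5, 0, 60),
    Scene(2.5, 2.75, 60, 66),
    Scene(2.75, 6.0, 66, 144),
]

MIN_DURATION = 0.5  # seconds; an arbitrary threshold chosen for this example

# Keep only the scenes that last at least MIN_DURATION seconds.
long_enough = [scene for scene in sample_scenes if scene.duration >= MIN_DURATION]
print([scene.start_frame for scene in long_enough])
```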

First frame extraction

We will work under the assumption that the first frame of each scene is the most representative frame of the scene – after all, the scene is defined by dramatic changes between frames.

Using opencv we can easily extract frames from a video, but first we need to turn the video into a VideoCapture object:

import cv2

video = cv2.VideoCapture(original_video_path)

We can use the read() method of the VideoCapture object to get the current frame in the video. However, we need to set the position of the video to the start of the scene we want to extract; we can do this using the set() method of the VideoCapture object.

We need to do this for each scene in the video:

first_frames = []
for scene in detected_scenes:
    video.set(cv2.CAP_PROP_POS_FRAMES, scene.start_frame)
    _, frame = video.read()
    first_frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

.read() returns a tuple, where the first element is a boolean indicating whether the frame was successfully read, and the second element is the frame itself.

.cvtColor() is used to convert the frame from BGR to RGB, which is the format expected by some image processing libraries in Python.

For example, these are the first frames of six random scenes in the original video:

Face detection

We will use the ultralytics library to detect the faces in the video – be aware that you may need to install the torch and torchvision libraries too.

pip install ultralytics

With ultralytics we can use a YOLO model to detect the faces in the video. In this case we need a custom model that has been trained to detect faces, such as yolov5s_face_relu6.pt, which I got from this repository.

from ultralytics import YOLO

model = YOLO("yolov5s_face_relu6.pt")

If we call the model with a frame, it returns a list of results (with a single element). That element is a complex object that we need to unpack before we can access the bounding box coordinates, the confidence score, and the class ID.

results = model(first_frames[5], verbose=False)

x1, y1, x2, y2, confidence, class_id = results[0].boxes.data.cpu().numpy()[0]

print(f"x1: {x1}, y1: {y1}, x2: {x2}, y2: {y2}, confidence: {confidence}, class_id: {class_id}")

I will introduce another dataclass to store the detected faces:

@dataclass
class DetectedFace:
    x1: int
    y1: int
    x2: int
    y2: int

Then, we can iterate over the results and extract the detected faces. At this point we can also filter out the faces with low confidence and, just to be sure, check that the detection belongs to the Human face class (class ID 0):

faces_in_frame = []
detections = results[0].boxes.data.cpu().numpy()
for det in detections:
    x1, y1, x2, y2, confidence, class_id = det
    if results[0].names[int(class_id)] == 'Human face':
        if confidence > 0.5:
            faces_in_frame.append(DetectedFace(int(x1), int(y1), int(x2), int(y2)))

And just as a sanity check, let's plot the detected faces for one frame:

Face embedding

Before clustering the faces, we need to extract the face embeddings. These embeddings are a numerical representation of the face that approximately encodes facial features, which we can then use to compare and cluster the faces.

We will use the face_recognition library to extract the face embeddings.

pip install face-recognition

The face_recognition library provides a face_encodings function that can extract the face embeddings from a frame given a set of face locations:

import face_recognition

encodings = face_recognition.face_encodings(
    first_frames[5],
    [(face.y1, face.x2, face.y2, face.x1) for face in faces_in_frame],
)

The face_encodings function returns a list of encodings, where each encoding is a numpy array of 128 values.
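
The tuple passed as a face location may look odd at first: face_recognition expects each location in (top, right, bottom, left) order, while the detector gives us (x1, y1, x2, y2). A minimal sketch of that reordering, where the helper name to_top_right_bottom_left is made up for this example:

```python
from dataclasses import dataclass


@dataclass
class DetectedFace:
    x1: int
    y1: int
    x2: int
    y2: int


def to_top_right_bottom_left(face):
    """Reorder corner coordinates into (top, right, bottom, left)."""
    return (face.y1, face.x2, face.y2, face.x1)


face = DetectedFace(x1=10, y1=20, x2=110, y2=140)
location = to_top_right_bottom_left(face)
print(location)
```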

We can use these encodings to compare and cluster the faces.

Putting it all together

But before doing that, we need to detect the faces in each scene and extract the face embeddings. We also need to keep track of the scene ID for each face so that we can retrieve the original scenes later.

from collections import defaultdict

def detect_faces(frame, confidence=0.5):
    results = model(frame, verbose=False)
    detections = results[0].boxes.data.cpu().numpy()

    # Use a separate name so the model output above is not shadowed
    faces = []
    for det in detections:
        x1, y1, x2, y2, conf, class_id = det
        if class_id == 0 and conf > confidence:
            faces.append(DetectedFace(int(x1), int(y1), int(x2), int(y2)))

    return faces

def extract_encodings(frame, detections):
    return face_recognition.face_encodings(frame, [(detection.y1, detection.x2, detection.y2, detection.x1) for detection in detections])

face_id = 0
detected_faces = []
encodings = []
face_id_to_scene = {}
for scene_id, frame in enumerate(first_frames):
    face_detection_results = detect_faces(frame)
    for detection in face_detection_results:
        detected_faces.append(detection)
        face_id_to_scene[face_id] = scene_id
        face_id += 1
    encodings.extend(extract_encodings(frame, face_detection_results))

By now we have a list of detected faces, a list of encodings, and a dictionary that maps each face ID to the scene in which that face was detected.

Clustering

We will use the scikit-learn library to perform the clustering.

pip install scikit-learn

We will use the DBSCAN algorithm to cluster the faces. DBSCAN is a clustering algorithm that groups together points that are close to each other, and separates points that are far apart.

It's a powerful algorithm that can find arbitrarily shaped clusters, and it doesn't require the number of clusters to be specified beforehand.

It identifies high-density regions (clusters) and low-density regions (noise).

It requires two parameters to be set:

eps: the maximum distance between two samples to be considered in the same neighbourhood.
min_samples: the minimum number of samples in a neighbourhood for a point to be considered a core point.
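
To build some intuition for how eps and min_samples interact, here is a minimal pure-Python sketch of the DBSCAN idea. It is a simplified illustration and not scikit-learn's implementation; the function names and the toy 2D points are made up for this example.

```python
import math


def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (itself included)."""
    return [
        j for j, other in enumerate(points)
        if math.dist(points[index], other) <= eps
    ]


def dbscan(points, eps, min_samples):
    """Return one label per point; -1 marks noise, clusters start at 0."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Two tight groups of three points and one isolated point.
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
labels = dbscan(points, eps=0.45, min_samples=3)
print(labels)
```

The isolated point ends up with the label -1 because its neighbourhood never reaches min_samples, while each tight group becomes its own cluster.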

We can play with the parameters to see how the clustering changes – but in my experiments, I found that these values worked well:

eps=0.45
min_samples=3

Let's perform the clustering using the encodings:

from sklearn.cluster import DBSCAN

clustering = DBSCAN(eps=0.45, min_samples=3)
clustering.fit(encodings)

To get the clusters we can access the labels_ attribute of the DBSCAN object, which is an array of labels, one for each encoding. A label of -1 means that the encoding is a noise point (meaning that it's not part of any cluster).

We can then use these labels to group the detected faces into clusters:

face_clusters = defaultdict(list)
for i, label in enumerate(clustering.labels_):
    if label != -1:  # -1 is noise
        face_clusters[label].append(i)

And just as a sanity check, let's plot some of the detected faces, now grouped by cluster.

Remember that we have a dictionary that maps each face to the scene it belongs to, so we can use this to plot the detected faces in the original scenes.

As you can see, the clustering algorithm has done a pretty good job of grouping the faces that belong to the same individual. However, we can see that some clusters contain more than one person, and some faces that belong to the same individual are in different clusters.

I only care about clear face shots for my use case, so I will discard whatever the algorithm didn't cluster correctly.

Assembling clips

Now, we need to assemble the clips for each cluster. We will use the moviepy library to do this.

pip install moviepy

We will iterate over the desired clusters and assemble the clips for each cluster.

Let's start by creating a function that takes a video and a list of scenes and returns a video clip containing the scenes using the subclip method of the VideoFileClip object. Then we can use the concatenate_videoclips function to concatenate the clips.

from moviepy.editor import VideoFileClip, concatenate_videoclips

def create_video(original_video, scenes, output_name):
    subclips = [
        original_video.subclip(scene.start_time, scene.end_time)
        for scene in scenes
    ]
    final_clip = concatenate_videoclips(subclips)
    final_clip.write_videofile(output_name, verbose=False)
    final_clip.close()

Then we can use this function to assemble the clips for each cluster by iterating over the desired clusters, selecting the scenes where the faces in the cluster appear, and then creating a video clip for each cluster with the function we just created:

desired_clusters = [1, 2]
output_names = ['Parvathy', 'Urvashi']

for cluster_id, output_name in zip(desired_clusters, output_names):
    # Open the source video once per cluster and close it when done
    original_video = VideoFileClip(original_video_path)
    face_ids = face_clusters[cluster_id]
    scene_ids = [face_id_to_scene[fid] for fid in face_ids]
    scenes = [detected_scenes[scene_id] for scene_id in scene_ids]

    video_name = f"{output_name}.mp4"
    create_video(original_video, scenes, video_name)
    original_video.close()
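
One thing the loop above does not guard against is the same scene being selected twice, which could happen if two faces from one cluster were detected in the same frame. A minimal sketch of a possible guard, assuming the same face_id_to_scene dictionary as before; the helper name scenes_for_cluster and the sample values are made up for this example.

```python
def scenes_for_cluster(face_ids, face_id_to_scene):
    """Return the sorted, de-duplicated scene ids where the given faces appear."""
    return sorted({face_id_to_scene[face_id] for face_id in face_ids})


# Hypothetical mapping: faces 0 and 1 were both detected in scene 0.
face_id_to_scene = {0: 0, 1: 0, 2: 3, 3: 1, 4: 3}
scene_ids = scenes_for_cluster([0, 1, 2, 4], face_id_to_scene)
print(scene_ids)
```

Sorting also keeps the clips in the order in which they appear in the original video.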

And that's it! We have successfully clustered the faces in the video and assembled the clips for each cluster. Just what we wanted.

Find the results below:

The code is far from perfect and could be optimized further, but it works well for my use case, and I hope it can be helpful for yours too, or at least it gives you some ideas on how to tackle your own problem.

If you want the full code, find it in this Jupyter Notebook.

Documenting my pin collection with Segment Anything: Part 4

Antonio Feregrino — Sat, 22 Jun 2024 17:21:30 +0000

Welcome to the fourth entry in my series where I document my journey of cataloguing my enamel pin collection. If you missed the previous posts, you can catch up here. Previously, I introduced a simple app that segments each pin, assigning unique identifiers and names. Although I shared some future enhancements at the end of my last post, it dawned on me that I had slightly deviated from my main objective: effectively showcasing my collection.

In this update, I'll take you through the process of integrating all previous developments into a single interactive webpage. This page highlights each pin, with detailed information accessible via mouse hover, all crafted using HTML, JavaScript, and jQuery.

As always, let me show you what the end product looks like:

And the live web page of my pin collection showcase v2 here.

Improving the quality of the cutout

Before getting into the front-end development, I wanted to try a couple of things to improve the quality of the cutout.

If you remember from a previous post, the output of the Segment Anything Model is a set of masks covering where the segmented object is. However, for my use case the edges of the masks always ended up being a bit jagged, too pointy and complex, so I created the following function in an attempt to simplify the edges of the mask:

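import numpy as np
import supervision as sv
from shapely import simplify
from shapely.geometry import Polygon
from shapely.ops import unary_union

def refine_mask(image, mask):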
    polygons = [Polygon(poly) for poly in sv.mask_to_polygons(mask)]
    single_polygon = unary_union(polygons)

    if single_polygon.geom_type == "Polygon":
        selected_polygon = single_polygon

    elif single_polygon.geom_type == "MultiPolygon":
        selected_polygon = max(single_polygon.geoms, key=lambda x: x.area)

    else:
        raise ValueError(f"Unexpected geometry type: {single_polygon.geom_type}")

    simplified_polygon = simplify(selected_polygon, 1.0)

    selected_polygon = simplified_polygon.buffer(10, join_style=1).buffer(-10.0, join_style=1)
    polygon = []
    for x, y in zip(selected_polygon.exterior.xy[0], selected_polygon.exterior.xy[1]):
        polygon.append(x)
        polygon.append(y)

    new_mask = sv.polygon_to_mask(
        np.array(selected_polygon.exterior.coords, dtype=np.int32),
        (image.shape[1], image.shape[0]),
    )

    return new_mask, polygon

A brief description of the function behaviour is:

Parameters

image: This is the original image associated with the mask. It is used to determine the dimensions for the new mask.
mask: This is a binary mask produced by SAM where the areas of interest are marked.

Function Body

Convert Mask to Polygons

polygons = [Polygon(poly) for poly in sv.mask_to_polygons(mask)]

Converts the mask into a list of shapely’s Polygon objects. The outlines are obtained by tracing the contours of the mask with the supervision library’s mask_to_polygons.

Merge Polygons

single_polygon = unary_union(polygons)

Combines these polygons into a single polygon using shapely.ops.unary_union, which efficiently merges overlapping or adjacent polygons.

Select Largest Polygon (if necessary)

if single_polygon.geom_type == "Polygon":
    selected_polygon = single_polygon
elif single_polygon.geom_type == "MultiPolygon":
    selected_polygon = max(single_polygon.geoms, key=lambda x: x.area)
else:
    raise ValueError(f"Unexpected geometry type: {single_polygon.geom_type}")

Checks the geometry type of the resulting geometry. If it's a MultiPolygon (which in my case happens quite often), it selects the polygon with the largest area, assuming that the largest polygon is the one that contains the pin.

Simplify Polygon

simplified_polygon = simplify(selected_polygon, 1.0)

Simplifies the polygon's shape to reduce the number of vertices, making the shape easier to handle and process. The simplify function comes from the shapely module.
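
As far as I know, shapely's simplify is based on the Douglas-Peucker algorithm, where the second argument is the distance tolerance. To give an idea of what that does, here is a minimal pure-Python sketch of the algorithm for an open polyline. It is not shapely's implementation, and the function names and sample points are made up for this example.

```python
import math


def point_to_segment_distance(point, start, end):
    """Distance from point to the segment start-end."""
    (px, py), (sx, sy), (ex, ey) = point, start, end
    dx, dy = ex - sx, ey - sy
    if dx == 0 and dy == 0:
        return math.hypot(px - sx, py - sy)
    # Project the point onto the segment and clamp to its ends.
    t = ((px - sx) * dx + (py - sy) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (sx + t * dx), py - (sy + t * dy))


def douglas_peucker(points, tolerance):
    """Drop points that lie within tolerance of the simplified line."""
    if len(points) < 3:
        return list(points)
    distances = [
        point_to_segment_distance(p, points[0], points[-1])
        for p in points[1:-1]
    ]
    max_distance = max(distances)
    if max_distance <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves recursively.
    split = distances.index(max_distance) + 1
    left = douglas_peucker(points[: split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right


# A slightly wobbly L-shaped line.
line = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (3.1, 1), (2.9, 2), (3, 3)]
simplified = douglas_peucker(line, 0.5)
print(simplified)
```

The small wobbles disappear and only the two ends and the corner of the L remain.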

Buffering

selected_polygon = simplified_polygon.buffer(10, join_style=1).buffer(-10.0, join_style=1)

Applies a buffer of 10 units outward and then -10 units inward to smooth and regularise the edges, potentially cleaning up the polygon's boundary.

Extract Coordinates

polygon = []
for x, y in zip(selected_polygon.exterior.xy[0], selected_polygon.exterior.xy[1]):
    polygon.append(x)
    polygon.append(y)

Extracts the x and y coordinates from the exterior of the selected polygon and stores them in a list where the coordinates are laid like this: [x1, y1, x2, y2, ..., xn, yn], which is useful when showing the polygon in the front end as image maps.
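
The flattening step can be sketched on its own with plain Python. The helper names and the sample ring below are made up for this example, and rounding to whole pixels is a choice made for the sketch rather than something the function above does.

```python
def flatten_coords(points):
    """Turn [(x1, y1), (x2, y2), ...] into [x1, y1, x2, y2, ...]."""
    flat = []
    for x, y in points:
        flat.extend((x, y))
    return flat


def to_coords_attribute(points):
    """Build a comma-separated string suitable for an area's coords attribute."""
    return ",".join(str(int(round(value))) for value in flatten_coords(points))


# Hypothetical exterior ring; the first point is repeated to close it.
exterior = [(10.0, 20.0), (30.4, 20.0), (30.4, 45.6), (10.0, 20.0)]
flat = flatten_coords(exterior)
coords = to_coords_attribute(exterior)
print(flat)
print(coords)
```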

Convert Polygon to Mask

new_mask = sv.polygon_to_mask(
    np.array(selected_polygon.exterior.coords, dtype=np.int32),
    (image.shape[1], image.shape[0]),
)

Finally, converts the simplified polygon back into a mask format with the original image's dimensions.

Returns

new_mask: The refined mask derived from the largest or simplified polygon.
polygon: The coordinates of the simplified polygon.

Some results

In this image, it is possible to see how in the original cutout there was an extra bit of image that does not belong to the pin badge, with the refining function we got rid of it.

The refining function not only helps in removing the unwanted bits of the image but also helps in removing empty spaces that should not be there.

However, the benefits of the refining function are not always visible, as shown above.

Front-end

Now, on to the front-end, where most of the time was invested.

A new `view` endpoint

I added a new endpoint to my FastAPI app. This endpoint renders the existing masks into an HTML page that shows the original image along with an HTML map element:

@app.get("/view/")
def get_view(request: Request):
    existing_cutouts = load_selected_cutouts()

    return templates.TemplateResponse(
        "view.html.jinja",
        {
            "request": request,
            "imageWidth": og_image.width,
            "imageHeight": og_image.height,
            "existing_cutouts": existing_cutouts,
            "image": turns_image_to_base64(og_image),
        },
    )

The `view.html.jinja` template:

The template is quite simple since most of the interactivity and functionality is in the JavaScript code that I will explain later:

<!DOCTYPE html>
<html lang="en">
<head>
<!-- Code omitted for brevity -->
</head>
<body>

  <img src="{{image}}" alt="Enamel Pins Collection" id="canvasMapContainer" usemap="#pinmap">

  <map name="pinmap" id="pinmap">
      {%- for cutout in existing_cutouts %}
      <area shape="poly" coords="{{cutout.polygon | join(',')}}" 
        data-name="{{cutout.name}}"
        {%- if cutout.description %}
        data-description="{{cutout.description}}"
        {%- endif %}
        alt="{{cutout.name}}" data-key="{{cutout.uuid}}" href="#">
      {%- endfor %}
  </map>

  <!-- Modal -->
  <dialog id="modal">
    <article>
      <header>
          <h3 id="infoPinNameModal"></h3>
      </header>
      <p id="infoPinDescriptionModal"></p>
    </article>
  </dialog>

  <!-- Tooltip -->
  <div id="tooltip" style="display: none;">
    <article>
      <header><h3 id="infoPinNameTooltip"></h3></header>
      <p id="infoPinDescriptionTooltip" style="display: none;"></p>
    </article>
  </div>

    <script>
    /* Functionality described below */
    </script>
</body>
</html>

There are five key pieces to this app:

The img tag with canvasMapContainer as id. It shows the image containing all the pins, the same one I have been working with throughout this series of posts. The tag has src="{{image}}", where the image is provided by the server as a base64 image. Another thing to note about this image tag is that its usemap attribute is set to "#pinmap", which lets the browser know that there is an image map attached to this image.
The map tag contains areas that correspond to different parts of the enamel pin canvas. Notice how the map’s name attribute matches the value set as usemap in the image above. The areas are generated dynamically at render time: the loop {%- for cutout in existing_cutouts %} creates an area element with information such as polygon coordinates, name and description for each of the pins.
A dialog tag with modal as id. This element is used to display more detailed information about a selected pin. This element is hidden by default and only shown whenever a user clicks on a pin.
A div that works as a floating tooltip that displays basic information about the pin over which the user is hovering the cursor. Just like the modal dialog above, this tooltip is hidden initially and shown on certain interactions defined in the script below.
The fifth element is a script that orchestrates the whole functionality of the app. It takes more than a single paragraph to explain, so continue reading to learn more about it.
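
The first item above mentions that the image reaches the browser as base64. The turns_image_to_base64 helper is not shown in this post, so the following is only a sketch of the general idea using the standard library: encode the raw image bytes and wrap them in a data URI that an img tag can use as its src. The function name bytes_to_data_uri is made up for this example.

```python
import base64


def bytes_to_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a data URI usable in an img src attribute."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The eight-byte PNG signature, used here as stand-in data instead of a real image.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = bytes_to_data_uri(png_signature)
print(uri)
```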

The app’s logic

Dependencies

jQuery: A fast, small, and feature-rich JavaScript library. Some people may think it is quite outdated; however, it simplifies things like HTML document traversal and manipulation, event handling, and even animation.
ImageMapster: A jQuery plugin that provides interactive image maps functionality. It allows images to be used with areas that can be manipulated and interacted with in various ways.

Functionality

Everything happens after the document has been loaded, inside a $(document).ready(function() { }); definition.

Modal and Tooltip Interaction:

The script initialises variables for modal and tooltip elements, as well as several configuration variables for classes and animation timing.

  const $modal = $("#modal");
  const isOpenClass = "modal-is-open";
  const openingClass = "modal-is-opening";
  const closingClass = "modal-is-closing";
  const scrollbarWidthCssVar = "--pico-scrollbar-width";
  const animationDuration = 400; // ms
  const padding = 10;

  const $tooltip = $("#tooltip");
  const $infoPinNameTooltip = $("#infoPinNameTooltip");
  const $infoPinNameModal = $("#infoPinNameModal");
  const $canvasMapContainer = $("#canvasMapContainer");
  let visibleModal = null;

It defines functions to toggle, open, and close the modal. The modal is opened by clicking on an area of the image map, and it is closed by clicking outside of it or by pressing the Escape key.

  // Toggle modal
  const toggleModal = () => {
      if (!$modal.length) return;
      $modal[0].open ? closeModal() : openModal();
  };

  // Open modal
  const openModal = () => {
      $("html").addClass(isOpenClass).addClass(openingClass);
      setTimeout(() => {
          visibleModal = $modal;
          $("html").removeClass(openingClass);
      }, animationDuration);
      $modal[0].showModal();
  };

  // Close modal
  const closeModal = () => {
      visibleModal = null;
      $("html").addClass(closingClass);
      setTimeout(() => {
          $("html").removeClass(closingClass).removeClass(isOpenClass);
          $("html").css(scrollbarWidthCssVar, '');
          $modal[0].close();
      }, animationDuration);
  };

  // Close with a click outside
  $(document).on("click", (event) => {
      if (visibleModal === null) return;
      const isClickInside = $(visibleModal).find("article").has(event.target).length > 0;
      if (!isClickInside) closeModal();
  });

  // Close with Esc key
  $(document).on("keydown", (event) => {
      if (event.key === "Escape" && visibleModal) {
          closeModal();
      }
  });

Interactive Image Map Setup

The image map is initialized with the ImageMapster plugin, which is configured to not allow selection (highlighting) of map areas but to react to mouse events – this plugin’s documentation is top-notch.

$canvasMapContainer.mapster({
    enableAutoResizeSupport: true,
    autoResize: true,
    isSelectable: false,
    stroke: false,
    strokeColor: '00FF00',
    strokeWidth: 5,
    mapKey: 'data-key',
    fillOpacity: 0.0,
// ....

On clicking an image map area, the script fetches the area's data attributes (like name), updates the modal's content, and toggles the modal's visibility.

    onClick: function (data) {
        $infoPinNameModal.text(data.e.target.dataset.name);
        toggleModal();
    }

On mouseover, the tooltip's content is updated based on the hovered area's data attributes, and its position is dynamically calculated to appear near the cursor but adjusted to avoid overflowing the viewport.

    onMouseout: function() {
        $tooltip.hide();
    },
    onMouseover: function(data) {
        // ... see below for the dynamic positioning

Dynamic Positioning:

The tooltip's position is calculated based on the coordinates of the hovered area. The script ensures that the tooltip does not overflow the window edges by adjusting its position relative to the image map area's boundaries.

Position calculations take into account the current scroll position and the tooltip's dimensions to ensure it is always visible.

      const coords = $(this).attr('coords').split(',').map(coord => parseInt(coord, 10));
      const xCoords = coords.filter((_, i) => i % 2 === 0);
      const yCoords = coords.filter((_, i) => i % 2 === 1);
      const x1 = Math.min(...xCoords);
      const y1 = Math.min(...yCoords);
      const x2 = Math.max(...xCoords);
      const y2 = Math.max(...yCoords);
      const centerX = (x1 + x2) / 2;

      $infoPinNameTooltip.text(data.e.target.dataset.name);

      const infoWidth = $tooltip.width();
      const infoHeight = $tooltip.height();

      let positionX = "centre";
      if (x1 - infoWidth - padding < 0) {
          positionX = "left";
      } else if (x2 + infoWidth + padding > $canvasMapContainer.width()) {
          positionX = "right";
      }

      let positionY = "top";
      if (y1 - infoHeight - padding < $(window).scrollTop()) {
          positionY = "bottom";
      }

      const positionXmap = {
          "left": x2 + padding,
          "centre": centerX - infoWidth / 2,
          "right": x1 - infoWidth - padding
      };

      const positionYmap = {
          "top": y1 - padding - infoHeight,
          "bottom": y2 + padding
      };

      $tooltip.css({
          top: positionYmap[positionY],
          left: positionXmap[positionX],
      }).show();
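
The placement rules are easier to reason about outside the browser. Here is a minimal sketch of the same decisions as a plain Python function, which mirrors the JavaScript above but is not part of the app; the function name and the sample numbers are made up for this example.

```python
def tooltip_position(coords, tooltip_w, tooltip_h, container_w,
                     scroll_top=0, padding=10):
    """Return (left, top) for the tooltip given flat [x1, y1, x2, y2, ...] coords."""
    xs = coords[0::2]
    ys = coords[1::2]
    x1, x2 = min(xs), max(xs)
    y1, y2 = min(ys), max(ys)

    if x1 - tooltip_w - padding < 0:
        left = x2 + padding  # too close to the left edge: go right of the area
    elif x2 + tooltip_w + padding > container_w:
        left = x1 - tooltip_w - padding  # too close to the right edge
    else:
        left = (x1 + x2) / 2 - tooltip_w / 2  # centred over the area

    if y1 - tooltip_h - padding < scroll_top:
        top = y2 + padding  # no room above: show below the area
    else:
        top = y1 - padding - tooltip_h
    return left, top


# An area in the middle of a 1000px wide image, and one near the top-left corner.
middle = tooltip_position([400, 300, 500, 300, 500, 380, 400, 380], 120, 40, 1000)
corner = tooltip_position([20, 30, 90, 30, 90, 80, 20, 80], 120, 40, 1000)
print(middle, corner)
```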

In a real-world production app, this script should probably live in its own file. However, as this is just a toy project, it is currently inlined along with the HTML code.

Conclusion

This project has been an enriching learning experience, and although the results haven't fully met my expectations yet, I believe it's time for a pause. Juggling multiple interests and responsibilities, including learning, writing, and teaching, demands that I prioritise my commitments.

In the meantime, I will keep a list of the ideas that come to my mind to improve the results of the processes I have been describing here, and, if you have ideas on how I could improve this project or want to share your experiences with similar projects, please leave a comment below or reach out to me on Twitter.

If you are looking for all the code I have written so far, everything is on GitHub, feel free to use it for your own projects!

Documenting my pin collection with Segment Anything: Part 3

Antonio Feregrino — Fri, 14 Jun 2024 04:11:08 +0000

In my last post, I showed how to use the Segment Anything Model with prompts to improve the segmentation output. In it, I decided that using bounding boxes to prompt the model yielded the best results for my purposes.

In this post I will try to describe a tiny, but slightly complex, app I made with the help of GitHub Copilot. This app is made with vanilla JavaScript and HTML, and it uses SAM in the backend to extract the cutouts along with the bounding polygons for further use in my ultimate collection display.

Before we dive into a mess of code, have a look at the app I created:

(if you just want the code, go to the end of this post)

Requirements

The app I created needed to:

Allow me to draw boxes on an image.
Perform image segmentation using the drawn box as a prompt.
Once the image segmentation is done, show the candidate cutouts and allow me to select the best one.
Give each selected cutout a unique identifier and a name.

Solution

In the end, I created a client-server app:

For the backend, the obvious decision was Python, since the Segment Anything Model is readily accessible in that language, and it is the language I know the most.

The client app is done with vanilla JavaScript, CSS and HTML. Using the canvas API it is effortless to draw bounding boxes over an image, and the mouse events give us the data we need to send to the backend to extract a cutout.

Implementation

Project Structure Overview

The project consists of several interconnected components, including a FastAPI backend, HTML5 and JavaScript for the frontend, and CSS for styling. Here’s a breakdown of the key files and their roles:

web/labeller.py: The core backend file built with FastAPI. It handles route definitions, image manipulations, and interactions with the image segmentation model.
web/static/app.css: Contains CSS styles to enhance the appearance of the application.
web/static/app.js: Manages the frontend logic, particularly the interactions on the HTML5 Canvas where users draw annotations.
web/templates/index.html.jinja: The Jinja2 template for the HTML structure, dynamically integrating backend data.
web/resources.py: Manages downloading necessary resources like images and model files.
web/sam.py: Integrates the machine learning model for image segmentation.

Out of these files, perhaps the most important ones are the one that manages the frontend logic and the core of the app. I will try my best to describe them below:

`web/static/app.js`

The script starts by setting up an environment where users can draw rectangular boxes on an image loaded into a canvas element. This functionality is an essential part of the app, since these boxes will be the prompts to the segment anything model in the backend.

1. Initialization on Window Load:

The script begins execution once the window has fully loaded, ensuring all HTML elements, especially the <canvas> and <img>, are available to manipulate.

window.addEventListener('load', function() {
    const canvas = document.getElementById('canvas');
    const ctx = canvas.getContext('2d');
    const img = document.getElementById('image');
    const results = document.getElementById('results');
    const contours = [];

2. Canvas and Context Setup:

Here, the canvas dimensions are set to match the image dimensions, and the image is then drawn onto the canvas. This forms the base on which users will draw the bounding boxes.

    canvas.width = img.width;
    canvas.height = img.height;
    ctx.drawImage(img, 0, 0);

3. Drawing Interactions:

Listeners for mousedown, mousemove, and mouseup events are added to the canvas to handle drawing:

Create variables to hold the mouse position and the drawing state:

  let startingMousePosition = { x: 0, y: 0 };
  let isDrawing = false;

Start Drawing: On mousedown, it captures the starting point where the user begins the draw interaction.

canvas.addEventListener('mousedown', function(e) {
    startingMousePosition = { x: e.offsetX, y: e.offsetY };
    isDrawing = true;
});

Drawing in Progress: The mousemove event updates the drawing in real time, giving visual feedback of the rectangle being drawn on the canvas via the redrawCanvas and drawBox functions.

canvas.addEventListener('mousemove', function(e) {
    if (isDrawing) {
        const currentX = e.offsetX;
        const currentY = e.offsetY;
        redrawCanvas();
        drawBox(startingMousePosition.x, startingMousePosition.y, currentX - startingMousePosition.x, currentY - startingMousePosition.y);
    }
});

End Drawing: The mouseup event finalises the drawing and sends the drawn box data to the server using the sendBoxData function.

canvas.addEventListener('mouseup', function(e) {
    if (isDrawing) {
        const endX = e.offsetX;
        const endY = e.offsetY;
        const box = {
            x1: Math.min(startingMousePosition.x, endX),
            y1: Math.min(startingMousePosition.y, endY),
            x2: Math.max(startingMousePosition.x, endX),
            y2: Math.max(startingMousePosition.y, endY)
        };
        sendBoxData(box);
        redrawCanvas();
        isDrawing = false;
    }
});

4. Server Interaction:

Upon completing a drawing, the box data is sent to the server using a fetch call. This allows the application to process the box. This processing involves using the segment anything model to extract the candidate cutouts and returning them to be presented to the user using the createForm function:

function sendBoxData(box) {
    fetch('/cut', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(box)
    })
    .then(response => response.json())
    .then(data => {
        results.innerHTML = '';
        data.results.forEach(result => {
            const t = createForm(result);
            results.appendChild(t);
        });
    })
    .catch(error => {
        console.error('Error:', error);
    });
}

5. Dynamic Form Generation:

Responses from the server include an image and an identifier, which are used to create and populate forms dynamically. Using Mustache.js for templating, the script generates HTML forms based on this data, which are then inserted into the DOM, allowing further user interaction.

const template = `
<div class="form-container">
    <div class="image-container">
        <img src="{{image}}">
    </div>
    <form action="/select_cutout" method="POST">
        <input type="text" name="name" placeholder="Name">
        <input type="hidden" name="id" value="{{id}}">
        <button type="submit">Select</button>
    </form>
</div>
`;

function createForm(result) {
    const rendered = Mustache.render(template, {
        id: result.id,
        image: result.image
    });

    const div = document.createElement('div');
    div.innerHTML = rendered.trim();
    return div.firstChild;
}

6. Utility Functions:

Several utility functions handle repetitive tasks:

redrawCanvas: Clears and redraws the canvas, useful for updating the view when needed.

function redrawCanvas() {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    ctx.drawImage(img, 0, 0);
    contours.forEach(points => {
        drawPolygon(points);
    });
}

drawBox: Draws a rectangle from a starting point, a width and a height, optionally with a translucent fill.

function drawBox(x, y, width, height, fill = false) {
    ctx.beginPath();
    ctx.rect(x, y, width, height);
    ctx.strokeStyle = 'red';
    if (fill) {
        ctx.fillStyle = '#ff000033';
        ctx.fill();
    }
    ctx.stroke();
}

drawPolygon: A more complex drawing function that can render polygons, used here to illustrate the capability to handle various shapes.

function drawPolygon(points) {
    ctx.beginPath();
    ctx.moveTo(points[0][0], points[0][1]);
    points.forEach(point => {
        ctx.lineTo(point[0], point[1]);
    });
    ctx.fillStyle = '#ff0000FF';
    ctx.closePath();
    ctx.fill();
    ctx.stroke();
}

These utility functions are essential for managing the visual elements on the canvas, allowing for efficient updates and complex graphical operations like drawing polygons and boxes.

`web/labeller.py`

1. Environment Setup and Initialisations:

The application begins with setting up the FastAPI environment and configuring static file paths and template directories. This setup is crucial for serving static content like images and CSS, and for rendering HTML templates.

app = FastAPI()
app.mount("/static", StaticFiles(directory="web/static"), name="static")
templates = Jinja2Templates(directory="web/templates")

The image to annotate and the model to use are downloaded or loaded into the application, ensuring that all necessary components are available for image processing and analysis.

resources = download_resources()

2. Image Loading and Preprocessing:

The image to annotate is loaded and preprocessed. This involves reading the image from a path, converting it to an appropriate colour format, and resizing it to a manageable size. This resizing is particularly important to ensure that the processing is efficient.

original_image = cv2.cvtColor(cv2.imread(str(resources["image_path"])), cv2.COLOR_BGR2RGB)
image_to_show = Image.fromarray(original_image)
image_to_show = image_to_show.resize((desired_image_width, int(image_to_show.height * ratio)))
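
The snippet above uses desired_image_width and ratio without showing where they come from. A minimal sketch of how they could be computed, assuming that ratio is the desired width divided by the original width (which is consistent with the backend later dividing box coordinates by ratio); the function name and the sample dimensions are made up for this example.

```python
def resized_dimensions(width, height, desired_width):
    """Return the scaling ratio and the (width, height) that keeps the aspect ratio."""
    ratio = desired_width / width
    return ratio, (desired_width, int(height * ratio))


# Hypothetical 4000x3000 photo shown at 1000 pixels wide.
ratio, size = resized_dimensions(4000, 3000, 1000)
print(ratio, size)
```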

3. Model Loading for Image Segmentation:

The segmentation model is loaded, configured, and prepared to predict masks based on user-defined annotations.

mask_predictor = get_mask_predictor(resources["model_path"])
mask_predictor.set_image(original_image)

4. Web Routing and Request Handling:

FastAPI routes handle different types of web requests. The main route serves the annotated image along with tools for the user to interact with. It accepts both GET and POST requests and renders an HTML template with the image and the existing annotations.

@app.get("/")
@app.post("/")
def get_index(request: Request):
    img = turns_image_to_base64(image_to_show)
    existing_cutouts = []
    for file in os.listdir(selected_folder):
        if file.endswith(".json"):
            with open(f"{selected_folder}/{file}") as f:
                metadata = json.load(f)
                existing_cutouts.append(metadata)
    data = {
        "request": request,
        "image": img,
        "width": image_to_show.width,
        "height": image_to_show.height,
        "existing_cutouts": existing_cutouts,
        "ratio": ratio,
    }
    return templates.TemplateResponse("index.html.jinja", data)

5. Image Annotation and Segmentation:

When a user submits a bounding box annotation, the coordinates are scaled back to the original, unresized image and processed to segment the image. The application uses the model to predict the mask and then extracts the relevant part of the image based on these masks.

Apart from the extracted image cutouts, metadata is also saved to a temporary folder, so that when a user selects a given cutout, both the cutout and its metadata can be recovered.

@app.post("/cut/")
def post_cut(request: Request, box: BoundingBox):
    box = np.array([box.x1, box.y1, box.x2, box.y2])
    original_box = box / ratio
    masks, _, _ = mask_predictor.predict(box=original_box, multimask_output=True)
    results = []
    for mask in masks:
        uuid = str(uuid4())
        cutout, bbox = extract_from_mask(original_image, mask)
        base64_cutout = turns_image_to_base64(cutout, format="PNG")
        results.append({
            "id": uuid,
            "image": base64_cutout,
        })
        metadata = {
            "uuid": uuid,
            "bbox": {"x1": bbox[0], "y1": bbox[1], "x2": bbox[2], "y2": bbox[3]},
            "original_bbox": {
                "x1": original_box[0],
                "y1": original_box[1],
                "x2": original_box[2],
                "y2": original_box[3],
            },
            "polygons": [poly.tolist() for poly in sv.mask_to_polygons(mask)],
        }
        with open(f"{temp_folder}/{uuid}.png", "wb") as f:
            cutout.save(f, format="PNG")
        with open(f"{temp_folder}/{uuid}.json", "w") as f:
            f.write(json.dumps(metadata))

    return {"results": results}
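
The division box / ratio in the endpoint is what undoes the resize. A minimal sketch of that step in plain Python, assuming ratio is the displayed width divided by the original width (the function name to_original_coords is made up for illustration):

```python
def to_original_coords(box, ratio):
    """Map (x1, y1, x2, y2) from the resized image back to the original image."""
    return tuple(value / ratio for value in box)


# A box drawn on an image shown at a quarter of its original size
print(to_original_coords((100, 50, 200, 150), 0.25))  # (400.0, 200.0, 800.0, 600.0)
```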

6. Handling user selection of cutouts

After users mark and submit their desired cutout, this endpoint manages the user's selection, moving the annotated image data from temporary storage to a selected folder and updating its associated metadata with new user-provided information (like a name for the annotation):

@app.post("/select_cutout/")
def post_select_cutout(request: Request, id: Annotated[str, Form()], name: Annotated[str, Form()]):
    import shutil

    # Move the PNG image from temporary to selected folder
    shutil.move(f"{temp_folder}/{id}.png", f"{selected_folder}/{id}.png")

    # Load the existing metadata for the selected annotation
    with open(f"{temp_folder}/{id}.json") as f:
        metadata = json.load(f)
        metadata["name"] = name  # Update the name field with user-provided name

    # Write the updated metadata back to the selected folder
    with open(f"{selected_folder}/{id}.json", "w") as f:
        f.write(json.dumps(metadata))

    # Redirect back to the main page after processing is complete
    return RedirectResponse("/")
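
The file handling in this endpoint can be tried outside FastAPI. The sketch below is an assumption-laden variant, not the app's code: select_cutout is a hypothetical helper, and unlike the endpoint above it also deletes the temporary JSON file once the updated copy is written.

```python
import json
import shutil
import tempfile
from pathlib import Path


def select_cutout(temp_folder, selected_folder, cutout_id, name):
    """Move a cutout's PNG and write its renamed metadata to the selected folder."""
    temp_folder, selected_folder = Path(temp_folder), Path(selected_folder)
    shutil.move(
        str(temp_folder / f"{cutout_id}.png"),
        str(selected_folder / f"{cutout_id}.png"),
    )

    metadata_path = temp_folder / f"{cutout_id}.json"
    metadata = json.loads(metadata_path.read_text())
    metadata["name"] = name
    (selected_folder / f"{cutout_id}.json").write_text(json.dumps(metadata))
    metadata_path.unlink()  # assumption: the temporary copy is no longer needed
    return metadata


# Usage with throwaway directories and placeholder file contents
with tempfile.TemporaryDirectory() as tmp, tempfile.TemporaryDirectory() as sel:
    Path(tmp, "abc.png").write_bytes(b"not a real image")
    Path(tmp, "abc.json").write_text(json.dumps({"uuid": "abc"}))
    print(select_cutout(tmp, sel, "abc", "My first pin"))
```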

7. Utility Functions for Image Manipulation:

Several utility functions facilitate image manipulation tasks like cropping the image based on the mask:

def extract_from_mask(image, mask, crop_box=None, margin=10):
    image_rgba = np.zeros((image.shape[0], image.shape[1], 4), dtype=np.uint8)
    alpha = (mask * 255).astype(np.uint8)
    for i in range(3):
        image_rgba[:, :, i] = image[:, :, i]
    image_rgba[:, :, 3] = alpha
    image_pil = Image.fromarray(image_rgba)
    if crop_box is None:
        bbox = Image.fromarray(alpha).getbbox()
        crop_box = (
            max(0, bbox[0] - margin),
            max(0, bbox[1] - margin),
            min(image_pil.width, bbox[2] + margin),
            min(image_pil.height, bbox[3] + margin),
        )
    cropped_image = image_pil.crop(crop_box)
    return cropped_image, crop_box
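
The cropping logic boils down to finding the tight bounding box of the mask, growing it by a margin, and clamping it to the image. Here is a dependency-free sketch of that idea on a tiny nested-list mask; padded_bbox is a hypothetical name, and it returns None for an empty mask, a case the function above does not handle.

```python
def padded_bbox(mask, margin=10):
    """Return (left, top, right, bottom) around the truthy cells of a 2D mask.

    The box is grown by `margin` and clamped to the mask's bounds; right and
    bottom are exclusive, matching Pillow's getbbox convention.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: nothing to crop
    height, width = len(mask), len(mask[0])
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    return (
        max(0, cols[0] - margin),
        max(0, rows[0] - margin),
        min(width, cols[-1] + 1 + margin),
        min(height, rows[-1] + 1 + margin),
    )


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(padded_bbox(mask, margin=1))  # (1, 0, 5, 4)
```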

And converting images to a web-friendly format to be sent as responses to the front end.

def turns_image_to_base64(image, format="JPEG"):
    buffered = BytesIO()
    image.save(buffered, format=format)
    img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    # Use the MIME type that matches the encoded format (e.g. image/png for PNG)
    return f"data:image/{format.lower()};base64," + img_str

These functions ensure that the application can handle image data efficiently and render it appropriately on the web interface.
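
A data URI names the MIME type of its payload, so it helps for the prefix to match the format the image was actually saved in. A small sketch of that idea working on raw bytes; to_data_uri and the MIME_TYPES table are illustrative names, not part of the app:

```python
import base64

MIME_TYPES = {"JPEG": "image/jpeg", "PNG": "image/png"}  # illustrative subset


def to_data_uri(raw_bytes, format="JPEG"):
    """Wrap already-encoded image bytes in a data URI whose MIME type matches."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{MIME_TYPES[format]};base64,{encoded}"


# The first four bytes of a PNG file are used here as a stand-in payload
print(to_data_uri(b"\x89PNG", format="PNG"))  # data:image/png;base64,iVBORw==
```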

Libraries I used

FastAPI: A web framework for building APIs (and web pages). It is used as the backbone of the application to handle web requests, routing, and server logic, and orchestrates the overall API structure. FastAPI also provides robust features such as data validation, serialisation, and asynchronous request handling, although not all of them are used here.
OpenCV (cv2): OpenCV is a powerful library used for image processing operations. It is utilised to read and transform images, such as converting colour spaces and resizing images, which are essential pre-processing steps before any segmentation tasks.
NumPy: This library is fundamental for handling arrays and matrices, such as for operations that involve image data. NumPy is used to manipulate image data and perform calculations for image transformations and mask operations.
PIL (Pillow): The Python Imaging Library (Pillow) is used for opening, manipulating, and saving many different image file formats. Here it is specifically used to convert images to different formats, handle image cropping, and integrate alpha channels to extract the cutouts.
Supervision: Although not yet a widely known library, this powerful library provides a seamless process for annotating predictions generated by various object detection and segmentation models; in this case, I used it to evaluate the results of SAM, and to convert its predictions to Polygon masks.
Mustache.js: This is a templating engine used for rendering templates on the web. In this application, Mustache.js is used to dynamically create HTML forms based on the data received from the server, such as image cutouts and identifiers.

Closing thoughts

I hope I did not bore you to death with some of these deep dives into my (and some of my friendly coding assistant's) code – I tried my best to be thorough. But if you still have doubts do not hesitate to reach out to me.

Here is the code by the way.

Believe it or not, this app is not complete yet; there is some other functionality still to be implemented:

A way to easily recover the selected cutouts
A way to match already existing cutouts so that when the user selects the same cutout we don't duplicate entries
A way to handle updated canvas pictures, because what is going to happen when I inevitably expand my collection?

I will explore these details in the next blog post in the series.

Documenting my pin collection with Segment Anything: Part 2

Antonio Feregrino — Tue, 11 Jun 2024 17:33:13 +0000

In a previous post I shared my desire to create an interactive display for my pin collection. In it, I decided to use Meta AI’s Segment Anything Model to extract cutouts from my crowded canvas:

But as I discovered, with such a crowded and detailed image, the automatic segmentator struggles with identifying all the pins individually.

Luckily for me, Segment Anything has other ways of extracting masks from an image through the use of prompts. There are two kinds of prompts: boxes and points.

In this post, I will show you these two features.

Load the model and image

First thing, we load the model:

import torch
from segment_anything import sam_model_registry

sam = sam_model_registry['vit_b'](checkpoint='sam_vit_b_01ec64.pth').to(device=torch.device('cpu'))

Next, we load the image that contains the pins. We use OpenCV for reading the image and convert it to RGB color space, as the model expects the input in this format:

import cv2

image = cv2.imread('pins@high.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

Create a Segment Anything Model Predictor

Segment Anything offers a predictor that requires a model to be instantiated. Then we need to set an image using set_image, which will process the image to produce an image embedding; the predictor will store this embedding and use it for subsequent mask predictions.

from segment_anything import SamPredictor

mask_predictor = SamPredictor(sam)
mask_predictor.set_image(image_rgb)

Prompting with a box

To prompt SAM with a bounding box it is necessary to define a NumPy array, where the order of the values is x1,y1,x2,y2, for example:

import numpy as np
box = np.array([759, 913, 1007, 1174])

The image is just an illustration; the model operates on the image alone, with the box passed as a NumPy array.

To prompt the model, one has to call the predict method on the mask_predictor:

masks, scores, logits = mask_predictor.predict(
    box=box,
    multimask_output=True,
)

The result will be a triplet, with the following values:

masks: The output masks in CxHxW format, where C is the number of masks, and (H, W) is the original image size.
scores: An array of length C containing the model's predictions for the quality of each mask.
logits: An array of shape CxHxW, where C is the number of masks and H=W=256. These low resolution logits can be passed to a subsequent iteration as mask input.

By the way, if you specify multimask_output=True you will get three masks for each prediction. I find this truly useful: some of the generated masks are not usable, so I would rather keep my options open with multiple masks.
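
Since predict also returns a quality score per mask, one option is to rank the candidates by score instead of only by eye. A minimal sketch with made-up scores (best_mask_index is a hypothetical helper; with NumPy the same thing is int(np.argmax(scores))):

```python
def best_mask_index(scores):
    """Return the index of the highest predicted-quality score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up scores standing in for the `scores` array returned by predict
example_scores = [0.71, 0.93, 0.88]
print(best_mask_index(example_scores))  # 1
```

The highest score is only the model's own estimate, so it may still be worth looking at the other candidates.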

Ultimately, the result will be masks that, when applied to the image, yield the following result:

Prompting with points

The input to the model is comprised of two arrays:

point_coords: A Nx2 array of point prompts to the model. Each point is in (X,Y) in pixels
point_labels: A length N array of labels for the point prompts. 1 indicates a foreground point and 0 indicates a background point.

point_coords = np.array([
    (box[0]+40, box[1]+50),
    (box[0]+150, box[1]+160),
    (box[0]+200, box[1]+80),
])

point_labels = np.array([1, 1, 1])

If we visualise the points, they look like this:

The call to predict looks like this:

masks, scores, logits = mask_predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)

And the results… well, they're not great:

Speed

When prompted, the model takes significantly less time (under 1 second) than my previous attempt using the automatic segmentator.
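
To put a number on a claim like this, a small timing wrapper is enough. The sketch below is generic: timed is a hypothetical helper, and the workload is a stand-in for a call such as mask_predictor.predict(...).

```python
import time


def timed(function, *args, **kwargs):
    """Call `function` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = function(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; in the notebook this would be mask_predictor.predict(...)
result, elapsed = timed(sum, range(1_000_000))
print(f"took {elapsed:.3f} s")
```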

Conclusion

For my pin collection, manual prompting with bounding boxes proved more effective than using point prompts.

In my next entry, I will demonstrate how I integrated this model into a custom web-based application, enhancing the interactive display of my collection.

Documenting my pin collection with Segment Anything: Part 1

Antonio Feregrino — Sun, 09 Jun 2024 19:46:04 +0000

As a hobby that spans across various cultures and ages, pin collecting allows enthusiasts like me to hold onto pieces of art, history, and personal milestones. Whether they're enamel pins from theme parks or vintage lapel pins, I believe each piece in a collection tells a unique story.

In this blog series, I'm excited to share my journey of documenting my extensive pin collection, which consists of gifts, purchases, and serendipitous finds from the streets.

Version 1

With the help of ChatGPT I built a simple website that allows you to zoom into the whole canvas so that you can look at the pins in more detail, and while I liked the result (you can view it here!), it is far from what I wanted to document my collection.

My ideal collection display

My ideal solution is to create an interactive website where viewers can hover over each pin to see it highlighted, and click for a detailed view and additional information about the pin's background. Using the canvas image shown above, I embarked on a project to bring this vision to life, leveraging modern machine learning techniques.

Enter Segment Anything

To extract the cutouts from the canvas I thought of using an image segmentation algorithm to extract the silhouettes of the pins. Now, the last time I tried to do something related to object/edge detection, the model to go with was YOLO V2; to my great surprise, I discovered that advancements have led to YOLO V10!

However, intrigued by the capabilities of the latest models, and even though YOLO is now more powerful than the version I was familiar with, I decided to experiment with Meta AI's Segment Anything Model (SAM), which was released with the promise of being a powerful image segmentation model.

Installing SAM

I wanted to run everything locally, so I set out to install everything on my M2 Mac. It was a bit tricky and involved a lot of trial and error, but here is what worked for me in the end:

1. Create a new Python environment

python -m venv .venv

The version I used to create the environment was 3.10.12

2. Install `torch`

I found there is specific Apple guidance on how to do this, given that on certain Macs it is possible to take advantage of the GPU:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

In the end, these are the versions I ended up having: torch==2.3.1, torchvision==0.18.1 and torchaudio==2.3.1.

3. Fix NumPy

For some reason, I ran into an issue with NumPy and torch. This StackOverflow answer helped me solve it by re-installing NumPy with the following command:

pip install numpy -I

The final numpy version was 1.26.4

4. Install Segment Anything

As far as I know, the only way to install the necessary code for SAM is through their GitHub repo:

pip install 'git+https://github.com/facebookresearch/segment-anything.git'

5. Download the SAM model

The models and the code for Segment Anything come separately, so to download a model to use:

wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth

There are different versions of models, but sam_vit_b_01ec64 was my choice, mainly because as far as I know, it is the smallest.

6. Remaining tools

To develop and test the model, I used Jupyter, and to visualise the results of the image segmentation I used a package called supervision.

Accessing the SAM model

To use the model, it is necessary to load it in Python; this is also where you configure where the model should run (GPU or CPU, for example). In the code below I configure the vit_b model. I also attempted to use MPS (Metal Performance Shaders); however, I ran into an error and decided to run everything on the CPU:

import torch
from segment_anything import sam_model_registry

# DEVICE = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
DEVICE = torch.device('cpu')
MODEL_TYPE = "vit_b"
CHECKPOINT_PATH = 'sam_vit_b_01ec64.pth'

sam = sam_model_registry[MODEL_TYPE](checkpoint=CHECKPOINT_PATH).to(device=DEVICE)

Opening the image

import cv2

IMAGE_PATH = 'pins@high.jpg'
image = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

The cv2.cvtColor() function converts the colour space of an image. In this case, it's converting the colour format from BGR to RGB. This is often done because while OpenCV uses BGR, most image applications and libraries use RGB. The converted image is stored in the variable image_rgb.
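
Conceptually, converting BGR to RGB just reverses the channel order of every pixel. A tiny pure-Python illustration on a one-row image (bgr_to_rgb is a made-up name; on a NumPy array the equivalent slice would be image[:, :, ::-1]):

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in a rows-of-pixels image."""
    return [[pixel[::-1] for pixel in row] for row in image]


bgr_image = [[(255, 0, 0), (0, 0, 255)]]  # pure blue, then pure red, in BGR order
print(bgr_to_rgb(bgr_image))  # [[(0, 0, 255), (255, 0, 0)]]
```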

Generating Automated Masks

SAM has different methods of generating masks. The one I wanted to try first is by far the easiest, because all you need to do is provide an image and let the model generate the masks for you. To set it up, pass the sam variable (containing the model) to an instance of SamAutomaticMaskGenerator:

from segment_anything import SamAutomaticMaskGenerator

mask_generator = SamAutomaticMaskGenerator(sam)

Then, generating the masks is as easy as calling the generate method of the automatic generator, passing the RGB image:

output_mask = mask_generator.generate(image_rgb)
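
The generate call returns a list of dictionaries, one per mask, with keys such as segmentation, area and bbox (the latter in x, y, width, height order). As a sketch of what one could do with that output, the snippet below filters made-up entries by area and converts their boxes to the x1, y1, x2, y2 order; both helper names are hypothetical.

```python
def keep_large_masks(sam_output, min_area):
    """Keep entries whose 'area' is at least min_area, largest first."""
    kept = [entry for entry in sam_output if entry["area"] >= min_area]
    return sorted(kept, key=lambda entry: entry["area"], reverse=True)


def xywh_to_xyxy(bbox):
    """Convert an [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, width, height = bbox
    return [x, y, x + width, y + height]


# Made-up entries; real ones also carry a 'segmentation' array and more keys
fake_output = [
    {"area": 120, "bbox": [10, 10, 12, 10]},
    {"area": 5400, "bbox": [40, 60, 90, 60]},
    {"area": 2100, "bbox": [200, 30, 50, 42]},
]
for entry in keep_large_masks(fake_output, min_area=1000):
    print(entry["area"], xywh_to_xyxy(entry["bbox"]))
```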

SAM is indeed a very powerful model, much more powerful than what I need, at least out of the box. This is the result I get from running the entire image through SAM:

Upon running SAM, the results were not as expected. The model struggled to accurately detect all pins and sometimes misinterpreted parts of pins as separate entities.

I then decided to try working on a smaller crop of the image; however, I got the same results:

If you are interested in how I managed to display the results, you can have a look at the function I wrote for this task:

import numpy as np
import supervision as sv

def view_masks(source, masks):
    """
    Display the source image, the segmented image and the binary mask

    :param source: The source image in BGR format
    :param masks: The result of the automatic mask generator call
    """
    mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
    detections = sv.Detections.from_sam(sam_result=masks)
    dark = np.zeros_like(source)
    annotated_image = mask_annotator.annotate(scene=source.copy(), detections=detections)
    masked = mask_annotator.annotate(scene=dark, detections=detections)

    sv.plot_images_grid(
        images=[source, annotated_image, masked],
        grid_size=(1, 3),
        titles=['source image', 'segmented image', 'binary mask'])

Conclusion

It may be possible to modify the behaviour of the SamAutomaticMaskGenerator via arguments; however, when I modified some of these arguments I realised that (I did not know what I was doing, and) sometimes the kernel died on me. I suppose my laptop does not have enough memory to run some combinations.

While the initial attempts with SAM presented challenges, they provided valuable learning opportunities. In the next blog post, I will explore alternative methods and adjustments to enhance pin detection and achieve the interactivity I envision for my collection's display.

My MLOps library

Antonio Feregrino — Thu, 03 Aug 2023 19:35:40 +0000

Over the last three years, the topics in my personal library have shifted towards MLOps. I have read so many books on the subject that I can finally pick my favourites; let's take a look:

My favourites

Building Machine Learning Powered Applications

This book by Emmanuel Ameisen (O'Reilly) was the one that pushed me to leave my job as a data scientist and start moving my career towards MLOps.

The book is very light on practical content; although it comes with code to practise with, the ideas it presents are what make it most worthwhile. What I loved about it is the high-level view that takes the focus away from developing the machine learning model and puts it on the components that surround a practical machine learning application. Buy it on Amazon México or elsewhere in the world.

Reliable Machine Learning

This book by Cathy Chen et al. (O'Reilly) was a really pleasant find. It does not talk about a specific technology and, despite not mentioning MLOps in the title, MLOps is the subject of the whole book.

It looks at ML through the lens of reliability, covering a wide range of topics, from problems that can occur in the data and storage layer, through some techniques for calculating costs, to who should be in charge of model quality. Buy it on Amazon México or elsewhere in the world.

Machine Learning Engineering in Action

This is a book by Ben Wilson (Manning). It is extremely dense, but every one of its 500-plus pages is completely worth it because it covers every aspect involved in putting a successful machine learning application into production, including project planning and scoping, experimentation, testing, deployment, and monitoring.

This book is full of examples and diagrams that make the concepts easier to understand. Highly recommended. Buy it on Amazon México or elsewhere in the world.

Effective Data Science Infrastructure

What I love about this book by Ville Tuulos (Manning) is that, despite being a technical book centred on one tool, Metaflow, the concepts it touches on are timeless when it comes to a good platform for experimenting with and deploying machine learning models.

The author draws on his experience building the data science infrastructure for one of the big tech companies. Even if you are not interested in Metaflow at all, I recommend you check out this book. In my case, this book was a source of inspiration when designing the data science experience at my current company. Buy it on Amazon México or elsewhere in the world.

Machine Learning Design Patterns

While this book by Valliappa Lakshmanan (O'Reilly) is not about MLOps as such, I found it an interesting reference book that helps solve some of the common problems an ML-related engineer may run into day to day.

Do not think of this book as the one that will guide you from start to finish while you build a machine learning application; think of it as a cookbook you can keep coming back to. There is some criticism regarding the authors' technology choices, but for me, the value shared in this book outweighs the annoying promotion the authors make of GCP, TensorFlow, and BigQuery. Buy it on Amazon México or elsewhere in the world.

Designing Machine Learning Systems

I am still reading this book by Chip Huyen (O'Reilly), but so far it is promising, hence its position on the list. I will update this list when I finish it. Buy it on Amazon México or elsewhere in the world.

These books are good

Machine Learning Engineering with Python

A book by Andrew P. McMahon (Packt) that tries to cover many topics, but its page count prevents it from going deep into any of them. The good thing is that it covers some aspects that other books do not, such as Git workflows and packaging Python projects.

It also has some self-assessment questions that are good for testing your knowledge as you progress through the book. Buy it on Amazon México or elsewhere in the world.

MLOps engineering at Scale

A relatively small book by Carl Osipov (Manning); it is a light introduction to PyTorch and how to serve it on AWS. It has good ideas, but I cannot shake the feeling that its size gets in the way of a deep explanation. If you work with the technologies the book covers, it would be a good addition to your library. Buy it on Amazon México or elsewhere in the world.

Machine Learning Engineering

A good reference book by Andriy Burkov (True Positive Inc); it is a concise, albeit disorganised, overview of the field of machine learning engineering. It is not a book that will radically change your view of machine learning in production, but it has some valuable ideas.

I recommend it to the data scientist who is just starting out and wants to communicate with experienced data scientists or machine learning engineers, but if you already have experience, it will not be very useful. Buy it on Amazon México or elsewhere in the world.

Introducing MLOps

As the title suggests, this book by Mark Treveil et al. (O'Reilly) is a good introduction to MLOps; I recommend it if you are completely new to the subject. Read it without expecting much technical depth, and more as a high-level overview. Buy it on Amazon México or elsewhere in the world.

These I do not recommend

Practical MLOps

I cannot figure out what this book by Noah Gift et al. (O'Reilly) is about; it is definitely not about MLOps. There are a bunch of irrelevant examples and personal experiences that feel forced. But what kills the book is the radical change of topic from one chapter to the next; it reminded me of when I had to work in teams at university with people I did not get along with, and everyone did their part separately.

Probably one of the worst O'Reilly books I have read in my tech career. Stay away from it.

Conclusion

I hope this list gives you a good idea of which books to add to your library (or which to avoid), and if you have already read the ones I mention, feel free to comment on them or disagree with me in the comments.

I am always open to book suggestions, so leave those in the comments too.

Automating Sketch with GitHub Actions

Antonio Feregrino — Sun, 05 Jun 2022 15:57:18 +0000

Sometimes I use Sketch to create graphics for my content; however, I have always found it challenging to keep track of my work. Saving files here and there, versioning them with what I thought were sensible name schemes, only to realise that following such schemes requires a lot of mental effort.

My dream was always to be able to store my Sketch files in a Git repo; for some reason, I always thought this would be impossible, that Sketch's files were binaries impossible to properly version.

In this post, I'll explain why I was wrong and how you can version your Sketch files as plain text documents.

So I start with a base document, nothing too complex as I don't want to overcomplicate things:

I have named this document tcsg.sketch; then, the next thing to understand is how Sketch actually saves these documents as a single file. The most important thing is that a .sketch file is nothing more than a .zip file with a bunch of .json files inside.

De-sketchify

We know that Git does not play nicely with binary files but plays very nicely with plain text files, and JSON is just that. Why not decompress the .sketch file and keep track of the .json files alone? After all, when we want to open our file in Sketch again, we can compress those files again.

Decompress

With this in mind, we can use:

unzip -d tcsg tcsg.sketch

This unzips the files into the tcsg directory. A quick glance into the newly created directory gives us the following structure:

tcsg
├── document.json
├── meta.json
├── pages
│   └── 4FB4BFA1-4E01-4EE8-9962-F07A85622B2F.json
├── previews
│   └── preview.png
└── user.json

I will not discuss the details of the files, as they are well explained in Sketch's documentation.

We should note that for optimisation purposes, the JSON files are saved with no indentation, and all the contents are stored in a single line. As I want to embrace the full power of Git, I need to format these files to be able to view the diffs.

Indent files

There is a useful tool for working with JSON files from the CLI named jq; I will use it to format the files with indentation:

find ./tcsg -type f -name "*.json" \
    -exec sh -c "jq . {} | sponge {}"  \;

An explanation of the above command:

find ./tcsg: Searches for objects in the ./tcsg folder
-type f: Specifies, with f that we are looking for a regular file
-name "*.json": Filters the files we will find to all those ending in .json
-exec [command]: Executes a command for each file; within this command we can use {} to refer to the file name. The command to execute should be followed by \;
sh -c "jq . {} | sponge {}": In this case, the command that will be executed for each file is jq . [filename] | sponge [filename].
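
If jq and sponge are not available, the same idea can be sketched with Python's standard library. This is an alternative I am suggesting, not what the scripts in this post use: reindent_json_files is a hypothetical helper, and its output is not guaranteed to be byte-identical to jq's (number formatting, for instance, may differ).

```python
import json
import tempfile
from pathlib import Path


def reindent_json_files(folder, indent=2):
    """Rewrite every .json file under `folder` with the given indentation.

    Pass indent=None to write each document back on a single line.
    """
    rewritten = []
    for path in sorted(Path(folder).rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        separators = (",", ":") if indent is None else None
        text = json.dumps(data, indent=indent, separators=separators, ensure_ascii=False)
        path.write_text(text + "\n", encoding="utf-8")
        rewritten.append(path)
    return rewritten


# Usage on a throwaway folder that mimics the unzipped structure
with tempfile.TemporaryDirectory() as folder:
    page = Path(folder, "pages", "page.json")
    page.parent.mkdir()
    page.write_text('{"name":"tcsg","layers":[]}', encoding="utf-8")
    reindent_json_files(folder)
    print(page.read_text(encoding="utf-8"))
```

Reading the whole file before writing it back plays the role that sponge plays in the shell version.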

Delete previews (optional)

There is a previews folder where the last page edited by the user is preserved to be used as a thumbnail (and a preview) for the document. Again, this is an image, and for the time being, I will delete it since it is not needed for the file format.

rm -rf ./tcsg/previews/preview.png

And that is it! We now have a Sketch document as a series of plain text files.

Sketchify

Of course, I want this process to be reversible – I want to be able to open my documents in Sketch again.

Create a temporary folder

I would rather not modify the original directory, so I will create a copy of the working directory:

cp -r ./tcsg ./tcsg_temp

Un-indent files

When I decompressed the files, I realised that the JSON files contained all the information in a single line; to respect that format, let's apply the jq -c . {} | sponge {} command to all those files. It is pretty similar to the format command above, with the difference of the -c flag of jq, which "compresses" the output.

find ./tcsg_temp -type f -name "*.json" \
    -exec sh -c "jq -c . {} | sponge {}"  \;

Remove previews (optional)

Again, let's delete any preview image, for consistency with the process above:

rm -rf ./tcsg_temp/previews/preview.png

Putting everything together

I placed all the code from the De-sketchify section into a single file called desketchify.sh:

#!/usr/bin/env bash

unzip -o -d tcsg tcsg.sketch

find ./tcsg -type f -name "*.json" \
    -exec  sh -c "jq . {} | sponge {}" \;

rm -rf ./tcsg/previews/preview.png

Compress

Finally, in the compression step, we need to change directory to the temporary folder I have been working on. Then apply the compression step using the zip utility:

cd ./tcsg_temp; zip -r -X ../tcsg.sketch *

The flag -r specifies that zip should recursively compress the files; the -X flag specifies that the compression should not save any extra file attributes.

At the end of this command I should have a .sketch file that can be opened in the app.
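
For reference, the same compression step can be sketched with Python's zipfile module. The helper zip_folder is hypothetical, and I have not verified that Sketch opens archives built this way; one known difference is that zip -r also stores directory entries, while this sketch stores files only.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder, archive_path):
    """Store every file under `folder` in a zip, using paths relative to it."""
    folder = Path(folder)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                archive.write(path, arcname=path.relative_to(folder).as_posix())
    return archive_path


# Usage on a throwaway folder; the archive is written outside the source folder
with tempfile.TemporaryDirectory() as source, tempfile.TemporaryDirectory() as target:
    Path(source, "document.json").write_text("{}")
    Path(source, "pages").mkdir()
    Path(source, "pages", "page.json").write_text("{}")
    archive = zip_folder(source, Path(target, "example.sketch"))
    with zipfile.ZipFile(archive) as check:
        print(check.namelist())  # ['document.json', 'pages/page.json']
```

Using paths relative to the folder mirrors what changing into the directory before calling zip achieves: the JSON files end up at the root of the archive rather than under a tcsg_temp prefix.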

Cleanup

Lastly, let's clean up what I just did:

cd ..; rm -rf ./tcsg_temp

Putting everything together

I placed all the above code into a single file called sketchify.sh:

#!/usr/bin/env bash

cp -r ./tcsg ./tcsg_temp

find ./tcsg_temp -type f -name "*.json" \
    -exec  sh -c "jq -c . {} | sponge {}" \;

rm -rf ./tcsg_temp/previews/preview.png

cd ./tcsg_temp; zip -r -X ../tcsg.sketch *

cd ..; rm -rf ./tcsg_temp

Exporting artboards

But why stop there? What if I want to export the contents of the file as images? This will make it easy to share the assets with people who do not have Sketch installed at all!

This is surprisingly easy using GitHub Actions; all I need to do is use a kind of hidden gem in the Sketch ecosystem: their sketchtool utility (read more about it here). It allows you to work with Sketch documents without human interaction.

In particular, the command that I am interested in the most is the one that exports artboards: sketchtool export artboards [file].

The tool itself is free to use for my purposes, but we need to download Sketch to get it. I wrote the following code to achieve that:

wget -O sketch.zip \
    https://download.sketch.com/sketch-88.1-145978.zip
unzip -qq sketch.zip
Sketch.app/Contents/MacOS/sketchtool -v

This leaves sketchtool accessible via the Sketch.app/Contents/MacOS/sketchtool command. Obviously, be mindful of the version you are working with.

Full automation

It is finally time to put everything together using GitHub Actions. I want to run all these steps only when the source files of the Sketch document change:

name: Sketchify

on:
  workflow_dispatch:
  push:
    branches: [ "main" ]
    paths:
      - 'tcsg/**'

The jobs are organised in steps, where each step performs one and only one action:

jobs:
  generate_assets:
    name: Generate assets
    runs-on: macos-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Install dependencies
        run: |
          brew install jq
          brew install moreutils
      - name: Create Sketchfile
        run: ./sketchify.sh
      - name: Install Sketch
        run: |
          wget -O sketch.zip https://download.sketch.com/sketch-88.1-145978.zip
          unzip -qq sketch.zip
          Sketch.app/Contents/MacOS/sketchtool -v
      - name: Export artboards
        run: Sketch.app/Contents/MacOS/sketchtool export artboards tcsg.sketch --output=export --formats=jpg --scales=1,2
      - name: Save exported images
        uses: actions/upload-artifact@v3
        with:
          name: images
          path: export/
      - name: Save generated Sketch file
        uses: actions/upload-artifact@v3
        with:
          name: sketch-file
          path: tcsg.sketch

This will rebuild my Sketch file every time new changes are made to the repository and will export all the artboards in it. The best part? The artefacts will be available for download in the GitHub UI:

Conclusion

I consider this to be a pretty decent way to store Sketch files as assets in a Git repository; of course, depending on the changes you make, the diffs may still be monstrous; but at least they are more trackable than as a single zip file.

To use the code described in this post you will need to make some adjustments to it to refer to your own files.

So, tell me, do you use Sketch? I hope this post was as useful for you as it was for me; I discovered so many things about Sketch. If you have any doubts, let me know on Twitter at @feregri_no. As always, find the code for this post on GitHub. Happy Sketch-ing.

Configuring GitHub Actions – Tweeting from a lambda

Antonio Feregrino — Mon, 14 Feb 2022 11:38:13 +0000

Here comes the automation part, using GitHub Actions as a CI/CD pipeline. The first thing we're going to do is create a file called aws.yml in the .github/workflows folder; as the extension suggests, it is a file that follows the YAML format.

In it, we start by specifying the name of the pipeline and the conditions under which it should be executed:

name: Build and deploy lambda-cycles image

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
# Job definitions

For now I want the pipeline to run every time someone pushes to main and every time someone opens a pull request against main.

The next thing is to define the jobs that are part of the workflow – in this case we will have two: one to prepare our application and the other to publish it.

Preparing the image – build

To define a job we must specify the steps that form part of it; for each job you can also specify a friendlier name and the type of runner on which it is executed. We won't be using anything complicated, so ubuntu-latest works fine for us.

  build:
    name: Build
    runs-on: ubuntu-latest
    steps:

The next thing to do is to specify the steps that are part of the job:

Steps

We need to get a copy of the code we just pushed to main; for that we use the checkout action:

    - name: Checkout
      uses: actions/checkout@v2

Since we are going to interact with AWS, we need to configure the credentials in the runner. Amazon offers an action for this; all we must specify are our credentials (which we previously set as secrets in our repo).

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v1
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: eu-west-1

The next couple of steps are specific to my implementation: since I am using pipenv, it is necessary to install Python, then pipenv, and finally the dependencies:

    - name: Set up Python 3.8
      uses: actions/setup-python@v1
      with:
        python-version: 3.8

    - name: Install pipenv
      run: |
        pip install pipenv
        pipenv install

The next three steps are all about creating the image that will be used to create our lambda instance.

The first step calls make container, the utility I added in past posts to build the image and tag it as lambda-cycles. The second step exports this image to a tar file. The third step stores the newly exported Docker image as an artifact, which we will use in the next job, where we will deploy it.

    - name: Build lambda-cycles image
      run: make container

    - name: Pack docker image
      run: docker save lambda-cycles > ./lambda-cycles.tar

    - name: Temporarily save Docker image and dependencies
      uses: actions/upload-artifact@v2
      with:
        name: lambda-cycles-build
        path: |
          ./shapefiles/
          ./requirements.txt
          ./lambda-cycles.tar
        retention-days: 1

We must configure, initialise and, finally, plan the creation of the infrastructure using Terraform. For the first task, HashiCorp offers a predefined action; for the other two, the terraform command-line tool is enough:

    - name: Set up terraform
      uses: hashicorp/setup-terraform@v1

    - name: Terraform init
      run: terraform -chdir=terraform init

    - name: Terraform plan
      run: terraform -chdir=terraform plan

Creating infrastructure in AWS

Once GitHub Actions has finished the build job, we can move on to the deploy job. To define it (in addition to the name and runner information) I indicate that it depends on the build job and, very importantly, that it should only be executed when the branch that triggers the workflow is main. See the if condition?

  deploy:
    name: Deploy
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main'

    steps:
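To make the effect of that if condition more concrete, here is a minimal sketch in Python (not part of the workflow itself). It assumes the usual values of github.ref: refs/heads/main for a push to main, and a merge ref of the form refs/pull/<number>/merge for a pull request. The function name and the pull request number 42 are made up for illustration.

```python
def should_deploy(ref: str) -> bool:
    """Mirror of the workflow condition: github.ref == 'refs/heads/main'."""
    return ref == "refs/heads/main"


# A push to main carries the branch ref, so the deploy job runs.
print(should_deploy("refs/heads/main"))

# A pull request carries a merge ref (42 is a hypothetical PR number),
# so only the build job runs and nothing is deployed.
print(should_deploy("refs/pull/42/merge"))
```

In other words, pull requests still get the build and the terraform plan, but only pushes to main reach terraform apply.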

Steps

As usual, we get a copy of the code with actions/checkout@v2:

    - name: Checkout
      uses: actions/checkout@v2

We configure our credentials again; remember, each job is executed in a different runner:

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v1
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: eu-west-1

Do you remember that in the previous job we created an artifact named lambda-cycles-build that contained a Docker image and some other dependencies? – well, now we are going to download it, and after that we will use docker load to import the image and make it available to be used by docker.

    - name: Retrieve saved Docker image
      uses: actions/download-artifact@v2
      with:
        name: lambda-cycles-build
        path: ./

    - name: Docker load
      run: docker load < ./lambda-cycles.tar

Finally, we configure terraform again, we initialise it and lastly, we apply the planned changes. Note that we are using the -auto-approve option so that the changes are automatically approved without the need for human interaction.

    - name: Set up terraform
      uses: hashicorp/setup-terraform@v1

    - name: Terraform init
      run: terraform -chdir=terraform init

    - name: Terraform apply
      run: terraform -chdir=terraform apply -auto-approve

And that is it; this concludes the 6-part series explaining how to automatically build and deploy a lambda from GitHub.

This is what the repository looks like by the end of this post.

Remember that you can find me on Twitter at @feregri_no to ask me about this post – if something is not so clear or you found a typo. The final code for this series is on GitHub and the account tweeting the status of the bike network is @CyclesLondon.

Infrastructure with Terraform – Tweeting from a lambda

Antonio Feregrino — Mon, 14 Feb 2022 11:37:51 +0000

Since we are deploying a lambda in AWS, we need to create some infrastructure there.

As a programmer I like to define everything in code; however, infrastructure provisioning is something that until recently needed to be managed manually, either through a graphical interface or a CLI with limited scripting capabilities.

Over the years, tools have emerged that brought us closer to the dream of being able to create infrastructure just by defining it in code. Tools such as Ansible, CloudFormation and Terraform allow us to do just that, and it is precisely the last one that I chose to create the necessary elements for this series of posts.

It is not my intention to explain how Terraform works (I don't even know it properly myself; in this post I did the minimum for the lambda to work). I will present this post by describing the content of the terraform/main.tf file that contains the infrastructure.

Providers

Terraform interacts with remote systems (such as AWS) through plugins; these plugins are known as providers.

Each Terraform module must specify the providers it needs via the required_providers block; each provider has a name, a source location, and a version constraint. For example, for this lambda I am using 2 providers:

aws, which lives in hashicorp/aws; with the constraint "~> 3.27", any version from 3.27 up to (but not including) 4.0 will work
null, which is a special provider; I'll tell you more about it later. With "~> 3.0.0", only 3.0.x versions are accepted.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.27"
    }

    null = {
      version = "~> 3.0.0"
    }
  }
  required_version = ">= 0.14.9"

  backend "s3" {
    bucket = "feregrino-terraform-states"
    key    = "lambda-cycles-final"
    region = "eu-west-1"
  }
}
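The ~> operator in those version constraints can be a bit confusing, so here is a small Python sketch of how I understand it: only the rightmost component of the constraint is allowed to grow. This is an illustration under simplifying assumptions (plain numeric versions, no pre-release suffixes), not Terraform's actual implementation, and the function names are my own.

```python
def parse(version: str) -> tuple:
    """Turn '3.27.1' into (3, 27, 1), padded with zeros to three parts."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check `version` against the number that follows '~>' in a constraint."""
    base = [int(p) for p in constraint.split(".")]
    if len(base) < 2:
        raise ValueError("this sketch expects at least MAJOR.MINOR")
    # Only the rightmost component may grow: drop it and bump the one before.
    upper = base[:-1]
    upper[-1] += 1
    lower_bound = parse(constraint)
    upper_bound = parse(".".join(str(p) for p in upper))
    return lower_bound <= parse(version) < upper_bound


# "~> 3.27" accepts 3.27 and later 3.x releases, but not 4.0
print(satisfies_pessimistic("3.74.3", "3.27"))
print(satisfies_pessimistic("4.0.0", "3.27"))

# "~> 3.0.0" accepts only 3.0.x releases
print(satisfies_pessimistic("3.0.5", "3.0.0"))
print(satisfies_pessimistic("3.1.0", "3.0.0"))
```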

Backend configuration

Within the terraform configuration block you can also see another block, defined as backend "s3". This block specifies where the state file will be located; in this file Terraform keeps the state of the infrastructure that we have created with it so far. As I discussed in the first post of the series, this file will live in an S3 bucket, whose details we put in the backend block.

Provider configuration

Some providers require extra configuration; for example, AWS requires us to configure things like the region we want to connect to, the profile and the credentials we are going to use. The recommendation is that you do not put passwords or secrets in code, so in the AWS configuration I only have the profile and the region:

provider "aws" {
  profile = "default"
  region  = "eu-west-1"
}

Data Sources

Terraform allows us to access data defined outside our configuration files through data blocks. For example, we can access information about the user who is executing commands in AWS using aws_caller_identity:

data "aws_caller_identity" "current_identity" {}

Local Values

I like to think of local values as variables within each module, and we must define them within a locals block; locals can also take values from other sources, such as variables or data sources to simplify access to them:

locals {
  account_id          = data.aws_caller_identity.current_identity.account_id
  prefix              = "lambda-cycles-final"
  ecr_repository_name = "${local.prefix}-image-repo"
  region              = "eu-west-1"
  ecr_image_tag       = "latest"
}
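To see what these interpolations resolve to, here is a rough Python equivalent using f-strings. The dictionary keys are labels I made up; the name patterns are the ones used for the resources defined in the rest of this file.

```python
# Same value as local.prefix in the Terraform configuration
prefix = "lambda-cycles-final"

# Names derived from the prefix, as "${local.prefix}-..." does in Terraform
names = {
    "ecr_repository": f"{prefix}-image-repo",
    "iam_role": f"{prefix}-lambda-role",
    "iam_policy": f"{prefix}-lambda-policy",
    "lambda_function": f"{prefix}-lambda",
    "event_rule": f"{prefix}-event-rule-lambda",
}

for kind, name in names.items():
    print(f"{kind}: {name}")
```

Changing the prefix in one place renames every resource consistently, which is the main reason I keep it as a local value.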

AWS

Secrets

Given the nature of the service I am trying to deploy, it is necessary to access the secrets stored in AWS Secrets Manager. These must be specified as data sources, with data blocks: first we reference the secret itself with aws_secretsmanager_secret, and then its latest version with aws_secretsmanager_secret_version:

data "aws_secretsmanager_secret" "twitter_secrets" {
  arn = "arn:aws:secretsmanager:${local.region}:${local.account_id}:secret:lambda/cycles/twitter-2GMvKu"
}

data "aws_secretsmanager_secret_version" "current_twitter_secrets" {
  secret_id = data.aws_secretsmanager_secret.twitter_secrets.id
}

ECR repository

As the lambda is going to be deployed using a Docker container, it is necessary to create a repository in ECR. We can use the aws_ecr_repository resource, taking the repository name from one of the local values:

resource "aws_ecr_repository" "lambda_image" {
  name                 = local.ecr_repository_name
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = false
  }
}

Creating a Docker image

Once the repository is created, it is necessary to upload an image to it; however, Terraform is used to define infrastructure, not to perform tasks such as building a Docker image, much less uploading it. I am going to assume that, before executing Terraform, I already have an image built with the name lambda-cycles; the only thing missing then is uploading it to the ECR repository.

We can use a little hack to accomplish this with Terraform by using a null resource (null_resource) and a provisioner called local-exec that allows you to specify commands to be executed on the local computer:

resource "null_resource" "ecr_image" {
  triggers = {
    python_file_1 = filemd5("../app.py")
    python_file_2 = filemd5("../plot.py")
    python_file_3 = filemd5("../tweeter.py")
    python_file_4 = filemd5("../download.py")
    requirements  = filemd5("../requirements.txt")
    docker_file   = filemd5("../Dockerfile")
  }
  provisioner "local-exec" {
    command = <<EOF
           aws ecr get-login-password --region ${local.region} | docker login --username AWS --password-stdin ${local.account_id}.dkr.ecr.${local.region}.amazonaws.com
           docker tag lambda-cycles ${aws_ecr_repository.lambda_image.repository_url}:${local.ecr_image_tag}
           docker push ${aws_ecr_repository.lambda_image.repository_url}:${local.ecr_image_tag}
       EOF
  }
}

Did you notice the triggers block? This block helps us track changes to the files that determine whether the lambda container has changed; with filemd5 we get a hash of each of the specified files. This means that if we make any changes to these files (the .py files, requirements.txt or the Dockerfile), the provisioner runs again and the image is tagged and uploaded to the ECR repository once more.
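The idea behind the triggers can be sketched in a few lines of Python: hash the files, keep the hashes, and re-run the upload only when some hash differs from the previous run. This is just an illustration of the mechanism, not how Terraform stores its state; the helper names are my own and the file is a temporary stand-in for app.py.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hex MD5 digest of a file's contents, similar in spirit to filemd5."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def build_triggers(paths) -> dict:
    """Map each file name to the hash of its current contents."""
    return {p.name: file_md5(p) for p in paths}


def needs_rerun(previous: dict, current: dict) -> bool:
    """The provisioner should run again when any tracked hash changed."""
    return previous != current


with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "app.py"  # stand-in file for the example
    app.write_text("print('v1')\n")
    before = build_triggers([app])
    unchanged = build_triggers([app])
    app.write_text("print('v2')\n")
    after = build_triggers([app])

print(needs_rerun(before, unchanged))  # same contents, nothing to do
print(needs_rerun(before, after))      # contents changed, push again
```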

Image information

It is necessary to define a data source (in the form of an aws_ecr_image) that depends on the creation and publication of the image; we can do this thanks to depends_on:

data "aws_ecr_image" "lambda_image" {
  depends_on = [
    null_resource.ecr_image
  ]
  repository_name = local.ecr_repository_name
  image_tag       = local.ecr_image_tag
}

Policies and permissions

Before creating the lambda, I have to take care of other administrative tasks. The first is to create a role the lambda can assume in order to be executed:

resource "aws_iam_role" "lambda" {
  name               = "${local.prefix}-lambda-role"
  assume_role_policy = <<EOF
{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Action": "sts:AssumeRole",
           "Principal": {
               "Service": "lambda.amazonaws.com"
           },
           "Effect": "Allow"
       }
   ]
}
 EOF
}

Now, since I want to monitor my lambda, and to know if any errors occurred during its execution, it is necessary to grant it permissions so that it can create logs in CloudWatch:

data "aws_iam_policy_document" "lambda" {
  statement {
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents"
    ]
    effect    = "Allow"
    resources = ["*"]
    sid       = "CreateCloudWatchLogs"
  }
}

resource "aws_iam_policy" "lambda" {
  name   = "${local.prefix}-lambda-policy"
  path   = "/"
  policy = data.aws_iam_policy_document.lambda.json
}

Lambda – at last

Now that I have almost everything in place, I can create the lambda via the aws_lambda_function resource. This is one of the more convoluted definitions in this tutorial, so I'll try to explain it in a bit more detail:

The first thing I do is add a dependency on my Docker image upload with depends_on, then I specify the name of the lambda and the role it should assume with function_name and role. I know in advance that this lambda can take a bit of time to run, so I set its timeout a bit high.

Once we create our image in ECR we must tell the lambda that the package_type is an image, followed by the image_uri so that it knows where to find it.

Finally, since my lambda is going to send a tweet, it needs access to the necessary secrets. Again, in the interest of keeping everything as private as possible, we define them as environment variables (instead of hardcoding them); I achieve this with the environment block, extracting the values from (yes, it is repetitive) the secrets previously stored in AWS:

resource "aws_lambda_function" "lambda" {
  depends_on = [
    null_resource.ecr_image
  ]
  function_name = "${local.prefix}-lambda"
  role          = aws_iam_role.lambda.arn
  timeout       = 300
  image_uri     = "${aws_ecr_repository.lambda_image.repository_url}@${data.aws_ecr_image.lambda_image.id}"
  package_type  = "Image"
  environment {
    variables = {
      API_KEY             = jsondecode(data.aws_secretsmanager_secret_version.current_twitter_secrets.secret_string)["API_KEY"]
      API_SECRET          = jsondecode(data.aws_secretsmanager_secret_version.current_twitter_secrets.secret_string)["API_SECRET"]
      ACCESS_TOKEN        = jsondecode(data.aws_secretsmanager_secret_version.current_twitter_secrets.secret_string)["ACCESS_TOKEN"]
      ACCESS_TOKEN_SECRET = jsondecode(data.aws_secretsmanager_secret_version.current_twitter_secrets.secret_string)["ACCESS_TOKEN_SECRET"]
    }
  }
}
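If the jsondecode(...)["API_KEY"] lookups look odd, this Python sketch shows the same idea: the secret is stored as a single JSON string, and each environment variable picks one key out of it. The secret values below are placeholders I invented for the example; the real ones live in AWS Secrets Manager.

```python
import json

# Hypothetical payload standing in for secret_string (placeholder values)
secret_string = json.dumps(
    {
        "API_KEY": "example-api-key",
        "API_SECRET": "example-api-secret",
        "ACCESS_TOKEN": "example-access-token",
        "ACCESS_TOKEN_SECRET": "example-access-token-secret",
    }
)

# Equivalent of jsondecode(...): parse the JSON string once
secrets = json.loads(secret_string)

# Equivalent of the environment.variables map in the lambda definition
wanted = ("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
environment = {name: secrets[name] for name in wanted}

print(sorted(environment))
```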

Running every X minutes

So far so good: if you run Terraform up to this point, we will have created several resources: an ECR repository, a Docker image, and a lambda. But the icing on the cake is missing, and that is the whole point of turning the code into a lambda: I want to run it multiple times throughout the day, every so often.

To achieve this, I can use a trigger from the AWS CloudWatch service, something that executes my lambda at time intervals that I define; this is possible with Terraform as well.

The first thing is to define an event rule in CloudWatch:

resource "aws_cloudwatch_event_rule" "every_x_minutes" {
  name                = "${local.prefix}-event-rule-lambda"
  description         = "Fires every 20 minutes"
  schedule_expression = "cron(0/20 * * * ? *)"
}
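The schedule expression has six fields (minutes, hours, day of month, month, day of week and year) and is evaluated in UTC. The 0/20 in the minutes field means "starting at minute 0, every 20 minutes". Here is a tiny Python sketch of how that field expands; the helper is my own and only handles the start/step form used here.

```python
def expand_increment(field: str, limit: int = 60) -> list:
    """Expand a 'start/step' cron field into the values it matches below `limit`."""
    start, step = (int(part) for part in field.split("/"))
    return list(range(start, limit, step))


minutes = expand_increment("0/20")
print(minutes)  # [0, 20, 40]
```

So the rule fires three times an hour, at minutes 0, 20 and 40, every hour of every day.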

This event needs a target, in this case it's my lambda:

resource "aws_cloudwatch_event_target" "trigger_every_x_minutes" {
  rule      = aws_cloudwatch_event_rule.every_x_minutes.name
  target_id = "lambda"
  arn       = aws_lambda_function.lambda.arn
}

And of course, like almost everything in AWS, we also need to grant it permissions so that the event can invoke the lambda:

resource "aws_lambda_permission" "allow_cloudwatch_to_call_lambda" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.lambda.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.every_x_minutes.arn
}

et voilà ! – we already have all the necessary ingredients to run and create our lambda using Terraform.

Remember, all of this content exists in the terraform/main.tf file within the repository we've been working on.

This is what the repository looks like at this point.
