FitVision - devlog #1

Hi there! This is the first devlog I'm making public (and, I hope, the first one I actually finish ;)
Also, this post is not a full explanation of the current version: I will divide the explanation into two parts, one for each file.
In this series of posts I'm going to share my journey through the development of an application I've wanted to build ever since I started working out at home.

What is this?

-> The mission: building a desktop app that helps the user track his/her training evolution in an automated way.
-> The big brain idea: implementing a computer-vision system that detects the user's repetitions, resting time, set time and total workout time (along with other possible ideas that may come to me while working on it).
-> The tech stack: initially only Python.

Now that I have explained the idea a bit, it's time to start the journey.

First steps: vision model

The libraries that the vision code currently uses are the following:

import cv2 
import mediapipe as mp
import numpy as np
import time
from PyQt5.QtCore import QThread, QTimer

mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose

mp_drawing and mp_pose are utilities for drawing the detected user's pose and detecting it, respectively (they are not libraries, but I wanted to include them in this section anyway).
The imports cover the computer vision models (cv2, mediapipe), numerical operations (numpy), time measurement (time) and the Qt threading and timer utilities (PyQt5).

The full implementation of the vision model is wrapped in a class called DetectionWorker, which is imported in the main file (to be explained in the next post).
Originally, I designed the model to be called through a simple function called capture_and_detect_video(), which will be reviewed later, but for the detection to be started and stopped correctly using the interface buttons, I had to add the wrapper class.

Next up is the initializer method of the DetectionWorker class:

    def __init__(self, app_signals):
        super().__init__()
        self.app_signals = app_signals
        self.running = False
        self.last_rep_time = 0 # keeps track of the last rep time for detecting rest or mid-set time
        self.status = "resting" # resting by default until the first rep

        # Total training session timer
        self.total_training_time = 0
        self.training_timer = QTimer()
        self.training_timer.timeout.connect(self.update_total_training_time)

        # Set and rest times
        self.set_training_time = 0
        self.rest_training_time = 0

        self.app_signals.stop_detection.connect(self.stop_detection)
        self.app_signals.start_detection.connect(self.start_detection)

The initializer method is pretty self-explanatory: the object is initialized with an app_signals object, which communicates the state of some variables to the app, and the timer is initialized as a QTimer() object.
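The app_signals class itself lives in the main file and will be covered in the next post, but as a rough sketch, a signals object compatible with the snippets shown here might look like this (the signal names are taken from the code in this post; the real class may differ):

from PyQt5.QtCore import QObject, pyqtSignal

class AppSignals(QObject):
    # Control signals consumed by DetectionWorker
    start_detection = pyqtSignal()
    stop_detection = pyqtSignal()

    # Metric updates emitted by DetectionWorker for the UI
    update_training_time = pyqtSignal(int)
    update_rest_time = pyqtSignal(int)
    update_set_time = pyqtSignal(int)
    update_reps = pyqtSignal(int)
    update_avg_time = pyqtSignal(float)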

Now I will showcase the start and stop detection methods and other utilities of the class.

    def start_detection(self):
        self.total_training_time = 0
        self.set_training_time = 0
        self.rest_training_time = 0
        self.training_timer.start(1000)
        self.running = True
        self.capture_and_detect_video()

    def stop_detection(self):
        self.training_timer.stop()
        self.running = False

The start and stop of the video detection reset the time counters, start/stop the timer and update the self.running property, which is responsible for keeping the detection loop alive.
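As a hypothetical usage sketch (the real wiring lives in the main file, which the next post covers), driving the worker from UI code could look like this:

# Assuming the AppSignals sketch from above; in the real app the worker
# runs alongside the UI so the capture loop doesn't block it.
signals = AppSignals()
worker = DetectionWorker(signals)

signals.start_detection.emit()  # resets counters, starts the timer, opens the camera
# ... later, e.g. from a "Stop" button handler:
signals.stop_detection.emit()   # stops the timer and ends the detection loop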

    def update_total_training_time(self):
        self.total_training_time += 1
        self.app_signals.update_training_time.emit(self.total_training_time)

        # Detect if user is resting or mid-set
        if np.abs(self.last_rep_time - time.time()) > 5:
            self.status = "resting"
            self.set_training_time = 0
            self.rest_training_time += 1
            self.app_signals.update_rest_time.emit(self.rest_training_time)
        else:
            self.status = "mid-set"
            self.rest_training_time = 0
            self.set_training_time += 1
            self.app_signals.update_set_time.emit(self.set_training_time)

The function dedicated to updating the timers first adds 1 to the count of total seconds of the workout and then detects whether the user is resting or in the middle of a set. Depending on the result, the respective timer is updated.

Note that the time period used for determining the status of the workout is not definitive; it is currently low for debugging purposes.
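Distilled down, the rest/mid-set check amounts to this (the 5-second threshold is the debug value mentioned above):

import time

REST_THRESHOLD_S = 5  # debug value; a real set/rest boundary is probably longer

def classify_status(last_rep_time: float) -> str:
    # If more than REST_THRESHOLD_S seconds passed since the last rep,
    # the user is considered to be resting between sets.
    if time.time() - last_rep_time > REST_THRESHOLD_S:
        return "resting"
    return "mid-set"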

    def calculate_angle(self, a: list, b: list, c: list) -> float:
        a = np.array(a)
        b = np.array(b)
        c = np.array(c)
        radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])
        angle = np.abs(radians * 180 / np.pi)

        if angle > 180.0:
            angle = 360 - angle

        return angle

The calculate_angle function takes three (x, y) coordinate lists as arguments, computes the angle formed at the middle point b by the segments towards a and c, and returns it in degrees as a float.
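As a quick sanity check, here is the same math as a standalone function with a couple of example inputs:

import numpy as np

def calculate_angle(a, b, c):
    a, b, c = np.array(a), np.array(b), np.array(c)
    radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])
    angle = np.abs(radians * 180 / np.pi)
    return 360 - angle if angle > 180 else angle

print(calculate_angle([0, 1], [0, 0], [1, 0]))  # 90.0 -> right angle at b
print(calculate_angle([0, 1], [0, 0], [0, 2]))  # 0.0  -> a and c point the same way from b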

From here on, all of the remaining pieces of code belong to the capture_and_detect_video() method, which is in charge of the real-time pose detection.
cap = cv2.VideoCapture(0) # Camera code may be changed depending on the device
This line initializes the video capturing element.
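Since the right camera index varies per device, one hedged way to handle it (not what the current code does) is to probe a few indices and keep the first one that opens:

import cv2

def open_first_camera(max_index: int = 3) -> cv2.VideoCapture:
    # Try camera indices 0..max_index and return the first that opens
    for idx in range(max_index + 1):
        cap = cv2.VideoCapture(idx)
        if cap.isOpened():
            return cap
        cap.release()
    raise RuntimeError("No camera found")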

The remaining code of the method lives inside a with statement:

        with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
            counter = 0
            stage = "down"
            times_hist = []
            rep_start_time = None

Using the pose detection utility of MediaPipe, the default initial values of counter and stage are set.
A list named times_hist is created to store the time the user took to complete each rep (used to calculate the average time per rep metric).
rep_start_time is also set to its default value of None.

            while cap.isOpened() and self.running:
                ret, frame = cap.read()
                if not ret:
                    continue # skip frames the camera failed to deliver

                # Recolor image to RGB
                image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                image.flags.writeable = False

                # Make detection
                results = pose.process(image)

                # Recolor back
                image.flags.writeable = True
                image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

Inside the video detection loop, each frame is first converted to RGB (MediaPipe expects RGB input, while OpenCV delivers BGR), processed to detect the user's pose, and then converted back to BGR for rendering. Marking the image as non-writeable before the detection lets MediaPipe process the frame without copying it.
This loop starts and stops running when the self.running property is changed, as was shown earlier.

                # Extract landmarks
                try:
                    landmarks = results.pose_landmarks.landmark

                    # Get limbs
                    l_shoulder = [landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x, landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y]
                    l_elbow = [landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].x, landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].y]
                    l_wrist = [landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].x, landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].y]

                    # Calculate angles
                    angle = self.calculate_angle(l_shoulder, l_elbow, l_wrist)

                    # Visualize angles
                    cv2.putText(image, str(angle), tuple(np.multiply(l_elbow, [640, 480]).astype(int)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2, cv2.LINE_AA)

                    # Curl count
                    if angle > 160:
                        stage = "down"
                        rep_start_time = time.time() if rep_start_time is None else rep_start_time
                    elif angle < 30 and stage == "down":
                        stage = "up"
                        counter += 1
                        end_time = time.time()
                        self.last_rep_time = end_time
                        rep_time = end_time - rep_start_time
                        rep_start_time = None
                        times_hist.append(rep_time)

                    # Guard against division by zero before the first completed rep
                    average_time = sum(times_hist) / len(times_hist) if times_hist else 0.0
                    self.app_signals.update_reps.emit(counter)
                    self.app_signals.update_avg_time.emit(average_time)

                except Exception:
                    pass # ignore frames where the pose could not be extracted

In this fragment of code, the system tries to extract the position of each key landmark of the user's body (currently just the ones needed for counting curl reps) and calculates the angle formed by them.
Then the angle is drawn on the frame for debugging purposes and the curl counting logic is executed.

If the formed angle is greater than 160°, the user has the arm extended (resting position), so the system sets the stage to "down" and records the start time of the repetition, which is only set once, right when the arm reaches the resting position.
If the formed angle is less than 30° and the last tracked stage of the curl was "down", the system sets the stage to "up", counts one rep, calculates the time it took to execute the rep and stores it in the times_hist list.

Then the average time per repetition is calculated and the information is sent through the app_signals object that was shown previously.

If any errors occur while executing the above instructions (which may be caused by an erroneous detection of the user's pose), the system ignores that frame for the sake of simplicity.
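To make the two thresholds easier to reason about, here is the rep-counting logic distilled into a standalone function (a sketch, not the actual implementation):

def update_stage(angle: float, stage: str, counter: int) -> tuple:
    # Thresholds from the post: >160 degrees = arm extended, <30 = full curl
    if angle > 160:
        return "down", counter          # arm extended: ready for the next rep
    if angle < 30 and stage == "down":
        return "up", counter + 1        # full curl after a "down": count one rep
    return stage, counter               # mid-movement: nothing changes

print(update_stage(170, "up", 0))   # ('down', 0)
print(update_stage(25, "down", 0))  # ('up', 1)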

Now, the next fragment is the last one to review!

                # Render detections on the frame
                mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS,
                                        mp_drawing.DrawingSpec(color=(255,255,255), thickness=2, circle_radius=2),
                                        mp_drawing.DrawingSpec(color=(255,255,255), thickness=2, circle_radius=2))
                cv2.imshow('frame', image)

                if cv2.waitKey(10) & 0xFF == ord('q') or not self.running:
                    self.stop_detection()

            cap.release()
            cv2.destroyAllWindows()

The last lines of the capture_and_detect_video() method render the detected landmarks and the connections between them on the returned frames.
They also handle the user stopping the detection and, finally, release the video capture element and destroy the detection windows when the loop ends.

In a few lines

Summing all of this up, the algorithm is able to determine the current state of the user's workout: whether he/she is in a rest or mid-set interval, and whether he/she is performing a rep or just finished one.
These functionalities are connected to a UI, which will be explained in the next devlog, and implemented in a way that will help the user evaluate his/her workouts automatically.

If you have followed along this far, thank you very much for taking the time to read it! Any question, kind suggestion or idea is extremely appreciated. 😉

(and yes, the cover image is AI generated)

Credits

For starting this project, I watched the following videos:
