
Caleb Lemoine

How I use "AI" to entertain my cat

Background

It's been a while since I've made a dev.to post and I wanted to share my latest "dumb" project. My cat, Max, really enjoys watching wildlife in my backyard through the back door. There are birds, squirrels, opossums, and even a family of tree-climbing rats. It's his favorite thing to do, but there's one "problem": the wildlife isn't always present, and he gets pretty pissed about it. He often comes into my home office yelling at me to summon birds, but little does he know, I'm no Disney princess. What's frustrating for me is that even when wildlife is present, he'll often miss it because he's off doing other cat things. I thought, if only I could automate letting him know that he's missing some good bird watchin'!

Solution

I had the idea of pointing an IP camera out the back door, figuring out how to detect animals, and then training Max to go peek when a familiar sound plays. This was more difficult than I thought. After hours of reading, googling, and browsing GitHub, I couldn't find an open source solution that did this simple thing out of the box. Welp, guess I'll build it.

Hardware

After doing some research on IP cameras that expose an easy-access video stream via RTSP, I got a cheap TP-Link Tapo C100 camera for $20. I set it up on my home network using the app and placed it at my back door for a viewing angle similar to Max's.
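
Before wiring anything else up, it's worth sanity-checking that the camera's RTSP stream is actually reachable. Here's a quick check I'd suggest (assuming opencv-python is installed; the credentials and address are placeholders matching the URL used later in this post):

import cv2

# Try to pull a single frame from the camera's RTSP feed
cap = cv2.VideoCapture('rtsp://<username>:<password>@192.168.1.101/stream1')
ok, frame = cap.read()
if ok:
    print(f"Stream is up, got a {frame.shape[1]}x{frame.shape[0]} frame")
else:
    print("Couldn't read a frame from the stream")
cap.release()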

Here's the setup: not super elegant, but it'll do.


And then here's the view from the camera itself.


Perfect view of the back patio to see all the animals.

Software

Next, I needed to figure out how to access the stream, recognize an animal, and then let Max know. There are tons of examples of recognizing objects in camera frames, but I ultimately found a Python library called ultralytics that supports RTSP streams and classifies objects in video frames using pre-built models. The docs made it look pretty low effort, and after some experimentation, I had the ultralytics library recognizing objects from my cheap camera!

The gist that was needed to get it working:

from ultralytics import YOLO

# Load a pre-trained YOLOv8 model
model = YOLO('yolov8s.pt')

# Stream frames from the camera's RTSP feed, keeping only
# detections with at least 50% confidence
results = model.predict('rtsp://<username>:<password>@192.168.1.101/stream1', conf=0.5, stream=True, verbose=False)

for result in results:
    if result.boxes:
        # Grab the most recent bounding box and map its class ID to a name
        box = result.boxes[-1]
        class_id = int(box.cls)
        detected_object_name = model.names[class_id]
        detected_object_confidence = round(float(box.conf), 2)
        print(f"Object detected: {detected_object_name}")
        print(f"confidence: {detected_object_confidence}")

OK, now I needed to recognize animals. Since the underlying model used by ultralytics is trained on the widely popular COCO (Common Objects in Context) dataset, I found a list of the objects the model could detect. In that list, a few things stuck out to me:

  • bird (of course)
  • cat
  • dog
  • mouse
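
Side note: you don't have to hunt that list down online. The ultralytics model object exposes its class names directly, so a few lines will print everything it knows about:

from ultralytics import YOLO

# model.names maps class IDs to human-readable COCO labels
model = YOLO('yolov8s.pt')
for class_id, name in model.names.items():
    print(class_id, name)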

After some testing of the previously mentioned code, I noticed that squirrels were often detected as dogs or cats depending on their size. So I modified the code a bit to filter on those animal classes supported by the model, like so:

from ultralytics import YOLO

model = YOLO('yolov8s.pt')
results = model.predict('rtsp://<username>:<password>@192.168.1.101/stream1', conf=0.5, stream=True, verbose=False)

# Only react to the animal classes the COCO-trained model supports
filters = ['bird', 'dog', 'cat', 'mouse']

for result in results:
    if result.boxes:
        box = result.boxes[-1]
        class_id = int(box.cls)
        detected_object_name = model.names[class_id]
        detected_object_confidence = round(float(box.conf), 2)
        if detected_object_name in filters:
            print(f"Object detected: {detected_object_name}")
            print(f"confidence: {detected_object_confidence}")

And boom! We recognize animals!

Next, I needed to wire this up to my home speakers and play a sound familiar to Max. In the before times of not watching live animals outside, Max liked it when I'd play bird videos on YouTube for him, and they would all start with the same "chirp" sound. He knew this sound meant bird watching time. So I downloaded the video, extracted the audio, and split the chirp out into a custom 4-second .mp3, which I stored on my local Home Assistant instance that was already integrated with my Google Nest speakers. Luckily, Home Assistant's API is pretty friendly, even though the docs definitely suck. Once I added the .mp3 file onto the Raspberry Pi where Home Assistant is hosted, I was able to trigger the sound on my speakers with this simple request to its REST API:

import requests

request_opts = {
    'url': 'http://192.168.1.100:8123/api/services/media_player/play_media',
    'method': 'POST',
    'headers': {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer <Home Assistant API token>'
    },
    # Use the json kwarg so requests serializes the body as JSON;
    # passing a dict via 'data' would form-encode it instead
    'json': {
        "entity_id": "<Nest Speaker ID>",
        "media_content_id": "/local/chirp.mp3",
        "media_content_type": "audio/mp3"
    },
    'timeout': 10
}
response = requests.request(**request_opts)
# Raise an error in the event of a non-2XX response code
response.raise_for_status()
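
Putting the two snippets together is mostly plumbing: watch the stream, and when one of the filtered animals shows up, hit the Home Assistant endpoint, with a cooldown so the house isn't chirping non-stop. Here's a rough sketch of that loop (the 15 minute cooldown is just an example value, and the credentials and IDs are placeholders as before):

import time

import requests
from ultralytics import YOLO

model = YOLO('yolov8s.pt')
results = model.predict('rtsp://<username>:<password>@192.168.1.101/stream1', conf=0.5, stream=True, verbose=False)

filters = ['bird', 'dog', 'cat', 'mouse']
cooldown_seconds = 900  # example: at most one chirp every 15 minutes
last_played = 0.0

for result in results:
    if result.boxes:
        box = result.boxes[-1]
        detected_object_name = model.names[int(box.cls)]
        if detected_object_name in filters and time.time() - last_played > cooldown_seconds:
            # Ask Home Assistant to play the chirp on the Nest speaker
            response = requests.post(
                'http://192.168.1.100:8123/api/services/media_player/play_media',
                headers={'Authorization': 'Bearer <Home Assistant API token>'},
                json={
                    "entity_id": "<Nest Speaker ID>",
                    "media_content_id": "/local/chirp.mp3",
                    "media_content_type": "audio/mp3"
                },
                timeout=10
            )
            response.raise_for_status()
            last_played = time.time()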

Open source all the things

Then it dawned on me: this should be an open source program that does a thing when a camera sees a thing, while also letting users pick how frequently they want to be annoyed with bird chirp sounds in their house. I wanted to make the program very configurable, because there's no way I was going to write all of this again for my next computer vision project. So I created cv-notifier, a program that allows anyone to replicate what I've done so far.

TL;DR: hand cv-notifier a config and it'll handle the rest:

config:
  source: 'rtsp://$STREAM_USER:$STREAM_PASSWORD@192.168.1.101/stream1'
  schedule:
    startTime: '07:00'
    endTime: '18:00'
  webhooks:
    - url: http://localhost:8080/someDumbFutureAPI
      notifyInterval: 900
      objects:
        - bird
        - cat
        - dog
        - mouse
      method: 'POST'
      headers:
        Content-Type: application/json
        Authorization: Bearer $API_TOKEN
      body: >
        {
          "someKey": "$object_name detected with confidence score of $object_confidence"
        }
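
For what it's worth, the webhook target can be anything that accepts HTTP. Here's a throwaway Flask receiver, entirely hypothetical and matching the someDumbFutureAPI placeholder above, just to show what the substituted body looks like when it arrives:

from flask import Flask, request

app = Flask(__name__)

# Hypothetical stand-in for the someDumbFutureAPI placeholder above
@app.route('/someDumbFutureAPI', methods=['POST'])
def handle_detection():
    payload = request.get_json()
    # e.g. "bird detected with confidence score of 0.82"
    print(payload['someKey'])
    return '', 204

if __name__ == '__main__':
    app.run(port=8080)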

Results

Lots of techy jargon up until this point, but the question remains... what did Max think? Well, at first it confused him. The sound he previously knew only came from the TV and was now playing in the kitchen, and he didn't know how to respond. So when the chirp played, I'd pick him up, take him to the back door, and show him that it meant animals. After doing that 4-5 times, he started to get it. Then, when the chirp sound played, he'd scurry across the house to go see what animals were out there!

The app has been running every day for a few months now, which raises another question: does it still work for Max? Eh, not really. He's gotten a bit numb to it since he's learned it goes off frequently enough that he can go look at birds pretty much whenever he pleases. He'll still go look occasionally, but he's not as excited about it as he once was. Nonetheless, it was a fun project, I learned some cool things about object detection, and now I have a tool I can reuse whenever I want to detect more objects in the future and call some random API.

Fin

By this point, you may be thinking to yourself, "What does Max look like?" Well, here's a picture of the little guy along with a video of Max bird watching!

