In my first post as an AWS Community Builder, I laid out my goal for year 1 in the program: to define and develop new, open source applications for the BearID Project while touching on the Internet of Things (IoT), machine learning, applications, and my passion for the environment. After much consideration, I have settled on the first application I will work on: a BearCam Companion.
For those who don't know the origin story of the BearID Project, it all started with the Brooks Falls Brown Bear cam at Explore. As I explained in one of my first blog posts, Guilty Pleasures, my partner and I are avid wildlife webcam fans. Some of our favorite webcams are the Explore cams at Katmai National Park in Alaska. The bears at Brooks Falls, and a desire to start a machine learning project, led us to attempt to develop an application that could identify the individual bears we see on the Brooks Falls live cam.
We initially focused on building a photo dataset of the bears at Brooks Falls. We collected photos from individuals, plus a large set from the National Park Service's bear monitoring program at Katmai. From there we got involved with Dr. Melanie Clapham, a conservation scientist studying the bears of Glendale Cove in British Columbia. We built a computer to run our ML training and developed an open source application, bearid, which we provided to Melanie as a Docker container she could run in the field on a laptop.
Now I would like to see if we can bring something from BearID to the bear cams. One of the top questions on the bear cam Disqus comments is "who is that bear?" The idea of the BearCam Companion is to help users identify which bears are which. Initially, this would use machine learning to locate bears and rely on users to help identify them. Later, we can try to train ML models to identify the bears as well (when a face is detected, we can also try the existing BearID model). Here's how I will define the Minimum Lovable Product:
- Grab a frame from the webcam
- Detect the bears (and possibly other animals) in the frame
- Display the frame and bounding boxes on a webpage
- Allow users (and eventually an ML model) to identify the bears (and edit detection errors)
- Provide the top identifications for each bear in the frame on the webpage
This project will tick the boxes for machine learning and applications. I intend to utilize AWS services for the machine learning and for as much of the application as makes sense. I can also evaluate environmental impact by considering the Sustainability Pillar of the AWS Well-Architected Framework. I would like to make use of Graviton-based services where possible, both for the environmental impact as well as my connection with Arm. For now I will leave IoT as a separate topic.
The next step is to decide on the architecture. There are so many AWS services and open source tools to consider. I am thinking about the following:
- Use a Lambda function for grabbing frames
- Use Rekognition or a standard object detector trained and deployed in SageMaker for the bear detector
- Use Amplify for the web application (maybe Vue or React frontend) with 2 main modes:
  - Display the latest frame with boxes and labels
  - Allow users to edit boxes and labels
- Use S3 for images and a database for metadata (DynamoDB?)
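To make the frame-grabbing idea concrete, here is a minimal sketch of what that Lambda function might look like. This is just an exploration, not a working design: the `SNAPSHOT_URL` environment variable is a hypothetical still-image endpoint (the real Explore cam is a live video stream, so an actual implementation would need to extract a frame with something like ffmpeg), and the bucket name is whatever we configure for the project.

```python
import os
import urllib.request
from datetime import datetime, timezone

def frame_key(ts):
    """Build an S3 key for a frame captured at UTC timestamp ts."""
    return ts.strftime("frames/%Y/%m/%d/%H%M%S.jpg")

def handler(event, context):
    """Fetch the latest cam frame and store it in S3."""
    import boto3  # available in the Lambda runtime

    snapshot_url = os.environ["SNAPSHOT_URL"]  # hypothetical still-image endpoint
    bucket = os.environ["FRAME_BUCKET"]        # project-configured bucket name
    key = frame_key(datetime.now(timezone.utc))

    # Download the frame and write it straight to S3.
    with urllib.request.urlopen(snapshot_url) as resp:
        boto3.client("s3").put_object(
            Bucket=bucket, Key=key, Body=resp.read(), ContentType="image/jpeg"
        )
    return {"bucket": bucket, "key": key}
```

Keying frames by timestamp keeps them naturally sorted in S3 and makes it easy for the web app to find the latest one.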
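And here is a rough sketch of the Rekognition option for the bear detector. Rekognition's `detect_labels` returns generic labels like "Bear" or "Animal" with per-instance bounding boxes; the helper below just filters those out of the response. The label names to keep and the confidence thresholds are assumptions I would need to tune against real cam frames.

```python
def bear_boxes(labels_response, wanted=("Bear", "Animal"), min_conf=70.0):
    """Pull bear-like bounding boxes out of a Rekognition detect_labels response.

    Boxes come back with Width/Height/Left/Top as ratios of the image size.
    """
    boxes = []
    for label in labels_response.get("Labels", []):
        if label["Name"] not in wanted:
            continue
        for inst in label.get("Instances", []):
            if inst["Confidence"] >= min_conf:
                boxes.append({"label": label["Name"], **inst["BoundingBox"]})
    return boxes

def detect_bears(bucket, key):
    """Run Rekognition label detection on a frame already stored in S3."""
    import boto3  # assumes the usual AWS credentials/region configuration

    resp = boto3.client("rekognition").detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=20,
        MinConfidence=50,  # filter harder in bear_boxes after inspection
    )
    return bear_boxes(resp)
```

The appeal of this approach is that there is no model to train or deploy; if the generic labels prove too coarse, a custom detector in SageMaker is the fallback.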
This will be my first time building a full-stack web application from scratch, so I could use some help. If you have any thoughts or suggestions, please let me know in the comments. Thanks!