Estee Tey

Expenses made easier with Cloudflare AI

This is a submission for the Cloudflare AI Challenge.

What I Built

This project is a Cloudflare serverless worker with an AI service binding. It processes audio, image, or text submitted by users into semantically useful information, particularly for submitting expenses.

For users who prefer a UI, a simple NextJS web app demonstrates how to upload files. It uses API routes to make a POST request to the Cloudflare worker endpoint.
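As a rough sketch of that forwarding step, an API route could build its POST request along the lines below. The endpoint URL, the `type` query parameter, and the content types mirror the curl examples in the Demo section; the names `buildWorkerRequest` and `WorkerRequest` are made up for illustration and are not taken from the project code. The curl examples for audio and image also send an Authorization header, which is left out here.

```typescript
// Hypothetical helper: names and shapes are illustrative assumptions,
// not the project's actual code.
type InputType = "audio" | "image" | "text";

interface WorkerRequest {
  url: string;
  method: "POST";
  headers: { "Content-Type": string };
  body: string | Uint8Array;
}

const WORKER_URL = "https://cf-journal.senchatea.workers.dev";

function buildWorkerRequest(type: InputType, body: string | Uint8Array): WorkerRequest {
  // Text is sent as text/plain; audio and image files are sent as raw bytes.
  const contentType = type === "text" ? "text/plain" : "application/octet-stream";
  return {
    url: WORKER_URL + "?type=" + type,
    method: "POST",
    headers: { "Content-Type": contentType },
    body,
  };
}

// Usage inside an API route (not executed here):
//   const req = buildWorkerRequest("text", "Lunch cost $12 today");
//   const res = await fetch(req.url, req);
```

Keeping the request construction in a pure function makes it easy to check without calling the deployed worker.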

How it works

The diagram below looks a bit blurry on this platform; please refer to the raw image here.

Illustrates the user journey below

Users can pass any of the following content types: audio, image, or text. Based on the input type, the worker processes the content into JSON like the example below, which any app can use to display useful expense information.

const SAMPLE_RECORD: Expense = {
    date: "2024-04-11",
    time: "14:00",
    type: "product",
    item: "Rubber Ducky",
    expenditure: 100.0,
};
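The `Expense` type itself is not shown in this post. A minimal sketch of what it might look like, inferred only from the fields in `SAMPLE_RECORD`, is given below together with a hypothetical `parseExpense` helper that pulls a JSON object out of a model's text output. Treating every field as nullable is an assumption, based on the diagram description where unknown fields are left null; the real type may differ or carry more fields.

```typescript
// Assumed shape, inferred from SAMPLE_RECORD; the real type may have more fields.
interface Expense {
  date: string | null; // e.g. "2024-04-11"
  time: string | null; // e.g. "14:00"
  type: string | null; // e.g. "product"
  item: string | null;
  expenditure: number | null;
}

function stringOrNull(value: unknown): string | null {
  return typeof value === "string" ? value : null;
}

// Hypothetical helper: take the span from the first "{" to the last "}" in the
// model output, parse it, and coerce it into an Expense. Missing or malformed
// fields become null; output with no parsable JSON object yields null.
function parseExpense(modelOutput: string): Expense | null {
  const start = modelOutput.indexOf("{");
  const end = modelOutput.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let raw: unknown;
  try {
    raw = JSON.parse(modelOutput.slice(start, end + 1));
  } catch (_err) {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;

  const fields = raw as Record<string, unknown>;
  const amount = fields["expenditure"];
  return {
    date: stringOrNull(fields["date"]),
    time: stringOrNull(fields["time"]),
    type: stringOrNull(fields["type"]),
    item: stringOrNull(fields["item"]),
    expenditure: typeof amount === "number" && isFinite(amount) ? amount : null,
  };
}
```

Parsing defensively like this keeps one odd model response from breaking the app: the record can still be shown with empty fields for the user to fill in.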

Models used in this project:

Demo

Try out the worker!

The worker is deployed at https://cf-journal.senchatea.workers.dev.

Text input

curl --location 'https://cf-journal.senchatea.workers.dev?type=text' \
--header 'Content-Type: text/plain' \
--data '2 weeks ago, I went to eat a buffet at Swensens Unlimited at the T2 airport, it'\''s really nice but it costs like $36 per person after GST, and there were 2 of us.'

Audio input

curl --location 'https://cf-journal.senchatea.workers.dev?type=audio' \
--header 'Content-Type: application/octet-stream' \
--header 'Authorization: Bearer pYrzMvsyURxsCUeaQDsa3lSO_tBDQEuiPB3iLEQt' \
--data '@postman-cloud:///1eef4b5b-a64c-4c10-ad4b-5f22c528880b'

Image input

curl --location 'https://cf-journal.senchatea.workers.dev?type=image' \
--header 'Content-Type: application/octet-stream' \
--header 'Authorization: Bearer pYrzMvsyURxsCUeaQDsa3lSO_tBDQEuiPB3iLEQt' \
--data '@postman-cloud:///1eef4b68-e5d7-49e0-a28d-f60b3bd7d70e'

Try it in the web app!

https://cf-journal.vercel.app/

When the app loads, the user sees a sample record of a Rubber Ducky costing $100, but not for sale. The user then uploads an .mp3 voice recording and sees "Uploading" text. After the upload completes, a new expense appears above the sample record, with an expenditure of $100, a date of "2024-04-11", a time of "14:00", and an expense type of "Activity".

My Code

The project has 2 folders:

  • serverless: Code that is deployed to a Cloudflare worker. The text generation prompt and the code that uses the AI models are in utils.ts. The Cloudflare worker request handler is in main.ts.
  • app: NextJS app code that is deployed to Vercel.
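The handler in main.ts is not reproduced in this post. As a hedged sketch of the kind of routing it would need before calling any model, the code below validates the HTTP method and the `type` query parameter used in the curl examples. The function names, status codes, and messages are all assumptions for illustration, not the project's actual behavior.

```typescript
// Hypothetical sketch of the routing step; the real main.ts may differ.
type InputType = "audio" | "image" | "text";

const SUPPORTED_TYPES: InputType[] = ["audio", "image", "text"];

// Map the raw `type` query parameter to a known input type, or null.
function resolveInputType(raw: string | null): InputType | null {
  for (const candidate of SUPPORTED_TYPES) {
    if (candidate === raw) return candidate;
  }
  return null;
}

interface RouteDecision {
  status: number;
  inputType: InputType | null;
  message: string;
}

// Decide how to respond before any model is called.
// Rejecting non-POST methods with 405 is an assumption.
function routeRequest(method: string, requestUrl: string): RouteDecision {
  if (method !== "POST") {
    return { status: 405, inputType: null, message: "Use POST" };
  }
  const rawType = new URL(requestUrl).searchParams.get("type");
  const inputType = resolveInputType(rawType);
  if (inputType === null) {
    return { status: 400, inputType: null, message: "type must be audio, image or text" };
  }
  return { status: 200, inputType, message: "ok" };
}
```

Rejecting unsupported input early means a bad request never reaches the models.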

How it works

An image of a green tea bottle is inputted into an "image-to-text" system which describes the bottle and its features. The description is then converted into a structured JSON format with fields like "date," "type," "brand," and "item," most of which are left null except for "type," which is "product," and "item," which is "Green Tea."

An audio clip is transcribed by an "audio-to-text" system, with the spoken words "I went to the gym for an hour today. The session cost around 20 dollars." This transcription is then converted into JSON format with fields such as "date," "time," "type," "item," "expenditure," and others, some filled with specific data from the audio input like "type" as "activity," "item" as "gym session," and "expenditure" as 20.

A text input reading "2 weeks ago, I went to eat a buffet at Swensens Unlimited at the T2 airport, it's really nice but it costs like $36 per person after GST, and there were 2 of us." is converted directly into a JSON format with details of the dining experience including "date," "time," "type," "item," and "expenditure," among others. The "item" is listed as "Swensens Unlimited," and "expenditure" is doubled to 72, considering two people.
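The three flows described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the `Models` interface and its three functions are injected placeholders standing in for the actual model calls, and none of these names come from the project code.

```typescript
// Illustrative only: the model functions are injected placeholders,
// not real Workers AI calls, and all names here are assumptions.
type ExpenseInput =
  | { kind: "audio"; bytes: Uint8Array }
  | { kind: "image"; bytes: Uint8Array }
  | { kind: "text"; text: string };

interface Models {
  transcribeAudio(audio: Uint8Array): Promise<string>; // audio-to-text
  describeImage(image: Uint8Array): Promise<string>; // image-to-text
  extractExpenseJson(text: string): Promise<string>; // text generation
}

// Stages each input kind passes through, in order.
function stagesFor(kind: ExpenseInput["kind"]): string[] {
  if (kind === "audio") return ["audio-to-text", "text-generation"];
  if (kind === "image") return ["image-to-text", "text-generation"];
  return ["text-generation"];
}

// Turn audio or image input into plain text; text input passes through as-is.
async function toText(input: ExpenseInput, models: Models): Promise<string> {
  if (input.kind === "audio") return models.transcribeAudio(input.bytes);
  if (input.kind === "image") return models.describeImage(input.bytes);
  return input.text;
}

// Every input ends at the same text-generation step that produces the JSON.
async function processInput(input: ExpenseInput, models: Models): Promise<string> {
  const text = await toText(input, models);
  return models.extractExpenseJson(text);
}
```

Funnelling all three input kinds into one final text step means the JSON-producing prompt only has to be written and tuned once.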

Journey

How the idea came about

Before OpenAI introduced ChatGPT and revitalized the AI scene, I created Billy, a cute little expense tracker app to help my mom track her expenses more easily. It was well received by my friends both online and offline because of its clean and straightforward UI.

However, user retention wasn't very good: logging expenses is tedious because there are so many fields to fill in, and it was very difficult to encourage users to build the habit.

For many users, it is much more direct and easier to talk or take a picture with a phone than to type. But back then it was rather difficult to interpret information from these input types, so I eventually dropped that project.

Now, by utilizing different AI models, we can interpret and organize important information for them more easily, simplifying the expense submission process.

This project is meant to be an MVP to demonstrate that.

Room for improvements

I'm very new to AI, so there will definitely be a lot of things to improve on!

The webapp in this submission is a very early MVP of what I envision the actual expense tracker app to be.

  • For now, it only creates expenses based on the information in the file input (audio/image/text).
  • Ideally after submission, you can still modify the individual fields of the expense form. The AI portion is just there to assist the auto-filling process.

I'm not very familiar with prompt engineering either, so I would like to improve the text generation prompt in the future.

Resources

These are some helpful resources from Cloudflare that I have used to work on this project since I wasn't familiar with the AI models available for text generation/inference.


My project is eligible for the Triple Task type category.

Top comments (4)

Rishav Anand

Great post. I've also been thinking about a full-fledged expense tracker, but I want it to be able to read and understand bank statements. The problem with bank statements is that they are not very descriptive, so it is very difficult to categorise the expenses :( Any tips?

Estee Tey • Edited

Thanks for the qn! Below is my take on it.

For bank statements, usually they have acronyms or codes for specific transactions - (example), so that might narrow the category down a little. Next, they should at least include the name or a short form of the name of the company responsible for the product/activity u bought so that's also good for categorizing. If for some reason, it doesn't have either, then it may be better to corroborate the bank statements with receipts that are clearer in what that thing was for and get users input directly what those refer to. Then train a model for such outlier items.

Sameera Hewage

In my country I get an SMS for every transaction. I'm actually writing one to parse all that data into JSON so I can process it. An iPhone shortcut will send each SMS to a webhook when it receives one.

watsonjohns

An expense journal integrated with Cloudflare AI offers advanced expense management capabilities. Leveraging Cloudflare's AI-powered analytics, the system can automatically categorize expenses, detect anomalies, and generate insightful reports in real-time. By harnessing the scalability and security of Cloudflare's infrastructure, businesses can streamline expense tracking processes, enhance financial visibility, and make data-driven decisions to optimize budget allocation and control costs effectively.