Voice Journal with iPhone Action Button, GPT API, and Google Docs

#notetaking #gpt #voice #programming

tl;dr

This article shows how to set up a personal voice-note service that starts with a single press of the iPhone's Action Button. If your iPhone doesn't have an Action Button, no problem: the solution is built on Apple Shortcuts, so it works on any iPhone that can run them.
GitHub Repo

Why I built this

I believe people avoid taking notes because getting from a thought to text takes too many steps. So I built a tool that starts recording a voice note with one tap and automatically saves the transcript to a destination of your choice: Google Docs in my case, though the same approach could target Notion or another platform.

Overview

Part 1 covers deploying the backend service on railway.com (in my view, Railway is currently the easiest way to deploy this kind of service). The main challenge here is setting up the environment variables for the Google Docs API and the OpenAI API correctly.

Part 2 explains how to connect the service to Apple Shortcuts and the Action Button for a seamless experience.

Part 1: Deploying the backend service

First, you need to fork the repository: s4ff0x/speech-to-docs.

Then go to Railway and sign in with your GitHub account.
Click Deploy and select your forked repository.

Don't worry if the first deployment fails; that's expected, because we haven't set the environment variables or configured the service yet.

After the deployment finishes (even if it fails), open the newly created project in Railway.
Go to Settings → Networking and press Generate Domain.
Enter port 3000 and press Generate Domain.

In the Build section, the Dockerfile should be automatically selected.

If it isn't, Railway may have excluded this option from the free plan. In that case, you can subscribe to a paid plan, deploy the service manually, or use another platform. The rest of this guide continues with Railway.

Set up environment variables

OpenAI API Key

Create an OpenAI API key on the OpenAI API Keys page.

Google Docs API Credentials

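Follow Google's guide to create a service account in your Google Cloud project and generate a key for it.
Download the key as a JSON file; its fields supply the values for the Google Service Account Credentials variables below.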

Set Environment Variables on Railway:

Go to your project in Railway and open the Settings → Variables section.
Add the required environment variables:

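OPENAI_SPEECH_API_KEY=your_openai_api_key
DOC_ID=your_document_id (the ID of the Google document where the text will be saved; it's the part of the document URL between /d/ and /edit)
TIMEZONE=your_timezone (example: Asia/Jerusalem)
PERSONAL_AUTH_TOKEN=your_personal_token (a secret you create yourself; the iPhone sends it when calling the API)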

# Google Service Account Credentials
TYPE=
PROJECT_ID=
PRIVATE_KEY_ID=
PRIVATE_KEY=
CLIENT_EMAIL=
CLIENT_ID=
AUTH_URI=
TOKEN_URI=
AUTH_PROVIDER_X509_CERT_URL=
CLIENT_X509_CERT_URL=
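You create three of these values yourself. The sketch below is a hypothetical local helper, not part of the speech-to-docs repository: it extracts DOC_ID from a pasted document URL, generates a random PERSONAL_AUTH_TOKEN with Python's secrets module, and checks that TIMEZONE is a valid IANA name using zoneinfo (which relies on the system time zone database or the tzdata package).

```python
"""Hypothetical helpers for DOC_ID, PERSONAL_AUTH_TOKEN and TIMEZONE.

Not part of the speech-to-docs repository; a local convenience script.
"""
import re
import secrets
import sys
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Google Docs URLs look like https://docs.google.com/document/d/<DOC_ID>/edit
_DOC_URL = re.compile(r"/document/d/([a-zA-Z0-9_-]+)")


def doc_id_from_url(url: str) -> str:
    """Return the document ID embedded in a Google Docs URL."""
    match = _DOC_URL.search(url)
    if match is None:
        raise ValueError(f"not a Google Docs document URL: {url!r}")
    return match.group(1)


def generate_auth_token(num_bytes: int = 32) -> str:
    """Return a random URL-safe secret to use as PERSONAL_AUTH_TOKEN."""
    return secrets.token_urlsafe(num_bytes)


def is_valid_timezone(name: str) -> bool:
    """Return True if name is a known IANA zone such as 'Asia/Jerusalem'."""
    try:
        ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError):
        return False
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print ready-to-paste lines for the two values derived here.
    print(f"DOC_ID={doc_id_from_url(sys.argv[1])}")
    print(f"PERSONAL_AUTH_TOKEN={generate_auth_token()}")
```

Running it as `python env_helpers.py "https://docs.google.com/document/d/.../edit"` (the file name is illustrative) prints both lines. A token from `secrets` is preferable to a hand-typed password, since anyone who knows it can write to your document.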

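Setting PRIVATE_KEY can be tricky because the key spans several lines. Store it as a single string, writing each line break as \n. The private_key value in the downloaded JSON file is already in this form, so you can copy it from there. For example: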
PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nJNB3fAD..."
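Rather than escaping the key by hand, you could generate all ten service-account lines from the downloaded JSON. This is a minimal sketch, not part of the repository, assuming the variable names above are the uppercase versions of the standard service account JSON keys and that you paste the output in KEY=VALUE form; it quotes PRIVATE_KEY the same way as the example above.

```python
"""Print service-account variable lines from a Google JSON key file.

Hypothetical helper; assumes each variable is the uppercased JSON key.
"""
import json
import sys

# JSON keys in the downloaded file, in the order listed in this guide.
FIELDS = [
    "type", "project_id", "private_key_id", "private_key", "client_email",
    "client_id", "auth_uri", "token_uri",
    "auth_provider_x509_cert_url", "client_x509_cert_url",
]


def credentials_to_env(creds: dict) -> list[str]:
    """Return one KEY=VALUE line per field, each on a single line."""
    lines = []
    for field in FIELDS:
        value = str(creds[field])
        if field == "private_key":
            # Replace real line breaks with the two characters \n and
            # wrap the value in double quotes.
            value = '"' + value.replace("\n", "\\n") + '"'
        lines.append(f"{field.upper()}={value}")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        print("\n".join(credentials_to_env(json.load(f))))
```

Keys that newer key files may contain but this guide doesn't list (such as universe_domain) are simply ignored.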

Don't forget to enable the Google Docs API for your project in the Google Cloud Console.

You also need to share the Google document with your service account's email address (the CLIENT_EMAIL value) and give it Editor access; otherwise, the service can't write to the document.

Once all environment variables are set, click Deploy to redeploy the service with the new configuration.
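If you keep the values in a local .env-style file before copying them into Railway, a quick check can catch a missing variable before you redeploy. The script below is a hypothetical sketch: the REQUIRED list is copied from the variables above, and the parser only handles simple KEY=VALUE lines, skipping blanks and # comments.

```python
"""Report required variables that are missing or empty in a KEY=VALUE file.

Hypothetical pre-flight helper, not part of the speech-to-docs repository.
"""
import sys

REQUIRED = [
    "OPENAI_SPEECH_API_KEY", "DOC_ID", "TIMEZONE", "PERSONAL_AUTH_TOKEN",
    "TYPE", "PROJECT_ID", "PRIVATE_KEY_ID", "PRIVATE_KEY", "CLIENT_EMAIL",
    "CLIENT_ID", "AUTH_URI", "TOKEN_URI",
    "AUTH_PROVIDER_X509_CERT_URL", "CLIENT_X509_CERT_URL",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, dropping surrounding double quotes."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and lines without '='
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"')
    return env


def missing_variables(env: dict[str, str]) -> list[str]:
    """Return required names that are absent or set to an empty string."""
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        missing = missing_variables(parse_env(f.read()))
    print("Missing: " + ", ".join(missing) if missing else "All variables set.")
```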

Let's test the API

Put an .m4a file containing speech on your computer (m4a is the default format for Apple audio recordings).

Run the following curl command in your terminal.

curl -F "audio=@[path to file].m4a;type=audio/m4a" \
-H "Authorization:[your personal auth token]" \
-X POST https://[your api url]/transcribe
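If you prefer to script this test, the following standard-library sketch sends the same multipart/form-data request as the curl command: a file field named audio with type audio/m4a, plus your token in the Authorization header. It is roughly what the Shortcut in Part 2 sends as well. The script name and arguments are illustrative, and the response is printed as-is because its format depends on the service.

```python
"""Send an .m4a file to the /transcribe endpoint, like the curl command.

Illustrative sketch using only the standard library.
"""
import sys
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str) -> tuple[bytes, str]:
    """Encode one file field as multipart/form-data; return (body, header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(base_url: str, token: str, path: Path) -> str:
    """POST the audio file and return the raw response body as text."""
    body, content_type = build_multipart("audio", path.name, path.read_bytes(), "audio/m4a")
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        method="POST",
        headers={"Authorization": token, "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python send_audio.py https://[your api url] [token] note.m4a
    print(transcribe(sys.argv[1], sys.argv[2], Path(sys.argv[3])))
```

`urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so a wrong token or a misconfigured variable shows up as an exception rather than a silent failure.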

Now open your Google document. If everything worked, the transcribed text should appear there.

Part 2: Using the API with an Apple Shortcut and the Action Button

Apple Shortcuts can call external APIs, which is exactly what we need.

Create a new shortcut in the Shortcuts app.
Add the "Record Audio" action to capture audio.
Add the "Get Contents of URL" action and set its URL to your API endpoint (https://[your api url]/transcribe).
In the "Get Contents of URL" action, configure the following:
Method: POST
Headers: add a header named Authorization and set its value to the PERSONAL_AUTH_TOKEN from your environment variables.
Request Body: Form
Add a form field of type File:
Field name: audio
Field value: Recorded Audio (the output of the "Record Audio" action)

Finally, assign this shortcut to the Action Button (Settings → Action Button, then choose Shortcut).