DEV Community

Anton Mendelson

Voice Journal with iPhone Action Button, GPT API, and Google Docs

tl;dr

The main goal of this article is to show how to set up a personal voice note service that works with a single tap on the iPhone’s Action Button. If your iPhone doesn’t have an Action Button, no problem: the solution is built on Apple Shortcuts, so you can run the same shortcut from the Shortcuts app instead.
GitHub repo: s4ff0x/speech-to-docs


Why I built this

I believe people avoid taking notes because getting from a thought to written text takes too many steps. So I built a tool that starts recording a voice note with one tap, then transcribes it and saves the text to a destination of your choice. In my case that is Google Docs, but the same approach could be adapted to Notion or any other platform.


Overview

Part 1 will cover deploying a backend service on railway.com (I consider Railway the easiest way to deploy such services at the moment). The main challenge in this section is correctly setting up environment variables for the Google Docs API and OpenAI API.

Part 2 will explain how to integrate the service with iPhone Shortcuts and the Action Button for a seamless user experience.


Part 1: Deploying the backend service

First, you need to fork the repository: s4ff0x/speech-to-docs.

Then go to Railway and sign in with your GitHub account.
Click Deploy and select your forked repository.

[Screenshot: Setting up the project from GitHub]

It’s okay if the first deployment fails; that’s expected, because we haven’t set up the environment variables or the other required settings yet.

After the deployment finishes (even if it fails), open the newly created project in Railway.
Go to Settings → Networking and press Generate Domain.
Enter port 3000 (the port the service listens on) and press Generate Domain again to confirm.

[Screenshot: Generate domain]


In the Build section, Dockerfile should be selected automatically as the builder.

If it isn’t, Railway has probably excluded this feature from the free plan. In that case, you can subscribe to a paid plan, deploy the service manually, or use another platform; the rest of this guide assumes Railway.

[Screenshot: Automatic Dockerfile detection]


Set up env vars

OpenAI API Key

Create an OpenAI API key on the API keys page of the OpenAI platform.
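Before pasting the key into Railway, you can confirm that it authenticates by calling OpenAI's list-models endpoint, which needs no request body. This is a minimal POSIX shell sketch; check_openai_key is a hypothetical helper, and it reads the key from the same OPENAI_SPEECH_API_KEY name used later in this guide. A 200 response only shows that the key is valid, not that the account can be billed for transcription.

```shell
#!/bin/sh
# Hypothetical check: does the key in OPENAI_SPEECH_API_KEY authenticate?
# GET /v1/models returns 200 for a valid key and 401 for an invalid one.
check_openai_key() (
  if [ -z "${OPENAI_SPEECH_API_KEY:-}" ]; then
    echo "check_openai_key: set OPENAI_SPEECH_API_KEY first" >&2
    exit 2
  fi
  # Discard the model list; print only the HTTP status code.
  status=$(curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $OPENAI_SPEECH_API_KEY" \
    https://api.openai.com/v1/models)
  echo "HTTP $status"
  [ "$status" = "200" ]
)

# Usage:
#   export OPENAI_SPEECH_API_KEY=<your key>
#   check_openai_key
```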

Google Docs API Credentials

Follow Google’s guide to create a service account with credentials for the Google Docs API.
Create a key for the service account and download it as a JSON file.

Set Environment Variables on Railway:

Go to your project in Railway and open the Settings → Variables section.
Add the required environment variables:

# Your OpenAI API key
OPENAI_SPEECH_API_KEY=
# ID of the Google document where the text will be saved
DOC_ID=
# Your IANA time zone name, for example Asia/Jerusalem
TIMEZONE=
# A secret of your choice; the iPhone sends it when calling the API
PERSONAL_AUTH_TOKEN=

# Google service account credentials (fields of the downloaded JSON key)
TYPE=
PROJECT_ID=
PRIVATE_KEY_ID=
PRIVATE_KEY=
CLIENT_EMAIL=
CLIENT_ID=
AUTH_URI=
TOKEN_URI=
AUTH_PROVIDER_X509_CERT_URL=
CLIENT_X509_CERT_URL=
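The Google variables map one-to-one onto the fields of the service account JSON key you downloaded: TYPE holds type, PROJECT_ID holds project_id, and so on. The POSIX shell function below is a minimal sketch for copying them out; sa_json_to_env and credentials.json are hypothetical names, and it assumes the file is pretty-printed with one "key": "value" pair per line, as the downloaded key files normally are. It needs only awk, not jq.

```shell
#!/bin/sh
# Hypothetical helper: print Google service account fields as NAME=value lines.
# Assumes a pretty-printed key file with one "key": "value" pair per line.
sa_json_to_env() {
  # Split each line on double quotes: $2 is the key, $4 is the string value.
  # The private_key value keeps its literal \n sequences, so it stays on one line.
  awk -F'"' '/^ *"[a-z0-9_]+": *"/ { printf "%s=%s\n", toupper($2), $4 }' "$1"
}

# Usage (credentials.json is a placeholder for your downloaded key file):
#   sa_json_to_env credentials.json
```

Paste each printed value into the matching Railway variable; if the output contains a name that is not in the list above, you can skip it. PRIVATE_KEY comes out already on a single line, with literal \n sequences for the line breaks.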

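You may run into problems setting the PRIVATE_KEY environment variable. The value must be a single line, with every line break written as the two characters \n. The private_key value in the downloaded JSON file is already in this form. For example: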
PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nJNB3fAD..."
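If the deployed service still fails to read the key, one way to check the value before redeploying is to turn the \n sequences back into real line breaks and look at the PEM markers. The function below is a hypothetical POSIX shell sketch, not part of the repository; it assumes a key with BEGIN PRIVATE KEY / END PRIVATE KEY markers, as in the example above. A value wrapped in an extra pair of quotes, or one whose line breaks were lost, fails the check.

```shell
#!/bin/sh
# Hypothetical sanity check for a single-line PRIVATE_KEY value.
# The body runs in a subshell (parentheses), so its variables do not leak.
check_private_key() (
  # %b expands backslash escapes, turning each literal \n into a newline.
  # Command substitution then drops any trailing newlines.
  decoded=$(printf '%b' "$1")
  first=$(printf '%s\n' "$decoded" | head -n 1)
  last=$(printf '%s\n' "$decoded" | tail -n 1)
  [ "$first" = "-----BEGIN PRIVATE KEY-----" ] &&
    [ "$last" = "-----END PRIVATE KEY-----" ]
)

# Usage: check_private_key "$PRIVATE_KEY" && echo "key looks well-formed"
```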

Don’t forget to enable the Google Docs API in the Google Cloud console.

You also need to share your Google document with your service account’s email address (the CLIENT_EMAIL value) and give it edit permissions; otherwise, the service cannot write to the document.
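DOC_ID and this sharing step refer to the same document. Its ID is the part of the document's URL between /document/d/ and the next slash, as in https://docs.google.com/document/d/<ID>/edit. A minimal POSIX shell sketch that extracts it; doc_id_from_url is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: extract DOC_ID from a Google Docs URL such as
# https://docs.google.com/document/d/<ID>/edit
doc_id_from_url() (
  # Keep everything after /document/d/ up to the next /, ? or #.
  id=$(printf '%s\n' "$1" | sed -n 's|.*/document/d/\([^/?#]*\).*|\1|p')
  if [ -z "$id" ]; then
    echo "doc_id_from_url: not a Google Docs URL: $1" >&2
    exit 1
  fi
  printf '%s\n' "$id"
)
```

For example, doc_id_from_url 'https://docs.google.com/document/d/1AbC-xyz_123/edit' prints 1AbC-xyz_123, which is the value that goes into DOC_ID.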

Once all environment variables are set, click Deploy to restart the service with the correct configuration.


Let's test the API

Copy an .m4a file containing speech to your computer (m4a is the default format for Apple audio recordings).

Run the following curl command in your terminal.

curl -F "audio=@[path to file].m4a;type=audio/m4a" \
-H "Authorization:[your personal auth token]" \
-X POST https://[your api url]/transcribe

Now check your Google document. If everything worked, you should see the transcribed text there.

[Screenshot: Google document with notes]
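To repeat this test without editing the command each time, it can be wrapped in a small script. This is a sketch under assumptions: transcribe_note is a hypothetical name, API_URL and PERSONAL_AUTH_TOKEN are read from your shell environment (API_URL being the https:// domain Railway generated), and because the article does not state which success status the service returns, any 2xx response counts as success. The response body is discarded; the Google document is still where you check the result.

```shell
#!/bin/sh
# Hypothetical wrapper around the curl test call.
# Reads API_URL (e.g. https://<your-domain>) and PERSONAL_AUTH_TOKEN from the environment.
transcribe_note() (
  file=$1
  if [ ! -f "$file" ]; then
    echo "transcribe_note: no such file: $file" >&2
    exit 2
  fi
  if [ -z "${API_URL:-}" ] || [ -z "${PERSONAL_AUTH_TOKEN:-}" ]; then
    echo "transcribe_note: set API_URL and PERSONAL_AUTH_TOKEN first" >&2
    exit 2
  fi
  # -F sends a multipart/form-data POST with the recording in the "audio" field.
  # -o /dev/null discards the body; -w prints only the HTTP status code.
  status=$(curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization:$PERSONAL_AUTH_TOKEN" \
    -F "audio=@$file;type=audio/m4a" \
    "$API_URL/transcribe")
  echo "HTTP $status"
  case $status in
    2??) exit 0 ;;
    *) exit 1 ;;
  esac
)

# Usage:
#   export API_URL=https://<your-domain> PERSONAL_AUTH_TOKEN=<your token>
#   transcribe_note note.m4a
```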


Part 2: Using the API with Apple Shortcuts and the Action Button

Apple Shortcuts can call external APIs, which is exactly what we need.

Create a new shortcut in the Shortcuts app.
Add the "Record Audio" action to capture audio.
Add the "Get Contents of URL" action and set its URL to your API endpoint (the same https://[your api url]/transcribe URL used in the curl command).
In the "Get Contents of URL" action, configure the following:
  Method: POST
  Headers: add an Authorization header and set its value to your PERSONAL_AUTH_TOKEN.
  Request Body: Form
  Add a form field named audio and set its value to Recorded Audio (the output of the "Record Audio" action).

Finally, assign this shortcut to the Action Button (Settings → Action Button, then choose Shortcut).

[Screenshot: Example Apple Shortcut]


That’s it! Test your setup: it should work just like the curl command, but with your own voice note.
