Last week we talked about the text note input tab. If you missed it, you can check that out here!
Coding the Audio Note Input Tab
This tab was conceptualized with AI and data vectorization in mind. I wanted a way to take the recordings from a session, transcribe the audio to text, and index that whole object in Elastic. That data then gives the Virtual DM more context to draw on when we start asking it questions.
The tab is a single form that appears when you select "Audio" from the log type select box.
Much like the text note input tab, the goal of the form is to gather enough relevant data to form a JSON payload to send to Elastic for indexing. From there, your notes are stored and searchable.
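For context, here is a rough sketch of the kind of document that ends up in the index once a recording has been transcribed (the session number and message are made-up values, purely for illustration):

import json

# Illustrative only: the shape of the payload built by the form below,
# with hypothetical values for the session number and transcribed message.
log_payload = json.dumps({
    "session": 12,
    "type": "audio",
    "message": "The party reached the ruined keep and bargained with the goblin chief..."
})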
NOTE:
- Using forms is really nice because the data in the widgets is cleared once you hit submit. This saves you from having to manually clear the note you just entered. However, this only removes the data from the GUI, not from the session state.
- I used a combination of the "audio_form_variable_list" list and the "clear_session_state" function to maintain a clean session state for every note. Without this, you could have lingering data in those variables and your notes wouldn't be as accurate.
def app_page2_audio():
    # displays audio note form and widgets
    import json
    # list of variables to clear from session state once finished
    audio_form_variable_list = ["log_type","log_session","file","submitted","transcribed_text","log_payload"]
    # displays note form widgets, creates note payload, sends payload to an Elastic index, and handles error / success / warning messages
    with st.form("audio_form", clear_on_submit=True):
        st.session_state["log_type"] = "audio"
        st.session_state["log_session"] = st.slider("Which session is this?", 0, 250)
        st.session_state["file"] = st.file_uploader("Choose audio file",type=[".3ga",".8svx",".aac",".ac3",".aif",".aiff",".alac",".amr",".ape",".au",".dss",".flac",".flv",".m2ts",".m4a",".m4b",".m4p",".m4r",".m4v",".mogg",".mov",".mp2",".mp3",".mp4",".mpga",".mts",".mxf",".oga",".ogg",".opus",".qcp",".ts",".tta",".voc",".wav",".webm",".wma",".wv"])
        st.session_state["submitted"] = st.form_submit_button("Upload file")
        if st.session_state.submitted and st.session_state.file is not None:
            st.session_state["transcribed_text"] = transcribe_audio(st.session_state.file)
            if st.session_state.transcribed_text is not None:
                st.session_state["log_payload"] = json.dumps({"session":st.session_state.log_session,"type":st.session_state.log_type,"message":st.session_state.transcribed_text})
                elastic_index_document("dnd-notes-transcribed",st.session_state.log_payload)
            else:
                error_message("Audio transcription failure")
        else:
            st.warning('Please upload a file and submit')
    # clears session state
    clear_session_state(audio_form_variable_list)
Audio Form
As mentioned above, this is the only form for this log type. It has the user input a session number and upload a file. Once submitted, the transcribe_audio function turns the audio into text, which is used to build the JSON payload for indexing into Elastic.
Transcribe Audio Function
This function is the workhorse of the audio note input tab and makes use of AssemblyAI. It uploads the file that the user submitted via an API call to get a file URL, sends that URL in a second API call to get a transcript ID, and then polls until the transcription status is completed. Once it is, the function returns the transcribed text.
def transcribe_audio(file):
    # transcribes an audio file to text
    import requests
    import time
    # upload the file and get its url
    headers = {'authorization':assemblyai_api_key}
    response = requests.post('https://api.assemblyai.com/v2/upload',headers=headers,data=file)
    url = response.json()["upload_url"]
    # request a transcription and get the transcript id
    endpoint = "https://api.assemblyai.com/v2/transcript"
    json = {"audio_url":url}
    headers = {"authorization":assemblyai_api_key,"content-type":"application/json"}
    response = requests.post(endpoint, json=json, headers=headers)
    transcribe_id = response.json()['id']
    # polling until the transcription is completed (or errors out)
    endpoint = f"https://api.assemblyai.com/v2/transcript/{transcribe_id}"
    headers = {"authorization":assemblyai_api_key}
    result = {}
    while result.get("status") not in ("completed", "error"):
        time.sleep(3)
        result = requests.get(endpoint, headers=headers).json()
    # returns the transcribed text, or None if the transcription failed
    return result.get('text')
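If you want to sanity-check the transcription step on its own, outside of the Streamlit form, a minimal sketch might look like the following (assuming assemblyai_api_key is already set, and using "sample.mp3" as a placeholder file name):

# Hypothetical standalone test of transcribe_audio.
# "sample.mp3" is a placeholder and assemblyai_api_key must already be defined.
with open("sample.mp3", "rb") as audio_file:
    text = transcribe_audio(audio_file)

if text is not None:
    print(text)
else:
    print("Transcription failed")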
Related Functions
def clear_session_state(variable_list):
    # deletes variables from streamlit session state
    for variable in variable_list:
        try:
            del st.session_state[variable]
        except KeyError:
            pass
def elastic_index_document(index,document):
    # sends a document to an Elastic index
    from elasticsearch import Elasticsearch
    # creates Elastic connection
    client = Elasticsearch(
        elastic_url,
        ca_certs=elastic_ca_certs,
        api_key=elastic_api_key
    )
    # sends document to index with success or failure message
    response = client.index(index=index,document=document)
    if response["result"] == "created":
        success_message("Note creation successful")
    else:
        error_message("Note creation failure")
    # close Elastic connection
    client.close()
def error_message(text):
    # displays error message
    import time
    error = st.error(text)
    time.sleep(1)
    error.empty()
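One thing to note: success_message is called in elastic_index_document but isn't shown in this post. A minimal sketch, assuming it simply mirrors error_message with st.success instead of st.error, would be:

def success_message(text):
    # displays success message (assumed to mirror error_message)
    import time
    success = st.success(text)
    time.sleep(1)
    success.empty()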
Closing Remarks
This section is subject to change. As I continue working on making the project more accessible, I may migrate this to a service that doesn't require putting a few dollars into it every month. The cost is worth it for me and my use case, but I think it's worth trying to make everything completely free.
Next week, I will be covering the process of moving this project to Docker. I did so to make it much easier for people to set this up for their own D&D groups and I think it is worth talking about.
Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!
Happy Coding,
Joe