<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joe</title>
    <description>The latest articles on DEV Community by Joe (@thtmexicnkid).</description>
    <link>https://dev.to/thtmexicnkid</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1144082%2F040f1a2d-688b-4a79-a69a-15162b40550c.png</url>
      <title>DEV Community: Joe</title>
      <link>https://dev.to/thtmexicnkid</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thtmexicnkid"/>
    <language>en</language>
    <item>
      <title>Elastic D&amp;D - Update 16 - Bug &amp; Logic Fixes</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Fri, 12 Jan 2024 21:12:26 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-update-16-bug-logic-fixes-5gac</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-update-16-bug-logic-fixes-5gac</guid>
      <description>&lt;p&gt;In the last post we talked about how fixing the password reset function. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-15-fixing-password-reset-3nb4"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;Bug &amp;amp; Logic Fixes&lt;/h2&gt;

&lt;p&gt;First of all, Happy New Year, everyone!&lt;/p&gt;

&lt;p&gt;I haven't added any major features, but I have gotten to work on a few minor things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Added a function to get current and previous session numbers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The query in this function fetches the most recent log that is NOT from "today" ("now/d" is Elasticsearch date math for the start of the current day) and grabs its "session" field. It then adds 1 to that value for the current session and returns both numbers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def elastic_get_session_numbers(log_index):
    # creates Elastic connection
    client = Elasticsearch(
        elastic_url,
        ca_certs=elastic_ca_certs,
        api_key=elastic_api_key
    )

    # gets last session number
    response = client.search(
        index=log_index,
        size=1,
        sort=["@timestamp:desc"],
        source=["session"],
        query={"bool":{"must":[{"range":{"@timestamp":{"lt":"now/d"}}}]}}
    )

    # grab last session number and calculate current session number
    last_session = response["hits"]["hits"][0]["_source"]["session"]
    current_session = int(last_session) + 1

    return last_session, current_session
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
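&lt;p&gt;As a quick illustration of that extraction logic (the real call needs a live cluster, so the response shape is mocked here):&lt;/p&gt;

```python
# Illustration only: the real function queries a live Elasticsearch cluster.
# The response shape is mocked so the extraction step can be shown on its own.
mock_response = {
    "hits": {
        "hits": [
            {"_source": {"session": "7"}}
        ]
    }
}

def get_session_numbers(response):
    # grab last session number and calculate current session number
    last_session = response["hits"]["hits"][0]["_source"]["session"]
    current_session = int(last_session) + 1
    return last_session, current_session

last_session, current_session = get_session_numbers(mock_response)
print(last_session, current_session)  # 7 8
```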



&lt;ul&gt;
&lt;li&gt;Added current session number to the sidebar&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following code is added to the bottom of the sidebar, displaying the current session number.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;st.text("Current session: " + str(st.session_state.current_session))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Fixed the function to get previous session summary to use previous session number&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The function now uses the following code for the Elastic query, which specifies the session number grabbed from the function described above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = client.search(index=log_index,size=1,source=["message"],query={"bool":{"must":[{"match":{"type":"overview"}},{"match":{"session":session_number}}]}})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Fixed the function to get previous session summary to return generic message if no overview note was added in that session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This part was added out of necessity, as an error would occur if there was no overview log to be found. I'm giving a public thank you to &lt;a class="mentioned-user" href="https://dev.to/rusty13jr"&gt;@rusty13jr&lt;/a&gt; for finding this for me!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try:
        summary = response["hits"]["hits"][0]["_source"]["message"]
    except:
        summary = "No overview log was submitted from last session."

    return summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
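&lt;p&gt;A standalone sketch of that fallback, with mocked responses (the helper name is made up for the example):&lt;/p&gt;

```python
# Hypothetical standalone version of the fallback shown above,
# exercised against mocked Elasticsearch response shapes.
def get_previous_summary(response):
    # catching IndexError/KeyError is tighter than a bare except
    try:
        summary = response["hits"]["hits"][0]["_source"]["message"]
    except (IndexError, KeyError):
        summary = "No overview log was submitted from last session."
    return summary

with_overview = {"hits": {"hits": [{"_source": {"message": "The party reached the keep."}}]}}
no_overview = {"hits": {"hits": []}}

print(get_previous_summary(with_overview))  # The party reached the keep.
print(get_previous_summary(no_overview))    # No overview log was submitted from last session.
```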



&lt;ul&gt;
&lt;li&gt;Added placeholders for question prompts that Veverbot cannot use AI to answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These will be used in the near future and will do the work necessary to return useful answers for questions that cannot utilize the KNN search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# question prompts
    column1, column2, column3, column4, column5 = st.columns(5)
    column1.button("Question 1 Placeholder",type="primary",on_click=None)
    column2.button("Question 2 Placeholder",type="primary",on_click=None)
    column3.button("Question 3 Placeholder",type="primary",on_click=None)
    column4.button("Question 4 Placeholder",type="primary",on_click=None)
    column5.button("Question 5 Placeholder",type="primary",on_click=None)
    st.header("",divider="grey")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Closing Remarks&lt;/h2&gt;

&lt;p&gt;My group is finally in the program and taking notes. Their use has exposed some bugs and code errors, ultimately making the program better. I really appreciate that, so please reach out with any issues you run into if you are using the program as well!&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>python</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 15 - Fixing Password Reset</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Fri, 22 Dec 2023 18:03:16 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-15-fixing-password-reset-3nb4</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-15-fixing-password-reset-3nb4</guid>
      <description>&lt;p&gt;In the last post we talked about how rewriting the Note Input. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-14-note-input-rewrite-lp8"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;Fixing Password Reset&lt;/h2&gt;

&lt;p&gt;I finally gave access to my project to my D&amp;amp;D group! This would have been exciting, except when they all went to reset their passwords from "changeme", there were some issues. Specifically, the update_yml function wasn't able to locate the "config" variable.&lt;/p&gt;

&lt;p&gt;Old code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def update_yml():
    # updates login authentication configuration file
    with open(streamlit_project_path + "auth.yml", 'w') as file:
        yaml.dump(config, file, default_flow_style=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In hindsight, I should have realized this: the function couldn't access the "config" variable defined outside of its own scope. I made this change to pass the configuration into the function as a parameter, and it works now.&lt;/p&gt;

&lt;p&gt;New code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def update_yml(updated_config):
    # updates login authentication configuration file
    with open(streamlit_project_path + "auth.yml", 'w') as file:
        yaml.dump(updated_config, file, default_flow_style=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
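&lt;p&gt;The underlying lesson is plain Python scoping: a function can't see a name that only exists in the caller unless it is passed in. A stdlib-only sketch of the same fix, using json and a temp file in place of PyYAML:&lt;/p&gt;

```python
import json
import os
import tempfile

def update_config(updated_config, path):
    # configuration is passed in explicitly instead of being read
    # from an enclosing scope, mirroring the update_yml fix
    with open(path, "w") as file:
        json.dump(updated_config, file)

def main():
    # 'config' is local to main(), so a helper that referenced a bare
    # 'config' name would raise NameError -- passing it in avoids that
    config = {"credentials": {"usernames": {"joe": {"password": "hashed"}}}}
    path = os.path.join(tempfile.mkdtemp(), "auth.json")
    update_config(config, path)
    with open(path) as file:
        return json.load(file) == config

print(main())  # True
```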



&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I felt really dumb once I realized what was happening. I legitimately couldn't figure out what was wrong and had to sleep off the frustration in order to spot this. Saying that to say, silly mistakes happen sometimes and that's okay.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Logon Issue&lt;/h2&gt;

&lt;p&gt;Funnily enough, while I was troubleshooting the YAML issue, I noticed that users could still log in even if they put in an incorrect password. This shouldn't have been the case, and it ended up being a logic issue in the code.&lt;/p&gt;

&lt;p&gt;See, when inputting an incorrect password, the username field was still being populated, which bypassed my check in the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if not st.session_state.username:
    DISPLAY LOGON WIDGET
else:
    DISPLAY HOME PAGE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we initialize the authentication status along with the username, and check that instead of the username:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;initialize_session_state(["username","authentication_status"])
...
if st.session_state.authentication_status in (False,None):
    DISPLAY LOGON WIDGET
else:
    DISPLAY HOME PAGE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
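&lt;p&gt;The difference between the two checks can be shown with a plain dict standing in for st.session_state (no Streamlit needed to see the logic):&lt;/p&gt;

```python
# Plain-dict stand-in for st.session_state; illustration only.
def shows_home_page_old(state):
    # old check: any non-empty username unlocks the home page
    return bool(state["username"])

def shows_home_page_new(state):
    # new check: only an explicit successful authentication unlocks it
    return state["authentication_status"] not in (False, None)

# failed login: the widget still populated the username,
# but authentication_status is False
failed_login = {"username": "joe", "authentication_status": False}

print(shows_home_page_old(failed_login))  # True  (the bug)
print(shows_home_page_new(failed_login))  # False (fixed)
```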



&lt;h2&gt;Closing Remarks&lt;/h2&gt;

&lt;p&gt;I'm glad I was able to identify the faulty parts of both of these issues. The program as a whole functions better with these changes in place. &lt;strong&gt;&lt;em&gt;This is a friendly reminder to tell me about issues that you come across via GitHub. I will get to them!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I will be taking next week off for the holidays, but work on the player dashboard will begin soon after!&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>python</category>
      <category>elasticsearch</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 14 - Note Input Rewrite</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Sat, 16 Dec 2023 18:45:58 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-14-note-input-rewrite-lp8</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-14-note-input-rewrite-lp8</guid>
      <description>&lt;p&gt;In the last post we talked about how text chunking. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-13-text-chunking-30m7"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;Note Input Rewrite&lt;/h2&gt;

&lt;p&gt;The inspiration behind this rewrite actually comes from my girlfriend, who mentioned having some sort of glossary/index added to the application. I thought that was a great idea, but the data structure needed to be tweaked for that. In the process of tweaking the structure, I had an overwhelming urge to clean up my code, so here we are!&lt;/p&gt;

&lt;p&gt;Logically, I see the code in two major sections: a data collection section and a processing/indexing section.&lt;/p&gt;

&lt;p&gt;Full code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Elastic D&amp;amp;D
# Author: thtmexicnkid
# Last Updated: 12/09/2023
# 
# Streamlit - Note Input Page - Allows the user to store audio or text notes in Elasticsearch.

import streamlit as st
from functions import *
from variables import *

# set streamlit app to use centered format
st.set_page_config(layout="centered")

# initializes session state, loads login authentication configuration
initialize_session_state(["username"])
config, authenticator = load_yml()

# makes user log on to view page
if not st.session_state.username:
    error_message("UNAUTHORIZED: Please login on the Home page.",False)
else:
    with st.sidebar:
        # adds elastic d&amp;amp;d logo to sidebar
        display_image(streamlit_data_path + "banner.png","auto")
        st.divider()
        # add character picture to sidebar, if available
        try:
            display_image(streamlit_data_path + st.session_state.username + ".png","auto")
        except Exception:
            print("Picture unavailable for home page sidebar.")
    st.header('Note Input',divider=True)
    # gather information for log_payload in form
    form_variable_list = ["log_id","log_type","log_session","log_index","file","location_name","location_description","overview_summary","person_name","person_description","quest_name","quest_description","quest_finished","submitted","transcribed_text","content","content_vector"]
    st.session_state["log_type"] = st.selectbox("What kind of note is this?", ["audio","location","miscellaneous","overview","person","quest"])
    if st.session_state.log_type == "quest":
        st.session_state["quest_type"] = st.selectbox("Is this quest new or existing?", ["New","Existing"])
    with st.form(st.session_state.log_type, clear_on_submit=True):
        st.session_state["log_session"] = st.slider("Which session is this?", 0, 250)
        st.session_state["log_id"] = "session" + str(st.session_state.log_session) + "-" + generate_unique_id()
        ###CHECK IF LOG_ID EXISTS, RE-GENERATE IF IT DOES###
        if st.session_state.log_type == "audio":
            st.session_state["log_index"] = "dnd-notes-transcribed"
            if assemblyai_api_key:
                st.session_state["file"] = st.file_uploader("Choose audio file",type=[".3ga",".8svx",".aac",".ac3",".aif",".aiff",".alac",".amr",".ape",".au",".dss",".flac",".flv",".m2ts",".m4a",".m4b",".m4p",".m4r",".m4v",".mogg",".mov",".mp2",".mp3",".mp4",".mpga",".mts",".mxf",".oga",".ogg",".opus",".qcp",".ts",".tta",".voc",".wav",".webm",".wma",".wv"])
            else:
                st.session_state["file"] = st.file_uploader("Choose audio file",type=[".wav"])
            if st.session_state.file is not None:
                st.session_state["ready_for_submission"] = True
            else:
                st.warning('Please upload a file and submit')
        else:
            st.session_state["log_index"] = "dnd-notes-" + st.session_state.username
            if st.session_state.log_type == "location":
                st.session_state["location_name"] = text_cleanup(st.text_input("Input location name:"))
                st.session_state["location_description"] = text_cleanup(st.text_area("Input location description:"))
                if st.session_state.location_name is not None and st.session_state.location_description is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter the location name, description, and submit')
            elif st.session_state.log_type == "miscellaneous":
                st.session_state["miscellaneous_note"] = text_cleanup(st.text_area("Input miscellaneous note:"))
                if st.session_state.miscellaneous_note is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter miscellaneous note and submit')
            elif st.session_state.log_type == "overview":
                st.session_state["overview_summary"] = text_cleanup(st.text_area("Input session summary:"))
                if st.session_state.overview_summary is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter the session overview/summary and submit')
            elif st.session_state.log_type == "person":
                st.session_state["person_name"] = text_cleanup(st.text_input("Input person name:"))
                st.session_state["person_description"] = text_cleanup(st.text_area("Input person description:"))
                if st.session_state.person_name is not None and st.session_state.person_description is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter the person name, description, and submit')
            elif st.session_state.log_type == "quest":
                if st.session_state.quest_type == "Existing":
                    st.session_state["quest_name"] = st.selectbox("Select quest to update", elastic_get_quests())
                else:
                    st.session_state["quest_name"] = st.text_input("Input quest name:")
                st.session_state["quest_description"] = text_cleanup(st.text_area("Input quest description / update:"))
                st.session_state["quest_finished"] = st.checkbox("Is the quest finished?")
                if st.session_state.quest_name is not None and st.session_state.quest_description is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter the quest name, description, mark the status, and submit')
        # submit form, process data, and index log_payload
        st.session_state["submitted"] = st.form_submit_button("Submit")
        if st.session_state.submitted and st.session_state.ready_for_submission:
            # audio to text transcription
            if st.session_state.log_type == "audio":
                if assemblyai_api_key:
                    st.session_state["transcribed_text"] = text_cleanup(transcribe_audio_paid(st.session_state.file))
                else:
                    st.session_state["transcribed_text"] = text_cleanup(transcribe_audio_free(st.session_state.file))
                if st.session_state.transcribed_text not in (None,""):
                    chunk_array = split_text_with_overlap(st.session_state.transcribed_text)
                    for chunk in chunk_array:
                        st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". " + chunk
                        st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                        if st.session_state.content_vector is None:
                            error_message("AI API vectorization failure",2)
                        else:
                            st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.transcribed_text,"content":st.session_state.content,"content_vector":st.session_state.content_vector})
                            elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # location logs
            elif st.session_state.log_type == "location":
                chunk_array = split_text_with_overlap(st.session_state.location_description)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". The location is " + st.session_state.location_name + ". " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.location_name + ". " + st.session_state.location_description,"content":st.session_state.content,"content_vector":st.session_state.content_vector,"location":{"name":st.session_state.location_name,"description":st.session_state.location_description}})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # miscellaneous logs
            elif st.session_state.log_type == "miscellaneous":
                chunk_array = split_text_with_overlap(st.session_state.miscellaneous_note)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.miscellaneous_note,"content":st.session_state.content,"content_vector":st.session_state.content_vector})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # overview logs
            elif st.session_state.log_type == "overview":
                chunk_array = split_text_with_overlap(st.session_state.overview_summary)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.overview_summary,"content":st.session_state.content,"content_vector":st.session_state.content_vector})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # person logs
            elif st.session_state.log_type == "person":
                chunk_array = split_text_with_overlap(st.session_state.person_description)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". The person's name is " + st.session_state.person_name + ". " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.person_name + ". " + st.session_state.person_description,"content":st.session_state.content,"content_vector":st.session_state.content_vector,"person":{"name":st.session_state.person_name,"description":st.session_state.person_description}})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # quest logs
            elif st.session_state.log_type == "quest":
                if st.session_state.quest_finished:
                    elastic_update_quest_status(st.session_state.quest_name)
                    status = "The quest has been completed."
                else:
                    status = "The quest has not been completed yet."
                chunk_array = split_text_with_overlap(st.session_state.quest_description)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". The quest is " + st.session_state.quest_name + ". " + status + " " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.quest_name + ". " + st.session_state.quest_description + status,"content":st.session_state.content,"content_vector":st.session_state.content_vector,"quest":{"name":st.session_state.quest_name,"description":st.session_state.quest_description,"finished":st.session_state.quest_finished}})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
    clear_session_state(form_variable_list)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
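&lt;p&gt;The split_text_with_overlap helper comes from the project's functions module and isn't shown in this post; as a rough sketch, a chunker like it might split on words with a small overlap between consecutive chunks (the sizes here are made up):&lt;/p&gt;

```python
# Hypothetical stand-in for the project's split_text_with_overlap helper:
# fixed-size word chunks that share a small overlap with their neighbor.
def split_text_with_overlap(text, chunk_size=50, overlap=10):
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(words), 1), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

sample = " ".join("word" + str(n) for n in range(120))
chunks = split_text_with_overlap(sample)
print(len(chunks))  # 3
```

&lt;p&gt;The overlap keeps context shared across chunk boundaries, so the vector search doesn't lose sentences that straddle a cut.&lt;/p&gt;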



&lt;h3&gt;Data Collection&lt;/h3&gt;

&lt;p&gt;The first half of the code mainly deals with data input and sorting that data into variables that the second half of the code will use for manipulation and/or payloads.&lt;/p&gt;

&lt;p&gt;These payloads are important, as they provide the new data structure mentioned above.&lt;/p&gt;
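&lt;p&gt;For concreteness, here is what one such payload looks like with mocked values (the names and vector are made up; the nested structure mirrors the location branch of the code):&lt;/p&gt;

```python
import json

# Mocked field values; in the app these come from st.session_state.
log_id = "session3-abc123"
log_type = "location"
log_session = 3
location_name = "Ravenfall Keep"
location_description = "A ruined fortress on the northern cliffs."
content = "This note is from session 3. The location is Ravenfall Keep. A ruined fortress on the northern cliffs."
content_vector = [0.1, 0.2, 0.3]  # stand-in for the real embedding

log_payload = json.dumps({
    "id": log_id,
    "type": log_type,
    "session": log_session,
    "message": location_name + ". " + location_description,
    "content": content,
    "content_vector": content_vector,
    "location": {"name": location_name, "description": location_description},
})

print(json.loads(log_payload)["location"]["name"])  # Ravenfall Keep
```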

&lt;p&gt;The data collection code consists of lines 32-89:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    # gather information for log_payload in form
    form_variable_list = ["log_id","log_type","log_session","log_index","file","location_name","location_description","overview_summary","person_name","person_description","quest_name","quest_description","quest_finished","submitted","transcribed_text","content","content_vector"]
    st.session_state["log_type"] = st.selectbox("What kind of note is this?", ["audio","location","miscellaneous","overview","person","quest"])
    if st.session_state.log_type == "quest":
        st.session_state["quest_type"] = st.selectbox("Is this quest new or existing?", ["New","Existing"])
    with st.form(st.session_state.log_type, clear_on_submit=True):
        st.session_state["log_session"] = st.slider("Which session is this?", 0, 250)
        st.session_state["log_id"] = "session" + str(st.session_state.log_session) + "-" + generate_unique_id()
        ###CHECK IF LOG_ID EXISTS, RE-GENERATE IF IT DOES###
        if st.session_state.log_type == "audio":
            st.session_state["log_index"] = "dnd-notes-transcribed"
            if assemblyai_api_key:
                st.session_state["file"] = st.file_uploader("Choose audio file",type=[".3ga",".8svx",".aac",".ac3",".aif",".aiff",".alac",".amr",".ape",".au",".dss",".flac",".flv",".m2ts",".m4a",".m4b",".m4p",".m4r",".m4v",".mogg",".mov",".mp2",".mp3",".mp4",".mpga",".mts",".mxf",".oga",".ogg",".opus",".qcp",".ts",".tta",".voc",".wav",".webm",".wma",".wv"])
            else:
                st.session_state["file"] = st.file_uploader("Choose audio file",type=[".wav"])
            if st.session_state.file is not None:
                st.session_state["ready_for_submission"] = True
            else:
                st.warning('Please upload a file and submit')
        else:
            st.session_state["log_index"] = "dnd-notes-" + st.session_state.username
            if st.session_state.log_type == "location":
                st.session_state["location_name"] = text_cleanup(st.text_input("Input location name:"))
                st.session_state["location_description"] = text_cleanup(st.text_area("Input location description:"))
                if st.session_state.location_name is not None and st.session_state.location_description is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter the location name, description, and submit')
            elif st.session_state.log_type == "miscellaneous":
                st.session_state["miscellaneous_note"] = text_cleanup(st.text_area("Input miscellaneous note:"))
                if st.session_state.miscellaneous_note is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter miscellaneous note and submit')
            elif st.session_state.log_type == "overview":
                st.session_state["overview_summary"] = text_cleanup(st.text_area("Input session summary:"))
                if st.session_state.overview_summary is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter the session overview/summary and submit')
            elif st.session_state.log_type == "person":
                st.session_state["person_name"] = text_cleanup(st.text_input("Input person name:"))
                st.session_state["person_description"] = text_cleanup(st.text_area("Input person description:"))
                if st.session_state.person_name is not None and st.session_state.person_description is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter the person name, description, and submit')
            elif st.session_state.log_type == "quest":
                if st.session_state.quest_type == "Existing":
                    st.session_state["quest_name"] = st.selectbox("Select quest to update", elastic_get_quests())
                else:
                    st.session_state["quest_name"] = st.text_input("Input quest name:")
                st.session_state["quest_description"] = text_cleanup(st.text_area("Input quest description / update:"))
                st.session_state["quest_finished"] = st.checkbox("Is the quest finished?")
                if st.session_state.quest_name is not None and st.session_state.quest_description is not None:
                    st.session_state["ready_for_submission"] = True
                else:
                    st.warning('Please enter the quest name, description, mark the status, and submit')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Processing / Indexing&lt;/h3&gt;

&lt;p&gt;The second half of the code manipulates data inside of the variables set in the first half of code, builds payloads, and sends those off for indexing into Elastic.&lt;/p&gt;

&lt;p&gt;The processing / indexing code consists of lines 90-168:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# submit form, process data, and index log_payload
        st.session_state["submitted"] = st.form_submit_button("Submit")
        if st.session_state.submitted and st.session_state.ready_for_submission:
            # audio to text transcription
            if st.session_state.log_type == "audio":
                if assemblyai_api_key:
                    st.session_state["transcribed_text"] = text_cleanup(transcribe_audio_paid(st.session_state.file))
                else:
                    st.session_state["transcribed_text"] = text_cleanup(transcribe_audio_free(st.session_state.file))
                if st.session_state.transcribed_text not in (None,""):
                    chunk_array = split_text_with_overlap(st.session_state.transcribed_text)
                    for chunk in chunk_array:
                        st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". " + chunk
                        st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                        if st.session_state.content_vector is None:
                            error_message("AI API vectorization failure",2)
                        else:
                            st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.transcribed_text,"content":st.session_state.content,"content_vector":st.session_state.content_vector})
                            elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # location logs
            elif st.session_state.log_type == "location":
                chunk_array = split_text_with_overlap(st.session_state.location_description)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". The location is " + st.session_state.location_name + ". " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.location_name + ". " + st.session_state.location_description,"content":st.session_state.content,"content_vector":st.session_state.content_vector,"location":{"name":st.session_state.location_name,"description":st.session_state.location_description}})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # miscellaneous logs
            elif st.session_state.log_type == "miscellaneous":
                chunk_array = split_text_with_overlap(st.session_state.miscellaneous_note)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.miscellaneous_note,"content":st.session_state.content,"content_vector":st.session_state.content_vector})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # overview logs
            elif st.session_state.log_type == "overview":
                chunk_array = split_text_with_overlap(st.session_state.overview_summary)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.overview_summary,"content":st.session_state.content,"content_vector":st.session_state.content_vector})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # person logs
            elif st.session_state.log_type == "person":
                chunk_array = split_text_with_overlap(st.session_state.person_description)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". The person's name is " + st.session_state.person_name + ". " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.person_name + ". " + st.session_state.person_description,"content":st.session_state.content,"content_vector":st.session_state.content_vector,"person":{"name":st.session_state.person_name,"description":st.session_state.person_description}})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
            # quest logs
            elif st.session_state.log_type == "quest":
                if st.session_state.quest_finished:
                    elastic_update_quest_status(st.session_state.quest_name)
                    status = "The quest has been completed."
                else:
                    status = "The quest has not been completed yet."
                chunk_array = split_text_with_overlap(st.session_state.quest_description)
                for chunk in chunk_array:
                    st.session_state["content"] = "This note is from session " + str(st.session_state.log_session) + ". The quest is " + st.session_state.quest_name + ". " + status + " " + chunk
                    st.session_state["content_vector"] = api_get_vector_object(st.session_state.content)
                    if st.session_state.content_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.quest_name + ". " + st.session_state.quest_description + " " + status,"content":st.session_state.content,"content_vector":st.session_state.content_vector,"quest":{"name":st.session_state.quest_name,"description":st.session_state.quest_description,"finished":st.session_state.quest_finished}})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Data Structure
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Audio Logs
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.transcribed_text,"content":st.session_state.content,"content_vector":st.session_state.content_vector}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;id&lt;/strong&gt;: a unique identifier for a log, or for a group of logs split by the text chunking function&lt;br&gt;
&lt;strong&gt;type&lt;/strong&gt;: what kind of log it is (audio, location, etc.)&lt;br&gt;
&lt;strong&gt;session&lt;/strong&gt;: the session number&lt;br&gt;
&lt;strong&gt;message&lt;/strong&gt;: the entire block of transcribed text&lt;br&gt;
&lt;strong&gt;content&lt;/strong&gt;: a combination of all relevant information (session number, etc.) and the chunk of text provided by the text chunking function&lt;br&gt;
&lt;strong&gt;content_vector&lt;/strong&gt;: the vector object of the content field, used by Veverbot for returning relevant results&lt;/p&gt;
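
&lt;p&gt;Putting those fields together, a single indexed audio-log document might look like the following sketch. Every value here is invented for illustration, and the real content_vector holds the embedding returned by the AI API rather than zeroes.&lt;/p&gt;

```python
import json

# hypothetical audio-log document matching the field descriptions above;
# all values are made up, and the vector is a zeroed placeholder
audio_log = {
    "id": "a1b2c3d4",
    "type": "audio",
    "session": 7,
    "message": "The party entered the ruined keep and found a sealed door.",
    "content": "This note is from session 7. The party entered the ruined keep and found a sealed door.",
    "content_vector": [0.0] * 1536,
}
payload = json.dumps(audio_log)
```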

&lt;h5&gt;
  
  
  Location Logs
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.location_name + ". " + st.session_state.location_description,"content":st.session_state.content,"content_vector":st.session_state.content_vector,"location":{"name":st.session_state.location_name,"description":st.session_state.location_description}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;id&lt;/strong&gt;: a unique identifier for a log, or for a group of logs split by the text chunking function&lt;br&gt;
&lt;strong&gt;type&lt;/strong&gt;: what kind of log it is (audio, location, etc.)&lt;br&gt;
&lt;strong&gt;session&lt;/strong&gt;: the session number&lt;br&gt;
&lt;strong&gt;message&lt;/strong&gt;: a combination of location name and description&lt;br&gt;
&lt;strong&gt;content&lt;/strong&gt;: a combination of all relevant information (session number, etc.) and the chunk of text provided by the text chunking function&lt;br&gt;
&lt;strong&gt;content_vector&lt;/strong&gt;: the vector object of the content field, used by Veverbot for returning relevant results&lt;br&gt;
&lt;strong&gt;location.name&lt;/strong&gt;: the name of the location, to be used in the glossary/index&lt;br&gt;
&lt;strong&gt;location.description&lt;/strong&gt;: the description of the location, to be used in the glossary/index&lt;/p&gt;

&lt;h5&gt;
  
  
  Miscellaneous Logs
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.miscellaneous_note,"content":st.session_state.content,"content_vector":st.session_state.content_vector}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;id&lt;/strong&gt;: a unique identifier for a log, or for a group of logs split by the text chunking function&lt;br&gt;
&lt;strong&gt;type&lt;/strong&gt;: what kind of log it is (audio, location, etc.)&lt;br&gt;
&lt;strong&gt;session&lt;/strong&gt;: the session number&lt;br&gt;
&lt;strong&gt;message&lt;/strong&gt;: the entire block of note text&lt;br&gt;
&lt;strong&gt;content&lt;/strong&gt;: a combination of all relevant information (session number, etc.) and the chunk of text provided by the text chunking function&lt;br&gt;
&lt;strong&gt;content_vector&lt;/strong&gt;: the vector object of the content field, used by Veverbot for returning relevant results&lt;/p&gt;

&lt;h5&gt;
  
  
  Overview Logs
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.overview_summary,"content":st.session_state.content,"content_vector":st.session_state.content_vector}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;id&lt;/strong&gt;: a unique identifier for a log, or for a group of logs split by the text chunking function&lt;br&gt;
&lt;strong&gt;type&lt;/strong&gt;: what kind of log it is (audio, location, etc.)&lt;br&gt;
&lt;strong&gt;session&lt;/strong&gt;: the session number&lt;br&gt;
&lt;strong&gt;message&lt;/strong&gt;: the entire block of overview text&lt;br&gt;
&lt;strong&gt;content&lt;/strong&gt;: a combination of all relevant information (session number, etc.) and the chunk of text provided by the text chunking function&lt;br&gt;
&lt;strong&gt;content_vector&lt;/strong&gt;: the vector object of the content field, used by Veverbot for returning relevant results&lt;/p&gt;

&lt;h5&gt;
  
  
  Person Logs
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.person_name + ". " + st.session_state.person_description,"content":st.session_state.content,"content_vector":st.session_state.content_vector,"person":{"name":st.session_state.person_name,"description":st.session_state.person_description}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;id&lt;/strong&gt;: a unique identifier for a log, or for a group of logs split by the text chunking function&lt;br&gt;
&lt;strong&gt;type&lt;/strong&gt;: what kind of log it is (audio, location, etc.)&lt;br&gt;
&lt;strong&gt;session&lt;/strong&gt;: the session number&lt;br&gt;
&lt;strong&gt;message&lt;/strong&gt;: a combination of person name and description&lt;br&gt;
&lt;strong&gt;content&lt;/strong&gt;: a combination of all relevant information (session number, etc.) and the chunk of text provided by the text chunking function&lt;br&gt;
&lt;strong&gt;content_vector&lt;/strong&gt;: the vector object of the content field, used by Veverbot for returning relevant results&lt;br&gt;
&lt;strong&gt;person.name&lt;/strong&gt;: the name of the NPC, to be used in the glossary/index&lt;br&gt;
&lt;strong&gt;person.description&lt;/strong&gt;: the description of the NPC, to be used in the glossary/index&lt;/p&gt;

&lt;h5&gt;
  
  
  Quest Logs
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"id":st.session_state.log_id,"type":st.session_state.log_type,"session":st.session_state.log_session,"message":st.session_state.quest_name + ". " + st.session_state.quest_description + status,"content":st.session_state.content,"content_vector":st.session_state.content_vector,"quest":{"name":st.session_state.quest_name,"description":st.session_state.quest_description,"finished":st.session_state.quest_finished}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;id&lt;/strong&gt;: a unique identifier for a log, or for a group of logs split by the text chunking function&lt;br&gt;
&lt;strong&gt;type&lt;/strong&gt;: what kind of log it is (audio, location, etc.)&lt;br&gt;
&lt;strong&gt;session&lt;/strong&gt;: the session number&lt;br&gt;
&lt;strong&gt;message&lt;/strong&gt;: a combination of quest name, description, and status&lt;br&gt;
&lt;strong&gt;content&lt;/strong&gt;: a combination of all relevant information (session number, etc.) and the chunk of text provided by the text chunking function&lt;br&gt;
&lt;strong&gt;content_vector&lt;/strong&gt;: the vector object of the content field, used by Veverbot for returning relevant results&lt;br&gt;
&lt;strong&gt;quest.name&lt;/strong&gt;: the name of the quest, to be used in the glossary/index&lt;br&gt;
&lt;strong&gt;quest.description&lt;/strong&gt;: the description/update of the quest, to be used in the glossary/index&lt;br&gt;
&lt;strong&gt;quest.finished&lt;/strong&gt;: the status of the quest, to be used in the glossary/index&lt;/p&gt;
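
&lt;p&gt;As a final illustration, a quest-log document with the nested quest object might look like this sketch. The quest name, description, and the zeroed vector are all invented for illustration.&lt;/p&gt;

```python
import json

# hypothetical quest-log document showing the nested "quest" object;
# values are invented and the vector is a zeroed placeholder
quest_log = {
    "id": "e5f6a7b8",
    "type": "quest",
    "session": 7,
    "message": "Clear the Ruined Keep. Scouted the outer walls. The quest has not been completed yet.",
    "content": (
        "This note is from session 7. The quest is Clear the Ruined Keep. "
        "The quest has not been completed yet. Scouted the outer walls."
    ),
    "content_vector": [0.0] * 1536,
    "quest": {
        "name": "Clear the Ruined Keep",
        "description": "Scouted the outer walls.",
        "finished": False,
    },
}
payload = json.dumps(quest_log)
```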

&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;Overall, the rewrite went smoothly! I feel that I can do much more with the new structure and it will be easier to add fields in the future under the location, person, and quest objects.&lt;/p&gt;

&lt;p&gt;Next week, I may begin talking about the new player dashboard that will be replacing the home page. However, it has come to my attention that the password reset functionality is broken, so I may be fixing that instead.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>python</category>
      <category>elasticsearch</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 13 - Text Chunking</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Fri, 08 Dec 2023 15:44:00 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-13-text-chunking-30m7</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-13-text-chunking-30m7</guid>
      <description>&lt;p&gt;In the last post we talked about how Veverbot works. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-12-veverbot-asking-questions-and-receiving-answers-3lgf"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Chunking
&lt;/h2&gt;

&lt;p&gt;Chunking is the process of breaking something large into smaller, more manageable pieces. For example, the free audio transcription method uses this on the audio file. You can see that &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-10-audio-transcription-changes-11ij"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While using Veverbot, I noticed that larger text passages were awful for returning relevant information back to the AI assistant. To make Veverbot better, I have been working on breaking these large text passages into smaller ones with context, meaning the text chunks overlap slightly so that better responses are returned.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python Function
&lt;/h3&gt;

&lt;p&gt;Accomplishing chunking with overlap ended up being fairly easy. Using the &lt;a href="https://www.nltk.org/"&gt;Natural Language Toolkit&lt;/a&gt;, specifically Punkt, we can tokenize text passages into an array of sentences. From there, we loop through the array, adding sentences to the current chunk. When the combined length of the chunk and the next sentence reaches the chunk_size variable, the chunk is added to the chunks array, and the overlap is built by walking backwards through the previous sentences until roughly overlap_size characters have been collected, which keeps the process quite fast. When it is finished, the function returns an array of text chunks to use in the log_payload for Elastic indexing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def split_text_with_overlap(text, chunk_size=500, overlap_size=100):
    # download punkt and initialize tokenizer
    nltk.download("punkt")
    tokenizer = nltk.tokenize.punkt.PunktSentenceTokenizer()

    # separate text into an array of sentences
    array = tokenizer.tokenize(text)

    # if adding the next sentence would push the chunk to chunk_size, save the chunk
    # then prepend up to overlap_size characters of previous sentences for context overlap
    chunks = []
    chunk = ""
    for index, sentence in enumerate(array):
        if (len(chunk) + len(sentence)) &amp;gt;= chunk_size:
            chunks.append(chunk)

            overlap = ""
            overlap_length = len(overlap)
            overlap_index = index - 1
            while overlap_index != -1 and ((overlap_length + len(array[overlap_index])) &amp;lt; overlap_size):
                overlap = array[overlap_index] + " " + overlap
                overlap_length = len(overlap)
                overlap_index = overlap_index - 1
            chunk = overlap + sentence
        else:
            chunk = (chunk + " " + sentence) if chunk else sentence
    # append the last bit of text that may not hit the length limit
    chunks.append(chunk)

    return chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
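
&lt;p&gt;To see the overlap behavior in isolation, here is a minimal, self-contained sketch of the same idea. It swaps Punkt for a naive regex sentence split so it runs without nltk; the function name and the sample text are invented for illustration, so treat it as a stand-in rather than the real function above.&lt;/p&gt;

```python
import re

def naive_split_with_overlap(text, chunk_size=500, overlap_size=100):
    # naive stand-in for the nltk-based split_text_with_overlap:
    # split on sentence-ending punctuation instead of using Punkt
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, chunk = [], ""
    for index, sentence in enumerate(sentences):
        if len(chunk) + len(sentence) >= chunk_size:
            chunks.append(chunk)
            # walk backwards, prepending whole previous sentences
            # until roughly overlap_size characters are collected
            overlap = ""
            i = index - 1
            while i >= 0 and len(overlap) + len(sentences[i]) < overlap_size:
                overlap = sentences[i] + " " + overlap
                i -= 1
            chunk = overlap + sentence
        else:
            chunk = (chunk + " " + sentence) if chunk else sentence
    chunks.append(chunk)  # keep the trailing chunk
    return chunks

text = " ".join(f"Sentence number {n} is here." for n in range(40))
chunks = naive_split_with_overlap(text, chunk_size=120, overlap_size=40)
```

Each chunk after the first starts with the last sentence of the previous chunk, which is the context overlap the post describes.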



&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I will show how this process fits into note input once I finish my rewrite of that page. It is almost done and I am super happy with it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;I am quite pleased with how this process panned out. It works very well and it is lightning fast, which is something that I was worried about.&lt;/p&gt;

&lt;p&gt;I plan on finishing my note input rewrite by next week so I hope to talk about that in the next post. If not, I can begin talking about the new player dashboard that will be replacing the home page.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>python</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 12 - Veverbot - Asking Questions and Receiving Answers</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Fri, 17 Nov 2023 17:31:20 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-12-veverbot-asking-questions-and-receiving-answers-3lgf</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-12-veverbot-asking-questions-and-receiving-answers-3lgf</guid>
      <description>&lt;p&gt;In the last post we talked about Veverbot and data vectorization. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-11-veverbot-data-vectorization-5f9g"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The first bit of this post will be similar to the last post. If you are caught up, you can skip ahead.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Veverbot
&lt;/h2&gt;

&lt;p&gt;Veverbot is my own custom AI assistant that aims to help players get quick answers about things that happened during their campaign so far. This is absolutely a work-in-progress, but even the first iteration of him is very cool.&lt;/p&gt;

&lt;p&gt;We have already talked about the logging process, so today I will be talking about what needs to be done to ask questions and receive answers from Veverbot.&lt;/p&gt;

&lt;h3&gt;
  
  
  Elastic Configuration
&lt;/h3&gt;

&lt;p&gt;To refresh your memory, I want to provide the Elastic templates in place for this data. Currently, I am using two templates: one for the "dnd-notes-*" indices, and another for an index named "virtual_dm-questions_answers". The second index contains the questions that players ask Veverbot, as well as the responses that Veverbot provides back to the players.&lt;/p&gt;

&lt;h4&gt;
  
  
  dnd-notes-* component template
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
      "name": "dnd-notes",
      "component_template": {
        "template": {
          "mappings": {
            "properties": {
              "@timestamp": {
                "format": "strict_date_optional_time",
                "type": "date"
              },
              "session": {
                "type": "long"
              },
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "finished": {
                "type": "boolean"
              },
              "message": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "type": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "message_vector": {
                "dims": 1536,
                "similarity": "cosine",
                "index": "true",
                "type": "dense_vector"
              }
            }
          }
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  virtual_dm-questions_answers component template
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
      "name": "virtual_dm-questions_answers",
      "component_template": {
        "template": {
          "mappings": {
            "properties": {
              "question_vector": {
                "dims": 1536,
                "similarity": "cosine",
                "index": "true",
                "type": "dense_vector"
              },
              "answer": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "question": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "answer_vector": {
                "dims": 1536,
                "similarity": "cosine",
                "index": "true",
                "type": "dense_vector"
              }
            }
          }
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The mappings and templates are automatically created via the docker-compose file! This is simply educational; a user will not have to create any of this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Asking Questions
&lt;/h3&gt;

&lt;p&gt;I showed the code for this page in the Streamlit app &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-8-streamlit-changes-b7g"&gt;here&lt;/a&gt;. Definitely go check that out.&lt;/p&gt;

&lt;p&gt;Asking Veverbot a question is fairly straightforward with the chat window implementation -- just type a question into the chat bar!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9qb3l6ipjt8uu5sfc85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9qb3l6ipjt8uu5sfc85.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From here, the question is stored in a variable, then vectorized via FastAPI (see &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-9-fastapi-1an2"&gt;this post&lt;/a&gt;) and stored in another variable. &lt;/p&gt;

&lt;h3&gt;
  
  
  Receiving Answers
&lt;/h3&gt;

&lt;p&gt;To receive an answer, an &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html" rel="noopener noreferrer"&gt;Elasticsearch kNN query&lt;/a&gt; is run with the vectorized question. &lt;/p&gt;
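
&lt;p&gt;As a hedged sketch, the body of such a kNN search might look like the following Python dict. The "message_vector" field name comes from the mapping shown earlier in this post, while the helper name and the k / num_candidates values are assumptions for illustration, not the project's actual query.&lt;/p&gt;

```python
# hedged sketch of an Elasticsearch 8 kNN search body; "message_vector"
# matches the component template above, k/num_candidates are example values
def build_knn_query(question_vector, k=5, num_candidates=50):
    return {
        "knn": {
            "field": "message_vector",
            "query_vector": question_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["message", "session", "type"],
    }

body = build_knn_query([0.0] * 1536)
```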

&lt;p&gt;Both the question and the query results are then sent to OpenAI via FastAPI (see above link) to formulate a coherent response. This response is returned to the chat window below the question.&lt;/p&gt;

&lt;p&gt;The question and the response from OpenAI are also stored in an Elastic index for later use that is to be determined.&lt;/p&gt;
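
&lt;p&gt;For illustration, a stored question/answer document shaped like the virtual_dm-questions_answers mapping above might look like this. The text and the zeroed vectors are placeholders, not real data.&lt;/p&gt;

```python
import json

# placeholder question/answer document matching the
# virtual_dm-questions_answers mapping; real vectors hold 1536 embedding floats
qa_doc = {
    "question": "Who gave the party the quest in session 3?",
    "question_vector": [0.0] * 1536,
    "answer": "Placeholder answer generated by OpenAI.",
    "answer_vector": [0.0] * 1536,
}
payload = json.dumps(qa_doc)
```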

&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;Full disclosure -- our D&amp;amp;D group hasn't played in a few weeks and I haven't put as much effort into the project in that time. All of that to say, I have no clue what I will talk about next week. Maybe keeping up with the blog will motivate me to dedicate a few hours each week to this; only time will tell.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd" rel="noopener noreferrer"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid" rel="noopener noreferrer"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>python</category>
      <category>elasticsearch</category>
      <category>ai</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 11 - Veverbot - Data Vectorization</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Sat, 04 Nov 2023 15:31:57 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-11-veverbot-data-vectorization-5f9g</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-11-veverbot-data-vectorization-5f9g</guid>
      <description>&lt;p&gt;Last week we talked about audio transcription changes. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-10-audio-transcription-changes-11ij"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Veverbot
&lt;/h2&gt;

&lt;p&gt;Veverbot is my own custom AI assistant that aims to help players get quick answers about things that happened during their campaign so far. This is absolutely a work-in-progress, but even the first iteration of him is very cool.&lt;/p&gt;

&lt;p&gt;This is a fairly involved process, so today I will be talking about what needs to be done from the logging / Elastic configuration side of things in order for Veverbot to work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Elastic Configuration
&lt;/h3&gt;

&lt;p&gt;For Veverbot to work, we simply need to add/adjust the mappings of index templates. Currently, I am using two templates: one for the "dnd-notes-*" indices, and another for an index named "virtual_dm-questions_answers". The second index contains the questions that players ask Veverbot, as well as the responses that Veverbot provides back to the players.&lt;/p&gt;

&lt;h4&gt;
  
  
  dnd-notes-* component template
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
      "name": "dnd-notes",
      "component_template": {
        "template": {
          "mappings": {
            "properties": {
              "@timestamp": {
                "format": "strict_date_optional_time",
                "type": "date"
              },
              "session": {
                "type": "long"
              },
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "finished": {
                "type": "boolean"
              },
              "message": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "type": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "message_vector": {
                "dims": 1536,
                "similarity": "cosine",
                "index": "true",
                "type": "dense_vector"
              }
            }
          }
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  virtual_dm-questions_answers component template
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
      "name": "virtual_dm-questions_answers",
      "component_template": {
        "template": {
          "mappings": {
            "properties": {
              "question_vector": {
                "dims": 1536,
                "similarity": "cosine",
                "index": "true",
                "type": "dense_vector"
              },
              "answer": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "question": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                }
              },
              "answer_vector": {
                "dims": 1536,
                "similarity": "cosine",
                "index": "true",
                "type": "dense_vector"
              }
            }
          }
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The mappings and templates are created automatically via the docker-compose file! This section is purely educational; users will not have to create any of this themselves.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Logging
&lt;/h3&gt;

&lt;p&gt;With the mappings in place, we can now ingest logs with a dense_vector field. If you recall, this step happens on the note input page of Streamlit and is applied to every note that gets sent to Elastic.&lt;/p&gt;

&lt;h4&gt;
  
  
  Audio Note
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;st.session_state["message_vector"] = api_get_vector_object(st.session_state.transcribed_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Text Note
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;st.session_state["message_vector"] = api_get_vector_object(st.session_state.log_message)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function that gets called simply makes a GET request to the FastAPI service that was covered in the week 9 blog post!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def api_get_vector_object(text):
    # returns vector object from supplied text

    fastapi_endpoint = "/get_vector_object/"
    full_url = fastapi_url + fastapi_endpoint + text
    response = requests.get(full_url)

    try:
        message_vector = response.json()
    except:
        message_vector = None
        print(response.content)

    return message_vector
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
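&lt;p&gt;One thing to watch: the text travels as a raw URL path segment, so characters like "/" or "?" in a note would break the route. A minimal sketch of percent-encoding the text first (the helper is hypothetical; the base URL and endpoint name mirror the code above):&lt;/p&gt;

```python
# Hypothetical sketch, not the project's code: percent-encode the note text
# before placing it in the URL path so "/" and "?" cannot break the route.
from urllib.parse import quote

fastapi_url = "http://api:8000"           # assumed base URL
fastapi_endpoint = "/get_vector_object/"

def build_vector_url(text):
    # safe="" also encodes "/", which would otherwise split the path
    return fastapi_url + fastapi_endpoint + quote(text, safe="")
```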



&lt;p&gt;The API accepts the text as a variable, creates an embedding via OpenAI, and returns the vector object from the embedding. This vector object is what will allow Veverbot to compare user questions to player notes and return an answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.get("/get_vector_object/{text}")
async def get_vector_object(text):
    import openai

    openai.api_key = "API_KEY"
    embedding_model = "text-embedding-ada-002"
    openai_embedding = openai.Embedding.create(input=text, model=embedding_model)

    return openai_embedding["data"][0]["embedding"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
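&lt;p&gt;Since the mapping above declares dims of 1536, Elasticsearch will reject a vector of any other length at index time. A small, hypothetical sanity check that could run before indexing:&lt;/p&gt;

```python
# Hypothetical helper, assuming the 1536-dimension mapping shown earlier:
# verify an embedding's shape before sending the document to Elasticsearch.
def vector_matches_mapping(vector, dims=1536):
    return isinstance(vector, list) and len(vector) == dims
```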



&lt;p&gt;The log is indexed as normal, now with a dense_vector field; we will see how Veverbot puts that field to use to answer player questions next week!&lt;/p&gt;
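&lt;p&gt;For reference, that comparison boils down to a kNN search body along these lines (a sketch using the field name from the mapping; the values for k and num_candidates are illustrative):&lt;/p&gt;

```python
# Sketch of the kNN search body used to match a question vector against the
# message_vector field; k and num_candidates are illustrative defaults.
def build_knn_query(question_vector, k=10, num_candidates=100):
    return {
        "field": "message_vector",
        "query_vector": question_vector,
        "k": k,
        "num_candidates": num_candidates,
    }
```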

&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;As previously stated, next week I will be talking about Veverbot from the Streamlit side. I will essentially walk through the user experience and what is happening in the background to produce the "conversation" that happens on the front end.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>python</category>
      <category>elasticsearch</category>
      <category>ai</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 10 - Audio Transcription Changes</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Sat, 28 Oct 2023 14:38:00 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-10-audio-transcription-changes-11ij</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-10-audio-transcription-changes-11ij</guid>
      <description>&lt;p&gt;Last week we talked about FastAPI. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-9-fastapi-1an2"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I decided to write about the audio transcription changes this week, as I finally got some code in place to give users an alternative method. Previously, audio-to-text was handled by a service called AssemblyAI; however, transcribing 15-20 hours of audio was costing ~$8-15 per month. This code gives users the option to do it for free, though it takes much longer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speech Recognition
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pypi.org/project/SpeechRecognition/"&gt;Speech Recognition&lt;/a&gt; is a Python library for performing speech recognition via multiple APIs. It has support for both online and offline APIs, which makes it pretty powerful. For our use-case, I utilized the OpenAI Whisper method.&lt;/p&gt;

&lt;p&gt;Here's the full code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def transcribe_audio_free(file_object):
    # get extension
    filename, file_extension = os.path.splitext(file_object.name)

    # create temp file
    with NamedTemporaryFile(suffix=file_extension,delete=False) as temp:
        temp.write(file_object.getvalue())
        temp.seek(0)

        # split file into chunks
        audio = AudioSegment.from_file(temp.name)
        audio_chunks = split_on_silence(audio,
            # experiment with this value for your target audio file
            min_silence_len=3000,
            # adjust this per requirement
            silence_thresh=audio.dBFS-30,
            # keep 100 ms of silence at each chunk boundary, adjustable as well
            keep_silence=100,
        )

        # create a directory to store the audio chunks
        folder_name = "audio-chunks"
        if not os.path.isdir(folder_name):
            os.mkdir(folder_name)
        whole_text = ""

        # process each chunk 
        for i, audio_chunk in enumerate(audio_chunks, start=1):
            # export audio chunk and save it in the `folder_name` directory.
            chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
            audio_chunk.export(chunk_filename, format="wav")
            # recognize the chunk
            try:
                # audio to text
                r = sr.Recognizer()
                uploaded_chunk = sr.AudioFile(chunk_filename)
                with uploaded_chunk as source:
                    chunk_audio = r.record(source)
                text = r.recognize_whisper(chunk_audio,"medium")
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text

        # close temp file
        temp.close()
        os.unlink(temp.name)

    # clean up the audio-chunks folders
    shutil.rmtree(folder_name)

    # return the text for all chunks detected
    return whole_text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's really not much here, so I'll quickly step through the process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creates a temporary file&lt;/li&gt;
&lt;li&gt;Loads temporary file into PyDub and splits it into smaller files&lt;/li&gt;
&lt;li&gt;Creates a directory to store the smaller files&lt;/li&gt;
&lt;li&gt;Iterates through the smaller files
a. Places the file into the directory
b. Performs speech-to-text via Whisper
c. Adds transcribed text to "whole_text" variable&lt;/li&gt;
&lt;li&gt;Closes the temporary file&lt;/li&gt;
&lt;li&gt;Removes the directory&lt;/li&gt;
&lt;li&gt;Returns "whole_text"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You may have to change the values inside of &lt;code&gt;audio_chunks = split_on_silence()&lt;/code&gt; to better work with your file. 3000, -30, 100 was the sweet spot during testing for me.&lt;br&gt;
You may have to use a different model for Whisper. You can change "medium" to a model that better fits your use-case here: &lt;code&gt;text = r.recognize_whisper(chunk_audio,"medium")&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;Please note that the paid method takes significantly less time and, in my opinion, is generally worth using. I may work on writing in a progress bar for the free method at some point. Regardless, both methods will be available for use.&lt;/p&gt;
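&lt;p&gt;If I do add a progress bar, it could be as simple as updating it once per chunk. A rough sketch (the chunk_progress helper is hypothetical; the commented lines show the assumed Streamlit calls):&lt;/p&gt;

```python
# Hypothetical sketch of per-chunk progress reporting for the free method.
def chunk_progress(chunks_done, total_chunks):
    # returns a fraction between 0 and 1, suitable for st.progress()
    if total_chunks == 0:
        return 0.0
    return chunks_done / total_chunks

# Assumed usage inside transcribe_audio_free:
# bar = st.progress(0.0)
# for i, audio_chunk in enumerate(audio_chunks, start=1):
#     ...transcribe the chunk...
#     bar.progress(chunk_progress(i, len(audio_chunks)))
```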

&lt;p&gt;Next week, I will begin showing off Veverbot and the mechanisms in place to get him to work. I promise.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>python</category>
      <category>elasticsearch</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 9 - FastAPI</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Fri, 20 Oct 2023 17:07:54 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-9-fastapi-1an2</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-9-fastapi-1an2</guid>
      <description>&lt;p&gt;Last week we talked about the changes to the Streamlit application. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-8-streamlit-changes-b7g"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  FastAPI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://fastapi.tiangolo.com/"&gt;FastAPI&lt;/a&gt; is a Python library used for creating, you guessed it, APIs. As the name implies, it's quick and completely custom, which is powerful.&lt;/p&gt;

&lt;p&gt;Currently, I have a few endpoints built, two of which help with the functionality of Veverbot. Here's the full API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Elastic D&amp;amp;D
# Author: thtmexicnkid
# Last Updated: 10/04/2023
# 
# FastAPI app that facilitates Virtual DM processes and whatever else I think of.

import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message":"Hello World"}

@app.get("/get_vector_object/{text}")
async def get_vector_object(text):
    import openai

    openai.api_key = "API_KEY"
    embedding_model = "text-embedding-ada-002"
    openai_embedding = openai.Embedding.create(input=text, model=embedding_model)

    return openai_embedding["data"][0]["embedding"]

@app.get("/get_question_answer/{question}/{query_results}")
async def get_question_answer(question,query_results):
    import openai

    summary = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Answer the following question: "
            + question
            + " by using the following text: "
            + query_results},
        ]
    )

    answers = []
    for choice in summary.choices:
        answers.append(choice.message.content)

    return answers

if __name__ == '__main__':
    uvicorn.run("main:app", port=8000, host='0.0.0.0', reload=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  API Endpoints
&lt;/h3&gt;

&lt;p&gt;You define custom API endpoints with &lt;code&gt;@app.get()&lt;/code&gt;. The great thing about FastAPI is that it can handle variable input, which is done by including &lt;code&gt;{variable_name}&lt;/code&gt; in the endpoint path. Multiple variable input is supported as well!&lt;/p&gt;

&lt;h4&gt;
  
  
  Root
&lt;/h4&gt;

&lt;p&gt;The root endpoint is simply here to allow us to test if we can access the API from remote locations. If you see "Hello World", then you're good to go!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.get("/")
async def root():
    return {"message":"Hello World"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Get Vector Object
&lt;/h4&gt;

&lt;p&gt;This endpoint does exactly what the name says: gets a vector object of the variable text input. We then use this vector object in KNN queries to assist Veverbot in returning helpful results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.get("/get_vector_object/{text}")
async def get_vector_object(text):
    import openai

    openai.api_key = "API_KEY"
    embedding_model = "text-embedding-ada-002"
    openai_embedding = openai.Embedding.create(input=text, model=embedding_model)

    return openai_embedding["data"][0]["embedding"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Get Question Answer
&lt;/h4&gt;

&lt;p&gt;Again, this endpoint does exactly what the name says: it returns an answer to a question asked of Veverbot. There are two variables here: the question itself, and the KNN query results for that question. Both are sent to OpenAI, and the answer that comes back is used for Veverbot's response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.get("/get_question_answer/{question}/{query_results}")
async def get_question_answer(question,query_results):
    import openai

    summary = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Answer the following question: "
            + question
            + " by using the following text: "
            + query_results},
        ]
    )

    answers = []
    for choice in summary.choices:
        answers.append(choice.message.content)

    return answers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;This is a work-in-progress. I have plans to add more endpoints, mainly moving some Python functions over here, since that would keep some of the larger ones together. Once I get audio transcription swapped from AssemblyAI to something free, I will probably move that to the API as well.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>python</category>
      <category>elasticsearch</category>
      <category>ai</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 8 - Streamlit Changes</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Fri, 13 Oct 2023 18:45:37 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-8-streamlit-changes-b7g</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-8-streamlit-changes-b7g</guid>
      <description>&lt;p&gt;Last week we talked about setting up port forwarding in order to access the application remotely. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-7-port-forwarding-50jd"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Streamlit Changes
&lt;/h2&gt;

&lt;p&gt;As I began working on the AI assistant, I quickly ran into an issue: Streamlit's chat widgets could not be added to a sidebar, tab, or other existing container. This meant I either had to include the chat on the first "page" of my current code (impossible, since that page was the logon page) or implement a true page system. Luckily, Streamlit supports exactly this per their &lt;a href="https://docs.streamlit.io/library/get-started/multipage-apps/create-a-multipage-app" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Because the code is mostly the same, I won't go in-depth on most of it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structure
&lt;/h3&gt;

&lt;p&gt;The Streamlit application now consists of 6 parts: variables, functions, a home page, a note input page, an AI assistant page, and an account page. The home page is in the same directory that "main.py" was in, along with the functions and variables, and the pages are in a new directory named "pages".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kmq3b1wbk35kftkldla.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kmq3b1wbk35kftkldla.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45ia9ndlhxeqnrpl36hw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45ia9ndlhxeqnrpl36hw.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Functions and Variables
&lt;/h4&gt;

&lt;p&gt;Splitting the variables and functions into their own files was mainly for cleanliness: it keeps everything separate and out of the page files.&lt;/p&gt;

&lt;p&gt;Loading them into other scripts with wildcard imports the way I did is generally frowned upon, but it worked out great for me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;functions.py&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Elastic D&amp;amp;D
# Author: thtmexicnkid
# Last Updated: 10/04/2023
# 
# Streamlit - Backend - Houses all functions used in pages of the application.

import json
import requests
import streamlit as st
import streamlit_authenticator as stauth
import time
import yaml
from elasticsearch import Elasticsearch
from PIL import Image
from variables import *
from yaml.loader import SafeLoader

### FUNCTIONS ###
def api_get_question_answer(question,query_results):
    # returns an answer to a question asked to virtual DM

    fastapi_endpoint = "/get_question_answer/"
    full_url = fastapi_url + fastapi_endpoint + question + "/" + query_results
    response = requests.get(full_url)

    try:
        answer = response.json()
    except:
        answer = None
        print(response.content)

    return answer

def api_get_vector_object(text):
    # returns vector object from supplied text

    fastapi_endpoint = "/get_vector_object/"
    full_url = fastapi_url + fastapi_endpoint + text
    response = requests.get(full_url)

    try:
        message_vector = response.json()
    except:
        message_vector = None
        print(response.content)

    return message_vector

def text_cleanup(text):
    punctuation = ["/", "?"]
    for symbol in punctuation:
        text = text.replace(symbol," ")

    return text

def clear_session_state(variable_list):
    # deletes variables from streamlit session state
    for variable in variable_list:
        try:
            del st.session_state[variable]
        except:
            pass

def display_image(image_path):
    # displays an image via path relative to streamlit app script
    image = Image.open(image_path)
    st.image(image)

def elastic_ai_notes_query(vector_object):
    # queries Elastic via a KNN query to return answers to questions via virtual DM

    # creates Elastic connection
    client = Elasticsearch(
        elastic_url,
        ca_certs=elastic_ca_certs,
        api_key=elastic_api_key
    )

    # runs a kNN search against the note indices
    response = client.search(index="dnd-notes-*",knn={"field":"message_vector","query_vector":vector_object,"k":10,"num_candidates":100})

    # close Elastic connection
    client.close()

    return response['hits']['hits'][0]['_source']["message"]

def elastic_get_quests():
    # queries Elastic for unfinished quests and returns array    
    quest_names = []

    # creates Elastic connection
    client = Elasticsearch(
        elastic_url,
        ca_certs=elastic_ca_certs,
        api_key=elastic_api_key
    )

    # gets unfinished quests
    response = client.search(index=st.session_state.log_index,size=0,query={"bool":{"must":[{"match":{"type.keyword":"quest"}}],"must_not":[{"match":{"finished":"true"}}]}},aggregations={"unfinished_quests":{"terms":{"field":"name.keyword"}}})

    for line in response["aggregations"]["unfinished_quests"]["buckets"]:
        quest_names.append(line["key"])

    # close Elastic connection
    client.close()

    return quest_names

def elastic_index_document(index,document,status_message):
    # sends a document to an Elastic index

    # creates Elastic connection
    client = Elasticsearch(
        elastic_url,
        ca_certs=elastic_ca_certs,
        api_key=elastic_api_key
    )

    # sends document to index with success or failure message
    response = client.index(index=index,document=document)

    if status_message == True:
        if response["result"] == "created":
            success_message("Note creation successful")
        else:
            error_message("Note creation failure",2)
    else:
        pass

    # close Elastic connection
    client.close()

def elastic_kibana_setup(yml_config):
    # creates empty placeholder indices and data views for each player, as well as for transcribed notes

    # builds list of index patterns and descriptive data view names from YAML configuration
    kibana_setup = {"dnd-notes-*":"All Notes","dnd-notes-transcribed":"Audio Transcription Notes","virtual_dm-questions_answers":"Virtual DM Notes"}
    for username in yml_config["credentials"]["usernames"]:
        index = "dnd-notes-" + username
        name = yml_config["credentials"]["usernames"][username]["name"] + "'s Notes"
        kibana_setup[index] = name

    # creates indices and data views from usernames
    for entry in kibana_setup:
        index = entry
        name = kibana_setup[entry]

        # creates Elastic connection
        client = Elasticsearch(
            elastic_url,
            ca_certs=elastic_ca_certs,
            api_key=elastic_api_key
        )

        # creates index if it does not already exist
        response = client.indices.exists(index=index)
        if response != True:
            try:
                client.indices.create(index=index)
            except:
                pass

        # close Elastic connection
        client.close()

        # check if data view already exists
        url = kibana_url + "/api/data_views/data_view/" + index
        auth = "ApiKey " + elastic_api_key
        headers = {"kbn-xsrf":"true","Authorization":auth}
        response = requests.get(url,headers=headers)
        # if data view doesn't exist, create it
        if response.status_code != 200:
            url = kibana_url + "/api/data_views/data_view"
            json = {"data_view":{"title":index,"name":name,"id":index,"timeFieldName":"@timestamp"}}
            response = requests.post(url,headers=headers,json=json)
            # could put some error message here, don't think I need to yet

def elastic_update_quest_status(quest_name):
    # updates the finished status of matching quest documents

    # creates Elastic connection
    client = Elasticsearch(
        elastic_url,
        ca_certs=elastic_ca_certs,
        api_key=elastic_api_key
    )

    # gets unfinished quests
    query_response = client.search(index=st.session_state.log_index,size=10000,query={"bool":{"must":[{"match":{"name.keyword":quest_name}}],"must_not":[{"match":{"finished":"true"}}]}})

    for line in query_response["hits"]["hits"]:
        line_id = line["_id"]
        update_response = client.update(index="dnd-notes-corver_flickerspring",id=line_id,doc={"finished":st.session_state.quest_finished})

    # close Elastic connection
    client.close()

def error_message(text,timeframe):
    # displays error message    
    error = st.error(text)

    if timeframe == False:
        pass
    else:
        time.sleep(timeframe)
        error.empty()

def initialize_session_state(variable_list):
    # creates empty variables in streamlit session state
    for variable in variable_list:
        if variable not in st.session_state:
            st.session_state[variable] = None

def load_yml():
    # loads login authentication configuration    
    with open(streamlit_project_path + "auth.yml") as file:
        config = yaml.load(file, Loader=SafeLoader)

    authenticator = stauth.Authenticate(
        config['credentials'],
        config['cookie']['name'],
        config['cookie']['key'],
        config['cookie']['expiry_days'],
        config['preauthorized']
    )

    return config, authenticator

def success_message(text):
    # displays success message
    success = st.success(text)
    time.sleep(2)
    success.empty()

def transcribe_audio(file):
    # transcribes an audio file to text

    # get file url
    headers = {'authorization':assemblyai_api_key}
    response = requests.post('https://api.assemblyai.com/v2/upload',headers=headers,data=file)
    url = response.json()["upload_url"]
    # get transcribe id
    endpoint = "https://api.assemblyai.com/v2/transcript"
    json = {"audio_url":url}
    headers = {"authorization":assemblyai_api_key,"content-type":"application/json"}
    response = requests.post(endpoint, json=json, headers=headers)
    transcribe_id = response.json()['id']
    result = {}
    # poll until the transcript is completed (or errors out)
    while result.get("status") not in ("completed", "error"):
        # get text
        endpoint = f"https://api.assemblyai.com/v2/transcript/{transcribe_id}"
        headers = {"authorization":assemblyai_api_key}
        result = requests.get(endpoint, headers=headers).json()

    return result.get("text")

def update_yml():
    # updates login authentication configuration file
    with open(streamlit_project_path + "auth.yml", 'w') as file:
        yaml.dump(config, file, default_flow_style=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;variables.py&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Elastic D&amp;amp;D
# Author: thtmexicnkid
# Last Updated: 10/03/2023
# 
# Streamlit - Backend - Houses variables that are loaded into pages of the application.

### VARIABLES ###
# *** change this to fit your environment ***
assemblyai_api_key = "API_KEY"
elastic_api_key = "API_KEY"

# *** DO NOT CHANGE ***
elastic_url = "https://es01:9200"
elastic_ca_certs = "certs/ca/ca.crt"
fastapi_url = "http://api:8000"
kibana_url = "http://kibana:5601"
streamlit_data_path = "data/"
streamlit_project_path = "streamlit/"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
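&lt;p&gt;As a side note, hard-coding API keys in variables.py means they can end up in version control. A hedged alternative (the environment variable names here are assumptions) is to read them from the environment instead:&lt;/p&gt;

```python
# Sketch: read secrets from environment variables, falling back to the
# existing placeholder value; the variable names here are assumptions.
import os

assemblyai_api_key = os.environ.get("ASSEMBLYAI_API_KEY", "API_KEY")
elastic_api_key = os.environ.get("ELASTIC_API_KEY", "API_KEY")
```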



&lt;h4&gt;
  
  
  The Home Page
&lt;/h4&gt;

&lt;p&gt;The home page consists mostly of the old logon page code. When logged in, however, it now displays a welcome message and instructions on how to use the application!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Elastic D&amp;amp;D
# Author: thtmexicnkid
# Last Updated: 10/04/2023
# 
# Streamlit - Main Page - Displays a welcome message and explains how to navigate and use the application.

import streamlit as st
from functions import *
from variables import *

# displays application title
display_image(streamlit_data_path + "banner.png")

# initializes session state, loads login authentication configuration, and performs index/data view setup in Elastic
initialize_session_state(["username"])
config, authenticator = load_yml()
elastic_kibana_setup(config)

# makes user log on to view page
if not st.session_state.username:
    # displays login and registration widgets
    tab1, tab2 = st.tabs(["Login", "Register"])
    # login tab
    with tab1:
        try:
            name,authentication_status,username = authenticator.login("Login","main")
            if authentication_status:
                st.rerun()
            elif authentication_status == False:
                error_message('Username/password is incorrect',2)
            elif authentication_status == None:
                st.warning('Please enter your username and password')
        except:
            pass
    # registration tab
    with tab2:
        try:
            if authenticator.register_user('Register', preauthorization=True):
                success_message('User registered successfully')
                update_yml()
        except Exception as e:
            error_message(e,False)
else:
    st.header('Welcome!',divider=True)
    welcome_message = '''
    ## Elastic D&amp;amp;D is an ongoing project to facilitate note-taking and other functions derived from elements of D&amp;amp;D (Veverbot the AI assistant, roll data, etc.)

    ### You can navigate between pages of the application with the sidebar on the left:
    ##### The Home page is where you can go to refresh your memory on how to use the Elastic D&amp;amp;D application.
    ##### The Note Input page is used for storing notes for viewing and use with Virtual DM functions. Currently, you can input notes via an audio file or text.
    ##### The Veverbot page is an active chat session with your own personal AI assistant! Ask Veverbot questions about your campaign and it will give you answers, hopefully.
    ##### The Account page is used for changing your password and logging off.

    ### Stay up-to-date with the progress of this project on the [Github](https://github.com/thtmexicnkid/elastic-dnd) and the [blog](https://dev.to/thtmexicnkid)!

    ## **Thanks for using Elastic D&amp;amp;D!**
    '''
    st.markdown(welcome_message)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Note Input Page
&lt;/h4&gt;

&lt;p&gt;The note input page consists mostly of the old note input tab code. If you try to access this page while not logged in, it displays an unauthorized message.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Elastic D&amp;amp;D
# Author: thtmexicnkid
# Last Updated: 10/04/2023
# 
# Streamlit - Note Input Page - Allows the user to store audio or text notes in Elasticsearch.

import streamlit as st
from functions import *
from variables import *

# displays application title and sets page accordingly
display_image(streamlit_data_path + "banner.png")

# initializes session state, loads login authentication configuration, and performs index/data view setup in Elastic
initialize_session_state(["username"])
config, authenticator = load_yml()

# makes user log on to view page
if not st.session_state.username:
    error_message("UNAUTHORIZED: Please login on the Home page.",False)
else:
    st.header('Note Input',divider=True)
    st.session_state["log_index"] = "dnd-notes-" + st.session_state.username
    st.session_state["note_type"] = st.selectbox("Audio or Text?", ["Audio","Text"], index=0)
    # runs app_page2_* functions depending on what is selected in selectbox
    if st.session_state.note_type == "Audio":
        #list of variables to clear from session state once finished
        audio_form_variable_list = ["log_type","log_session","file","submitted","transcribed_text","log_payload","message_vector"]

        # displays note form widgets, creates note payload, sends payload to an Elastic index, and handles error / success / warning messages
        with st.form("audio_form", clear_on_submit=True):
            st.session_state["log_type"] = "audio"
            st.session_state["log_session"] = st.slider("Which session is this?", 0, 250)
            st.session_state["file"] = st.file_uploader("Choose audio file",type=[".3ga",".8svx",".aac",".ac3",".aif",".aiff",".alac",".amr",".ape",".au",".dss",".flac",".flv",".m2ts",".m4a",".m4b",".m4p",".m4r",".m4v",".mogg",".mov",".mp2",".mp3",".mp4",".mpga",".mts",".mxf",".oga",".ogg",".opus",".qcp",".ts",".tta",".voc",".wav",".webm",".wma",".wv"])
            st.session_state["submitted"] = st.form_submit_button("Upload file")
            if st.session_state.submitted and st.session_state.file is not None:
                # removes forward slash that will break the API call for AI functionality
                st.session_state["transcribed_text"] = text_cleanup(transcribe_audio(st.session_state.file))
                if st.session_state.transcribed_text is not None:
                    # gets vector object for use with AI functionality
                    st.session_state["message_vector"] = api_get_vector_object(st.session_state.transcribed_text)
                    if st.session_state.message_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"session":st.session_state.log_session,"type":st.session_state.log_type,"message":st.session_state.transcribed_text,"message_vector":st.session_state.message_vector})
                        elastic_index_document("dnd-notes-transcribed",st.session_state.log_payload,True)
                else:
                    error_message("Audio transcription failure",2)
            else:
                st.warning('Please upload a file and submit')

        # clears session state
        clear_session_state(audio_form_variable_list)
    elif st.session_state.note_type == "Text":
        #list of variables to clear from session state once finished
        text_form_variable_list = ["log_type","log_session","note_taker","log_index","quest_type","quest_name","quest_finished","log_message","submitted","log_payload","message_vector"]

        # displays note form widgets, creates note payload, sends payload to an Elastic index, and handles error / success / warning messages
        st.session_state["log_type"] = st.selectbox("What kind of note is this?", ["location","miscellaneous","overview","person","quest"])
        # displays note form for quest log type
        if st.session_state.log_type == "quest":
            st.session_state["quest_type"] = st.selectbox("Is this a new or existing quest?", ["New","Existing"])
            if st.session_state.quest_type == "New":
                with st.form("text_form_new_quest", clear_on_submit=True):
                    st.session_state["log_session"] = st.slider("Which session is this?", 0, 250)
                    st.session_state["quest_name"] = st.text_input("What is the name of the quest?")
                    st.session_state["quest_finished"] = st.checkbox("Did you finish the quest?")
                    # removes forward slash that will break the API call for AI functionality
                    st.session_state["log_message"] = text_cleanup(st.text_area("Input note text:"))
                    st.session_state["submitted"] = st.form_submit_button("Upload note")
                    if st.session_state.submitted and st.session_state.log_message is not None:
                        # gets vector object for use with AI functionality
                        st.session_state["message_vector"] = api_get_vector_object(st.session_state.log_message)
                        if st.session_state.message_vector is None:
                            error_message("AI API vectorization failure",2)
                        else:
                            st.session_state["log_payload"] = json.dumps({"finished":st.session_state.quest_finished,"message":st.session_state.log_message,"name":st.session_state.quest_name,"session":st.session_state.log_session,"type":st.session_state.log_type,"message_vector":st.session_state.message_vector})
                            elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
                            st.rerun()
                    else:
                        st.warning('Please input note text and submit')
            else:
                quest_names = elastic_get_quests()
                with st.form("text_form_existing_quest", clear_on_submit=True):
                    st.session_state["log_session"] = st.slider("Which session is this?", 0, 250)
                    st.session_state["quest_name"] = st.selectbox("Which quest are you updating?", quest_names)
                    st.session_state["quest_finished"] = st.checkbox("Did you finish the quest?")
                    st.session_state["log_message"] = text_cleanup(st.text_area("Input note text:"))
                    st.session_state["submitted"] = st.form_submit_button("Upload note")
                    if st.session_state.submitted and st.session_state.log_message is not None:
                        # updates previous quest records to finished: true
                        if st.session_state.quest_finished:
                            elastic_update_quest_status(st.session_state.quest_name)
                        # gets vector object for use with AI functionality
                        st.session_state["message_vector"] = api_get_vector_object(st.session_state.log_message)
                        if st.session_state.message_vector is None:
                            error_message("AI API vectorization failure",2)
                        else:
                            st.session_state["log_payload"] = json.dumps({"finished":st.session_state.quest_finished,"message":st.session_state.log_message,"name":st.session_state.quest_name,"session":st.session_state.log_session,"type":st.session_state.log_type,"message_vector":st.session_state.message_vector})
                            elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
                            st.rerun()
                    else:
                        st.warning('Please input note text and submit')
        # displays note form for all other log types
        else:
            with st.form("text_form_wo_quest", clear_on_submit=True):
                st.session_state["log_session"] = st.number_input("Which session is this?", 0, 250)
                st.session_state["log_message"] = text_cleanup(st.text_area("Input note text:"))
                st.session_state["submitted"] = st.form_submit_button("Upload Note")
                if st.session_state.submitted and st.session_state.log_message is not None:
                    # gets vector object for use with AI functionality
                    st.session_state["message_vector"] = api_get_vector_object(st.session_state.log_message)
                    if st.session_state.message_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"message":st.session_state.log_message,"session":st.session_state.log_session,"type":st.session_state.log_type,"message_vector":st.session_state.message_vector})
                        elastic_index_document(st.session_state.log_index,st.session_state.log_payload,True)
                        st.rerun()
                else:
                    st.warning('Please input note text and submit')

        # clears session state
        clear_session_state(text_form_variable_list)
    else:
        pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The AI Assistant Page
&lt;/h4&gt;

&lt;p&gt;Meet Veverbot, your D&amp;amp;D AI assistant! This page is all brand-new code. I will be getting into it in depth in a couple of weeks, but here is a preview! If you try to access this page while not logged in, it displays an unauthorized message.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Elastic D&amp;amp;D
# Author: thtmexicnkid
# Last Updated: 10/04/2023
# 
# Streamlit - Virtual DM Page - Allows the user to ask questions and receive answers automatically.

import streamlit as st
from functions import *
from variables import *

# displays application title and sets page accordingly
display_image(streamlit_data_path + "banner.png")

# initializes session state, loads login authentication configuration, and performs index/data view setup in Elastic
initialize_session_state(["username"])
config, authenticator = load_yml()

# makes user log on to view page
if not st.session_state.username:
    error_message("UNAUTHORIZED: Please login on the Home page.",False)
else:
    st.header('Veverbot',divider=True)
    st.session_state["log_index"] = "dnd-notes-" + st.session_state.username
    virtual_dm_variable_list = ["question","response","question_vector","query_results","answer","answer_vector","log_payload"]

    # Initialize chat history
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Display chat messages from history on app rerun
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # React to user input
    st.session_state["question"] = st.chat_input("Ask Veverbot a question")
    if st.session_state.question:
        st.session_state["question"] = text_cleanup(st.session_state.question)
        # Display user message in chat message container
        st.chat_message("user").markdown(st.session_state.question)
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": st.session_state.question})
        # Display assistant response in chat message container
        response = f"Veverbot searching for answer to the question -- \"{st.session_state.question}\""
        with st.chat_message("assistant"):
            st.markdown(response)
            st.session_state.messages.append({"role": "assistant", "content": response})
            # gets vector object for use with AI functionality
            st.session_state["question_vector"] = api_get_vector_object(st.session_state.question)
            if st.session_state.question_vector is None:
                error_message("AI API vectorization failure",2)
            else:
                st.session_state["query_results"] = elastic_ai_notes_query(st.session_state.question_vector)
                st.session_state["answers"] = api_get_question_answer(st.session_state.question,st.session_state.query_results)
                for answer in st.session_state.answers:
                    st.markdown(answer)
                    st.session_state.messages.append({"role": "assistant", "content": answer})
                    st.session_state["answer_vector"] = api_get_vector_object(answer)
                    if st.session_state.answer_vector is None:
                        error_message("AI API vectorization failure",2)
                    else:
                        st.session_state["log_payload"] = json.dumps({"question":st.session_state.question,"question_vector":st.session_state.question_vector,"answer":answer,"answer_vector":st.session_state.answer_vector})
                        elastic_index_document("virtual_dm-questions_answers",st.session_state.log_payload,False)

    clear_session_state(virtual_dm_variable_list)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
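&lt;p&gt;The vector search itself happens inside helper functions like elastic_ai_notes_query, which live in functions.py. As a rough sketch of the shape such a query might take (this is my guess at an Elasticsearch 8.x kNN search body; the "message_vector" field name matches the note payloads above, but the k and num_candidates values are illustrative, not the project's actual settings):&lt;/p&gt;

```python
def build_notes_knn_query(question_vector, k=3, num_candidates=50):
    """Build an Elasticsearch 8.x kNN search body over note vectors.

    Assumes notes are indexed with a dense_vector field named
    "message_vector", as in the payloads above. k and num_candidates
    are illustrative defaults, not the project's real settings.
    """
    return {
        "knn": {
            "field": "message_vector",          # dense_vector field to search
            "query_vector": question_vector,    # embedding of the user's question
            "k": k,                             # number of nearest notes to return
            "num_candidates": num_candidates,   # candidates considered per shard
        },
        "_source": ["session", "type", "message"],
    }
```

&lt;p&gt;A body like this would then be passed to the Elasticsearch search API, and the top-scoring note messages handed to the question-answering model as context.&lt;/p&gt;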



&lt;h4&gt;
  
  
  The Account Page
&lt;/h4&gt;

&lt;p&gt;The account page consists mostly of the old account tab code. If you try to access this page while not logged in, it displays an unauthorized message.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Elastic D&amp;amp;D
# Author: thtmexicnkid
# Last Updated: 10/04/2023
# 
# Streamlit - Account Page - Allows the user to change their password and log out.

import streamlit as st
from functions import *
from variables import *

# displays application title and sets page accordingly
display_image(streamlit_data_path + "banner.png")

# initializes session state, loads login authentication configuration, and performs index/data view setup in Elastic
initialize_session_state(["username"])
config, authenticator = load_yml()

# makes user log on to view page
if not st.session_state.username:
    error_message("UNAUTHORIZED: Please login on the Home page.",False)
else:
    st.header('Account',divider=True)
    try:
        if authenticator.reset_password(st.session_state.username, 'Reset password'):
            success_message('Password modified successfully')
            update_yml()
    except Exception as e:
        error_message(e,2)
    authenticator.logout('Logout', 'main')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;I really like the way the Streamlit application turned out. It is organized, neat, and works great with this page structure.&lt;/p&gt;

&lt;p&gt;Next week, I want to get into the code for the API I am actively working on. It mostly handles functions for the AI assistant, so it makes sense to cover that next.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd" rel="noopener noreferrer"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid" rel="noopener noreferrer"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>python</category>
      <category>elasticsearch</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 7 - Port Forwarding</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Sat, 07 Oct 2023 17:17:55 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-7-port-forwarding-50jd</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-7-port-forwarding-50jd</guid>
      <description>&lt;p&gt;Last week we talked about moving to a docker implementation. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-6-docker-implementation-2n71"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Port Forwarding
&lt;/h2&gt;

&lt;p&gt;Port forwarding is the process of allowing remote devices to connect to local devices by redirecting network traffic through a router or firewall.&lt;/p&gt;

&lt;p&gt;For Elastic D&amp;amp;D, this process is necessary to expose both Kibana and Streamlit to the internet so that my group members in other countries can still use the application. In my case, all of the configuration is done in my router settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  How-To
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;IMPORTANT&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This may be specific to my router. Your settings may be in a different menu, called something completely different, etc. Please be mindful of that!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Find your default gateway address
Open Command Prompt, run "ipconfig", and grab your "default gateway" address. This is the address you will use to log into your router.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cFC9SA6R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2ioqz2yzj9hm4p4u7wzj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cFC9SA6R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2ioqz2yzj9hm4p4u7wzj.png" alt="Image description" width="637" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open a web browser and navigate to your default gateway address
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Zm1oaLLZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sfn873ae38sdv98sauo5.png" alt="Image description" width="435" height="77"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You may need a password to access your router settings or certain menus. You can usually find this on the back of your router.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Navigate to port forwarding settings&lt;br&gt;
Per my router, port forwarding settings are under Firewall -&amp;gt; NAT/Gaming.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ecZ48hHv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/65dlq1bq72rf017fw7g3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ecZ48hHv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/65dlq1bq72rf017fw7g3.png" alt="Image description" width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure port forwarding for both Kibana and Streamlit&lt;br&gt;
Per my router, I set up a service entry for both Kibana and Streamlit...&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rVPICjTP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yj8kg3gxa7btpaf8znyq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rVPICjTP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yj8kg3gxa7btpaf8znyq.png" alt="Image description" width="475" height="268"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IY8I_vqy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rdpsuivf6hmb7mbjcwyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IY8I_vqy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rdpsuivf6hmb7mbjcwyh.png" alt="Image description" width="468" height="266"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...and then pointed the service applications to my device.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Mcn-iXdY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a5pgoy7ohdiuty4w4tr8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Mcn-iXdY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a5pgoy7ohdiuty4w4tr8.png" alt="Image description" width="620" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access the applications via your public IP address
I found my public IP address via &lt;a href="https://www.whatismyip.com/"&gt;https://www.whatismyip.com/&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kibana can now be accessed by remote machines at &lt;a href="http://PUBLIC_IP:5601"&gt;http://PUBLIC_IP:5601&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Streamlit can now be accessed by remote machines at &lt;a href="http://PUBLIC_IP:8501"&gt;http://PUBLIC_IP:8501&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is only for remote access. If you are trying to access the application from the local network, you need to use "localhost" instead of the public IP. I learned this the hard way...for almost 3 weeks.&lt;/p&gt;
&lt;/blockquote&gt;
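&lt;p&gt;A quick way to sanity-check the forwarding from outside (or to see the localhost quirk above for yourself) is a plain TCP connection test. Here is a small sketch; the IP below is a placeholder from the reserved documentation range, so substitute your own public IP:&lt;/p&gt;

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder public IP (203.0.113.0/24 is reserved for documentation):
# print(port_open("203.0.113.10", 5601))  # Kibana reachable remotely?
# print(port_open("203.0.113.10", 8501))  # Streamlit reachable remotely?
```

&lt;p&gt;Run it from a machine outside your network; from inside, test against "localhost" instead, for the reason in the note above.&lt;/p&gt;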

&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;This part took a very long time because, as I mentioned above, I was testing from my local network instead of remotely. Definitely a lesson learned.&lt;/p&gt;

&lt;p&gt;I also got the chance to rewrite the entire Streamlit app to utilize pages! This allowed me to proceed with Veverbot, your very own D&amp;amp;D AI assistant! Lots of cool stuff to talk about in the coming weeks.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>network</category>
      <category>networking</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 6 - Docker Implementation</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Fri, 29 Sep 2023 15:27:41 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-6-docker-implementation-2n71</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-6-docker-implementation-2n71</guid>
      <description>&lt;p&gt;Last week we talked about the audio note input tab. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-5-audio-note-input-3bpj"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;After finishing the first iteration of the Streamlit application, I started thinking about how to make this project accessible to a wider group of people. In its state at the time, you had to know how to configure Elasticsearch and Kibana, have an environment to run them effectively, and also run the Python Streamlit application yourself. I had heard about Docker but had never used it before, so I decided to give it a try.&lt;/p&gt;

&lt;h2&gt;
  
  
  Docker
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.docker.com/"&gt;Docker&lt;/a&gt; is a product that serves virtualization in containers on a host machine.&lt;/p&gt;

&lt;p&gt;I use &lt;a href="https://docs.docker.com/compose/"&gt;Docker Compose&lt;/a&gt; to perform all of the setup for me, creating the necessary volumes, networks, and containers. The full Docker Compose file is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3.8"

volumes:
    certs:
        driver: local
    esdata01:
        driver: local
    kibanadata:
        driver: local
    streamlitdata:
        driver: local

networks:
    default:
        name: elastic-dnd-internal
        external: false

services:
    setup:
        image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
        volumes:
            - certs:/usr/share/elasticsearch/config/certs
        user: "0"
        command: &amp;gt;
            bash -c '
                if [ x${ELASTIC_PASSWORD} == x ]; then
                    echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
                    exit 1;
                elif [ x${KIBANA_PASSWORD} == x ]; then
                    echo "Set the KIBANA_PASSWORD environment variable in the .env file";
                    exit 1;
                fi;
                if [ ! -f config/certs/ca.zip ]; then
                    echo "Creating CA";
                    bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
                    unzip config/certs/ca.zip -d config/certs;
                fi;
                if [ ! -f config/certs/certs.zip ]; then
                    echo "Creating certs";
                    echo -ne \
                    "instances:\n"\
                    "  - name: es01\n"\
                    "    dns:\n"\
                    "      - es01\n"\
                    "      - localhost\n"\
                    "    ip:\n"\
                    "      - 127.0.0.1\n"\
                    "  - name: kibana\n"\
                    "    dns:\n"\
                    "      - kibana\n"\
                    "      - localhost\n"\
                    "    ip:\n"\
                    "      - 127.0.0.1\n"\
                    &amp;gt; config/certs/instances.yml;
                    bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
                    unzip config/certs/certs.zip -d config/certs;
                fi;
                echo "Setting file permissions"
                chown -R root:root config/certs;
                find . -type d -exec chmod 750 \{\} \;;
                find . -type f -exec chmod 640 \{\} \;;
                echo "Waiting for Elasticsearch availability";
                until curl -s --cacert config/certs/ca/ca.crt https://es01:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
                echo "Setting kibana_system password";
                until curl -s -X POST --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
                curl -s -X PUT --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_ingest/pipeline/add_timestamp -d "{\"description\":\"Pipeline to automatically add @timestamp to incoming logs.\",\"processors\":[{\"set\":{\"field\":\"@timestamp\",\"value\":\"{{_ingest.timestamp}}\",\"ignore_empty_value\":true,\"ignore_failure\":true}}]}"
                curl -s -X PUT --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_ingest/pipeline/dnd-notes -d "{\"description\":\"Pipeline to manipulate dnd notes logs.\",\"processors\":[{\"pipeline\":{\"name\":\"add_timestamp\"}}]}"
                curl -s -X PUT --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_component_template/dnd-notes -d "{\"template\":{\"mappings\":{\"dynamic\":\"true\",\"dynamic_date_formats\":[\"strict_date_optional_time\",\"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z\"],\"dynamic_templates\":[],\"date_detection\":true,\"numeric_detection\":false,\"properties\":{\"@timestamp\":{\"type\":\"date\",\"format\":\"strict_date_optional_time\"},\"finished\":{\"type\":\"boolean\"},\"message\":{\"type\":\"text\",\"fields\":{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":256}}},\"name\":{\"type\":\"text\",\"fields\":{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":256}}},\"session\":{\"type\":\"long\"},\"type\":{\"type\":\"text\",\"fields\":{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":256}}}}}}}"
                curl -s -X PUT --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_index_template/dnd-notes -d "{\"index_patterns\":[\"dnd-notes-*\"],\"template\":{\"settings\":{\"index\":{\"number_of_shards\":\"1\",\"number_of_replicas\":\"0\",\"default_pipeline\":\"dnd-notes\"}},\"mappings\":{\"_routing\":{\"required\":false},\"numeric_detection\":false,\"dynamic_date_formats\":[\"strict_date_optional_time\",\"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z\"],\"dynamic\":true,\"_source\":{\"excludes\":[],\"includes\":[],\"enabled\":true},\"dynamic_templates\":[],\"date_detection\":true}},\"composed_of\":[\"dnd-notes\"]}"
                echo "All done!";
            '
        healthcheck:
            test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"]
            interval: 1s
            timeout: 5s
            retries: 120
    es01:
        depends_on:
            setup:
                condition: service_healthy
        image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
        labels:
            co.elastic.logs/module: elasticsearch
        volumes:
            - certs:/usr/share/elasticsearch/config/certs
            - esdata01:/usr/share/elasticsearch/data
        ports:
            - ${ES_PORT}:9200
        environment:
            - node.name=es01
            - cluster.name=${CLUSTER_NAME}
            - discovery.type=single-node
            - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
            - bootstrap.memory_lock=true
            - xpack.security.enabled=true
            - xpack.security.http.ssl.enabled=true
            - xpack.security.http.ssl.key=certs/es01/es01.key
            - xpack.security.http.ssl.certificate=certs/es01/es01.crt
            - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
            - xpack.security.transport.ssl.enabled=true
            - xpack.security.transport.ssl.key=certs/es01/es01.key
            - xpack.security.transport.ssl.certificate=certs/es01/es01.crt
            - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
            - xpack.security.transport.ssl.verification_mode=certificate
            - xpack.license.self_generated.type=${LICENSE}
        mem_limit: ${ES_MEM_LIMIT}
        ulimits:
            memlock:
                soft: -1
                hard: -1
        healthcheck:
            test:
                [
                "CMD-SHELL",
                "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
                ]
            interval: 10s
            timeout: 10s
            retries: 120
    api:
        depends_on:
            es01:
                condition: service_healthy
        build:
            dockerfile: .\dockerfile-api
            context: .\
        ports:
            - ${API_PORT}:8000
        volumes:
            - '.\data:/usr/src/app/data:delegated'
            - '.\project\api:/usr/src/app/api:delegated'
    kibana:
        depends_on:
            es01:
                condition: service_healthy
        image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
        labels:
            co.elastic.logs/module: kibana
        volumes:
            - certs:/usr/share/kibana/config/certs
            - kibanadata:/usr/share/kibana/data
        ports:
            - ${KIBANA_PORT}:5601
        environment:
            - SERVERNAME=kibana
            - ELASTICSEARCH_HOSTS=https://es01:9200
            - ELASTICSEARCH_USERNAME=kibana_system
            - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
            - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
            - XPACK_SECURITY_ENCRYPTIONKEY=${ENCRYPTION_KEY}
            - XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=${ENCRYPTION_KEY}
            - XPACK_REPORTING_ENCRYPTIONKEY=${ENCRYPTION_KEY}
        mem_limit: ${KB_MEM_LIMIT}
        healthcheck:
            test:
                [
                "CMD-SHELL",
                "curl -s -I http://localhost:5601 | grep -q 'HTTP/1.1 302 Found'",
                ]
            interval: 10s
            timeout: 10s
            retries: 120
    streamlit:
        depends_on:
            kibana:
                condition: service_healthy
        build:
            dockerfile: .\dockerfile-streamlit
            context: .\
        ports:
            - ${STREAMLIT_PORT}:8501
        volumes:
            - certs:/usr/src/app/certs
            - '.\data:/usr/src/app/data:delegated'
            - '.\project\streamlit:/usr/src/app/streamlit:delegated'
            - '.\.streamlit:/usr/src/app/.streamlit:delegated'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first few lines take care of setting up "volumes" (named data stores that persist information for the containers), "networks" (internal or external networks the containers can attach to), and "services" (the containers themselves).&lt;/p&gt;
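
&lt;p&gt;As a rough sketch, the top-level layout looks like this -- the volume and service names come from this project's file, and each service body is elided here since the full definitions are shown above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;volumes:
    certs:
    esdata01:
    kibanadata:
services:
    setup:
        # ...
    es01:
        # ...
    api:
        # ...
    kibana:
        # ...
    streamlit:
        # ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;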

&lt;p&gt;As you can see, my current Docker implementation consists of 3 Elastic containers (a setup container, an Elasticsearch container, and a Kibana container) and 2 Python containers (a Streamlit container and a FastAPI container).&lt;/p&gt;

&lt;h3&gt;
  
  
  Elastic Containers
&lt;/h3&gt;

&lt;p&gt;Funnily enough, the Elastic containers were quite easy to set up because of a great article by my contact for this project -- the man himself: Eddie. Check it out &lt;a href="https://www.elastic.co/blog/getting-started-with-the-elastic-stack-and-docker-compose"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;For the most part, I followed this guide and added additional pieces to automate some settings, templates, etc. associated with this project.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The .env file in the project directory is very important here. It defines passwords, port numbers, names, etc. for use with the variables inside of the Docker Compose file. &lt;strong&gt;Be sure to set these variables before trying to set this up!&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
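
&lt;p&gt;For reference, these are the variables the compose file above expects. The values below are placeholders to show the shape of the file, not working defaults -- substitute your own versions, passwords, keys, and limits:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;STACK_VERSION=8.7.1
CLUSTER_NAME=docker-cluster
LICENSE=basic
ELASTIC_PASSWORD=changeme
KIBANA_PASSWORD=changeme
ENCRYPTION_KEY=some-long-random-string
ES_PORT=9200
KIBANA_PORT=5601
API_PORT=8000
STREAMLIT_PORT=8501
ES_MEM_LIMIT=1073741824
KB_MEM_LIMIT=1073741824
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;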

&lt;h4&gt;
  
  
  Setup Container
&lt;/h4&gt;

&lt;p&gt;The setup container sets passwords, creates certificates, and installs the Elastic D&amp;amp;D backend pipelines and templates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;setup:
        image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
        volumes:
            - certs:/usr/share/elasticsearch/config/certs
        user: "0"
        command: &amp;gt;
            bash -c '
                if [ x${ELASTIC_PASSWORD} == x ]; then
                    echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
                    exit 1;
                elif [ x${KIBANA_PASSWORD} == x ]; then
                    echo "Set the KIBANA_PASSWORD environment variable in the .env file";
                    exit 1;
                fi;
                if [ ! -f config/certs/ca.zip ]; then
                    echo "Creating CA";
                    bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
                    unzip config/certs/ca.zip -d config/certs;
                fi;
                if [ ! -f config/certs/certs.zip ]; then
                    echo "Creating certs";
                    echo -ne \
                    "instances:\n"\
                    "  - name: es01\n"\
                    "    dns:\n"\
                    "      - es01\n"\
                    "      - localhost\n"\
                    "    ip:\n"\
                    "      - 127.0.0.1\n"\
                    "  - name: kibana\n"\
                    "    dns:\n"\
                    "      - kibana\n"\
                    "      - localhost\n"\
                    "    ip:\n"\
                    "      - 127.0.0.1\n"\
                    &amp;gt; config/certs/instances.yml;
                    bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
                    unzip config/certs/certs.zip -d config/certs;
                fi;
                echo "Setting file permissions"
                chown -R root:root config/certs;
                find . -type d -exec chmod 750 \{\} \;;
                find . -type f -exec chmod 640 \{\} \;;
                echo "Waiting for Elasticsearch availability";
                until curl -s --cacert config/certs/ca/ca.crt https://es01:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
                echo "Setting kibana_system password";
                until curl -s -X POST --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
                curl -s -X PUT --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_ingest/pipeline/add_timestamp -d "{\"description\":\"Pipeline to automatically add @timestamp to incoming logs.\",\"processors\":[{\"set\":{\"field\":\"@timestamp\",\"value\":\"{{_ingest.timestamp}}\",\"ignore_empty_value\":true,\"ignore_failure\":true}}]}"
                curl -s -X PUT --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_ingest/pipeline/dnd-notes -d "{\"description\":\"Pipeline to manipulate dnd notes logs.\",\"processors\":[{\"pipeline\":{\"name\":\"add_timestamp\"}}]}"
                curl -s -X PUT --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_component_template/dnd-notes -d "{\"template\":{\"mappings\":{\"dynamic\":\"true\",\"dynamic_date_formats\":[\"strict_date_optional_time\",\"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z\"],\"dynamic_templates\":[],\"date_detection\":true,\"numeric_detection\":false,\"properties\":{\"@timestamp\":{\"type\":\"date\",\"format\":\"strict_date_optional_time\"},\"finished\":{\"type\":\"boolean\"},\"message\":{\"type\":\"text\",\"fields\":{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":256}}},\"name\":{\"type\":\"text\",\"fields\":{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":256}}},\"session\":{\"type\":\"long\"},\"type\":{\"type\":\"text\",\"fields\":{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":256}}}}}}}"
                curl -s -X PUT --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_index_template/dnd-notes -d "{\"index_patterns\":[\"dnd-notes-*\"],\"template\":{\"settings\":{\"index\":{\"number_of_shards\":\"1\",\"number_of_replicas\":\"0\",\"default_pipeline\":\"dnd-notes\"}},\"mappings\":{\"_routing\":{\"required\":false},\"numeric_detection\":false,\"dynamic_date_formats\":[\"strict_date_optional_time\",\"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z\"],\"dynamic\":true,\"_source\":{\"excludes\":[],\"includes\":[],\"enabled\":true},\"dynamic_templates\":[],\"date_detection\":true}},\"composed_of\":[\"dnd-notes\"]}"
                echo "All done!";
            '
        healthcheck:
            test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"]
            interval: 1s
            timeout: 5s
            retries: 120
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
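
&lt;p&gt;Those curl one-liners are dense, so here is an illustrative Python version of the request body for the add_timestamp pipeline. This is only a readability aid -- the setup container itself uses curl exactly as shown above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

# the same ingest pipeline definition the setup container PUTs to
# https://es01:9200/_ingest/pipeline/add_timestamp
add_timestamp = {
    "description": "Pipeline to automatically add @timestamp to incoming logs.",
    "processors": [
        {
            "set": {
                "field": "@timestamp",
                "value": "{{_ingest.timestamp}}",
                "ignore_empty_value": True,
                "ignore_failure": True,
            }
        }
    ],
}

body = json.dumps(add_timestamp)
print(body)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;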



&lt;h4&gt;
  
  
  Elasticsearch Container
&lt;/h4&gt;

&lt;p&gt;The Elasticsearch container creates a single Elasticsearch node that stores data and connects with Kibana, and it only starts once the setup container is healthy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;es01:
        depends_on:
            setup:
                condition: service_healthy
        image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
        labels:
            co.elastic.logs/module: elasticsearch
        volumes:
            - certs:/usr/share/elasticsearch/config/certs
            - esdata01:/usr/share/elasticsearch/data
        ports:
            - ${ES_PORT}:9200
        environment:
            - node.name=es01
            - cluster.name=${CLUSTER_NAME}
            - discovery.type=single-node
            - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
            - bootstrap.memory_lock=true
            - xpack.security.enabled=true
            - xpack.security.http.ssl.enabled=true
            - xpack.security.http.ssl.key=certs/es01/es01.key
            - xpack.security.http.ssl.certificate=certs/es01/es01.crt
            - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
            - xpack.security.transport.ssl.enabled=true
            - xpack.security.transport.ssl.key=certs/es01/es01.key
            - xpack.security.transport.ssl.certificate=certs/es01/es01.crt
            - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
            - xpack.security.transport.ssl.verification_mode=certificate
            - xpack.license.self_generated.type=${LICENSE}
        mem_limit: ${ES_MEM_LIMIT}
        ulimits:
            memlock:
                soft: -1
                hard: -1
        healthcheck:
            test:
                [
                "CMD-SHELL",
                "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
                ]
            interval: 10s
            timeout: 10s
            retries: 120
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Kibana Container
&lt;/h4&gt;

&lt;p&gt;The Kibana container creates a Kibana instance that connects to the Elasticsearch container, and allows users to view their notes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kibana:
        depends_on:
            es01:
                condition: service_healthy
        image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
        labels:
            co.elastic.logs/module: kibana
        volumes:
            - certs:/usr/share/kibana/config/certs
            - kibanadata:/usr/share/kibana/data
        ports:
            - ${KIBANA_PORT}:5601
        environment:
            - SERVERNAME=kibana
            - ELASTICSEARCH_HOSTS=https://es01:9200
            - ELASTICSEARCH_USERNAME=kibana_system
            - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
            - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
            - XPACK_SECURITY_ENCRYPTIONKEY=${ENCRYPTION_KEY}
            - XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=${ENCRYPTION_KEY}
            - XPACK_REPORTING_ENCRYPTIONKEY=${ENCRYPTION_KEY}
        mem_limit: ${KB_MEM_LIMIT}
        healthcheck:
            test:
                [
                "CMD-SHELL",
                "curl -s -I http://localhost:5601 | grep -q 'HTTP/1.1 302 Found'",
                ]
            interval: 10s
            timeout: 10s
            retries: 120
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python Containers
&lt;/h3&gt;

&lt;p&gt;Both of these containers are built from dockerfiles, which allows for more control over container creation -- especially since we have Python dependencies to install and programs to run with a command.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The dockerfiles in the project directory are very important here. They handle all of the container setup for the applications.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Streamlit Container
&lt;/h4&gt;

&lt;p&gt;The Streamlit container hosts and runs the application. &lt;em&gt;&lt;strong&gt;I am still working on exposing the application to a public IP address. Getting this piece right will be essential for D&amp;amp;D groups that are not on the same network.&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit:
        depends_on:
            kibana:
                condition: service_healthy
        build:
            dockerfile: .\dockerfile-streamlit
            context: .\
        ports:
            - ${STREAMLIT_PORT}:8501
        volumes:
            - certs:/usr/src/app/certs
            - '.\data:/usr/src/app/data:delegated'
            - '.\project\streamlit:/usr/src/app/streamlit:delegated'
            - '.\.streamlit:/usr/src/app/.streamlit:delegated'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Streamlit Dockerfile
&lt;/h5&gt;

&lt;p&gt;The Streamlit dockerfile handles installation of Python dependencies, creating directories, and running the application command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;###############
# BUILD IMAGE #
###############
FROM python:3.8.2-slim-buster AS build

# set root user
USER root

# virtualenv
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# add and install requirements
RUN pip install --upgrade pip
COPY ./requirements-streamlit.txt .
RUN pip install -r requirements-streamlit.txt

#################
# RUNTIME IMAGE #
#################
FROM python:3.8.2-slim-buster AS runtime

# create app directory
RUN mkdir -p /usr/src/app

# copy from build image
COPY --from=build /opt/venv /opt/venv

# set working directory
WORKDIR /usr/src/app

# send stdout/stderr straight to the console instead of buffering it
ENV PYTHONUNBUFFERED 1
# keep Python from writing .pyc bytecode files
ENV PYTHONDONTWRITEBYTECODE 1

# Path
ENV PATH="/opt/venv/bin:$PATH"

# Run streamlit
CMD streamlit run streamlit/main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  FastAPI Container
&lt;/h4&gt;

&lt;p&gt;The FastAPI container hosts and runs the API code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;api:
        depends_on:
            es01:
                condition: service_healthy
        build:
            dockerfile: .\dockerfile-api
            context: .\
        ports:
            - ${API_PORT}:8000
        volumes:
            - '.\data:/usr/src/app/data:delegated'
            - '.\project\api:/usr/src/app/api:delegated'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  FastAPI Dockerfile
&lt;/h5&gt;

&lt;p&gt;The FastAPI dockerfile handles installation of Python dependencies, creating directories, and running the Python command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;###############
# BUILD IMAGE #
###############
FROM python:3.8.2-slim-buster AS build

# set root user
USER root

# virtualenv
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# add and install requirements
RUN pip install --upgrade pip
COPY ./requirements-api.txt .
RUN pip install -r requirements-api.txt

#################
# RUNTIME IMAGE #
#################
FROM python:3.8.2-slim-buster AS runtime

# create app directory
RUN mkdir -p /usr/src/app

# copy from build image
COPY --from=build /opt/venv /opt/venv

# set working directory
WORKDIR /usr/src/app

# send stdout/stderr straight to the console instead of buffering it
ENV PYTHONUNBUFFERED 1
# keep Python from writing .pyc bytecode files
ENV PYTHONDONTWRITEBYTECODE 1

# Path
ENV PATH="/opt/venv/bin:$PATH"

# Run the API
CMD python3 api/main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;This section is subject to change. More containers may be added (such as NGINX) as needs arise, especially since I am working with network-related tasks.&lt;/p&gt;

&lt;p&gt;Next week, I will &lt;em&gt;hopefully&lt;/em&gt; be covering my progress on exposing Kibana and Streamlit to my public IP, which would allow my entire D&amp;amp;D group to use the project. I am currently having trouble with Streamlit, so it may not go as planned.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>python</category>
      <category>elasticsearch</category>
      <category>docker</category>
    </item>
    <item>
      <title>Elastic D&amp;D - Update 5 - Audio Note Input</title>
      <dc:creator>Joe</dc:creator>
      <pubDate>Fri, 22 Sep 2023 13:29:34 +0000</pubDate>
      <link>https://dev.to/thtmexicnkid/elastic-dd-week-5-audio-note-input-3bpj</link>
      <guid>https://dev.to/thtmexicnkid/elastic-dd-week-5-audio-note-input-3bpj</guid>
      <description>&lt;p&gt;Last week we talked about the text note input tab. If you missed it, you can check that out &lt;a href="https://dev.to/thtmexicnkid/elastic-dd-week-4-text-note-input-3bk7"&gt;here&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Coding the Audio Note Input Tab
&lt;/h2&gt;

&lt;p&gt;This tab was conceptualized with AI and data vectorization in mind. I wanted a way to take the recordings from a session, transcribe the audio to text, and index that whole object in Elastic. This would then give the Virtual DM more data to draw on when we start asking it questions.&lt;/p&gt;

&lt;p&gt;The tab is a single form that appears when you select "Audio" from the log type select box.&lt;/p&gt;

&lt;p&gt;Much like the text note input tab, the goal of the form is to get enough relevant data to form a JSON payload to send to Elastic for indexing. From there your notes are stored and you are able to search them.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Using forms is really nice because the data in the widgets is cleared once you hit submit, which saves you from having to reset the form manually. However, this only clears the data from the GUI, not the session state.&lt;/li&gt;
&lt;li&gt;I used a combination of the "audio_form_variable_list" list and the "clear_session_state" function to maintain a clean session state for every note. Without this, variables could hold lingering data and your notes wouldn't be as accurate.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def app_page2_audio():
    # displays audio note form and widgets
    import json

    #list of variables to clear from session state once finished
    audio_form_variable_list = ["log_type","log_session","file","submitted","transcribed_text","log_payload"]

    # displays note form widgets, creates note payload, sends payload to an Elastic index, and handles error / success / warning messages
    with st.form("audio_form", clear_on_submit=True):
        st.session_state["log_type"] = "audio"
        st.session_state["log_session"] = st.slider("Which session is this?", 0, 250)
        st.session_state["file"] = st.file_uploader("Choose audio file",type=[".3ga",".8svx",".aac",".ac3",".aif",".aiff",".alac",".amr",".ape",".au",".dss",".flac",".flv",".m2ts",".m4a",".m4b",".m4p",".m4r",".m4v",".mogg",".mov",".mp2",".mp3",".mp4",".mpga",".mts",".mxf",".oga",".ogg",".opus",".qcp",".ts",".tta",".voc",".wav",".webm",".wma",".wv"])
        st.session_state["submitted"] = st.form_submit_button("Upload file")
        if st.session_state.submitted and st.session_state.file is not None:
            st.session_state["transcribed_text"] = transcribe_audio(st.session_state.file)
            if st.session_state.transcribed_text is not None:
                st.session_state["log_payload"] = json.dumps({"session":st.session_state.log_session,"type":st.session_state.log_type,"message":st.session_state.transcribed_text})
                elastic_index_document("dnd-notes-transcribed",st.session_state.log_payload)
            else:
                error_message("Audio transcription failure")
        else:
            st.warning('Please upload a file and submit')

    # clears session state
    clear_session_state(audio_form_variable_list)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Audio Form
&lt;/h3&gt;

&lt;p&gt;As mentioned above, this is the only form for this log type. It has the user input a session number and upload a file. Once submitted, a function turns audio into text, which helps build the JSON payload for indexing into Elastic.&lt;/p&gt;
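
&lt;p&gt;For illustration, this is the shape of the document that ends up in the "dnd-notes-transcribed" index; the session number and transcription text here are made up:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

# hypothetical values standing in for the form inputs
log_session = 3
transcribed_text = "The party entered the ruined keep and found a sealed door."

# mirrors the json.dumps call in the form code above
log_payload = json.dumps({"session":log_session,"type":"audio","message":transcribed_text})
print(log_payload)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;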

&lt;h4&gt;
  
  
  Transcribe Audio Function
&lt;/h4&gt;

&lt;p&gt;This function is the workhorse of the audio note input tab and makes use of AssemblyAI. It uploads the file via an API call to get a file URL, submits that URL in a second API call to get a transcript ID, and then polls until the transcription status is "completed". At that point, the function returns the transcribed text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def transcribe_audio(file):
    # transcribes an audio file to text
    import requests

    # get file url
    headers = {'authorization':assemblyai_api_key}
    response = requests.post('https://api.assemblyai.com/v2/upload',headers=headers,data=file)
    url = response.json()["upload_url"]
    # get transcribe id
    endpoint = "https://api.assemblyai.com/v2/transcript"
    json = {"audio_url":url}
    headers = {"authorization":assemblyai_api_key,"content-type":"application/json"}
    response = requests.post(endpoint, json=json, headers=headers)
    transcribe_id = response.json()['id']
    # polling -- keep checking until the transcription either completes or fails
    import time

    endpoint = f"https://api.assemblyai.com/v2/transcript/{transcribe_id}"
    headers = {"authorization":assemblyai_api_key}
    while True:
        result = requests.get(endpoint, headers=headers).json()
        if result.get("status") == "completed":
            return result["text"]
        elif result.get("status") == "error":
            # the caller treats None as a transcription failure
            return None
        time.sleep(3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Related Functions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def clear_session_state(variable_list):
    # deletes variables from streamlit session state
    for variable in variable_list:
        try:
            del st.session_state[variable]
        except KeyError:
            pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def elastic_index_document(index,document):
    # sends a document to an Elastic index
    from elasticsearch import Elasticsearch

    # creates Elastic connection
    client = Elasticsearch(
        elastic_url,
        ca_certs=elastic_ca_certs,
        api_key=elastic_api_key
    )

    # sends document to index with success or failure message
    response = client.index(index=index,document=document)

    if response["result"] == "created":
        success_message("Note creation successful")
    else:
        error_message("Note creation failure")

    # close Elastic connection
    client.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def error_message(text):
    # displays error message
    import time

    error = st.error(text)
    time.sleep(1)
    error.empty()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing Remarks
&lt;/h2&gt;

&lt;p&gt;This section is subject to change. As I continue working on making the project more accessible, I may migrate this to a service that doesn't require putting a few dollars into it every month. The cost is worth it for my use case, but I think it's worth trying to make everything completely free.&lt;/p&gt;

&lt;p&gt;Next week, I will be covering the process of moving this project to Docker. I did so to make it much easier for people to set this up for their own D&amp;amp;D groups and I think it is worth talking about.&lt;/p&gt;

&lt;p&gt;Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/thtmexicnkid/elastic-dnd"&gt;GitHub Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allmylinks.com/thtmexicnkid"&gt;Socials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Coding,&lt;br&gt;
Joe&lt;/p&gt;

</description>
      <category>python</category>
      <category>elasticsearch</category>
    </item>
  </channel>
</rss>
