<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Reaminated</title>
    <description>The latest articles on DEV Community by Reaminated (@reaminated).</description>
    <link>https://dev.to/reaminated</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1054043%2Fcd3da1ea-6c85-4bdb-9dc9-55014f3591da.jpeg</url>
      <title>DEV Community: Reaminated</title>
      <link>https://dev.to/reaminated</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/reaminated"/>
    <language>en</language>
    <item>
      <title>Run ChatGPT-Style Questions Over Your Own Files Using the OpenAI API and LangChain!</title>
      <dc:creator>Reaminated</dc:creator>
      <pubDate>Sat, 22 Apr 2023 17:55:43 +0000</pubDate>
      <link>https://dev.to/reaminated/run-chatgpt-style-questions-over-your-own-files-using-the-openai-api-and-langchain-1ii7</link>
      <guid>https://dev.to/reaminated/run-chatgpt-style-questions-over-your-own-files-using-the-openai-api-and-langchain-1ii7</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/fo0cg5pAh2g"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I'm sure you've all heard about &lt;a href="https://openai.com/blog/chatgpt"&gt;ChatGPT&lt;/a&gt; by now. It's an amazing Large Language Model (LLM) system that is opening up exciting new capabilities. However, it was trained on a huge corpus of text from across the internet, so what if you want to query your &lt;strong&gt;own&lt;/strong&gt; file or files? Thanks to the simple (but powerful!) OpenAI API and the amazing work done by the team at &lt;a href="https://github.com/hwchase17/langchain"&gt;LangChain&lt;/a&gt;, we can knock up a basic Question and Answering application that answers questions from &lt;strong&gt;your&lt;/strong&gt; files. This is all very new technology so I'm also learning as I go along and am always open to hearing feedback and improvements I can make - feel free to comment!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The goal of the article is to get you started with Question and Answering your own document(s). However, as described in the Improvements section below, various aspects can be optimised. If there's enough interest, I can go into more detail about those topics in future articles.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sound good? Let's get to it! (Full code is on my &lt;a href="https://github.com/keyboardP/ChatGPMe"&gt;GitHub&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  High-Level Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Set up our development environment, API Key, and dependencies&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Load in our file or directory containing multiple files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create and persist (optional) our database of embeddings (will briefly explain what they are later)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set up our chain and ask questions about the document(s) we loaded in&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You'll need an &lt;a href="https://platform.openai.com/account/api-keys"&gt;OpenAI API Key&lt;/a&gt;. I recommend putting a &lt;a href="https://platform.openai.com/account/billing/limits"&gt;hard limit&lt;/a&gt; on pricing so you don't accidentally overspend, especially when experimenting with code. New users may automatically get free credit, but my account is more than 3 months old so those credits had expired for me. You can also use the &lt;a href="https://gptforwork.com/tools/openai-chatgpt-api-pricing-calculator"&gt;OpenAI calculator&lt;/a&gt; to estimate costs - we'll be using the &lt;em&gt;gpt-3.5-turbo&lt;/em&gt; model in this article&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Developer Environment (of course). I'm using &lt;a href="https://github.com/openai/openai-python"&gt;OpenAI Python SDK&lt;/a&gt;, &lt;a href="https://github.com/hwchase17/langchain"&gt;LangChain&lt;/a&gt;, and VS Code for the IDE. The requirements.txt file is available &lt;a href="https://github.com/keyboardP/ChatGPMe/blob/main/requirements.txt"&gt;in the GitHub repo&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A file or files to test with. I recommend starting with a single file to test with. As someone with a quant fund background and using this for trading information, I'll be using the &lt;a href="http://view.officeapps.live.com/op/view.aspx?src=https://c.s-microsoft.com/en-us/CMSFiles/TranscriptFY23Q2.docx?version=00d0cbc4-3046-8134-cbd0-669a0d279c30"&gt;Microsoft Q2 FY23 Earnings Call Transcript&lt;/a&gt; (from &lt;a href="https://www.microsoft.com/en-us/investor/earnings/fy-2023-q2/press-release-webcast"&gt;this page&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Set up the OpenAI Key
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If you haven't done so already, create an account over at &lt;a href="https://platform.openai.com/account/usage"&gt;OpenAI&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(Optional but recommended) - Go to &lt;a href="https://platform.openai.com/account/billing/limits"&gt;&lt;strong&gt;Billing...Usage Limits...&lt;/strong&gt;&lt;/a&gt; and set your Soft and Hard limits. I used £10 but feel free to use whatever you're comfortable with. This stops you from spending more than you expected, which is especially useful when prototyping and experimenting with the API&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you haven't got free credits, you may need to enter your payment details to gain access&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Head over to the &lt;a href="https://platform.openai.com/account/api-keys"&gt;API Keys&lt;/a&gt; section and generate a new secret - &lt;strong&gt;Copy this secret before closing the window otherwise you won't get a chance to see it in full again&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When dealing with API keys and secrets, I like to use environment variables for security. So in your directory, create a file called ".env" (note the full-stop/period at the beginning)&lt;/p&gt;

&lt;p&gt;In the .env file, type &lt;strong&gt;OPENAI_API_KEY = '&amp;lt;your secret key from above&amp;gt;'&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# [.env file]&lt;/span&gt;
OPENAI_API_KEY &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sk-....'&lt;/span&gt; &lt;span class="c"&gt;# enter your entire key here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're using Git, create a .gitignore file and add ".env" to it, as we don't want to accidentally commit this file to our repo and leak our secret key! I've also added "db/", which will be our database folder; it could contain personal document data, so I'm making sure that doesn't get committed either.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# [.gitignore file]&lt;/span&gt;
.env &lt;span class="c"&gt;# This will prevent the .env file from being commmitted to your repo&lt;/span&gt;
db/ &lt;span class="c"&gt;# This will be our database folder. I don't want to commit it so adding here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install all the required dependencies. Download the requirements.txt file &lt;a href="https://github.com/keyboardP/ChatGPMe"&gt;from here&lt;/a&gt; and run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip3&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;txt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, you can use pip to install the dependencies below manually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;chromadb==0.3.21&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;langchain==0.0.146&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;python-dotenv==1.0.0&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's open our main Python file and load our dependencies. I'm calling the app "ChatGPMe" (sorry, couldn't resist the pun...😁) but feel free to name it what you like. In this article, I have removed the type annotations for clarity but the GitHub version contains the strongly typed version (I think it's good practice to add strong typing to Python code, I miss it from C#!)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# dotenv is a library that allows us to securely load env variables
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt; 

&lt;span class="c1"&gt;# used to load an individual file (TextLoader) or multiple files (DirectoryLoader)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;langchain.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TextLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DirectoryLoader&lt;/span&gt;

&lt;span class="c1"&gt;# used to split the text within documents and chunk the data
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CharacterTextSplitter&lt;/span&gt;

&lt;span class="c1"&gt;# use embedding from OpenAI (but others available)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;langchain.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;

&lt;span class="c1"&gt;# using Chroma database to store our vector embeddings
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;langchain.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;

&lt;span class="c1"&gt;# use this to configure the Chroma database  
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;chromadb.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Settings&lt;/span&gt;

&lt;span class="c1"&gt;# we'll use the chain that allows Question and Answering and provides source of where it got the data from. This is useful if you have multiple files. If you don't need the source, you can use RetrievalQA
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RetrievalQAWithSourcesChain&lt;/span&gt;

&lt;span class="c1"&gt;# we'll use the OpenAI Chat model to interact with the embeddings. This is the model that allows us to query in a similar way to ChatGPT
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# we'll need this for reading/storing from directories
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may notice that many of the LangChain module names above are plural. This is because LangChain is a framework for apps powered by language models, so it supports numerous different chains, database stores, chat models and so on, not just OpenAI/ChatGPT ones! This opens up huge possibilities for running offline models, open-source models and other great features.&lt;/p&gt;

&lt;p&gt;We'll load the .env file using dotenv. This library makes it easier to work with environment files and helps keep secret keys out of your source code. You could hardcode the API key directly in your file, but using environment variables is more secure and generally considered good practice.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# looks for the .env file and loads the variable(s) 
&lt;/span&gt;&lt;span class="n"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Excellent, we now have our dependencies and API key set up, let's get to the fun bit!&lt;/p&gt;

&lt;h2&gt;
  
  
  Load the Files and Embeddings
&lt;/h2&gt;

&lt;p&gt;This is optional but I found it worthwhile. By default, if you don't persist the database, it will be transient, which means the database is deleted when your program ends and your documents will have to be analysed every time you run it. For a small number of files this is fine, but the loading time quickly adds up if you have to re-analyse multiple files on every run. So let's create a couple of variables we'll use to store the database in a folder.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# get the absolute path of this Python file
&lt;/span&gt;&lt;span class="n"&gt;FULL_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;abspath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# get the full path with a folder called "db" appended
# this is where the database and index will be persisted
&lt;/span&gt;&lt;span class="n"&gt;DB_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FULL_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"db"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's load in the file we want to query. I'm going to query &lt;a href="https://view.officeapps.live.com/op/view.aspx?src=https://c.s-microsoft.com/en-us/CMSFiles/TranscriptFY23Q2.docx?version=00d0cbc4-3046-8134-cbd0-669a0d279c30"&gt;Microsoft's Q2 FY23 Earnings Call transcript&lt;/a&gt; but feel free to load whatever document(s) you like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# use TextLoader for an individual file
# explicitly stating the encoding is also recommended
&lt;/span&gt;&lt;span class="n"&gt;doc_loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TextLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'MSFT_Call_Transcript.txt'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"utf8"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# if you want to load multiple files, place them in a directory 
# and use DirectoryLoader; comment above and uncomment below
#doc_loader = DirectoryLoader('my_directory')
&lt;/span&gt;
&lt;span class="c1"&gt;# load the document
&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'll only be using &lt;strong&gt;TextLoader&lt;/strong&gt; but the syntax is the same for &lt;strong&gt;DirectoryLoader&lt;/strong&gt; so you can do a drop-in replacement with the &lt;em&gt;load()&lt;/em&gt; method.&lt;/p&gt;

&lt;p&gt;We've loaded the files but now we need to split the text into what's called chunks. Essentially, chunking groups words into "chunks" so that each chunk carries a coherent piece of meaning. For example, take the sentence below in the context of a football (soccer) game:&lt;/p&gt;

&lt;p&gt;"The striker scored a goal in the final minute of the game."&lt;/p&gt;

&lt;p&gt;One possible way to chunk this sentence is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Chunk 1: "The striker"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk 2: "scored"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk 3: "a goal in the final minute"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk 4: "of the game"&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, notice that Chunk 3 bundles two distinct ideas together: the goal itself and the timing ("final minute"), so the timing context overlaps with the action. While this chunking still conveys the essential information of the sentence, it is not as precise as it could be. A better way to chunk the sentence would be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Chunk 1: "The striker"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk 2: "scored"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk 3: "a goal"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk 4: "in the final minute"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk 5: "of the game"&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this revised version, there is no overlap between the chunks, and each chunk conveys a more distinct and specific idea. Ideally, when you chunk, you choose values that prevent chunk overlap. However, chunking is a whole topic of its own so I'll leave it there. If you want to find out more, search for chunking in Natural Language Processing (NLP), where good chunking is critical to getting the best out of NLP models.&lt;/p&gt;
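&lt;p&gt;To make the &lt;em&gt;chunk_size&lt;/em&gt; and &lt;em&gt;chunk_overlap&lt;/em&gt; parameters we're about to use more concrete, here's a minimal, framework-free sketch of sliding-window chunking. This is purely illustrative: LangChain's CharacterTextSplitter splits on separators (like newlines) rather than raw character positions.&lt;/p&gt;

```python
def chunk_text(text, chunk_size, chunk_overlap):
    # Toy character-level chunker: each chunk is chunk_size characters,
    # and consecutive chunks repeat the last chunk_overlap characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sentence = "The striker scored a goal in the final minute of the game."

# No overlap: the chunks partition the text cleanly
print(chunk_text(sentence, 20, 0))

# With overlap: each chunk repeats the previous chunk's last 5 characters,
# which helps preserve context that would otherwise be cut mid-phrase
print(chunk_text(sentence, 20, 5))
```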

&lt;p&gt;So, with the quick chunking detour above, let's split our document with 512 as a chunk size and 0 as the overlap - feel free to play with these depending on your document.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# obtain an instance of the splitter with the relevant parameters 
&lt;/span&gt;&lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# split the document data
&lt;/span&gt;&lt;span class="n"&gt;split_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We now want to load the OpenAI embeddings. An embedding essentially converts language as we use it into numerical values (vectors) so that a computer can understand the words and their relationships to other words. Words with similar meanings will have similar representations. Like chunking, embedding is a huge topic, but here's a &lt;a href="https://jalammar.github.io/illustrated-word2vec/"&gt;nice article on Word2Vec&lt;/a&gt;, which is one way to create word embeddings. Let's get back on track with using embeddings created by OpenAI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# load the embeddings from OpenAI
&lt;/span&gt;&lt;span class="n"&gt;openai_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
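&lt;p&gt;To build some intuition for "similar meanings have similar vectors", here's a tiny hand-rolled example using made-up 3-dimensional vectors. The numbers are purely illustrative; real OpenAI embeddings are much higher-dimensional and come back from the API, so treat this as a sketch of the idea, not the actual values.&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point the same way (similar meaning);
    # close to 0 means they are unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors: "king" and "queen" are close, "banana" is not
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.12]
banana = [0.1, 0.2, 0.95]

print(cosine_similarity(king, queen))   # high
print(cosine_similarity(king, banana))  # low
```

&lt;p&gt;With real embeddings you'd call something like openai_embeddings.embed_query() on each piece of text and compare the returned vectors in the same way - this is essentially what the vector store does for us when we ask a question.&lt;/p&gt;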



&lt;p&gt;Simple! Let's now create our Chroma database to store these embeddings. &lt;a href="https://www.trychroma.com/"&gt;Chroma&lt;/a&gt; was written from the ground up to be an AI-native database and works well with LangChain to quickly develop and iterate AI applications.&lt;/p&gt;

&lt;p&gt;We'll start by configuring the parameters of the database&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# configure our database
&lt;/span&gt;&lt;span class="n"&gt;client_settings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chroma_db_impl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"duckdb+parquet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;#we'll store as parquet files/DuckDB
&lt;/span&gt;    &lt;span class="n"&gt;persist_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;#location to store 
&lt;/span&gt;    &lt;span class="n"&gt;anonymized_telemetry&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt; &lt;span class="c1"&gt;# optional but showing how to toggle telemetry
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's create the actual vector store (i.e. the database storing our embeddings).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# create a class level variable for the vector store
&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="c1"&gt;# check if the database exists already
# if not, create it, otherwise read from the database
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DB_DIR&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Create the database from the document(s) above and use the OpenAI embeddings for the word to vector conversions. We also pass the "persist_directory" parameter which means this won't be a transient database, it will be stored on the hard drive at the DB_DIR location. We also pass the settings we created earlier and give the collection a name
&lt;/span&gt;    &lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;persist_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_settings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client_settings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Transcripts_Store"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# It's key to called the persist() method otherwise it won't be saved 
&lt;/span&gt;    &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# As the database already exists, load the collection from there
&lt;/span&gt;    &lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Transcripts_Store"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;persist_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_settings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client_settings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We now have our embeddings stored! The final step is to load our chain and start querying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create the Chain and Query
&lt;/h2&gt;

&lt;p&gt;LangChain, as the name implies, is built around chains. Chains essentially allow you to "chain" together multiple components, such as taking input data, formatting it to a prompt template, and then passing it to an LLM. You can create your own chains or, as I'm doing here, use pre-existing chains that cover common use cases. For our case, I'm going to use &lt;strong&gt;RetrievalQAWithSourcesChain&lt;/strong&gt;. As the name implies, it also returns the source(s) used to obtain the answer. I'm doing this to show that the demo you see above is only using my document and not reaching out to the web for answers (shown by the Google question at the end).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# create and configure our chain
# we're using ChatOpenAI LLM with the 'gpt-3.5-turbo' model
# we're setting the temperature to 0. The higher the temperature, the more 'creative' the answers. In my case, I want as factual and direct from source info as possible
# 'stuff' is the default chain_type, which stuffs all the retrieved data into a single prompt
# set the retriever to be our embeddings database
&lt;/span&gt;&lt;span class="n"&gt;qa_with_source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RetrievalQAWithSourcesChain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_chain_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'gpt-3.5-turbo'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
     &lt;span class="n"&gt;chain_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"stuff"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     
     &lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are currently &lt;a href="https://python.langchain.com/en/latest/modules/chains/index_examples/qa_with_sources.html"&gt;four chain types&lt;/a&gt; but we're using the default one, 'stuff', which uses the entire document in one go. However, other methods like map_reduce can help with batching documents so you don't surpass token limits but that's a whole other topic.&lt;/p&gt;
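&lt;p&gt;To give a feel for the map_reduce idea without the framework, here's a rough, hypothetical sketch: the &lt;em&gt;ask&lt;/em&gt; function below stands in for an LLM call and is entirely made up, and LangChain's real implementation uses prompt templates and is far more sophisticated. The point is just that each chunk is queried separately (map), then the partial answers are combined into one final call (reduce), so no single call has to fit the whole document.&lt;/p&gt;

```python
def map_reduce_answer(chunks, question, ask):
    # "map" step: run the question over each chunk separately,
    # so no single call has to fit the entire document in its context
    partial_answers = [ask(chunk, question) for chunk in chunks]
    # "reduce" step: combine the partial answers and ask one final time
    combined = " ".join(a for a in partial_answers if a)
    return ask(combined, question)

# A stand-in "LLM" that just reports whether the context mentions revenue;
# a real chain would call the model here instead
def fake_ask(context, question):
    return "revenue up" if "revenue" in context else ""

chunks = ["intro remarks", "cloud revenue grew", "closing remarks"]
print(map_reduce_answer(chunks, "How was revenue?", fake_ask))
```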

&lt;p&gt;We're almost there! Let's create a quick function that handles answering the question and then create a loop for the user to ask questions about the document.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# function to use our RetrievalQAWithSourcesChain
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qa_with_source&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# loop through to allow the user to ask questions until they type in 'quit'
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# make the user input yellow using ANSI codes
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"What is your query? "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;user_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\033&lt;/span&gt;&lt;span class="s"&gt;[33m"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\033&lt;/span&gt;&lt;span class="s"&gt;[0m"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"quit"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# make the answer green and source blue using ANSI codes
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'Answer: &lt;/span&gt;&lt;span class="se"&gt;\033&lt;/span&gt;&lt;span class="s"&gt;[32m&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\033&lt;/span&gt;&lt;span class="s"&gt;[0m'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\033&lt;/span&gt;&lt;span class="s"&gt;[34mSources: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"sources"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\033&lt;/span&gt;&lt;span class="s"&gt;[0m'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's it! I hope that starts you off in what is an exciting field of development. Please feel free to comment and provide feedback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Improvements
&lt;/h2&gt;

&lt;p&gt;This is just the tip of the iceberg! For me personally, automating and running this with preset prompts across transcripts from various companies can provide good insights to help with trading decisions. For those interested in the financial/trading aspects of AI, you might like to read my short post on &lt;a href="https://www.reaminated.com/thoughts-on-bloomberggpt-and-domain-specific-llms"&gt;BloombergGPT&lt;/a&gt;. There is so much potential for alternative data and fundamentals analysis, it's a very exciting field. However, outside of that, it's also useful for your own personal files and organisation/searching and almost limitless other possibilities!&lt;/p&gt;

&lt;p&gt;There are several improvements that could be made; here are a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Offline - This is a big one and maybe a topic for another blog post if there's interest. Your data is still sent to OpenAI unless you opt out or use the Azure version of the API, which has a stricter usage policy for your data. A great platform called &lt;a href="https://huggingface.co/"&gt;Hugging Face&lt;/a&gt; hosts numerous open-source models and datasets to get your AI projects up and running. LangChain also supports Hugging Face, so you could experiment with offline Hugging Face models to run everything without the internet or API costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automate - Individually querying is useful, but some situations may require a large number of actions or sequential actions. This is where &lt;a href="https://github.com/Significant-Gravitas/Auto-GPT"&gt;AutoGPT&lt;/a&gt; can come in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunking - I've hardcoded 512, and you may have seen messages saying that some of the chunks surpassed that. An improvement would be to use dynamic chunk sizes tailored to the input documents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Token management and Prompt Templates - tokens are key to the API, and you can optimise your prompts so that you don't waste tokens in your API calls while still getting the same results. This saves you money, as you're using less of the limit, and also allows more tailored prompts that provide better answers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
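The chunking point above can be sketched in plain Python. This is a minimal illustration of the idea, not the LangChain splitter itself; the `chunk_words` helper, its word-based counting (a real splitter would count tokens), and the overlap value are my own assumptions for the example:

```python
def chunk_words(text: str, max_words: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping word chunks.

    A production version would count tokens rather than words and could
    adapt max_words per document, but this shows the basic mechanism.
    """
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    chunks = []
    step = max_words - overlap  # advance less than a full chunk to overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap helps a chunk retain context from the end of the previous one, which tends to improve retrieval answers that span a chunk boundary.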

&lt;p&gt;As I say, many more features can be explored but this was my first foray into trying to utilise OpenAI models for my personal documents and trading data. A lot of documentation, bug tickets, and workaround reading was involved so I hope I've saved you some time!&lt;/p&gt;

&lt;p&gt;The full code can be found on my &lt;a href="https://github.com/keyboardP/ChatGPMe"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Enjoy :)&lt;/p&gt;

</description>
      <category>python</category>
      <category>chatgpt</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Are your PDFs Actually Redacted? Double Check!</title>
      <dc:creator>Reaminated</dc:creator>
      <pubDate>Sat, 08 Apr 2023 16:34:25 +0000</pubDate>
      <link>https://dev.to/reaminated/are-you-pdfs-actually-redacted-double-check-3ce3</link>
      <guid>https://dev.to/reaminated/are-you-pdfs-actually-redacted-double-check-3ce3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;PDF is a very common file format for documents, but one thing people may not realise is that the complexity the format provides means some things aren't quite as they seem. Specifically, I've seen numerous "redacted" documents where the user has drawn over the text and saved the file. However, under the hood, PDFs have layers, which means the block or marker you used may actually be saved as a separate layer from the text it's covering. This means anyone who opens the file can simply move the block away and see the text underneath!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Drawing over the text that should be redacted&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hRQEg7Qy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nl07sfig6a5kpb6e2s0x.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hRQEg7Qy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nl07sfig6a5kpb6e2s0x.gif" alt="Drawing over the text that should be redacted" width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;After saving the file as "Seemingly_Redacted.pdf", and opening it in Acrobat, a user can easily remove the line to expose the text that should be hidden&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hszCyTia--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xb0f1t5ni4nvqmxawcvt.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hszCyTia--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xb0f1t5ni4nvqmxawcvt.gif" alt="Uncovering" width="800" height="242"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Easy Fix
&lt;/h2&gt;

&lt;p&gt;Adobe Acrobat Pro (the paid version) supports proper redaction of text, but it can be too expensive for people to justify, depending on how often they use the advanced features. Fortunately, the free version of Adobe Acrobat can help.&lt;/p&gt;

&lt;p&gt;Instead of saving the PDF using the "Save" or "Save As" option, go to "File...Print" in Adobe Acrobat. Set the printer to "Microsoft Print to PDF" (or a similar PDF-related name). If you have comments that you would like appended within the document, you can click the "Summarize Comments" button:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YxzJ1foI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/96cdvyyhukqfsmynxhw7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YxzJ1foI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/96cdvyyhukqfsmynxhw7.png" alt="Summarize Comments" width="753" height="549"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;"Print"&lt;/strong&gt; and it should ask where to save the file; once you choose a location, it will create a PDF file that is now flattened. &lt;strong&gt;I recommend creating a new final file, instead of overwriting the original version, in case further edits need to be made or data went missing during flattening&lt;/strong&gt;. That's it! The text under your drawing can no longer be revealed by moving the line or block out of the way. This also means nothing can be edited, so it's advisable to flatten only once you have the final version you want to send, as opposed to flattening after every edit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to be aware of
&lt;/h2&gt;

&lt;p&gt;Flattening PDFs can cause a decrease in quality, but the result should generally remain high enough quality for on-screen use.&lt;/p&gt;

&lt;p&gt;Flattening a PDF file can sometimes lead to errors or inconsistencies in the document, especially if there are complex graphics. Always proofread after to ensure everything looks as expected.&lt;/p&gt;

&lt;p&gt;Finally, flattening can also affect OCR and accessibility tools, making the document harder for them to process. Consider your audience and requirements before flattening.&lt;/p&gt;

&lt;p&gt;However, for a lot of use cases, the downsides of flattening are outweighed by the pros. I had to use it to send a bank statement as proof of address; there was no need for the company to see all my transactions on the statement.&lt;/p&gt;

&lt;p&gt;The paid version of Acrobat, and other readers, offer more advanced redaction features, so depending on your requirements, it may be better value to use those. Additionally, PDFs offer other security features such as encryption and password protection, but this post was specifically about ensuring text that's been drawn over stays hidden.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced usage using ImageMagick
&lt;/h2&gt;

&lt;p&gt;For those who want to automate this, you can use a command-line tool like ImageMagick. The following command flattens the PDF with a DPI of 300. You can toy around with the DPI value if you need to; the higher the value, the bigger the file size.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;convert &lt;span class="nt"&gt;-density&lt;/span&gt; 300 &lt;span class="nt"&gt;-quality&lt;/span&gt; 100 input.pdf &lt;span class="nt"&gt;-flatten&lt;/span&gt; flattened_output.pdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
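If you have a folder of PDFs to flatten, a small Python wrapper around the same command can batch it. This is a sketch under the assumption that ImageMagick's `convert` is on your PATH; the helper names (`flatten_args`, `flatten_folder`) are my own:

```python
import subprocess
from pathlib import Path


def flatten_args(src: Path, dst: Path, dpi: int = 300) -> list[str]:
    # Mirrors: convert -density 300 -quality 100 input.pdf -flatten output.pdf
    return ["convert", "-density", str(dpi), "-quality", "100",
            str(src), "-flatten", str(dst)]


def flatten_folder(folder: str, dpi: int = 300) -> None:
    """Flatten every PDF in `folder`, writing *_flattened.pdf copies."""
    for pdf in Path(folder).glob("*.pdf"):
        out = pdf.with_name(pdf.stem + "_flattened.pdf")
        subprocess.run(flatten_args(pdf, out, dpi), check=True)
```

Writing to a new `_flattened.pdf` file keeps the originals intact, in line with the recommendation above.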



&lt;p&gt;I hope that helps, feel free to leave any comments or questions below.&lt;/p&gt;

</description>
      <category>security</category>
      <category>privacy</category>
      <category>documentation</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Thoughts on BloombergGPT and Domain Specific LLMs</title>
      <dc:creator>Reaminated</dc:creator>
      <pubDate>Wed, 05 Apr 2023 19:39:04 +0000</pubDate>
      <link>https://dev.to/reaminated/thoughts-on-bloomberggpt-and-domain-specific-llms-1mon</link>
      <guid>https://dev.to/reaminated/thoughts-on-bloomberggpt-and-domain-specific-llms-1mon</guid>
      <description>&lt;p&gt;&lt;a href="https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/"&gt;Bloomberg&lt;/a&gt; announced BloombergGPT which looks incredible (&lt;a href="https://arxiv.org/abs/2303.17564"&gt;direct link to the paper&lt;/a&gt;). I think this is a glimpse into the future of LLM models - the idea of domain-specific models, as suggested in the paper, to optimise processes and output.&lt;/p&gt;

&lt;p&gt;What will be interesting to see in the finance/trading field is how democratised data insight through LLMs, and the ease of access for a larger number of people, will affect alpha.&lt;/p&gt;

&lt;p&gt;On the one hand, having raw data analysed differently by different companies means that some companies may identify insights others haven't and could potentially trade on them, thus outperforming the market. On the other hand, it now potentially means that pressure is put on the rest of the trading pipeline, something that LLMs may not necessarily be able to do (?) (e.g. how quickly can your systems trade on the data based on your preferred horizons, how well can you update and run backtesters to work efficiently with LLM data, what's the optimal parameterization of a portfolio to trade with etc...)&lt;/p&gt;

&lt;p&gt;I think this is a microcosm of the bigger effects of domain-specific LLMs - they'll put pressure on, and create new job roles and remits across, not just the data side that LLMs produce but also the rest of the technical and business pipelines, which will need to optimise their functions to capitalise on LLM output - e.g. imagine combining trained decision trees to generate specific prompts for LLMs to reduce hallucination or surface hidden insights.&lt;/p&gt;

&lt;p&gt;LLMs are just one of many forms of AI model, so it'll be interesting to see whether other model types get implemented elsewhere in the system's execution path to collaborate with the new LLMs coming out and take full advantage. There is a lot of focus on data at the moment, which is critical, but the knock-on effects on data-utilisation research will be interesting to watch.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>productivity</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>Discipline and Development</title>
      <dc:creator>Reaminated</dc:creator>
      <pubDate>Thu, 30 Mar 2023 21:06:28 +0000</pubDate>
      <link>https://dev.to/reaminated/discipline-and-development-2e4f</link>
      <guid>https://dev.to/reaminated/discipline-and-development-2e4f</guid>
      <description>&lt;p&gt;I posted a &lt;a href="https://old.reddit.com/r/gamedev/comments/ubv6x0/how_did_you_overcome_analysis_paralysis/i66hvg2/"&gt;comment over on Reddit&lt;/a&gt; where someone was asking how people maintain discipline when developing games (although a lot of the theory could be applicable in other fields). I posted an answer with a few points and figured I’d post it here for future reference too.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I find discipline becomes easier when seeing progress so I always try to look back at what’s been achieved every few weeks, however small, and be proud of it. Not to make it sound like a feel-good wishy-washy thing but I think it’s easy to forget how much you’ve achieved relative to your skillset and resources when it’s so often compared to other people’s work (which may be further down the line).&lt;br&gt;
&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you still like the game idea, clearly something else is stopping you. Identify what it is. For me, as more of a programmer than an artist, testing the visual side of things seemed like a pain (setting up Unity, making sure camera is correct, finding an image to use etc..). To counter this, I’ve now created an ever-growing folder of assets I accumulate from the web and set up a prototype project for my game where I can quickly test ideas. This includes backgrounds, sprite-sheets, random images, sound effects, music etc… Therefore whenever I want to prototype anything, I can focus on the programming more than the environment setup.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find out what your bottlenecks are during development. For me, my PC was slowing down too much and Unity would take a while to load up (it didn’t take too long, but when you’re not motivated enough anyway, it’s another excuse to ‘do it later’), which took me out of my zone, so I upgraded the RAM and installed an SSD. A faster machine makes so much of a difference to me when developing. &lt;br&gt;
&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I now maintain a personal dev diary with screenshots. In order to be good at this, I customised my workflow to help take GIFs quickly without losing focus (&lt;a href="http://keyboardp.tumblr.com/post/147897009991/improve-workflow-to-maintain-discipline-unity3d"&gt;wrote a post here&lt;/a&gt;).&lt;br&gt;
&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The internet is probably the biggest distraction so I now make time at the end of the day to catch up on articles that look interesting but aren’t necessarily pertinent to what I’m doing at that time. I wrote a &lt;a href="http://keyboardp.tumblr.com/post/147904135266/toview-folders-simple-browser-feature-to-help"&gt;post here&lt;/a&gt; on how I did that.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Play to your mood. If you’re really motivated, work on the more ‘boring’ things if possible, such as non-visual/backend stuff. When you’re not in the mood, work on something that can give instant feedback, which I usually find has visual aspects. The latter also works for when you want to share your work with others here or on Twitter and get feedback. (Of course, that’s just for me; you may find the backend aspects much more rewarding)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Essentially, investing time up front on improving workflow and reducing friction is what got me out of a rut. Doing this felt productive, even if it was something as simple as gathering assets for my prototype folder. Then it’s a positive cycle: as you develop more, you become more self-motivated, and I find it perpetuates - just be sure not to burn out, and pace yourself; nothing wrong with taking up other hobbies IMHO.&lt;/p&gt;

&lt;p&gt;The system’s not perfect and I still hit developer’s block, but it’s the best approach I’ve found that works for me personally. I've also started to spend time on blogging (wrote my first &lt;a href="https://dev.to/reaminated/transform-your-viewing-experience-how-to-create-an-immersive-ambient-monitor-with-simple-led-lights-and-code-magic-2i45"&gt;dev.to post here&lt;/a&gt;), which I find helps me stay productive.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>programming</category>
      <category>help</category>
    </item>
    <item>
      <title>Transform Your Viewing Experience: How to Create an Immersive Ambient Monitor with Simple LED Lights and Code Magic</title>
      <dc:creator>Reaminated</dc:creator>
      <pubDate>Thu, 30 Mar 2023 17:32:17 +0000</pubDate>
      <link>https://dev.to/reaminated/transform-your-viewing-experience-how-to-create-an-immersive-ambient-monitor-with-simple-led-lights-and-code-magic-2i45</link>
      <guid>https://dev.to/reaminated/transform-your-viewing-experience-how-to-create-an-immersive-ambient-monitor-with-simple-led-lights-and-code-magic-2i45</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/az-va7ofef8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;For those not familiar with Ambient TVs, they soften the jump from the edge of the TV screen to its immediate surroundings, providing a more immersive experience. I had some LED lights lying around and decided to see if it was possible to control them through code and, in turn, make my computer screen an ambient monitor. Whilst I wanted to use it for my monitor, it can be used anywhere and with whatever colours you can send it, including other features your lights may have, such as audio reaction or random patterns. I’ve been meaning to write this post for a while, as I’ve been using this setup on an earlier monitor, but I never got round to adding it to my new monitor, so this time I documented the process as I went along for anyone who might find it useful. So let’s get to it! (Please note, LED lights are likely to be Bluetooth Low Energy (BLE), so your computer will need to support BLE in order to interact with them). Full code is on &lt;a href="https://github.com/keyboardP/AmbientMonitor" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  High Level Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Find out what commands the LED light’s Bluetooth receiver accepts&lt;/li&gt;
&lt;li&gt;Send commands to the LED lights via my computer’s Bluetooth&lt;/li&gt;
&lt;li&gt;Obtain the dominant colour of the current screen&lt;/li&gt;
&lt;li&gt;Send the dominant colour to the LED lights&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Bluetooth-supported RGB LED lights and their accompanying app (I’m using Android; iOS would likely require an alternative approach to the one described here but should be possible using Wireshark directly to monitor Bluetooth traffic). I've attached these lights to the back of my monitor&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wireshark.org/" rel="noopener noreferrer"&gt;Wireshark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developer.android.com/studio/command-line/adb" rel="noopener noreferrer"&gt;Android’s SDK tools (specifically adb.exe)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Developer tools (I’ll be using Python 3.10, though any 3.x version should work, and the principles are the same whatever language you prefer)&lt;/li&gt;
&lt;li&gt;A device to send BLE commands from (e.g. a laptop that supports BLE)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Bluetooth data
&lt;/h2&gt;

&lt;p&gt;The first step is to ensure that the app that comes with the lights is working as expected. This can easily be tested by running the light’s original app and checking that the lights react to the on/off/lighting buttons you press in the app. We do this because we will shortly be pressing those buttons and capturing the specific codes sent to the Bluetooth receiver on the lights.&lt;/p&gt;

&lt;p&gt;There were two approaches I could take. One was to decompile the app’s JAR file and find the codes being sent, but I wanted to learn more about the Bluetooth protocol, so I opted to log all Bluetooth activity on my Android device and extract the codes from there. Here’s how:&lt;/p&gt;

&lt;p&gt;1) Enable &lt;a href="https://developer.android.com/studio/debug/dev-options" rel="noopener noreferrer"&gt;Developer Options&lt;/a&gt; on your Android device&lt;/p&gt;

&lt;p&gt;2) Enable Bluetooth HCI snoop log (HCI stands for Host-Controller Interface). You can find this option in &lt;strong&gt;Settings &amp;gt; System &amp;gt; Developer&lt;/strong&gt; or search for it in settings as in the image below&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4wpnmi2rw8z9kgslkxy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4wpnmi2rw8z9kgslkxy.gif" alt="Enabling Bluetooh HCI snoop log"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) We now need to perform specific actions so we can identify what each action sends to the light’s Bluetooth receiver. I’m going to keep it simple with On/Red/Green/Blue/Off, in that order, but if your lights support other features, you can toy around with those too. &lt;/p&gt;

&lt;p&gt;4) Run the app and press &lt;strong&gt;On&lt;/strong&gt;, &lt;span&gt;&lt;strong&gt;Red&lt;/strong&gt;&lt;/span&gt;, &lt;span&gt;&lt;strong&gt;Green&lt;/strong&gt;&lt;/span&gt;, &lt;span&gt;&lt;strong&gt;Blue&lt;/strong&gt;&lt;/span&gt;, and &lt;strong&gt;Off&lt;/strong&gt;. It may also be useful to keep an eye on the approximate time to make it easier to filter if you have a lot of Bluetooth activity on your device. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2izzstczrcbnu46e3263.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2izzstczrcbnu46e3263.gif" alt="Using the app to press the commands we're looking for"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;5) Turn Bluetooth off so we don’t get any more noise. In the following steps, we’ll analyse the Bluetooth commands and, as we know the order of what we pressed, we can find out which values correspond to which button press.&lt;/p&gt;

&lt;p&gt;6) We now need to access the Bluetooth logs on the phone. There are several ways to do this, but I will generate and export a bug report. To do this, enable &lt;strong&gt;USB Debugging&lt;/strong&gt; in the phone’s &lt;strong&gt;Settings&lt;/strong&gt;, connect the phone to the computer, and use the adb.exe command-line tool.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;               adb bugreport led_bluetooth_report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;7) This will generate a zip file in your computer’s current directory with the filename “led_bluetooth_report.zip”. You can specify a path if you prefer (e.g. “C:\MyPath\led_bluetooth_report”)&lt;/p&gt;

&lt;p&gt;8) Within this zip are the logs that we need. The location may vary from device to device (please comment if you found it elsewhere on yours). On my Google Pixel phone, it was in &lt;strong&gt;FS\data\misc\bluetooth\logs\btsnoop_hci.log&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;9) Now that we have the log files, let’s analyse them! Open Wireshark, go to File...Open..., and select the btsnoop_hci log file.&lt;/p&gt;

&lt;p&gt;Whilst it may look daunting, let’s make it easy for ourselves to find what we’re looking for by filtering BTL2CAP on &lt;strong&gt;0x0004&lt;/strong&gt;, which is the Attribute Protocol in the &lt;a href="https://github.com/wireshark/wireshark/blob/master/epan/dissectors/packet-btl2cap.c#L426" rel="noopener noreferrer"&gt;Wireshark source code&lt;/a&gt;. The Attribute Protocol defines the way two BLE devices talk to each other, so this is what we need to find how the app talks to the lights. You can filter the logs in Wireshark by typing &lt;strong&gt;btl2cap.cid == 0x0004&lt;/strong&gt; in the “&lt;strong&gt;Apply a display filter&lt;/strong&gt;” bar near the top and pressing Enter&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Froaq4y4jmibetth6cl9h.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Froaq4y4jmibetth6cl9h.gif" alt="Wireshark Filter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have filtered the log, finding the commands should be easier. We can look at the timestamps (go to View…Time Display Format…Time of Day to convert the time if it’s in the wrong format). We want to look at the &lt;strong&gt;Sent Write Command&lt;/strong&gt; entries, as those are the ones where we sent a value to the lights. Assuming your most recent time is at the bottom, scroll down to the last five events. These should be On, Red, Green, Blue, and Off, in that order, with Off being last.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvv9f4k0xmfjsf8h0r2xj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvv9f4k0xmfjsf8h0r2xj.jpg" alt="Wireshark results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Take note of the &lt;strong&gt;Destination BD_ADDR&lt;/strong&gt;, as we'll need it shortly, and don your Sherlock Holmes hat, because this is where we need to unlock the pattern of how the colours and on/off commands are encoded within the message. This will vary depending on the light manufacturer, but here’s the list of values I got for my device:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On: 7e0404f00001ff00ef&lt;/li&gt;
&lt;li&gt;Red: 7e070503ff000010ef&lt;/li&gt;
&lt;li&gt;Green: 7e07050300ff0010ef&lt;/li&gt;
&lt;li&gt;Blue: 7e0705030000ff10ef&lt;/li&gt;
&lt;li&gt;Off: 7e0404000000ff00ef&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are clearly hexadecimal values and if you look carefully, you’ll see there are some fixed patterns. Let’s split the patterns out as this should make things much clearer.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On: 7e0404 &lt;strong&gt;f00001&lt;/strong&gt;     ff00ef&lt;/li&gt;
&lt;li&gt;Red: 7e070503     &lt;strong&gt;ff0000&lt;/strong&gt;     10ef&lt;/li&gt;
&lt;li&gt;Green: 7e070503     &lt;strong&gt;00ff00&lt;/strong&gt;     10ef&lt;/li&gt;
&lt;li&gt;Blue: 7e070503     &lt;strong&gt;0000ff&lt;/strong&gt;     10ef&lt;/li&gt;
&lt;li&gt;Off: 7e0404 &lt;strong&gt;000000&lt;/strong&gt;     ff00ef&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those familiar with the hexadecimal values of pure red, green, and blue, you’ll know they are #FF0000, #00FF00, and #0000FF respectively, which is exactly what we can see above. This means we now know the format to change the colours to whatever we want (or at least to what the lights themselves are capable of). We can also see that On and Off have a different format from the colours and are similar to each other, with On having f00001 and Off having 000000.&lt;/p&gt;
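Based on the values captured above, a small helper can build the colour command for an arbitrary RGB value. The byte layout here is specific to my lights (sniffed from my own packets), so treat it as a template to adapt rather than a universal protocol; the function and constant names are my own:

```python
# On/Off values exactly as captured from the sniffed packets
ON_CMD = bytes.fromhex("7e0404f00001ff00ef")
OFF_CMD = bytes.fromhex("7e0404000000ff00ef")


def color_cmd(r: int, g: int, b: int) -> bytes:
    """Build a colour command: 7e070503 header + RRGGBB + 10ef trailer."""
    return bytes.fromhex(f"7e070503{r:02x}{g:02x}{b:02x}10ef")
```

Plugging in pure red, green, or blue reproduces the captured packets, which is a quick sanity check before sending anything over Bluetooth.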

&lt;p&gt;That’s it! We now have enough information to start coding and interacting with the lights. &lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting to LED lights
&lt;/h2&gt;

&lt;p&gt;There are three key things we need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The address of the device (this is the &lt;strong&gt;Destination BD_ADDR&lt;/strong&gt; from above)&lt;/li&gt;
&lt;li&gt;The values to send to the device (the hexadecimal values obtained above)&lt;/li&gt;
&lt;li&gt;The characteristic we want to change. A Bluetooth LE characteristic is a data structure that essentially defines data that can be sent between a host and client Bluetooth devices. We need to find the characteristic (a 16-bit or 128-bit UUID) that refers to the lights. There are some commonly used assigned numbers that can be &lt;a href="https://www.bluetooth.com/specifications/assigned-numbers/" rel="noopener noreferrer"&gt;found here&lt;/a&gt; but unless the device conforms to those, they could be using a custom UUID. As my lights aren’t in the assigned numbers list, let’s find it via code. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m using Python 3.10 and &lt;a href="https://github.com/hbldh/bleak" rel="noopener noreferrer"&gt;Bleak 0.20.1&lt;/a&gt;. Ensure Bluetooth on your computer is turned on (no need to pair with the device, we’ll connect to it through code).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Function to create a BleakClient and connect it to the address of the light's Bluetooth reciever
async def init_client(address: str) -&amp;gt; BleakClient:
    client =  BleakClient(address)  
    print("Connecting")
    await client.connect()
    print(f"Connected to {address}")
    return client

# Function we can call to make sure we disconnect properly otherwise there could be caching and other issues if you disconnect and reconnect quickly
async def disconnect_client(client: Optional[BleakClient] = None) -&amp;gt; None:
    if client is not None :
        print("Disconnecting")
        if characteristic_uuid is not None:
            print(f"charUUID: {characteristic_uuid}")
            await toggle_off(client, characteristic_uuid)
        await client.disconnect()
        print("Client Disconnected")
    print("Exited")

# Get the characteristic UUID of the lights. You don't need to run this every time
async def get_characteristics(client: BleakClient) -&amp;gt; None:
    # Get all the services of the device (the lights in this case)
    services = await client.get_services() 
    # Iterate the services. Each service will have characteristics
    for service in services: 
        # Iterate and subsequently print the characteristic UUID
        for characteristic in service.characteristics: 
            print(f"Characteristic: {characteristic.uuid}") 
    print("Please test these characteristics to identify the correct one")
    await disconnect_client(client)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’ve commented the code so it should be self-explanatory, but essentially we connect to the lights and list all the characteristics they expose. My output was:&lt;/p&gt;

&lt;p&gt;Characteristic: 00002a00-0000-1000-8000-00805f9b34fb&lt;br&gt;
Characteristic: 00002a01-0000-1000-8000-00805f9b34fb&lt;br&gt;
Characteristic: 0000fff3-0000-1000-8000-00805f9b34fb&lt;br&gt;
Characteristic: 0000fff4-0000-1000-8000-00805f9b34fb&lt;/p&gt;

&lt;p&gt;A quick Google of the first two UUIDs shows that they refer to the name and appearance of the device, which are irrelevant for us. However, the third and fourth seem the most suitable, with the third (&lt;strong&gt;0000fff3-0000-1000-8000-00805f9b34fb&lt;/strong&gt;) being the write characteristic according to &lt;a href="https://zvislog.wordpress.com/topics/ti-devkit-gatt-with-zvl/" rel="noopener noreferrer"&gt;this page&lt;/a&gt;.&lt;br&gt;
Excellent, we now have the characteristic we need to write a value (the colour hexadecimal) to for this particular device.&lt;/p&gt;
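Rather than googling every UUID, you can also narrow things down by checking each characteristic's advertised properties: in Bleak, each characteristic object exposes a `properties` list of strings such as `"read"`, `"write"`, or `"notify"`. Here's a minimal sketch of that filtering idea as a pure function over (uuid, properties) pairs; the helper name and the property values shown for my four UUIDs are illustrative assumptions, not output from my device:

```python
# Hypothetical helper: keep only characteristics that advertise the write property
def writable_uuids(characteristics):
    """Return the UUIDs of (uuid, properties) pairs that include "write"."""
    return [uuid for uuid, props in characteristics if "write" in props]

# The four characteristics my lights reported, with assumed properties for illustration
found = [
    ("00002a00-0000-1000-8000-00805f9b34fb", ["read"]),
    ("00002a01-0000-1000-8000-00805f9b34fb", ["read"]),
    ("0000fff3-0000-1000-8000-00805f9b34fb", ["write-without-response", "write"]),
    ("0000fff4-0000-1000-8000-00805f9b34fb", ["notify"]),
]
print(writable_uuids(found))  # ['0000fff3-0000-1000-8000-00805f9b34fb']
```

In the real `get_characteristics` function above, the equivalent check would be `"write" in characteristic.properties` inside the inner loop.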
&lt;h2&gt;
  
  
  Controlling LED lights
&lt;/h2&gt;

&lt;p&gt;We finally have all the pieces we need. At this stage, you can get creative with what colour input you’d like to use. You could, for example, connect the lights to a trading market API to change colours according to how your portfolio is doing. In this case, we want to make our monitors ambient-aware, so we need to obtain the dominant colour of the screen and send that through. &lt;/p&gt;

&lt;p&gt;There are many ways to do this, so feel free to experiment with whatever algorithms you like. One of the simplest approaches would be to step through every X pixels across the screen and take an average, while more complicated solutions would look for colours human eyes perceive as more dominant. Feel free to comment with any findings you’d like to share! &lt;/p&gt;
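The simple sampling-and-averaging approach could be sketched like this (a hypothetical helper, not part of the final code; `pixels` is assumed to be a flat list of `(r, g, b)` tuples, e.g. from `list(image.getdata())` with Pillow):

```python
# Sample every `step`-th pixel and average each RGB channel independently
def average_colour(pixels, step=10):
    sampled = pixels[::step]
    n = len(sampled)
    return tuple(sum(p[i] for p in sampled) // n for i in range(3))

# Half black, half white pixels average out to mid-grey
print(average_colour([(0, 0, 0)] * 50 + [(255, 255, 255)] * 50, step=1))  # (127, 127, 127)
```

A larger `step` trades accuracy for speed, which matters once this runs many times per second.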

&lt;p&gt;For the sake of this blog post, I’m going to keep it simple by using the &lt;a href="https://github.com/bedapisl/fast-colorthief" rel="noopener noreferrer"&gt;fast_colorthief&lt;/a&gt; library’s get_dominant_color method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'''
Instead of taking the whole screen size into account, I'm going to take a 640x480 region from the middle.
This should make it faster, but you can toy around depending on what works for you. You may, for example, want
to take the outer edge colours instead so the ambience blends with the outer edges and not the main screen colour
'''
screen_width, screen_height = ImageGrab.grab().size # get the overall resolution size
region_width = 640
region_height = 480
region_left = (screen_width - region_width) // 2
region_top = (screen_height - region_height) // 2
screen_region = (region_left, region_top, region_left + region_width, region_top + region_height)

# Create a BytesIO object to reuse
screenshot_memory = io.BytesIO(b"")

# Method to get the dominant colour on screen. You can change this method to return whatever colour you like
def get_dominant_colour() -&amp;gt; str:
    # Take a screenshot of the region specified earlier
    screenshot = ImageGrab.grab(screen_region)
    '''
    The fast_colorthief library doesn't work directly with PIL images, but we can use an in-memory buffer (BytesIO) to store the picture.
    This saves us writing to and then reading from the disk, which is costly
    '''

    # Save the screenshot region to the in-memory bytes buffer (instead of to disk)
    # Seeking and truncating for performance, rather than creating/closing a BytesIO object with "with"
    screenshot_memory.seek(0)
    screenshot_memory.truncate(0)
    screenshot.save(screenshot_memory, "PNG")
    # Get the dominant colour
    dominant_color = fast_colorthief.get_dominant_color(screenshot_memory, quality=1)
    # Return the colour as hex (without the # prefix, as our Bluetooth device doesn't use it)
    return '{:02x}{:02x}{:02x}'.format(*dominant_color)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code is commented so it should be clear what’s happening, but essentially we take a smaller region from the middle of the screen and then get the dominant colour from that region. The reason I’m taking a smaller region is performance: fewer pixels need to be analysed.&lt;/p&gt;
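As a quick illustration of the last line: `fast_colorthief` returns the dominant colour as an `(r, g, b)` tuple, and the format string packs each channel into two lowercase hex digits:

```python
# Each channel formats to exactly two hex digits, so e.g. orange becomes:
rgb = (255, 160, 0)
print('{:02x}{:02x}{:02x}'.format(*rgb))  # ffa000
```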

&lt;p&gt;We’re almost there! We now know what to send and where to send it. Let’s finish the last major part of this challenge, which is to actually send it. Fortunately, with the Bleak library, this is quite straightforward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def send_colour_to_device(client: BleakClient, uuid: str, value: str) -&amp;gt; None:
    # Write to the characteristic we found, in the format that was obtained from the Bluetooth logs
    await client.write_gatt_char(uuid, bytes.fromhex(f"7e070503{value}10ef"))

async def toggle_on(client: BleakClient, uuid: str) -&amp;gt; None:
    await client.write_gatt_char(uuid, bytes.fromhex(ON_HEX))
    print("Turned on")

async def toggle_off(client: BleakClient, uuid: str) -&amp;gt; None:
    await client.write_gatt_char(uuid, bytes.fromhex(OFF_HEX))
    print("Turned off")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we discovered from the logs, each colour command has a fixed template, so we can use f-strings to hardcode the common part and simply pass the hexadecimal of a colour for the value in the middle. This can be called from our loop. On and off had unique hexadecimals, so I created individual functions and passed in a constant containing the relevant hex.&lt;br&gt;
&lt;/p&gt;
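To make the template concrete, here is what the payload works out to for red (`"ff0000"`), using the same `7e070503…10ef` wrapper from the logs:

```python
# Substituting the colour hex into the fixed template gives a nine-byte payload
value = "ff0000"
payload = bytes.fromhex(f"7e070503{value}10ef")
print(payload.hex())  # 7e070503ff000010ef
print(len(payload))   # 9
```

Note this template is specific to my lights; other devices will almost certainly use a different command format, which is why capturing your own Bluetooth logs matters.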

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;while True:
    # Send the dominant colour to the device
    await send_colour_to_device(client, characteristic_uuid, get_dominant_colour())
    # Pause briefly before the next update; use asyncio.sleep rather than
    # time.sleep so we don't block the event loop
    await asyncio.sleep(0.1)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And there we have it: our Bluetooth LED lights are now controlled by the colours on the screen, creating our own Ambient Monitor.&lt;/p&gt;

&lt;p&gt;You can see the full code on &lt;a href="https://github.com/keyboardP/AmbientMonitor" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; which has a small amount of infrastructure code that wasn't specific to this post. I’ve tried to comment the code to be self-explanatory but feel free to ask any questions or make suggestions. &lt;/p&gt;

&lt;p&gt;Hopefully this gives you an idea on how you can start getting creative with your LED lights.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-colour&lt;/strong&gt; - My light strips can only show one colour at a time, but I have another set that has four quadrants, each with its own colour. This means it could be possible to have four sections of the lights match four sections of the screen, giving an even more accurate ambience. Those lights run on Wi-Fi instead of Bluetooth and could be a future project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brightness&lt;/strong&gt; – to keep it simple, I just looked for the colour changing and the on and off commands. However, this can easily be improved by detecting the brightness control commands and throwing that into the colour algorithm.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt; - As we want the lights to change in real time, performance is critical. There are some complex algorithms to detect which colour would be considered the most dominant, especially as perceived by humans (which leads to a whole world of colour conversions). However, since this needs to run quite quickly, there needs to be a balance between performance and accuracy. A future improvement could be to try to access the graphics card directly and read from the buffer rather than analysing the pixels on the screen. If this is possible, you would also eliminate the time taken going from the graphics buffer to the screen, which could improve the reaction time of the lights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feel free to comment below if you have any feedback or questions.&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
