<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shanelle</title>
    <description>The latest articles on DEV Community by Shanelle (@shanelle).</description>
    <link>https://dev.to/shanelle</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1104659%2Fd9681f46-f920-447b-939c-38194394ea5e.png</url>
      <title>DEV Community: Shanelle</title>
      <link>https://dev.to/shanelle</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shanelle"/>
    <language>en</language>
    <item>
      <title>Testing GPT-4 Vision with Lightrail</title>
      <dc:creator>Shanelle</dc:creator>
      <pubDate>Wed, 15 Nov 2023 00:03:26 +0000</pubDate>
      <link>https://dev.to/shanelle/testing-gpt-4-vision-with-lightrail-4ala</link>
      <guid>https://dev.to/shanelle/testing-gpt-4-vision-with-lightrail-4ala</guid>
      <description>&lt;h3&gt;
  
  
  Intro
&lt;/h3&gt;

&lt;p&gt;At its recent Developer Day, OpenAI announced several exciting updates to ChatGPT and its sister products. Alongside faster APIs and the GPT App Store, it also updated the GPT-4 model so that it can take images as input. Previously, GPT-4 was a text-only model; now it accepts both text and images as input and outputs text in response. As you can imagine, this opens up a variety of exciting applications.&lt;/p&gt;
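&lt;p&gt;For the curious, here’s roughly what a request to the vision-capable model looks like via the chat completions API. This is a minimal sketch based on OpenAI’s docs at the time of writing — the model name may change, the image URL is a placeholder, and no request is actually sent.&lt;/p&gt;

```python
# Sketch: the message shape OpenAI documented for GPT-4 with vision
# at the time of writing (the model name may change). Nothing is sent
# here; we only assemble the JSON payload.
import json

def build_vision_request(prompt: str, image_url: str) -> dict:
    return {
        "model": "gpt-4-vision-preview",  # vision-capable model name as of late 2023
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_vision_request(
    "Describe the UI in this screenshot.",
    "https://example.com/screenshot.png",  # placeholder image URL
)
print(json.dumps(payload, indent=2))
```

&lt;p&gt;The only structural change from text-only usage is that the user message’s &lt;code&gt;content&lt;/code&gt; becomes a list mixing text and image parts.&lt;/p&gt;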

&lt;p&gt;Today, I wanted to walk through some interesting applications of GPT-4 Vision with &lt;a href="https://github.com/lightrail-ai/lightrail#installation"&gt;Lightrail&lt;/a&gt;, an open-source Desktop app that I’ve been contributing to. It’s an AI command center for developers, with an always-on ChatGPT instance and integrations with apps like Jupyter, VSCode, &amp;amp; Chrome. I’ve personally found it an easier way to access an LLM and automatically give it the necessary context in my queries, without constantly copying &amp;amp; pasting into ChatGPT.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generate Code from Screenshots
&lt;/h3&gt;

&lt;p&gt;With the plethora of coding assistants on the market today, you can use LLMs to generate code from text instructions. For example, you can instruct GitHub Copilot to create a sign-up button in React or bootstrap a new microservice for an enterprise-scale codebase. However, if you wanted to clone a screenshot or replicate the color &amp;amp; component styling of one of your favorite websites, it was previously difficult to give ChatGPT or other coding assistants the necessary context.&lt;/p&gt;

&lt;p&gt;Now, with GPT-4 Vision and Lightrail, you can take a screenshot and describe what you would like to build or how you would like to modify the screenshot, and Lightrail can generate the relevant code right in your VSCode editor. The video below shows how a freelance frontend engineer might use Lightrail to tweak the styling of a React component based on a client request from a screenshot. &lt;/p&gt;

&lt;p&gt;&lt;iframe src="https://player.vimeo.com/video/882694040" width="710" height="399"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Ask Q’s about an Image
&lt;/h3&gt;

&lt;p&gt;Beyond using Lightrail’s vision feature to generate code, you can feed in images and ask questions about them. Think of it as a more powerful Google Lens that can be accessed across applications. Some of my favorite uses include analyzing charts &amp;amp; graphs, interpreting content in a PDF, and providing feedback on presentations or UX mockups.&lt;/p&gt;

&lt;p&gt;&lt;iframe src="https://player.vimeo.com/video/884606214" width="710" height="399"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s next?
&lt;/h3&gt;

&lt;p&gt;With the latest OpenAI release, it feels like every day we’re getting closer to a future of persistent AI assistants that can ingest and interpret content much the way we do. Imagine having a helpful voice on your shoulder as you navigate the digital world. With Lightrail, you can already access content from across your apps and save long-form text content to a local vectorDB to create a long-term memory for your LLM.&lt;/p&gt;

&lt;p&gt;However, I’d be interested to see how multi-modal AI continues to develop. I’d love to have a long-term memory I could search semantically across images, text, &amp;amp; video — all with the same accuracy as textual content. Similarly, GPT-4 can currently take images as input, but it would be amazing if it could output both images &amp;amp; text.&lt;/p&gt;

&lt;p&gt;Anyways, thanks so much for reading. If you’re interested in testing out Lightrail, you can download it &lt;a href="https://github.com/lightrail-ai/lightrail#installation"&gt;here&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>beginners</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Creating a Second Brain with Lightrail</title>
      <dc:creator>Shanelle</dc:creator>
      <pubDate>Fri, 03 Nov 2023 19:17:11 +0000</pubDate>
      <link>https://dev.to/shanelle/creating-a-second-brain-with-lightrail-oja</link>
      <guid>https://dev.to/shanelle/creating-a-second-brain-with-lightrail-oja</guid>
<description>&lt;p&gt;Ever since ChatGPT launched, it has been my go-to for various tasks like drafting emails, planning trips, and generating and debugging code. However, its memory is confined to each session in the chat window. I'm personally interested in creating a long-term memory system where I can add notes, papers, websites, etc., and then seamlessly search across them — a concept often referred to online as a second brain. There are intriguing products like &lt;a href="http://mem.ai/"&gt;Mem.ai&lt;/a&gt; and &lt;a href="https://new.computer/"&gt;Dot&lt;/a&gt; (which caught my eye on HN this week), but I steer clear of tools that host my data due to privacy concerns and the risk of losing access if they change pricing or product direction.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Lightrail&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I've been actively contributing to an open-source tool called &lt;a href="https://github.com/lightrail-ai/lightrail"&gt;Lightrail&lt;/a&gt;. It combines an always-on GPT-4 instance with seamless integration across various apps like Google Chrome and VSCode. However, my favorite new feature is the ability to create a personal knowledge base. I can add any text content, and Lightrail indexes and stores the information in an on-device vectorDB for use in future queries. It's open-source and local-first, making it a personal second brain that’s searchable with GPT-4. While I primarily use it for dev-focused tasks rather than miscellaneous notes, I thought it might be worth sharing a couple of my favorite hacks.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Indexing Dev Documentation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Like many on here, I use ChatGPT for coding-related tasks. However, I've found that Lightrail is particularly helpful when I want to implement an obscure library or leverage documentation updated after GPT-4's training cutoff (January 2022). I can crawl the latest docs and use them to generate code. For instance, given the base URL of the Supabase docs, Lightrail not only indexes the intro page but also follows the backlinks to ingest the entire documentation.&lt;/p&gt;
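&lt;p&gt;The crawl-and-follow-backlinks step can be sketched as a small breadth-first traversal over same-domain links. This is an illustrative sketch, not Lightrail’s actual crawler — page fetching is left as a user-supplied function so the traversal logic stands on its own.&lt;/p&gt;

```python
# Sketch of "index the base URL, then follow backlinks": collect
# same-domain links from each page, breadth-first, up to a page limit.
# `fetch` maps a URL to its HTML; a real crawler would download pages
# with urllib or requests.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the current page
                    self.links.append(urljoin(self.base_url, value))

def crawl(start_url, fetch, max_pages=50):
    """BFS over same-domain links; returns {url: html} for each page."""
    domain = urlparse(start_url).netloc
    seen, queue, pages = set(), [start_url], {}
    while queue and len(pages) != max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        pages[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            # Stay on the docs domain; skip external links
            if urlparse(link).netloc == domain and link not in seen:
                queue.append(link)
    return pages
```

&lt;p&gt;Each crawled page would then be chunked and embedded into the knowledge base for later retrieval.&lt;/p&gt;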

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QS5DmlL4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/phc6dyzfw7josfmtg5yb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QS5DmlL4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/phc6dyzfw7josfmtg5yb.png" alt="Add Docs to KB" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, I can reference this knowledge base to implement Supabase authentication in my VSCode project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SQSmF3K4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c6tup1y1595k4xjfl4q5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SQSmF3K4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c6tup1y1595k4xjfl4q5.png" alt="Use in Coding" width="800" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Simple Code Search&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I've also crafted a simple semantic code search. By feeding all the files in a local project into my knowledge base, I can use the [Send to AI] command to ask where payments are implemented, or to explain how a specific code snippet works in the context of the overall project — eliminating the need to continuously copy-paste files into ChatGPT.&lt;/p&gt;
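&lt;p&gt;The “feed all the files in a project into the knowledge base” step can be sketched as a hypothetical indexer that walks the project tree, skips non-source directories, and splits each file into overlapping chunks ready for embedding. The directory list and chunk sizes here are arbitrary illustrative choices, not Lightrail’s actual settings.&lt;/p&gt;

```python
# Sketch of indexing a local project for semantic search: walk the
# tree, skip obvious non-source directories, and split each file into
# overlapping line-based chunks that an embedding model could ingest.
import os

SKIP_DIRS = {".git", "node_modules", "__pycache__", "dist"}

def chunk_file(path, chunk_lines=40, overlap=10):
    with open(path, encoding="utf-8", errors="ignore") as f:
        lines = f.readlines()
    step = chunk_lines - overlap  # overlap keeps context across boundaries
    chunks = []
    for start in range(0, max(len(lines), 1), step):
        text = "".join(lines[start:start + chunk_lines])
        if text.strip():
            chunks.append({"path": path, "start_line": start + 1, "text": text})
    return chunks

def index_project(root):
    kb = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            kb.extend(chunk_file(os.path.join(dirpath, name)))
    return kb
```

&lt;p&gt;Recording the file path and starting line with each chunk lets answers point back to the exact spot in the codebase.&lt;/p&gt;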

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a4yyoz4O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tupgohpw2wepsdlsbmoe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a4yyoz4O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tupgohpw2wepsdlsbmoe.png" alt="Add Codebase" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gL0Olji6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3c79o4e13gfolwy36ou8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gL0Olji6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3c79o4e13gfolwy36ou8.png" alt="Simple Codebase" width="800" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Saving &amp;amp; Summarizing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In line with the second brain concept, I've started using Lightrail to save web pages, articles, and papers. Any long-form content becomes part of my knowledge base. I can then use Lightrail to summarize or ask questions. The token &lt;code&gt;kb.relevant-content&lt;/code&gt; dynamically pulls in the pertinent information from the knowledge base to answer my queries.&lt;/p&gt;
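&lt;p&gt;Conceptually, this token substitution works like retrieval-augmented prompting. The sketch below is illustrative — the &lt;code&gt;kb.relevant-content&lt;/code&gt; token name comes from Lightrail, but the template syntax and the &lt;code&gt;kb_search&lt;/code&gt; helper are hypothetical stand-ins, not the app’s real API.&lt;/p&gt;

```python
# Illustrative sketch of retrieval-augmented prompting: a template
# token is replaced with the top-k snippets pulled from the knowledge
# base before the prompt goes to the model. The kb.relevant-content
# token name follows the article; the surrounding machinery is made up.

def fill_template(template, query, kb_search, k=3):
    snippets = kb_search(query, k)       # retrieval step (user-supplied)
    context = "\n---\n".join(snippets)   # separate snippets clearly
    return template.replace("{{kb.relevant-content}}", context)

template = (
    "Answer using only the context below.\n"
    "Context:\n{{kb.relevant-content}}\n"
    "Question: How do I enable row-level security?"
)
```

&lt;p&gt;Because only the retrieved snippets are inlined, the prompt stays small no matter how large the knowledge base grows.&lt;/p&gt;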

&lt;p&gt;While it's early days for Lightrail, I'm excited about the promise of on-device personal AI companions. In an industry that is evolving day by day, I hold out hope for a more local-first and open-source approach to prevail. If you're interested in giving Lightrail a try, you can download it &lt;a href="https://github.com/lightrail-ai/lightrail"&gt;here&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Reviewing AI Code Search Tools</title>
      <dc:creator>Shanelle</dc:creator>
      <pubDate>Thu, 28 Sep 2023 16:24:44 +0000</pubDate>
      <link>https://dev.to/shanelle/reviewing-ai-code-search-tools-12c2</link>
      <guid>https://dev.to/shanelle/reviewing-ai-code-search-tools-12c2</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Over the past year, there’s been an explosion of AI-assisted coding tools — GitHub Copilot, Codeium, and Cursor, among others. Large language models (LLMs) have been applied to a variety of dev-tool use cases — debugging, generating code, data analysis, and more. However, one application that I’ve been particularly interested in is code search, especially over large enterprise-scale codebases. For me personally, the most annoying part of coding has always been onboarding onto a new codebase and keeping a map of the system in my head to figure out where to make my desired changes. I’ve seen a bunch of articles comparing the different code assistants, but I was curious whether any of the new AI-first code search tools were an improvement over plain old GitHub search.&lt;/p&gt;

&lt;p&gt;In this blog post, I’ll compare three distinct AI-first code search tools I recently came across: &lt;a href="https://about.sourcegraph.com/cody" rel="noopener noreferrer"&gt;Cody&lt;/a&gt; (developed by the late-stage startup Sourcegraph), &lt;a href="https://github.com/kantord/SeaGOAT" rel="noopener noreferrer"&gt;SeaGOAT&lt;/a&gt; (an open-source project that was trending on HN last week), and &lt;a href="https://bloop.ai/" rel="noopener noreferrer"&gt;Bloop&lt;/a&gt; (an early-stage YC startup). I’ll evaluate them along two dimensions: user-friendliness and accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Code Search Differs from Code Generation
&lt;/h3&gt;

&lt;p&gt;Before I delve into the comparison, let me quickly touch on why code search is different from code generation.&lt;/p&gt;

&lt;p&gt;Code search requires effectively retrieving all of the relevant files or snippets, rather than generating one of many correct answers. If you were to ask ChatGPT to generate a Tic-Tac-Toe game, many different versions of the code would produce a functioning game. On the other hand, if you’re searching a codebase for “all of the files that interface with the database,” you would expect the tool to exhaustively retrieve &lt;strong&gt;all&lt;/strong&gt; of the relevant files and snippets of code.&lt;/p&gt;
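&lt;p&gt;One way to make this requirement concrete is to score a search tool on recall against a hand-labeled set of relevant files, rather than eyeballing whether any single answer looks plausible. A hypothetical sketch with made-up file names:&lt;/p&gt;

```python
# Hypothetical evaluation sketch: code search should be scored on
# recall (did it return every relevant file?), not just on whether
# one plausible-looking answer came back.

def recall(retrieved, relevant):
    relevant = set(relevant)
    if not relevant:
        return 1.0
    return len(relevant.intersection(retrieved)) / len(relevant)

# Example: a tool returns 2 of the 3 files that touch the database.
score = recall(
    retrieved=["db/models.py", "db/session.py", "utils/log.py"],
    relevant=["db/models.py", "db/session.py", "db/migrations.py"],
)
print(score)
```

&lt;p&gt;A tool can look impressive on single answers while quietly missing a third of the relevant files — exactly the failure mode that matters for search.&lt;/p&gt;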

&lt;p&gt;In addition, code search tools need to index the entire codebase to retrieve relevant snippets effectively. If you type “implement Supabase authentication” into ChatGPT, it leverages all of the code it saw during training to generate an implementation. Search, by contrast, has to extract the correct snippets from your specific project — and including the entire codebase in the prompt is not an option for most non-hobby projects. At the time of writing, the longest context window for an LLM is approximately 100,000 tokens (courtesy of &lt;a href="https://www.anthropic.com/index/100k-context-windows" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;). To put this into perspective, companies like Square and Google have codebases with millions of lines of code. To get around this, most code search tools pre-index the codebase into a vector DB so they can quickly find the relevant code snippets.&lt;/p&gt;
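&lt;p&gt;The pre-index-then-search pattern can be sketched in a few lines. Note the stand-ins: a hashed bag-of-words replaces a learned embedding model, and a plain list replaces a real vector DB with an approximate-nearest-neighbor index — production tools use both.&lt;/p&gt;

```python
# Toy sketch of the pre-index-then-search pattern: every chunk is
# embedded once up front, and a query is answered by ranking chunks
# with cosine similarity. A hashed bag-of-words stands in for a real
# embedding model; a plain list stands in for a vector DB.
import math

DIM = 1024

def embed(text):
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[hash(token) % DIM] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def search(index, query, k=3):
    q = embed(query)
    ranked = sorted(index, key=lambda c: cosine(q, c["vec"]), reverse=True)
    return ranked[:k]

# Index once, search many times.
chunks = [
    "user login and session auth",
    "payment processing and card charge",
    "css grid layout helpers",
]
index = [{"text": t, "vec": embed(t)} for t in chunks]
top = search(index, "where do we charge the payment card", k=1)
```

&lt;p&gt;Because the expensive embedding step happens once at index time, each query only costs one embedding plus a similarity scan.&lt;/p&gt;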

&lt;h3&gt;
  
  
  SeaGOAT
&lt;/h3&gt;

&lt;p&gt;This was an open-source repo I saw trending on Hacker News. It seemed promising — a local-first semantic search engine. While the other code search tools proxy requests to OpenAI’s or other LLM providers’ servers, I liked that this one runs the model fully locally on my machine. However, I found that the semantic search did a poor job of pulling relevant snippets. For example, for the query “Where do I implement authentication?”, it pulled everything from the correct &lt;code&gt;payment-wrapper&lt;/code&gt; file to random library files. Ultimately, while I appreciated that it was local-first and open-source, its precision was too poor to be useful — each query returned pages of irrelevant content when three targeted files or snippets would have been more helpful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujufh6oq62yh58l0nszq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujufh6oq62yh58l0nszq.png" alt="Code Search with SeaGOAT"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Cody
&lt;/h3&gt;

&lt;p&gt;This is a new product by the late-stage code search startup Sourcegraph. I found the setup to be a little bit confusing — I had to download a VSCode extension and a separate Desktop app, as well as sign up for a Sourcegraph account. In the Desktop app, I could then select which Github repos (public or private) to index. &lt;/p&gt;

&lt;p&gt;In terms of accuracy, it did a good job of both understanding my queries &amp;amp; extracting relevant snippets (e.g. correctly identifying the relevant components when asked “Where is the Stripe integration for payments”). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxgz0x1zsnakcs1dsekd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxgz0x1zsnakcs1dsekd.png" alt="Code Search with Cody"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, it falls prey to the classic LLM hallucination issue — when asked where the code checks if a user has signed in for checkout, it responds with a snippet that does not exist. Similarly, when asked for all of the relevant files that interface with a database, it hallucinates several files that do not exist in the project. &lt;/p&gt;

&lt;h3&gt;
  
  
  Bloop
&lt;/h3&gt;

&lt;p&gt;From a UX perspective, this was my favorite tool — it had a simple onboarding experience (OAuth through Github) and also linked the relevant files in its responses so I could easily expand beyond the response to delve into the code. While it did not have a VSCode extension, I found the Desktop app easy enough to use. I also found that it did a much better job at not hallucinating responses — correctly identifying the list of files that interface with the database as well as that there were no user sign-in checks during the checkout process. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z2jnsuf21nje6bfuetz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z2jnsuf21nje6bfuetz.png" alt="Code Search with Bloop"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;My favorite was Bloop — it did a better job of not hallucinating, and I preferred its UX. As an aside, I found that using these tools in combination with GitHub Copilot or Perplexity was a little frustrating — I’m constantly switching between different tools for the same project. I keep wishing for one tool that wraps all of this functionality and context in a single entry point. I've been exploring some of these ideas in &lt;a href="https://github.com/lightrail-ai/lightrail" rel="noopener noreferrer"&gt;Lightrail&lt;/a&gt;, but it doesn't include code search (as discussed, that's a harder problem than code generation). Nevertheless, I'm excited about the possibilities in the field. Thanks for reading — I'd love to hear your thoughts and ideas!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
