<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aaron Ploetz</title>
    <description>The latest articles on DEV Community by Aaron Ploetz (@aploetz).</description>
    <link>https://dev.to/aploetz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F744821%2F368e3eea-905a-408f-9b40-60ecaf1e2e49.jpeg</url>
      <title>DEV Community: Aaron Ploetz</title>
      <link>https://dev.to/aploetz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aploetz"/>
    <language>en</language>
    <item>
      <title>Building a Weather App with a Raspberry Pi, Astra DB, and Langflow</title>
      <dc:creator>Aaron Ploetz</dc:creator>
      <pubDate>Fri, 14 Mar 2025 19:11:35 +0000</pubDate>
      <link>https://dev.to/datastax/building-a-weather-app-with-a-raspberry-pi-astra-db-and-langflow-1fdl</link>
      <guid>https://dev.to/datastax/building-a-weather-app-with-a-raspberry-pi-astra-db-and-langflow-1fdl</guid>
      <description>&lt;p&gt;To celebrate PI Day this year, we thought it would be fun to build something with a Raspberry Pi that uses Astra DB and/or Langflow. Fortunately, I have just the project in-mind: A weather application!&lt;/p&gt;

&lt;p&gt;To that end, our goal will be to use the National Weather Service’s (NWS) data API. Essentially, we will call this API to get the most recent weather data, store it in Astra DB, and display it on a simple front-end.&lt;/p&gt;

&lt;h2&gt;Requirements&lt;/h2&gt;

&lt;p&gt;To build our project, we’re going to need a few things. First of all, our development environment will use the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Java 17&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Spring Boot&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Maven&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Vaadin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An Astra DB account with an active database and Langflow instance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And of course, we’ll also need a Raspberry Pi. For this project, we used a &lt;a href="https://www.canakit.com/canakit-raspberry-pi-5-starter-kit-turbine-black.html?srsltid=AfmBOoom-Ao7vUEyjsHw82_uCTXSI9467dsXh8lqSu9nGP2zi-1K6zvg" rel="noopener noreferrer"&gt;CanaKit™ Raspberry Pi 5 Starter Kit PRO Turbine Black&lt;/a&gt; (4GB RAM / 128GB Micro SD).&lt;/p&gt;

&lt;h2&gt;The weather application&lt;/h2&gt;

&lt;p&gt;The weather application we will use can be found in this GitHub repository: &lt;a href="https://github.com/aar0np/weather-app/tree/main" rel="noopener noreferrer"&gt;https://github.com/aar0np/weather-app/tree/main&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This application originally appeared in Chapter 8 of the book &lt;a href="https://www.amazon.com/Code-Java-practical-efficient-applications/dp/9355519990/" rel="noopener noreferrer"&gt;Code with Java 21&lt;/a&gt;. The original project was designed to work with DataStax Astra DB, using the CQL protocol. Our fork of it is a bit different, as it can refresh its data view from either the Astra DB Data API or from a Langflow API endpoint.&lt;/p&gt;

&lt;p&gt;At its core, the project is a Java Spring Boot application with a Vaadin web front end; it also exposes two RESTful endpoints, which exist as much for testing as for functionality. One endpoint pulls the most recent update from the NWS API for a given weather station ID and stores it in Astra DB. The other retrieves the most recent reading from Astra DB for a particular station and year/month combination.&lt;/p&gt;

&lt;h3&gt;RESTful examples&lt;/h3&gt;

&lt;p&gt;Pulling the latest reading from the NWS for the station KMSP (Minneapolis/St. Paul International Airport), and storing it in Astra DB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X PUT http://127.0.0.1:8080/weather/astradb/api/latest/station/kmsp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This endpoint returns a response similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"stationId":"https://api.weather.gov/stations/KMSP","monthBucket":202503,"timestamp":"2025-03-07T22:53:00Z","readingIcon":"https://api.weather.gov/icons/land/day/few?size=medium","stationCoordinatesLatitude":-93.22,"stationCoordinatesLongitude":44.88,"temperatureCelsius":5.6,"windDirectionDegrees":310,"windSpeedKMH":20.52,"windGustKMH":0.0,"visibilityM":16090,"precipitationLastHour":0.0,"cloudCover":{"7620":"FEW","1830":"FEW"}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This endpoint pulls the latest reading for a specific station and year/month combination:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X GET http://127.0.0.1:8080/weather/astradb/api/latest/station/kmsp/month/202503
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This endpoint returns a response similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"stationId":"kmsp","monthBucket":202503,"timestamp":"2025-03-07T22:53:00Z","readingIcon":"https://api.weather.gov/icons/land/day/few?size=medium","stationCoordinatesLatitude":-93.22,"stationCoordinatesLongitude":44.88,"temperatureCelsius":5.6,"windDirectionDegrees":310,"windSpeedKMH":0.0,"windGustKMH":0.0,"visibilityM":0,"precipitationLastHour":0.0,"cloudCover":{"7620":"FEW","1830":"FEW"}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: The above RESTful GET call is also used to populate the web front end.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;The Astra DB Data API&lt;/h3&gt;

&lt;p&gt;First, create a new Astra DB database. An existing database also works, as long as it contains the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A keyspace named “weatherapp”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A non-vector collection named “weather_data”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are two primary controller methods that handle the above data calls. The first, named &lt;code&gt;putLatestAstraAPIData&lt;/code&gt;, handles the RESTful PUT call. It performs a GET call on the NWS API endpoint for the &lt;code&gt;stationid&lt;/code&gt; that was passed in, maps the payload to an Astra DB Data API &lt;code&gt;Document&lt;/code&gt; named &lt;code&gt;weatherDoc&lt;/code&gt;, and saves &lt;code&gt;weatherDoc&lt;/code&gt; in Astra DB (via the Data API). Finally, it maps the response to a &lt;code&gt;WeatherReading&lt;/code&gt; object named &lt;code&gt;currentReading&lt;/code&gt; and returns it. The code for the &lt;code&gt;putLatestAstraAPIData()&lt;/code&gt; method is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@PutMapping("/astradb/api/latest/station/{stationid}")
public ResponseEntity&amp;lt;WeatherReading&amp;gt; putLatestAstraAPIData(
             @PathVariable(value="stationid") String stationId) {

       LatestWeather response = restTemplate.getForObject(
                    "https://api.weather.gov/stations/" + stationId + 
                    "/observations/latest", LatestWeather.class);

       Document weatherDoc = mapLatestWeatherToDocument(response, stationId);

       // save weather reading
       collection.insertOne(weatherDoc);

       // build response
       WeatherReading currentReading =
                    mapLatestWeatherToWeatherReading(response);

      return ResponseEntity.ok(currentReading);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The other controller method, named &lt;code&gt;getLatestAstraAPIData&lt;/code&gt;, handles the RESTful GET call. This method takes the &lt;code&gt;stationid&lt;/code&gt; and the &lt;code&gt;monthBucket&lt;/code&gt; that were passed in, and uses the Data API to find any matching documents. As there might be multiple matches, the results are sorted in descending order by timestamp, and the top document is processed. This ensures that the latest document is mapped and returned.&lt;/p&gt;

&lt;p&gt;The code for the &lt;code&gt;getLatestAstraAPIData()&lt;/code&gt; method is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@GetMapping("/astradb/api/latest/station/{stationid}/month/{month}")
public ResponseEntity&amp;lt;WeatherReading&amp;gt; getLatestAstraAPIData(
             @PathVariable(value="stationid") String stationId,
             @PathVariable(value="month") int monthBucket) {

       Filter filters = Filters.and(eq("station_id",(stationId)),
                           (eq("month_bucket",monthBucket)));
       Sort sort = Sorts.descending("timestamp");
       FindOptions findOpts = new FindOptions().sort(sort);
       FindIterable&amp;lt;Document&amp;gt; weatherDocs = collection.find(filters, findOpts);
       List&amp;lt;Document&amp;gt; weatherDocsList = weatherDocs.all();

       if (weatherDocsList.size() &amp;gt; 0) {
              Document weatherTopDoc = weatherDocsList.get(0);
              WeatherReading currentReading =
                           mapDocumentToWeatherReading(weatherTopDoc);
              return ResponseEntity.ok(currentReading);
       }

       return ResponseEntity.ok(new WeatherReading());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;The Langflow API&lt;/h3&gt;

&lt;p&gt;This entire process can also work through Langflow. Open up Langflow, create a new flow, and pick the “Simple Agent” template. &lt;em&gt;This simple agent is all that we need.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61w0oze19un6hzdc8r8l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61w0oze19un6hzdc8r8l.png" alt="A sample flow created by selecting the “Simple Agent” template in Langflow." width="753" height="675"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A sample flow created by selecting the “Simple Agent” template in Langflow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The agent is built with the URL “tool,” which allows the agent to call out to external web addresses, including APIs. To expose this agent, we simply need to click on the “API” tab and make note of the Langflow endpoint URL. We will add this URL as an environment variable for our application.&lt;/p&gt;

&lt;p&gt;Inside our application, our call to Langflow is handled by a method named &lt;code&gt;askAgent()&lt;/code&gt;. Simply put, this method calls our Langflow API endpoint, maps the result, and returns it to the UI. The code for the &lt;code&gt;askAgent()&lt;/code&gt; method can be seen below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public WeatherReading askAgent (AgentRequest req) {

       String reqJSON = new Gson().toJson(req);
       HttpEntity&amp;lt;String&amp;gt; requestEntity =
                    new HttpEntity&amp;lt;&amp;gt;(reqJSON, langflowHeader);

       ResponseEntity&amp;lt;LangflowResponse&amp;gt; resp =
                    restTemplate.exchange(LANGFLOW_URL,
                    HttpMethod.POST,
                    requestEntity,
                    LangflowResponse.class);

       LangflowResponse lfResp = resp.getBody();
       LangflowOutput1[] outputs = lfResp.getOutputs();

       return mapLangflowResponseToWeatherReading(outputs);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The method inside our Vaadin UI code that calls &lt;code&gt;askAgent()&lt;/code&gt; is named &lt;code&gt;refreshLangflow()&lt;/code&gt; and is triggered by a button on the UI. It composes a message for our Langflow agent, sends it, and uses the data returned to refresh the UI. The code can be seen below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private void refreshLangflow() {

      String message = "Please retrieve the latest weather data (including the weather icon url) in a text format using this endpoint: "
+ "https://api.weather.gov/stations/" + stationId.getValue() + 
"/observations/latest";
       latestWeather = controller.askAgent(new AgentRequest(message));

       refreshData(latestWeather);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After reviewing the code, we should now be ready to build and configure our hardware.&lt;/p&gt;

&lt;h2&gt;Raspberry Pi&lt;/h2&gt;

&lt;p&gt;First of all, we will need to assemble the Pi. Fortunately, CanaKit has a great &lt;a href="https://www.canakit.com/pi5-case" rel="noopener noreferrer"&gt;setup video&lt;/a&gt; that walks through the entire process.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: I prefer to use CanaKit bundles, because they come with everything that you need, such as a Micro HDMI cable, a heat sink with fan, and a Micro SD card.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once the Pi is assembled and running, it will check for updates and reboot. When we get to the Raspberry Pi OS desktop, we will have a few things to install (using the Terminal application).&lt;/p&gt;

&lt;h3&gt;Java&lt;/h3&gt;

&lt;p&gt;For our application to run, we need a Java Virtual Machine (JVM). As we will also need to build our application locally, we’ll need a Java Development Kit (JDK) as well. In our case, our Pi had Java 17 installed, and this is sufficient for our purposes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: The Raspberry Pi OS makes it difficult to install and configure newer versions of Java. Fortunately, our project compiles just fine with Java 17.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;Maven&lt;/h3&gt;

&lt;p&gt;Maven is a build and dependency-management tool for Java. Our project was built with Maven, so we will need to install it as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install maven
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Git&lt;/h3&gt;

&lt;p&gt;Our Pi also had Git installed. After creating a new SSH key and adding it to our GitHub account, we should be able to clone the project repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone git@github.com:aar0np/weather-app.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will download the code and create a local directory for our application, where we can build and run it.&lt;/p&gt;

&lt;h2&gt;Putting it all together&lt;/h2&gt;

&lt;p&gt;First, we will &lt;code&gt;cd&lt;/code&gt; into our project directory and then build it with Maven:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd weather-app
mvn clean install -Pproduction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to define three environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export ASTRA_DB_API_ENDPOINT=https://not-real-us-east1.apps.astra.datastax.com
export ASTRA_DB_APP_TOKEN=AstraCS:wtqNOTglg:725REAL238dEITHER563486d
export ASTRA_LANGFLOW_URL=https://api.langflow.astra.datastax.com/lf/6f-not-real-9493/api/v1/run/060d2-not-real-caef?stream=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now run our application. Maven will have created a JAR file in the &lt;code&gt;weather-app/target&lt;/code&gt; directory, which we can run like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;java -jar target/weatherapp-0.0.1-SNAPSHOT.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A successful run should produce several log messages, the last of which should look similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025-03-07T20:02:02.259-06:00  INFO 53787 --- [WeatherApp] [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path '/'
2025-03-07T20:02:02.278-06:00  INFO 53787 --- [WeatherApp] [           main] c.d.weatherapp.WeatherappApplication     : Started WeatherappApplication in 1.795 seconds (process running for 2.147)
2025-03-07T20:02:04.855-06:00  INFO 53787 --- [WeatherApp] [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring DispatcherServlet 'dispatcherServlet'
2025-03-07T20:02:04.855-06:00  INFO 53787 --- [WeatherApp] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2025-03-07T20:02:04.856-06:00  INFO 53787 --- [WeatherApp] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Completed initialization in 0 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From another terminal window/tab, let’s add some data to the DB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X PUT http://127.0.0.1:8080/weather/astradb/api/latest/station/kmsp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, navigate to the application on port 8080: &lt;a href="http://127.0.0.1:8080/" rel="noopener noreferrer"&gt;http://127.0.0.1:8080/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Station ID is pre-populated with “kmsp” and the current year/month is auto-generated. Clicking either “Astra DB Refresh” or “Langflow Refresh” should produce something similar to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3i0uug0pq1ylomb71iz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3i0uug0pq1ylomb71iz.png" alt="Our finished application running on a Raspberry Pi" width="720" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: The Astra DB Refresh will be faster than the Langflow Refresh.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Depending on the intended usage, it might be preferable to add a crontab entry for the PUT call to keep the data recent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;crontab -e
...
15 * * * * curl -X PUT http://127.0.0.1:8080/weather/astradb/api/latest/station/kmsp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, we should have a working weather application running on a Raspberry Pi! And without having to mount any pesky sensors in the backyard. Want to try this yourself? Get started with &lt;a href="https://www.datastax.com/products/datastax-astra?utm_medium=byline&amp;amp;utm_campaign=building-a-weather-app-with-raspberry-pi-astra-db-langflow&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Astra DB&lt;/a&gt; and &lt;a href="https://www.datastax.com/products/langflow?utm_medium=byline&amp;amp;utm_campaign=building-a-weather-app-with-raspberry-pi-astra-db-langflow&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Langflow&lt;/a&gt; today!&lt;/p&gt;

&lt;p&gt;Happy Pi Day!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>langflow</category>
      <category>raspberrypi</category>
      <category>ai</category>
    </item>
    <item>
      <title>How to Build a Crystal Image Search App with Vector Search</title>
      <dc:creator>Aaron Ploetz</dc:creator>
      <pubDate>Mon, 29 Apr 2024 16:54:49 +0000</pubDate>
      <link>https://dev.to/datastax/how-to-build-a-crystal-image-search-app-with-vector-search-2lcl</link>
      <guid>https://dev.to/datastax/how-to-build-a-crystal-image-search-app-with-vector-search-2lcl</guid>
      <description>&lt;p&gt;There are lots of ways to leverage generative AI (GenAI) in a variety of business use cases at companies of all sizes. In this post, we will explore how a store selling crystals and precious stones can use DataStax’s RAGStack to help their customers to identify and find certain crystals. Specifically, we will walk through creating an application designed to help the customers of &lt;a href="https://healinghouseenergy.com/" rel="noopener noreferrer"&gt;Healing House Energy Spa&lt;/a&gt; (owned by the author’s wife). This will also demonstrate how small businesses can take advantage of GenAI.&lt;/p&gt;

&lt;h2&gt;What is RAGStack?&lt;/h2&gt;

&lt;p&gt;RAGStack is DataStax’s Python library that’s designed to help developers build advanced GenAI applications based on &lt;a href="https://www.datastax.com/guides/what-is-retrieval-augmented-generation?utm_source=dev-to&amp;amp;utm_medium=byline&amp;amp;utm_campaign=vector&amp;amp;utm_term=all-plays&amp;amp;utm_content=crystal-search" rel="noopener noreferrer"&gt;retrieval-augmented generation&lt;/a&gt; (RAG) techniques. These applications require developers to configure and access data parsers, large language models (LLMs), and vector databases. &lt;/p&gt;

&lt;p&gt;With RAGStack, developers can increase their productivity with GenAI toolsets by interacting with them through a single development stack. DataStax’s integrations with many commonly used libraries and providers enable developers to prototype and build applications faster than ever before. All of this happens on top of &lt;a href="https://www.datastax.com/products/datastax-astra?utm_source=dev-to&amp;amp;utm_medium=byline&amp;amp;utm_campaign=vector&amp;amp;utm_term=all-plays&amp;amp;utm_content=crystal-search" rel="noopener noreferrer"&gt;DataStax Astra DB&lt;/a&gt;, which is DataStax’s powerful, multi-region vector database (as shown in Figure 1).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudesepydw5kpm6lo6a1n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudesepydw5kpm6lo6a1n.png" alt="Image description" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1 - A high-level view of the Crystal Search application architecture, showing how it leverages RAGStack.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As Astra DB is a key component of RAGStack, we should spend some time discussing vector databases. These are special kinds of databases capable of storing vector data in native structures. When we build RAG applications, we interact with an LLM by using a “vectorized” version of our data. Essentially, the vectors returned are a numerical representation of the individual elements or “chunks” of our data. We will discuss this process in more detail below.&lt;/p&gt;

&lt;h2&gt;The Crystal Search application&lt;/h2&gt;

&lt;p&gt;Here we'll walk through how to build a simple web application to search an inventory of crystals (and other precious stones). We’ll load our data from a &lt;a href="https://github.com/aar0np/crystalSearch/blob/main/gemstones_and_chakras.csv" rel="noopener noreferrer"&gt;CSV file&lt;/a&gt;, and then query it from a Flask-based web application with navigation drop-downs and a search-by-image function.&lt;/p&gt;

&lt;p&gt;The crystals themselves have several properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name&lt;/strong&gt;: What the crystal is known as.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image&lt;/strong&gt;: The filename of the on-disk image of the crystal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chakras&lt;/strong&gt;: One or more of the seven centers of spiritual power in the human body that the crystal can help attune.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Birth month&lt;/strong&gt;: People with certain birth months will be more receptive to this crystal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zodiac sign&lt;/strong&gt;: People born under certain zodiac signs will be more receptive to this crystal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mohs hardness&lt;/strong&gt;: A measure of the crystal’s resistance to scratching.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For our drop-down navigation, we will use a crystal’s recommended chakras, birth month, and zodiac signs. The remaining properties will be added to the collection’s metadata (except for the image itself, which will be used to generate the crystal’s vector embedding).&lt;/p&gt;

&lt;p&gt;We will use the &lt;a href="https://huggingface.co/sentence-transformers/clip-ViT-B-32?utm_source=dev-to&amp;amp;utm_medium=byline&amp;amp;utm_campaign=vector&amp;amp;utm_term=all-plays&amp;amp;utm_content=crystal-search" rel="noopener noreferrer"&gt;CLIP&lt;/a&gt; model to generate our vector embeddings. CLIP (Contrastive Language-Image Pre-training) is a sentence transformer model (developed by OpenAI) used to store both images and text in the same vector space. The CLIP model is pre-trained with images and text descriptions, and enables us to return results using an &lt;a href="https://www.datastax.com/guides/what-is-nearest-neighbor?utm_source=dev-to&amp;amp;utm_medium=byline&amp;amp;utm_campaign=vector&amp;amp;utm_term=all-plays&amp;amp;utm_content=crystal-search" rel="noopener noreferrer"&gt;approximate nearest neighbor&lt;/a&gt; (ANN) algorithm. Leveraging CLIP in this way allows us to support an “identify this crystal” function, where users will be able to search with a picture from their device.&lt;/p&gt;
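To make the nearest-neighbor idea concrete, here is a minimal pure-Python sketch that ranks a handful of made-up four-dimensional embeddings (stand-ins for real 512-dimensional CLIP vectors) against a query embedding by cosine similarity. The crystal names and vector values are illustrative only, not real CLIP output.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity: the dot product of the vectors divided by the
    product of their magnitudes (1.0 means identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def nearest_neighbor(query, catalog):
    """Return the catalog entry whose embedding is most similar to the query."""
    return max(catalog, key=lambda item: cosine_similarity(query, item["embedding"]))

# Made-up 4-dimensional embeddings standing in for 512-dimensional CLIP vectors
catalog = [
    {"name": "amethyst",    "embedding": [0.9, 0.1, 0.0, 0.2]},
    {"name": "rose quartz", "embedding": [0.1, 0.8, 0.3, 0.0]},
    {"name": "citrine",     "embedding": [0.0, 0.2, 0.9, 0.4]},
]

# An "uploaded photo" whose hypothetical embedding lands closest to amethyst
query_embedding = [0.8, 0.2, 0.1, 0.1]
match = nearest_neighbor(query_embedding, catalog)
print(match["name"])  # amethyst
```

In the real application, CLIP produces the embeddings and Astra DB performs this ranking server-side (approximately, over many more documents); the toy above only shows the geometry behind the search.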

&lt;h2&gt;Requirements&lt;/h2&gt;

&lt;p&gt;Before building our application, let’s make sure that we properly configure our development environment. We will start by making sure that our Python version is at least 3.9. We will also need the following libraries (and versions), as specified in our &lt;a href="https://github.com/aar0np/crystalSearch/blob/main/requirements.txt" rel="noopener noreferrer"&gt;&lt;code&gt;requirements.txt&lt;/code&gt;&lt;/a&gt; file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Flask==2.3.2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Flask-WTF==1.2.1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sentence-transformers==2.2.2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ragstack-ai==0.8.0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;python-dotenv==1.0.0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can install all of these with pip:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Flask directory structure&lt;/h2&gt;

&lt;p&gt;As we are working with a Flask web application, we will need the following directory structure, with &lt;code&gt;crystalSearch&lt;/code&gt; as the “root” of the project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;crystalSearch/
      templates/
      static/
            images/
            input_images/
            web_images/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;DataStax Astra DB&lt;/h2&gt;

&lt;p&gt;First, we need to sign up for a free account with &lt;a href="https://www.datastax.com/products/datastax-astra?utm_source=dev-to&amp;amp;utm_medium=byline&amp;amp;utm_campaign=vector&amp;amp;utm_term=all-plays&amp;amp;utm_content=crystal-search" rel="noopener noreferrer"&gt;DataStax Astra DB&lt;/a&gt;, and create a new vector database. Once we have our Astra DB vector database, we will make note of the token and API endpoint. We will define those as environment variables in the next section.&lt;/p&gt;

&lt;h2&gt;Environment variables&lt;/h2&gt;

&lt;p&gt;For our application to run properly, we'll need to set some environment variables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ASTRA_DB_API_ENDPOINT&lt;/code&gt; - Connection endpoint for our Astra DB vector database instance.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ASTRA_DB_APPLICATION_TOKEN&lt;/code&gt; - Security token used to authenticate to our Astra DB instance.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FLASK_APP&lt;/code&gt; - The name of the application’s primary Python file in a Flask web project.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FLASK_ENV&lt;/code&gt; - Indicates to Flask if the application is in development or production mode.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, the easiest way to do that is with a &lt;code&gt;.env&lt;/code&gt; file. Our &lt;code&gt;.env&lt;/code&gt; file should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ASTRA_DB_API_ENDPOINT=https://notreal-blah-4444-blah-blah-region.apps.astra.datastax.com
ASTRA_DB_APPLICATION_TOKEN=AstraCS:NotReal:ButYourTokenWillLookSomethingLikeThis
FLASK_APP=crystalSearch
FLASK_ENV=development
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting the &lt;code&gt;FLASK_APP&lt;/code&gt; variable to “crystalSearch” is important, as it tells Flask which Python module is the primary entrypoint to the application.&lt;/p&gt;

&lt;h2&gt;crystalLoader.py&lt;/h2&gt;

&lt;p&gt;With our database and environment all set up, we can build our Python data loader. Create a new Python file named &lt;code&gt;crystalLoader.py&lt;/code&gt;, and set up its imports like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import csv
import json

from os import path, environ
from dotenv import load_dotenv
from PIL import Image
from astrapy.db import AstraDB
from sentence_transformers import SentenceTransformer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will start by bringing in the environment variables from our &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;basedir = path.abspath(path.dirname(__file__))
load_dotenv(path.join(basedir, '.env'))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we will pull in the API endpoint and token, instantiate a database connection object, and then create a new collection named “crystal_data”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Astra connection
ASTRA_DB_APPLICATION_TOKEN = environ.get("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_API_ENDPOINT = environ.get("ASTRA_DB_API_ENDPOINT")

db = AstraDB(
    token=ASTRA_DB_APPLICATION_TOKEN,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
)

# create "collection"
col = db.create_collection("crystal_data", dimension=512, metric="cosine")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that our collection stores 512-dimensional vectors, matching the dimensions of the embeddings created with the CLIP model. Astra DB supports ANN searches with a cosine, dot product, or Euclidean algorithm. For our purposes, a cosine-based ANN will be fine.&lt;/p&gt;
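As a quick illustration of why the metric choice matters, here is a small pure-Python sketch (the vectors are illustrative only): cosine similarity compares only the direction of two vectors, so scaling an embedding leaves the score unchanged, while Euclidean distance still reports the scaled copy as far away.

```python
from math import sqrt, dist  # math.dist (Euclidean distance) requires Python 3.8+

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vectors' magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

v = [1.0, 2.0, 3.0]
w = [2.0, 4.0, 6.0]  # same direction as v, twice the magnitude

print(round(cosine_similarity(v, w), 6))  # 1.0 -- identical direction
print(round(dist(v, w), 3))               # 3.742 -- Euclidean still sees them as distant
```

Since embedding models encode meaning largely in direction rather than magnitude, cosine is a common default for this kind of similarity search.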

&lt;p&gt;Next, we will define some constants to help our loader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model = SentenceTransformer('clip-ViT-B-32')
IMAGE_DIR = "static/images/"
CSV = "gemstones_and_chakras.csv"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These instantiate the clip-ViT-B-32 model locally, define the location of our images, and set the data filename, respectively.&lt;/p&gt;

&lt;p&gt;Now let’s open the CSV file in a &lt;code&gt;with&lt;/code&gt; block and initialize the data reader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open(CSV) as csvHandler:
    crystalData = csv.reader(csvHandler)
    # skip header row
    next(crystalData)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our CSV file has a header row that we will skip at read-time. Python’s built-in &lt;code&gt;next()&lt;/code&gt; function is an easy way to advance the reader past it.&lt;/p&gt;
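&lt;p&gt;As a quick illustration of this header-skipping pattern (using an in-memory CSV so the snippet is self-contained; the column values are hypothetical):&lt;/p&gt;

```python
import csv
import io

# a hypothetical two-line CSV: one header row, one data row
sample = io.StringIO("gemstone,image\nAmethyst,amethyst.jpg\n")
reader = csv.reader(sample)
next(reader)  # consume (skip) the header row
rows = list(reader)
print(rows)  # [['Amethyst', 'amethyst.jpg']]
```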

&lt;p&gt;With that complete, we can now use a &lt;code&gt;for&lt;/code&gt; loop to work through the remaining lines in the file. We will first read each line’s &lt;code&gt;image&lt;/code&gt; column. As our application is very image-centric, we do not want to spend time processing a line if it doesn’t have a valid image. We will use an &lt;code&gt;if&lt;/code&gt; conditional to make sure that the file referenced by the &lt;code&gt;image&lt;/code&gt; column &lt;em&gt;is both&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;not empty&lt;/li&gt;
&lt;li&gt;a valid file that exists
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for line in crystalData:

    image = line[1]
    # Only load crystals with images
    if image != "" and path.exists(IMAGE_DIR + image):
        # map columns
        gemstone = line[0]
        alt_name = line[2]
        chakras = line[3]
        phys_attributes = line[4]
        emot_attributes = line[5]
        meta_attributes = line[6]
        origin = line[7]
        description = line[8]
        birth_month = line[9]
        zodiac_sign = line[10]
        mohs_hardness = line[11]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the image for each line in the CSV file is indeed valid, we will then map the remaining columns to local variables.&lt;/p&gt;

&lt;p&gt;Two of our variables, &lt;code&gt;chakras&lt;/code&gt; and &lt;code&gt;mohs_hardness&lt;/code&gt;, will require some extra processing before being written into Astra DB. Our chakra data comes from the file as a comma-delimited list, because a crystal can affect multiple chakras. We will need to reconstruct it into an array with each item wrapped in quotation marks, so that it is recognized as valid JSON. To do that, we simply replace each comma delimiter with &lt;code&gt;","&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            # reformat chakras to be more JSON-friendly
            chakras = chakras.replace(', ','","')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will not make it valid JSON on its own, so we will account for that later when we write the chakra data.&lt;/p&gt;
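&lt;p&gt;To see the full round-trip, here is a quick sketch: the replaced string only becomes valid JSON once it is wrapped in an opening &lt;code&gt;["&lt;/code&gt; and a closing &lt;code&gt;"]&lt;/code&gt;, which is exactly what the document-building step does later (the sample value is hypothetical):&lt;/p&gt;

```python
import json

chakras = "Heart, Crown, Third Eye"  # hypothetical CSV value
chakras = chakras.replace(', ', '","')
# wrap in ["..."] to complete the JSON array, as the loader does later
chakra_array = json.loads('["' + chakras + '"]')
print(chakra_array)  # ['Heart', 'Crown', 'Third Eye']
```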

&lt;p&gt;Precious stones all have a rating on the &lt;a href="https://en.wikipedia.org/wiki/Mohs_scale" rel="noopener noreferrer"&gt;Mohs hardness scale&lt;/a&gt;, which indicates their resistance to scratching. While some crystals in our data set have a single integer value, several occupy a range on the scale (with the minimum listed first), indicating a minimum and a maximum Mohs hardness. We will split out these values and store them as &lt;code&gt;mohs_min_hardness&lt;/code&gt; and &lt;code&gt;mohs_max_hardness&lt;/code&gt;, respectively. Do note that sometimes the &lt;code&gt;mohs_hardness&lt;/code&gt; column will have a value of “Variable” or “Varies,” so we will account for that possibility as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            # split out minimum and maximum mohs hardress
            mh_list = mohs_hardness.split('-')
            mohs_min_hardness = 1.0
            mohs_max_hardness = 9.0
            if mh_list[0][0:4] != 'Vari':
                mohs_min_hardness = mh_list[0]
                mohs_max_hardness = mh_list[0]
                if len(mh_list) &amp;gt; 1:
                    mohs_max_hardness = mh_list[1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With our data prepared, we can now build each crystal’s text and metadata properties:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            metadata = (f"gemstone: {gemstone}")

            text = (f"gemstone: {gemstone}| alternate name: {alt_name}| physical attributes: {phys_attributes}| emotional attributes: {emot_attributes}| metaphysical attributes: {meta_attributes}| origin: {origin}| maximum mohs hardness: {mohs_max_hardness}| minimum mohs hardness: {mohs_min_hardness}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we can load the crystal’s image using Pillow (Python’s image processing library) and generate a vector embedding for it with the &lt;code&gt;encode()&lt;/code&gt; function from our CLIP &lt;code&gt;model&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            img_emb = model.encode(Image.open(IMAGE_DIR + image))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With all that complete, we are ready to build our local JSON document as a string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            strJson = (f' {{"_id":"{image}","text":"{text}","chakra":["{chakras}"],"birth_month":"{birth_month}","zodiac_sign":"{zodiac_sign}","$vector":{str(img_emb.tolist())}}}')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we can convert each crystal’s data to JSON and write it into Astra DB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            doc = json.loads(strJson)
            col.insert_one(doc)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  crystalSearch.py
&lt;/h2&gt;

&lt;p&gt;To demonstrate the visual aspects of Crystal Search, we will stand up a simple web application using Flask. This interface will have a few simple components, including dropdowns (for navigation) and a way to upload an image for searching.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: As web front-end development is not the focus, we’ll skip the implementation details. For those who are interested, the code can be accessed in the project repository listed at the end of this post.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  astraConn.py
&lt;/h2&gt;

&lt;p&gt;Now that our data has been loaded, we can build the Crystal Search application. First, we will construct the &lt;code&gt;astraConn&lt;/code&gt; module, which will act as an abstraction layer for our interactions with the Astra DB vector database. We will create a new file named &lt;code&gt;astraConn.py&lt;/code&gt; and add the following two imports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

from astrapy.db import AstraDB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we will pull in our &lt;code&gt;ASTRA_DB_APPLICATION_TOKEN&lt;/code&gt; and &lt;code&gt;ASTRA_DB_API_ENDPOINT&lt;/code&gt; variables from our system environment, and instantiate them locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ASTRA_DB_APPLICATION_TOKEN = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_API_ENDPOINT= os.environ.get("ASTRA_DB_API_ENDPOINT")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This module will have a few different methods that will be called by our application, but we won’t want to rebuild our database connection each time. Therefore, we will create two global variables (&lt;code&gt;db&lt;/code&gt; and &lt;code&gt;collection&lt;/code&gt;) to keep data pertaining to our database cached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db = None
collection = None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first method that we will define is the &lt;code&gt;init_collection()&lt;/code&gt; method, which will be called by every other method in this module. It first declares the &lt;code&gt;db&lt;/code&gt; and &lt;code&gt;collection&lt;/code&gt; variables as global so they can be assigned. Its primary function is to instantiate the &lt;code&gt;db&lt;/code&gt; object if it is &lt;code&gt;None&lt;/code&gt;; this way, an existing connection object can be reused. The code for this method is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def init_collection(table_name):
    global db
    global collection

    if db is None:
        db = AstraDB(
            token=ASTRA_DB_APPLICATION_TOKEN,
            api_endpoint=ASTRA_DB_API_ENDPOINT,
        )

    collection = db.collection(table_name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that the &lt;code&gt;collection&lt;/code&gt; variable will be instantiated on every call. This allows us the flexibility to access different collections in Astra DB with the same database connection information.&lt;/p&gt;

&lt;p&gt;For our application, there are three ways that we will perform reads on our data. We will search by vector, query by id, and then query by three additional properties that we are going to build into dropdowns in our web application.&lt;/p&gt;

&lt;p&gt;First, we will build the &lt;code&gt;get_by_vector()&lt;/code&gt; method. This asynchronous method will accept a collection name, a vector embedding, and a maximum (&lt;code&gt;limit&lt;/code&gt;) number of results to be returned (defaulting to 1). After initializing our database and collection, we will invoke the &lt;code&gt;vector_find()&lt;/code&gt; method with the &lt;code&gt;vector_embedding&lt;/code&gt;, the &lt;code&gt;limit&lt;/code&gt;, and the list of fields from the collection that we want to receive. We will then return the &lt;code&gt;results&lt;/code&gt; to the calling method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def get_by_vector(collection_name, vector_embedding, limit=1):
    init_collection(collection_name)

    results = collection.vector_find(vector_embedding.tolist(), limit=limit, fields={"text","chakra","birth_month","zodiac_sign","$vector"})
    return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our &lt;code&gt;get_by_id()&lt;/code&gt; method will be similar to the previous one, but will work quite differently under the hood. This method is also meant to be called asynchronously, and accepts a collection name as well as the identifier to be queried. As querying by a unique identifier is deterministic, we can invoke the &lt;code&gt;find_one()&lt;/code&gt; method with a filter for the specific &lt;code&gt;id&lt;/code&gt;, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def get_by_id(collection_name, id):
    init_collection(collection_name)

    result = collection.find_one(filter={"_id": id})
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method will return a single JSON document as the &lt;code&gt;result&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Finally, &lt;code&gt;get_by_dropdowns()&lt;/code&gt; is an asynchronous method that will return all matching rows based on the values of three properties: chakras, birth month, and zodiac sign. First, we will build an array to hold our &lt;code&gt;conditions&lt;/code&gt;. This is necessary because not every dropdown is going to be used each time. That way, we can dynamically build our conditions based on the state of the dropdowns at query-time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def get_by_dropdowns(collection_name, chakra, birth_month, zodiac_sign):
    init_collection(collection_name)

    conditions = []

    if chakra != "--Chakra--":
        condition_chakra = {"chakra": {"$in": [chakra]}}
        conditions.append(condition_chakra)

    if birth_month != "--Birth Month--":
        condition_birth_month = {"birth_month": birth_month}
        conditions.append(condition_birth_month)

    if zodiac_sign != "--Zodiac Sign--":
        condition_zodiac_sign = {"zodiac_sign": zodiac_sign}
        conditions.append(condition_zodiac_sign)

    crystal_filter = ""

    if len(conditions) &amp;gt; 2:
        crystal_filter = {"$and": [{"$and": [conditions[0], conditions[1]]}, conditions[2]]}
    elif len(conditions) &amp;gt; 1:
        crystal_filter = {"$and": [conditions[0], conditions[1]]}
    elif len(conditions) &amp;gt; 0:
        crystal_filter = conditions[0]
    else:
        return 

    results = collection.find(crystal_filter)
    return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the &lt;code&gt;conditions&lt;/code&gt; array is built, we can then build &lt;code&gt;crystal_filter&lt;/code&gt; to use as our query filter. To pass a filter with multiple conditions through Astra DB’s Data API, we need to build a nested conditional statement.&lt;/p&gt;

&lt;p&gt;A single condition could be sent as a filter on its own. But two would need to use the &lt;code&gt;$and&lt;/code&gt; operator. If we were to hard-code our filter, it would be similar to this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;crystal_filter = {"$and": [{"birth_month": "October"}, {"zodiac_sign": "Libra"}]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, this also means that three conditions would require a nested &lt;code&gt;$and&lt;/code&gt; (one &lt;code&gt;$and&lt;/code&gt; inside of another), like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;crystal_filter = {"$and": [{"$and": [{"birth_month": "October"}, {"zodiac_sign": "Libra"}]}, {"chakra": {"$in": ["Heart"]}}]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that as each crystal’s &lt;code&gt;chakra&lt;/code&gt; property is an array, we need to use the &lt;code&gt;$in&lt;/code&gt; operator.&lt;/p&gt;
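&lt;p&gt;If we ever needed to support more than three dropdowns, the pairwise nesting could be generalized instead of hard-coded. Here is a sketch under the assumption that the Data API accepts arbitrarily nested two-element &lt;code&gt;$and&lt;/code&gt; clauses, which is the shape of the filters shown above (&lt;code&gt;build_filter&lt;/code&gt; is a hypothetical helper, not part of the application code):&lt;/p&gt;

```python
from functools import reduce

def build_filter(conditions):
    # fold a list of conditions into nested two-element $and clauses,
    # matching the shape of the hard-coded filter examples
    if not conditions:
        return None
    return reduce(lambda acc, cond: {"$and": [acc, cond]}, conditions)

conditions = [
    {"birth_month": "October"},
    {"zodiac_sign": "Libra"},
    {"chakra": {"$in": ["Heart"]}},
]
print(build_filter(conditions))
```

&lt;p&gt;With three conditions, this produces the same nested filter as the hard-coded example; with one condition, it returns that condition unchanged.&lt;/p&gt;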

&lt;h2&gt;
  
  
  crystalServices.py
&lt;/h2&gt;

&lt;p&gt;Next, we will create a new file named &lt;code&gt;crystalServices.py&lt;/code&gt; with the following imports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import os

from astraConn import get_by_vector
from astraConn import get_by_id
from astraConn import get_by_dropdowns
from sentence_transformers import SentenceTransformer
from PIL import Image

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will also define some local variables for our image directory, the name of our collection in Astra DB, and our CLIP model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INPUT_IMAGE_DIR = "static/input_images/"
DATA_COLLECTION_NAME = "crystal_data"
model = None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our service layer will expose two asynchronous methods. The first method that we will build is named &lt;code&gt;get_crystals_by_image&lt;/code&gt;, and it accepts an image filename as a parameter. It is primarily responsible for generating a vector embedding from an image, using that embedding to invoke a vector similarity search, and returning the results to the view. This method needs the &lt;code&gt;model&lt;/code&gt; global variable, and will instantiate it if required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def get_crystals_by_image(file_path):
    global model

    if model is None:
        model = SentenceTransformer('clip-ViT-B-32')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we will define our result set variable as an empty dictionary. Then we will load the image, generate an embedding for it, and use it to call the &lt;code&gt;get_by_vector()&lt;/code&gt; method from &lt;code&gt;astraConn.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    results = {}        
    img_emb = model.encode(Image.open(INPUT_IMAGE_DIR + file_path))
    crystal_data = await get_by_vector(DATA_COLLECTION_NAME, img_emb, 3)

    if crystal_data is not None:
        for crystal in crystal_data:
            id = crystal['_id']
            results[id] = parse_crystal_data(crystal)

    return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we will process and return the vector search results. Note that the &lt;code&gt;parse_crystal_data()&lt;/code&gt; method does much of the heavy-lifting of building the result set. We will construct that method toward the end of this module.&lt;/p&gt;

&lt;p&gt;We will now move on to the &lt;code&gt;get_crystals_by_facets()&lt;/code&gt; method. This method accepts the values taken from three dropdown lists containing data for chakras, birth month, and zodiac sign. Similar to the prior method, we will define an empty dictionary for the results and perform a query on our data, before processing and returning the &lt;code&gt;results&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def get_crystals_by_facets(chakra, birth_month, zodiac_sign):
 results = {}
 crystal_data = await get_by_dropdowns(DATA_COLLECTION_NAME, chakra, birth_month, zodiac_sign)

 if crystal_data is not None:
  for crystal in crystal_data['data']['documents']:
   id = crystal['_id']
   results[id] = parse_crystal_data(crystal)

 return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are also two additional code blocks required to more easily transfer our data back up to the view layer. The first is the &lt;code&gt;parse_crystal_data()&lt;/code&gt; method. This method is fairly straightforward in that it takes the raw crystal data as a parameter, and processes each property into a new object of the Crystal class. As the final part of this module, we also need to add the Crystal object class. They will not be shown here, but both of these definitions can be found at the end of the crystalServices.py module.&lt;/p&gt;
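&lt;p&gt;For reference, here is a minimal sketch of what those two definitions might look like. The field names are assumptions based on the columns queried earlier in this post; the actual definitions live in the project repository:&lt;/p&gt;

```python
class Crystal:
    # hypothetical holder for the properties returned to the view layer
    def __init__(self, text, chakra, birth_month, zodiac_sign):
        self.text = text
        self.chakra = chakra
        self.birth_month = birth_month
        self.zodiac_sign = zodiac_sign

def parse_crystal_data(crystal):
    # map the raw document properties onto a Crystal object,
    # defaulting any missing property to a sensible empty value
    return Crystal(
        text=crystal.get("text", ""),
        chakra=crystal.get("chakra", []),
        birth_month=crystal.get("birth_month", ""),
        zodiac_sign=crystal.get("zodiac_sign", ""),
    )

doc = {"_id": "obsidian.jpg", "text": "gemstone: Obsidian", "chakra": ["Root"]}
crystal = parse_crystal_data(doc)
print(crystal.chakra)  # ['Root']
```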

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Let’s see this in action. We will run the application with Flask. The complete code listed above (including all of the front end components) can be found in &lt;a href="https://github.com/aar0np/crystalSearch" rel="noopener noreferrer"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To run the application, we will use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flask run -p 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it starts correctly, Flask should display the application name, address and port that it is bound to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; * Serving Flask app 'crystalSearch'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:8080

Press CTRL+C to quit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we navigate to that address in a browser, we should see a simple web page with a search interface at the top, and three differently-colored dropdowns in the left navigation. If we select values for the dropdowns and click on the “Find Crystals” button, we should see crystals matching those values returned (Figure 2).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn43lydb32h13rzrtt3rw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn43lydb32h13rzrtt3rw.png" alt="Image description" width="800" height="695"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2 - Results for crystals matching the dropdown values where chakra is “Heart”, birth month is “October,” and zodiac sign is “Libra.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Of course, we can also search with an image. Perhaps we have a picture of a crystal that we cannot identify. We can click on the “Choose File” button, select our image, and then click “Search” to see what the closest matches are. If our picture is of a black obsidian crystal, we will see results similar to Figure 3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuse1px1x666tew1fpfsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuse1px1x666tew1fpfsr.png" alt="Image description" width="800" height="741"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 3 - Results for crystals matching our image of a black obsidian crystal.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we have demonstrated another possible use case for an image-based search built with RAGStack and Astra DB. We walked through this unique use case: how to configure the development environment, load and query data using CLIP, and build an application that leverages image-based vector embeddings. We also showed how to use the Astra DB &lt;a href="https://docs.datastax.com/en/astra/astra-db-vector/api-reference/overview.html?utm_source=dev-to&amp;amp;utm_medium=byline&amp;amp;utm_campaign=vector&amp;amp;utm_term=all-plays&amp;amp;utm_content=crystal-search" rel="noopener noreferrer"&gt;Data API&lt;/a&gt; to implement a simple product faceting approach using dropdowns.&lt;/p&gt;

&lt;p&gt;As the world continues to embrace GenAI, we will surely see more and more creative use cases spanning multiple industries. Searching by images using CLIP is one of the ways in which we are pushing the boundaries of conventional data applications. With solutions like RAGStack and Astra DB, DataStax continues to help you build the next generation of applications. &lt;/p&gt;

&lt;p&gt;Do you have an idea for a great use of GenAI? Pull down &lt;a href="https://docs.datastax.com/en/ragstack/docs/index.html?utm_source=dev-to&amp;amp;utm_medium=byline&amp;amp;utm_campaign=vector&amp;amp;utm_term=all-plays&amp;amp;utm_content=crystal-search" rel="noopener noreferrer"&gt;RAGStack&lt;/a&gt; and start using &lt;a href="https://www.datastax.com/products/datastax-astra?utm_source=dev-to&amp;amp;utm_medium=byline&amp;amp;utm_campaign=vector&amp;amp;utm_term=all-plays&amp;amp;utm_content=crystal-search" rel="noopener noreferrer"&gt;Astra DB&lt;/a&gt; with a free account today!&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>webdev</category>
      <category>python</category>
    </item>
    <item>
      <title>The Distributed Data Problem</title>
      <dc:creator>Aaron Ploetz</dc:creator>
      <pubDate>Tue, 18 Oct 2022 18:33:20 +0000</pubDate>
      <link>https://dev.to/datastax/the-distributed-data-problem-52gj</link>
      <guid>https://dev.to/datastax/the-distributed-data-problem-52gj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjyf9lcud41h6jpiavzz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjyf9lcud41h6jpiavzz.jpg" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Getty Images&lt;/p&gt;

&lt;p&gt;Today, online retailers sell millions of products and services to customers all around the world. This was never more apparent than in 2020, when COVID-19 restrictions all but eliminated visits to brick-and-mortar stores and in-person transactions. Of course, consumers still needed to purchase food, clothing, and other essentials and, as a result, worldwide digital sales channels rose to the tune of &lt;a href="https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/" rel="noopener noreferrer"&gt;$4.2 trillion&lt;/a&gt;, up $900 billion from just a year prior.&lt;/p&gt;

&lt;p&gt;Was it enough for those retailers to have robust websites and mobile apps to keep their customers from shopping with competitors? Unfortunately, no. Looking across the ecommerce landscape of 2020, there were clear winners and losers. But what was the deciding factor?&lt;/p&gt;

&lt;p&gt;Consider this: 40% of consumers will leave after only &lt;em&gt;&lt;a href="https://www.thinkwithgoogle.com/marketing-strategies/app-and-mobile/mobile-page-speed-load-time/" rel="noopener noreferrer"&gt;three seconds&lt;/a&gt;&lt;/em&gt; if a web page or mobile app fails to fully load. That’s not a lot of time to make sure that everything renders properly. While there is a lot to be said for properly optimizing images and code, all that work can be for naught if data latency consumes a significant portion of load time.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Co-location helps—but can pose new challenges&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A good way to cut down on data latency is to deploy databases in the same geographic regions as the applications they serve. For example, an enterprise headquartered in San Diego may have applications deployed in data centers located in the Western United States (US), the Eastern US, and Western Europe. The customers who live in those regions are directed to the applications and services closest to them.&lt;/p&gt;

&lt;p&gt;But what kind of experience will a customer in London have if the app performs quickly, but then slows? Or worse, what if it stalls when it has to make a data call back to San Diego? This is why co-locating an application with its data is so important.&lt;/p&gt;

&lt;p&gt;The concept is deceptively simple. Data needs to be distributed to the regions where it’s needed. Actually accomplishing that, however, can be challenging. One option would be to deploy our data in a traditional, relational database management system (RDBMS) with one database server in each of the three regions.&lt;/p&gt;

&lt;p&gt;But then we have to deal with questions that arise from an operational standpoint. How would we keep the database servers in-sync across all regions? How would we scale them to meet fluctuating traffic demands? Questions like these drive what I call the distributed data problem. And, like any complex problem, addressing the issue often requires specific tools.  &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The promise of NoSQL with horizontal scaling and data center awareness&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is where non-relational, NoSQL (“not only SQL”) databases come into play. NoSQL databases primarily evolved over the last decade as an alternative to single-instance RDBMSs, which had trouble keeping up with the throughput demands of web-scale internet traffic. They solve scalability problems through a process known as “horizontal scaling,” where multiple server instances of the database are linked to each other to form a “cluster.”&lt;/p&gt;

&lt;p&gt;Some of the NoSQL database products were also engineered with data center awareness, meaning the database is configured to logically group together certain instances to optimize the distribution of user data and workloads.&lt;/p&gt;

&lt;p&gt;For instance, &lt;a href="https://cassandra.apache.org/_/index.html" rel="noopener noreferrer"&gt;Apache Cassandra&lt;/a&gt;, the open source NoSQL database that was introduced by Facebook in 2007, is both horizontally scalable and data center aware. If we were to deploy Cassandra to solve this problem, it would look something like the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsmlm1lzm8uoubc09fzov.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsmlm1lzm8uoubc09fzov.jpg" alt="Image description" width="800" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DataStax&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A hypothetical deployment of Apache Cassandra, with one cluster spanning three regional data centers deployed in the Western US, Eastern US, and Western Europe.  Map not to scale.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Our hypothetical ecommerce retailer headquartered in San Diego could perform writes on their “local” data center (DC), which would be the Western US DC. That data would then replicate to the other two data centers, in the Eastern US and Western Europe. The applications deployed regionally could then be configured to get data from their local data center.&lt;/p&gt;

&lt;p&gt;In this way, all data interactions initiated by the customer would be limited to their own geographic area. This prevents cross-DC, high-latency operations from being noticeable to any end users.&lt;/p&gt;

&lt;p&gt;The other advantage to this type of deployment is that the regional data centers can be scaled independently of each other. Perhaps traffic is increasing in Western Europe, requiring additional resources to be added. In this case, new database instances can be added to just that one DC; the resource levels of those that aren’t needed won’t increase—and won’t add unnecessary cost. Once the traffic subsides, the instances in a DC can also be reduced to lower costs.&lt;/p&gt;

&lt;p&gt;Of course, scaling a cluster or a DC up and down can require complex infrastructure operations. Deploying databases while using orchestration platforms like Kubernetes can greatly ease that burden, enabling enterprises to worry less about infrastructure and more about new capabilities (to learn more about this topic, check out “&lt;a href="https://medium.com/building-the-open-data-stack/a-case-for-databases-on-kubernetes-from-a-former-skeptic-31250d2350c" rel="noopener noreferrer"&gt;A Case for Databases on Kubernetes from a Former Skeptic&lt;/a&gt;”). Cloud-native, serverless databases like &lt;a href="https://astra.dev/3socHZj" rel="noopener noreferrer"&gt;DataStax Astra DB&lt;/a&gt; can take that a step further, essentially rendering the underlying infrastructure invisible to application developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Two predictions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I expect that in another 10 years, single-instance databases will become a relic of the past. As that happens, more and more database products will embrace data center awareness.&lt;/p&gt;

&lt;p&gt;The trends of horizontal scaling and data center awareness are driving more innovation in database technologies than ever before.&lt;/p&gt;

&lt;p&gt;It’s difficult to see when the next big uptick in ecommerce will happen or how widespread the effect will be relative to what we saw in 2020. One thing is certain. Those enterprises that are able to solve the distributed data problem by reducing latency to deliver data more quickly to the apps their customers are using will find themselves well ahead of their competition.&lt;/p&gt;

&lt;p&gt;To learn more, visit us &lt;a href="https://www.datastax.com/resources" rel="noopener noreferrer"&gt;here.&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
