Akmal Chaudhri for SingleStore

Quick tip: Using Java with OpenAI and SingleStoreDB

Abstract

At the time of writing (July 2023), OpenAI does not officially provide a library for the Java programming language. However, Java support is available through the openai-java project, listed under Community libraries. In this short article, we'll store some vector data in SingleStoreDB, query that data and test ChatGPT using some simple Java code.

The Java code file used in this article is available on GitHub.

Introduction

Despite the great interest in Python for many Data Science and Machine Learning tasks, Java remains one of the most popular programming languages in the world, as shown by the TIOBE Index. As of July 2023, for example, Java ranks 3rd in the Very Long Term History section of the TIOBE Index. Java also ranks highly according to IEEE Spectrum and is still widely used in large enterprise environments. In this short article, we'll see how to use Java with OpenAI and SingleStoreDB.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use OpenAI Demo Group as our Workspace Group Name and openai-demo as our Workspace Name. We'll make a note of our password and host name.

OpenAI API Key

We need to create an account on the OpenAI website. This provides some free credits. Since we will mainly use embeddings, the cost will be minimal. We'll also need to create an OpenAI API Key. This can be created from USER > API keys in our OpenAI account.

Create a Database and Table

In our SingleStoreDB Cloud account, we'll use the SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS winter_wikipedia;

USE winter_wikipedia;

We'll also create a table, as follows:

CREATE TABLE IF NOT EXISTS winter_olympics_2022 (
    text TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    embedding BLOB
);

We'll download the Winter Olympics CSV file from OpenAI to populate this table. This CSV file contains articles about the Winter Olympics 2022, and the file consists of two columns, as follows:

  1. Article text
  2. Pre-computed vector embeddings

Load the Data

We'll now connect to SingleStoreDB using a MySQL client, as follows:

mysql --local-infile -u admin -h <host> -P 3306 --default-auth=mysql_native_password -p

We'll replace <host> with the host name from our SingleStoreDB Cloud account. When prompted, we'll enter our SingleStoreDB Cloud account password.

Once connected, we'll load the CSV file data into the table, as follows:

USE winter_wikipedia;

LOAD DATA LOCAL INFILE '/path/to/winter_olympics_2022.csv'
INTO TABLE winter_olympics_2022(text, @embedding)
COLUMNS TERMINATED BY ',' ENCLOSED BY '"'
SET embedding = JSON_ARRAY_PACK(@embedding);

We'll replace /path/to/ with the actual path to where the CSV file is located.
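To make the role of JSON_ARRAY_PACK more concrete, here is a minimal Java sketch of the kind of binary layout it produces. It assumes the packed form is a sequence of 32-bit floats in little-endian byte order, which is our understanding of the SingleStoreDB format. The class name VectorPacking and its methods are hypothetical and are not part of the article's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
// Assumption: a packed vector is a run of 32-bit little-endian floats.
final class VectorPacking {

    // Pack each float into 4 bytes, least significant byte first.
    static byte[] pack(float[] values) {
        ByteBuffer buffer = ByteBuffer
                .allocate(values.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float value : values) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    // Reverse of pack: read 4 bytes at a time back into floats.
    static float[] unpack(byte[] blob) {
        ByteBuffer buffer = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN);
        float[] values = new float[blob.length / Float.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getFloat();
        }
        return values;
    }
}
```

Under this assumption, each embedding occupies four bytes per dimension in the BLOB column, which is considerably more compact than the JSON text in the CSV file.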

Create a Maven project

For quick testing, we'll use Maven to build our code and then run it from the command line.

pom.xml

The pom.xml file is straightforward. It specifies the Java version and the main dependencies, and configures the build to produce a single jar file containing all the dependencies:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.s2</groupId>
  <artifactId>s2-app</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <dependencies>
    <!-- https://mvnrepository.com/artifact/com.theokanning.openai-gpt3-java/service -->
    <dependency>
      <groupId>com.theokanning.openai-gpt3-java</groupId>
      <artifactId>service</artifactId>
      <version>0.14.0</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.squareup.retrofit2/retrofit -->
    <dependency>
      <groupId>com.squareup.retrofit2</groupId>
      <artifactId>retrofit</artifactId>
      <version>2.7.2</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.singlestore/singlestore-jdbc-client -->
    <dependency>
      <groupId>com.singlestore</groupId>
      <artifactId>singlestore-jdbc-client</artifactId>
      <version>1.1.7</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-nop -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-nop</artifactId>
      <version>2.0.7</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <mainClass>com.s2.openai.App</mainClass>
            </manifest>
          </archive>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
    </plugins>
  </build>

</project>

App class

In our application, we'll perform some simple operations. Here are the steps:

  1. Ask ChatGPT a question about the Winter Olympics 2022. It should respond that it does not know, because its training data predates the Winter Olympics 2022.
  2. Get vector embeddings from OpenAI for a search string similar to the question in step 1.
  3. Use DOT_PRODUCT to compare the vector embeddings from step 2 with the vector embeddings stored for the Winter Olympics 2022 data in SingleStoreDB, and return the top 5 matches.
  4. Use the text returned in step 3 to give ChatGPT new information that it can use to answer the original question from step 1. ChatGPT should now be able to provide an answer.
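Step 3 can be sketched in Java roughly as follows. This is not the code from the GitHub repository. The class name SimilaritySearch, the query text and the helper methods are assumptions made for illustration, and only the table and column names come from this article. The sketch sends the question embedding as a JSON array string, lets SingleStoreDB pack it with JSON_ARRAY_PACK, and orders rows by DOT_PRODUCT. The dotProduct method is included only to show what the score means. Because OpenAI embeddings are normalised to unit length, a higher dot product indicates closer text.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.StringJoiner;

// Hypothetical sketch of step 3, not the article's App class.
final class SimilaritySearch {

    // Assumed query shape; table and column names are from the article.
    static final String TOP5_SQL =
            "SELECT text, DOT_PRODUCT(embedding, JSON_ARRAY_PACK(?)) AS score "
          + "FROM winter_olympics_2022 ORDER BY score DESC LIMIT 5";

    // Serialise an embedding as a JSON array, e.g. [0.5,-1.0].
    static String toJsonArray(double[] embedding) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (double value : embedding) {
            joiner.add(Double.toString(value));
        }
        return joiner.toString();
    }

    // What DOT_PRODUCT computes: the sum of element-wise products.
    static double dotProduct(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Run the query and print the five best matches with their scores.
    static void printTop5(Connection conn, double[] questionEmbedding) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(TOP5_SQL)) {
            ps.setString(1, toJsonArray(questionEmbedding));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println("Score: " + rs.getDouble("score"));
                    System.out.println("Text: " + rs.getString("text"));
                }
            }
        }
    }
}
```

Passing the embedding as a bound parameter rather than concatenating it into the SQL keeps the query text fixed and avoids quoting problems.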

The App class is deliberately simple and intended only for testing the OpenAI API. It would need more work to be robust.
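Step 4 is mostly string assembly: the article text returned by the database is placed in front of the original question before the combined message is sent to ChatGPT. The following is a minimal sketch of that idea. The class name PromptBuilder and the wording of the instructions are hypothetical, and the code in the GitHub repository may phrase the prompt differently.

```java
import java.util.List;

// Hypothetical sketch of step 4: combine retrieved text with the question.
final class PromptBuilder {

    // Build one message containing instructions, the articles, then the question.
    static String build(String question, List<String> articles) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Use the articles below on the 2022 Winter Olympics ")
              .append("to answer the question. ")
              .append("If the answer cannot be found, say so.\n\n");
        for (String article : articles) {
            prompt.append("Article:\n").append(article).append("\n\n");
        }
        prompt.append("Question: ").append(question);
        return prompt.toString();
    }
}
```

Keeping the question at the end, after the supporting text, is a design choice for this sketch. The resulting string would be sent as the user message in the chat completion request.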

Build the code

We can build the code from the command line using Maven, as follows:

mvn clean compile assembly:single

This will create a jar file with all the dependencies.

Create Environment Variables

We'll need to set the following environment variables before we can run our program:

export OPENAI_TOKEN="<OpenAI API Key>"
export S2_HOST="<host>"
export S2_PASSWORD="<password>"

We'll replace <OpenAI API Key>, <host>, and <password> with our OpenAI API Key, SingleStoreDB Cloud host, and SingleStoreDB Cloud password, respectively.
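As a rough illustration of how a Java program might consume these variables, the sketch below reads them and builds a JDBC connection. The class name AppConfig and its methods are hypothetical. The admin user, port 3306 and database name are taken from earlier in this article, and the jdbc:singlestore:// URL prefix is the one we understand the SingleStore JDBC driver to use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Map;

// Hypothetical configuration helper, not the article's App class.
final class AppConfig {

    // Fail early with a clear message if a variable is missing or empty.
    static String requireEnv(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    // Assumed URL format for the SingleStore JDBC driver.
    static String jdbcUrl(String host, String database) {
        return "jdbc:singlestore://" + host + ":3306/" + database;
    }

    // Open a connection using S2_HOST and S2_PASSWORD from the environment.
    static Connection connect() throws SQLException {
        Map<String, String> env = System.getenv();
        String url = jdbcUrl(requireEnv(env, "S2_HOST"), "winter_wikipedia");
        return DriverManager.getConnection(url, "admin", requireEnv(env, "S2_PASSWORD"));
    }
}
```

Reading the environment through a Map parameter keeps requireEnv easy to test without changing the real process environment.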

Run the code

We'll run our code, as follows:

java -cp target/s2-app-1.0-SNAPSHOT-jar-with-dependencies.jar com.s2.openai.App

We should see output similar to the following in the command line window.

First, the answer to the original question:

ChatGPT says: Sorry, but I cannot predict the winners of future events like the Olympics 2022 as it is against OpenAI's use case policy.

Next, the results of the DOT_PRODUCT:

Score: 0.8790971040725708
Text: Curling at the 2022 Winter Olympics

==Medal summary==

===Medal table===

{{Medals table
 | caption        = 
 | host           = 
 | flag_template  = flagIOC
 | event          = 2022 Winter
 | team           = 
 | gold_CAN = 0 | silver_CAN = 0 | bronze_CAN = 1
 | gold_ITA = 1 | silver_ITA = 0 | bronze_ITA = 0
 | gold_NOR = 0 | silver_NOR = 1 | bronze_NOR = 0
 | gold_SWE = 1 | silver_SWE = 0 | bronze_SWE = 2
 | gold_GBR = 1 | silver_GBR = 1 | bronze_GBR = 0
 | gold_JPN = 0 | silver_JPN = 1 | bronze_JPN - 0
}}
Score: 0.8719464540481567
Text: Curling at the 2022 Winter Olympics

==Results summary==

===Women's tournament===

====Playoffs====

=====Gold medal game=====

''Sunday, 20 February, 9:05''
{{#lst:Curling at the 2022 Winter Olympics – Women's tournament|GM}}
{{Player percentages
| team1 = {{flagIOC|JPN|2022 Winter}}
| [[Yurika Yoshida]] | 97%
| [[Yumi Suzuki]] | 82%
| [[Chinami Yoshida]] | 64%
| [[Satsuki Fujisawa]] | 69%
| teampct1 = 78%
| team2 = {{flagIOC|GBR|2022 Winter}}
| [[Hailey Duff]] | 90%
| [[Jennifer Dodds]] | 89%
| [[Vicky Wright]] | 89%
| [[Eve Muirhead]] | 88%
| teampct2 = 89%
}}
Score: 0.8690885901451111
Text: Curling at the 2022 Winter Olympics

==Results summary==

===Mixed doubles tournament===

====Playoffs====

=====Gold medal game=====

''Tuesday, 8 February, 20:05''
{{#lst:Curling at the 2022 Winter Olympics – Mixed doubles tournament|GM}}
{| class="wikitable"
!colspan=4 width=400|Player percentages
|-
!colspan=2 width=200 style="white-space:nowrap;"| {{flagIOC|ITA|2022 Winter}}
!colspan=2 width=200 style="white-space:nowrap;"| {{flagIOC|NOR|2022 Winter}}
|-
| [[Stefania Constantini]] || 83%
| [[Kristin Skaslien]] || 70%
|-
| [[Amos Mosaner]] || 90%
| [[Magnus Nedregotten]] || 69%
|-
| '''Total''' || 87%
| '''Total''' || 69%
|}
Score: 0.8679166436195374
Text: Curling at the 2022 Winter Olympics

==Medal summary==

===Medalists===

{| {{MedalistTable|type=Event|columns=1}}
|-
|Men<br/>{{DetailsLink|Curling at the 2022 Winter Olympics – Men's tournament}}
|{{flagIOC|SWE|2022 Winter}}<br>[[Niklas Edin]]<br>[[Oskar Eriksson]]<br>[[Rasmus Wranå]]<br>[[Christoffer Sundgren]]<br>[[Daniel Magnusson (curler)|Daniel Magnusson]]
|{{flagIOC|GBR|2022 Winter}}<br>[[Bruce Mouat]]<br>[[Grant Hardie]]<br>[[Bobby Lammie]]<br>[[Hammy McMillan Jr.]]<br>[[Ross Whyte]]
|{{flagIOC|CAN|2022 Winter}}<br>[[Brad Gushue]]<br>[[Mark Nichols (curler)|Mark Nichols]]<br>[[Brett Gallant]]<br>[[Geoff Walker (curler)|Geoff Walker]]<br>[[Marc Kennedy]]
|-
|Women<br/>{{DetailsLink|Curling at the 2022 Winter Olympics – Women's tournament}}
|{{flagIOC|GBR|2022 Winter}}<br>[[Eve Muirhead]]<br>[[Vicky Wright]]<br>[[Jennifer Dodds]]<br>[[Hailey Duff]]<br>[[Mili Smith]]
|{{flagIOC|JPN|2022 Winter}}<br>[[Satsuki Fujisawa]]<br>[[Chinami Yoshida]]<br>[[Yumi Suzuki]]<br>[[Yurika Yoshida]]<br>[[Kotomi Ishizaki]]
|{{flagIOC|SWE|2022 Winter}}<br>[[Anna Hasselborg]]<br>[[Sara McManus]]<br>[[Agnes Knochenhauer]]<br>[[Sofia Mabergs]]<br>[[Johanna Heldin]]
|-
|Mixed doubles<br/>{{DetailsLink|Curling at the 2022 Winter Olympics – Mixed doubles tournament}}
|{{flagIOC|ITA|2022 Winter}}<br>[[Stefania Constantini]]<br>[[Amos Mosaner]]
|{{flagIOC|NOR|2022 Winter}}<br>[[Kristin Skaslien]]<br>[[Magnus Nedregotten]]
|{{flagIOC|SWE|2022 Winter}}<br>[[Almida de Val]]<br>[[Oskar Eriksson]]
|}
Score: 0.8668502569198608
Text: Curling at the 2022 Winter Olympics

==Results summary==

===Men's tournament===

====Playoffs====

=====Gold medal game=====

''Saturday, 19 February, 14:50''
{{#lst:Curling at the 2022 Winter Olympics – Men's tournament|GM}}
{{Player percentages
| team1 = {{flagIOC|GBR|2022 Winter}}
| [[Hammy McMillan Jr.]] | 95%
| [[Bobby Lammie]] | 80%
| [[Grant Hardie]] | 94%
| [[Bruce Mouat]] | 89%
| teampct1 = 90%
| team2 = {{flagIOC|SWE|2022 Winter}}
| [[Christoffer Sundgren]] | 99%
| [[Rasmus Wranå]] | 95%
| [[Oskar Eriksson]] | 93%
| [[Niklas Edin]] | 87%
| teampct2 = 94%
}}

Finally, given the new information, the answer to the question:

ChatGPT says: In the Men's tournament, the gold medal in curling at the 2022 Winter Olympics was won by Sweden (Niklas Edin, Oskar Eriksson, Rasmus Wranå, Christoffer Sundgren, Daniel Magnusson).

Summary

This short article showed an example of using the OpenAI API from a Java program. We used SingleStoreDB to store the pre-computed vector embeddings and then used DOT_PRODUCT to return the closest matches. These results were used to give ChatGPT new information, which it could use to answer a question it previously could not answer.