Bananagrams is a cooler version of Scrabble! In a race to the finish, players build word grids with their letters, aiming to use them all first.
But word games can be tough when you're playing in a language you're not a native speaker of.
Unless you have an Android app leveraging Gemini Pro Vision!
What are we building?
We will build Potassium (π), an application that suggests words that can be spelled with the tiles available. To do this, we'll leverage Gemini Pro Vision to:
- Analyze a picture of the tiles and extract the list of letters available,
- List words that can be spelled with these letters.
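Since the model may occasionally suggest words that cannot actually be built from the tiles, a small local check can be useful. Here is a minimal sketch of such a helper; `canSpell` is a hypothetical name and not part of the Gemini SDK:

```kotlin
// Hypothetical helper (not part of the Gemini SDK): checks whether a word
// suggested by the model can actually be spelled with the available tiles.
fun canSpell(word: String, tiles: List<Char>): Boolean {
    // Count how many of each letter we have on hand.
    val available = tiles.groupingBy { it.uppercaseChar() }.eachCount().toMutableMap()
    for (c in word.uppercase()) {
        val left = available[c] ?: return false // letter not available at all
        if (left == 1) available.remove(c) else available[c] = left - 1
    }
    return true
}
```

For example, with the tiles A, B and C, "CAB" is spellable but "ABBA" is not (it needs two Bs).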
Can Gemini Pro actually do this?
Experimenting with the ML model is key, as crafting a prompt often requires multiple iterations before reaching a satisfying result.
Let's use Google AI Studio to evaluate Gemini Pro Vision's capabilities.
Can the model create a list of the letters based on a picture of the tiles?
Then, can the model return a list of words made with these letters?
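As a starting point, the prompt iterated on in AI Studio can look something like this (the exact wording is illustrative; a photo of the tiles is attached alongside it):

```
What are the letters displayed on the tiles?
And given these letters, which Scrabble words can you spell with them?
```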
Add Gemini to your application
Now that we have crafted a prompt that returns a relatively satisfying response (some suggested words might or might not be valid Scrabble words), let's create the app!
On the top left of Google AI Studio, click on "Get API key" to get your Gemini API key.
Then, click on "Get code" on the top right of Google AI Studio to access the code snippet.
- Add the Gradle dependency to your app's build.gradle file:

implementation("com.google.ai.client.generativeai:generativeai:0.1.1")
- In your Kotlin code, create a GenerativeModel:

Define the generationConfig that will be used by the model, e.g.:

val generationConfig = generationConfig {
    temperature = 0.15f
    topK = 32
    topP = 1f
    maxOutputTokens = 4096
}
This configuration reflects the adjustments you made in the "Run settings" section of the console. These parameters define the creativity and diversity of the text generated during inference.
topK: out of the tokens the model considers as candidates, the Top-K value defines the number (k) of most likely tokens considered for the output.
topP: the Top-P value defines a cumulative probability threshold: among the k tokens (after normalization of their probabilities), only the most likely ones whose cumulative probability stays within p are considered for the output.
temperature: controls the level of randomness of the token selected for the output; lower values make the output more deterministic.
To learn more about the LLM sampling mechanism, Vibudh Singh wrote a good explainer.
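To build some intuition, these three parameters can be simulated on a toy distribution. The sketch below is purely illustrative (names like `sampleToken` are made up and unrelated to the Gemini SDK): temperature reshapes the probabilities, Top-K truncates to the k most likely tokens, and Top-P keeps the smallest subset whose cumulative probability reaches p.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Illustrative sketch of LLM sampling; not part of the Gemini SDK.
fun sampleToken(
    logits: Map<String, Double>,
    temperature: Double,
    topK: Int,
    topP: Double,
    rng: Random = Random(42),
): String {
    // 1. Temperature scales the logits: lower values sharpen the distribution.
    val scaled = logits.mapValues { exp(it.value / temperature) }
    val total = scaled.values.sum()
    val probs = scaled.mapValues { it.value / total }

    // 2. Top-K keeps only the k most likely tokens.
    val topKTokens = probs.entries.sortedByDescending { it.value }.take(topK)

    // 3. Top-P keeps the smallest prefix whose cumulative probability reaches p.
    var cumulative = 0.0
    val candidates = topKTokens.takeWhile { entry ->
        val keep = cumulative < topP
        cumulative += entry.value
        keep
    }

    // 4. Renormalize over the remaining candidates and sample one of them.
    val mass = candidates.sumOf { it.value }
    var r = rng.nextDouble() * mass
    for (entry in candidates) {
        r -= entry.value
        if (r <= 0) return entry.key
    }
    return candidates.last().key
}
```

With a very low temperature (or topK = 1), the most likely token is selected almost every time; raising the temperature spreads the probability mass and makes the output more varied.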
Then, instantiate the GenerativeModel:

val model = GenerativeModel(
    modelName = "gemini-pro-vision",
    apiKey = "your_gemini_key",
    generationConfig = generationConfig,
    safetySettings = listOf(
        SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.MEDIUM_AND_ABOVE),
    ),
)
- You can then call the model as follows:

viewModelScope.launch {
    val result = model.generateContent(
        content {
            image(bitmap)
            text(
                "What are the letters displayed on the tiles? " +
                "And given these letters, which Scrabble words can you spell with them?"
            )
        }
    )
    // result.text contains the model's reply as plain text
}
You'll note that we pass both an image (as a Bitmap) and a text as content.
To create your Bitmap, you can simply access the camera using rememberLauncherForActivityResult in Compose:
val resultLauncher =
    rememberLauncherForActivityResult(ActivityResultContracts.StartActivityForResult()) { result: ActivityResult ->
        if (result.resultCode == Activity.RESULT_OK && result.data != null) {
            // The camera app returns a thumbnail Bitmap in the "data" extra
            bitmap = result.data?.extras?.get("data") as Bitmap
        }
    }
[...]
Button(
    onClick = {
        resultLauncher.launch(cameraIntent)
    },
) {
    Text("Take a picture")
}
You'll find a very basic Compose scaffolding in this gist.