DEV Community

Francesco Bonacci



🤖 NavAIGuide-TS

https://github.com/francedot/NavAIGuide-TS

🤔 What is NavAIGuide?

NavAIGuide (/næv eɪ aɪ ɡaɪd/) is an extensible TypeScript toolkit of components for integrating LLMs into navigation agents and browser companions. Key features include:

  • Natural Language Task Detection: Supports both visual (using GPT-4V) and textual modes to identify tasks from web pages.
  • Automation Code Generation: Generates automation code for predicted tasks, targeting either Playwright (requires Node.js) or native JavaScript browser APIs.
  • Visual Grounding: Improves the accuracy of locating visual elements on web pages, for more reliable interaction.
  • Efficient DOM Processing and Token Reduction: Uses DOM element management strategies that significantly reduce the number of tokens required for accurate grounding and action detection.
  • Reliability: Includes a retry mechanism with exponential backoff to handle transient failures in LLM calls.
  • JSON Mode & Action-based Framework: Uses JSON mode and reproducible outputs for predictable results, and an action-oriented approach to task execution.
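The retry behavior listed above can be sketched as follows. This is a minimal, hypothetical helper that assumes the LLM call returns a promise; `withRetry`, `backoffDelay`, and `RetryOptions` are illustrative names, not the navaiguide-ts API, and the jittered delay is one common backoff variant rather than NavAIGuide's actual policy.

```typescript
// Hypothetical helper for illustration; not part of the navaiguide-ts API.
interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  baseDelayMs: number; // upper bound of the delay before the first retry
  maxDelayMs: number; // cap on any single delay
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): a random value in
// [0, min(maxDelayMs, baseDelayMs * 2^attempt)), i.e. exponential backoff with full jitter.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.random() * ceiling;
}

// Calls `fn`, retrying on rejection until it succeeds or retries are exhausted;
// rethrows the last error if every attempt fails.
async function withRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === opts.maxRetries) break;
      await sleep(backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}
```

Wrapping a call as `withRetry(() => callModel(prompt), { maxRetries: 3, baseDelayMs: 500, maxDelayMs: 8000 })`, where `callModel` is a placeholder for your own LLM request, would retry up to three times with roughly doubling waits. The random jitter spreads out retries from concurrent clients so they do not all hit a rate-limited endpoint at the same moment.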

NavAIGuide Agents extend the core toolkit with advanced automation solutions:

  • Playwright-based Agents (preview): An initial offering for browser automation.
  • Cross-platform Appium Support: Future updates will introduce compatibility with Appium for broader device coverage.

NavAIGuide aims to streamline the development of web navigation assistants by giving developers a comprehensive set of tools for applying LLMs to web automation.

⚡️ Quick Install

You can use npm, Yarn, or pnpm to install NavAIGuide.

npm:

  npm install navaiguide-ts
  # With Playwright:
  npm install --save-dev "@playwright/test"
  npx playwright install

Yarn:

  yarn add navaiguide-ts
  # With Playwright:
  yarn add --dev "@playwright/test"
  npx playwright install

💻 Getting Started

Prerequisites

  • Node.js
  • Access to OpenAI or Azure AI services
  • Playwright for automation capabilities

OpenAI & Azure AI Key Configuration

Configure the required environment variables, for example locally through a .env.local file (requires dotenv):

  • OPENAI_API_KEY: Your OpenAI API key.
  • Azure AI API keys and related settings. Because different classes of models are available in different regions, you may need more than one Azure AI project deployment.
    • AZURE_AI_API_GPT4TURBOVISION_DEPLOYMENT_NAME: Deployment name of GPT-4 Turbo with Vision.
    • AZURE_AI_API_GPT35TURBO_DEPLOYMENT_NAME: Deployment name of GPT-3.5 Turbo with JSON mode.
    • AZURE_AI_API_GPT35TURBO16K_DEPLOYMENT_NAME: Deployment name of GPT-3.5 Turbo with a 16k-token context window.
    • AZURE_AI_API_GPT4TURBOVISION_KEY: API key for the GPT-4 Turbo with Vision deployment.
    • AZURE_AI_API_GPT35TURBO_KEY: API key for the GPT-3.5 Turbo (JSON mode) and GPT-3.5 Turbo 16k deployments.
    • AZURE_AI_API_GPT35TURBO_INSTANCE_NAME: Instance name for the GPT-3.5 Turbo (JSON mode) and GPT-3.5 Turbo 16k deployments.
    • AZURE_AI_API_GPT4TURBOVISION_INSTANCE_NAME: Instance name for the GPT-4 Turbo with Vision deployment.
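To fail fast on a misconfigured environment before creating an agent, you could check the variables listed above up front. This is a minimal sketch; `missingEnvVars`, `REQUIRED_ENV`, and `Provider` are hypothetical names, not part of navaiguide-ts. Note that dotenv loads `.env` by default, so reading `.env.local` would need `dotenv.config({ path: ".env.local" })` before `process.env` is inspected.

```typescript
// Hypothetical helper for illustration; not part of the navaiguide-ts API.
type Provider = "openai" | "azure";

// Variable names as documented above.
const REQUIRED_ENV: Record<Provider, readonly string[]> = {
  openai: ["OPENAI_API_KEY"],
  azure: [
    "AZURE_AI_API_GPT4TURBOVISION_DEPLOYMENT_NAME",
    "AZURE_AI_API_GPT35TURBO_DEPLOYMENT_NAME",
    "AZURE_AI_API_GPT35TURBO16K_DEPLOYMENT_NAME",
    "AZURE_AI_API_GPT4TURBOVISION_KEY",
    "AZURE_AI_API_GPT35TURBO_KEY",
    "AZURE_AI_API_GPT35TURBO_INSTANCE_NAME",
    "AZURE_AI_API_GPT4TURBOVISION_INSTANCE_NAME",
  ],
};

// Returns the required variable names that are unset or contain only whitespace.
function missingEnvVars(
  provider: Provider,
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_ENV[provider].filter((name) => !env[name]?.trim());
}
```

In a Node.js script, `const missing = missingEnvVars("azure", process.env);` followed by throwing when `missing.length > 0` surfaces every absent variable in one error, instead of failing later on the first model call.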

Alternatively, you can pass these values explicitly to the constructor of the NavAIGuide class.

NavAIGuide Agent

The NavAIGuideAgent base class orchestrates the process of performing and reasoning about actions on a web page towards achieving a specified end goal.
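To make the idea of orchestrating actions towards an end goal concrete, here is a minimal sketch of a generic decide-then-act loop driven by JSON-formatted actions. Everything in it (the `AgentAction` union, `parseAction`, `runAgentLoop`, and the `decide`/`execute` callbacks) is hypothetical and illustrative; it is not how NavAIGuideAgent is implemented. The `decide` callback stands in for an LLM call in JSON mode that returns the next action as a JSON string.

```typescript
// Hypothetical action schema for illustration; not the navaiguide-ts types.
type AgentAction =
  | { type: "click"; selector: string }
  | { type: "type"; selector: string; text: string }
  | { type: "navigate"; url: string }
  | { type: "done"; summary: string };

// Validates a JSON response from the model into a typed action, throwing on bad input.
function parseAction(json: string): AgentAction {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const raw = parsed as Record<string, unknown>;
  const str = (key: string): string => {
    const v = raw[key];
    if (typeof v !== "string") throw new Error(`missing string field "${key}"`);
    return v;
  };
  switch (raw.type) {
    case "click":
      return { type: "click", selector: str("selector") };
    case "type":
      return { type: "type", selector: str("selector"), text: str("text") };
    case "navigate":
      return { type: "navigate", url: str("url") };
    case "done":
      return { type: "done", summary: str("summary") };
    default:
      throw new Error(`unknown action type: ${String(raw.type)}`);
  }
}

// Asks for the next action until the model reports "done" or the step budget runs out.
// Returns every action the model chose, in order; "done" is recorded but not executed.
async function runAgentLoop(
  goal: string,
  decide: (goal: string, history: AgentAction[]) => Promise<string>,
  execute: (action: AgentAction) => Promise<void>,
  maxSteps = 10,
): Promise<AgentAction[]> {
  const history: AgentAction[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = parseAction(await decide(goal, history));
    history.push(action);
    if (action.type === "done") break;
    await execute(action);
  }
  return history;
}
```

Validating each model response against a closed set of action types keeps malformed output from reaching the browser, and the step budget guarantees the loop terminates even if the model never declares the goal reached.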

Example Playwright Agent scenario:

import { chromium } from "@playwright/test";
import { PlaywrightAgent } from "navaiguide-ts";

// Launch a visible browser and open a page for the agent to drive.
const browser = await chromium.launch({ headless: false });
const playwrightPage = await browser.newPage();

const navAIGuideAgent = new PlaywrightAgent({
  page: playwrightPage,
  openAIApiKey: "API_KEY", // optional if OPENAI_API_KEY is set in process.env
});

const findResearchPaperQuery =
  "Help me view the research paper titled 'Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V' and download its pdf.";

// Run the agent towards the end goal described by the query.
const results = await navAIGuideAgent.runAsync({
  query: findResearchPaperQuery,
});

for (const result of results) {
  console.log(result);
}

await browser.close();

See a demo here.
