Overview

AI Action allows you to perform UI operations on your box using natural language instructions. Instead of manually specifying coordinates and click types, simply describe what you want to do in plain English.

AI Action is a beta feature and is still being tested. It is currently supported only on Android boxes; support for other box types (Linux) is coming soon. Some functionality may be unstable while we continue to improve the feature.

How It Works

The AI will automatically:

  • Analyze your instruction
  • Identify the target UI elements
  • Execute the appropriate action on the box

Common Use Cases

You can use natural language for various UI operations such as:

  • "click the login button"
  • "scroll down to find settings"
  • "input my email address"
  • "swipe left to the next page"
  • "tap on the search icon"

Quickstart

Getting Started

To use AI Action, you’ll need to create a box first, then call the box.action.ai() method with your natural language instruction. The AI will analyze your request and perform the corresponding UI operation.

Basic Usage

Here’s a complete example showing how to use AI Action:

import GboxSDK from "gbox-sdk";

const gboxSDK = new GboxSDK({
  apiKey: process.env["GBOX_API_KEY"], // This is the default and can be omitted
});

async function main() {
  const box = await gboxSDK.create({ type: "android" });

  // Simple instruction
  console.info("Running simple instruction");
  await box.action.ai("Open the Chrome browser");

  // Instruction with context
  console.info("Running instruction with context");
  await box.action.ai({
    instruction: "Open the Chrome browser",
    background:
      "You are a QA engineer. You are testing the login page of the application.",
  });

  console.info("Taking screenshot");
  await box.action.screenshot({
    path: "screenshot.png",
  });
}

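// Surface failures instead of leaving an unhandled promise rejection
main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});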
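
Longer flows usually chain several instructions. The following is a minimal sketch of running them in order, assuming only that the action object exposes the ai() call used above. AiActor and runSteps are hypothetical names introduced for illustration; they are not SDK exports.

```typescript
// Hypothetical minimal shape of the object exposing `ai()` (for example `box.action`).
// Only the string form of the call is assumed here.
interface AiActor {
  ai(instruction: string): Promise<unknown>;
}

// Run instructions strictly in order and stop at the first failure.
// Resolves with the list of instructions that completed.
async function runSteps(actor: AiActor, steps: string[]): Promise<string[]> {
  const done: string[] = [];
  for (const step of steps) {
    // Await each call so the next step sees the screen left by the previous one.
    await actor.ai(step);
    done.push(step);
  }
  return done;
}
```

With a real box this might be called as runSteps(box.action, ["tap on the search icon", "scroll down to find settings"]), provided box.action is structurally compatible with the assumed interface.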

Understanding the Parameters

The box.action.ai() method accepts either of the following:

  1. Simple string: Just pass your instruction directly

    await box.action.ai("tap on the settings icon");
    
  2. Object with context: Provide additional background information to help the AI make better decisions

    await box.action.ai({
      instruction: "select the premium plan",
      background: "The user wants to upgrade their subscription",
    });
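
The two call shapes can be written down as a TypeScript union, which is handy when wrapping the call in your own helpers. This is a sketch only: AiActionRequest, AiActionInput, and toRequest are illustrative names and not SDK exports, and the SDK's own types may differ.

```typescript
// The object form described above: an instruction plus optional background.
interface AiActionRequest {
  instruction: string;
  background?: string;
}

// Either a plain instruction string or the object form.
type AiActionInput = string | AiActionRequest;

// Convert either form to the object form so later code handles a single shape.
function toRequest(input: AiActionInput): AiActionRequest {
  if (typeof input === "string") {
    return { instruction: input };
  }
  // Copy so callers cannot mutate the normalized request through the original.
  return { ...input };
}
```

A wrapper can then accept AiActionInput, normalize it once with toRequest, and log or validate the instruction field before passing the request on.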
    

Best Practices

  • Be specific: Use clear, descriptive language like “click the blue Submit button” instead of just “click button”
  • Provide context: When using the object format, include relevant background information to help the AI understand the scenario
  • Use common UI terms: Terms like “click”, “tap”, “scroll”, “swipe”, “type”, “input” work well
  • Be patient: AI actions may take a few seconds to analyze and execute
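
Because an AI action may take a few seconds, it can help to put an upper bound on how long your script waits. The sketch below is a generic promise timeout, not an SDK feature; withTimeout is a hypothetical helper name, and it only stops waiting rather than cancelling the action on the box.

```typescript
// Reject if `task` has not settled within `ms` milliseconds.
// This only stops waiting; it does not cancel the underlying action.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`AI action timed out after ${ms} ms`)),
      ms,
    );
    task.then(
      (value) => {
        clearTimeout(timer); // settled in time: cancel the pending timeout
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

For example, await withTimeout(box.action.ai("click the blue Submit button"), 60000) would give up waiting after one minute. The 60-second limit is an arbitrary value chosen for illustration.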

What Happens Next

When you call box.action.ai(), the AI:

  1. Captures the current screen state
  2. Analyzes your instruction against the visible UI elements
  3. Determines the best action to take
  4. Executes the action automatically
  5. Returns the result of the operation
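
The five steps above can be pictured as a small pipeline. The sketch below is a conceptual model only and does not show how the service is implemented. Screen, PlannedAction, AiSteps, and runAiAction are hypothetical names, and the capture, planning, and execution steps are passed in as functions so the flow can be exercised with stand-ins.

```typescript
// Hypothetical data shapes, for illustration only.
interface Screen {
  elements: string[]; // labels of the visible UI elements
}
interface PlannedAction {
  kind: string; // e.g. "click", "scroll"
  target: string;
}

// Each stage is injected so the flow can be run without a real box.
interface AiSteps {
  capture(): Promise<Screen>; // step 1
  plan(instruction: string, screen: Screen): Promise<PlannedAction>; // steps 2 and 3
  execute(action: PlannedAction): Promise<string>; // step 4
}

async function runAiAction(steps: AiSteps, instruction: string): Promise<string> {
  const screen = await steps.capture();
  const action = await steps.plan(instruction, screen);
  const result = await steps.execute(action);
  return result; // step 5: hand the outcome back to the caller
}
```

The point of the model is the ordering: the screen is captured before the instruction is interpreted, so an instruction only makes sense relative to what is visible at the moment of the call.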