# Changelog

<Update label="2025-11-19">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    43](https://www.npmjs.com/package/gbox-sdk/v/0.43.0) or later.
  </Tip>

  ## 🖱️ Click with Modifier Keys Support (Linux Only)

  The `click` action now supports modifier keys, enabling keyboard combinations during click operations. This is useful for actions like Shift+Click for multi-select, Control+Click for opening links in new tabs, or other keyboard-mouse combinations.

  <Note>
    This feature is currently only available for **Linux boxes**. Android support is not yet available.
  </Note>

  **Supported Modifier Keys:**

  * `control` - Control key
  * `shift` - Shift key
  * `alt` - Alt key

  ### Example

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "linux" });

  // Shift+Click for multi-select
  await box.action.click({
    x: 100,
    y: 100,
    modifierKeys: ["shift"]
  });

  // Control+Click to open link in new tab
  await box.action.click({
    x: 200,
    y: 200,
    modifierKeys: ["control"]
  });

  // Multiple modifiers: Control+Shift+Click
  await box.action.click({
    x: 300,
    y: 300,
    modifierKeys: ["control", "shift"]
  });
  ```
</Update>

<Update label="2025-11-04">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    42](https://www.npmjs.com/package/gbox-sdk/v/0.42.0) or later.
  </Tip>

  ## 🔍 Action Response with Actual Field

  All UI actions (Click, Tap, Scroll, Swipe, Drag, Long Press, Touch) now return an `actual` field with detailed execution information, showing the exact coordinates and parameters used during action execution.

  ### Example

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "linux" });

  const result = await box.action.drag({
    start: "Chrome App",
    end: "Trash",
  });

  console.log(result);
  ```

  Returns:

  ```json wrap theme={null}
  {
    "message": "Action executed successfully",
    "actionId": "61649fee-1887-4f74-87b8-e50ee6f0c967",
    "actual": {
      "start": { "x": 987, "y": 1039 },
      "end": { "x": 921, "y": 1038 },
      "duration": "500ms"
    }
  }
  ```

  The `actual` field helps you verify action execution, debug automation, and understand how natural language targets are translated into screen coordinates.
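
  As a rough illustration of how the `actual` field might be used for verification, the sketch below computes how far a drag actually moved. The `Point` and `DragActual` types and the `dragDistance` helper are assumptions modelled on the sample response above, not types exported by the SDK.

  ```typescript
  // Hypothetical shapes inferred from the sample response above;
  // the SDK's real exported types may differ.
  interface Point {
    x: number;
    y: number;
  }

  interface DragActual {
    start: Point;
    end: Point;
    duration: string;
  }

  // Straight-line distance in pixels between the resolved start and end points.
  function dragDistance(actual: DragActual): number {
    return Math.hypot(actual.end.x - actual.start.x, actual.end.y - actual.start.y);
  }

  // Values copied from the sample response above.
  const sampleActual: DragActual = {
    start: { x: 987, y: 1039 },
    end: { x: 921, y: 1038 },
    duration: "500ms",
  };

  console.log(`Drag covered ${dragDistance(sampleActual).toFixed(1)} px`);
  ```

  A check like this can flag cases where a natural language target resolved to an unexpected location, for example a drag that barely moved.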
</Update>

<Update label="2025-10-23">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    41](https://www.npmjs.com/package/gbox-sdk/v/0.41.0) or later.
  </Tip>

  ### 🤖 Model API for Coordinate Generation

  Introducing the **gbox-handy-1** model - a specialized AI model that generates precise UI element coordinates from screenshots using natural language. The Model API analyzes screenshots and returns exact coordinates for click, drag, and other UI actions based on descriptive targets.

  [Learn more →](api-reference/model/generate-coordinates-for-a-model)

  ```typescript wrap theme={null}
  import GboxSDK from "gbox-sdk";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"] // This is the default and can be omitted
  });

  async function main() {
    const result = await gboxSDK.model.call({
      model: "gbox-handy-1",
      screenshot: "https://gru-activate2-public-assets.s3.us-west-2.amazonaws.com/jessica/screenshot-1759332945616-pu0ovj.png",
      action: {
        type: "click",
        target: "the VSCode app icon on the bottom dock"
      }
    });

    // Returns coordinates for the target element
    console.log(result);
  }

  main();
  ```

  ### ⏳ Keep Alive on Activity

  The box's expiration time is now extended automatically whenever the box is in use. When `keepAlive` is set (e.g., "5m"), any operation (UI Action, File System, Browser, Command, Media, or Run Code) ensures that at least the specified duration remains before the box expires.

  **Example:** If `keepAlive` is "5m" and the box has 2 minutes remaining, any operation extends it back to 5 minutes.

  ```typescript wrap theme={null}
  import GboxSDK from "gbox-sdk";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"] // This is the default and can be omitted
  });

  async function main() {
    const box = await gboxSDK.create({ 
      type: "android",
      config: {
        keepAlive: "30m"  // Set to "0ms" to disable (default)
      }
    });

    // Each operation extends the expiration time
    await box.action.tap({ target: "chrome app" });
    await box.action.screenshot();
    // Box automatically stays alive as long as you're using it
  }

  main();
  ```
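
  The extension rule described above can be sketched as a small pure function. This is only an illustration of the documented behavior, not the service's implementation: `parseDuration` and `extendExpiry` are hypothetical helpers, and the supported unit suffixes (`ms`, `s`, `m`, `h`) are an assumption based on the duration strings used elsewhere in this changelog.

  ```typescript
  // Hypothetical helper: converts strings such as "500ms", "30s", "5m", "1h" to milliseconds.
  const UNIT_MS: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000 };

  function parseDuration(value: string): number {
    const match = /^(\d+)(ms|s|m|h)$/.exec(value);
    if (!match) {
      throw new Error(`Unsupported duration: ${value}`);
    }
    return Number(match[1]) * UNIT_MS[match[2]];
  }

  // Returns the expiration timestamp (ms since epoch) after an operation at `now`.
  // The expiration is only pushed forward, never shortened.
  function extendExpiry(expiresAt: number, now: number, keepAlive: string): number {
    const keepAliveMs = parseDuration(keepAlive);
    if (keepAliveMs === 0) {
      return expiresAt; // "0ms" disables keep-alive
    }
    return Math.max(expiresAt, now + keepAliveMs);
  }

  const currentTime = Date.now();
  const twoMinutesLeft = currentTime + 2 * 60000;
  const tenMinutesLeft = currentTime + 10 * 60000;

  // Remaining minutes after an operation with keepAlive = "5m"
  console.log((extendExpiry(twoMinutesLeft, currentTime, "5m") - currentTime) / 60000);
  console.log((extendExpiry(tenMinutesLeft, currentTime, "5m") - currentTime) / 60000);
  ```

  Taking the maximum reflects the "at least" wording: a box with more time remaining than `keepAlive` is left unchanged, while one with less is topped up.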
</Update>

<Update label="2025-10-15">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    40](https://www.npmjs.com/package/gbox-sdk/v/0.40.0) or later.
  </Tip>

  ## 🤖 Customizable AI Model Selection for UI Actions

  The UI Action system now supports specifying custom AI models through the `options.model` parameter. This enhancement gives you the flexibility to choose different computer vision models based on your specific needs, performance requirements, or accuracy preferences.

  <Note>
    The `model` parameter is only effective when using natural language-driven UI actions (e.g., `click`, `type`, `scroll` with descriptive targets). It does not apply to coordinate-based or other non-AI-driven interactions.
  </Note>

  You can now choose from different AI models (e.g., `openai-computer-use`) for UI element detection by passing the `model` parameter in action options.

  ### Example Usage

  ```typescript wrap theme={null}
  import GboxSDK from "gbox-sdk";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"],
  });

  async function main() {
    const box = await gboxSDK.create({ type: "android" });

    // Use a specific AI model for UI action
    await box.action.click({
      target: "chrome app",
      options: {
        model: "openai-computer-use"
      }
    });
  }

  main();
  ```

  ## 🔌 Appium Integration Support

  Gbox now provides native Appium connection support, enabling seamless integration with the Appium ecosystem for advanced automation workflows. The new `appiumURL()` method generates a ready-to-use Appium connection URL with optimized default configurations.

  ### Key Features

  * **One-Click Connection**: Get an Appium connection URL with a single method call
  * **Pre-configured Options**: Default capabilities and settings optimized for Gbox Android boxes
  * **Full Appium Compatibility**: Use any Appium client library (WebdriverIO, Appium Python Client, etc.)
  * **Advanced Automation**: Access native Appium features like XML layout inspection, element finding, and complex gestures

  ### Use Cases

  * Extract and analyze UI element hierarchies
  * Implement complex automation workflows
  * Integrate with existing Appium-based testing frameworks
  * Debug UI layouts and element properties

  ### Example Usage

  ```typescript wrap theme={null}
  import { remote } from "webdriverio";
  import GboxSDK from "gbox-sdk";
  import * as fs from "fs";
  import * as path from "path";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"],
  });

  async function main() {
    const box = await gboxSDK.create({ type: "android" });

    // Generate Appium connection URL and default options
    const { url, defaultOption } = await box.appiumURL();

    console.log("Appium connection URL:", url);

    // Connect to Appium with defaultOption from backend
    console.log("Connecting to Appium server...");
    const ac = await remote(defaultOption);

    console.log("✅ Successfully connected to Appium server");
    console.log("Session ID:", ac.sessionId);

    try {
      // Get current page XML layout
      console.log("Fetching XML layout...");
      const xmlLayout = await ac.getPageSource();
      
      console.log("✅ XML layout fetched successfully!");
      console.log("XML length:", xmlLayout.length, "characters");
      
      // Display a preview of the XML (first 500 characters)
      console.log("\n--- XML Layout Preview (first 500 chars) ---");
      console.log(xmlLayout.substring(0, 500) + "...\n");
      
      // Save XML to file for easier viewing
      const outputDir = path.join(__dirname, "output");
      if (!fs.existsSync(outputDir)) {
        fs.mkdirSync(outputDir, { recursive: true });
      }
      
      const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
      const filename = `layout_${timestamp}.xml`;
      const filepath = path.join(outputDir, filename);
      
      fs.writeFileSync(filepath, xmlLayout, "utf-8");
      console.log(`📁 XML layout saved to: ${filepath}`);
      
      // Optional: Parse and display some basic info
      const elementMatches = xmlLayout.match(/<[\w.-]+/g);
      if (elementMatches) {
        const elementSet = new Set(elementMatches.map(e => e.substring(1)));
        const uniqueElements = Array.from(elementSet);
        console.log("\n--- UI Elements Found ---");
        console.log("Total elements:", elementMatches.length);
        console.log("Unique element types:", uniqueElements.length);
        console.log("Element types:", uniqueElements.slice(0, 10).join(", "), "...");
      }
      
    } catch (error) {
      console.error(
        "❌ Error fetching XML layout:",
        error instanceof Error ? error.message : String(error)
      );
    } finally {
      console.log("\nClosing session...");
      await ac.deleteSession();
      console.log("Session closed.");
    }
  }

  main()
  ```
</Update>

<Update label="2025-10-11">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    38](https://www.npmjs.com/package/gbox-sdk/v/0.38.0) or later.
  </Tip>

  ## 📍 Enhanced Scroll Action with Natural Language Location Support

  The `scroll` action now supports natural language descriptions for the `location` parameter, allowing you to specify where on the screen to perform the scroll gesture. This enhancement makes it easier to interact with specific areas of the UI, such as scrolling within a particular region or component.

  ### Key Features

  * **Natural Language Location**: Use descriptive text to specify scroll locations (e.g., "screen bottom", "toolbar area", "middle of the screen")
  * **Flexible Targeting**: Perfect for scrolling within specific UI regions or components
  * **Improved Precision**: Better control over scroll behavior in complex layouts

  ### Example Usage

  ```typescript wrap theme={null}
  import GboxSDK from "gbox-sdk";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"] // This is the default and can be omitted
  });

  async function main() {
    const box = await gboxSDK.create({
      type: "android",
    });

    // Scroll up from the bottom of the screen
    const result = await box.action.scroll({
      direction: 'up',
      location: "screen bottom",
    });

    console.info(result);
  }

  main();
  ```

  ### Additional Examples

  ```typescript wrap theme={null}
  // Scroll down in the toolbar area
  await box.action.scroll({
    direction: 'down',
    location: "toolbar area",
  });

  // Scroll left in the center of the screen
  await box.action.scroll({
    direction: 'left',
    location: "middle of the screen",
  });
  ```
</Update>

<Update label="2025-09-26">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    37](https://www.npmjs.com/package/gbox-sdk/v/0.37.0) or later.
  </Tip>

  ## 🌐 Enhanced Browser API

  New browser API with full-screen support and window control options, designed to prevent AI agents from accidentally closing or minimizing browsers.

  ### Key Features & Use Cases

  * **🔄 Full-Screen Control**: Seamlessly maximize browser windows for immersive experiences
  * **🪟 Window Control Management**: Hide browser minimize, maximize, and close buttons to prevent AI or automation from accidentally closing browsers
  * **🤖 AI Agent Automation**: Perfect for AI agents and LLM applications requiring focused browser sessions

  ```typescript wrap theme={null}
  import GboxSDK from "gbox-sdk";
  import { chromium } from "playwright";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"] // This is the default and can be omitted
  });

  async function main() {
    const box = await gboxSDK.create({ type: "linux" });

    const result = await box.browser.open({
      maximize: true,
      showControls: false,
    });

    const browser = await chromium.connectOverCDP(result.cdpUrl);

    // Use the browser as usual
    const context = browser.contexts()[0];
    const page = await context.newPage();
    await page.goto("https://example.com");

    // Perform actions on the page
    console.log(await page.title());

    await box.browser.close();
  }

  main();
  ```
</Update>

<Update label="2025-09-25">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    36](https://www.npmjs.com/package/gbox-sdk/v/0.36.0) or later.
  </Tip>

  ## 🔍 UI Elements Detection

  Automatically detect and extract interactive elements from web pages for precise automation and AI-driven interactions.

  <Warning>
    Currently supported only on **Linux boxes with a browser**. Android support is coming soon.
  </Warning>

  ### Key Features

  * **Element Detection**: Identifies buttons, links, inputs, and other interactive elements
  * **Rich Metadata**: Position, size, text content, and HTML attributes
  * **Annotated Screenshots**: Returns screenshots with visual annotations highlighting detected elements
  * **AI Ready**: Perfect for LLM integration and intelligent automation

  ### Usage Example

  ```typescript wrap theme={null}
  import GboxSDK from "gbox-sdk";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"] // This is the default and can be omitted
  });

  async function main() {
    const box = await gboxSDK.create({ type: "linux" });

    await box.browser.openTab({
      url: "https://gbox.ai",
    });

    const { screenshot, elements } = await box.action.elements.detect({
      screenshot: {
        outputFormat: 'storageKey'
      }
    });

    console.info(`Screenshot: ${JSON.stringify(screenshot, null, 2)}`);

    console.info(`Detected elements length: ${elements.list().length}`);

    // You can send the screenshot to an LLM or agent to decide which element to click.
    // Here we simply click the first element.
    const firstElement = elements.get("1");
    await box.action.click({
      target: firstElement,
    });

    console.info(
      `Clicked element: ${JSON.stringify(firstElement, null, 2)}`
    );
  }

  main();
  ```
</Update>

<Update label="2025-09-19">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    35](https://www.npmjs.com/package/gbox-sdk/v/0.35.0) or later.
  </Tip>

  ## 📋 Clipboard Support

  New clipboard management capabilities for Android boxes, enabling programmatic control over device clipboard content.

  ### Key Features

  * **Set/Get Content**: Read and write text to device clipboard
  * **Automation Integration**: Seamlessly integrate with UI automation workflows
  * **Cross-App Data**: Share data between applications through clipboard

  ### Usage Example

  ```typescript wrap theme={null}
  import GboxSDK from "gbox-sdk";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"] // This is the default and can be omitted
  });

  async function main() {
    const box = await gboxSDK.create({ type: "android" });

    // Set clipboard content
    await box.action.clipboard.set("Hello, world!");

    // Get current clipboard content
    const clipboardContent = await box.action.clipboard.get();
    console.log("Clipboard content:", clipboardContent);
  }

  main();
  ```
</Update>

<Update label="2025-09-12">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    34](https://www.npmjs.com/package/gbox-sdk/v/0.34.0) or later.
  </Tip>

  ## 📸 Screenshot ScrollCapture

  Enhanced screenshot functionality now supports automatic scrolling to capture tall content like long web pages, documents, or chat conversations in a single image.

  ### Key Features

  * **Automatic Scrolling**: Intelligently scrolls through content to capture everything in one screenshot
  * **Height Control**: Configurable maximum height to manage memory usage and file size
  * **Position Restoration**: Optional scroll-back functionality to return to original position
  * **Memory Optimization**: Built-in limits to prevent excessive memory consumption

  ### Configuration Options

  * **`maxHeight`**: Maximum height in pixels (default: 4000px) - limits the total height of captured content
  * **`scrollBack`**: Whether to scroll back to original position after capture (default: false)

  ### Usage Example

  ```typescript theme={null}
  const result = await box.action.screenshot({
    scrollCapture: {
      maxHeight: 5000,
      scrollBack: true
    }
  })
  ```

  ### Use Cases

  * **Web Page Documentation**: Capture entire web pages for documentation or analysis
  * **Chat History**: Save complete conversation threads
  * **Long Documents**: Capture full document content in one screenshot
  * **Social Media Feeds**: Capture extended social media timelines
  * **App Content**: Document complete app screens or settings pages
</Update>

<Update label="2025-09-05">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    31](https://www.npmjs.com/package/gbox-sdk/v/0.31.0) or later.
  </Tip>

  ## 🔄 Rewind Recording

  The new Rewind feature continuously preserves the last 5 minutes of screen recording, so you can extract a video clip from any point within that window at any time.

  ### Key Features

  * **Automatic Recording**: Continuously preserves up to 5 minutes of recent box screen recording
  * **Flexible Extraction**: Extract video clips from any time period (up to 5 minutes maximum)
  * **Instant Access**: No need to pre-start recording - access historical clips anytime

  ### Use Cases

  * **Debug Analysis**: Review recent operation recordings when automation scripts encounter issues
  * **Error Reproduction**: Quickly capture screen recordings from before problems occurred for easier troubleshooting
  * **Operation Logging**: Automatically save operation history without manual recording management
  * **Performance Monitoring**: Observe application behavior during specific operations

  ### Performance Considerations

  <Warning>
    Enabling Rewind has some performance impact. Enable it only when you need it, and disable it when you no longer do.
  </Warning>

  ### Basic Usage

  ```typescript wrap theme={null}
  import GboxSDK from "gbox-sdk";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"] // This is the default and can be omitted
  });

  async function main() {
    const box = await gboxSDK.create({
      type: "android",
      config: {
        deviceType: "physical",
      },
    });

    // Enable Rewind functionality
    await box.action.recording.rewind.enable();

    // Perform some operations...
    await box.action.tap({ target: "chrome app" });
    await box.action.swipe({ direction: "up" });

    // Extract the last 10 seconds of recording
    const result = await box.action.recording.rewind.extract({
      duration: "10s"
    });

    console.log("Recording result:", result);
    console.log("Download URL:", result.presignedUrl);

    // Disable Rewind when it is no longer needed to reduce overhead
    // await box.action.recording.rewind.disable();
  }

  main();
  ```

  ### Advanced Usage

  ```typescript wrap theme={null}
  // Extract different time periods
  const shortClip = await box.action.recording.rewind.extract({
    duration: "5s"  // Last 5 seconds
  });

  const longClip = await box.action.recording.rewind.extract({
    duration: "2m"  // Last 2 minutes
  });

  const maxClip = await box.action.recording.rewind.extract({
    duration: "5m"  // Last 5 minutes (maximum)
  });
  ```

  ## 📸 Save Screenshots to Album

  Enhanced screenshot functionality now supports saving screenshots directly to the device's media album, making it easier to organize and access captured images.

  ### Key Features

  * **Album Integration**: Screenshots are automatically saved to the device's media gallery
  * **Easy Access**: Screenshots can be accessed through the device's photo app
  * **Organized Storage**: All screenshots are properly categorized in the media library

  ### Usage

  ```typescript wrap theme={null}
  import GboxSDK from "gbox-sdk";

  const gboxSDK = new GboxSDK({
    apiKey: process.env["GBOX_API_KEY"] // This is the default and can be omitted
  });

  async function main() {
    const box = await gboxSDK.create({ type: "android" });

    // Take a screenshot and save it to the device album
    await box.action.screenshot({
      saveToAlbum: true,
    });

    // Screenshot is now available in the device's photo gallery
  }

  main();
  ```
</Update>

<Update label="2025-08-29">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    30](https://www.npmjs.com/package/gbox-sdk/v/0.30.0) or later.
  </Tip>

  ### 🖼️ Enhanced Screenshot Options

  Screenshot behavior is now configured through a single, unified `options.screenshot` parameter, which gives you finer control over when and how screenshots are taken.

  #### Basic Usage

  ```typescript wrap theme={null}
  // Simple boolean
  await box.action.click({
    x: 100,
    y: 100,
    options: {
      screenshot: true,
    },
  });

  // Detailed configuration
  await box.action.click({
    x: 100,
    y: 100,
    options: {
      screenshot: {
        outputFormat: "storageKey",
        presignedExpiresIn: "1h",
        delay: "1s",
        phases: ["before", "after"],
      },
    },
  });
  ```

  #### Key Features

  **📸 Screenshot Phases**

  * `before`: Screenshot before the action
  * `after`: Screenshot after the action
  * `trace`: Screenshot with operation trace
  * By default, all three phases are captured

  **⏱️ Configurable Delay**

  * `delay`: Wait time after action before taking final screenshot
  * Default: `500ms`, Maximum: `30s`

  **🔄 Output Format**

  * `base64`: Direct image data (default)
  * `storageKey`: Storage key with presigned URL access
  * `presignedExpiresIn`: Custom expiration for storageKey URLs (default: `30m`)
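
  One way to picture how these defaults combine is the sketch below, which expands the boolean shorthand and fills in unspecified fields. The type names and the `resolveScreenshotOptions` helper are hypothetical and only mirror the defaults listed above. Treating `screenshot: true` as "use every default" is an assumption, and the SDK may resolve options differently.

  ```typescript
  type ScreenshotPhase = "before" | "after" | "trace";

  // Hypothetical shape mirroring the fields described above.
  interface ScreenshotOptions {
    outputFormat?: "base64" | "storageKey";
    presignedExpiresIn?: string;
    delay?: string;
    phases?: ScreenshotPhase[];
  }

  // Defaults as listed in this changelog entry.
  const SCREENSHOT_DEFAULTS: Required<ScreenshotOptions> = {
    outputFormat: "base64",
    presignedExpiresIn: "30m",
    delay: "500ms",
    phases: ["before", "after", "trace"],
  };

  // Expands `true`/`false` and fills in any field the caller left out.
  // Returns null when screenshots are disabled.
  function resolveScreenshotOptions(
    input: boolean | ScreenshotOptions,
  ): Required<ScreenshotOptions> | null {
    if (input === false) return null;
    if (input === true) return { ...SCREENSHOT_DEFAULTS };
    return { ...SCREENSHOT_DEFAULTS, ...input };
  }

  console.log(resolveScreenshotOptions({ delay: "2s", phases: ["after"] }));
  ```

  Fields the caller omits fall back to the defaults, so passing only `delay` and `phases` still yields a `base64` output format.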

  #### Usage Examples

  **Capture specific phases:**

  ```typescript wrap theme={null}
  await box.action.click({
    target: "login button",
    options: {
      screenshot: {
        phases: ["before", "after"],
      },
    },
  });
  ```

  **Custom delay for UI state capture:**

  ```typescript wrap theme={null}
  await box.action.click({
    target: "submit button",
    options: {
      screenshot: {
        delay: "2s",
        phases: ["after"],
      },
    },
  });
  ```

  **Disable screenshots:**

  ```typescript wrap theme={null}
  await box.action.click({
    target: "button",
    options: {
      screenshot: false,
    },
  });
  ```

  <Warning>
    Please use the new `options.screenshot` parameter instead of the old
    screenshot fields, as they will be deprecated in future versions.
  </Warning>
</Update>

<Update label="2025-08-25">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    29](https://www.npmjs.com/package/gbox-sdk/v/0.29.0) or later.
  </Tip>

  ### 🎯 Natural Language UI Actions

  UI Actions now support natural language descriptions for targets and locations, making automation more intuitive and human-readable.

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  // Tap on app icons or UI elements using natural language
  await box.action.tap({
    target: "chrome app",
  });

  await box.action.click({
    target: "login button",
  });

  // Swipe with natural language location descriptions
  await box.action.swipe({
    direction: "up",
    distance: 300,
    duration: "500ms",
    location: "screen bottom",
  });

  // Drag and drop using natural language
  await box.action.drag({
    start: "Chrome App",
    end: "Trash",
  });

  // Long press with natural language targets
  await box.action.longPress({
    target: "Chrome icon",
    duration: "1s",
  });
  ```

  ### 📏 Action Scale Settings

  Added support for scaling UI actions to adjust the size of screenshots and coordinate calculations without changing the actual screen resolution.

  #### Key Features

  * **Scale Range**: 0.1 to 1.0 (10% to 100% of original size)
  * **Screenshot Scaling**: Output screenshots are scaled according to the setting
  * **Coordinate Scaling**: All action coordinates and distances are automatically scaled
  * **No Resolution Change**: The box's actual screen resolution remains unchanged

  #### Usage Examples

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  // Get current action settings
  const currentSettings = await box.action.getSettings();
  console.log("Current settings:", currentSettings);

  // Set scale to 50% for smaller screenshots and scaled coordinates
  await box.action.updateSettings({
    scale: 0.5,
  });

  // With scale = 0.5, coordinates are interpreted in the scaled space, so this click at (100, 100) targets the point that is (200, 200) at full scale
  await box.action.click({
    x: 100,
    y: 100,
  });

  // Reset all settings to default values
  await box.action.resetSettings();

  // Verify settings have been reset
  const resetSettings = await box.action.getSettings();
  console.log("Settings after reset:", resetSettings);
  ```

  #### Scale Examples

  | Scale Value     | Screenshot Size | Coordinate Example                 |
  | --------------- | --------------- | ---------------------------------- |
  | `1.0` (default) | Full size       | `Click({x: 100, y: 100})`          |
  | `0.5`           | 50% size        | `Click({x: 50, y: 50})` equivalent |
  | `0.25`          | 25% size        | `Click({x: 25, y: 25})` equivalent |

  **Note**: Scale affects both the output screenshot dimensions and the coordinate system for all UI actions, making it useful for optimizing performance and storage when full resolution isn't needed.
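
  The relationship in the table can be sketched as a pair of conversion helpers. `toScaled` and `toFullResolution` are hypothetical names, and rounding to whole pixels is an assumption. The service performs its own conversion, so this is only meant to show the arithmetic.

  ```typescript
  interface Point {
    x: number;
    y: number;
  }

  // The changelog gives the valid range as 0.1 to 1.0.
  function assertScale(scale: number): void {
    if (scale < 0.1 || scale > 1) {
      throw new RangeError(`scale must be between 0.1 and 1.0, got ${scale}`);
    }
  }

  // Full-resolution point -> coordinates to pass to actions when `scale` is active.
  function toScaled(point: Point, scale: number): Point {
    assertScale(scale);
    return { x: Math.round(point.x * scale), y: Math.round(point.y * scale) };
  }

  // Coordinates taken from a scaled screenshot -> position on the real screen.
  function toFullResolution(point: Point, scale: number): Point {
    assertScale(scale);
    return { x: Math.round(point.x / scale), y: Math.round(point.y / scale) };
  }

  const fullResolutionPoint: Point = { x: 100, y: 100 };
  console.log(toScaled(fullResolutionPoint, 0.5));
  console.log(toScaled(fullResolutionPoint, 0.25));
  console.log(toFullResolution({ x: 50, y: 50 }, 0.5));
  ```

  In practice, coordinates read off a scaled screenshot can be passed straight back to an action, since both use the same scaled space. The helpers are only needed when mixing coordinates captured at different scales.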

  ### 🌐 Proxy Configuration

  Added proxy configuration support for Android boxes.

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  // Set proxy
  await box.proxy.set({
    host: "127.0.0.1",
    port: 9090,
  });

  // Get current proxy settings
  console.info(await box.proxy.get());

  // Clear proxy
  await box.proxy.clear();
  ```
</Update>

<Update label="2025-08-15">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    27](https://www.npmjs.com/package/gbox-sdk/v/0.27.0) or later.
  </Tip>

  ### 🖱️ UI Actions

  * **Added**: `tap` and `longPress` actions for precise coordinate taps and long-press interactions.
  * **Enhanced**: `swipe` / `scroll` now support semantic `distance` values (`"tiny" | "short" | "medium" | "long"`), so you no longer need to provide pixel values.

  #### New action: tap

  ```typescript theme={null}
  const box = await gboxSDK.create({ type: "android" });

  await box.action.tap({
    x: 100,
    y: 100,
  });
  ```

  #### New action: longPress

  ```typescript theme={null}
  const box = await gboxSDK.create({ type: "android" });

  await box.action.longPress({
    x: 100,
    y: 100,
  });
  ```

  #### Semantic distance (swipe/scroll)

  ```typescript theme={null}
  const box = await gboxSDK.create({ type: "android" });

  await box.action.swipe({
    direction: "up",
    distance: "long",
  });

  await box.action.scroll({
    direction: "up",
    distance: "long",
  });
  ```

  * Docs: [`tap`](/api-reference/ui-action/tap) · [`longPress`](/api-reference/ui-action/long-press)
</Update>

<Update label="2025-08-08">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    26](https://www.npmjs.com/package/gbox-sdk/v/0.26.0) or later.
  </Tip>

  ### 🎥 Android Screen Recording

  Added screen recording capabilities to Android boxes for creating tutorials and debugging UI interactions.

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  console.info("start recording");

  await box.action.screenRecordingStart();

  console.info("swipe up");

  await box.action.swipe({
    direction: "up",
  });

  console.info("sleep 5 seconds...");

  // you can do anything you want here
  await new Promise((resolve) => setTimeout(resolve, 5000));

  const result = await box.action.screenRecordingStop();

  // you can download the video from the result
  console.info(`recording result: ${JSON.stringify(result, null, 2)}`);
  ```

  ### 📱 Android Media Management

  Added media file management capabilities for Android devices, including album listing and media operations.

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  const albums = await box.media.listAlbums();

  console.info(albums);
  ```

  For more media API endpoints, see [Media API Docs](/api-reference/media/list-albums).

  ### 🪝 AI Action Progress Callbacks

  AI Actions can take a while to run, so the SDK now provides callbacks for monitoring the progress of an operation.

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  await box.action.ai("open the youtube app", {
    onActionStart: () => {
      console.info("action start");
    },
    onActionEnd: () => {
      console.info("action end");
    },
  });
  ```
</Update>

<Update label="2025-07-29">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    25](https://www.npmjs.com/package/gbox-sdk/v/0.25.0) or later.
  </Tip>

  ### UI Action Support for StorageKey Output Format

  UI Actions now support `outputFormat: "storageKey"`, which lets GBOX.AI store the screenshot instead of returning the image data directly. This gives you more flexible storage and access options:

  * **presigned URL**: A system-generated temporary access link, valid for 30 minutes by default
  * **storageKey**: A storage key that remains valid throughout the box lifecycle (until the box is deleted)
  * **Custom presigned URL**: Use `createPresignedUrl` to create presigned URLs with a specific expiration time, which is convenient when passing screenshots to an LLM for image analysis

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  const res = await box.action.screenshot({
    outputFormat: "storageKey",
  });

  console.info(`presigned url: ${res.presignedUrl}`);

  // create custom presigned url
  const customPresignedUrl = await box.storage.createPresignedUrl({
    storageKey: res.uri,
    expiresIn: "1h",
  });

  console.info(`custom presigned url: ${customPresignedUrl}`);
  ```

  ### Android Multiple APK Installation Support

  Added support for installing multiple APK files using ZIP archives, enabling installation of split APKs and bundle applications.

  #### Installation Modes

  **Single APK Installation** (default):

  * Upload and install a single APK file
  * Traditional installation method for standalone applications

  **Multiple APK Installation** (ZIP-based):

  * Upload a ZIP archive containing multiple APK files
  * Automatically extracts and installs all APK files in the correct order
  * Essential for modern Android apps that use App Bundle distribution

  #### ZIP Archive Structure

  When using multiple APK installation, organize your files as follows:

  ```
  app-bundle.zip
  └── app-folder/
      ├── base.apk               (base application)
      ├── config.arm64_v8a.apk   (architecture-specific)
      ├── config.en.apk          (language resources)
      └── config.xxxhdpi.apk     (density-specific resources)
  ```

  #### Usage Examples

  **Single APK Installation:**

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  const app = await box.app.install({
    apk: "/path/to/single-app.apk",
  });

  await app.open();
  ```

  **Multiple APK Installation (ZIP):**

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  const app = await box.app.install({
    apk: "/path/to/app-bundle.zip",
  });

  await app.open();
  ```
</Update>

<Update label="2025-07-21">
  <Tip>
    These features require upgrading to [TypeScript SDK version
    22](https://www.npmjs.com/package/gbox-sdk/v/0.22.0) or later.
  </Tip>

  ### Android Box Command Support

  Added direct command execution support for Android boxes, enabling full system-level control through shell commands.

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "android" });

  await box.command({
    commands: ["logcat"],
    onStdout: (data) => {
      console.info("---stdout---");
      console.info(data);
    },
    onStderr: (data) => {
      console.info("---stderr---");
      console.error(data);
    },
  });
  ```

  ### Linux Box Support for UI Action / AI Action

  Added UI automation and AI-powered action capabilities to Linux boxes, enabling desktop application interaction and programmatic UI operations.

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "linux" });

  await box.action.click({
    x: 100,
    y: 100,
  });

  await box.action.ai("Open the Chrome browser");
  ```

  ### Linux Box Command / Run Code Streaming Support

  Added real-time streaming support for command execution and code running on Linux boxes, providing immediate feedback and live output monitoring.

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "linux" });

  const result = await box.command({
    commands: ["for i in {10..1}; do echo $i; echo $i >&2; sleep 1; done"],
    onStdout: (data) => {
      console.info("---stdout---");
      console.info(data);
    },
    onStderr: (data) => {
      console.info("---stderr---");
      console.error(data);
    },
  });

  console.info(result);

  const result2 = await box.runCode({
    code: `
  console.log("something xxxx");
  console.error("something error")
  `,
    language: "typescript",
    onStdout: (data) => {
      console.info("stdout");
      console.info(data);
    },
    onStderr: (data) => {
      console.info("stderr");
      console.error(data);
    },
  });

  console.info(result2);
  ```

  ### Browser Action Support

  Added browser automation capabilities to Linux boxes, enabling programmatic control and interaction with web browsers and tabs.

  ```typescript wrap theme={null}
  const box = await gboxSDK.create({ type: "linux" });

  const tabs = await box.browser.listTabInfo();

  console.info(tabs);
  ```
</Update>
