2025-08-29
These features require upgrading to TypeScript SDK version 30 or later.

🖼️ Enhanced Screenshot Options

A new unified options.screenshot parameter gives finer control over when and how screenshots are captured for an action.

Basic Usage

// Simple boolean
await box.action.click({
   x: 100,
   y: 100,
   options: {
      screenshot: true
   }   
})

// Detailed configuration
await box.action.click({
   x: 100,
   y: 100,
   options: {
      screenshot: {
        outputFormat: "storageKey",
        presignedExpiresIn: "1h",
        delay: "1s",
        phases: ["before", "after"]
      }
   }   
})

Key Features

📸 Screenshot Phases
  • before: Screenshot before the action
  • after: Screenshot after the action
  • trace: Screenshot with operation trace
  • Default: all three phases are captured
⏱️ Configurable Delay
  • delay: Wait time after action before taking final screenshot
  • Default: 500ms, Maximum: 30s
🔄 Output Format
  • base64: Direct image data (default)
  • storageKey: Storage key with presigned URL access
  • presignedExpiresIn: Custom expiration for storageKey URLs (default: 30m)
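
The delay default and maximum listed above can be mirrored in a small client-side check. This is a minimal sketch, not part of the SDK: `parseDurationMs` and `normalizeDelayMs` are hypothetical names, and the duration grammar (`<number>` followed by `ms`, `s`, `m`, or `h`) is only inferred from the values used in this changelog ("500ms", "1s", "30m", "1h"). The changelog does not say whether a delay above 30s is clamped or rejected by the service; this sketch clamps.

```typescript
// Hypothetical helpers, not part of the gbox SDK.
// Assumes durations look like "500ms", "2s", "30m", or "1h".
const UNIT_MS = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 } as const;
type Unit = keyof typeof UNIT_MS;

const DEFAULT_DELAY_MS = 500; // documented default
const MAX_DELAY_MS = 30_000; // documented maximum (30s)

function parseDurationMs(input: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(input.trim());
  if (!match) {
    throw new Error(`Unrecognized duration: "${input}"`);
  }
  const value = Number(match[1]);
  const unit = match[2] as Unit;
  return value * UNIT_MS[unit];
}

// Returns the delay in milliseconds, applying the documented default
// and clamping to the documented maximum (clamping is an assumption).
function normalizeDelayMs(delay?: string): number {
  if (delay === undefined) return DEFAULT_DELAY_MS;
  return Math.min(parseDurationMs(delay), MAX_DELAY_MS);
}
```

Running a check like this before building the options object surfaces a malformed duration locally instead of in a failed request.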

Usage Examples

Capture specific phases:
await box.action.click({
   target: "login button",
   options: {
      screenshot: {
        phases: ["before", "after"]
      }
   }
})
Custom delay for UI state capture:
await box.action.click({
   target: "submit button",
   options: {
      screenshot: {
        delay: "2s",
        phases: ["after"]
      }
   }
})
Disable screenshots:
await box.action.click({
   target: "button",
   options: {
      screenshot: false
   }
})
Use the new options.screenshot parameter instead of the old screenshot fields; the old fields will be deprecated in a future version.
2025-08-25
These features require upgrading to TypeScript SDK version 29 or later.

🎯 Natural Language UI Actions

UI Actions now support natural language descriptions for targets and locations, making automation more intuitive and human-readable.
const box = await gboxSDK.create({ type: "android" });

// Tap on app icons or UI elements using natural language
await box.action.tap({
  target: "chrome app"
});

await box.action.click({
  target: "login button"
});

// Swipe with natural language location descriptions
await box.action.swipe({
  direction: "up",
  distance: 300,
  duration: "500ms",
  location: "screen bottom"
});

// Drag and drop using natural language
await box.action.drag({
  start: "Chrome App",
  end: "Trash",
});

// Long press with natural language targets
await box.action.longPress({
  target: "Chrome icon",
  duration: "1s"
})

📏 Action Scale Settings

Added support for scaling UI actions to adjust the size of screenshots and coordinate calculations without changing the actual screen resolution.

Key Features

  • Scale Range: 0.1 to 1.0 (10% to 100% of original size)
  • Screenshot Scaling: Output screenshots are scaled according to the setting
  • Coordinate Scaling: All action coordinates and distances are automatically scaled
  • No Resolution Change: The box’s actual screen resolution remains unchanged

Usage Examples

const box = await gboxSDK.create({ type: "android" });

// Get current action settings
const currentSettings = await box.action.getSettings();
console.log('Current settings:', currentSettings);

// Set scale to 50% for smaller screenshots and scaled coordinates
await box.action.updateSettings({
  scale: 0.5
});

// With scale = 0.5, this click at (100, 100) becomes equivalent to (50, 50) at full scale
await box.action.click({
  x: 100,
  y: 100
});

// Reset all settings to default values
await box.action.resetSettings();

// Verify settings have been reset
const resetSettings = await box.action.getSettings();
console.log('Settings after reset:', resetSettings);

Scale Examples

| Scale Value | Screenshot Size | Coordinate Example |
| --- | --- | --- |
| 1.0 (default) | Full size | Click({x: 100, y: 100}) |
| 0.5 | 50% size | Click({x: 50, y: 50}) equivalent |
| 0.25 | 25% size | Click({x: 25, y: 25}) equivalent |
Note: Scale affects both the output screenshot dimensions and the coordinate system for all UI actions, making it useful for optimizing performance and storage when full resolution isn’t needed.
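
To estimate how large a screenshot will be at a given scale, the arithmetic can be sketched as below. `scaledScreenshotSize` is a hypothetical helper, not an SDK call. It assumes "50% size" means each dimension is multiplied by the scale and that results are rounded to whole pixels; the service may round differently.

```typescript
// Hypothetical helper, not part of the gbox SDK.
interface Size {
  width: number;
  height: number;
}

const MIN_SCALE = 0.1; // documented lower bound
const MAX_SCALE = 1.0; // documented upper bound

// Assumes the scale applies to each dimension and that sizes are rounded
// to the nearest pixel (the rounding rule is an assumption).
function scaledScreenshotSize(full: Size, scale: number): Size {
  // The negated form also rejects NaN.
  if (!(scale >= MIN_SCALE && scale <= MAX_SCALE)) {
    throw new RangeError(
      `scale must be between ${MIN_SCALE} and ${MAX_SCALE}, got ${scale}`,
    );
  }
  return {
    width: Math.round(full.width * scale),
    height: Math.round(full.height * scale),
  };
}
```

For a 1080x1920 screen this gives 540x960 at scale 0.5, roughly a quarter of the pixels to store or send to a model.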

🌐 Proxy Configuration

Added proxy configuration support for Android boxes.
const box = await gboxSDK.create({ type: "android" });

// Set proxy
await box.proxy.set({
  host: "127.0.0.1",
  port: 9090,
});

// Get current proxy settings
console.info(await box.proxy.get());

// Clear proxy
await box.proxy.clear();
2025-08-15
These features require upgrading to TypeScript SDK version 27 or later.

🖱️ UI Actions

  • Added: tap and longPress actions for precise coordinate taps and long-press interactions.
  • Enhanced: swipe / scroll now support semantic distance values ("tiny" | "short" | "medium" | "long"), so you no longer need to provide pixel values.

New action: tap

const box = await gboxSDK.create({ type: "android" });

await box.action.tap({
  x: 100,
  y: 100,
});

New action: longPress

const box = await gboxSDK.create({ type: "android" });

await box.action.longPress({
  x: 100,
  y: 100,
});

Semantic distance (swipe/scroll)

const box = await gboxSDK.create({ type: "android" });

await box.action.swipe({
  direction: "up",
  distance: "long",
});

await box.action.scroll({
  direction: "up",
  distance: "long",
});
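
When the distance comes from user input or a config file, it can help to validate it before calling swipe or scroll. The sketch below is illustrative only: the type names and `parseDistance` are hypothetical, the SDK ships its own type definitions, and treating non-positive pixel values as invalid is an assumption.

```typescript
// Hypothetical types and helpers; the SDK's own definitions may differ.
const SEMANTIC_DISTANCES = ["tiny", "short", "medium", "long"] as const;
type SemanticDistance = (typeof SEMANTIC_DISTANCES)[number];
type Distance = number | SemanticDistance;

// Type guard: narrows an arbitrary value to one of the four semantic values.
function isSemanticDistance(value: unknown): value is SemanticDistance {
  return (
    typeof value === "string" && SEMANTIC_DISTANCES.some((d) => d === value)
  );
}

// Accepts either a semantic value ("long") or a pixel count ("300").
function parseDistance(raw: string): Distance {
  if (isSemanticDistance(raw)) return raw;
  const pixels = Number(raw);
  if (!isFinite(pixels) || pixels <= 0) {
    throw new Error(`Invalid distance: "${raw}"`);
  }
  return pixels;
}
```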
2025-08-08
These features require upgrading to TypeScript SDK version 26 or later.

🎥 Android Screen Recording

Added screen recording capabilities to Android boxes for creating tutorials and debugging UI interactions.
const box = await gboxSDK.create({ type: "android" });

console.info("start recording")

await box.action.screenRecordingStart();

console.info("swipe up")

await box.action.swipe({
  direction: "up"
})

console.info("sleep 5 seconds...")

// you can do anything you want here 
await new Promise((resolve) => setTimeout(resolve, 5000));

const result = await box.action.screenRecordingStop();

// you can download the video from the result
console.info(`recording result: ${JSON.stringify(result, null, 2)}`)
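
If the steps between start and stop can throw, the recording would otherwise be left running. One way to guard against that is a small wrapper, sketched here under assumptions: `withRecording` and `RecordingActions` are hypothetical names, and the result is typed as `unknown` because its shape is not documented in this changelog.

```typescript
// Minimal structural type for the two recording calls used above.
interface RecordingActions {
  screenRecordingStart(): Promise<unknown>;
  screenRecordingStop(): Promise<unknown>;
}

// Hypothetical helper: runs `body` while recording and always stops the
// recording, even when `body` throws. The error from `body` is rethrown.
async function withRecording(
  action: RecordingActions,
  body: () => Promise<void>,
): Promise<unknown> {
  await action.screenRecordingStart();
  let result: unknown;
  try {
    await body();
  } finally {
    result = await action.screenRecordingStop();
  }
  return result;
}
```

With a real box this would be called as `await withRecording(box.action, async () => { await box.action.swipe({ direction: "up" }); })`.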

📱 Android Media Management

Added media file management capabilities for Android devices, including album listing and media operations.
const box = await gboxSDK.create({ type: "android" });

const albums = await box.media.listAlbums()

console.info(albums)
For more media API endpoints, see Media API Docs.

🪝 AI Action Progress Callbacks

AI Actions can take a while to run, so the SDK provides callbacks for monitoring the progress of the operation.
const box = await gboxSDK.create({ type: "android" });

await box.action.ai("open the youtube app", {
  onActionStart: () => {
    console.info("action start")
  },
  onActionEnd: () => {
    console.info("action end")
  }
})
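
The callbacks can also be used to measure how long an action takes. The following is a minimal sketch: `createProgressTimer` is a hypothetical helper, and only the callback names `onActionStart` and `onActionEnd` come from the example above. The changelog does not say whether the callbacks fire once per AI Action or once per step, so the timer reports the most recent start/end pair.

```typescript
// Hypothetical helper; only the callback names come from the SDK example.
interface ProgressTimer {
  callbacks: { onActionStart: () => void; onActionEnd: () => void };
  elapsedMs: () => number | undefined;
}

// `now` is injectable so the timer can be exercised with a fake clock.
function createProgressTimer(now: () => number = Date.now): ProgressTimer {
  let startedAt: number | undefined;
  let endedAt: number | undefined;
  return {
    callbacks: {
      onActionStart: () => {
        startedAt = now();
        endedAt = undefined;
      },
      onActionEnd: () => {
        endedAt = now();
      },
    },
    // undefined until both a start and an end have been observed
    elapsedMs: () =>
      startedAt !== undefined && endedAt !== undefined
        ? endedAt - startedAt
        : undefined,
  };
}
```

Passing `timer.callbacks` as the second argument to `box.action.ai(...)` and reading `timer.elapsedMs()` afterwards gives the duration of the last start/end pair.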
2025-07-29
These features require upgrading to TypeScript SDK version 25 or later.

UI Action Support for StorageKey Output Format

UI Actions now support outputFormat: "storageKey", which lets GBOX.AI store the screenshot directly instead of returning the image data. Compared with returning image data, storageKey provides more flexible storage and access options:
  • presigned URL: System-generated temporary access link with a default validity period of 30 minutes
  • storageKey: A storage key that remains valid throughout the box lifecycle (until the box is deleted)
  • Custom presigned URL: Use createPresignedUrl to create presigned URLs with a specified expiration time, which is convenient when passing screenshots to an LLM for image analysis
const box = await gboxSDK.create({ type: "android" });

const res = await box.action.screenshot({
  outputFormat: "storageKey"
})

console.info(`presigned url: ${res.presignedUrl}`)

// create custom presigned url
const customPresignedUrl = await box.storage.createPresignedUrl({
  storageKey: res.uri,
  expiresIn: "1h"
});

console.info(`custom presigned url: ${customPresignedUrl}`)

Android Multiple APK Installation Support

Added support for installing multiple APK files using ZIP archives, enabling installation of split APKs and bundle applications.

Installation Modes

Single APK Installation (default):
  • Upload and install a single APK file
  • Traditional installation method for standalone applications
Multiple APK Installation (ZIP-based):
  • Upload a ZIP archive containing multiple APK files
  • Automatically extracts and installs all APK files in the correct order
  • Essential for modern Android apps that use App Bundle distribution

ZIP Archive Structure

When using multiple APK installation, organize your files as follows:
app-bundle.zip
└── app-folder/
    ├── base.apk           (base application)
    ├── config.arm64_v8a.apk  (architecture-specific)
    ├── config.en.apk      (language resources)
    └── config.xxxhdpi.apk (density-specific resources)
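
Before uploading, the archive's entry list can be checked against the layout above. This is a sketch under assumptions, not SDK behavior: `inspectSplitApks` is a hypothetical helper, it expects the entry paths to be listed by whatever ZIP tool you already use, and it assumes the base APK is named base.apk as in the example layout.

```typescript
// Hypothetical pre-flight check; not part of the gbox SDK.
interface SplitApkSet {
  base: string; // path of the base APK inside the archive
  splits: string[]; // remaining APK entries, sorted by path
}

function inspectSplitApks(entryPaths: string[]): SplitApkSet {
  // Keep only APK files; directory entries and other files are ignored.
  const apks = entryPaths.filter((p) => /\.apk$/i.test(p));
  const fileName = (p: string) => p.slice(p.lastIndexOf("/") + 1);
  const bases = apks.filter((p) => fileName(p).toLowerCase() === "base.apk");
  const base = bases[0];
  if (bases.length !== 1 || base === undefined) {
    throw new Error(`expected exactly one base.apk, found ${bases.length}`);
  }
  const splits = apks.filter((p) => p !== base).sort();
  return { base, splits };
}
```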

Usage Examples

Single APK Installation:
const box = await gboxSDK.create({ type: "android" });

const app = await box.app.install({
  apk: "/path/to/single-app.apk"
});

await app.open();
Multiple APK Installation (ZIP):
const box = await gboxSDK.create({ type: "android" });

const app = await box.app.install({
  apk: "/path/to/app-bundle.zip"
});

await app.open();
2025-07-21
These features require upgrading to TypeScript SDK version 22 or later.

Android Box Command Support

Added direct command execution support for Android boxes, enabling full system-level control through shell commands.
const box = await gboxSDK.create({ type: "android" });

await box.command({
  commands: ["logcat"],
  onStdout: (data) => {
    console.info("---stdout---");
    console.info(data);
  },
  onStderr: (data) => {
    console.info("---stderr---");
    console.error(data);
  },
});

Linux Box Support for UI Action / AI Action

Added UI automation and AI-powered action capabilities to Linux boxes, enabling desktop application interaction and programmatic UI operations.
const box = await gboxSDK.create({ type: "linux" });

await box.action.click({
  x: 100,
  y: 100,
});

await box.action.ai("Open the Chrome browser");

Linux Box Command / Run Code Streaming Support

Added real-time streaming support for command execution and code running on Linux boxes, providing immediate feedback and live output monitoring.
const box = await gboxSDK.create({ type: "linux" });

const result = await box.command({
  commands: ["for i in {10..1}; do echo $i; echo $i >&2; sleep 1; done"],
  onStdout: (data) => {
    console.info("---stdout---");
    console.info(data);
  },
  onStderr: (data) => {
    console.info("---stderr---");
    console.error(data);
  },
});

console.info(result);

const result2 = await box.runCode({
  code: `
console.log("something xxxx");
console.error("something error")
`,
  language: "typescript",
  onStdout: (data) => {
    console.info("stdout");
    console.info(data);
  },
  onStderr: (data) => {
    console.info("stderr");
    console.error(data);
  },
});

console.info(result2);
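
Streamed output may arrive in chunks that do not line up with line boundaries. If you want to process output line by line, a small buffer can sit between the callback and your handler. This is a sketch, not SDK functionality: `createLineBuffer` is a hypothetical helper, and it assumes each chunk passed to onStdout / onStderr is a string.

```typescript
// Hypothetical helper; assumes each streamed chunk is a string.
function createLineBuffer(onLine: (line: string) => void) {
  let pending = "";
  return {
    // Append a chunk and emit every complete line it finishes.
    push(chunk: string): void {
      pending += chunk;
      const parts = pending.split("\n");
      // The last element is an incomplete line (or "" if the chunk ended in \n).
      pending = parts.pop() ?? "";
      for (const part of parts) {
        onLine(part.replace(/\r$/, ""));
      }
    },
    // Emit whatever is left once the command has finished.
    flush(): void {
      if (pending !== "") {
        onLine(pending);
        pending = "";
      }
    },
  };
}
```

In the command example this would look like `onStdout: (data) => out.push(data)`, followed by `out.flush()` once the awaited call returns.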

Browser Action Support

Added browser automation capabilities to Linux boxes, enabling programmatic control and interaction with web browsers and tabs.
const box = await gboxSDK.create({ type: "linux" });

const tabs = await box.browser.listTabInfo()

console.info(tabs);