2025-10-15
These features require upgrading to TypeScript SDK version 40 or later.
🤖 Customizable AI Model Selection for UI Actions
The UI Action system now supports specifying custom AI models through the `options.model` parameter. This lets you choose a different computer vision model based on your specific needs, performance requirements, or accuracy preferences.
The `model` parameter is only effective when using natural language-driven UI actions (e.g., `click`, `type`, `scroll` with descriptive targets). It does not apply to coordinate-based or other non-AI-driven interactions.
You can specify a model (e.g., `openai-computer-use`) for UI element detection by passing the `model` parameter in action options.
Example Usage
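A minimal sketch of how the `model` option might be attached to an action request. The `ClickRequest` shape and the `withModel` helper are hypothetical and are not part of the SDK; they only illustrate the rule stated above, that a model is worth setting only when the target is a natural language description.

```typescript
// Hypothetical request shapes for illustration; the real SDK types may differ.
type ClickTarget = string | { x: number; y: number };

interface ClickRequest {
  target: ClickTarget;
  options?: { model?: string };
}

// Attach a model only for natural language targets, where it has an effect.
function withModel(target: ClickTarget, model: string): ClickRequest {
  if (typeof target === "string") {
    return { target, options: { model } };
  }
  // Coordinate-based targets are not AI-driven, so the model is left out.
  return { target };
}

const byDescription = withModel("login button", "openai-computer-use");
const byCoordinates = withModel({ x: 120, y: 480 }, "openai-computer-use");
```

Here `byDescription` carries the model name, while `byCoordinates` has no `options` at all.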
🔌 Appium Integration Support
Gbox now provides native Appium connection support, enabling seamless integration with the Appium ecosystem for advanced automation workflows. The new `appiumURL()` method generates a ready-to-use Appium connection URL with optimized default configurations.
Key Features
- One-Click Connection: Get an Appium connection URL with a single method call
- Pre-configured Options: Default capabilities and settings optimized for Gbox Android boxes
- Full Appium Compatibility: Use any Appium client library (WebdriverIO, Appium Python Client, etc.)
- Advanced Automation: Access native Appium features like XML layout inspection, element finding, and complex gestures
Use Cases
- Extract and analyze UI element hierarchies
- Implement complex automation workflows
- Integrate with existing Appium-based testing frameworks
- Debug UI layouts and element properties
Example Usage
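As a rough sketch, a URL returned by `appiumURL()` could be split into the connection fields that a client such as WebdriverIO takes (`protocol`, `hostname`, `port`, `path`, `queryParams`). The URL below is a placeholder and does not reflect the real Gbox endpoint format, and `toConnectionOptions` is a hypothetical helper, not an SDK function.

```typescript
// Connection fields in the shape a WebDriver client typically accepts.
interface ConnectionOptions {
  protocol: string;
  hostname: string;
  port: number;
  path: string;
  queryParams: { [key: string]: string };
}

// Split an Appium connection URL into separate connection fields.
function toConnectionOptions(appiumUrl: string): ConnectionOptions {
  const url = new URL(appiumUrl);
  const protocol = url.protocol.replace(":", "");
  // URL.port is empty when the URL relies on the protocol's default port.
  const port = url.port ? Number(url.port) : protocol === "https" ? 443 : 80;
  const queryParams: { [key: string]: string } = {};
  url.searchParams.forEach((value, key) => {
    queryParams[key] = value;
  });
  return { protocol, hostname: url.hostname, port, path: url.pathname, queryParams };
}

// Placeholder URL for illustration only; in practice use the value from appiumURL().
const connection = toConnectionOptions("https://example.com/appium/session?token=abc");
```

The resulting object can be merged with your capabilities and passed to the Appium client of your choice.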
2025-10-11
These features require upgrading to TypeScript SDK version 38 or later.
📍 Enhanced Scroll Action with Natural Language Location Support
The `scroll` action now supports natural language descriptions for the `location` parameter, allowing you to specify where on the screen to perform the scroll gesture. This makes it easier to interact with specific areas of the UI, such as scrolling within a particular region or component.
Key Features
- Natural Language Location: Use descriptive text to specify scroll locations (e.g., “screen bottom”, “toolbar area”, “middle of the screen”)
- Flexible Targeting: Perfect for scrolling within specific UI regions or components
- Improved Precision: Better control over scroll behavior in complex layouts
Example Usage
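A small sketch of what scroll parameters with a natural language `location` might look like. The `ScrollParams` shape and the `scrollAt` helper are assumptions made for illustration; only the `location` parameter and the example phrases come from the notes above.

```typescript
// Hypothetical parameter shape for a scroll action; the real SDK type may differ.
interface ScrollParams {
  direction: "up" | "down" | "left" | "right";
  location?: string; // natural language description of where to scroll
}

// Build scroll parameters, dropping a blank location so none is sent.
function scrollAt(direction: ScrollParams["direction"], where?: string): ScrollParams {
  const trimmed = where ? where.trim() : "";
  return trimmed ? { direction, location: trimmed } : { direction };
}

const inToolbar = scrollAt("down", "toolbar area");
const noLocation = scrollAt("up", "   ");
```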
Additional Examples
2025-09-26
These features require upgrading to TypeScript SDK version 37 or later.
🌐 Enhanced Browser API
New browser API with full-screen support and window control options, designed to prevent AI agents from accidentally closing or minimizing browsers.
Key Features & Use Cases
- 🔄 Full-Screen Control: Seamlessly maximize browser windows for immersive experiences
- 🪟 Window Control Management: Hide browser minimize, maximize, and close buttons to prevent AI or automation from accidentally closing browsers
- 🤖 AI Agent Automation: Perfect for AI agents and LLM applications requiring focused browser sessions
2025-09-25
These features require upgrading to TypeScript SDK version 36 or later.
🔍 UI Elements Detection
Automatically detect and extract interactive elements from web pages for precise automation and AI-driven interactions.
Currently supported only for Linux boxes with a browser. Android support is coming soon.
Key Features
- Element Detection: Identifies buttons, links, inputs, and other interactive elements
- Rich Metadata: Position, size, text content, and HTML attributes
- Annotated Screenshots: Returns screenshots with visual annotations highlighting detected elements
- AI Ready: Perfect for LLM integration and intelligent automation
Usage Example
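The sketch below shows one way detected elements could be consumed once you have them: find an element by its text and compute the center of its bounding box as a click point. The `DetectedElement` shape is a guess based on the metadata listed above (position, size, text), and the sample data is made up.

```typescript
// Hypothetical shape of a detected element; the real response fields may differ.
interface DetectedElement {
  tag: string;
  text: string;
  rect: { x: number; y: number; width: number; height: number };
}

// Return the center of the first element whose text contains the query
// (case-insensitive), or undefined when nothing matches.
function centerOf(elements: DetectedElement[], query: string): { x: number; y: number } | undefined {
  const needle = query.toLowerCase();
  for (const el of elements) {
    if (el.text.toLowerCase().indexOf(needle) !== -1) {
      return { x: el.rect.x + el.rect.width / 2, y: el.rect.y + el.rect.height / 2 };
    }
  }
  return undefined;
}

// Sample data for illustration only.
const sampleElements: DetectedElement[] = [
  { tag: "a", text: "Docs", rect: { x: 10, y: 10, width: 40, height: 20 } },
  { tag: "button", text: "Sign in", rect: { x: 100, y: 200, width: 80, height: 40 } },
];
const signInCenter = centerOf(sampleElements, "sign in");
```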
2025-09-19
These features require upgrading to TypeScript SDK version 35 or later.
📋 Clipboard Support
New clipboard management capabilities for Android boxes, enabling programmatic control over device clipboard content.
Key Features
- Set/Get Content: Read and write text to device clipboard
- Automation Integration: Seamlessly integrate with UI automation workflows
- Cross-App Data: Share data between applications through clipboard
Usage Example
2025-09-12
These features require upgrading to TypeScript SDK version 34 or later.
📸 Screenshot ScrollCapture
Enhanced screenshot functionality now supports automatic scrolling to capture tall content like long web pages, documents, or chat conversations in a single image.
Key Features
- Automatic Scrolling: Intelligently scrolls through content to capture everything in one screenshot
- Height Control: Configurable maximum height to manage memory usage and file size
- Position Restoration: Optional scroll-back functionality to return to original position
- Memory Optimization: Built-in limits to prevent excessive memory consumption
Configuration Options
- `maxHeight`: Maximum height in pixels (default: 4000px). Limits the total height of captured content.
- `scrollBack`: Whether to scroll back to the original position after capture (default: false).
Usage Example
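A minimal sketch of how the two options and their documented defaults fit together. The `ScrollCaptureOptions` type and `resolveScrollCapture` helper are hypothetical; only the field names and default values come from the list above.

```typescript
// Hypothetical option shape; field names and defaults follow the documented options.
interface ScrollCaptureOptions {
  maxHeight?: number; // maximum captured height in pixels
  scrollBack?: boolean; // return to the original scroll position afterwards
}

// Fill in the documented defaults and reject a non-positive height.
function resolveScrollCapture(options: ScrollCaptureOptions = {}): Required<ScrollCaptureOptions> {
  const maxHeight = options.maxHeight === undefined ? 4000 : options.maxHeight;
  if (!(maxHeight > 0)) {
    throw new RangeError("maxHeight must be a positive number of pixels");
  }
  return { maxHeight, scrollBack: options.scrollBack === true };
}

const captureDefaults = resolveScrollCapture();
const shortCapture = resolveScrollCapture({ maxHeight: 2000, scrollBack: true });
```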
Use Cases
- Web Page Documentation: Capture entire web pages for documentation or analysis
- Chat History: Save complete conversation threads
- Long Documents: Capture full document content in one screenshot
- Social Media Feeds: Capture extended social media timelines
- App Content: Document complete app screens or settings pages
2025-09-05
These features require upgrading to TypeScript SDK version 31 or later.
🔄 Rewind Recording
New Rewind functionality that automatically preserves the last 5 minutes of screen recording, allowing you to extract video clips from any time period at any moment.
Key Features
- Automatic Recording: Continuously preserves up to 5 minutes of recent box screen recording
- Flexible Extraction: Extract video clips from any time period (up to 5 minutes maximum)
- Instant Access: No need to pre-start recording - access historical clips anytime
Use Cases
- Debug Analysis: Review recent operation recordings when automation scripts encounter issues
- Error Reproduction: Quickly capture screen recordings from before problems occurred for easier troubleshooting
- Operation Logging: Automatically save operation history without manual recording management
- Performance Monitoring: Observe application behavior during specific operations
Performance Considerations
Enabling Rewind has some performance impact. Enable it only when you need it and disable it otherwise.
Basic Usage
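Because Rewind keeps at most the last 5 minutes, any clip request longer than that cannot be satisfied in full. The helper below is a hypothetical client-side guard, not an SDK function, that clamps a requested clip length to that window.

```typescript
// Rewind preserves at most the last 5 minutes of recording.
const REWIND_WINDOW_SECONDS = 5 * 60;

// Clamp a requested clip length (in seconds) to what Rewind can provide.
function clipSeconds(requested: number): number {
  if (!(requested > 0)) {
    throw new RangeError("clip length must be a positive number of seconds");
  }
  return Math.min(requested, REWIND_WINDOW_SECONDS);
}

const lastThirtySeconds = clipSeconds(30); // 30
const clampedClip = clipSeconds(900); // 300
```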
Advanced Usage
📸 Save Screenshots to Album
Enhanced screenshot functionality now supports saving screenshots directly to the device's media album, making it easier to organize and access captured images.
Key Features
- Album Integration: Screenshots are automatically saved to the device’s media gallery
- Easy Access: Screenshots can be accessed through the device’s photo app
- Organized Storage: All screenshots are properly categorized in the media library
Usage
2025-08-29
These features require upgrading to TypeScript SDK version 30 or later.
🖼️ Enhanced Screenshot Options
New unified screenshot configuration with the `options.screenshot` parameter for better control over screenshot behavior.
Basic Usage
Key Features
📸 Screenshot Phases
- `before`: Screenshot before the action
- `after`: Screenshot after the action
- `trace`: Screenshot with operation trace
- Default captures all three phases
- `delay`: Wait time after the action before taking the final screenshot. Default: `500ms`, maximum: `30s`
- `base64`: Direct image data (default)
- `storageKey`: Storage key with presigned URL access
- `presignedExpiresIn`: Custom expiration for storageKey URLs (default: `30m`)
Usage Examples
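The delay limits above can be checked on the client before sending a request. This sketch assumes durations are written as strings such as `500ms`, `30s`, or `30m`, as they appear in the notes; `parseDuration` and `resolveDelay` are hypothetical helpers, not SDK functions.

```typescript
// Milliseconds per supported unit suffix.
const UNIT_MS: { [unit: string]: number } = { ms: 1, s: 1000, m: 60000 };

// Parse a duration such as "500ms", "30s", or "30m" into milliseconds.
function parseDuration(text: string): number {
  const match = /^(\d+)(ms|s|m)$/.exec(text);
  if (!match) {
    throw new Error("unsupported duration: " + text);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

const MAX_DELAY_MS = parseDuration("30s");

// Resolve the screenshot delay, applying the documented default and maximum.
function resolveDelay(delay: string = "500ms"): number {
  const ms = parseDuration(delay);
  if (ms > MAX_DELAY_MS) {
    throw new RangeError("delay exceeds the 30s maximum");
  }
  return ms;
}

const defaultDelayMs = resolveDelay(); // 500
const twoSecondsMs = resolveDelay("2s"); // 2000
```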
Capture specific phases:
Please use the new `options.screenshot` parameter instead of the old screenshot fields, as they will be deprecated in future versions.
2025-08-25
These features require upgrading to TypeScript SDK version 29 or later.
🎯 Natural Language UI Actions
UI Actions now support natural language descriptions for targets and locations, making automation more intuitive and human-readable.
📏 Action Scale Settings
Added support for scaling UI actions to adjust the size of screenshots and coordinate calculations without changing the actual screen resolution.
Key Features
- Scale Range: 0.1 to 1.0 (10% to 100% of original size)
- Screenshot Scaling: Output screenshots are scaled according to the setting
- Coordinate Scaling: All action coordinates and distances are automatically scaled
- No Resolution Change: The box’s actual screen resolution remains unchanged
Usage Examples
Scale Examples
Scale Value | Screenshot Size | Coordinate Example |
---|---|---|
1.0 (default) | Full size | Click({x: 100, y: 100}) |
0.5 | 50% size | Click({x: 50, y: 50}) equivalent |
0.25 | 25% size | Click({x: 25, y: 25}) equivalent |
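The coordinate mapping in the table is a plain multiplication by the scale factor. The helpers below are a hypothetical illustration of that arithmetic, including the reverse conversion from a scaled screenshot back to full resolution; they are not SDK functions.

```typescript
interface Point {
  x: number;
  y: number;
}

// Reject scale values outside the supported 0.1 to 1.0 range.
function checkScale(scale: number): void {
  if (!(scale >= 0.1 && scale <= 1.0)) {
    throw new RangeError("scale must be between 0.1 and 1.0");
  }
}

// Convert a full-resolution coordinate into the scaled coordinate space.
function toScaled(point: Point, scale: number): Point {
  checkScale(scale);
  return { x: Math.round(point.x * scale), y: Math.round(point.y * scale) };
}

// Convert a coordinate read from a scaled screenshot back to full resolution.
function toFullResolution(point: Point, scale: number): Point {
  checkScale(scale);
  return { x: Math.round(point.x / scale), y: Math.round(point.y / scale) };
}

const halfScale = toScaled({ x: 100, y: 100 }, 0.5); // { x: 50, y: 50 }
const quarterScale = toScaled({ x: 100, y: 100 }, 0.25); // { x: 25, y: 25 }
const restored = toFullResolution(halfScale, 0.5); // { x: 100, y: 100 }
```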
🌐 Proxy Configuration
Added proxy configuration support for Android boxes.
2025-08-15
These features require upgrading to TypeScript SDK version 27 or later.
🖱️ UI Actions
- Added: `tap` and `longPress` actions for precise coordinate taps and long-press interactions.
- Enhanced: `swipe`/`scroll` now support semantic `distance` values (`"tiny" | "short" | "medium" | "long"`), so you no longer need to provide pixel values.
New action: tap
New action: longPress
Semantic distance (swipe/scroll)
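A sketch of how a `distance` value that may be either a pixel count or one of the four semantic keywords could be typed and validated. The `Distance` type and `parseDistance` helper are illustrative assumptions; only the keyword set comes from the notes above.

```typescript
type SemanticDistance = "tiny" | "short" | "medium" | "long";
type Distance = number | SemanticDistance;

const SEMANTIC_DISTANCES: SemanticDistance[] = ["tiny", "short", "medium", "long"];

// Type guard: true when the value is one of the documented semantic distances.
function isSemanticDistance(value: unknown): value is SemanticDistance {
  return typeof value === "string" && SEMANTIC_DISTANCES.indexOf(value as SemanticDistance) !== -1;
}

// Accept either a positive pixel count or a semantic keyword; reject anything else.
function parseDistance(value: unknown): Distance {
  if (typeof value === "number" && value > 0) {
    return value;
  }
  if (isSemanticDistance(value)) {
    return value;
  }
  throw new TypeError("distance must be a positive pixel count or one of: " + SEMANTIC_DISTANCES.join(", "));
}

const semantic = parseDistance("short");
const pixels = parseDistance(300);
```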
2025-08-08
These features require upgrading to TypeScript SDK version 26 or later.
🎥 Android Screen Recording
Added screen recording capabilities to Android boxes for creating tutorials and debugging UI interactions.
📱 Android Media Management
Added media file management capabilities for Android devices, including album listing and media operations.
🪝 AI Action Progress Callbacks
Since AI Action execution takes time, we provide a series of callbacks to monitor the progress of the entire operation process.
2025-07-29
These features require upgrading to TypeScript SDK version 25 or later.
UI Action Support for StorageKey Output Format
UI Action now supports `outputFormat: "storageKey"`, allowing GBOX.AI to store screenshot information directly. Compared to returning image data directly, StorageKey provides more flexible storage and access options:
- presigned URL: System-generated temporary access link with a default validity period of 30 minutes
- storageKey: A storage key that remains valid throughout the box lifecycle (until the box is deleted)
- Custom presigned URL: Use `createPresignedUrl` to create presigned URLs with specified expiration times, convenient for passing to LLMs for image analysis
Android Multiple APK Installation Support
Added support for installing multiple APK files using ZIP archives, enabling installation of split APKs and bundle applications.
Installation Modes
Single APK Installation (default):
- Upload and install a single APK file
- Traditional installation method for standalone applications
Multiple APK Installation:
- Upload a ZIP archive containing multiple APK files
- Automatically extracts and installs all APK files in the correct order
- Essential for modern Android apps that use App Bundle distribution
ZIP Archive Structure
When using multiple APK installation, organize your files as follows:
Usage Examples
Single APK Installation:
2025-07-21
These features require upgrading to TypeScript SDK version 22 or later.