Supported Actions
The model supports three core actions that cover nearly all coordinate-based UI interactions:
- Click: Identify precise tap/click coordinates for buttons, links, and interactive elements
- Drag: Calculate start and end coordinates for drag operations (e.g., swipe, scroll bars)
- Scroll: Determine optimal scroll coordinates and directions
Authorizations
Pass your API key in the Authorization header in the format: Bearer <token>. Get a key from https://gbox.ai
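As a minimal sketch of the Bearer format described above, the headers could be assembled as follows. The helper name build_auth_headers and the GBOX_API_KEY environment variable are hypothetical, chosen only for illustration; the Content-Type value follows from the application/json body type listed in this reference.

```python
import os


def build_auth_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the API key as a Bearer token."""
    token = api_key.strip()
    if not token:
        raise ValueError("API key must not be empty")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# GBOX_API_KEY is a hypothetical variable name, not taken from the reference;
# "example-token" is a placeholder used when the variable is unset.
headers = build_auth_headers(os.environ.get("GBOX_API_KEY", "example-token"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.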
Body
application/json
Model request
HTTP(S) URL to screenshot image
Example:
"https://gru-activate2-public-assets.s3.us-west-2.amazonaws.com/jessica/screenshot-1759332945616-pu0ovj.png"
Structured action object (click, drag, or scroll). The example below shows the click action structure.
- Click Action
- Drag Action
- Scroll Action
Example:
{
"type": "click",
"target": "the VSCode app icon on the bottom dock"
}
Model to use
Available options:
gbox-handy-1
Example:
"gbox-handy-1"
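The request body described above can be sketched in Python. This reference does not show the top-level field names, so "screenshot", "action", and "model" are assumptions made for illustration and should be checked against the actual schema. Only the click action shape ("type" and "target") and the model name gbox-handy-1 come from the examples above. The endpoint URL is not given here, so the sketch stops at producing the JSON string.

```python
import json


def build_request_body(screenshot_url: str, target: str,
                       model: str = "gbox-handy-1") -> str:
    """Serialize a click request as JSON (field names are assumed)."""
    if not screenshot_url.startswith(("http://", "https://")):
        raise ValueError("screenshot must be an HTTP(S) URL")
    body = {
        # Assumed key names: the reference lists the fields without names.
        "screenshot": screenshot_url,
        "action": {"type": "click", "target": target},
        "model": model,
    }
    return json.dumps(body)


payload = build_request_body(
    "https://gru-activate2-public-assets.s3.us-west-2.amazonaws.com/"
    "jessica/screenshot-1759332945616-pu0ovj.png",
    "the VSCode app icon on the bottom dock",
)
```

The target is a natural-language description of the element, as in the click example; the model is expected to resolve it to coordinates.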
Response
200 - application/json
Model response data structure
Model response data. Its structure depends on the requested action:
- Model Click Response Data
- Model Drag Response Data
- Model Scroll Response Data
Unique ID of this request. It can be used for issue reporting and feedback.
Example:
"123e4567-e89b-12d3-a456-426614174000"
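Since the request ID is meant for issue reporting, it can help to pull it out of each response and log it. The sketch below assumes the ID sits under a top-level key named "id"; that key name is not shown in this reference, so it is passed as a parameter and should be adjusted to the real schema. The helper name extract_request_id is likewise hypothetical.

```python
import json
from typing import Optional


def extract_request_id(response_text: str, key: str = "id") -> Optional[str]:
    """Return the request ID from a JSON response, or None if absent."""
    payload = json.loads(response_text)
    # The key name is an assumption; the reference only describes the field.
    value = payload.get(key) if isinstance(payload, dict) else None
    return value if isinstance(value, str) and value else None


# Sample payload built from the example ID shown above.
sample = '{"id": "123e4567-e89b-12d3-a456-426614174000"}'
request_id = extract_request_id(sample)
print(f"request id for support: {request_id}")
```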