> ## Documentation Index
> Fetch the complete documentation index at: https://docs.gbox.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Generate Coordinates

Generate precise UI element coordinates using the **gbox-handy-1** model.

This specialized model analyzes screenshots and instructions to identify exact coordinates for UI operations.

## Supported Actions

The model supports three core actions that cover nearly all coordinate-based UI interactions:

* **Click**: Identify precise tap/click coordinates for buttons, links, and interactive elements
* **Drag**: Calculate start and end coordinates for drag operations (e.g., swipe, scroll bars)
* **Scroll**: Determine optimal scroll coordinates and directions


## OpenAPI

````yaml post /model
openapi: 3.0.0
info:
  title: GBOX Open API
  description: GBOX Open API Documentation
  version: '1.0'
  contact: {}
servers:
  - url: https://gbox.ai/api/v1
    description: Production Server
security: []
tags: []
paths:
  /model:
    post:
      tags:
        - Model
      summary: Generate coordinates for a model
      operationId: ModelController_call
      parameters: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ModelRequest'
      responses:
        '200':
          description: ''
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ModelResponseData'
      security:
        - bearer: []
      x-codeSamples:
        - lang: JavaScript
          source: "import GboxSDK from \"gbox-sdk\";\n\nconst gboxSDK = new GboxSDK({\n  apiKey: process.env[\"GBOX_API_KEY\"] // This is the default and can be omitted\n});\n\nasync function main() {\n  const result = await gboxSDK.model.call({\n\t\tmodel: \"gbox-handy-1\",\n    screenshot: \"https://gru-activate2-public-assets.s3.us-west-2.amazonaws.com/jessica/screenshot-1759332945616-pu0ovj.png\",\n    action: {\n      type: \"click\",\n      target: \"the VSCode app icon on the bottom dock\"\n    }\n  });\n\n  console.log(\"Result:\");\n  console.info(JSON.stringify(result, null, 2));\n}\n\nmain();"
        - lang: Python
          source: '# Coming soon'
        - lang: Go
          source: // Coming soon
components:
  schemas:
    ModelRequest:
      type: object
      properties:
        model:
          type: string
          description: Model to use
          enum:
            - gbox-handy-1
          example: gbox-handy-1
          default: gbox-handy-1
        screenshot:
          type: string
          description: >-
            Screenshot image as HTTP(S) URL or base64-encoded data URI. Supports
            both formats: 1) HTTP(S) URL pointing to an image file; 2)
            Base64-encoded data URI with format 'data:image/png;base64,[data]'
            or 'data:image/jpeg;base64,[data]'. Only PNG and JPEG formats are
            supported for base64.
        action:
          description: Structured action object (click or drag)
          oneOf:
            - $ref: '#/components/schemas/ClickAction'
            - $ref: '#/components/schemas/DragAction'
            - $ref: '#/components/schemas/ScrollAction'
      title: Model Request
      description: Model request
      required:
        - screenshot
        - action
    ModelResponseData:
      type: object
      properties:
        response:
          description: Model response data
          oneOf:
            - $ref: '#/components/schemas/ModelClickResponseData'
            - $ref: '#/components/schemas/ModelDragResponseData'
            - $ref: '#/components/schemas/ModelScrollResponseData'
        id:
          type: string
          description: >-
            Unique ID of this request, can be used for issue reporting and
            feedback
          example: 123e4567-e89b-12d3-a456-426614174000
      title: Model Response Data
      description: Model response data structure
      required:
        - response
        - id
    ClickAction:
      type: object
      properties:
        type:
          type: string
          description: Action type
          enum:
            - click
            - drag
            - scroll
          example: click
        target:
          type: string
          description: Natural language description of what to click
          example: the VSCode app icon on the bottom dock
          maxLength: 2000
          minLength: 1
      title: Click Action
      description: Click action structure
      required:
        - type
        - target
    DragAction:
      type: object
      properties:
        type:
          type: string
          description: Action type
          enum:
            - click
            - drag
            - scroll
          example: drag
        target:
          type: string
          description: Natural language description of starting position
          example: the VSCode app icon
          maxLength: 2000
          minLength: 1
        destination:
          type: string
          description: Natural language description of ending position
          example: the desktop
          maxLength: 2000
          minLength: 1
      title: Drag Action
      description: Drag action structure
      required:
        - type
        - target
        - destination
    ScrollAction:
      type: object
      properties:
        type:
          type: string
          description: Action type
          enum:
            - click
            - drag
            - scroll
          example: scroll
        location:
          type: string
          description: >-
            Natural language description of the location where the scroll should
            originate.
          example: Side bar
          maxLength: 2000
          minLength: 1
        direction:
          type: string
          description: Scroll direction
          enum:
            - up
            - down
            - left
            - right
          example: up
      title: Scroll Action
      description: Scroll action structure
      required:
        - type
        - location
        - direction
    ModelClickResponseData:
      type: object
      properties:
        type:
          type: string
          description: Action type
          enum:
            - click
            - drag
            - scroll
          example: click
        coordinates:
          description: Click result data
          allOf:
            - $ref: '#/components/schemas/ClickResult'
      title: Model Click Response Data
      description: Model click response data structure
      required:
        - type
        - coordinates
    ModelDragResponseData:
      type: object
      properties:
        type:
          type: string
          description: Action type
          enum:
            - click
            - drag
            - scroll
          example: drag
        coordinates:
          description: Drag result data
          allOf:
            - $ref: '#/components/schemas/DragResult'
      title: Model Drag Response Data
      description: Drag response data structure
      required:
        - type
        - coordinates
    ModelScrollResponseData:
      type: object
      properties:
        type:
          type: string
          description: Action type
          enum:
            - click
            - drag
            - scroll
          example: scroll
        coordinates:
          description: Scroll result data
          allOf:
            - $ref: '#/components/schemas/ScrollResult'
      title: Model Scroll Response Data
      description: Scroll response data structure
      required:
        - type
        - coordinates
    ClickResult:
      type: object
      properties:
        x:
          type: number
          description: X coordinate. Returns -1 if no valid target is found.
          example: 520
        'y':
          type: number
          description: Y coordinate. Returns -1 if no valid target is found.
          example: 340
      title: Click Result
      description: Single click result with coordinates
      required:
        - x
        - 'y'
    DragResult:
      type: object
      properties:
        target:
          description: Starting coordinates
          allOf:
            - $ref: '#/components/schemas/Coordinates'
        destination:
          description: Ending coordinates
          allOf:
            - $ref: '#/components/schemas/Coordinates'
      title: Drag Result
      description: Single drag result with target and destination coordinates
      required:
        - target
        - destination
    ScrollResult:
      type: object
      properties:
        x:
          type: number
          description: X coordinate
          example: 520
        'y':
          type: number
          description: Y coordinate
          example: 340
        scrollX:
          type: number
          description: Horizontal scroll amount
          example: 100
        scrollY:
          type: number
          description: Vertical scroll amount
          example: 100
      title: Scroll Result
      description: Single scroll result with location and direction
      required:
        - x
        - 'y'
        - scrollX
        - scrollY
    Coordinates:
      type: object
      properties:
        x:
          type: number
          description: X coordinate. Returns -1 if no valid target is found.
          example: 520
        'y':
          type: number
          description: Y coordinate. Returns -1 if no valid target is found.
          example: 340
      title: Coordinates
      description: X and Y coordinates. Returns -1, -1 if no valid target is found.
      required:
        - x
        - 'y'
  securitySchemes:
    bearer:
      scheme: bearer
      bearerFormat: JWT
      type: http
      description: >-
        Enter your API Key in the format: Bearer <token>. Get it from
        https://gbox.ai

````