This tutorial shows how to use GBOX as a provider in OSWorld to build and run agents that interact with operating systems.

What is OSWorld?

OSWorld is a benchmark framework for evaluating multimodal agents on open-ended tasks in real computer environments. It supports multiple providers for running virtual environments, including VMware, VirtualBox, Docker, and AWS. By using GBOX as a provider, you can leverage cloud-native infrastructure without managing local virtual machines, making it easier to scale your agent evaluations and reduce setup complexity.

Architecture

The following diagram illustrates the architecture of OSWorld using GBOX as a provider:

Benefits of Using GBOX Provider

Using GBOX as a provider in OSWorld offers several advantages:

🚀 Cloud-Native Infrastructure

  • No need to set up and manage local virtual machines
  • Works seamlessly across different development environments

Easy Scaling & Parallelization

  • Run multiple environments in parallel without local resource constraints
  • Significantly reduce evaluation time through parallel execution

🔧 Simplified Setup

  • No need to check KVM support or install Docker Desktop
  • Works on any platform without virtualization requirements

🌐 Accessibility

  • Access your environments from anywhere
  • Consistent performance regardless of your local hardware

Prerequisites

Before getting started, make sure you have:
  • A GBOX account with an API key (Get your API key)
  • An OpenAI API key (or another compatible LLM provider)
  • Python 3.10 or higher installed
  • Git installed

Getting Started

Step 1: Clone the Repository

Clone the OSWorld provider repository and install its dependencies:
# Clone the OSWorld provider repository
git clone https://github.com/babelcloud/OSWorld-provider

# Change directory into the cloned repository
cd OSWorld-provider

# Optional: Create a Conda environment for OSWorld
# conda create -n osworld python=3.10
# conda activate osworld

# Install required dependencies
pip install -r requirements.txt

Step 2: Configure API Keys

Create a .env file in the repository root and add your GBOX and OpenAI API keys:
.env
GBOX_API_KEY=your_gbox_api_key
OPENAI_API_KEY=your_openai_api_key
Note: You can obtain your GBOX API key from the API Key page. Make sure to keep your API keys secure and never commit them to version control.
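Before launching a run, it can help to confirm that both keys are actually present in the .env file. The following is a minimal sketch, not part of the repository: the helper names parse_env and missing_keys are hypothetical, and the parser only handles simple KEY=value lines. It cannot tell a real key from the placeholder values shown above.

```python
from pathlib import Path

# The two variables this tutorial asks you to define in .env
REQUIRED_KEYS = ("GBOX_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(env)
    if missing:
        print("Missing in .env:", ", ".join(missing))
    else:
        print("Both API keys are set.")
```

Run it from the repository root so that the relative .env path resolves correctly.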

Step 3: Run the Provider

Run the following command to start an evaluation with GBOX as the provider:
python run_multienv.py \
    --provider_name gbox \
    --model gpt-4o \
    --region us-east-1 \
    --max_steps 15 \
    --observation_type screenshot \
    --action_space pyautogui \
    --result_dir ./results_gbox \
    --num_envs 1 \
    --test_all_meta_path evaluation_examples/test_small.json
Command Parameters Explained:
  • --provider_name gbox: Use GBOX as the provider
  • --model gpt-4o: Specify the LLM that drives the agent
  • --region us-east-1: GBOX region (adjust based on your preference)
  • --max_steps 15: Maximum number of steps the agent can take
  • --observation_type screenshot: Use screenshots for environment observation
  • --action_space pyautogui: Use PyAutoGUI for action execution
  • --result_dir ./results_gbox: Directory to save evaluation results
  • --num_envs 1: Number of parallel environments to run. Increasing this value can significantly improve evaluation efficiency by running multiple tasks concurrently
  • --test_all_meta_path evaluation_examples/test_small.json: Path to the test configuration file
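If you plan to repeat runs with different settings (for example, a higher --num_envs), assembling the command from a dictionary keeps the variations in one place. The sketch below is illustrative only: build_command and DEFAULTS are hypothetical names, the alternate result directory is just an example, and the code uses only the flags and values listed above. It prints the command rather than running it.

```python
import shlex
import sys

# Flag values copied from the command shown in this tutorial
DEFAULTS = {
    "provider_name": "gbox",
    "model": "gpt-4o",
    "region": "us-east-1",
    "max_steps": 15,
    "observation_type": "screenshot",
    "action_space": "pyautogui",
    "result_dir": "./results_gbox",
    "num_envs": 1,
    "test_all_meta_path": "evaluation_examples/test_small.json",
}


def build_command(**overrides) -> list[str]:
    """Return the argv list for run_multienv.py with selected flags overridden."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Refuse flags this tutorial does not document
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    options = {**DEFAULTS, **overrides}
    cmd = [sys.executable, "run_multienv.py"]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(num_envs=4, result_dir="./results_gbox_4env")
    print(shlex.join(cmd))
    # To launch for real, pass cmd to subprocess.run(cmd, check=True)
    # from the repository root.
```

Writing each configuration to its own result directory keeps the outputs of different runs from mixing.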

Step 4: Monitor Agent Execution

Once the agent starts running, you can monitor its progress in real time through the VNC viewer. The agent interacts with the OS environment and performs the tasks defined in the evaluation configuration.
Tip: The default VNC password is osworld-public-evaluation. You can access the VNC viewer URL from the GBOX dashboard or API response.

Step 5: View Results

After the evaluation completes, you can find the results in the results_gbox directory. The results include:
  • Task execution logs
  • Screenshots of key actions
  • Performance metrics
  • Success/failure status for each task
You can now start building your own agents by modifying the test configuration files or creating custom evaluation scenarios.
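To get a quick overview of a finished run, you can aggregate the per-task outcomes with a short script. The sketch below rests on an assumption this tutorial does not confirm: that each task directory under results_gbox contains a result.txt file holding a single numeric score. Check the layout of your own results directory and adjust the filename if it differs. The function names collect_scores and summarize are hypothetical.

```python
from pathlib import Path


def collect_scores(result_dir: str, filename: str = "result.txt") -> dict[str, float]:
    """Map each task directory (relative to result_dir) to its numeric score.

    Assumes one score per file; files that do not parse as a number are skipped.
    """
    root = Path(result_dir)
    if not root.is_dir():
        return {}
    scores: dict[str, float] = {}
    for path in sorted(root.rglob(filename)):
        try:
            score = float(path.read_text().strip())
        except ValueError:
            continue
        scores[str(path.parent.relative_to(root))] = score
    return scores


def summarize(scores: dict[str, float]) -> tuple[int, float]:
    """Return (number of scored tasks, mean score)."""
    if not scores:
        return 0, 0.0
    return len(scores), sum(scores.values()) / len(scores)


if __name__ == "__main__":
    scores = collect_scores("./results_gbox")
    count, mean = summarize(scores)
    for task, score in scores.items():
        print(f"{task}: {score}")
    print(f"{count} task(s) scored, mean score {mean:.2f}")
```

If no score files are found, the script reports zero tasks instead of failing, which usually means the assumed filename does not match your results.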

Next Steps

  • Explore the OSWorld documentation to learn more about creating custom evaluation tasks
  • Check out the GBOX API reference for advanced configuration options
  • Experiment with different models and parameters to optimize agent performance
  • Scale up your evaluations by increasing the --num_envs parameter to run multiple environments in parallel