Gaia Agent Evaluation Guide
This guide walks you through setting up the sample code and evaluating your agent against the Gaia benchmark.
Step 1: Configure API Keys
Before anything else, configure your secret keys in the Space Settings section.
- Log in to each required platform.
- Locate and enter your API keys in the designated fields. The Space exposes each secret to your code as an environment variable (see the sanity-check sketch after this list).
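As a quick sanity check, you can confirm from code that every secret is visible. A minimal sketch, assuming hypothetical secret names SUPABASE_URL, SUPABASE_SERVICE_KEY, and GOOGLE_API_KEY; substitute the names you actually configured:

```python
import os

# Hypothetical secret names -- replace with the names you set in Space Settings.
REQUIRED_SECRETS = ["SUPABASE_URL", "SUPABASE_SERVICE_KEY", "GOOGLE_API_KEY"]

missing = [name for name in REQUIRED_SECRETS if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing secrets: {', '.join(missing)}")
print("All required secrets are set.")
```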
Step 2: Set Up Supabase
- Log in to Supabase.
- Open your project from the Supabase dashboard.
- Open the SQL Editor, paste the SQL code below, and run it to create the required table and function.
📦 SQL Code – Creating Tables and Functions
```sql
-- Enable pgvector if not already enabled
create extension if not exists vector;

-- Create the documents table (if not already done)
create table if not exists documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(768) -- Make sure this matches your model's embedding dimension
);

-- Create the match_documents function
create or replace function match_documents (
  query_embedding vector(768),
  match_count int default 5,
  filter jsonb default '{}'
)
returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    -- Qualify the columns with the table name: unqualified names would be
    -- ambiguous with the function's output columns and raise an error.
    documents.id,
    documents.content,
    documents.metadata,
    -- <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where documents.metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
- After running the above, execute this command so Supabase’s API layer (PostgREST) refreshes its internal schema cache and can see the new table and function:

```sql
NOTIFY pgrst, 'reload schema';
```
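Once the schema cache is refreshed, the function is callable through Supabase’s RPC endpoint. A minimal sketch using the supabase Python client; SUPABASE_URL and SUPABASE_SERVICE_KEY are the hypothetical secret names from Step 1, and the query vector must come from your actual embedding model:

```python
import os
from supabase import create_client

# Hypothetical secret names from Step 1.
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

# The query vector must match the table's dimension (768 here).
query_embedding = [0.0] * 768  # placeholder -- use your embedding model's output

response = supabase.rpc(
    "match_documents",
    {"query_embedding": query_embedding, "match_count": 5, "filter": {}},
).execute()

for row in response.data:
    print(row["similarity"], row["content"][:80])
```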
Step 3: Populate the Database
To enable document retrieval, you need to populate the database with example entries:
- Open and run the test.ipynb Jupyter notebook.
- The notebook reads the metadata.jsonl file and inserts each example into the documents table.
This gives your agent a basic retrieval capability, improving its performance on questions that resemble the stored examples. If you prefer a plain script over the notebook, see the sketch below.
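A minimal sketch of the same population step as a script. Assumptions: the sentence-transformers model all-mpnet-base-v2 (which outputs 768-dimensional vectors, matching vector(768) in the table), and a hypothetical "Question" field in metadata.jsonl; adjust both to match what test.ipynb actually uses:

```python
import json
import os

from sentence_transformers import SentenceTransformer
from supabase import create_client

# Assumption: all-mpnet-base-v2 outputs 768-dim vectors, matching vector(768)
# in the table. Swap in whichever model the notebook actually uses.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

rows = []
with open("metadata.jsonl", encoding="utf-8") as f:
    for line in f:
        example = json.loads(line)
        # "Question" is a hypothetical field name -- match your metadata.jsonl schema.
        text = example["Question"]
        rows.append({
            "content": text,
            "metadata": example,
            "embedding": model.encode(text).tolist(),
        })

supabase.table("documents").insert(rows).execute()
print(f"Inserted {len(rows)} rows into documents.")
```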
Step 4: Run the Evaluation
Once the database is set up and filled with data:
- Proceed to the Evaluation section in your project.
- Run the evaluation script to test and score your agent’s performance.