Gaia Agent Evaluation Guide
This guide walks you through setting up the sample code and evaluating your agent against the Gaia benchmark.
Step 1: Configure API Keys
Before anything else, configure your secret keys in the Space Settings section.
- Log in to each required platform.
- Locate and enter your API keys in the designated fields. The Space exposes each secret to your code as an environment variable (see the sanity-check sketch after this list).
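As a quick sanity check, you can confirm from code that every secret is visible. A minimal sketch, assuming hypothetical secret names SUPABASE_URL, SUPABASE_SERVICE_KEY, and GOOGLE_API_KEY; substitute the names you actually configured:

```python
import os

# Hypothetical secret names -- replace with the names you set in Space Settings.
REQUIRED_SECRETS = ["SUPABASE_URL", "SUPABASE_SERVICE_KEY", "GOOGLE_API_KEY"]

missing = [name for name in REQUIRED_SECRETS if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing secrets: {', '.join(missing)}")
print("All required secrets are set.")
```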
Step 2: Set Up Supabase
- Log in to Supabase.
- Open your project from the Supabase dashboard.
- Open the SQL Editor, paste the SQL code below, and run it to create the required table and function.
📦 SQL Code – Creating Tables and Functions
```sql
-- Enable pgvector if not already enabled
create extension if not exists vector;

-- Create the documents table (if not already done)
create table if not exists documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(768) -- Make sure this matches your model's embedding dimension
);

-- Create the match_documents function
create or replace function match_documents (
  query_embedding vector(768),
  match_count int default 5,
  filter jsonb default '{}'
)
returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    -- Qualify the columns with the table name: unqualified names would be
    -- ambiguous with the function's output columns and raise an error.
    documents.id,
    documents.content,
    documents.metadata,
    -- <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where documents.metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
- After running the above, execute this command so Supabase’s API layer (PostgREST) refreshes its internal schema cache and can see the new table and function:

```sql
NOTIFY pgrst, 'reload schema';
```
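Once the schema cache is refreshed, the function is callable through Supabase’s RPC endpoint. A minimal sketch using the supabase Python client; SUPABASE_URL and SUPABASE_SERVICE_KEY are the hypothetical secret names from Step 1, and the query vector must come from your actual embedding model:

```python
import os
from supabase import create_client

# Hypothetical secret names from Step 1.
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

# The query vector must match the table's dimension (768 here).
query_embedding = [0.0] * 768  # placeholder -- use your embedding model's output

response = supabase.rpc(
    "match_documents",
    {"query_embedding": query_embedding, "match_count": 5, "filter": {}},
).execute()

for row in response.data:
    print(row["similarity"], row["content"][:80])
```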
Step 3: Populate the Database
To enable document retrieval, you need to populate the database with example entries:
- Open and run the test.ipynb Jupyter notebook.
- The notebook reads the metadata.jsonl file and inserts each example into the documents table.
This gives your agent a basic retrieval capability, improving its performance on questions that resemble the stored examples. If you prefer a plain script over the notebook, see the sketch below.
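A minimal sketch of the same population step as a script. Assumptions: the sentence-transformers model all-mpnet-base-v2 (which outputs 768-dimensional vectors, matching vector(768) in the table), and a hypothetical "Question" field in metadata.jsonl; adjust both to match what test.ipynb actually uses:

```python
import json
import os

from sentence_transformers import SentenceTransformer
from supabase import create_client

# Assumption: all-mpnet-base-v2 outputs 768-dim vectors, matching vector(768)
# in the table. Swap in whichever model the notebook actually uses.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

rows = []
with open("metadata.jsonl", encoding="utf-8") as f:
    for line in f:
        example = json.loads(line)
        # "Question" is a hypothetical field name -- match your metadata.jsonl schema.
        text = example["Question"]
        rows.append({
            "content": text,
            "metadata": example,
            "embedding": model.encode(text).tolist(),
        })

supabase.table("documents").insert(rows).execute()
print(f"Inserted {len(rows)} rows into documents.")
```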
Step 4: Run the Evaluation
Once the database is set up and filled with data:
- Proceed to the Evaluation section in your project.
- Run the evaluation script to test and score your agent’s performance.