hashiruAI / bench / benchmarking_hle.py

Commit History

Refactor get_last_assistant_content function to improve response handling and support various response formats
81fafc1

Kunal Pai committed on

Refactor benchmarking script to implement HLE dataset performance evaluation and improve response handling
aa7e221

Kunal Pai committed on

Add benchmarking script for GlobleDistanceTool via Gradio API
97e9ed5

Kunal Pai committed on