Commit History
Add benchmarking functionality for Globle game
cab9413
Kunal Pai
committed on
Add benchmarking script for Wordle game
e09bf50
Kunal Pai
committed on
Add benchmarking functionality for NYT Connections dataset
0577af4
Kunal Pai
committed on
Add paper benchmarking, along with dataset for it
4f96523
Kunal Pai
committed on
Refactor get_last_assistant_content function to improve response handling and support multiple response formats
81fafc1
Kunal Pai
committed on
Refactor benchmarking script to implement HLE dataset performance evaluation and improve response handling
aa7e221
Kunal Pai
committed on
Add benchmarking script for GlobleDistanceTool via Gradio API
97e9ed5
Kunal Pai
committed on