Spaces: HASHIRUAgentX / hashiruAI

hashiruAI / bench

10 contributors

History: 2 commits

Kunal Pai

Refactor benchmarking script to implement HLE dataset performance evaluation and improve response handling

aa7e221 2 months ago

benchmarking_hle.py | 5.72 kB | Refactor benchmarking script to implement HLE dataset performance evaluation and improve response handling | 2 months ago