Benchmarks

We share the results of benchmarks we ran to improve Careti.

We hope these results help developers choose the right model and serve as baseline data for AI model researchers. Individual benchmark results are listed below.

How to use raw data →

HumanEval Agent Mode Benchmark

Gemini 2.5 Flash (Careti prompt mode): 97.6% first-attempt pass rate, 5.3 s average response time

Models tested: gemini-2.5-flash, solar-pro2, solar-pro3

#humaneval #agent-mode