Feb 4, 2026
HumanEval Agent Mode Benchmark
Gemini 2.5 Flash - Careti prompt mode: 97.6% first-attempt pass rate, 5.3 s average response time
Models tested: gemini-2.5-flash, solar-pro2, solar-pro3
We are sharing the results of benchmarks we ran to improve Careti.
We hope this helps developers choose the right model and serves as foundational data for AI model researchers. Each benchmark result can be found below.
Gemini 2.5 Flash - Careti prompt mode: 97.6% first-attempt pass rate, 5.3 s average response time
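For readers unfamiliar with the metric, a first-attempt pass rate is typically the share of problems whose first generated solution passes all tests. The sketch below is only an illustration, not the harness used for this benchmark. The function name and the per-problem outcomes are hypothetical. The standard HumanEval set has 164 problems, and 160 first-attempt passes out of 164 would round to the 97.6% reported here, but the actual counts are not stated on this page.

```python
def first_attempt_pass_rate(results):
    """Return the percentage of problems solved on the first attempt.

    `results` is a list of booleans, one per problem: True if the first
    generated solution passed every test for that problem.
    """
    if not results:
        raise ValueError("results must contain at least one problem")
    return 100.0 * sum(results) / len(results)


# Hypothetical outcomes (an assumption, not data from the benchmark):
# 160 passes and 4 failures over the 164 standard HumanEval problems.
results = [True] * 160 + [False] * 4

rate = first_attempt_pass_rate(results)
print(f"{rate:.1f}% first-attempt pass rate")
```

Average response time would be computed separately, for example as the mean of per-problem latencies.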