How good are free AI chatbots at doing Ukrainian schoolwork?
CheatBench is a benchmark I'm building to answer one nosy question: if a Ukrainian student handed their schoolwork to a free-tier chatbot, would it actually pass? Real curriculum, real answer keys, checked by hand. No vibes, no marketing numbers.
No scores yet. This page is the plan, not the results.
Methodology
How the benchmark will work
The plan is fixed before any model sees a single question, so nobody can claim the test was rigged after the fact.
- 01 Same task, every model
- 02 Three input formats
- 03 Manual verification
- 04 Published rankings
The contenders
Three families, multiple modes each
Each model is tested across its thinking modes, because the fast setting and the slow careful one behave nothing alike.
Gemini
- A. Gemini 3.5 Flash (Standard)
- B. Gemini 3.5 Flash (Extended thinking)
- C. Gemini 3.5 Pro (Standard)
- D. Gemini 3.5 Pro (Extended thinking)
ChatGPT
- A. GPT 5.5 (Low thinking)
- B. GPT 5.5 (High thinking)
Claude
- A. Sonnet 4.6 (Low thinking)
- B. Sonnet 4.6 (Medium thinking)
Input formats
The same question, three ways
Students don't always type their schoolwork out. Sometimes it's a snapshot of a textbook at midnight. Each task is fed in all three formats to see what trips the models up.
Photograph
Plain text
Every answer checked by hand
No model grades another model. Every response is compared against a known correct answer by a human, marked right or wrong, and logged. Slow, boring, and the only way the numbers mean anything.
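Here is one way the hand-checked log could become the published rankings. This is a minimal sketch. It assumes each verdict is logged as a single record holding the task, the model and mode, the input format, and a right/wrong mark. The `Verdict` record and the function names are hypothetical and are not the project's actual tooling. The ranking function refuses to run unless every model answered the same set of tasks, which enforces the "same task, every model" rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One hand-checked response (hypothetical log format)."""
    task_id: str
    model: str          # model plus mode, e.g. "GPT 5.5 (High thinking)"
    input_format: str   # e.g. "Photograph" or "Plain text"
    correct: bool       # marked right or wrong by a human grader


def accuracy_table(verdicts):
    """Return {(model, input_format): fraction of answers marked correct}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for v in verdicts:
        key = (v.model, v.input_format)
        totals[key] += 1
        hits[key] += int(v.correct)
    return {key: hits[key] / totals[key] for key in totals}


def check_same_tasks(verdicts):
    """Raise ValueError unless every model saw identical (task, format) pairs."""
    tasks_by_model = defaultdict(set)
    for v in verdicts:
        tasks_by_model[v.model].add((v.task_id, v.input_format))
    task_sets = list(tasks_by_model.values())
    if any(s != task_sets[0] for s in task_sets[1:]):
        raise ValueError("models were not tested on identical task sets")


def rank_models(verdicts):
    """Rank models by overall accuracy across all formats, best first."""
    check_same_tasks(verdicts)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for v in verdicts:
        totals[v.model] += 1
        hits[v.model] += int(v.correct)
    scores = {m: hits[m] / totals[m] for m in totals}
    # Sort by descending score; ties broken alphabetically by model name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The per-format table is kept separate from the overall ranking. That way, a model that reads typed text well but stumbles on photographs stays visible instead of being averaged away.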
Stay tuned
Want the results when they drop?
Leave an email and I'll send the rankings the day they're published. One message, no list, no spam.