How good are free AI chatbots at doing Ukrainian schoolwork?

CheatBench is a benchmark I'm building to answer one nosy question: if a Ukrainian student handed their schoolwork to a free-tier chatbot, would it actually pass? Real curriculum, real answer keys, checked by hand. No vibes, no marketing numbers.

No scores yet. This page is the plan, not the results.

Methodology

How the benchmark will work

The plan is fixed before any model sees a single question, so nobody can claim the test was rigged after the fact.

01. Same task, every model
One fixed set of curriculum questions goes to every model and mode.

02. Three input formats
Each task is submitted as a PDF, a photo, and plain text.

03. Manual verification
A human checks every answer against the known correct one.

04. Published rankings
Scores per subject, model, and format, with the raw notes.
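
For a sense of scale, here is a minimal sketch of the run matrix those four steps imply: every task goes to every model and mode in every format. The model and mode names are the ones listed on this page. Everything else (the tuple layout, the format labels, the `build_runs` helper, the task ids) is an assumption made up for illustration, not the benchmark's actual tooling.

```python
from itertools import product

# Model/mode pairs as named on this page; the tuple layout is an assumption.
CONTENDERS = [
    ("Gemini 3.5 Flash", "Standard"),
    ("Gemini 3.5 Flash", "Extended thinking"),
    ("Gemini 3.5 Pro", "Standard"),
    ("Gemini 3.5 Pro", "Extended thinking"),
    ("GPT 5.5", "Low thinking"),
    ("GPT 5.5", "High thinking"),
    ("Sonnet 4.6", "Low thinking"),
    ("Sonnet 4.6", "Medium thinking"),
]

# Short labels for the three input formats (label strings are assumptions).
FORMATS = ["pdf", "photo", "text"]


def build_runs(task_ids):
    """Return one (task_id, model, mode, fmt) tuple per planned submission."""
    return [
        (task_id, model, mode, fmt)
        for task_id, (model, mode), fmt in product(task_ids, CONTENDERS, FORMATS)
    ]


# Hypothetical task ids, only to show how the matrix grows.
runs = build_runs(["math-001", "history-001"])
print(len(runs))  # 2 tasks x 8 model/mode pairs x 3 formats = 48
```

Each task costs 24 hand-checked answers under this layout, which is worth knowing before deciding how many tasks the fixed set should hold.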

The contenders

Three families, multiple modes each

Each model will be tested across its thinking modes, because the fast setting and the slow, careful one behave nothing alike.

Gemini

A. Gemini 3.5 Flash (Standard)
B. Gemini 3.5 Flash (Extended thinking)
C. Gemini 3.5 Pro (Standard)
D. Gemini 3.5 Pro (Extended thinking)

ChatGPT

A. GPT 5.5 (Low thinking)
B. GPT 5.5 (High thinking)

Claude

A. Sonnet 4.6 (Low thinking)
B. Sonnet 4.6 (Medium thinking)

Input formats

The same question, three ways

Students don't always type their schoolwork out. Sometimes it's a snapshot of a textbook at midnight. Each task is fed in all three formats to see what trips the models up.

PDF

The task handed over as a PDF file, exactly as it's often shared.

Photograph

A phone photo of the page, glare and crooked angles included.

Plain text

The clean, typed-out version. The easy mode, for comparison.

Every answer checked by hand

No model grades another model. Every response is compared against a known correct answer by a human, marked right or wrong, and logged. Slow, boring, and the only way the numbers mean anything.
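
One way the hand-marked log could turn into scores per subject, model, and format is sketched below. It is an illustration under stated assumptions: the `Verdict` record, its field names, and the `accuracy_by` helper are hypothetical, and the sample rows use placeholder names and made-up verdicts, not real results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One human judgment on one response (field names are assumptions)."""
    subject: str
    model: str
    mode: str
    fmt: str
    correct: bool
    note: str = ""  # the reviewer's raw note, kept alongside the verdict


def accuracy_by(verdicts, *keys):
    """Share of answers marked right, grouped by the given Verdict fields."""
    totals = defaultdict(lambda: [0, 0])  # group -> [right, total]
    for v in verdicts:
        group = tuple(getattr(v, name) for name in keys)
        totals[group][0] += v.correct
        totals[group][1] += 1
    return {group: right / total for group, (right, total) in totals.items()}


# Placeholder rows with invented verdicts, only to exercise the helper.
log = [
    Verdict("math", "model-a", "mode-x", "text", True),
    Verdict("math", "model-a", "mode-x", "photo", False, "misread a digit"),
    Verdict("math", "model-a", "mode-x", "photo", True),
]

print(accuracy_by(log, "model", "fmt"))
```

Grouping by any combination of fields means the same log can feed a per-subject table, a per-format table, or a full subject-by-model-by-format breakdown without re-marking anything.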

Stay posted

Want the results when they drop?

Leave an email and I'll send the rankings the day they're published. One message, no list, no spam.