Tags: assessment, AI, ChatGPT, active learning, higher education

5 Assessment Methods That ChatGPT Can't Beat

MEτiS

The real problem isn't ChatGPT

For two years, the same scenario has played out in every higher education institution: a professor discovers a student used ChatGPT to write their assignment. Cue the debate about detection, bans, sanctions.

But let's ask the right question: why is it so easy to cheat with AI on your current assessments?

The answer is simple. Most assessments ask students to summarize, synthesize, or analyze a standard case — exactly what language models do best. When the assessment focuses on the final product (an essay, a synthesis, a report), AI can produce it instead of the student.

The solution isn't to ban AI. It's to change what you assess.

Here are 5 assessment methods that make AI cheating structurally impossible — not because you monitor better, but because the assessment format itself prevents it.


1. Process-based investigation assessment

The principle

Instead of grading a final deliverable, you assess the journey that led to it. What questions did the student ask? To whom? In what order? What hypotheses did they form before reaching their conclusion?

Why AI can't cheat

ChatGPT can produce a brilliant financial analysis. But it can't simulate a unique investigation journey — the hesitations, the dead ends, the unexpected discoveries. The process is the proof of learning.

In practice

In an immersive simulation, each team runs its own investigation. Students interrogate characters who each hold partial information. The instructor's dashboard records everything: who talked to whom, what questions were asked, which clues were discovered.

Two teams facing the same problem never reach the same conclusion — because they didn't run the same investigation. The final deliverable only makes sense as a reflection of a unique journey.

Key takeaway: When the journey matters as much as the destination, copying the destination is useless.


2. Oral defense on unique data

The principle

The student produces a written deliverable, then defends it orally before the instructor or a panel. Questions focus on the choices made, data consulted, alternatives considered.

Why AI can't cheat

The student who did the work can explain why they chose this recommendation over another. The one who copy-pasted an AI answer can't respond to "why didn't you talk to the CFO?" or "what would you have done if the supplier had refused?"

In practice

After a simulation session, ask each team to present their diagnosis in 10 minutes. Ask questions about their investigation journey, not the content of their deliverable. Students who actually investigated answer effortlessly. The others can't.

Key takeaway: The oral doesn't test memory — it tests understanding of the process.


3. Real-time peer assessment

The principle

Students evaluate each other's question quality, hypotheses, and reasoning during the activity, not after.

Why AI can't cheat

Real-time peer assessment focuses on interactions happening live. There's no "assignment to submit" that could be outsourced. The student must demonstrate their ability to reason in front of their peers, at the moment they're doing it.

In practice

During a simulation, each team member has a role. One asks questions to the characters, another analyzes the financial data, a third takes notes. At the end of the session, team members evaluate each other's contribution. Combined with dashboard analytics (which objectively show who participated), this creates an assessment that's impossible to falsify.
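The blend of peer ratings and dashboard analytics described above can be sketched as a simple weighted score. This is a minimal illustration, not MEτiS's actual scoring formula: student names, the 1–5 rating scale, and the 50/50 weighting are all assumptions.

```python
def weighted_contribution(peer_scores, activity_counts):
    """Blend subjective peer ratings (1-5) with objective activity counts.

    peer_scores: student -> average rating from teammates (1-5, hypothetical scale)
    activity_counts: student -> number of logged interactions from the dashboard
    Returns a 0-1 contribution score per student (50/50 blend, an arbitrary choice).
    """
    total_activity = sum(activity_counts.values()) or 1  # avoid division by zero
    result = {}
    for student, score in peer_scores.items():
        share = activity_counts.get(student, 0) / total_activity  # objective share
        result[student] = round(0.5 * (score / 5) + 0.5 * share, 2)
    return result

# Example: equal activity, different peer ratings.
scores = weighted_contribution({"amina": 4, "ben": 5}, {"amina": 10, "ben": 10})
```

Because the activity counts come from logged behavior rather than self-report, a student cannot inflate their score by lobbying teammates alone.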

Key takeaway: When assessment is distributed and real-time, there's nothing to outsource.


4. Contextualized, non-reproducible deliverables

The principle

Instead of "analyze this case," you ask "write a recommendation based on what you discovered during your investigation." The deliverable is intrinsically linked to a unique journey.

Why AI can't cheat

If each team discovered different information (because they interrogated different characters, asked different questions), then each deliverable is unique by construction. A student can't ask ChatGPT to write a recommendation based on conversations they didn't have.

In practice

In inquiry-based learning, deliverables are naturally contextualized. "Write your diagnosis of the cash flow problem, citing the sources (characters and data) that underpin your analysis." The student who ran the investigation can do it. The one who didn't participate can't invent the sources.

Format matters too: a commented financial dashboard, a synthesis memo with sources, a costed action plan based on simulated company data. All formats that ChatGPT can't fill without the simulation's specific data.

Key takeaway: The more a deliverable is anchored in a unique context, the less it can be outsourced.


5. Continuous assessment through learning analytics

The principle

Instead of a one-time exam, you assess engagement and progression throughout the activity using interaction data.

Why AI can't cheat

Learning analytics measure real behaviors: how many characters were contacted, what questions were asked, how much data was consulted, and when hypotheses shifted during the journey. These are activity traces, not text productions, and they can't be artificially generated.

In practice

An instructor dashboard shows, for each team and each student:

  • The number and quality of character interactions
  • Clues discovered (and missed)
  • Time spent on company data
  • Progression between the start and end of the session

This data enables fine-grained assessment that goes far beyond the final deliverable. A student may produce a good deliverable with their team's help — but their activity traces show exactly what they did (or didn't do) to contribute.
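The aggregation behind such a dashboard can be sketched as a fold over an event log. This is a hypothetical structure, not MEτiS's internal schema: the event tuple shape, the event kinds, and the student names are all illustrative.

```python
from collections import defaultdict

# Hypothetical raw activity events: (student, event kind, minutes into session).
events = [
    ("amina", "question", 3),
    ("amina", "clue_found", 12),
    ("ben", "data_view", 5),
    ("ben", "question", 9),
]

def engagement_summary(events):
    """Aggregate raw activity traces into per-student counts and recency."""
    summary = defaultdict(lambda: {"question": 0, "clue_found": 0,
                                   "data_view": 0, "last_active": 0})
    for student, kind, minute in events:
        summary[student][kind] += 1  # tally each kind of interaction
        summary[student]["last_active"] = max(summary[student]["last_active"],
                                              minute)  # most recent activity
    return dict(summary)
```

The point of the sketch: every number on the instructor's dashboard is derived from timestamped behavior, so there is no free-text artifact a language model could substitute.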

Key takeaway: Activity traces don't lie. They're the raw material for fair assessment.


The common thread: assess the investigation, not the answer

These 5 methods share a fundamental principle: you're no longer assessing what the student knows, but how they searched.

| Method | What you assess | Can AI do it? |
|--------|-----------------|---------------|
| Investigation process | The research journey | No |
| Oral defense | Understanding of the journey | No |
| Peer assessment | Real-time contribution | No |
| Contextualized deliverable | Anchoring in unique data | No |
| Learning analytics | Real activity traces | No |

AI cheating only works when you assess a decontextualized product — an essay, a synthesis, a standard report. As soon as you assess the process that led to the product, AI becomes useless for cheating.

It's no coincidence that these methods also produce the best learning outcomes. AI-proof assessment and good pedagogy are the same thing.


How to implement this concretely

You don't need to change everything at once. Start with one simulation session in your next class:

  1. Describe a problem related to your discipline (business, literature, history)
  2. Create a world with characters who each hold part of the information
  3. Let students investigate — they choose who to talk to, what questions to ask
  4. Assess the journey as much as the deliverable

That's exactly what MEτiS lets you do in 5 minutes. The AI generates the characters, distributes knowledge, and records the entire investigation journey. You get the deliverables and the analytics — the raw material for an assessment that ChatGPT will never be able to beat.


Frequently asked questions

Do these methods work for all levels?

Yes. The principle (assessing the process) applies from undergraduate to graduate, including professional training. What changes is the complexity of the problem and the autonomy given to students.

Does it take more preparation time?

With traditional tools, yes. With a simulation platform like MEτiS, no — creating a scenario takes 5 minutes. The time saved on surveillance and cheating detection more than compensates.

Can students use ChatGPT during the simulation?

They can try. But since each conversation with a character is unique and contextual, ChatGPT can't predict what a specific character will answer to a specific question. AI isn't banned — it's simply useless for cheating.

How do you assess fairly if each team has a different journey?

That's precisely the point. You assess the quality of reasoning, not the correctness of a single answer. An assessment rubric based on Bloom's taxonomy (understand, apply, analyze, evaluate, create) allows fair grading across different journeys.
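A rubric like this can be scored mechanically once each Bloom level is rated. The sketch below is one possible scheme, assuming a 0–4 rating per level and equal weights by default; none of this is a prescribed MEτiS feature.

```python
# Revised Bloom's taxonomy levels, ordered from lower- to higher-order skills.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

def rubric_score(ratings, weights=None):
    """Average rubric ratings (0-4 per level) into a single 0-4 grade.

    ratings: level -> instructor rating; unrated levels count as 0.
    weights: optional level -> weight, e.g. to emphasize higher-order skills.
    """
    weights = weights or {lvl: 1 for lvl in BLOOM_LEVELS}  # equal by default
    total_w = sum(weights.values())
    return sum(ratings.get(lvl, 0) * w for lvl, w in weights.items()) / total_w
```

Because the rubric rates reasoning quality rather than a model answer, two teams with entirely different investigation journeys can still be graded on the same scale.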