For generations, British schoolchildren have been taught that showing your working matters as much as getting the correct answer. Whether tackling fractions in Year 5 or differentiating functions at A-level, laying out each step of the solution has been a cornerstone of mathematics education across the UK.
Then came ChatGPT.
Within months of its mainstream adoption, teachers across Britain began noticing that homework submissions suddenly displayed flawlessly structured, even elegant, step-by-step solutions. Parents reported that their children “understood the homework less but completed it faster.” Students quietly admitted to using AI tools not only to “check” their work but to generate it wholesale. And universities watched the line between legitimate assistance and academic outsourcing begin to blur.
As a member of a UK academic committee tasked with examining the implications of artificial intelligence in education, I have spent the past year exploring one deceptively simple question: How does ChatGPT actually generate mathematical steps?
Behind that question lies a bigger one: What does this mean for Britain’s mathematical literacy, educational fairness, and our relationship with emerging AI systems?
This article seeks to provide a clear, accessible explanation—avoiding both hype and alarmism—written in the explanatory, public-facing style familiar to BBC News Magazine readers. It aims to help the British public understand what these tools can do, what they cannot, and how our education system should respond.

Mathematics in Britain is more than a school subject; it is a gatekeeper discipline. GCSE maths is one of the strongest predictors of later-life income. A-level maths remains the most popular advanced subject in the UK. Apprenticeships in engineering, healthcare, and technology all rely heavily on numeracy. Even our political debates increasingly hinge on data interpretation—polling, risk assessment, public health, climate projections, economic modelling.
In short: numeracy is civic literacy.
This makes the rise of AI-generated mathematical solutions profoundly consequential. Unlike copying from the back of a textbook, ChatGPT offers a more complete illusion: immaculate, human-like reasoning. It is the difference between submitting copied answers and submitting answers that appear to reflect understanding.
Teachers across the UK have reported several effects:
Homework differentiates less clearly between high and low attainers.
Students lose opportunities for productive struggle, an essential part of mathematical development.
Misconceptions become harder to detect, because AI-generated work obscures gaps in understanding.
Parents rely more heavily on AI explanations, sometimes to good effect, but often without understanding the system’s limitations.
The public debate frequently divides into two camps: those celebrating AI as a great equaliser in education, and those fearing it will undermine foundational skills. The truth is more complex and depends heavily on how the technology works.
The most important fact is also the least intuitive:
ChatGPT is not performing mathematics in the same way humans or calculators do.
It does not contain a built-in arithmetic module. It does not store formulae in a dedicated symbolic engine. Instead, it operates as a language model—a statistical system trained on vast quantities of text. Mathematics, in this context, is simply another type of language pattern.
This means that when asked to solve an equation such as:
5x − 3 = 27
ChatGPT is not “solving for x” through formal computation. Rather, it is predicting what a typical step-by-step explanation for such a problem looks like, based on patterns learned from textbooks, online resources, exam solutions, and countless other sources.
It may look like this:
Add 3 to both sides.
Divide both sides by 5.
Conclude that x = 6.
These steps are familiar, intuitive, and often correct—not because the model “understands” the algebraic logic in a human sense, but because such patterns frequently occur in its training data.
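The contrast with formal computation is worth making concrete. A conventional solver derives each step from the rules of algebra rather than from learned text patterns. Below is a minimal sketch, not anything ChatGPT actually runs, of how a deterministic program would solve a linear equation of the form ax + b = c (the function name is purely illustrative):

```python
def solve_linear(a, b, c):
    """Solve a*x + b = c by inverse operations, recording each step."""
    rhs = c - b          # move the constant term to the right-hand side
    x = rhs / a          # undo the multiplication by a
    steps = [
        f"Move the constant term: {a}x = {c} - ({b}) = {rhs}",
        f"Divide both sides by {a}: x = {rhs}/{a} = {x}",
    ]
    return x, steps

# The article's example, 5x - 3 = 27:
x, steps = solve_linear(5, -3, 27)
print(x)  # 6.0
```

Every line here follows by necessity from the one before it; a language model, by contrast, produces steps that are statistically likely to follow, which is usually, but not always, the same thing.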
However, this approach also produces signature weaknesses. ChatGPT can misapply rules, mishandle arithmetic, or force unusual questions into familiar templates.
This dual nature—sometimes brilliant, sometimes flawed—defines AI maths reasoning today.
Despite its unconventional logic, ChatGPT offers real advantages for learners:
The model excels at breaking complex problems into digestible steps. Many British parents report that its explanations are clearer than traditional classroom materials.
ChatGPT can adjust its explanations dynamically:
“Explain this like I’m 10.”
“Give me a visual analogy.”
“Show two different methods.”
“Explain the common mistakes students make.”
Such adaptability is rare even among human tutors.
Private tutoring in the UK is expensive. For families unable to afford it, AI offers high-quality support that was previously inaccessible.
AI will explain a concept as many times as needed, without judgement—a significant benefit for anxious learners.
These strengths explain why AI is becoming a daily tool in British households. Yet they also make its weaknesses more subtle and potentially more dangerous.
A central concern for teachers is not that ChatGPT sometimes makes mistakes, but how it makes them.
Unlike a hesitant pupil, ChatGPT states incorrect reasoning with unwavering confidence. This can be more misleading than a simple arithmetic slip.
If a problem has an unusual structure, the model may “hallucinate” steps that follow a standard pattern rather than addressing the actual question.
Humans can identify when a question is poorly worded. ChatGPT cannot. If a problem is vague or uses unconventional notation, its answer may drift into irrelevance.
Ironically, the model can stumble over large numbers or multi-digit calculations that would be trivial for a basic calculator.
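The arithmetic gap is easy to demonstrate. A conventional program manipulates numbers directly rather than predicting digits as text, so even very large multiplications are exact. In Python, for instance, integers have arbitrary precision:

```python
# Python computes this exactly and instantly; a language model
# predicting the answer digit by digit can drift partway through.
a = 123456789
b = 987654321
print(a * b)  # 121932631112635269
```

This is why "ask a calculator" remains sound advice for checking any arithmetic an AI produces.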
These weaknesses raise important questions for Britain’s education system: How should teachers assess work in an age of AI? How should students learn to verify answers? What safeguards are needed?
Emerging research suggests ChatGPT’s mathematical reasoning involves several overlapping processes:
The system implicitly classifies the question: algebra, geometry, calculus, probability, etc.
It draws on millions of examples to construct plausible human-like working.
Modern language models often generate multiple internal reasoning paths and select the most coherent one—a process known as self-consistency sampling.
The final answer is polished into a clean sequence of steps that resemble textbook reasoning.
Understanding this process is crucial for assessing reliability. It also clarifies why AI is strong at typical problems but less dependable with edge cases or creative tasks.
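The self-consistency idea can be sketched in a few lines: sample several independent reasoning attempts, then keep the final answer that recurs most often. The toy example below simulates the sampled answers directly (the numbers are invented for illustration, not drawn from any real model):

```python
from collections import Counter

def self_consistent_answer(samples):
    """Majority vote over independently sampled final answers,
    the core idea behind self-consistency sampling."""
    return Counter(samples).most_common(1)[0][0]

# Five simulated reasoning paths for 5x - 3 = 27: four agree, one slipped.
sampled_finals = [6, 6, 30, 6, 6]
print(self_consistent_answer(sampled_finals))  # 6
```

The vote filters out occasional slips, which is part of why repeated attempts at the same question often agree even when a single attempt might not.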
Teachers across the UK report a sharp increase in homework that “looks perfect but doesn’t reflect classroom performance.”
Several UK schools are already emphasising:
In-class assessments
Oral questioning
Open-book or “open-AI” exams
Conceptual questions that AI handles poorly
Britain will need to teach students not just mathematics, but how to evaluate AI-generated reasoning. This includes spotting errors, cross-checking steps, and recognising overconfident explanations.
If properly woven into curriculum design, AI could offer disadvantaged students the kind of personalised support long enjoyed by those who can afford private tutoring.
Interviews with British teachers reveal creative responses:
Some use ChatGPT to generate practice problems tailored to class needs.
Others demonstrate AI mistakes live in lessons to teach verification skills.
A few schools integrate AI into revision sessions, allowing students to interact with explanations but requiring them to handwrite their own reasoning.
Rather than banning AI, many educators prefer teaching students to use it responsibly—mirroring the approach taken during the calculator debates of the 1980s.
Students should verify each step, especially arithmetic.
Ask ChatGPT to explain concepts, not to produce full solutions for copying.
A student who cannot explain a solution in their own words has not learned it.
Requesting two or three different approaches reduces the risk of accepting flawed reasoning.
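The checking habit described above is simple enough to mechanise: substitute the proposed answer back into the original problem and see whether it holds. For the earlier example, 5x − 3 = 27, a one-line check suffices (sketch only; the function name is illustrative):

```python
def check_solution(x):
    """Verify a claimed solution to 5x - 3 = 27 by substitution."""
    return 5 * x - 3 == 27

print(check_solution(6))  # True: the answer holds
print(check_solution(5))  # False: a plausible-looking slip is caught
```

A student who routinely performs this kind of substitution will catch most AI arithmetic errors without any special tooling.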
Ofqual and the Department for Education continue to publish guidance on the use of AI in schools.
Future AI tools may combine language models with symbolic mathematics engines, greatly reducing errors.
Classrooms may soon include AI systems that detect misconceptions in real time or personalise revision plans.
Students may not need to perform every calculation manually, but they will need to understand concepts deeply enough to judge AI outputs.
As public discourse becomes increasingly data-driven, Britain must ensure that citizens can evaluate statistics and algorithms with confidence.
ChatGPT’s ability to generate step-by-step maths solutions is both an extraordinary opportunity and a complex challenge. It can democratise access to high-quality explanations and support learners who struggle. It can also obscure misunderstanding, inflate grades, and erode foundational skills if used uncritically.
The responsibility now lies with Britain’s educators, parents, policymakers, and students. We must cultivate a culture in which AI is used wisely—supporting learning, not replacing it; enhancing understanding, not bypassing it.
If Britain succeeds, we can build a future where every student, regardless of background, gains access to high-quality mathematical guidance. If we fail, we risk widening inequalities and weakening the mathematical competence on which our society increasingly depends.
The challenge is real, but so is the opportunity. As with any powerful tool, the outcome depends on how we choose to use it.