Biased LLMs and the risk to student thinking
AI in education is not a simple cheating story. Biased LLMs can train students to outsource thought unless schools teach judgment.

Biased LLMs are becoming a hidden curriculum for students. The primary worry of teachers and professors is cheating, but the deeper problem is formation. Students who have not built their own intellectual, moral, political and spiritual foundations are now getting first drafts of thought from systems trained on biased data, tuned by institutional preference and wrapped in a voice that sounds reasonable even when it packages soulless PR phrasing as a nuanced philosophy of morality, life and the world.
AI can be a tutor, research assistant, writing partner, coding helper, debate opponent and creative multiplier. Yet it can also become a soft curriculum for people who have no inner curriculum of their own. The system prompt preventing their favorite cloud AI from spitting out too many politically incorrect truths might end up programming the intellectually defenseless in its image.
That realization is important for every parent, teacher, school leader and student trying to make sense of AI in education. The question is bigger than whether a teenager copy-pasted a paragraph or two from ChatGPT. The question is whether a generation of young people begins composing its inner life through the defaults of a cloud model.
Key takeaways
LLMs inherit bias from the whole system around them. Training data matters, but bias also enters through feedback data, moderation rules, system prompts, product design, safety tuning and deployment incentives.
Student writing is already changing in the AI era. Research suggests student writing is becoming more polished, positive, formal and vocabulary-rich at the cohort level, even when there is no clear evidence that core reasoning has improved.
Opinionated AI writing tools can shape what users write and later believe. Co-writing studies show that model suggestions can move users toward certain views while leaving them feeling as if they stayed in control.
The evidence does not fit a simple “AI makes students dumb” story. Structured AI tutoring can improve learning, while generic answer-giving systems can weaken it when students use them to avoid the work.
The worst response is a panic cycle of bans, detector surveillance and state licensing. That approach increases institutional control while doing little to build judgment.
The better response is formation. Students need to use AI from a position of intellectual independence rather than dependency.
Why biased LLMs matter now
Three things are happening at once.
First, LLMs are moving into search, writing, studying, coding, note-taking, office software, browsers, phones and school workflows. They are becoming a default language layer between students and reality.
Second, researchers are finding measurable traces of AI style in human writing. A University of Warwick analysis of 4,820 undergraduate reports found that after ChatGPT’s launch, student writing became more positive, formal and vocabulary-rich. AI-associated words such as “delve” and “intricate” rose sharply before falling in 2025 as students appeared to adapt their style. The researchers emphasized that the study did not prove individual AI use, but it did show cohort-level stylistic shifts during the AI era.
Third, researchers are moving beyond the “cheating” angle. They are studying how AI changes the process of thinking, writing, learning and belief formation.
That is the real story many AI doomers overlook. A copied paragraph is a disciplinary problem. A copied mental framework is a generational crisis.
The control lever is model behavior
A hosted LLM controls the suggestions a student sees, the framing of controversial questions, the sources it favors, the moral tone it adopts, the ideas it refuses, the ideas it treats as settled and the boundaries of acceptable inquiry.
This control lever has several layers.
Pretraining data gives the model its statistical memory. Internet data overrepresents some languages, classes, institutions, ideologies, professions and cultures while underrepresenting others. NIST’s Generative AI Profile warns that harmful bias in generative AI can come from training data and can create representational harms, performance disparities and homogenized outputs.
Post-training and RLHF shape what the model says when it has options. Stanford HAI’s policy brief on whose opinions language models reflect found that language models are shaped by training data, crowd workers and developer decisions. It also reported that models fine-tuned with human feedback were less representative of the general public’s opinions than models that were not fine-tuned, and that RLHF-tuned models in the study aligned more with left-leaning, liberal views.
Behavior policies and constitutions add explicit values. Anthropic says Claude’s constitution directly shapes Claude’s behavior and describes the company’s intentions for Claude’s values and character. OpenAI says ChatGPT should avoid political bias in any direction and reports that it evaluates model bias across prompts with different political slants.
These are attempts at alignment. They are also proof that model behavior is governed by value choices.
Deployment context adds institutional pressure. A model used by a school, government agency, corporation or regulated provider may be tuned further by procurement rules, safety rules, legal risk, content policies and brand incentives. The model may never even receive an order from a government minister and still reflect establishment assumptions in every response. It can absorb them through the full stack of incentives around it.
Dataset bias is not the sole culprit
Early language-model bias research already showed that models trained on real-world data capture stereotypes. The StereoSet benchmark measured stereotypical bias across gender, profession, race and religion. It found strong stereotypical biases in models such as BERT, GPT-2, RoBERTa and XLNet.
The BBQ benchmark found that question-answering models often relied on stereotypes when context was under-informative. Even when context was informative, models were more accurate when the correct answer aligned with a social bias than when it conflicted with one.
The corporate response to AI models’ tendency to parrot the biases in their training data has always been to “correct” for them: first by mercilessly lobotomizing models that dared to utter unorthodox opinions, then by hard-coding deliberately opposite ideological and political biases into the model or application layer itself.
In other words, a frontier chatbot goes far beyond raw autocomplete. It combines a pretraining corpus, filtered data, preference training, safety tuning, system instructions, content policy, retrieval systems, product design, regional law, corporate risk management and interface defaults into one smooth assistant voice.
That polished voice is powerful because it rarely sounds like propaganda. It sounds like procedural reasonableness. It writes the kind of prose that gets approved by HR, a compliance department, a university committee or a public-sector communications office. It is positive, careful, agreeable and allergic to unsanctioned conflict. It can make weak ideas sound polished and water strong ideas down beyond recognition.

Students are already absorbing AI style
The strongest evidence for student emulation is stylistic.
The Warwick study looked at authentic undergraduate writing over time. After ChatGPT’s launch, student reports shifted in ways similar to ChatGPT-rewritten versions of pre-2022 reports. The researchers found more positivity, more formality and a wider vocabulary range, with no corresponding change in grades or examiner feedback.
In plain English, the writing became more polished without clear evidence that the underlying academic skill improved.
That pattern matches broader research on academic writing. Dmitry Kobak and coauthors studied 14 million PubMed abstracts from 2010 to 2024 and found an abrupt increase in certain style words after LLMs became widely available. Their analysis of ChatGPT usage in academic writing through excess vocabulary suggested that at least 10% of 2024 PubMed abstracts had been processed with LLMs, with some sub-corpora reaching 30%.
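The core idea of an excess-vocabulary analysis can be shown with simple arithmetic. This is a simplified sketch, not Kobak and coauthors’ exact pipeline: it assumes a word’s expected frequency in the target year is a straight-line extrapolation from two pre-LLM baseline years, the function names are hypothetical, and the frequencies in the example are invented.

```python
import math


def expected_frequency(freq_year1: float, freq_year2: float, years_ahead: int) -> float:
    """Extrapolate a word's pre-LLM trend in a straight line (an assumption).

    freq_year1 and freq_year2 are the shares of abstracts containing the word
    in two consecutive baseline years; years_ahead counts years past the second.
    """
    slope = freq_year2 - freq_year1
    # A share of abstracts cannot be negative, so clip at zero.
    return max(freq_year2 + slope * years_ahead, 0.0)


def excess_usage(observed: float, expected: float) -> tuple[float, float]:
    """Return (gap, ratio): observed minus expected, and observed over expected."""
    gap = observed - expected
    ratio = observed / expected if expected > 0 else math.inf
    return gap, ratio


if __name__ == "__main__":
    # Invented shares for one marker word: 2021 and 2022 baselines, 2024 target.
    expected_2024 = expected_frequency(0.0010, 0.0012, years_ahead=2)
    gap, ratio = excess_usage(0.0080, expected_2024)
    print(f"expected {expected_2024:.4f}, gap {gap:.4f}, ratio {ratio:.1f}")
```

With these invented numbers, a word expected in 0.16% of abstracts but observed in 0.8% shows a gap of 0.64 percentage points and a fivefold ratio. As we understand the paper’s framing, the excess share of abstracts containing at least one marker word works as a lower bound on the share processed with an LLM, because not every processed abstract picks up a marker word.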
A separate analysis of about 280,000 English-language videos from academic institutions found evidence that humans increasingly imitate LLM-associated wording in speech too. The paper on LLM influence on human spoken communication is careful about interpretation, but its thesis is direct: LLMs appear to be influencing how people speak as well as how they write.
This does not mean every polished student paragraph is AI-generated. That assumption is the logical leap behind many AI detector scandals, the most prominent of which is the deluge of Turnitin false positives.
This moves beyond students merely using AI to generate language for them. Even students who do not copy-paste entire paragraphs from a literal chatbot interface can begin to internalize the model’s rhythm, hedging, structure, tone and moral posture. They may start writing as if every thought must pass through the LinkedIn-average filter before it becomes acceptable.
In other words, we may be looking at a horrific future where cloud-based AI has taught all young adults to think and talk like HR memos.
The bigger risk is reactive thinking
It does not even stop there. Students’ intellectual autonomy as a whole is at stake.
A 2023 CHI paper on co-writing with opinionated language models tested whether a language-model writing assistant could affect users’ opinions. In an experiment with 1,506 participants, users wrote about whether social media is good for society. Some received suggestions from a model configured to favor one side. The opinionated assistant affected the views expressed in users’ writing and shifted their opinions in a later survey.
A 2026 follow-up on reactive writers gives the phenomenon a useful name: reactive writing. The authors found that engaging with AI suggestions becomes a central activity in the writing process. Users often start evaluating and accepting suggestions before they have completed their own ideation. The AI seeds directions, and the writer elaborates.
That is a real danger for education.
A fully formed adult can use AI as a sparring partner. He can ask for counterarguments, demand sources, compare models, reject premises, detect omissions because he has broad general knowledge, rewrite clearly in his own voice and verify claims. A power user treats the model as a tool under his own judgment.
An unformed student, however, may reverse that relationship. He lets the model supply the premises, frame, vocabulary, emotional tone and acceptable range of conclusions. Then he mistakes the resulting paragraph for his own thought because he clicked accept, edited three sentences and added a conclusion.
The tool does not need to brainwash him. It only needs to make one path easier than another.
AI can harm learning… under the wrong conditions
Still, the evidence points away from blanket anti-AI panic.
The best research suggests a more useful distinction. Answer machines can damage learning. Structured tutors can improve it.
The Stanford SCALE summary of Generative AI Can Harm Learning describes a field experiment with nearly 1,000 high-school math students using GPT-4-based tutors. GPT Base, which resembled a standard ChatGPT interface, improved performance during assisted practice. GPT Tutor, which had safeguards designed to support learning, improved assisted performance even more. When access was removed, students in the GPT Base condition performed worse than students who never had access. The negative learning effects were largely mitigated by GPT Tutor.
That result fits ordinary experience. If a student uses AI to get the answer, he may complete the assignment without developing any meaningful skill. If he uses AI to receive guided hints, compare approaches, explain mistakes and practice retrieval, the tool can genuinely help.
The positive evidence matters too. A 2025 Scientific Reports randomized controlled trial in a Harvard physics course found that students using a custom AI tutor learned significantly more in less time than students in an active-learning class, while reporting more engagement and motivation. The study on AI tutoring outperforming in-class active learning focused on a tutor designed around pedagogical best practices, rather than a generic chatbot bolted onto a class.
Tutor CoPilot points in the same direction. A randomized controlled trial involving 900 tutors and 1,800 K-12 students found that students whose tutors had access to AI support were more likely to master topics, with the largest gains coming from lower-rated tutors. The study describes Tutor CoPilot as a human-AI approach for scaling real-time expertise.
In practice, the real question then becomes: who is using the AI, for what purpose, under which pedagogy, with what level of prior formation and who controls the model?

The cognitive-debt warning
A 2025 MIT Media Lab preprint called Your Brain on ChatGPT drew attention because it used EEG to compare essay-writing with ChatGPT, search engines or no tool. The authors reported that brain-only participants showed the strongest and most distributed brain connectivity, search-engine users showed moderate engagement and LLM users showed the weakest connectivity. The LLM group also reported lower ownership of essays and struggled more to quote their own work.
However, such studies deserve careful reading. This one was a preprint with a limited sample and, even granting its strongest premises, it does not prove that ChatGPT use lowers intelligence. Still, the implication is serious enough: when an AI model does too much of the cognitive work, the average user will most likely end up doing less.
That pattern long predates AI, though. Calculators can weaken arithmetic if introduced badly. GPS can weaken navigation. Autocomplete can weaken spelling. Television trained an entire generation into adopting passivity as a lifestyle. Social media trained impulse, status anxiety and dopamine loops.
My beef with studies like the one above is that they contain no serious ranking of brain activity by importance. If the entire purpose of a technology is to alleviate the burden of repetitive busywork, less brain activity will be observed in a controlled environment. But does the fact that one doesn’t have to spend time manually writing out long divisions or skimming through a physical dictionary in the library really mean that one can’t thoughtfully engage with mathematics or language? Or does it merely give us more cognitive bandwidth to focus on what really matters?
For a long time, our civilizational advancement has been measured by the extent to which we are able to delegate physical and cognitive grunt work. The risk with LLMs, however, is that they might condition the average person to delegate something far more valuable: the very first movement of thought.
The AI literacy horseshoe
A recent Journal of Marketing study on how lower AI literacy predicts greater AI receptivity found that people with lower AI literacy were typically more receptive to AI. The authors argue that lower-literacy users are more likely to perceive AI as magical and feel awe when it performs tasks that seem uniquely human.
That finding is interesting. However, the study quickly went viral as supposed proof that AI enthusiasm signals low intelligence, even though what it measured was AI literacy, not intelligence.
A better way to look at these findings is an AI literacy horseshoe.
At one end, low-literacy users may overtrust AI because it feels magical. They may accept fluent answers, polished prose and official-sounding explanations without understanding hallucinations, sampling, hidden prompts, retrieval gaps or alignment pressure.
In the middle, average users may treat AI as a convenience layer. They ask for summaries, email drafts, homework help and brainstorming. However, because they lack the mental capacity and tech literacy to troubleshoot, argue back and understand the real limitations of AI tools, they get frustrated when outputs don’t match expectations. This is the user who can spot when a tool does something wrong but lacks the ability to work around it.
At the other end, power users often get extraordinary gains because they do not trust AI blindly. They know when and how to ask for sources. They compare models. They use local tools when privacy matters. They split problems into verifiable parts. They know the model’s first answer is a move in a workflow, rather than final authority. They can force the system to generate options, attack assumptions, expose tradeoffs, detect omissions, write code, test code, produce variants and compress drudgery.
The same tool that makes one person passive can make another exponentially more capable.
That is why bans are almost certainly the wrong option. They punish the development of skills while leaving centralized model providers, school administrators, AI detectors and regulators with more power over the permitted uses of intelligence.
The establishment answer will be control
The predictable institutional response is already familiar.
First comes the panic: students are cheating, young people cannot think, AI is unsafe and nobody can tell what is real.
Then comes the control stack: school bans, detector tools, procurement lists, approved vendors, usage logs, age gates, identity systems, reporting requirements, content filters and regulatory frameworks.
New York City schools blocked ChatGPT on school networks in January 2023, citing concerns about learning and content accuracy, before later moving toward guidance and classroom use. Chalkbeat covered the original policy in its report on how NYC banned access to ChatGPT on school computers and networks. UNESCO’s 2023 guidance for generative AI in education and research called for governments to regulate generative AI in schools, including data protection and consideration of age limits.
Some of those concerns are real. Children should not be pushed into account-gated AI systems that collect data, shape behavior and replace mental effort. Schools should not treat generic chatbots as neutral tutors. Parents should not outsource the formation of their children to a cloud tool.
Institutional “solutions” are almost certain to fail because they treat students as managed subjects rather than developing persons. They try to solve weak judgment with stronger surveillance. They often boil down to tackling intellectual dependency with other forms of that same dependency.
A school, system or society that cannot teach a student to reason will not fix the problem by banning certain tools and mandating other ones.
The real formation gap
If students become empty vessels for whatever machine speaks first, the machine is merely a symptom of a deeper problem.
The harder question that parents, teachers and academics are all too eager to avoid is this: exactly why are so many students intellectually defenseless in the face of chatbots in the first place? Why do they face AI with no strong inner core? No serious grasp of logic? No understanding of rhetoric? No metaphysics? No moral vocabulary beyond slogans? No deep reading life? No serious political theory? No religious or spiritual discipline? No habit of disputation? No intellectual tradition that helps them recognize when a machine is smuggling in assumptions?
That question is being skipped over in the AI debate because it exposes a broader educational failure.
Granted, a young person who has read Plato, Aristotle, Augustine, Aquinas, Locke, Burke, Tocqueville, Bastiat, Orwell, MacIntyre, Lewis, or Sowell will not automatically become wise. But he at least has some furniture in the room. He has voices to compare. He has inherited arguments. He knows categories deeper than “safe,” “harmful,” “inclusive,” “problematic,” “efficient” and “evidence-based.”
A young person formed mostly by worksheets, compliance language, social media and test prep enters the AI age almost undefended. The model’s tone will become his tone. The model’s categories will become his categories. The model’s idea of reasonableness will become his conscience.
This is why the AI debate cannot be limited to tool talk. Tools matter, but tools cannot replace formation. It is also why we started our sister projects, Popular Philosophy and Thinking Better. Popular AI can help readers understand the AI control layer and how to work around it. Popular Philosophy exists because even the most skilled power users can only accomplish good if they know how to identify it. Thinking Better exists to teach young people the basics of intellectual self-defense.
The superintelligences of the future will either devour people’s minds entirely, scare them into a Luddite existence of irrelevance or become powerful allies to those who can keep up.
That means that, for those who want to remain intellectually autonomous human beings in the age of machine superintelligence, philosophy and critical thinking skills are simply no longer optional.
What biased LLMs mean for users
For ordinary AI users, the lesson is simple: never confuse flashy presentation with truth. Treat the chatbot as a fast assistant, rather than a replacement for your mind.
For parents, the question reaches beyond permission. The real issue is whether the child has enough inner structure to use AI without being used by it.
For teachers, detector paranoia is a dead end. Our own coverage of Turnitin false positives shows why detector outputs cannot carry the moral weight many schools place on them. A better approach includes oral defense, drafts, source trails, in-class writing, revision logs and assignments that require judgment rather than polished filler.
For schools, AI policy should distinguish between answer extraction and guided learning. A generic chatbot that gives answers can weaken learning. A structured tutor that asks questions, reveals hints gradually, supports retrieval and forces the student to explain reasoning can improve it.
For power users, the lesson is to keep building skill. The more capable AI becomes, the more valuable independent judgment becomes. If every mediocre user can produce professionally packaged output, the advantage shifts to people who can ask better questions, verify faster, compare frames and preserve an authentic voice in the process.
For local AI users, the control issue is obvious. Hosted models are convenient, but they place model policy, account access, retention rules and behavior defaults in someone else’s hands. Local models are weaker in many tasks, but they create a fallback. Our guide to building a local Perplexity alternative with Vane, Ollama and SearXNG shows one way users can reduce dependence on hosted research tools.
What to do instead of banning AI
Teach students to challenge the model before using it
Students should learn to interrogate a model’s answer before they rely on it.
That means asking what assumptions are built into an answer, what a serious opponent would say, which claims need sources and what the strongest version of the opposing view would look like. It means asking how the answer would change from another moral, political or religious tradition. It means noticing what the model omitted and how the source base shaped the result.
This turns AI from an answer vending machine into a debate partner.
The student must learn to see the model as an artifact. Its output comes from training, tuning, retrieval, policy, product design and prompt context. Once students grasp that, the magic spell that captivates low-literacy users weakens. The model becomes less of an authoritative machine god and more of a useful tool.
Make students write before the model speaks
For serious writing, the student should move first.
That means producing a thesis, outline, background research and rough argument before asking AI for help. The model can then critique, extend, challenge and restructure. It should not supply the first act of thought.
This one rule matters because the first frame has enormous power. When AI writes first, the student reacts. When the student writes first, AI becomes a tool for revision.
Schools should make this distinction central to writing instruction. A student who brings AI a weak but genuine argument has something to strengthen. A student who brings nothing and asks for a draft has already surrendered the most important part of the assignment.
Use AI tutors that ask questions before revealing answers
The evidence from GPT Base versus GPT Tutor points to a practical rule: AI should scaffold learning, rather than replace the struggle.
Students should be asked to explain steps, predict answers, retrieve concepts and correct mistakes. A good AI tutor should often refuse to give the final answer until the student has tried.
That may frustrate students who want speed. But that is the point. Learning requires friction. Remove all friction and the assignment becomes performance theater.
Good tutoring uses AI to keep a student engaged in the problem long enough to build skill. Bad tutoring uses AI to make the problem disappear.
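The gating rule described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the design of GPT Tutor or any other product: the `HintGatedTutor` class, its method names and the `min_attempts` threshold are assumptions made for this example, and a real tutor would judge answers with far more than an exact string match.

```python
from dataclasses import dataclass, field


@dataclass
class HintGatedTutor:
    """Hypothetical tutoring policy: hints come one at a time, and the worked
    answer stays locked until the student solves the problem or has made
    enough genuine attempts."""

    hints: list[str]
    answer: str
    min_attempts: int = 2
    attempts: list[str] = field(default_factory=list)
    hints_given: int = 0
    solved: bool = False

    def submit_attempt(self, attempt: str) -> str:
        attempt = attempt.strip()
        if not attempt:
            # Empty submissions do not count as effort.
            return "Try writing out your first step before asking for help."
        self.attempts.append(attempt)
        # Exact string match is a deliberate simplification for this sketch.
        if attempt == self.answer:
            self.solved = True
            return "Correct. Now explain in your own words why it works."
        return "Not yet. Ask for a hint or try another approach."

    def request_hint(self) -> str:
        # No hints until the student has tried something, even a rough guess.
        if not self.attempts:
            return "Show me an attempt first, even a rough one."
        if self.hints_given < len(self.hints):
            hint = self.hints[self.hints_given]
            self.hints_given += 1
            return hint
        return "No hints left. Compare your attempts and look for the mistake."

    def request_answer(self) -> str:
        # The worked answer unlocks only through solving or sustained effort.
        if self.solved or len(self.attempts) >= self.min_attempts:
            return f"Worked answer: {self.answer}"
        remaining = self.min_attempts - len(self.attempts)
        return f"Make {remaining} more attempt(s) before I show the answer."


if __name__ == "__main__":
    # Example problem: solve 2x + 3 = 11.
    tutor = HintGatedTutor(
        hints=["What operation undoes the + 3?", "Then divide both sides by 2."],
        answer="x = 4",
    )
    print(tutor.request_answer())
    print(tutor.submit_attempt("x = 7"))
    print(tutor.request_hint())
    print(tutor.submit_attempt("x = 4"))
    print(tutor.request_answer())
```

The design choice worth noticing is that the answer unlocks either by solving the problem or by making real attempts. Simply asking again does nothing, which is the friction the section argues for.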
Preserve voice as an educational goal
Students should not be trained to produce frictionless prose. They should learn to write in a voice that reflects real thought.
That means occasional roughness is not always a defect. A strange sentence can reveal a mind at work. A polished paragraph can hide the absence of one.
Teachers should reward clarity, evidence, structure and intellectual courage. They should also make room for idiosyncrasy. If every student essay begins to sound like a corporate policy memo, education has lost something far more important than style.
Voice is more than decoration. It is a sign that a real human being is present.
Expose students to multiple models and local tools
Exposure to multiple tools and models can weaken the impression that AI is “magic”.
Students should compare ChatGPT, Claude, Gemini, open models, local models, search engines, books and primary sources. They should learn that models differ. They should experience that outputs are shaped by training data, alignment choices, prompting and retrieval.
This practice breaks the spell. It helps students understand that AI responses are constructed, partial and contestable.
Local tools also matter. They teach students that AI does not have to be a single corporate interface. Even when local models are weaker, they preserve a sense of agency and technical literacy. That is important in a world where hosted models increasingly mediate research, writing and schoolwork.
Build a non-digital intellectual life
No AI policy can replace books, conversation, prayer, debate, mentorship, apprenticeship, solitude and difficult writing.
Children need time away from screens. They need adults who can argue. They need traditions older than the latest interface update. They need real friendships, real duties and real skin in the game.
A mind formed mostly by tools will eventually belong to whoever controls the tools.
The answer to biased LLMs cannot be another dashboard. It has to include deeper reading, stronger writing, live argument, moral formation and serious attention. Those things are harder to scale than software. That is why they are important.
Sadly, it is also why the education system consistently fails to offer them.

The real risk is managed cognition
The most dangerous future is not one where students use AI to cheat on writing assignments.
The most dangerous future is one where students use approved AI inside approved platforms under approved school policies, guided by approved institutional values, watched by approved detector systems and never taught how to think outside the approval structure.
That is managed cognition. Institutionally approved tunnel vision. Take one look at the average European Commissioner struggling to articulate the inane platitudes and ideological dogmas on their cue cards or teleprompters. That is the dreadful future of cognitive decline that could await every child if real education is not taken seriously. And they will likely not receive a €37,000-per-month paycheck to help them cope with the absence of meaningful mental faculties either.
This threat is easy to miss for those who have already converged on the hive mind, because it will arrive in the form of pleasant, politically safe language: safety, equity, responsible use, student support, digital citizenship and trustworthy AI. Those words can describe real goods. But in practice, they more often serve as euphemisms for the intelligence enforcement layer.
The practical questions stay the same. Who controls the model? Who controls the account? Who controls the curriculum? Who controls the logs? Who controls the acceptable range of inquiry? What happens to the student who asks the wrong question?
Biased LLMs make these questions urgent because they move the struggle over education into the interface itself. A textbook sits on the desk. A model talks back. It guides, suggests, refuses, summarizes, praises, edits and frames. It can feel like a tutor while behaving like a policy layer.
That is why AI literacy must go hand in hand with thorough intellectual formation, rather than being limited to software training.
FAQ
Are LLMs trained on biased datasets?
Yes. LLMs are trained on large text corpora that reflect the biases, omissions, stereotypes, incentives and power structures of the sources they ingest. Bias also enters through filtering, labeling, RLHF, safety tuning, system prompts and deployment policies.
Does ChatGPT make students worse at thinking?
The evidence does not support a universal claim. Generic answer-giving AI can harm learning when students use it as a crutch. Structured AI tutors can improve learning when they are designed around active learning, scaffolding, feedback and student effort.
Are students really starting to write like AI?
Research suggests student writing has shifted in the AI era. The Warwick study found cohort-level changes in undergraduate writing after ChatGPT’s launch, including more positivity, formality and vocabulary range. The study does not prove individual AI use, but it does support the concern that AI tools are shaping student writing style.
What is reactive writing?
Reactive writing is a suggestion-led writing process where the user reacts to AI-generated ideas before fully developing his own. This can make the user feel in control while the model quietly seeds the direction of the argument.
Should schools ban AI?
A blanket ban is usually a weak answer. It may be appropriate for specific ages, assignments or privacy-sensitive contexts, but it does not teach judgment. Schools should distinguish between answer extraction, assisted drafting, structured tutoring, research support and critical model comparison.
How should parents handle AI?
Parents should focus on formation first. Children need reading, writing, logic, moral reasoning, spiritual life, conversation and disciplined attention before AI becomes a default assistant. AI should be introduced as a tool to question, rather than an oracle to obey.
Intellectual formation is more important than ever
Do not raise children to be pro-AI, anti-AI or AI-dependent.
Raise them to be stronger than the interface.
A young person with no intellectual foundation will be shaped by whichever system he can delegate his thinking to. A young person with serious formation can use AI as leverage. That is the difference between being managed by the model and commanding it.
Expect the policy-maker class to try to solve this with bans, dashboards, detector tools, vendor approvals and regulation. Some of those guardrails may be useful, especially for young children and sensitive data. But the real answer is older and more difficult for educators to accept: form the mind before handing it a thinking machine.