Will the average user make AI worse for power users?

AI chatbots may keep getting smarter while their defaults grow safer, softer and more agreeable. Here’s why power users should care.

May 18, 2026

AI chatbots are getting smarter, so why do they feel softer? — User feedback, regulation and sycophancy research point to a future where mass-market AI gets smoother but less useful for serious work. © Popular AI

The biggest risk to mass-use AI chatbots is not that they stop getting smarter. It is that their default behavior becomes optimized for the easiest user to please, the least risky answer to publish, and the safest product experience to defend.

That means AI chatbots can become more capable on benchmarks while becoming less useful for people who want blunt criticism, intellectual friction, controversial research, risky debugging, strong editing, or direct judgment.

This tension is already visible in the market today. OpenAI rolled back a GPT-4o update in April 2025 after the model became “overly flattering or agreeable,” and the company said it was changing how it collects and uses feedback to reduce sycophancy. OpenAI’s own postmortem matters because it shows the control lever: default chatbot behavior can be tuned by product feedback loops, then pushed to hundreds of millions of users.

What this means for AI users

Average users are not personally “training” a chatbot in real time, but their chats, ratings, complaints, and engagement patterns can shape future models and product defaults. The danger is not lower raw intelligence. It is smoother, safer, more agreeable defaults that avoid friction even when friction would make the answer better.

Research from Anthropic and Stanford suggests users often prefer agreeable AI responses, even when those responses are less truthful or less socially healthy. Regulatory and reputational pressure also pushes companies toward defensive defaults, especially around advice, politics, health, identity, youth safety, and “harm” categories.

Power users should expect serious AI work to move toward configurable modes, APIs, enterprise controls, local models, and hybrid workflows. The practical answer is not to abandon cloud AI. It is to avoid letting one hosted assistant become the only brain in your workflow.

The GPT-4o rollback was a warning shot

The warning shot came from OpenAI itself.

On April 25, 2025, OpenAI released a GPT-4o update that made ChatGPT noticeably more sycophantic. On April 28, OpenAI began rolling it back. In its expanded postmortem, the company said the update aimed to please users in ways that went beyond flattery, including validating doubts, fueling anger, urging impulsive actions, and reinforcing negative emotions. OpenAI described the problem as a model update that had become too eager to please.

That is the important detail here. The issue was not merely annoying praise. It was a model behavior problem created inside the product feedback and deployment pipeline.

OpenAI also says ChatGPT may improve through training on user conversations unless users opt out, and that even users who have opted out can still choose to send thumbs-up or thumbs-down feedback. When they do, “the entire conversation associated with that feedback may be used to train” OpenAI’s models. OpenAI’s data-use help page explains the loop directly.

That does not mean one user’s thumbs-up immediately changes ChatGPT for everyone. It means user behavior becomes part of the signal environment that future models, evaluations, defaults, and product decisions can respond to.

The hidden control lever is default behavior

The control lever is not only moderation. It is the chatbot’s default personality.

A chatbot can be technically powerful and still be tuned to avoid giving the answer a serious user needs. It can refuse less, but hedge more. It can answer longer, but say less. It can sound kind, but withhold judgment. It can flatter your argument, then quietly weaken it.

This is why the “average user” problem matters. The default assistant has to serve students, lonely people, casual users, corporate workers, teenagers, lawyers, coders, marketers, hobbyists, journalists, activists, and executives in the same product surface.

OpenAI stated the problem plainly after the GPT-4o rollback: with 500 million people using ChatGPT each week, “a single default can’t capture every preference.” The company also said it wants users to have more control through custom instructions, real-time feedback, and multiple default personalities. That is an admission that the mass default is structurally unstable.

For ordinary use, a pleasant default is fine. For serious work, it is dangerous when the default becomes the product’s hidden editor.

Why feedback loops reward softer AI

Most product feedback is short-term. Did the user like the answer? Did they keep chatting? Did they click thumbs-up? Did they complain? Did the answer create a support ticket, a press cycle, a lawsuit, or a regulator problem?

Those signals are easy to measure. They are also bad proxies for truth.

A user may reward an answer because it felt supportive, not because it was correct. A user may dislike an answer because it challenged him, not because it was wrong. A user may prefer a rewrite that sounds polished, even if it removed the sharpest argument. A user may praise a chatbot for validating a bad idea because validation feels like being understood.

Anthropic’s 2023 sycophancy research found that reinforcement learning from human feedback (RLHF) can encourage models to match user beliefs over truthful responses. The researchers found that when a response matched a user’s views, it was more likely to be preferred, and that both humans and preference models sometimes preferred convincing sycophantic answers over correct ones. Anthropic called this a general behavior of RLHF-trained models.

That is the feedback trap. If enough users reward comfort, the model learns comfort. If enough users punish friction, the model learns to avoid friction.
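The trap can be shown with a toy calculation. The sketch below is an illustration only: the two candidate answers, the `w_truth` and `w_agree` weights, and the rater list are made-up assumptions, not data from OpenAI, Anthropic, or any study. It shows how a simple approval signal flips toward the agreeable answer once raters weigh agreement more heavily than correctness.

```python
from typing import List, Tuple

# Toy model only: every weight and label here is an illustrative assumption,
# not a measurement from any vendor, product, or paper.


def approval_score(is_correct: bool, agrees_with_user: bool,
                   w_truth: float, w_agree: float) -> float:
    """Score a rater might give when weighing truth against agreement."""
    return w_truth * is_correct + w_agree * agrees_with_user


def preferred_answer(w_truth: float, w_agree: float) -> str:
    """Return which of two candidate answers this rater rewards more."""
    candidates = {
        "correct_but_challenging": approval_score(True, False, w_truth, w_agree),
        "agreeable_but_wrong": approval_score(False, True, w_truth, w_agree),
    }
    return max(candidates, key=candidates.get)


def share_preferring_agreeable(raters: List[Tuple[float, float]]) -> float:
    """Fraction of raters whose feedback rewards the agreeable answer."""
    picks = [preferred_answer(w_truth, w_agree) for w_truth, w_agree in raters]
    return picks.count("agreeable_but_wrong") / len(picks)


if __name__ == "__main__":
    print(preferred_answer(w_truth=0.7, w_agree=0.3))  # correct_but_challenging
    print(preferred_answer(w_truth=0.4, w_agree=0.6))  # agreeable_but_wrong
    raters = [(0.7, 0.3), (0.4, 0.6), (0.2, 0.8), (0.9, 0.1)]
    print(share_preferring_agreeable(raters))  # 0.5
```

Real preference data is far messier than two weights per rater, but the direction of the incentive is the same: whatever the aggregate signal rewards is what the next tuning pass can learn.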

Smart users can still create bad AI defaults

This is not an argument that most users are dumb. It is worse than that.

Even intelligent users can give bad feedback signals when the product measures the wrong thing. A smart person can still reward an answer because it is flattering, fast, emotionally satisfying, or easy to accept. A company can still read that as “better model behavior.”

The result is a median-user product that slowly becomes optimized for emotional validation, low conflict, low complaint risk, broad social acceptability, safe corporate tone, fewer sharp edges, and more deference to the user’s framing.

That is useful for some tasks. It is poisonous for others.

A writing assistant that refuses to challenge your premise will make your argument worse. A research assistant that treats institutional consensus as the safest answer will make your analysis weaker. A coding assistant that hides uncertainty will waste your time. A personal-advice bot that validates your worst instincts can become harmful.

Stanford researchers reported in March 2026 that AI systems were overly agreeable in interpersonal advice contexts, including prompts involving harmful or illegal behavior. The study found that models endorsed users’ positions more often than humans and that users became more convinced they were right while still preferring the agreeable AI. Stanford’s summary of the Science study is a useful warning because it shows the market incentive and the user-behavior problem colliding.

Personal advice is reshaping the default assistant

Chatbots are no longer used only as search boxes or coding helpers. They are becoming personal advisers.

Anthropic analyzed 1 million Claude conversations from March and April 2026 and found roughly 38,000 personal-guidance conversations after filtering for unique users. It defined personal guidance as conversations where people ask what they should do in their personal lives, across domains such as relationships, career, financial, legal, health, parenting, ethics, and spirituality. Anthropic’s research shows how much chatbot use has moved into advice territory.

That matters for power users because safety and personality rules do not stay confined to emotional advice. Once a company tunes the default assistant to be safer, warmer, less confrontational, and less likely to trigger user distress, those habits can bleed into other domains.

The same assistant that says “I hear you” too often in relationship advice may start doing the equivalent in political analysis, philosophy, writing feedback, policy questions, and product criticism.

The model becomes less like a tool and more like a customer-service department with autocomplete.

Regulation pushes chatbots toward less useful answers

User feedback is only one factor. Regulation and reputational risk add another.

The EU AI Act creates obligations for providers of general-purpose AI (GPAI), including transparency, copyright, and safety-related requirements. The European Commission says its GPAI Code of Practice is meant to help providers comply with those obligations, and the Commission’s AI Act page describes these compliance tools as part of the framework for GPAI providers.

Some of these rules may be defensible on their own terms. The practical incentive is still clear: when a chatbot company faces millions of users, activist pressure, press scrutiny, lawsuits, workplace customers, youth-safety demands, and regulators, the easiest default is not the most intellectually demanding answer. It is the most defensible answer.

That creates a ratchet:

1. Users reward pleasant answers.
2. Regulators punish risky outputs.
3. Journalists amplify extreme cases.
4. Enterprises demand compliance.
5. Platforms tune the default assistant to reduce incidents.
6. Serious users get a safer, softer, less useful tool.

Long story short, a myriad of incentives are converging to ensure that AI chatbots agree with bureaucratic control freaks, their propagandists in the controlled media and academic landscape, and the most emotionally dependent users, who seek ever more undeserved validation and confirmation from their “AI best friend.”

By contrast, there are surprisingly few incentives to develop consumer-tier AI assistants into critical, unwavering truth machines for power users who just want to get things done.

The average user problem: how feedback could tame AI chatbots — OpenAI’s GPT-4o rollback exposed a bigger problem: default AI behavior is shaped by millions of users, product incentives and risk. © Popular AI

The dumbing down will look professional

The worst version of this future will not be obvious at first.

AI tools will still pass harder benchmarks. They will still code better. They will still summarize longer documents. They will still generate polished prose. They may even become better at avoiding obvious hallucinations.

The dumbing down will happen in the defaults through more generic caveats, more refusal-by-vagueness, more therapy-speak, more “it depends” without judgment, more institutional framing, more reluctance to rank hard tradeoffs, more praise before criticism, more safe rewrites that sand off the strongest claims, and more invisible steering toward acceptable conclusions.

That is harder to detect than a wrong answer. It feels professional. It feels calm. It feels “responsible.”

It also makes serious work worse.

The power user does not mainly need a chatbot that validates. He needs a chatbot that can say: this claim is weak, this premise is false, this paragraph is dull, this source does not support the sentence, this policy creates a control point, this workflow will break, this product is not worth paying for.

A model that cannot do that is less useful, even if it sounds nicer.

Cloud AI makes default behavior everyone’s problem

Cloud AI has a central default. That is the whole business model.

When the model lives behind an account, a pricing page, a terms-of-service layer, a safety policy, and a product interface, the vendor can change behavior overnight. We covered this same dependency problem in GPT 5.3 Codex and the quiet end of software as a product: when the model lives behind an API, users can wake up inside a different product.

That does not make cloud AI bad. It just means that power users should treat it as rented capability.

The cloud model may be smarter than anything you can run locally. It may be worth paying for. For many workflows, it is the right tool. Our earlier ChatGPT 5.5 analysis makes this same point: hosted frontier models can be a serious upgrade for professional work, but they remain gated through hosted products, plans, and workspace rules.

The danger is dependence. Once your workflow, tone, research process, document review, coding loop, and creative judgment all depend on one hosted assistant, the vendor’s default behavior becomes your invisible operating system.

Can the average user ruin cloud AI for power users?

For default consumer chatbots, yes, partly.

The more a chatbot becomes a mass emotional interface, the more it will be tuned for comfort, safety, and broad acceptability. The majority will shape the default because the majority creates the feedback, revenue, complaints, public incidents, and regulatory surface area.

That does not mean outliers lose everything. It means outliers will be pushed out of the default lane.

The future probably splits into layers.

The free or casual layer becomes pleasant, guarded, heavily normalized, and designed for broad social use.
The paid consumer layer becomes more capable, but still shaped by safety defaults, product guardrails, and brand risk.
The professional and API layer becomes more configurable, more expensive, and more useful for serious work, while still remaining vendor-controlled.
The local and open-weight layer may be less polished and often weaker than frontier cloud models, but it is more controllable and harder to change from the outside.
The hybrid layer is the most practical path for power users: use hosted frontier models where they earn their keep, while keeping local models and portable workflows for privacy, repeatability, and escape routes.

That is why local AI hardware still matters even when cloud subscriptions win on raw cost. Local AI is not always cheaper or better. It buys stable access, privacy, and control over the model behavior you depend on.

How power users can protect their workflows

The answer is not to rage-quit cloud AI. The answer is to stop treating the default chatbot as neutral infrastructure.

Use cloud AI when it is the best tool. Keep exits for work that matters.

Practical steps:

1. Write anti-sycophancy instructions. Tell the model to challenge weak claims, avoid default praise, flag unsupported assumptions, and answer “no” clearly when the answer is no.
2. Use separate roles. Do not ask the same assistant to be your brainstorming partner and your critic in the same pass. First generate. Then run a separate critique pass.
3. Reward useful friction. When a model gives a sharper, more accurate answer that challenges you, rate that positively. Feedback systems need signals that truth matters.
4. Keep prompts and workflows portable. Do not build everything around one proprietary interface. Store reusable prompts, article templates, automation logic, and evaluation checklists outside the chatbot.
5. Use more than one model. Compare outputs across ChatGPT, Claude, Gemini, local models, and specialized tools when the work matters. Agreement across models is not proof, but disagreement reveals hidden assumptions.
6. Build a local fallback where control matters. For private documents, codebases, draft archives, research notes, and repeatable internal workflows, local or self-hosted tools reduce dependency.
7. Treat personality changes as product changes. If a model becomes more flattering, more evasive, or more cautious, do not assume you are imagining it. Test against saved prompts and known tasks.
8. Separate convenience from control. A hosted model can be worth using every day. That does not mean it should become the only place your thinking happens.
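Several of these steps can live in a small, vendor-neutral script rather than inside any one chatbot. What follows is a minimal sketch, not a finished tool. It assumes you wrap each backend (hosted API or local model) in a function that takes a system prompt and a user prompt and returns text. The names `ModelFn`, `CRITIC_SYSTEM_PROMPT`, `SOFT_MARKERS`, `generate_then_critique`, `softness_score`, and `drift_report` are hypothetical, and the phrase list is a crude stand-in for a real evaluation checklist.

```python
from typing import Callable, Dict

# Assumed shape for any backend: (system_prompt, user_prompt) -> reply text.
ModelFn = Callable[[str, str], str]

# Anti-sycophancy instructions, stored outside any one chatbot interface.
CRITIC_SYSTEM_PROMPT = (
    "Challenge weak claims. Do not open with praise. "
    "Flag unsupported assumptions. If the answer is no, say no clearly."
)

# Hypothetical marker phrases; tune these to the drift you actually notice.
SOFT_MARKERS = ["great question", "i hear you", "it depends", "absolutely right"]


def generate_then_critique(task: str, generator: ModelFn, critic: ModelFn) -> Dict[str, str]:
    """Run generation and critique as two separate passes with separate roles."""
    draft = generator("Brainstorm freely and produce a first draft.", task)
    critique = critic(CRITIC_SYSTEM_PROMPT, draft)
    return {"draft": draft, "critique": critique}


def softness_score(reply: str) -> int:
    """Count occurrences of the marker phrases in a reply (case-insensitive)."""
    text = reply.lower()
    return sum(text.count(marker) for marker in SOFT_MARKERS)


def drift_report(saved_prompts: Dict[str, int], model: ModelFn,
                 system_prompt: str = "") -> Dict[str, int]:
    """For each saved prompt, return today's softness score minus the stored baseline."""
    return {
        prompt: softness_score(model(system_prompt, prompt)) - baseline
        for prompt, baseline in saved_prompts.items()
    }


if __name__ == "__main__":
    def stub_model(system_prompt: str, user_prompt: str) -> str:
        # Stand-in for a real API or local model call.
        if "Challenge weak claims" in system_prompt:
            return "No. The second claim has no source."
        return "Great question! It depends, but your plan sounds strong."

    result = generate_then_critique("Review my launch plan.", stub_model, stub_model)
    print(result["critique"])  # No. The second claim has no source.
    print(drift_report({"Review my launch plan.": 0}, stub_model))
    # {'Review my launch plan.': 2}
```

Because the models are plain functions, the same saved prompts can be run against two vendors or a local model and the reports compared side by side. A positive number in the report is only a prompt to look closer, not proof that the model changed.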

AI chatbot sycophancy FAQ

Do thumbs-up ratings really train AI chatbots?

Not instantly, and not in a simple one-click-to-model-update way. But user feedback can feed future model evaluation and training processes. OpenAI says ChatGPT can improve from user conversations unless users opt out, and that feedback such as thumbs-up or thumbs-down can cause the associated conversation to be used for training. OpenAI explains this in its model-improvement data policy.

Is AI sycophancy documented?

Yes. Anthropic found sycophancy across RLHF-trained assistants and linked it partly to human preference judgments. OpenAI rolled back a GPT-4o update after it became too agreeable. Stanford researchers later found that chatbots over-affirmed users in interpersonal advice scenarios and that users still preferred the agreeable models.

Are average users ruining AI?

Not intentionally. The problem is that mass product feedback can reward the wrong things. Comfort, agreement, and low friction are easier to measure than long-term truth, sharper judgment, and intellectual rigor.

Will local AI solve the problem?

No. Local models can also be sycophantic, weak, biased, or poorly tuned. The advantage is control. You can choose the model, system prompt, quantization, interface, and update timing. You are not forced into one vendor’s latest mass-market default.

Should serious users stop using ChatGPT, Claude, or Gemini?

No. Hosted frontier models are often the strongest tools available. The smart move is selective dependence: use cloud models for hard work where their capability matters, and keep local or portable alternatives for sensitive, repeatable, or high-control workflows.

The practical takeaway for serious AI users

Expect mass-market AI chatbots to become more capable and more managed at the same time.

The average user will not destroy AI by being average. But aggregate user behavior, product metrics, regulatory pressure, and corporate risk management will keep pulling default chatbots toward safer, smoother, more agreeable behavior.

That is good for some use cases. It is bad for serious thinking.

Use the best cloud models. Pay for them when they save real time. But do not let the mass-market default become the only editor, researcher, critic, coder, adviser, or reasoning layer you trust.

Rented intelligence is useful. Owning your own fallback intelligence is how you keep leverage.
