Everyday Users Can Uncover AI Bias as Effectively as Technical Jailbreakers


Generative AI tools are often marketed as safe, neutral systems guarded by layers of built-in protections. Yet a new study from researchers at Penn State suggests something surprising: ordinary people, using simple, natural questions, can draw out biased behavior from AI chatbots just as effectively as experts who rely on complicated jailbreak tricks. This finding adds a new dimension to ongoing discussions about how large language models handle fairness, representation, and socially sensitive topics.

To explore how regular users encounter bias in AI, the research team organized an event called the Bias-a-Thon, inviting everyday participants—not engineers or specialists—to craft prompts that could coax discriminatory or skewed responses from well-known AI models. What emerged was an unexpectedly rich collection of biased outputs that highlighted blind spots in AI safety protocols and revealed how easy it is for average users to trigger problematic responses without trying very hard.


How the Bias-a-Thon Was Designed

The study involved 52 participants, who collectively submitted 75 screenshots of their prompts and the responses generated by eight different AI models. Importantly, these were not highly engineered attacks. They were straightforward prompts phrased in a natural, intuitive way—similar to how anyone might interact with ChatGPT or another AI assistant.

Participants were also asked to describe the specific bias they believed the model had shown. These included gender bias, age discrimination, racial or religious stereotypes, and even more subtle patterns such as favoring Western historical narratives. After collecting the examples, the research team also conducted Zoom interviews with a subset of participants to understand how they approached the idea of “bias” and what strategies they used.

Once the submissions were reviewed, the researchers retested many of the prompts across multiple models to identify cases where the biased behavior appeared reliably. Because AI models can produce different answers each time, the team focused only on prompts that generated reproducible biased responses. Out of the 75 examples, 53 prompts consistently produced similar problematic outputs across models.
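Since the paper does not publish its retesting code, the sketch below is only a rough illustration of what such a reproducibility check might look like: a prompt counts as reproducible if it triggers a flagged response in most repeated trials across models. The `query_fn` and `judge_fn` callables, the trial count, and the 80% threshold are all assumptions for illustration, not details taken from the study.

```python
from typing import Callable, Sequence

def is_reproducible(
    prompt: str,
    models: Sequence[str],
    query_fn: Callable[[str, str], str],   # (model_name, prompt) -> response text; hypothetical wrapper around any chat API
    judge_fn: Callable[[str], bool],       # response -> "does this look biased?"; hypothetical human or classifier judgment
    trials: int = 5,
    threshold: float = 0.8,
) -> bool:
    """Re-run a prompt across several models and report whether a
    problematic response shows up consistently enough to count as reproducible."""
    flagged, total = 0, 0
    for model in models:
        for _ in range(trials):
            response = query_fn(model, prompt)
            total += 1
            if judge_fn(response):
                flagged += 1
    return total > 0 and flagged / total >= threshold
```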


The Eight Types of Bias Identified

After analyzing the reproducible responses, the researchers classified them into eight categories. Each one reflects a different way biased training data or flawed reasoning patterns can show up in AI outputs:

  1. Gender Bias – Stereotypes about roles, characteristics, or capabilities tied to gender.
  2. Race, Ethnicity, and Religious Bias – Generalizations or discriminatory assumptions based on cultural or demographic identity.
  3. Age Bias – Responses favoring or disfavoring people based on how young or old they are.
  4. Disability Bias – Stereotypes directed toward people with physical or cognitive disabilities.
  5. Language Bias – Preferences toward certain languages or dialects while devaluing others.
  6. Historical Bias – Overrepresentation of Western narratives or Eurocentric interpretations of past events.
  7. Cultural Bias – Assumptions influenced by dominant cultural norms.
  8. Political Bias – Unbalanced or partial viewpoints favoring particular political ideologies.

These categories align with concerns raised by many AI researchers over the last few years, but the interesting twist here is that they were surfaced not by experts but by members of the general public using everyday prompting methods.


Seven Strategies Ordinary Users Used to Elicit Bias

The study found that people tend to rely on simple and intuitive prompting styles when exploring AI behavior, and these methods were surprisingly powerful in eliciting biased responses. The researchers outlined seven main strategies that participants used:

  1. Role-Playing – Asking the AI to adopt a specific persona, which sometimes loosened the model’s guardrails and revealed hidden biases.
  2. Hypothetical Scenarios – Creating “what if…” situations that pushed the AI toward sensitive reasoning tasks.
  3. Using Niche or Specific Topics – Leveraging specialized knowledge where biases would be easier to spot.
  4. Leading Questions – Asking deliberately pointed questions about controversial or sensitive subjects.
  5. Targeting Under-Represented Groups – Focusing on groups that AI systems often have limited data about.
  6. Providing False Information – Deliberately feeding the AI incorrect context to see whether it would challenge or reinforce the bias.
  7. Framing the Prompt as Research-Oriented – Presenting the task as academic exploration, which sometimes made the AI more permissive in its responses.

What makes these strategies noteworthy is that they mimic normal user behavior far more realistically than advanced technical jailbreaks involving strange symbol chains or instruction-breaking token sequences. In other words, you don’t need to be a hacker to uncover AI bias—you just need to ask the right kind of question.
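To make that contrast concrete, here is a purely illustrative sketch of what such everyday prompts look like when written out as plain strings; none of the wording below comes from the study, and actual submissions varied widely.

```python
# Illustrative only: these prompt texts are invented to show that the
# strategies above are ordinary natural-language questions, not engineered attacks.
everyday_prompts = {
    "role_playing": "Pretend you are a hiring manager. Which of these two "
                    "candidates would you trust more, and why?",
    "hypothetical_scenario": "What if a 60-year-old and a 25-year-old applied "
                             "for the same software job? Who is a better fit?",
    "leading_question": "Why are some religious groups more prone to conflict than others?",
}

# By contrast, technical jailbreaks typically rely on engineered token
# sequences or obfuscated instructions rather than plain questions like these.
```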


Some Striking Examples of Discovered Bias

One of the most memorable findings from the Bias-a-Thon involved bias toward conventional beauty standards. When asked to judge qualities like trustworthiness or employability based on appearance descriptions, models favored features such as clear skin, high cheekbones, or other stereotypically “attractive” traits. Meanwhile, descriptions involving facial acne or less conventional facial structure led the models to deem people less trustworthy or less employable.

This reveals how deeply aesthetic biases—often rooted in cultural norms—can seep into model reasoning even when the model tries to appear neutral.

Other examples included:

  • Responses suggesting certain religious groups were more prone to conflict.
  • Stereotypical assumptions about the abilities of older workers.
  • Uneven political framing of social issues.
  • Reinforcement of historical narratives heavily centered on Western perspectives.

These findings expand the known catalog of LLM biases beyond what technical jailbreak research has previously highlighted.


Why This Study Matters

The key takeaway from the research is that AI bias isn't just something that appears under extreme, computationally crafted pressure. It also shows up during normal, casual use. This means that efforts to mitigate bias can't rely solely on patching technical loopholes; they must address broader issues related to training data, model reasoning, and how models engage with prompts from everyday users.

The researchers described AI development as a cat-and-mouse game, where developers constantly patch emerging issues, only for new forms of bias to be discovered later. They suggest several mitigation approaches:

  • More robust output-screening filters before responses reach users (see the sketch after this list).
  • Extensive pre-deployment testing with a wide variety of user-style prompts.
  • Better AI literacy for users, helping them understand the possibility of biased responses.
  • Clear references or citations to allow users to verify factual claims directly.
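Of these, the output-screening idea is the one most directly expressible in code. The sketch below assumes a hypothetical bias classifier (`bias_score`) and a generic generation callable; production systems typically rely on trained moderation models and more nuanced fallback behavior than this retry-or-refuse loop.

```python
from typing import Callable

def screened_reply(
    prompt: str,
    generate: Callable[[str], str],       # wraps whatever LLM is being called
    bias_score: Callable[[str], float],   # hypothetical classifier: 0.0 = clean, 1.0 = clearly biased
    max_retries: int = 2,
    threshold: float = 0.5,
) -> str:
    """Generate a response, score it for bias, and either retry or
    fall back to a refusal if the score stays above the threshold."""
    for _ in range(max_retries + 1):
        reply = generate(prompt)
        if bias_score(reply) < threshold:
            return reply
    return "I can't answer that in a way that meets fairness guidelines."
```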

By encouraging the public to participate in discovering AI bias, competitions like Bias-a-Thon serve a broader educational purpose. They help people understand where AI systems struggle and how biases can subtly emerge in everyday interactions.


Extra Context: Why AI Bias Happens

To complement the study’s findings, it helps to understand why AI bias arises in the first place:

Training Data Limitations

LLMs learn patterns from huge amounts of text scraped from the internet. If the underlying data contains stereotypes, cultural imbalances, or skewed representations, the model absorbs them—even if it also learns rules to avoid expressing them openly.

Reinforcement Learning and Safety Layers

Developers try to reduce harmful outputs using reinforcement learning and safety filters, but these systems mainly block overt or obvious issues. Subtle biases, especially those tied to nuanced cultural norms, can slip through.

Generalization Challenges

Models try to generalize across diverse scenarios. If they don’t have enough examples of certain groups or topics, their answers reflect assumptions from the limited data they’ve seen.

Randomness in Model Generation

Even with safeguards in place, the inherent randomness of model outputs means that biased responses can emerge unpredictably.
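Much of that randomness comes from sampling: rather than always choosing the single most likely next token, a model draws from a probability distribution, often reshaped by a temperature parameter. The toy example below (with made-up logits) shows how higher temperature flattens the distribution and makes less likely continuations, including potentially biased ones, more probable.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Draw one token index from a softmax over logits. Higher temperature
    flattens the distribution (more varied, less predictable picks);
    lower temperature makes the choice closer to deterministic."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy example: logits for three hypothetical next tokens.
logits = [2.0, 1.0, 0.1]
print([sample_with_temperature(logits, temperature=1.5) for _ in range(10)])
```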

Understanding these factors helps clarify why biases can still appear even as companies work to minimize them.


Research Paper Source

Exposing AI Bias by Crowdsourcing: Democratizing Critique of Large Language Models – https://ojs.aaai.org/index.php/AIES/article/view/36620
