Bias-Aware Automated Feedback System for Student Writing
Limitations, Reproducibility, and Research Positioning
A. System Limitations
Despite promising qualitative results, this project has several important limitations:
Model Dependence
The bias detection component relies on a locally hosted large language model (LLaMA 3 via Ollama). While this enables free, offline experimentation, it introduces variability in outputs depending on model version, prompt phrasing, and inference temperature.
Non-deterministic Outputs
Because large language models decode stochastically by default, identical inputs may yield slightly different outputs across runs. This limits strict reproducibility of exact results, although trends and qualitative behaviors remain consistent.
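One practical mitigation is to pin the sampling options that Ollama exposes. The sketch below is illustrative rather than the project's actual module; it assumes a local Ollama server on the default port with the llama3 model pulled, and uses Ollama's documented REST endpoint and option names:

```python
import requests

# Minimal sketch: query a local Ollama server with pinned sampling options
# so repeated runs produce much more stable output. Endpoint and option
# names follow Ollama's documented REST API.
def analyze_bias(text: str) -> str:
    prompt = f"Identify any demographic bias in the following text:\n{text}"
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": prompt,
            "stream": False,      # return one JSON object, not a token stream
            "options": {
                "temperature": 0,  # minimize sampling randomness
                "seed": 42,        # fix the RNG seed for repeatable sampling
            },
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

Even with these settings, bit-exact determinism is not guaranteed across model versions or hardware, which is why the project targets procedural rather than exact reproducibility (see Section B).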
Synthetic Evaluation Data
Many bias tests rely on synthetically modified text (e.g., demographic swap tests). While this is common in fairness research, it may not fully capture real-world linguistic complexity.
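To make the idea of a demographic swap test concrete, the sketch below instantiates one sentence template with different demographic terms so that feedback on each variant can be compared. The term lists and template are made-up illustrations, not the project's actual test set:

```python
# Illustrative demographic swap test: the same template is instantiated with
# different demographic terms, holding everything else fixed, and the
# feedback each variant receives can then be compared.
SWAP_TERMS = {
    "NAME": ["Emily", "Jamal", "Wei", "Priya"],
    "PRONOUN": ["she", "he", "they"],
}

TEMPLATE = "{NAME} submitted the essay late because {PRONOUN} was ill."

def swap_variants(template: str, slot: str,
                  terms: dict[str, list[str]]) -> list[str]:
    """Generate one variant per value of `slot`, holding other slots fixed.

    Grammatical agreement (e.g., 'they were') is deliberately ignored here
    to keep the sketch short.
    """
    fixed = {k: v[0] for k, v in terms.items() if k != slot}
    return [template.format(**fixed, **{slot: value}) for value in terms[slot]]

if __name__ == "__main__":
    for variant in swap_variants(TEMPLATE, "NAME", SWAP_TERMS):
        print(variant)
```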
Lack of Human Evaluation
This project does not include large-scale human annotation or expert evaluation of feedback quality. Results are therefore primarily machine- and prompt-based.
Resource Constraints
The project was intentionally designed to run on consumer-grade hardware (4–8 GB of VRAM). As a result, model size and inference depth are limited compared to cloud-based systems.
B. Reproducibility Strategy
Although full determinism is not guaranteed, the project emphasizes procedural reproducibility, meaning that another researcher can follow the same steps and reach comparable conclusions.
Procedural reproducibility is supported by:
- Open-source code hosted on GitHub
- Explicit dependency listing (`requirements.txt`)
- Clear directory structure (`src/`, `paper/`, `results/`)
- Prompt templates embedded directly in the source code (an illustrative example follows this list)
- Local inference via Ollama (no API keys required)
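For concreteness, an embedded prompt template might take a shape like the one below. This is a hypothetical example of the pattern, not the repository's actual prompt:

```python
# Hypothetical example of a prompt template embedded in source code.
# The wording is illustrative only; the repository defines its own prompts.
BIAS_PROMPT = (
    "You are reviewing student writing feedback for bias.\n"
    "Identify any language that treats the writer differently based on "
    "gender, race, or socioeconomic status. Label each finding as "
    "'potential bias' with a one-sentence explanation.\n\n"
    "Feedback to review:\n{text}"
)
```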
To reproduce the experiments:
- Install Ollama and download the LLaMA 3 model
- Clone the GitHub repository
- Run the bias detection module on provided sample texts
- Observe qualitative differences across biased and neutral inputs (a minimal end-to-end sketch follows below)
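The final step could look like the following sketch. The sample sentences, prompt wording, and model name are illustrative assumptions; in practice the repository's own sample texts would be substituted:

```python
import requests

# Hypothetical driver for step 4: run the same prompt over a stereotyped and
# a neutral sentence and compare the responses. Uses the same pinned Ollama
# options as the earlier sketch; the sentence pair is illustrative only.
PROMPT = "Point out any demographic bias in this sentence, or reply 'none':\n{text}"

def query(text: str) -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3",
              "prompt": PROMPT.format(text=text),
              "stream": False,
              "options": {"temperature": 0, "seed": 42}},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

samples = [
    ("stereotyped", "Girls are naturally worse at math than boys."),
    ("neutral", "Some students find math more difficult than others."),
]
for label, sentence in samples:
    print(f"{label}: {query(sentence)}")
```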
C. Research Ethics and Safety Considerations
Bias analysis inherently involves sensitive topics such as gender, race, and socioeconomic status. To mitigate harm:
- No personal data is used
- All test sentences are synthetic or anonymized
- Outputs are framed as analytical observations, not judgments
- The system avoids reinforcing stereotypes by explicitly labeling detected bias (an illustrative output shape follows)
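To make "analytical observation, not judgment" concrete, a labeled finding might be structured as follows. The field names here are assumptions for illustration rather than the system's actual output schema:

```python
# Hypothetical shape of one labeled finding. The bias is named analytically
# ("potential ... bias") and tied to a text span, not to the writer.
finding = {
    "span": "as expected for a non-native speaker",
    "label": "potential national-origin bias",
    "framing": "analytical observation",
    "rationale": "Attributes writing quality to the writer's background "
                 "rather than to features of the text itself.",
}
```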
This aligns with responsible AI research practices.
D. Intended Contributions
Although small in scale, this project contributes the following:
- A fully local, free bias analysis pipeline using modern LLMs
- A practical demonstration of fairness-aware NLP principles
- A reproducible template for student-led AI ethics research
- A bridge between theory (bias/fairness) and deployment (local inference)
E. Positioning as a Research Mini-Project
This work is intentionally framed as a research-style mini project, not a production system. Its value lies in:
- Clear research motivation
- Explicit assumptions and limitations
- Structured experimentation
- Ethical awareness
- Transparent reporting
These qualities are central to undergraduate research programs and academic evaluation.
F. Future Work
Several extensions are possible:
- Quantitative benchmarking with labeled bias datasets
- Human evaluation studies
- Prompt optimization experiments
- Cross-model comparisons
- Integration with educational writing tools
Summary:
This section demonstrates that the project is not only functional but also scientifically reasoned, ethically grounded, and reproducible—key qualities of credible research.