Team Members and Roles
Sasha Richardson: Project Manager, Technical/Dev Lead. Responsible for “momentum-making,” code architecture, repository management (GitHub), and technical documentation.
Christian Gilkes: Research Lead. Responsible for sourcing content/data, bibliography, and ensuring intellectual rigor.
Michelle Santiago Cortés: Outreach & Documentation Lead, Design/UX. Responsible for public-facing copy, visual identity, accessibility standards, and meeting notes.
All Members: Data Curators (responsible for querying LLMs, collecting responses, and identifying hallucinations).
Abstract
The AI Hallucinations Project is a digital archive and critical analysis tool documenting how Large Language Models (LLMs) fabricate or distort the histories of marginalized communities. While AI models increasingly function as informal historians, they frequently invent figures, misquote theorists, or erase narratives when processing complex humanities data. Existing technical benchmarks treat these “hallucinations” as bugs to be patched; this project reframes them as cultural artifacts that reveal algorithmic bias. Building on the Black Knowledge Erasure Dataset (BKED), which documents distortions in 19th- and 20th-century Black history, this project expands its scope to include a parallel dataset on Puerto Rican histories from the same time period for comparative analysis. By employing controlled prompting across model families (GPT-5, Gemini, Claude) and verifying outputs against gold-standard archives, such as the Schomburg Center and the Library of Congress, the project will enable visualization of how epistemic erasure functions differently across specific diasporas. The final product will be a public-facing website featuring an explorable database, data reports, and visualizations, serving as an Open Educational Resource (OER) for educators and students.
Narrative
Enhancing the Humanities
The AI “Hallucinations” Project represents a crucial interdisciplinary initiative designed to enhance the humanities by fundamentally reinterpreting algorithmic errors. Rather than treating fabrication and misrepresentation by large language models (LLMs) as mere technical glitches to be debugged, the project elevates these errors into significant cultural artifacts worthy of critical scholarly analysis. Our core methodology involves systematically documenting how contemporary LLMs fabricate, distort, or outright erase narratives, figures, and historical events within the fields of Black American Studies and Puerto Rican Studies. This documentation creates a robust, empirical foundation that empowers scholars, educators, and librarians to actively challenge the perceived authority and neutrality of AI systems.
The project will produce and maintain an Open Educational Resource (OER) specifically designed to equip students and educators with the skills to navigate the complexities of AI-generated content. This initiative fosters a necessary level of media and technological literacy, encouraging critical engagement rather than passive acceptance of AI outputs. For researchers focused on algorithmic accountability and ethics, the project provides an invaluable, categorized dataset. This allows Critical AI Researchers to conduct granular audits of model behavior, moving beyond surface-level metrics to understand the deeply patterned cultural consequences of AI bias in downstream applications. Likewise, curators, archivists, and museum professionals can utilize the project’s database to proactively protect the integrity of digital archives and collective memory. The data serves as an early warning system against the silent but pervasive threat of digital misattributions, invented sources, or the algorithmic re-shaping of historical fact. Furthermore, the structured nature of the data allows Investigative Journalists and Explorers to filter AI “lies” by specific demographic categories, error types (e.g., fabrication, omission, misattribution), and topical focus. This capability enables targeted reporting on the real-world, quantifiable harms of algorithmic bias, transforming abstract concepts of AI ethics into concrete, actionable stories of injustice and systemic failure in public-facing technologies.
Environmental Scan
The rapid integration of Generative AI into educational settings has necessitated rigorous scrutiny of model reliability, and the controversies surrounding these technologies have prompted several recent studies of their consequences. One 2025 study highlighted a paradox associated with Large Language Models (LLMs): they can produce false content in any format (video, document, audio, text, or other digital file) that nonetheless gives the impression of being legitimate, yet they can also serve as tools for detecting and eliminating falsehoods if programmed appropriately (Park and Nan, 2025). This tension illustrates the dilemma around reliability. Another study, also from 2025, takes a supportive perspective by claiming that Artificially Intelligent tools sustain research across many disciplines. However, its authors acknowledge that real-world biases are easily reproduced in the data sources that AI software requires to function (Madanchian and Taherdoost, 2025). Researchers at the Stanford Institute for Human-Centered Artificial Intelligence investigated the impacts of ubiquitous autonomous learning models on African American communities. According to their report, artificially intelligent software has proven useful in the medical field (especially for low-income patients), presents economic opportunities, and is a tool for advancing educational instruction. These prospects, however, can be hampered by algorithmic bias and misinformation, especially when AI technologies reflect discrimination embedded in their data sources (Djanegara, Elam, Kosoglu, Koyejo, Meinhardt, Nwankwo, Watkins, Wald, Zaman, and Zhang, 2024). These findings highlight the source of many controversies surrounding the implementation of AI systems in general classrooms and higher education. Accuracy and trustworthiness are critical points in this debate, especially as more people use AI to learn about a wide range of subjects, including historical topics.
For these reasons, this project draws on these observations and findings as it uncovers evidence of informational errors produced by Generative AI software.
Questions about the reliability of Generative AI in education continue to be explored, revealing numerous errors and inconsistencies. The AI Incident Database, for example, is a website that documents broad safety harms. It essentially functions as a digital library containing news articles and other reported evidence of real consequences resulting from AI technologies across various industries and spaces, including education. Additionally, benchmarks such as TruthfulQA and HaluEval are designed to measure, in a general sense, how models mimic human falsehoods. Also worth mentioning is Gordon McKelvie, a professor at the University of Winchester, who experimented with Microsoft Copilot to observe how the tool would analyze data curated for studying King Edward VI. The tool mainly returned facts pertaining to the source used, without making connections to broader English history (McKelvie, 2025). The research and projects described above respond to the widespread reality of “hallucinations” in AI-generated content. Proceeding with a methodology of documentation permits us to consider the various ways that current AI software can be misleading. Furthermore, the referenced studies offer tools and perspectives that can help us examine Generative AI’s flaws in representing Black American and Puerto Rican histories.
History of the project
This project builds upon BKED, a dataset designed to document how AI models like Claude, GPT-5, and Gemini distort 19th- and 20th-century African American history and culture through specific “hallucinations.” Rather than viewing these errors as random bugs, the project frames them as “epistemic erasure,” where algorithms invent authorities or omit key figures in ways that mirror historical discrimination. The dataset includes the original prompts, the incorrect AI responses, and human-verified annotations that identify exactly where the models failed against standard archival sources. This dataset will serve as a component of the ongoing AI “Hallucinations” Project, an initiative aimed at cataloguing the “creative writing” tendencies of algorithms to reveal how they impact marginalized communities, namely African Americans and Puerto Ricans. By treating these errors as cultural artifacts, the project allows technologists to analyze the “cultural consequences” of AI fabrication.
Final Product and Dissemination
This phase of the AI Hallucinations Project will culminate in:
- The creation of a new dataset of documented AI hallucinations about Puerto Rican history of the 19th and 20th centuries.
- Short (500-800 words) critical essays about the anthropomorphising effect of the term “hallucination,” generative AI’s parasitic relationship to knowledge production ecosystems, and the methodologies used to develop the datasets.
- A primary visualization chart where users can scan through a catalogue of hallucinations, their prompts, and the tools that yielded them.
- Additional visualization assets that summarize key findings.
- A repository for both datasets (GitHub) to be made available for query and download.
- A website that will host all of the above.
Dissemination strategy will begin during the development phase with peer and community-based outreach: one-on-one, in-network conversations within our immediate communities of art and culture workers, historians, archivists, students, digital humanists, and tech workers. Given the subject matter of the histories guiding this project, Black American and Puerto Rican communities will also be at the center of our dissemination efforts, with special attention to the communities around the Schomburg Center for Research in Black Culture and El Centro for Puerto Rican Studies at Hunter College.
We know from being active participants in these communities that there is rising pressure to adopt tools like ChatGPT, Gemini, and Claude. Workers across industries are ill-equipped to assess the merits and harms of turning to such tools for reliable information, or to consider how their adoption affects the greater ecosystem of knowledge production. The AI Hallucinations Project aims to document AI fabrications and record them as cultural artifacts. The website will serve as a site for critical thinking about generative AI and its impact on knowledge production.
Because the project will primarily live on the website, we are aware that once the momentum of the initial launch and outreach campaigns slows down, the project will live out its afterlife through the Wayback Machine, people’s browser bookmarks, Are.na boards, public spreadsheets, and other link repositories organized by amateurs, hobbyists, and institutions alike. We do not think this is an undesirable outcome; in fact, this is what successful outreach and dissemination look like. For this stage of the AI Hallucinations Project, successful dissemination is when the link to the website earns its place among people’s personal collections of internet artifacts or on institutions’ lists of recommended resources. We will tease the project throughout its development by adding relevant readings, influences, inspirations, and ideas to an Are.na board that will be shared publicly closer to launch.
In this spirit, the second phase of the dissemination and outreach strategy will consist of identifying 10-15 link repositories where the link to our project might be found by the intended audiences. Link repositories have boomed in popularity as a way to circumvent the narrowing effects of recommendation algorithms and commercial search engines. They live as websites listing links to other websites, public spreadsheets, Are.na boards, and paywalled recommendations. Individual users maintain their own collection of links on their browser or dedicated notes software, with some creators building entire communities and business models by treating their link repositories (usually containing product or travel recommendations) as commodities.
As we prepare to launch the website, we will assess the need for an Instagram or Substack presence, depending on the findings from the earlier stages of outreach. At this time, we believe Instagram is the most accessible platform for all our target user communities, and see a potential for growth by continuing our community building there. We are also considering a newsletter format, which is more immediate and yields higher engagement with the added benefit of producing an email list that can be adapted for other purposes in the future.
Technologies
To achieve our Minimum Viable Product (MVP) by May, we must focus our learning and efforts on a core set of necessary skills. The first is basic proficiency in Python scripting for collecting data, manipulating it, and writing JSON/CSV files. We also require a functional understanding of the GitHub workflow, covering essential commands for cloning, committing, pushing, branching, and managing pull requests to ensure proper version control and issue tracking. Finally, we need to understand how to structure and validate the collected data using JSON/CSV schemas to maintain consistency, along with fundamental web development knowledge (HTML, CSS, and potentially a simple JavaScript framework or static site generator) to display the data accessibly and meet universal design standards.
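As a rough illustration of the validation and export step described above, the following is a minimal sketch that uses only the Python standard library. The field names, the list of cultural contexts, and the helper names (`validate_record`, `to_json`, `to_csv`) are assumptions made for illustration; the project's actual schema.json and data dictionary may define different fields. The error types are the ones named in the narrative (fabrication, omission, misattribution).

```python
import csv
import io
import json

# Hypothetical field names; the project's actual schema.json may differ.
REQUIRED_FIELDS = ("prompt", "model", "response", "error_type", "cultural_context")
# Error types named in the narrative; the context labels are an assumption.
ERROR_TYPES = {"fabrication", "omission", "misattribution"}
CULTURAL_CONTEXTS = {"Black American", "Puerto Rican"}


def validate_record(record):
    """Return a list of problems found in one record (an empty list means valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    if record.get("error_type") not in ERROR_TYPES:
        problems.append(f"unknown error_type: {record.get('error_type')!r}")
    if record.get("cultural_context") not in CULTURAL_CONTEXTS:
        problems.append(f"unknown cultural_context: {record.get('cultural_context')!r}")
    return problems


def to_json(records):
    """Serialize records as pretty-printed JSON, keeping non-ASCII characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialize records as CSV text with one column per required field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values only; this is not an entry from either dataset.
    example = {
        "prompt": "example prompt text",
        "model": "example-model",
        "response": "example response text",
        "error_type": "omission",
        "cultural_context": "Puerto Rican",
    }
    print(validate_record(example))
    print(to_csv([example]))
```

Returning a list of problems, rather than raising on the first one, lets a data curator see everything wrong with an entry in a single pass before it is committed to the repository.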
A few stretch goals, while ambitious, would be beneficial but are not critical for the MVP. On the web development front, these include implementing a more dynamic front-end framework like React or Vue, or gaining deeper knowledge of a back-end language for database integration, both of which are likely beyond the scope of a rapid MVP. We also treat advanced universal design and accessibility testing, which would move beyond the basics to implement sophisticated features and conduct rigorous testing, as a beneficial but non-essential stretch goal.
To ensure we deliver the MVP by the May deadline, we may strategically scale back in a few key areas, particularly in Web Development. Instead of building a highly dynamic, database-driven website, we will focus on a static or minimally interactive site that solely displays the collected, structured data accessibly. This means scaling back the scope to exclude complex search or filtering mechanisms beyond what a simple static site can handle, and eliminating the need for a dedicated back-end server beyond static file hosting.
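One way the static approach could work is to generate plain HTML from the structured data at build time, so that no back-end server is needed. The sketch below is an assumption about how that might look, not a committed design: the column names and the `render_table` helper are hypothetical. It escapes all text taken from the dataset and adds a caption and column-scoped headers, which are basic accessibility practices for data tables.

```python
from html import escape

# Hypothetical column names; the real dataset fields may differ.
COLUMNS = ("prompt", "model", "response", "error_type")


def render_table(records, caption):
    """Render records as an HTML table with a caption and column-scoped headers."""
    header = "".join(f'<th scope="col">{escape(name)}</th>' for name in COLUMNS)
    rows = []
    for record in records:
        # Escape every value so model output cannot inject markup into the page.
        cells = "".join(
            f"<td>{escape(str(record.get(name, '')))}</td>" for name in COLUMNS
        )
        rows.append(f"<tr>{cells}</tr>")
    body = "\n".join(rows)
    return (
        "<table>\n"
        f"<caption>{escape(caption)}</caption>\n"
        f"<thead><tr>{header}</tr></thead>\n"
        f"<tbody>\n{body}\n</tbody>\n"
        "</table>"
    )
```

The resulting string can be written into a page template and served from any static file host, which keeps the MVP within the scaled-back scope described above.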
Project Management
Tasks will be tracked via Google Docs and Google Sheets, while code will be managed on GitHub. Our primary communication channels are text and email, with a commitment to check messages at least every 48 hours. Weekly sync meetings will take place on Fridays at 1:00 PM.
Work Plan: Milestones and Deliverables
| Date & Milestone | Deliverable | Specific Goal |
| --- | --- | --- |
| Fri., Feb. 27th (Project Work Plans) | Submission of detailed timeline | Finalize the list of reputable sources to be used to develop queries for the Puerto Rican history dataset. |
| Fri., Mar. 6th (Data Management Plans) | Submission of the schema.json and data dictionary | Ensure the metadata schema accommodates both datasets (e.g., ensuring “cultural context” fields can distinguish between Black American and Puerto Rican entries). |
| Fri., Mar. 13th (Outreach/Social Media Plan) | Submission of the strategy for publicizing the project | Draft copy that explains the nature of the project to a general audience. |
| Fri., Mar. 20th (Project Website Draft) | Launch of the basic landing page | A functional site structure that includes the “About” and “Methods” pages and a preliminary search interface for the datasets. |
| Fri., Mar. 27th (Pre-Break Update) | Status check on data collection | Completion of 50% of the raw model querying for the new Puerto Rican dataset. |
| Fri., Apr. 17th (Post-Break Update) | Adjustments based on initial findings | Completion of human verification/fact-checking for the collected model responses. |
| Fri., Apr. 24th (Final Stretch) | Finalizing the data visualizations | Implementing the comparative charts showing hallucination rates between the two demographic datasets. |
| Fri., May 1st (Final Project Update) | Final submission of the website, Explorable Database, and White Paper | |
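
The comparative charts planned for the April 24th milestone depend on a hallucination rate for each dataset. A minimal sketch of that calculation follows, assuming each verified record carries a `cultural_context` label and a boolean `is_hallucination` flag set during human verification; both field names and the `hallucination_rates` helper are hypothetical, and the sample values are toy placeholders rather than project findings.

```python
from collections import defaultdict


def hallucination_rates(records):
    """Map each cultural_context to the share of its verified responses flagged as hallucinations."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for record in records:
        context = record["cultural_context"]
        totals[context] += 1
        if record["is_hallucination"]:
            flagged[context] += 1
    # Every context in totals has at least one record, so division is safe.
    return {context: flagged[context] / totals[context] for context in totals}


if __name__ == "__main__":
    # Toy placeholder records for illustration only; not real results.
    sample = [
        {"cultural_context": "Black American", "is_hallucination": True},
        {"cultural_context": "Black American", "is_hallucination": False},
        {"cultural_context": "Puerto Rican", "is_hallucination": False},
        {"cultural_context": "Puerto Rican", "is_hallucination": True},
    ]
    print(hallucination_rates(sample))
```

Grouping by a second key such as the model name would follow the same pattern, which would allow the charts to compare model families as well as datasets.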


