
The AI “Hallucinations” Project (Revised Proposal)

Team Members and Roles

Sasha Richardson: Project Manager, Technical/Dev Lead. Responsible for “momentum-making,” code architecture, repository management (GitHub), and technical documentation.

Christian Gilkes: Research Lead. Responsible for sourcing content/data, bibliography, and ensuring intellectual rigor.

Michelle Santiago Cortés: Outreach & Documentation Lead, Design/UX. Responsible for public-facing copy, visual identity, accessibility standards, and meeting notes.

All Members: Data Curators (responsible for querying LLMs, collecting responses, and identifying hallucinations).

Abstract

The AI Hallucinations Project is a digital archive and critical analysis tool documenting how Large Language Models (LLMs) fabricate or distort the histories of marginalized communities. While AI models increasingly function as informal historians, they frequently invent figures, misquote theorists, or erase narratives when processing complex humanities data. Existing technical benchmarks treat these “hallucinations” as bugs to be patched; this project reframes them as cultural artifacts that reveal algorithmic bias. Building on the Black Knowledge Erasure Dataset (BKED), which documents distortions of 19th- and 20th-century Black history, this project expands its scope to include a parallel dataset on Puerto Rican histories from the same period for comparative analysis. By using controlled prompting across model families (GPT-5, Gemini, Claude) and verifying outputs against gold-standard archives such as the Schomburg Center and the Library of Congress, the project will visualize how epistemic erasure functions differently across specific diasporas. The final product will be a public-facing website featuring an explorable database, data reports, and visualizations, serving as an Open Educational Resource (OER) for educators and students.

Narrative

Enhancing the Humanities

The AI “Hallucinations” Project is an interdisciplinary initiative that contributes to the humanities by reinterpreting algorithmic errors. Rather than treating fabrication and misrepresentation by large language models (LLMs) as mere technical glitches to be debugged, the project treats these errors as cultural artifacts worthy of critical scholarly analysis. Our core methodology involves systematically documenting how contemporary LLMs fabricate, distort, or erase narratives, figures, and historical events within Black American Studies and Puerto Rican Studies. This documentation creates an empirical foundation that empowers scholars, educators, and librarians to challenge the perceived authority and neutrality of AI systems.

The project will produce and maintain an Open Educational Resource (OER) designed to equip students and educators with the skills to navigate AI-generated content. This fosters media and technological literacy, encouraging critical engagement rather than passive acceptance of AI outputs. For researchers focused on algorithmic accountability and ethics, the project provides a categorized dataset that allows Critical AI Researchers to conduct granular audits of model behavior, moving beyond surface-level metrics to understand the patterned cultural consequences of AI bias in downstream applications. Likewise, curators, archivists, and museum professionals can use the project’s database to protect the integrity of digital archives and collective memory. The data serves as an early warning system against the silent but pervasive threat of digital misattributions, invented sources, and the algorithmic reshaping of historical fact. Furthermore, the structured nature of the data allows Investigative Journalists and Explorers to filter AI “lies” by demographic category, error type (e.g., fabrication, omission, misattribution), and topic. This enables targeted reporting on the real-world, quantifiable harms of algorithmic bias, turning abstract concepts of AI ethics into concrete stories of injustice and systemic failure in public-facing technologies.
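The filtering described above can be sketched in a few lines of Python, the language the project plans to use for its data work. This is a minimal illustration only: the `Hallucination` record, its field names, and the placeholder values are hypothetical and do not reflect the project’s actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Hallucination:
    # Hypothetical record shape; the real schema is still being defined.
    community: str   # demographic category, e.g. "Puerto Rican"
    error_type: str  # e.g. "fabrication", "omission", "misattribution"
    topic: str
    model: str


def filter_records(
    records: Iterable[Hallucination],
    community: Optional[str] = None,
    error_type: Optional[str] = None,
    topic: Optional[str] = None,
) -> List[Hallucination]:
    """Return records matching every filter supplied; None means "any value"."""
    wanted = {"community": community, "error_type": error_type, "topic": topic}
    return [
        record
        for record in records
        if all(value is None or getattr(record, field) == value
               for field, value in wanted.items())
    ]


# Placeholder records for illustration only (not project data).
records = [
    Hallucination("Puerto Rican", "fabrication", "topic-a", "model-x"),
    Hallucination("Puerto Rican", "omission", "topic-b", "model-y"),
    Hallucination("Black American", "fabrication", "topic-a", "model-x"),
]
fabrications = filter_records(records, error_type="fabrication")
```

Combining filters narrows the result, so a query such as `filter_records(records, community="Puerto Rican", error_type="omission")` returns only records that satisfy both conditions.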

Environmental Scan

The rapid integration of generative AI into educational settings has necessitated rigorous scrutiny of model reliability, and the controversies surrounding these technologies have prompted recent studies of their consequences. A 2025 study highlighted a paradox of Large Language Models (LLMs): they can produce false content of any kind (video, documents, audio, text, or other digital files) that nonetheless appears legitimate, yet, if designed appropriately, they can also serve as tools for detecting and eliminating falsehoods (Park and Nan, 2025). Another 2025 study takes a more supportive view, arguing that AI tools sustain research across many disciplines while acknowledging that real-world biases are easily reproduced in the data sources AI software depends on (Madanchian and Taherdoost, 2025). Researchers at the Stanford Institute for Human-Centered Artificial Intelligence investigated the impact of increasingly ubiquitous AI models on African American communities. According to their report, AI software has proven useful in medicine (especially for low-income patients), presents economic opportunities, and can advance educational instruction. These prospects, however, can be undermined by algorithmic bias and misinformation, especially when AI technologies reflect discrimination embedded in their data sources (Djanegara, Elam, Kosoglu, Koyejo, Meinhardt, Nwankwo, Watkins, Wald, Zaman, and Zhang, 2024). These findings point to the source of many controversies surrounding the adoption of AI systems in classrooms and higher education. Accuracy and trustworthiness are central to this debate, especially as more people use AI to learn about all kinds of subjects, including history. For these reasons, this project draws on these findings as it documents evidence of informational errors produced by generative AI software.

Questions about the reliability of generative AI in education continue to be explored, revealing many errors and inconsistencies. The AI Incident Database, for example, documents broad safety harms: it functions as a digital library of news articles and other reported evidence of real consequences resulting from AI technologies across industries and settings, including education. Benchmarks such as TruthfulQA and HaluEval measure truthfulness and hallucination in general terms, for instance how models mimic human falsehoods. Also worth mentioning is Gordon McKelvie, a professor at the University of Winchester, who tested how Microsoft Copilot would analyze sources curated for studying King Edward VI; the tool mainly restated facts from the source without connecting them to broader English history (McKelvie, 2025). The research and projects described above respond to the widespread reality of “hallucinations” in AI-generated content. A documentation-based methodology lets us examine the various ways current AI software can be misleading, and the referenced studies offer tools and perspectives that can inform our study of generative AI’s flaws in representing Black American and Puerto Rican histories.

History of the Project

This project builds upon BKED, a dataset designed to document how AI models like Claude, GPT-5, and Gemini distort 19th- and 20th-century African American history and culture through specific “hallucinations.” Rather than viewing these errors as random bugs, the project frames them as “epistemic erasure,” in which algorithms invent authorities or omit key figures in ways that mirror historical discrimination. The dataset includes the original prompts, the incorrect AI responses, and human-verified annotations that identify exactly where the models failed against standard archival sources. This dataset will serve as a component of the ongoing AI “Hallucinations” Project, an initiative that catalogues the ‘creative writing’ tendencies of algorithms to reveal how they affect marginalized communities, namely African Americans and Puerto Ricans. By treating these errors as cultural artifacts, the project allows technologists to analyze the ‘cultural consequences’ of AI fabrication.

Final Product and Dissemination

This phase of the AI Hallucinations Project will culminate in:

  1. The creation of a new dataset of documented AI hallucinations about Puerto Rican history of the 19th and 20th centuries.
  2. Short (500-800 words) critical essays about the anthropomorphising effect of the term “hallucination,” generative AI’s parasitic relationship to knowledge production ecosystems, and the methodologies used to develop the datasets.
  3. A primary visualization chart where users can scan through a catalogue of hallucinations, their prompts, and the tools that yielded them.
    1. plus additional visualization assets that summarize key findings
  4. A repository for both datasets (GitHub) to be made available for query and download.
  5. A website that will host all of the above.

Our dissemination strategy will begin during the development phase with peer and community-based outreach: one-on-one, in-network conversations within our immediate communities of art and culture workers, historians, archivists, students, digital humanists, and tech workers. Given the subject matter of the histories guiding this project, Black American and Puerto Rican communities will also be at the center of our dissemination efforts, with special attention to the communities around the Schomburg Center for Research in Black Culture and El Centro for Puerto Rican Studies at Hunter College.

We know from being active participants in these communities that there is rising pressure to adopt tools like ChatGPT, Gemini, and Claude, and that workers across industries are ill-equipped to assess the merits and harms of turning to such tools for reliable information, or to consider how such adoption affects the greater ecosystem of knowledge production. The AI Hallucinations Project documents AI fabrications and records them as cultural artefacts, and the website will serve as a site for critical thinking about generative AI and its impact on knowledge production.

Because the project will primarily live on the website, we are aware that once the momentum of the initial launch and outreach campaigns slows down, the project will live out its afterlife through the Wayback Machine, people’s browser bookmarks, Are.na boards, public spreadsheets, and other link repositories organized by amateurs, hobbyists, and institutions alike. We do not think this is an undesirable outcome; in fact, this is what successful outreach and dissemination look like. For this stage of the AI Hallucinations Project, dissemination succeeds when the link to the website earns its place among people’s personal collections of internet artefacts or on institutions’ lists of recommended resources. We will tease the project throughout its development by adding relevant readings, influences, inspirations, and ideas to an Are.na board that will be shared publicly closer to launch.

In this spirit, the second phase of the dissemination and outreach strategy will consist of identifying 10-15 link repositories where our intended audiences might find the project. Link repositories have boomed in popularity as a way to circumvent the narrowing effects of recommendation algorithms and commercial search engines. They live as websites listing links to other websites, public spreadsheets, Are.na boards, and paywalled recommendations. Individual users maintain their own collections of links in their browsers or dedicated note-taking software, and some creators have built entire communities and business models by treating their link repositories (usually containing product or travel recommendations) as commodities.

As we prepare to launch the website, we will assess the need for an Instagram or Substack presence, depending on findings from the earlier stages of outreach. At this time, we believe Instagram is the most accessible platform for all our target user communities, and we see potential for growth by continuing our community building there. We are also considering a newsletter format, which can be more immediate and tends to yield higher engagement, with the added benefit of producing an email list that can be adapted for other purposes in the future.

Technologies

To achieve our Minimum Viable Product (MVP) by May, we must focus our learning and efforts on a core set of skills. These include basic proficiency in writing Python scripts for data collection, data manipulation, and output to JSON/CSV files. We also need a functional understanding of the GitHub workflow, covering the essential commands for cloning, committing, pushing, and branching, as well as managing pull requests, to ensure proper version control and issue tracking. Finally, we need to understand how to structure and validate the collected data with JSON/CSV schemas to maintain consistency, along with fundamental web development knowledge (HTML, CSS, and potentially a simple JavaScript framework or static site generator) to display the data accessibly and meet universal design standards.
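As a rough illustration of the validate-then-export step described above, here is a minimal Python sketch that uses only the standard library. The field names (including `cultural_context`, echoing the planned schema work) and the allowed error types are assumptions for illustration. The hand-written check stands in for full JSON Schema validation against the planned `schema.json`, which a third-party library such as `jsonschema` could handle instead.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical required fields and their expected Python types.
REQUIRED_FIELDS = {
    "id": str,
    "model": str,
    "prompt": str,
    "response": str,
    "error_type": str,
    "cultural_context": str,  # e.g. distinguishes entries from the two datasets
    "verified_source": str,
}
ALLOWED_ERROR_TYPES = {"fabrication", "omission", "misattribution"}


def validate(record: Dict) -> List[str]:
    """Return a list of problems found in one record; empty means it passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    error_type = record.get("error_type")
    if isinstance(error_type, str) and error_type not in ALLOWED_ERROR_TYPES:
        problems.append(f"unknown error_type: {error_type!r}")
    return problems


def to_json(records: List[Dict]) -> str:
    """Serialize records as indented JSON, keeping accented characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: List[Dict]) -> str:
    """Serialize records as CSV; keys outside REQUIRED_FIELDS are dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(REQUIRED_FIELDS), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

A collection script would typically write out only the records for which `validate` returns an empty list, so the JSON and CSV outputs stay consistent with each other.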

We have also identified a few ambitious stretch goals that would be beneficial but are not critical for the MVP. On the web development front, these include implementing a more dynamic front-end framework like React or Vue, or gaining deeper knowledge of a back-end language for database integration, which is likely beyond the scope of a rapid MVP. We also consider advanced universal design and accessibility testing, moving beyond the basics to implement sophisticated features and conduct rigorous testing, a beneficial but non-essential stretch goal.

To ensure we deliver the MVP by the May deadline, we may strategically scale back in a few key areas, particularly in Web Development. Instead of building a highly dynamic, database-driven website, we will focus on a static or minimally interactive site that solely displays the collected, structured data accessibly. This means scaling back the scope to exclude complex search or filtering mechanisms beyond what a simple static site can handle, and eliminating the need for a dedicated back-end server beyond static file hosting.

Project Management

Tasks will be tracked in Google Docs and Google Sheets, while code will be managed on GitHub. Our primary communication channels are text and email, and we commit to checking them every 48 hours. We hold weekly sync meetings on Fridays at 1:00 PM.

Work Plan: Milestones and Deliverables

Fri., Feb. 27th (Project Work Plans)
Deliverable: Submission of detailed timeline
Specific goal: Finalize the list of reputable sources to be used to develop queries for the Latino history dataset.

Fri., Mar. 6th (Data Management Plans)
Deliverable: Submission of the schema.json and data dictionary
Specific goal: Ensure the metadata schema accommodates both datasets (e.g., ensuring “cultural context” fields can distinguish between Black and Latino entries).

Fri., Mar. 13th (Outreach/Social Media Plan)
Deliverable: Submission of the strategy for publicizing the project
Specific goal: Draft copy that explains the nature of the project to a general audience.

Fri., Mar. 20th (Project Website Draft)
Deliverable: Launch of the basic landing page
Specific goal: A functional site structure that includes the “About,” “Methods,” and a preliminary search interface for the datasets.

Fri., Mar. 27th (Pre-Break Update)
Deliverable: Status check on data collection
Specific goal: Completion of 50% of the raw model querying for the new Latino dataset.

Fri., Apr. 17th (Post-Break Update)
Deliverable: Adjustments based on initial findings
Specific goal: Completion of human verification/fact-checking for the collected model responses.

Fri., Apr. 24th (Final Stretch)
Deliverable: Finalizing the Data Visualizations
Specific goal: Implementing the comparative charts showing hallucination rates between the two demographic datasets.

Fri., May 1st (Final Project Update)
Deliverable: Final submission of the website, Explorable Database, and White Paper
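The comparative charts planned for April 24th depend on a simple quantity: for each dataset, the share of reviewed model responses that human verification flagged as hallucinations. A rate needs a denominator, so this minimal Python sketch assumes the project also records responses that passed verification, not only the erroneous ones. The dataset labels in any example are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hallucination_rates(reviews: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each dataset label to the fraction of its reviewed responses flagged as hallucinations."""
    flagged: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for dataset, is_hallucination in reviews:
        total[dataset] += 1
        if is_hallucination:
            flagged[dataset] += 1
    # Every label in `total` has at least one review, so there is no division by zero.
    return {label: flagged[label] / total[label] for label in total}
```

Raw rates are only comparable if both datasets are queried with similar prompts and the same models; the same function could be applied per model to support cross-model comparisons.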

Pretty Terrifying Project (working name)

Abstract
The horror video game genre, shaped by a male-dominated industry, has historically centered masculine perspectives in both creation and representation. Women and the LGBTQ+ community are underrepresented both in production roles, as developers and designers, and in game content, where playable characters are often portrayed through harmful tropes such as sexualization and female monstrosity. While horror has been examined in film and literature studies, horror video games remain underexplored as cultural artifacts. This project builds on an earlier phase of work: a constructed dataset, horror_games_feminist_themes, for which keywords were web-scraped from Wikipedia’s “Category: Horror video games” tree to identify possible recurring feminist themes. The project now aims to refine the dataset and transform it into a public-facing website that visualizes and interprets the patterns that emerge from it, making them visible and analyzable.

List of Participants
Naila – Project Manager
Lead project and meetings; organize and keep track of tasks and calendar; assist with other roles (visual design, development, etc.); outreach on a social media platform

Michael – Visual Design
Identifying visual layouts for the data visualizations; assisting with the site’s UI layout; double-checking visual accessibility (wording, color contrast, etc.)

Truly – Developer
Coding; documentation for the project; setting up website; assisting with research.

Enhance the Humanities
In horror studies, scholars such as Barbara Creed argue that the genre encodes themes of sexuality, reproduction, and maternity by framing women as monstrous. Building on this work allows us to extend those ideas to tropes of fear and survival. While these scholarly frameworks provide crucial foundational research on femininity in horror stories and media, video games remain underexamined within the horror genre. A critical analysis of these feminist themes can offer meaningful insight into how women are portrayed in this medium.

This project extends feminist horror theory to interactive media. It treats horror video games not as mindless entertainment but as a way to examine how the industry induces fear through gameplay mechanics, female embodiment, and player engagement. Unlike film and books, which audiences experience passively, video games require players to interact with their worlds and to embody vulnerability, survival, and a curated, limited autonomy. Fear is not only seen or heard but, in a small way, lived. By building an interactive dataset that highlights these themes in horror video games, we can gain new insight into how these narratives are presented in this underexamined medium.

Environmental Scan
Horror studies have historically centered on film and literature; scholars such as Barbara Creed have analyzed the “monstrous-feminine” as a figure shaped by patriarchal fears about embodiment and reproduction. More recent scholarship extends these conversations into video games, interpreting monstrous female figures not simply as misogynistic constructs but as forms of resistance against the trope. Jennifer Loring’s Redefining the Monstrous-Feminine: Applying a Postfeminist (Eco)Gothic Reading to Horror Video Games, for instance, offers a framework for interpreting witches, ghosts, and vampires as figures aligned with nature and with rebellion against patriarchal structures. Similar analyses of games like Doki Doki Literature Club! examine how female antagonists disrupt player agency and destabilize the typically male-driven themes and narratives of the horror genre (Graham 2023). While qualitative scholarship critiques gendered tropes in horror, we have not found any existing datasets or data visualization projects on the subject.

Final Product and Dissemination
We believe that the process of creating a data visualization is itself scholarship: in organizing content into a new form, we may reveal some novel insight into that content, or simply gain a new perspective by seeing the information through a different lens. For this reason, the final output of the project does not have to be a website that is available only online. What makes this a digital humanities process is how digital tools help us push and question our thinking throughout. So while one final output of this project will be a data visualization hosted on a public website, another will be a lightweight static copy of the website that can be stored on drives or personal computers and distributed that way. While this version may lose some of the ease and interactivity of a fully online version, it will be easier to preserve, as it will not be at the mercy of changing technological standards, and it can be viewed by people without stable internet access. This way, a record of the scholarship remains even if the online version is left behind by technological development in a way that makes maintaining it untenable. The project is thus both more accessible in the present and accessible to future generations.

The online version could be hosted on a relatively simple platform such as CUNY Academic Commons, WordPress, or Blogspot. Wherever it is hosted, we would post it along with documentation and a rationale for the project. If possible, we would also post the static version of the site and a PDF of the documentation for download there, making the project accessible in multiple forms.

Works Cited
D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. MIT Press, 2020.

Drucker, Johanna. “Humanities Approaches to Graphical Display.” Digital Humanities Quarterly, vol. 5, no. 1, 2011, https://dhq.digitalhumanities.org/vol/5/1/000091/000091.html

Graham, Hannah. Metalepsis and Mental Castration: Doki Doki Literature Club! as the Cerebral Monstrous-Feminine. Georgia Southern University, Master’s thesis, 2023, https://digitalcommons.georgiasouthern.edu/etd/2976/

Loring, Jennifer. Redefining the Monstrous-Feminine: Applying a Postfeminist EcoGothic Reading to Horror Video Games. 2024. ResearchGate, https://www.researchgate.net/publication/393901879_Redefining_the_Monstrous-Feminine_Applying_a_Postfeminist_EcoGothic_Reading_to_Horror_Video_Games

The Voices of Lunfardo (Revised)

Abstract

The Voices of Lunfardo project seeks to create an interactive dictionary of fifteen lunfardo terms that are widely used in Argentina today. Lunfardo is the slang of the Río de la Plata region (Buenos Aires and Montevideo), originating in the late nineteenth and early twentieth centuries and shaped primarily by the daily experiences of working-class Italian immigrants. Many of its terms became popular through tango lyrics, and, over time, these words moved beyond tango and entered common speech, where they continue to evolve in meaning and usage.

Designed for college-level Spanish students, general Spanish speakers, and language lovers, the project situates lunfardo, often mischaracterized as criminal slang, as a language rooted in the lives and experiences of Italian immigrants. Each dictionary entry will present the lunfardo word, its standard Spanish equivalent, an English translation, sample tango lyrics, multimedia links to the song and video, and an explanation of its cultural significance. Examples include “morfar” (“to eat”); “mufa” (“bad luck”), a word widely used during the 2022 FIFA World Cup; and “falluto” (“fake,” “dishonest”).

This project seeks to explore the cultural and historical significance of lunfardo through two central questions. First, it asks how the use of lunfardo in tango lyrics functions as a critical archive of immigrant working-class identity in Buenos Aires, capturing the voices and experiences of marginalized communities. Second, the project investigates how each term is used today and the role it plays in contemporary Argentine culture, exploring the ways in which these words have persisted or evolved while continuing to carry traces of their historical and cultural origins.

The Need

Online lunfardo dictionaries and glossaries already exist. These resources are lexical repositories that list lunfardo terms with brief definitions, offering valuable reference tools for researchers and general audiences. However, they are not designed for Spanish learners, and they present vocabulary in isolation, without history, cultural context, or explanations of contemporary usage. In contrast, the dictionary proposed here is explicitly aimed at Spanish students and emphasizes contextualized, interactive learning. Each entry will incorporate tango lyrics and multimedia links to songs, explanations of cultural and historical significance, examples of current usage drawn from contemporary media, and interactive activities that encourage learners to practice lunfardo terms in meaningful present-day contexts. By integrating language learning, cultural history, and digital pedagogy, this project moves beyond a static glossary and presents lunfardo as a living and evolving component of Rioplatense Spanish. It addresses these gaps by using Omeka, which supports exhibits that function as a digital dictionary alongside multimedia resources and interactive, pedagogically driven activities for student engagement.

Impact and Intended Results

Each term will be represented as an individual page on the Omeka platform, allowing for the organization of rich, multimedia content in a structured format. On each item page, students will find the lunfardo term alongside its standard Spanish equivalent and English translation. The page will also include tango lyrics featuring the term, with links to audio and video recordings, as well as explanations of its cultural significance in the context of tango and working-class identity. In addition, each entry will provide a link to current media sources, such as podcasts, news articles, or blogs, that demonstrate contemporary use of the term, accompanied by reflections on its historical evolution and cultural implications. By using Omeka in this way, the project will combine linguistic, historical, and cultural content in a single, navigable digital space, making it easier for students to explore the terms in both their historical and contemporary contexts. The platform also allows for future expansion and the integration of interactive activities, encouraging active student engagement with the material.
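The per-term page structure above maps naturally onto one row per term. The sketch below assumes the entries are prepared as a CSV file for bulk upload (Omeka Classic’s CSV Import plugin supports this, with columns mapped to metadata elements during import). The column names are hypothetical, and the sample row fills in only the “morfar” gloss from this proposal, leaving lyrics and links blank rather than inventing them.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class LunfardoEntry:
    # Hypothetical column names, to be mapped to Omeka elements at import time.
    term: str
    standard_spanish: str
    english: str
    tango_lyrics: str = ""
    recording_url: str = ""
    cultural_significance: str = ""
    contemporary_source_url: str = ""


def entries_to_csv(entries: List[LunfardoEntry]) -> str:
    """Serialize entries to CSV text: one header row, then one row per term."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(LunfardoEntry)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Only the gloss stated in the proposal is filled in; the rest awaits research.
entries = [LunfardoEntry(term="morfar", standard_spanish="comer", english="to eat")]
csv_text = entries_to_csv(entries)
```

When writing this text to disk, opening the file with `newline=""` and `encoding="utf-8"` keeps Python’s CSV line endings intact and preserves accented characters.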

This project will make lunfardo more accessible to a wide audience, from students and scholars to Spanish learners and language lovers. Presented at university conferences, workshops, and in Spanish courses, it will support education and research. Sharing the dictionary on social media and Spanish-learning platforms will also allow people around the world to explore lunfardo in a fun, interactive way.

The Plan

Phase 1: Research and Data Collection (February 2026). We will categorize terms by cultural significance, identify well-known tangos that use each term, and gather examples of modern lunfardo usage from interviews, podcasts, and YouTube sources. We will also prepare media files (audio and video).

Phase 2: Omeka Platform Development (March 2026). We will install Omeka and configure plugins. We will upload collections and design interactive pages according to the following categories: Definition, Cultural Reflection, Video/Audio, Fun Facts.

Phase 3: Narrative Integration and Public Engagement (April 2026). We will write historical and cultural narratives linking expressions to social events and urban life. We will embed multimedia content: podcast clips, YouTube videos, images, and audio recordings. We will conduct user testing with collaborators and target audiences. We will refine the exhibit and prepare the final Omeka site for public launch.

Project Resources: Personnel and Management

List of experience and responsibilities of each staff member.

  • Natalia Bustos: research oversight and narrative.
  • Aaron Helton: Software evaluation, installation and configuration, deployment, and additional narrative.

Advisory Panel:

  • Oscar Conde – lexicography consultant.
  • Universidad Católica Argentina Spanish faculty – language validation.
  • Tango scholars – historical and cultural expertise.
  • Digital platform consultants

Final Product and Dissemination

The final product will be a hosted website serving fifteen exhibits on an Omeka installation, and it will remain web-accessible in its published format. It will also include a basic toolset in a GitHub repository to facilitate long-term maintenance, such as software updates and migrating the site elsewhere as necessary. The data itself will reside in the Omeka installation and will be backed up regularly to guard against disaster and maintain continuity. Finally, because Omeka is a full content management system, we will be able to sign in regularly to add new terms, update or edit existing ones, or make other changes as necessary.

In addition to regular social media updates on Bluesky, which has attracted numerous digital humanities and other academic practitioners, we are submitting a proposal to the upcoming ACH conference, whose themes this year include transnational challenges and how the digital humanities can meet them. We also plan to reach out to the American Association of Teachers of Spanish and Portuguese to participate in their events and activities.

 

Project proposal for a *pretty* terrifying interpretative data website on feminist themes prevalent in horror video games

Last semester, the initial goal of the horror_games_feminist_themes project was to create a curated dataset by scraping the “Category: Horror video games” tree on Wikipedia and classifying keywords from horror video games that feature a female or LGBTQ+ protagonist. Here’s a spreadsheet of the output .CSV file for convenience. This was my first time constructing a dataset, and my hope was to create one that could eventually help identify and analyze recurring feminist themes, patterns, and harmful tropes within the horror video game genre. I chose to scrape, curate, and then manually review Wikipedia pages because video games involve a wide range of elements, from gameplay mechanics to visual design, and it would be nearly impossible to begin an analysis from gameplay alone. Wikipedia seemed like a feasible place to start.
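For readers curious how a category tree can be collected, here is a minimal Python sketch using the public MediaWiki Action API (`action=query` with `list=categorymembers`). It is not the original scraping script: the function names, the placeholder User-Agent, and the depth limit are assumptions, and the network call is passed in as a function so the traversal logic can be tried offline.

```python
import json
import urllib.parse
import urllib.request
from collections import deque
from typing import Callable, Dict, List, Set

API_URL = "https://en.wikipedia.org/w/api.php"
Fetch = Callable[[Dict[str, str]], dict]


def fetch_json(params: Dict[str, str]) -> dict:
    """Send one GET request to the MediaWiki Action API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "horror-games-sketch/0.1 (placeholder contact)"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def category_members(category: str, fetch: Fetch) -> List[dict]:
    """List every page and subcategory in one category, following pagination."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "500",
        "format": "json",
    }
    members: List[dict] = []
    while True:
        data = fetch(params)
        members.extend(data["query"]["categorymembers"])
        if "continue" not in data:
            return members
        # The API returns the token(s) needed for the next batch under "continue".
        params = {**params, **data["continue"]}


def walk_category_tree(root: str, fetch: Fetch = fetch_json, max_depth: int = 2) -> Set[str]:
    """Breadth-first walk collecting article titles (namespace 0) under root."""
    titles: Set[str] = set()
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        for member in category_members(category, fetch):
            if member["ns"] == 0:
                titles.add(member["title"])
            elif member["ns"] == 14 and depth < max_depth and member["title"] not in seen:
                # Namespace 14 holds categories; tracking seen ones avoids cycles.
                seen.add(member["title"])
                queue.append((member["title"], depth + 1))
    return titles
```

Calling `walk_category_tree("Category:Horror video games")` would query Wikipedia live. The depth limit is an arbitrary choice, since Wikipedia subcategories can drift far from the original topic; the collected titles would still need the manual review described above.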

So now what?

The project I propose aims to build a website that brings the dataset to life: a public-facing site that translates the dataset into an accessible, interactive experience, one that can make invisible structures visible by dissecting feminist themes in a medium that is not often analyzed this way. I envision playful design choices to lighten the load of such a gruesome topic: something fun and feminine as an entry point for making sense of the genre and medium, where patterns can be analyzed and challenged.

Here are some data viz examples from https://pudding.cool/ that inspired me during a brainstorming scan:

What question or problem will this project answer?

Horror studies typically centers on film, and video games are often underexamined as cultural artifacts. Scholars like Barbara Creed highlight themes of female monstrosity, embodiment, and patriarchal structures in horror films. For example, the horror genre historically frames female bodies and the reproductive system as monstrous or abject (Creed, 1993), and this is one of the themes I also noticed while reviewing keywords for my dataset. I’d like to build on these frameworks to examine horror video games as cultural artifacts that expose similar structures and themes, such as patriarchy and embodiment.

What audience will this serve?

  • People who are interested in games, horror, and feminism.
  • Students who are interested in media/game studies and/or gender studies
  • The (female and LGBTQ+) gaming community(?)
  • This project contributes to DH by applying feminist principles of DH not only to interpret “data as capta” (Drucker, 2011) but also to design a project that analyzes the medium of video games, which, as mentioned previously, is often difficult to dissect and comprehensively analyze compared to other media.  

Tentatively…

the final product is some sort of data viz website, but we might need to refine the data further. This may include refining the keywords and classifiers or scraping more titles, and we might need to narrow the scope. One thing I found particularly interesting while constructing the dataset was the difference between horror games published from the 90s to the early 2000s and more recent titles. This comparison could be one way to focus and narrow the scope of the project.
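If the era comparison becomes the focus, one simple starting point would be to compare how often each theme appears in older versus newer titles. In this hypothetical sketch, each row carries a release `year` and a list of `themes`, and the 2005 cutoff between “early” and “recent” titles is an arbitrary placeholder to be refined.

```python
from collections import Counter


def era_of(year, cutoff=2005):
    """Hypothetical split: titles released before the cutoff count as 'early'."""
    return "early" if year < cutoff else "recent"


def theme_share_by_era(rows, cutoff=2005):
    """rows: dicts with an int 'year' and a list of 'themes'.

    Returns {era: {theme: share of that era's titles tagged with the theme}}.
    """
    titles_per_era = Counter()
    theme_counts = {}
    for row in rows:
        era = era_of(row["year"], cutoff)
        titles_per_era[era] += 1
        # set() so a theme listed twice for one title is only counted once
        theme_counts.setdefault(era, Counter()).update(set(row["themes"]))
    return {
        era: {theme: n / titles_per_era[era] for theme, n in counts.items()}
        for era, counts in theme_counts.items()
    }
```

Using shares rather than raw counts matters here, since the two eras will almost certainly contain different numbers of titles.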

Tools, skillsets, and various roles I envision us needing:

Some of these roles may be merged, shared, or rotated amongst other team members (based on individual preferences) through the different stages of the project!

  • Web-developer:
    • HTML/CSS/JS 
    • Build out the site
  • UX & visual designer
    • Design the site’s color, typography, and layout
    • Tools tbd.
  • Data curator & researcher
    • Python (for pulling data)
    • Review Wikipedia pages and validate keyword entries
    • Refine thematic keywords/classifiers
    • Help document interpretative decisions
  • Writer & content development
    • Draft and edit website textual content
    • Help shape the project’s voice and tone
    • Content creation for the chosen social media platform

Potential barriers and questions:

  • Wikipedia bias: I want to acknowledge that creating a dataset using keywords pulled from Wikipedia can have its limitations and biases. 
  • How can the interface be used to display data and invite engagement without minimizing gendered-violence, trauma, or harm?

 

A Digital Intertextual Concordance of Female Epics

NB: When I saw all the great proposals at the end of last semester, I wasn’t sure I would pitch mine this semester, not because I don’t believe in my proposal, but because I wanted to work on everyone else’s projects too! I’m pitching it now both because I still want to do it, and perhaps to prompt others to pitch their projects.

This is an extract, cleaned up to incorporate the feedback I received, of my Fall Semester proposal, comprising the Abstract, the Enhancing the Humanities portion of the Narrative, a brief Environmental Scan, and the Final Product.


My project seeks to compile and exhibit a digital intertextual comparative concordance of themes that occur in epics authored by or attributed to women. The initial phase will focus on a small corpus comprising three themes (death, love, and vengeance) across each of three female epics, with later phases covering more themes and epics. Leveraging the work in Approaches to the Anglo and American Female Epic, 1621-1982, edited by Bernard Schweizer, the project analyzes Telemachus by Anna Seward (unfinished as of 1809, first published in 2016), Psyche by Mary Tighe (1805), and Aurora Leigh by Elizabeth Barrett Browning (1856). It facilitates questions of whether or how the language women used to express these themes while writing in traditionally male spaces reflects feminist perspectives, as well as what insights comparative digital analysis can yield.

The project has three main digital outputs: a toolset for extracting and documenting specified themes, a dataset comprising the extracted themes, and a Web-accessible display of those themes. The creation of this toolset, dataset, and Web-accessible presentation layer will further allow for future expansion, via, for example, text selections in other languages, additional translations, and selection of additional epics. While it aims to be neither a comprehensive collection of female epics nor a primary source for the epics it does include, the project nevertheless highlights the relative absence of such digital collections and serves as a thematic reference for scholars of epic literature, especially those interested in female epic literature.

Enhancing the Humanities

Historically, concordances have been laborious creations made for intense scholarship of works that, because of their cultural importance, were read and re-read, such as religious texts. Father Busa’s Index Thomisticus, created with the assistance of digital computers, is generally regarded as the beginning of digital humanities as a discipline. In the intervening years, more powerful computing technologies have made the creation of concordances per se easier; at the same time, the rise of natural language chatbots and fuzzy searches based on statistical sampling seems to present it as a foregone conclusion that concordances lack comparative value in the face of powerful modern search technologies. Looking beyond the marketing terminology, however, we see that mere statistical correlations yield decontextualized results arranged according to internal algorithmic relevance. Concordances, as tools positioned specifically for textual and intertextual comparison and interpretation, remain vital to examining how language conveys concepts, and digital concordances offer a chance to be more deliberate in building human-scale searches. The central questions afforded by this concordance are exploratory, focusing as it does on what we should automate, but the relative scarcity of scholarship in this area underscores the importance of conducting that scholarship in the first place.

In his introduction to Approaches to the Anglo and American Female Epic, 1621-1982, Bernard Schweizer suggests that the epic is perhaps the most male-coded genre of literature, “so much so that epic and masculinity appear to be almost coterminous” (Schweizer et al. 1). This gender assumption is apparent from several standpoints: first, who has historically produced epics; second, who defined and formed the body of the genre’s critics; and third, the genre’s main characters. A fourth standpoint could be the themes of epics, a question this project’s outputs help to address. And yet, as Schweizer and his contributors demonstrate, British and American women have been producing epics since at least the 17th century. Nor is the production of epics by women limited to the modern United States and United Kingdom. On his blog, Interesting Literature, Dr. Oliver Tearle lists the Sumerian poem The Descent of Inanna, attributed to the high priestess Enheduanna, as a particularly early example of the female epic, suggesting that “if Enheduanna was the author of this poem, … that makes it the oldest work of poetry written by any named poet, male or female” (Tearle). His article goes on to list six other epic poems, half of which, had they been published when they were written, should all be in the public domain. There are likely others that have been misattributed or forgotten, awaiting rediscovery.

Centering women-authored and women-attributed texts in Digital Humanities scholarship will bring more attention to these works, and to the included authors’ non-epic works, elevating them in the public consciousness. Additionally, it gives scholars interested in figurative language a ready platform to explore how, or even whether, the use of such language in female epics differs from that in male epics.

This project focuses on three of the epics identified by Schweizer. The first, and most problematic from a sourcing standpoint, is Anna Seward’s epic poem Telemachus. In her introduction to The Collected Poems of Anna Seward, editor Lisa L. Moore writes that Seward had arranged with Walter Scott to publish a complete collection of her poems. Among the collection was one unpublished poem, an epic she considered her masterpiece and which she “took special care to recommend … to Scott’s attention” (Seward 37). When Scott published the collection in 1811, two years after Seward’s death, he excluded Telemachus without explanation. The poem would not appear in print until Moore laboriously transcribed it from the original manuscript and included it in her 2016 collection of Seward’s poems. For scholars, this means that, unlike the other epics in the selection, Telemachus raises copyright questions, and there is no public domain source from which to acquire it. This project takes care to avoid fully replicating the text, on the argument that extracting passages for concordance and other search purposes constitutes Fair Use.

The second and third epics, published in 1805 and 1856 respectively, are firmly in the public domain: Mary Tighe’s Psyche, a six-canto allegorical poem written in Spenserian stanzas, and Elizabeth Barrett Browning’s Aurora Leigh. Because these were chosen specifically for their treatment in Approaches to the Anglo and American Female Epic, 1621-1982, they are naturally limited in their linguistic and temporal representation. Further, with the exception of Telemachus, the epics were also chosen for their publication dates. This places some constraints on the broad applicability of the project and its tools and outputs in their initial phases, but one output of the project is support for locating and processing additional texts beyond this immediate set.

This project naturally raises questions about what could be included in the future. While it proposes eventually to encompass all female epics, a main question is: what counts as an epic? Is the genre limited to poetry, or do prose works also qualify? If so, the field expands again, so a key consideration is to maintain some boundaries on the genre to keep the overall scope constrained. As the project grows, other types may be added, but the continuing mission will be to maintain enough constraint to demonstrate the related concepts. Additionally, copyright on translations of older, especially ancient and antique, works may affect selection, even though the existence of multiple translations offers interesting opportunities to compare how different translators interpret figurative language. To account for this, the project will review the concordance’s concepts annually, add new works when possible, and maintain a registry of desired works that are unavailable because of copyright constraints. The project is also committed to obtaining permission from translators for less available texts.

A final consideration is the methodology. Existing digital concordances facilitate keyword searches for words, word forms, or phrases that occur in the works included, and in most cases the searches are limited to a single work. While a broad keyword search is possible and perhaps desirable, this project proposes both a curated approach focused on thematic subjects known to occur in the included works and a fully intertextual display of those occurrences across all included works. Curating the theme selection is not intended to limit the explorations afforded by the texts and interface, but rather to help orient users toward major themes in the works. The three initial themes (love, death, and vengeance) speak to some of the timeless aspects of epic literature, but this in no way asserts that these epics are limited to, or even mainly about, these subjects. They are merely starting points for further analysis, and this project intends to lay the groundwork for additional curated theme selections even as more works are identified for inclusion. During the project, the thematic selection may be adjusted to accommodate what the project team discovers, but only if this has no impact on delivery.
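The difference between a single-work keyword search and this curated, intertextual approach can be sketched as a keyword-in-context (KWIC) concordance that groups hits by theme across every included work. This is a minimal Python sketch under stated assumptions: the theme lexicon is hypothetical, matching uses exact word forms only (no lemmatization), and the short context `window` reflects the aim of showing excerpts rather than reproducing full texts.

```python
import re
from collections import defaultdict

# Hypothetical curated lexicon: exact word forms only, no lemmatization.
THEMES = {
    "death": {"death", "die", "died", "dead", "grave"},
    "love": {"love", "loved", "beloved"},
    "vengeance": {"vengeance", "revenge", "avenge"},
}


def tokenize(text):
    """Split text into runs of letters and apostrophes (so "o'er" stays one word)."""
    return re.findall(r"[A-Za-z']+", text)


def intertextual_concordance(works, themes=THEMES, window=4):
    """works maps a work title to its text.

    Returns {theme: [(work, token_index, left, match, right), ...]} pooled across
    every work, with at most `window` words of context on each side of a match.
    """
    result = defaultdict(list)
    for work, text in works.items():
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            for theme, terms in themes.items():
                if token.lower() in terms:
                    left = " ".join(tokens[max(0, i - window):i])
                    right = " ".join(tokens[i + 1:i + 1 + window])
                    result[theme].append((work, i, left, token, right))
    return dict(result)
```

Because results are keyed by theme rather than by work, a display layer could lay the same theme out across Telemachus, Psyche, and Aurora Leigh side by side.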

Brief Environmental Scan

In the interest of brevity from here, I will just list the projects that informed mine.

  • The Index Thomisticus
  • Open Source Shakespeare
  • The Electronic Dictionary of Armenian Bibliography
  • The Chinese Text Project
  • The Hyper-Concordance at the Victorian Literary Studies Archive
  • Skovoroda Online Concordance
  • PHI Latin Texts
  • Intertextual Dante at the Digital Dante Project
  • The Women Writers Project at Women Writers Online

Final Product and Dissemination

The final output of the project will live in GitHub as a code repository containing the extracted themes, annotated according to the designed metadata schema; the tools and scripts used to generate the extractions; and the website, including all narrative content, that exhibits the Concordance. The site itself will also be hosted online at a location to be determined and secured by the project team. In addition, the project lead and co-lead will share links to the final project via their social media accounts, namely BlueSky, as well as at the DH program lightning talks, the CUNY IT Conference, and potentially other conferences.



Project Proposal: The Humanities AI Hallucination Database

Brief Project Overview: This project will build a public-facing, interactive web archive for the Humanities AI Hallucination Database. We are starting with a significant advantage: a pre-generated seed dataset of 100 records documenting AI hallucinations related to Black histories. Our goal this semester is to expand this scope to document algorithmic erasures in Latinx, Asian-American, and white American histories as well, creating a comparative dataset that reveals how generative AI distorts knowledge across different identities.

The Problem: AI chatbots are fast becoming de facto historians for students. However, these models often invent figures, misattribute theories, or erase narratives, failures that disproportionately affect marginalized groups. The gap: There is no centralized, standardized repository that educators can use to show students specifically how these models fail across different cultural contexts. We need empirical data to verify whether, for example, AI “hallucinates” Asian-American historical figures differently than it does Black feminist scholars.

Intended Audience

  • Educators & Librarians: Seeking diverse examples for teaching source evaluation and information literacy.

  • Ethnic Studies Scholars: Researchers analyzing comparative patterns in algorithmic bias.

  • Students: CUNY undergraduates learning to navigate generative AI tools critically.

Contribution to DH & Potential Impact

This project contributes to Intersectional Digital Humanities and Critical Data/AI Studies. By moving this data from a spreadsheet to a public web interface and expanding it to include Latinx, Asian-American, and white American histories, we are creating a vital Open Educational Resource (OER). Team members will help build a tool that operationalizes “epistemic justice” for multiple communities, giving educators the concrete evidence they need to challenge algorithmic authority in the classroom.

Final Product (What we will build)

We will build a searchable, scalable web archive.

  • The Web Archive: A website where users can filter hallucinations by demographic (e.g., “Black History,” “Latinx History,” “Asian-American History”) and error type.

  • The Expanded Dataset: We will clean the existing 100 records and generate/verify 20-30 new records for Latinx and Asian-American topics to demonstrate the database’s capacity for growth.

  • Data Visualizations: Charts comparing error rates or types across different demographic categories.
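The filters and comparison charts described above rest on a few simple operations. Here is a minimal Python sketch, assuming a hypothetical record shape in which each record is one model response tagged with a demographic `category` and an `error_type` (set to `None` when the response checked out against the archives). The field names, model names, and error labels are illustrative, not the seed dataset’s actual schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical record shape: one model response to one prompt."""
    record_id: str
    category: str              # e.g. "Black History", "Latinx History"
    model: str
    error_type: Optional[str]  # None when the response was verified as accurate


def filter_records(records, category=None, error_type=None):
    """Apply the site's two filters; None means 'any value'."""
    return [
        r for r in records
        if (category is None or r.category == category)
        and (error_type is None or r.error_type == error_type)
    ]


def error_rates(records):
    """Share of responses in each category that contained a verified hallucination."""
    totals, errors = Counter(), Counter()
    for r in records:
        totals[r.category] += 1
        if r.error_type is not None:
            errors[r.category] += 1
    return {cat: errors[cat] / totals[cat] for cat in totals}


def error_types_by_category(records):
    """Count each error type within each category, e.g. for a stacked bar chart."""
    table = defaultdict(Counter)
    for r in records:
        if r.error_type is not None:
            table[r.category][r.error_type] += 1
    return {cat: dict(counts) for cat, counts in table.items()}
```

An error rate only makes sense if accurate responses are recorded too; a dataset that stores only hallucinations can compare error types across categories, but not rates.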

Feasibility Assessment

  • Current Status: High feasibility with room for growth. We have a “Day 1” dataset (100 records) to start building the site immediately. The expansion into new histories provides a meaningful research task for the semester without overwhelming the team.

  • Skills Needed:

    • Frontend Dev: To build the site structure and search interface.

    • Research Leads (Crucial): To generate and verify new prompts regarding Latinx, Asian-American, and white American history (using JSTOR/Library of Congress, etc.).

    • Data Curator: To ensure metadata standards match across different historical categories.

  • Barriers: The main challenge is verifying new data accurately.
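Keeping metadata consistent across historical categories, and making sure every new record has been checked against a source, could be enforced with a small validation step run before records are added to the site. This is a hypothetical Python sketch: the field names, category labels, and error-type vocabulary are placeholders for whatever standard the Data Curator defines.

```python
# Hypothetical shared standard; the real field names and vocabularies are up to the team.
REQUIRED_FIELDS = {"record_id", "category", "model", "prompt", "response",
                   "error_type", "verified_source"}
CATEGORIES = {"Black History", "Latinx History", "Asian-American History",
              "White American History"}
ERROR_TYPES = {"invented figure", "misattribution", "erasure", "none"}


def validate(record):
    """Return a list of problems; an empty list means the record fits the shared schema."""
    problems = []
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    if record.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {record.get('category')!r}")
    if record.get("error_type") not in ERROR_TYPES:
        problems.append(f"unknown error type: {record.get('error_type')!r}")
    if not record.get("verified_source"):
        problems.append("no verification source recorded")
    return problems
```

Running a check like this before each import would catch the kind of drift (for instance, “Latinx history” versus “Latinx History”) that quietly splits one category into two on the filtered site.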