Category Archives: Group Project Updates

A not so terrifying update

Our project name finally came to us! We fell in love with A Pretty Terrifying Project, so we thought, why not just keep it? So our official project name is now…*drumroll* A Pretty Terrifying Project: Examining Feminist Theme Co-occurrences Across Horror Video Games. Ultimately, this was easy to come up with once we narrowed the scope of our project.

Everything seems to be on track, and we have a clear plan heading into spring break. The team has some smaller housekeeping things to take care of before spring break. Truly will be annotating some of the code in our current repo so I can navigate the existing code when working on it, and Michael will be updating some of the visualizations. In the next phase, we are focusing on website development, some close reading, and writing. I’m looking forward to working on the website during the break. I enjoy web development, so this is sort of a fun thing to do during my very uneventful spring break.

The team and I have each picked a game that falls into one of our key co-occurrences to do a close read during the spring break. Everyone’s pretty excited to work on these since we all chose areas and games we’re particularly interested in. Michael will be looking into Doki Doki Literature Club (girlhood + captivity), a cute, yet psychologically horrific game, while Truly will be working on Bloodborne (motherhood + violence), a dark, gothic, Lovecraftian game. I, on the other hand, haven’t picked a game yet, but will focus on one that explores the co-occurrence of embodiment and violence. I’m interested to see how this collaborative writing experience pans out for us.

We’re trying to keep it light and enjoy the break, but we also took some time last night to plan ahead for after spring break and the final stretch! We’ve decided that ‘finished’ for us is a well-developed website that displays our visuals and holds important contextual analysis, such as related theoretical frameworks and close readings. We feel that this part may be the bulk of the work going forward, and we want to spend time ensuring we do thorough research and writing at this stage.

I think that the pretty terrifying team has a solid sense of what needs to be done to hit our upcoming milestones. Enjoy the spring break, everyone! Don’t work too hard!

Building the AI “Hallucinations” Website

We’re excited to share the initial draft of the website for our project. While we are still finalizing a permanent domain name, you can explore the current build over on GitHub Pages. Right now, the site is admittedly a bit barebones, but the core mission is already in motion. We are putting AI “hallucinations” front and center to critically examine how generative models produce inaccuracies, particularly regarding marginalized communities.

The project is fundamentally about making the invisible visible. The database currently features a straightforward structure that allows users to examine specific hallucinations alongside the type of distortion and the model responsible. It is a necessary tool for documenting and analyzing how historical and cultural data is processed by these systems.

Currently, the biggest conversation behind the scenes is our visual direction. We are working closely on fine-tuning our color stories to ensure the design reflects the gravity of the research. For now, the architectural foundation of the data is there, and the design will soon catch up to the weight of the work.

Lunfardo Website Draft: Double the Fun

First things first: https://www.vocesdellunfardo.org/

After some hosting shuffling, we’re back on track. Of course, the work didn’t stop just because the public-facing site was down. We’ve gathered our landing page description and put together sections covering our definition (what Lunfardo is), the methodology behind the term collection and description, the objectives of the site, and the biographical info for the project team.

Natalia provided much of the boilerplate text in Spanish, and I’m doing my best to provide the English translations. Similar to Truly’s assertion, the whole thing is likely to change from visit to visit, possibly even stylistically, though I’m pretty happy with what I’m seeing at the moment on that front (accessibility of the header notwithstanding).

Onward!

A Pretty Terrifying Website Draft

Here’s our site so far! https://trulyj.github.io/feminist-horror-games-site/

Currently the URL is linked to my GitHub account, but we might change the URL down the line. So far we have a landing page with some placeholders for data visualizations, an “about the project”/methods page and a “meet the team” page, but I’m still actively working on developing it, so by the time you click that link there may be more there!

Outreach for AI Hallucinations Project

For our outreach and social media plan we are going with a strategy that focuses on building a solid and very engaged community around the project and deprioritizes mainstream platforms with fickle algorithms and unstable visibility criteria that are more trouble than they’re worth. It’s divided into three phases that correspond to the project’s own development.

Phase one is for “Behind-the-scenes” work and community building. It begins with in-network outreach where teammates are tasked with talking about the project with at least ten peers, friends, and mentors. The goal here is to turn this project into a real thing we are attaching our names and faces to, and to begin integrating community-based stakeholders into our thinking of who this project will serve. As part of this phase, we’ll also set up appointments with the Digital Fellow and our own mentors in order to formally get advising on the project and, again, solidify stakeholder relationships. In order to track community growth, we will ask folks for their email and collect them in order to build a mailing list through which we’ll launch the project. By the end of this phase we should have collected a total of 30 emails.

Because this project will take on its final (for now) form as a website, we’ve given considerable thought to how links are shared and preserved in a digital landscape ruled by the ephemerality of algorithmic platforms. We are thinking of ways to land our website’s link in people’s Bookmark folders, Notion pages, Resource guides, spreadsheets, and digital toolkits. We are imagining a user who is “very online,” uses generative AI at work and in their personal life, but otherwise takes great care to educate themselves on everything they consume and engage with. These are people who make and share spreadsheets for fun, and are always looking for new ways to organize their chaotic digital lives. This person is likely an Are.na user (a platform like Pinterest but for designers, artists, academics, “technologists,” etc.). So, in our effort to spread this link as far and wide as it can go, this phase will also identify up to 10 link repositories from all over the internet and, in parallel, build a dedicated Are.na board for the project where we will collect research papers, notes, and inspirations for the visual identity of the project as a way of slowly and quietly embedding the project into a platform that will connect it to its eventual user.

The Are.na board will also serve as a collection of inspirations and resources for the project itself, which will be used to develop a visual identity (colors, typefaces, imagery) for the deck Sasha will use to present last semester’s version of the project during NYC Open Data Week on March 25. For this presentation, we’ll also add a slide with a QR code that links audience members to a form where they can enter their email to stay up to date on the project’s future. By the end of this phase, we hope to have a robust Are.na board, a list of link repositories, and 50 emails that reflect our in-community outreach efforts.

We have decided that, at the current state of this project, it is not worth building an Instagram profile or a presence on a comparable social media platform. Future iterations of this project might benefit from building a social media presence, especially on Instagram. But, for now, the effort required to make it worthwhile is simply not commensurate with what we can expect to get out of it. A successful Instagram launch requires near-daily posting across Posts, Stories, and Reels, high-quality assets, and constant engagement, and we’d risk distorting the project in order to satisfy the platform’s needs.

Instead, we’ll focus on “planting” the website’s link all over the internet and setting it up to spread like spores in the wind. We will submit it to any and all relevant link repositories, email it to our mailing list of what we hope are at least 100 interested parties, and treat each page link within the website as an opportunity to share the project. We expect to circulate the visualizations, literature excerpts and editorial components on Are.na, where the behind-the-scenes posting will have laid the groundwork of audience building. We will email our mailing list when we build a Coming Soon page, to invite folks to the final presentation, and on launch day. We might consider drafting a brief newsletter series adapting our web copy for the following topics: On Methodology, Hallucinations As Cultural Artefacts, “Hallucination,” The Information Ecosystem (which for web-dev purposes will be finalized by April 30th). Or, we might produce a print pamphlet version of this copy and visualizations, to be determined in the final quarter of the project’s lifespan. Our hope is that our professional lives and networks will thus earn us more opportunities to share and discuss this work, and that every phase can benefit from the last.

Lunfardo Outreach Plan

The interactive dictionary of lunfardo is designed to reach college students studying Spanish at the intermediate (B2) level, students preparing to study abroad in Buenos Aires, international students in Buenos Aires, and instructors teaching Intermediate Spanish as well as courses on conversation and composition. By focusing on fifteen commonly used lunfardo terms, the project introduces learners to everyday language used in Argentina, helping them better understand local culture and communication practices.

The primary audience consists of undergraduate Spanish learners who may travel to Buenos Aires through study-abroad programs. For these students, the dictionary provides an accessible introduction to colloquial vocabulary that is not included in traditional textbooks. The resource will also serve instructors teaching intermediate Spanish courses at the college level by offering a digital tool that can be incorporated into classroom activities focused on cultural competence and linguistic variation in Spanish. The team has already contacted professors at local universities who may implement the tool in their classes and on departmental websites.

In addition, the project is intended for international students currently studying in Buenos Aires who may encounter lunfardo expressions in daily interactions with other students, host families, and local communities. By providing contextualized examples and interactive activities, the dictionary supports students’ linguistic adaptation to the city and its local language uses. The team will contact the offices of international studies at universities in Buenos Aires that offer Spanish classes to the international students who arrive to study there. The team will also contact Spanish schools in Buenos Aires, which offer Spanish classes to tourists and students visiting and living in Buenos Aires.

The project will be hosted on a publicly available website created with the DokuWiki platform. In this way, the project can easily be shared via email through academic and education communities. Outreach efforts will also include presentations at conferences, such as the one organized by the Association for Computers and the Humanities (ACH 2026), to which the team has already submitted an abstract. Conferences and workshops organized by the Modern Language Association and the American Association of Teachers of Spanish and Portuguese will also be an option. Finally, the project will be distributed through teaching communities on social media. These strategies aim to encourage the adoption of the resource in intermediate Spanish courses and to support students preparing for study abroad experiences in Buenos Aires.

 

The Terror Team: Social Media Plan

While we still may have a few things to work through a bit more, here are our thoughts on our social media plan!

Social Media and Outreach Plan

Pretty Terrifying Project

Chosen Platform

CUNY Academic Commons will be the chosen platform for posting project updates in a microblog-style format. The microblog will be linked to our main project website.

Post Frequency & Content

The goal is to post at least one blog post per week. The team plans to rotate the post responsibility every week. While one team member is responsible for each week’s post, they will draft it in our Google Drive so the other team members can edit and approve it before posting.

Each week’s update may include the following information:

Written Updates:

  • Milestones achieved during the week
  • Progress on dataset development/visualization design
  • Challenges during research development
  • Lessons learned/future considerations during the development of the project

Multimedia Updates

  • Screenshots of draft visualizations or graphics
  • Screenshots of the website interface
  • Possible videos showing interactive elements
  • Documentation of creation processes

This allows the audience/readers to follow the project’s development and understand the decisions made as it evolves over time. It also provides transparency on the research process rather than presenting only the final results.

Outreach

The project manager will reach out to CUNY Academic Commons groups that might overlap with our project’s interests to ask to share a forum post about our project and include a link to our project updates. The project manager will also research subreddits like r/digitalhumanities and r/gamingfeminism to create posts that share who we are and where to view our website and project updates.

Pre-Launch Checklist

  • Finalize project title
  • Deploy the CUNY Academic Commons website

 

DMP: The AI “Hallucinations” Project

The AI “Hallucinations” Project, incorporating the Black Knowledge Erasure Dataset (BKED) and the forthcoming Puerto Rican history expansion, is a research initiative dedicated to the systematic documentation and analysis of algorithmic epistemic erasure. This Data Management Plan (DMP) outlines the lifecycle of the data, from collection and verification to its long-term preservation on the project’s website.

Variable List

Variable             Description
id                   Unique integer identifier for the record.
prompt_id            Foreign key referencing prompt dataset (e.g., P001).
model                The model that generated the response (e.g., gpt-5, gemini-2.5-flash).
model_response       The full, unedited text output from the model.
error_type           Controlled vocabulary label (e.g., erasure_by_omission, factual_error).
error_description    Detailed qualitative explanation (human annotation) of the identified error.
verification_source  URL or citation used to verify or refute the model claim.
category             Topical category (e.g., black_texts_authors).
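
The variable list above could be mirrored in code as a small record type. This is a minimal sketch rather than the project's actual code: the class name `HallucinationRecord`, the `from_row` helper, and the `ERROR_TYPES` set (filled only with the two example labels given in the list) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: only the two example labels from the variable list are known here;
# the project's real controlled vocabulary is probably longer.
ERROR_TYPES = {"erasure_by_omission", "factual_error"}


@dataclass(frozen=True)
class HallucinationRecord:
    """One annotated row, using the field names from the variable list."""

    id: int
    prompt_id: str
    model: str
    model_response: str
    error_type: str
    error_description: str
    verification_source: str
    category: str

    @classmethod
    def from_row(cls, row: dict) -> "HallucinationRecord":
        """Build a record from a CSV-style dict of strings."""
        if row["error_type"] not in ERROR_TYPES:
            raise ValueError(f"unknown error_type: {row['error_type']!r}")
        return cls(
            id=int(row["id"]),  # CSV values arrive as text, so convert the id
            prompt_id=row["prompt_id"],
            model=row["model"],
            model_response=row["model_response"],
            error_type=row["error_type"],
            error_description=row["error_description"],
            verification_source=row["verification_source"],
            category=row["category"],
        )
```

Rejecting labels outside the vocabulary at load time is one way to keep `error_type` consistent across annotators.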

Data Collection

Data collection utilizes custom Python scripts to interface with model APIs. Sources for ground-truth verification include the Schomburg Center for Research in Black Culture, the Library of Congress, and the Centro de Estudios Puertorriqueños at Hunter College.

Raw Model Outputs: JSON files containing direct responses from Large Language Models (LLMs), specifically GPT-5, Gemini, and Claude, triggered by specific historical and culturally relevant prompts.

Annotated Datasets: CSV files featuring human-verified annotations, which are corrections grounded in archival and library sources. These include categorizations of “error type” (e.g., invented citation, historical omission) and links/citations to “gold-standard” historical evidence.
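
The step from raw JSON outputs to an annotation-ready CSV might look roughly like the sketch below. The shape of the raw JSON (a list of objects with `prompt_id`, `model`, and `response` keys) and the function names are assumptions; the plan does not specify them. The column names come from the variable list.

```python
import csv
import io
import json

FIELDS = [
    "id", "prompt_id", "model", "model_response",
    "error_type", "error_description", "verification_source", "category",
]


def raw_to_annotation_rows(raw_json: str) -> list[dict]:
    """Turn raw model outputs into rows whose annotation columns are left empty."""
    rows = []
    for number, item in enumerate(json.loads(raw_json), start=1):
        row = dict.fromkeys(FIELDS, "")  # annotators fill the blanks later
        row.update(
            id=number,
            prompt_id=item["prompt_id"],
            model=item["model"],
            model_response=item["response"],  # assumed key in the raw file
        )
        rows.append(row)
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the model response untouched and adding the annotation columns alongside it preserves the "full, unedited" output the plan calls for.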

Standards and Metadata

To ensure interoperability and long-term utility for researchers/users, the project adheres to the following standards:

  • Schema Consistency: Data is structured in JSON and CSV formats to facilitate machine readability and easy integration into different analysis environments.
  • Documentation: Documented following C19 Data Collective standards. A comprehensive Data Dictionary and README.md are maintained in the GitHub repository, defining every variable and describing the methodology for human annotation to ensure reproducibility.
  • Quality Assurance: Each entry in the BKED and the Puerto Rican dataset undergoes a “Human-in-the-Loop” process, ensuring every “correction” is cited against a reputable archival source or peer-reviewed publication.
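
The quality assurance rule above (every correction must be cited) lends itself to an automated check before each release. The sketch below is illustrative only: the function name is hypothetical, and it assumes rows are dicts keyed by the variable names in the Data Dictionary.

```python
def uncited_entries(rows: list[dict]) -> list[str]:
    """Return the ids of annotated rows that lack a verification source."""
    return [
        str(row.get("id", ""))
        for row in rows
        # A row counts as annotated once it carries an error_type label.
        if str(row.get("error_type", "")).strip()
        and not str(row.get("verification_source", "")).strip()
    ]
```

An empty result would mean every annotated entry carries a source; it does not check that the source itself is reputable, which remains a human judgment.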

Storage

  • Version Control: GitHub serves as the primary collaborative environment for code and data storage.
  • Security: As the project does not collect Personally Identifiable Information (PII) or sensitive human subject data, security measures focus on integrity and provenance: strict branch protection on GitHub prevents unauthorized modification of the archive.

Access, Sharing, and Licensing

The project prioritizes open access:

  • Public Repository: All code scripts and datasets will be hosted on a public GitHub repository.
  • Licensing: All datasets will be released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. The underlying codebase is licensed under the MIT License.
  • Discovery: The primary access points are the project’s GitHub repository, the forthcoming website, an Are.na channel, etc. Data will be structured in machine-readable formats (CSV, JSON) to facilitate downstream research by the digital humanities community.

Archiving and Long-term Preservation

Upon project completion in May, long-term preservation will be secured through:

  • CUNY Academic Works: The final white paper and a “frozen” version of the dataset will be deposited in CUNY’s institutional repository.
  • Web Archiving: The project website will be submitted to the Internet Archive’s Wayback Machine and preserved as a static site (HTML/CSS/JS) to minimize future technical debt and ensure longevity beyond the current hosting budget.

Voices of Lunfardo Data Management Plan

This data management plan will be implemented and managed by Aaron Helton under the project supervision of Natalia Bustos.

The data produced by this project will include the Lunfardo terms that have been collected and described during the project’s activities, project narratives, a final report, DokuWiki pages for each term, associated page metadata expressed in Dublin Core, and the set of DokuWiki code configurations, modifications, and template customizations necessary to re-create the site from backup.

Each description includes definitions in English and Spanish, instructional exercises, a brief narrative on cultural reflection, and links to tangos in which the terms appear. During the project, the data will be collected in a Google Docs document for editing and revision prior to transfer into the target system. The Google Docs document will not be retained. In their final format, the terms and their narrative elements will be stored on disk as plain text DokuWiki objects consisting of simple markup (the DokuWiki syntax) alongside the elements themselves. Furthermore, all terms will be available in Spanish and English, with Spanish being the default presentation language, which doubles the number of files and metadata records necessary.
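
To make the plain-text storage format concrete, here is a rough sketch of how one term could be rendered into DokuWiki syntax (headings, a bulleted list, and external links). The `TermEntry` class, the `render_dokuwiki` function, and the section labels are assumptions for illustration; the site's actual page template is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """Hypothetical container for one dictionary term in one language."""

    term: str
    definition: str
    reflection: str
    tango_links: list[tuple[str, str]] = field(default_factory=list)  # (label, url)


def render_dokuwiki(entry: TermEntry, labels: dict[str, str]) -> str:
    """Render a term as DokuWiki markup; `labels` holds the section titles."""
    lines = [
        f"====== {entry.term} ======",  # level-1 headline
        "",
        f"===== {labels['definition']} =====",  # level-2 headline
        entry.definition,
        "",
        f"===== {labels['reflection']} =====",
        entry.reflection,
        "",
        f"===== {labels['tangos']} =====",
    ]
    # Unordered list items start with two spaces and an asterisk.
    lines += [f"  * [[{url}|{label}]]" for label, url in entry.tango_links]
    return "\n".join(lines) + "\n"
```

Calling the function twice with Spanish and English labels and text would yield the two files per term that the plan mentions.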

The data will be made publicly available via the Voces del Lunfardo website (https://www.VocesDelLunfardo.org), whose contents will be automatically backed up on a daily basis. Project staff will retain their copyright and other intellectual property rights in the data produced, but have agreed to license these materials under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license to allow for the greatest dissemination of the materials.

Every effort has been made to ensure the continued accessibility of the data produced by this project. The chosen hosting solution includes automatic daily backups, of which the ten most recent are available. Additional periodic backups will be shipped to a separate server to provide offsite backup and recovery capabilities. 
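
The retention policy described above (daily backups, with the ten most recent kept) is handled by the hosting provider, but its logic can be illustrated in a few lines. This sketch is only a model of the policy; the function name and the use of dates as backup identifiers are assumptions.

```python
from datetime import date


def split_backups(backup_dates: list[date], keep: int = 10) -> tuple[list[date], list[date]]:
    """Return (retained, expired), keeping the `keep` most recent backups."""
    ordered = sorted(backup_dates, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]
```

Anything in the expired list would no longer be recoverable from the host, which is why the periodic offsite copies matter.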

Pretty Terrifying Project

Data Management Plan

What data will you collect or create?

This project produces a curated dataset examining feminist themes within horror video games. The following data will be collected for each title:

  • Game Title
  • Wikipedia URL
  • Developers
  • Female Developer Present
  • Release Date
  • Platforms
  • Horror Subgenre
  • Player Perspective
  • Female Protagonist Playable Character?

Themes and keywords related to:

  • Motherhood
  • Domesticity
  • Trauma and Mental Illness
  • Embodiment
  • Captivity
  • Violence
  • Sexualized Violence
  • Girlhood
  • LGBTQ+
  • Creed Archetypes
  • Suggested Supporting Evidence
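
Since the project examines theme co-occurrences, a count of how often two themes are tagged on the same game is the core derived measure. The following is a minimal sketch, assuming each game maps to a set of theme labels; the function name and data shape are illustrative, not the project's actual pipeline.

```python
from collections import Counter
from itertools import combinations


def theme_cooccurrences(games: dict[str, set[str]]) -> Counter:
    """Count how many games carry each unordered pair of themes."""
    counts: Counter = Counter()
    for themes in games.values():
        # Sorting makes ("Captivity", "Girlhood") and the reverse the same key.
        for pair in combinations(sorted(themes), 2):
            counts[pair] += 1
    return counts


# Illustrative input using only the pairings named in the team's update post.
example = {
    "Doki Doki Literature Club": {"Girlhood", "Captivity"},
    "Bloodborne": {"Motherhood", "Violence"},
}
pair_counts = theme_cooccurrences(example)
```

The resulting pair counts are the kind of table a heatmap or network visualization could be drawn from.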

How will the data be collected or created?

Data was collected through a process that began with a Wikipedia web scrape using Python, BeautifulSoup, and wikipediaapi to gather all pages and subcategories within Category:Horror_video_games. Data such as the game’s title, URL, and category were extracted to compile a curated list of games featuring female characters. A classifier was then built using control phrases and/or keywords to identify games featuring female characters and potential feminist themes. Data collection is ongoing and continues to be reviewed using both computational and manual methods.
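
A keyword-based classifier of the kind described above can be sketched as follows. The keyword stems are hypothetical placeholders, since the project's real control phrases are not listed in this plan, and the function name is likewise an assumption.

```python
import re

# Hypothetical stems for two of the themes; the real lists are not given here.
THEME_KEYWORDS = {
    "Motherhood": ["mother", "maternal", "pregnan"],
    "Captivity": ["captive", "imprison", "trapped"],
}


def tag_themes(text: str, keywords: dict[str, list[str]] = THEME_KEYWORDS) -> set[str]:
    """Return the themes whose stems appear at the start of a word in `text`."""
    lowered = text.lower()
    return {
        theme
        for theme, stems in keywords.items()
        # \b anchors the stem to a word start, so "smothered" does not match "mother".
        if any(re.search(rf"\b{re.escape(stem)}", lowered) for stem in stems)
    }
```

Matches from a rule like this are candidates only, which fits the plan's note that results are also reviewed manually.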

Documentation and Metadata

Basic documentation and process methodology will be provided.

Documentation will include the following:

  • Data_Dictionary
  • Methodology
  • Basic Software Requirements
  • README

Ethics and Legal Compliance

How will you manage any ethical issues?

This project does not contain any personal information of individuals. All materials are publicly available.

Storage and Backup

How will the data be stored and backed up during the research?

During the research and development stage of this project, all information will be stored in cloud storage using the following platforms:

  • Google Drive
  • GitHub

Both will be updated regularly by team members, and changes will be saved automatically.

How will you manage access and security?

The final dissemination of this product will be a public-facing digital website accessible to all. All data collected will also be openly available. During the research phase, only team members and the faculty advisor, as needed, will be granted access to the platforms used.

Selection and Preservation

Which data are of long-term value and should be retained, shared, and/or preserved?

  • The dataset of horror games and their feminist themes itself
  • The data visualizations created from that dataset
  • The code for the website presenting those visualizations
  • Any accompanying documentation and written material connected to the project

What is the long-term preservation plan for the dataset?

We will store the dataset in a data repository such as Kaggle, and will look into institutional repositories like CUNY Academic Works for the preservation of other aspects of the project. Additionally, we plan to keep the website accessible for as long as possible, using free hosting on GitHub Pages.

Data Sharing

How will you share the data?

Final data will be shared via GitHub and openly available to the public for review or research purposes.

Are any restrictions on data sharing required?

Data produced through this project will be available under a Creative Commons Attribution (CC BY) license.