Lunfardo Outreach Plan

The interactive dictionary of lunfardo is designed to reach college students studying Spanish at the intermediate (B2) level, students preparing to study abroad in Buenos Aires, international students in Buenos Aires, and instructors teaching Intermediate Spanish as well as courses on conversation and composition. By focusing on fifteen commonly used lunfardo terms, the project introduces learners to everyday language used in Argentina, helping them better understand local culture and communication practices.

The primary audience consists of undergraduate Spanish learners who may travel to Buenos Aires through study-abroad programs. For these students, the dictionary provides an accessible introduction to colloquial vocabulary that is not included in traditional textbooks. The resource will also serve instructors teaching intermediate Spanish courses at the college level by offering a digital tool that can be incorporated into classroom activities focused on cultural competence and linguistic variation in Spanish. The team has already contacted professors at local universities who may use the tool in their classes and feature it on departmental websites.

In addition, the project is intended for international students currently studying in Buenos Aires who may encounter lunfardo expressions in daily interactions with other students, host families, and local communities. By providing contextualized examples and interactive activities, the dictionary supports students’ linguistic adaptation to the city and its local language uses. The team will contact the offices of international studies at universities in Buenos Aires that offer Spanish classes to incoming international students. The team will also contact Spanish schools in Buenos Aires, which offer Spanish classes to tourists and students visiting or living in the city.

The project will be hosted on a publicly available website built on the DokuWiki platform, so it can easily be shared by email across academic and educational communities. Outreach efforts will also include presentations at conferences, such as the one organized by the Association for Computers and the Humanities (ACH 2026), to which the team has already submitted an abstract. Conferences and workshops organized by the Modern Language Association and the American Association of Teachers of Spanish and Portuguese are further options. Finally, the project will be distributed through teaching communities on social media. These strategies aim to encourage the adoption of the resource in intermediate Spanish courses and to support students preparing for study abroad experiences in Buenos Aires.

 

The Terror Team: Social Media Plan

While we still may have a few things to work through a bit more, here are our thoughts on our social media plan!

Social Media and Outreach Plan

Pretty Terrifying Project

Chosen Platform

CUNY Academic Commons will be the platform for posting project updates in a microblog-style format. The microblog will be linked to our main project website.

Post Frequency & Content

The goal is to publish at least one blog post per week. The team plans to rotate posting responsibility every week. The team member responsible for a given week’s post will draft it in our Google Drive so that other team members can edit and approve it before posting.

Each week’s update may include the following information:

Written Updates:

  • Milestones achieved during the week
  • Progress on dataset development/visualization design
  • Challenges during research development
  • Lessons learned/future considerations during the development of the project

Multimedia Updates

  • Screenshots of draft visualizations or graphics
  • Screenshots of the website interface
  • Possible videos showing interactive elements
  • Documentation of creation processes

This allows readers to follow the project’s development and understand the decisions made as it evolves over time. It also provides transparency about the research process rather than presenting only the final results.

Outreach

The project manager will reach out to CUNY Academic Commons groups that might overlap with our project’s interests to ask to share a forum post about our project and include a link to our project updates. The project manager will also research subreddits like r/digitalhumanities and r/gamingfeminism to create posts that share who we are and where to view our website and project updates.

Pre-Launch Checklist

  • Finalize project title
  • Deploy the CUNY Academic Commons website

 

Excuse Me While I Scream

Developing a website always seems fun and cool until you’re sitting there with about 100 tabs open in your web browser, trying in vain to figure out how to do something simple like center some text on the page, after having spent hours earlier that day battling dependency errors and just trying to work out the file structure of the theme you’re using.

Excuse me while I scream.

Yelling at computers aside though, this week did bring a lot of progress, even if it was hard fought. Programming is inherently an unpredictable pursuit. When you write something, the words usually come out as you expect them to (unless you’re writing on a really messed up keyboard). Writing is just a matter of figuring out exactly what you want to say and how to say it. Literally putting the words on paper is the easy part. But when coding, you can know exactly what you want to do and how, and you can still be foiled by something tiny you overlooked somewhere. Making a typo when you write? It’s fine, maybe someone will laugh at you a little, but that’s the worst that’s going to happen. Making a typo in code? Whoops, the whole thing might not run now!

So yeah, coding can be frustrating. Especially when you have to do it with a deadline. But! When you do finally get something that works and looks cool, then it’s a great feeling, like you’ve reached the light at the end of a long and spooky tunnel.

Of course, sometimes the light is a trap. Anglerfish are known for lurking deep in the dark zones of the ocean, shining their lights like beacons. And when small fish draw near, eager for a little light and warmth – the anglerfish chomps down on them. In horror, this kind of twisting of expectations is common. One example of this that we’ve encountered repeatedly in this project is horror around motherhood. Stereotypical depictions of mothers paint them as nurturing and loving. This is an image that stems from the expectations placed onto women to manage the household and remain devoted to their children above all else, even while the children’s father might go out to work and barely see the kids. But as Barbara Creed’s feminist horror archetypes show, motherhood can also frequently be scary. According to Creed, the figure of the mother in horror can represent a primal terror that seeks to consume what it once birthed, a manipulator molding her children in her image and using them for her gains, or a progenitor of evil who births monsters, willingly or not. These archetypes, though still pushing women into narrow boxes in some sense, also push back against the idea that women love being mothers and are happy to suppress their own senses of self for the purpose of caring for their children. It takes this in the extreme opposite direction, by showing us mothers who want to kill and consume or use their children as tools. And that last archetype, what Creed calls the “Monstrous Womb”, depicts women who give birth to terrors they may not want. I can imagine seeing horror like that being quite cathartic to women who were pressured into motherhood or who felt there was something wrong with them for not feeling only positive and nurturing things toward their children. Because sometimes even normal kids can be like monsters, and the depiction of the monstrous womb legitimizes that feeling for women typically expected to just be docile and happy.

Anyway, that post went all over the place. I guess my conclusion is this. Your kids might act like monsters. The websites you make might crash. No matter how much mothers are traditionally depicted as kind and perfect carers for their children, or hackers are depicted as techno-magicians who can effortlessly get computers to do what they want – in reality, there’s always the chance of something going wrong, of terror befalling you no matter how hard you work to fit into your designated role. It’s just another way that it’s interesting to explore horror as an exaggeration of the mundane.

The website is working, though! For now. The terror of debugging will return, no question about it.

DMP for Life

I have to confess that this week got away from me. I ended up being much busier with my day job than expected, which thwarted my plans to steal company time and join Sasha’s skill share session, a GitHub and VSCode walkthrough that will prove foundational to our data curation efforts further down the line. But Sasha was generous enough to record the session and add the links to our shared documents, so I am all set to walk myself through the workshop and touch base with her next week with any questions. All of this to say that the first data-management technique I need to apply to everything I do is keeping a better calendar!

All jokes aside, between some creative coding classes and working in an editorial context where a piece of writing goes through as many as four versions, each of them shared among multiple parties and requiring strict version control, I had picked up a few tricks over time: smarter file naming conventions, copies spread out across multiple sources and locations, documentation (READMEs especially), version control. But Steve Zweibel’s Research Data Management presentation was so useful I saved the link to all my workspaces. Just having access to the language of data management (types of research data by origin or form, the data life cycle, FAIR principles, etc.) gave some purpose and unification to what has long been a set of haphazard practices I use to work.

When I think about data degradation, I am more likely to think about link rot or bit rot, the actual deterioration and degradation of the mechanisms of saving information. But Michener’s “Data Entropy” diagram charts how distance from data’s original context and from the people who created it is also a factor that data management is meant to mitigate. We take pictures, keep scrapbooks, memento boxes, and travel souvenirs, and ask our elders to rehash the same stories so we can keep the immediacy that binds us to the source of information. Like links, bits, and pictures, any form of data collection is subject to deterioration. So in truth, link rot is not too different from dry rot, and in both cases, “future you is your first user.”

Mentally Expanding Definitions of Data Management

“Data management” initially evokes a mental image of simply saving a bunch of documents onto a personal computer. Of course, this idea is minuscule compared to the actual process of data management, which depends significantly on organization. Organizing does not just mean that files are recognizable to an individual’s own memory, though. Rather, files and documents should be named in a manner that indicates the specifics of the information found inside. From the standpoint of historical research, data management mostly involves organizing the sources and other material that will be used in studying and publishing scholarship. The definition of data management expands, however, when one is engaged in a digital project that entails website creation.

Entering the AI Hallucinations project as Research Lead has pushed me to figure out what data management will look like from my end. While that means searching for relevant sources and helping my teammates do the same, it also means taking note of “hallucinations” and of the decisions made when performing research to verify or refute a claim. Granted, I do not have to maintain the compatibility of digital material. Yet a “hallucination” in historical fact could possibly be anything, which points to an objective facilitation of misunderstood history. It is a specific detail that I, and the rest of the research group, will eventually have to study in order to gain further insight into knowledge errors generated by AI software.

Personal Data Hygiene?

Last semester during one of the Intro to DH discussions, probably on archiving or something related, I recall mentioning my interest in personal digital archiving. Given that I work in a library and make library software, it’s natural to assume I have my own house in order on this front. I wouldn’t characterize it as “in order” so much as “here’s some stuff I’ve thought about and some stuff I practice”. It’s true that I have maybe more tools and established practice at hand, but I think that, like most anyone else, I live in fear that somewhere in this Rube Goldberg contraption of overlapping cloud and local copies of stuff, there’s a piece missing that I can’t see.

The concept of data hygiene comes from data collection practices, which those of you who have constructed datasets may have encountered. Within this practice, its main purpose is to ensure data is clean, consistent, organized, and well presented. As a practice, it represents an ongoing activity rather than a single event, something you have to revisit from time to time as a means of protecting that data.

Within the context of data management more generally, I see it as encompassing more than just the transactional quality of the data and its continued fit for purpose. Fundamentally, you can’t use data you can’t access. So in my view data hygiene must include ensuring continued accessibility of the data. That’s why so much of the Data Management Plan centers around where the data will be stored, what measures you will take to ensure that you can access it, and to ensure that some casual disaster doesn’t wipe it out.

Okay so I don’t have a Data Management Plan for my own personal data. I could, I suppose. I suspect I know people who have or would love to make such a thing. I am not one of those people, despite my weird spreadsheets I use to track things only I’m interested in. But I’m just paranoid enough about data loss that I have some tools and some practices.

Perhaps the best encapsulation of those tools and practices is the Calibre Web server I run for myself. It houses the 25 gigabytes of tabletop role-playing game materials (mostly PDFs) I’ve collected to date. Its goals include collecting, then later being able to find those materials. I search it regularly. And yet it’s still a mess of incomplete metadata and half-baked tagging and organization systems. I would need a dedicated project to properly catalog all of this material. And it’s not backed up regularly in part because I can theoretically re-create it.

Elsewhere, of course, I have extensive collections of notes, photographs, documents, and more specialized files residing in folders created by art and music programs. I have a handle on only some of these, mostly the notes and photographs, which are regularly backed up via paid cloud services, and which I open regularly, occasionally migrate elsewhere, and otherwise keep at hand.

Is this sufficient? In looking at the contents of a Data Management Plan, and knowing what I know about business continuity and disaster recovery, I suspect it’s better than nothing, but insufficient. At the very least, going through the exercise of creating the plan gave me a chance to reflect on my own practices.

DMP: The AI “Hallucinations” Project

The AI “Hallucinations” Project, incorporating the Black Knowledge Erasure Dataset (BKED) and the forthcoming Puerto Rican history expansion, is a research initiative dedicated to the systematic documentation and analysis of algorithmic epistemic erasure. This Data Management Plan (DMP) outlines the lifecycle of the data, from collection and verification to its long-term preservation on the project’s website.

Variable List

Variable             Description
id                   Unique integer identifier for the record.
prompt_id            Foreign key referencing prompt dataset (e.g., P001).
model                The model that generated the response (e.g., gpt-5, gemini-2.5-flash).
model_response       The full, unedited text output from the model.
error_type           Controlled vocabulary label (e.g., erasure_by_omission, factual_error).
error_description    Detailed qualitative explanation (human annotation) of the identified error.
verification_source  URL or citation used to verify or refute the model claim.
category             Topical category (e.g., black_texts_authors).
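To make the variable list concrete, here is a minimal sketch of one record as a Python dataclass. The class name `AnnotatedResponse` and the `ERROR_TYPES` set are hypothetical. The set holds only the two labels named in the list above, and the project's full controlled vocabulary may well be larger.

```python
from dataclasses import dataclass

# Assumption: only the two example labels from the variable list are known here.
ERROR_TYPES = {"erasure_by_omission", "factual_error"}


@dataclass(frozen=True)
class AnnotatedResponse:
    """One annotated record, with one field per variable in the list."""
    id: int
    prompt_id: str
    model: str
    model_response: str
    error_type: str
    error_description: str
    verification_source: str
    category: str

    def __post_init__(self):
        # Reject labels that fall outside the controlled vocabulary.
        if self.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error_type: {self.error_type!r}")


# Placeholder values only; no real model output or citation is reproduced here.
example = AnnotatedResponse(
    id=1,
    prompt_id="P001",
    model="gpt-5",
    model_response="(full model output goes here)",
    error_type="erasure_by_omission",
    error_description="(human annotation goes here)",
    verification_source="(URL or citation goes here)",
    category="black_texts_authors",
)
```

Checking the label at construction time is one way to keep a controlled vocabulary from drifting as several annotators add rows.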

Data Collection

Data collection utilizes custom Python scripts to interface with model APIs. Sources for ground-truth verification include the Schomburg Center for Research in Black Culture, the Library of Congress, and the Centro de Estudios Puertorriqueños at Hunter College.

Raw Model Outputs: JSON files containing direct responses from Large Language Models (LLMs), specifically GPT-5, Gemini, and Claude, triggered by specific historical and culturally relevant prompts.

Annotated Datasets: CSV files featuring human-verified annotations, which are corrections grounded in archival and library sources. These will include categorizations of “error type” (e.g., invented citation, historical omission) and links/citations to “gold-standard” historical evidence.
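As a rough illustration of how raw JSON outputs could become a CSV ready for annotation, the sketch below converts a JSON array into rows with the annotation columns left blank. The shape of the raw JSON and the function names `raw_to_annotation_rows` and `rows_to_csv` are assumptions for illustration, not the project's actual scripts.

```python
import csv
import io
import json

# Column order follows the project's variable list.
FIELDS = ["id", "prompt_id", "model", "model_response",
          "error_type", "error_description", "verification_source", "category"]


def raw_to_annotation_rows(raw_json: str) -> list[dict]:
    """Turn raw model outputs (assumed to be a JSON array) into rows awaiting annotation."""
    rows = []
    for i, record in enumerate(json.loads(raw_json), start=1):
        rows.append({
            "id": i,
            "prompt_id": record["prompt_id"],
            "model": record["model"],
            "model_response": record["model_response"],
            # Left blank on purpose: a human annotator fills these in later.
            "error_type": "",
            "error_description": "",
            "verification_source": "",
            "category": record.get("category", ""),
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder record; the response text is not a real model output.
raw = json.dumps([
    {"prompt_id": "P001", "model": "gpt-5",
     "model_response": "(placeholder response)", "category": "black_texts_authors"},
])
csv_text = rows_to_csv(raw_to_annotation_rows(raw))
```

Keeping the raw JSON untouched and deriving the CSV from it means the unedited model output stays available if an annotation is later questioned.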

Standards and Metadata

To ensure interoperability and long-term utility for researchers/users, the project adheres to the following standards:

  • Schema Consistency: Data is structured in JSON and CSV formats to facilitate machine readability and easy integration into different analysis environments.
  • Documentation: Documented following C19 Data Collective standards. A comprehensive Data Dictionary and README.md are maintained in the GitHub repository, defining every variable and describing the methodology for human annotation to ensure reproducibility.
  • Quality Assurance: Each entry in the BKED and the Puerto Rican dataset undergoes a “Human-in-the-Loop” process, ensuring every “correction” is cited against a reputable archival source or peer-reviewed publication.
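Part of the quality-assurance rule above can be checked mechanically. The sketch below flags rows that record an error type but cite no verification source. The function name `rows_missing_citation` and the sample rows are hypothetical, and a check like this would support the human review described above, not replace it.

```python
import csv
import io


def rows_missing_citation(csv_text: str) -> list[str]:
    """Return the ids of rows that flag an error but cite no source."""
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        has_error = row["error_type"].strip() != ""
        has_source = row["verification_source"].strip() != ""
        if has_error and not has_source:
            missing.append(row["id"])
    return missing


# Hypothetical sample: row 2 has an error label but no citation; row 3 is not yet annotated.
sample = (
    "id,error_type,verification_source\n"
    "1,factual_error,(citation placeholder)\n"
    "2,erasure_by_omission,\n"
    "3,,\n"
)
print(rows_missing_citation(sample))  # prints ['2']
```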

Storage

  • Version Control: GitHub serves as the primary collaborative environment for code and data storage.
  • Security: As the project does not collect Personally Identifiable Information (PII) or sensitive human subject data, security measures focus on integrity and provenance: preventing unauthorized modification of the archive through strict branch protection on GitHub.
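Branch protection is the measure the plan names. As a complementary illustration only, and not something the plan itself specifies, a checksum manifest is one common way to detect whether a data file has changed since it was recorded. The function names below are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its SHA-256 digest."""
    return {
        str(p.relative_to(data_dir)): sha256_of(p)
        for p in sorted(data_dir.rglob("*")) if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List recorded files that are now missing or whose digest differs."""
    current = build_manifest(data_dir)
    return sorted(name for name in manifest if current.get(name) != manifest[name])


# Demonstration in a temporary directory with a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "annotations.csv").write_text("id,error_type\n1,factual_error\n")
    manifest = build_manifest(data_dir)
    unchanged = changed_files(data_dir, manifest)   # nothing has changed yet
    (data_dir / "annotations.csv").write_text("id,error_type\n1,edited\n")
    modified = changed_files(data_dir, manifest)    # the edit is detected
```

A manifest like this could also travel with a deposited copy of the dataset so later users can confirm they have the same files.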

Access, Sharing, and Licensing

The project prioritizes open access:

  • Public Repository: All code scripts and datasets will be hosted on a public GitHub repository.
  • Licensing: All datasets will be released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. The underlying codebase is licensed under the MIT License.
  • Discovery: The primary access points include the project’s GitHub repository, the forthcoming website, and an Are.na channel. Data will be structured in machine-readable formats (CSV, JSON) to facilitate downstream research by the digital humanities community.

Archiving and Long-term Preservation

Upon project completion in May, long-term preservation will be secured through:

  • CUNY Academic Works: The final white paper and a “frozen” version of the dataset will be deposited in CUNY’s institutional repository.
  • Web Archiving: The project website will be submitted to the Internet Archive’s Wayback Machine and preserved as a static site (HTML/CSS/JS) to minimize future technical debt and ensure longevity beyond the current hosting budget.

Voices of Lunfardo Data Management Plan

This data management plan will be implemented and managed by Aaron Helton under the project supervision of Natalia Bustos.

The data produced by this project will include the Lunfardo terms that have been collected and described during the project’s activities, project narratives, a final report, a DokuWiki page for each term, associated page metadata expressed in Dublin Core, and the set of DokuWiki code configurations, modifications, and template customizations necessary to re-create the site from backup.

Each description includes definitions in English and Spanish, instructional exercises, a brief narrative on cultural reflection, and links to tangos in which the terms appear. During the project, the data will be collected in a Google Docs document for editing and revision prior to transfer into the target system. The Google Docs document will not be retained. In their final format, the terms and their narrative elements will be stored on disk as plain text DokuWiki objects consisting of simple markup (the DokuWiki syntax) alongside the elements themselves. Furthermore, all terms will be available in Spanish and English, with Spanish being the default presentation language, which doubles the number of files and the amount of metadata required.
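To give a sense of what a plain text page in DokuWiki syntax might look like, the sketch below renders one term into headings, paragraphs, and a list of links. The function `render_term_page`, the section titles, and all values are placeholders. The project's actual page template and its Dublin Core metadata are not shown here.

```python
def render_term_page(term: str, definition: str, reflection: str,
                     tangos: list[tuple[str, str]]) -> str:
    """Render one term as DokuWiki markup: headings, paragraphs, and a link list."""
    lines = [
        f"====== {term} ======",          # top-level DokuWiki heading
        "",
        "===== Definition =====",
        definition,
        "",
        "===== Cultural reflection =====",
        reflection,
        "",
        "===== Tangos =====",
    ]
    for title, url in tangos:
        # DokuWiki list items start with two spaces and an asterisk;
        # external links use the [[url|label]] form.
        lines.append(f"  * [[{url}|{title}]]")
    return "\n".join(lines) + "\n"


# Placeholder content only; no actual dictionary entry is reproduced.
page = render_term_page(
    term="TERM",
    definition="(English definition placeholder)",
    reflection="(cultural reflection placeholder)",
    tangos=[("(tango title placeholder)", "https://example.org/tango")],
)
print(page)
```

Under this sketch, the Spanish and English versions of a term would be two separate calls producing two separate files, which is where the doubling mentioned above comes from.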

The data will be made publicly available via the Voces del Lunfardo website (https://www.VocesDelLunfardo.org), whose contents will be automatically backed up on a daily basis. Project staff will retain their copyright and other intellectual property rights in the data produced, but have agreed to license these materials under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license to allow for the greatest dissemination of the materials.

Every effort has been made to ensure the continued accessibility of the data produced by this project. The chosen hosting solution includes automatic daily backups, of which the ten most recent are available. Additional periodic backups will be shipped to a separate server to provide offsite backup and recovery capabilities. 
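The practical consequence of keeping only the ten most recent daily backups is a ten-day recovery window. The sketch below models that retention rule in isolation. It assumes nothing about how the hosting provider actually rotates backups, and the function name `split_backups` and the dates are made up.

```python
from datetime import date, timedelta


def split_backups(backup_dates: list[date], keep: int = 10) -> tuple[list[date], list[date]]:
    """Return (kept, pruned): the `keep` newest backups and everything older."""
    newest_first = sorted(backup_dates, reverse=True)
    return newest_first[:keep], newest_first[keep:]


# Fourteen consecutive daily backups ending on an arbitrary example date.
last_day = date(2026, 1, 14)
daily = [last_day - timedelta(days=n) for n in range(14)]
kept, pruned = split_backups(daily)

print(kept[-1])     # oldest backup still available: 2026-01-05
print(len(pruned))  # backups that have aged out: 4
```

Anything damaged more than ten days before it is noticed would have to come from the separate offsite copies, which is one reason the periodic offsite backups matter.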

Pretty Terrifying Project

Data Management Plan

What data will you collect or create?

This project produces a curated dataset examining feminist themes within horror video games. The following data will be collected for each title:

  • Game Title
  • Wikipedia URL
  • Developers
  • Female Developer Present
  • Release Date
  • Platforms
  • Horror Subgenre
  • Player Perspective
  • Female Protagonist Playable Character?

Themes and keywords related to:

  • Motherhood
  • Domesticity
  • Trauma and Mental Illness
  • Embodiment
  • Captivity
  • Violence
  • Sexualized Violence
  • Girlhood
  • LGBTQ+
  • Creed Archetypes
  • Suggested Supporting Evidence

How will the data be collected or created?

Data was collected through a process that began with a Wikipedia web scrape using Python, BeautifulSoup, and wikipediaapi to gather all pages and subcategories within Category:Horror_video_games. Data such as each game’s title, URL, and category were extracted to compile a curated list of games featuring female characters. A classifier was then built using control phrases and/or keywords to identify games featuring female characters and potential feminist themes. Data collection is ongoing and continues to be reviewed using both computational and manual methods.
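A keyword-based classifier of the kind described can be sketched in a few lines. The keyword stems below are hypothetical and chosen only for illustration, and the summary text is invented. The project's actual control phrases and matching rules are not given in this plan.

```python
import re

# Hypothetical keyword stems for two of the themes listed earlier in the plan.
THEME_KEYWORDS = {
    "Motherhood": ["mother", "pregnan", "daughter"],
    "Captivity": ["captive", "imprison", "kidnap"],
}


def tag_themes(text: str, keywords: dict[str, list[str]] = THEME_KEYWORDS) -> list[str]:
    """Return the themes whose keyword stems appear at the start of a word in `text`."""
    found = []
    for theme, stems in keywords.items():
        # \b anchors each stem to a word start, so "smother" does not match "mother".
        pattern = r"\b(?:" + "|".join(re.escape(s) for s in stems) + r")"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(theme)
    return found


# Invented summary text, not taken from any real game page.
summary = "A pregnant woman is kidnapped and must escape a remote cabin."
print(tag_themes(summary))  # prints ['Motherhood', 'Captivity']
```

Stem matching like this is cheap and transparent but produces false positives and misses, which fits the plan's note that results are also reviewed manually.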

Documentation and Metadata

Basic documentation and process methodology will be provided.

Documentation will include the following:

  • Data_Dictionary
  • Methodology
  • Basic Software Requirements
  • README

Ethics and Legal Compliance

How will you manage any ethical issues?

This project does not contain any personal information of individuals. All materials are publicly available.

Storage and Backup

How will the data be stored and backed up during the research?

During the research and development stage of this project, all information will be stored in cloud storage using the following platforms:

  • Google Drive
  • GitHub

Both will be updated regularly by team members, and changes will be saved automatically.

How will you manage access and security?

The final dissemination of this product will be a public-facing digital website accessible to all. All data collected will also be openly available. During the research phase, only team members and the faculty advisor, as needed, will be granted access to the platforms used.

Selection and Preservation

Which data are of long-term value and should be retained, shared, and/or preserved?

  • The dataset of horror games and their feminist themes itself
  • The data visualizations created from that dataset
  • The code for the website presenting those visualizations
  • Any accompanying documentation and written material connected to the project

What is the long-term preservation plan for the dataset?

We will store the dataset in a data repository such as Kaggle, and will look into institutional repositories like CUNY Academic Works for the preservation of other aspects of the project. Additionally, we plan to keep the website accessible for as long as possible, using free hosting on GitHub Pages.

Data Sharing

How will you share the data?

Final data will be shared via GitHub and openly available to the public for review or research purposes.

Are any restrictions on data sharing required?

Data produced through this project will be available under a Creative Commons Attribution (CC BY) license.

A Trail of Data Breadcrumbs

One thing that stuck out to me from last class’s presentation was the quote: “Future you is your first user.” It’s easy to get lost in the muddle of guidelines and standards, the confusion of what you think you’re supposed to be doing. But imagining yourself coming back in 10, 20 years – or heck, even just a few weeks – and trying to figure out what in the world you were doing makes data management a lot easier to understand. What can you do to make life a little easier for your future self? What, when you’re looking at others’ projects, do you wish existed to explore the core data that went into their work?

Many horror stories somehow involve navigation. Whether a haunted house, a creepy forest, or an abandoned building, horror tends to evoke some fear of the unknown, exemplified by shadowy places where you can easily get lost if you aren’t careful. So what’s a poor, terrified horror protagonist to do? One old reliable strategy is to create a path so you can return the way you came. This is what Hansel and Gretel did by leaving breadcrumbs so they could find their way back after escaping an evil witch who tried to cook and eat them (a fairy tale? yeah, no that is absolutely a horror story).

And leaving that path – including planning in advance how you will leave it – is kind of what data management is about!

Using data management as a way to primarily orient yourself within your project might seem kind of a selfish approach – after all, the types of organization methods that instinctively make sense to you won’t always be easily comprehensible to others. But leaving a trail for your future self is just the first step. Once you know that everything is planned, organized and documented so you can jump right back in after a break or distraction, it’s a lot easier to expand from there. If you prefer sharing data on GitHub but a teammate prefers Google Drive, then great, it’s easy enough to copy data to another platform, so long as you make a point to keep all copies of the dataset in sync. If a data dictionary Excel spreadsheet works perfectly as a way for you to make sense of your data, but you’re presenting to a group full of visual learners, then you can create a visual aid. And it will be a lot easier to have a spreadsheet that makes perfect sense to you to pull from when making that visual aid than if you just used the raw dataset. Basically, what I’m saying is that if you start with a data management plan that makes life easier for you, it becomes a lot easier to start making life easier for anyone else who may want to use your data. Because having done the thing that makes it all make sense to you, you’re now coming from a place of understanding, understanding that you can share.

Maybe for you it’s breadcrumbs you leave, for someone else it’s a thread, or markings on trees, or a sequence of landmarks memorized by heart. It doesn’t matter how you learn the route through the scary unknown, because once you learn it, it’s known and significantly less scary, and you can teach it to other people too.

A fun tangent: While refreshing my memory on the breadcrumbs in Hansel and Gretel, I came across this page, which ties the concept to web design: https://www.syntagm.co.uk/design/articles/breadcrumbs.htm. This is something worth keeping in mind if we do end up with multiple pages on the site – ensuring users can easily return to the home page if they end up elsewhere from a search. Fun how sometimes silly metaphors loop back around to being relevant to the original thing that spawned them!