Still learning and navigating unique perspectives while affirming data

The past week was trying. Apart from dealing with serious allergy flare-ups, I can attest to spending more time engineering my prompts than I originally anticipated before running them across 3 LLMs. The Hallucinations methodology requires each teammate to investigate and create prompts that make historical inquiries about details attributable to a particular document. I had assumed that the task would be as simple as finding a letter, advertisement, document, or other source that gives a fact which could easily become the basis of a prompt.

My assumptions were not entirely wrong, although I realized that I had not accounted for the challenge of arriving at a usable source. More specifically, I had to face the prolonged task of reading and skimming through long manuscripts, as well as letters between state officials that were merely conversations. I successfully completed my 15 prompts, though I changed 2 of them, not drastically, since I did not deviate from the original source cited. Next, the algorithms ran the prompts across 3 LLMs – Claude, Gemini, and ChatGPT – until 45 different responses were generated. Receiving my prompt responses evoked a brief feeling of relief, knowing that a significant task was completed… However, not entirely.
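
For readers curious what this step can look like in code, here is a minimal sketch, not the project's actual script. It assumes each model can be wrapped as a function that takes a prompt and returns text; the stub functions, the placeholder prompts, and the column names are all hypothetical stand-ins. It only illustrates why 15 prompts run across 3 models yield 45 responses, and one way to keep them in a shareable table.

```python
import csv
import io
from typing import Callable, Dict, List


def make_stub(model_name: str) -> Callable[[str], str]:
    """Return a hypothetical stand-in for a real API client.

    A real version would call the provider's API; this one only echoes
    the prompt so the sketch runs without keys or network access.
    """
    def ask(prompt: str) -> str:
        return f"[{model_name}] placeholder response to: {prompt}"
    return ask


# The three models named in the post, each mapped to a stub (assumption).
MODELS: Dict[str, Callable[[str], str]] = {
    "Claude": make_stub("Claude"),
    "Gemini": make_stub("Gemini"),
    "ChatGPT": make_stub("ChatGPT"),
}

FIELDS = ["prompt_id", "model", "prompt", "response"]


def run_prompts(prompts: List[str],
                models: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Send every prompt to every model and collect one row per response."""
    rows = []
    for prompt_id, prompt in enumerate(prompts, start=1):
        for model_name, ask in models.items():
            rows.append({
                "prompt_id": prompt_id,
                "model": model_name,
                "prompt": prompt,
                "response": ask(prompt),
            })
    return rows


def rows_to_csv(rows: List[dict]) -> str:
    """Serialize the rows as CSV text so they can be stored and shared."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder prompts; the real ones are tied to archival sources.
prompts = [f"Placeholder prompt {n}" for n in range(1, 16)]
rows = run_prompts(prompts, MODELS)
csv_text = rows_to_csv(rows)
print(len(rows))  # 15 prompts x 3 models
```

Keeping the prompt ID and model name on every row means each response can later be fact-checked against the source document behind its prompt.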

The next objective is to solidify our data by fact-checking. I originally had the impression that this task would be accomplished by reading through the responses, identifying the hallucinations, and then describing them, while researching the details of Puerto Rico’s past that our prompts ask about. Again, like the assumption explained earlier, I am certain this impression of the fact-checking job is not entirely wrong. On the other hand, meeting with GC Digital Fellows Eunah Cho and Lisa Rhody left me with thought-provoking questions about how to fact-check. One point that stays with me: Rhody mentioned that an AI language model that asks clarifying questions, and perhaps indicates when it does not have a definitive answer to a historical inquiry, is better than one that confidently gives something wrong. Moreover, it is better for a language model to be honest about where it struggles or is lacking, since it is trying to maintain a sense of trustworthiness as a source for learning history. There is more nuance to these perspectives, of course, but I pondered how they can be leveraged for the task. I was especially unsure how to apply a history-studying background to fact-checking AI prompt responses. Eunah Cho and Lisa Rhody collectively provided a few questions for the Hallucinations group, one of which asks us to figure out a range of error. In other words, when does error arrive at the point of committing harm in belief, understanding, historical and socio-cultural consciousness, etc.? I wish to start fact-checking as soon as possible, knowing that I will be out of state, but most importantly, I enter it with a clarified idea of how to confirm areas of accuracy and error, regardless of whether an LLM provided the desired responses.
Hopefully, the Visualization Data Lab will provide me with an additional perspective, especially since I plan to discuss with my teammates how to proceed with the factchecking objective. 

Tying Up Loose Ends

The process of making a project often comes in fits and starts. Little pieces are created one by one, and in the beginning, it’s hard to see the cohesive whole they’ll all fit into.

One of the really fun moments of any creative activity is watching those pieces come together, and finally starting to see the bigger picture come into focus. That’s the stage we seem to be at now, which is exciting. Of course, that doesn’t mean we can just put our feet up and relax. Tying everything together is what separates a collection of interesting ideas from an actual project. No matter how good each piece is, if we can’t grab our audience’s attention and bring them along on some kind of journey, then there isn’t much point in our work. After all, the point of this project is for others to make use of it.

To that end, we’re finalizing the look of our website, and the text that goes on it. I gave some comments on my group-mates’ close readings, and after Naila’s adjustments to the website, I’m excited to give the site another pass to fix the bugs and smooth some rough edges. The website isn’t completely how I pictured it would be at first, but I do really like what we’ve come up with. Initially, we were just going to have charts and text stacked in a long vertical scroll. But now, some elements are also placed side by side. I think this makes scrolling more interesting – you get to see the layout switch up over time – and it also lets the user of the website view two elements together instead of needing to scroll between them. I think having that flexibility is important – even this late in the process, sometimes you need to make decisions that depart from things you decided earlier.

I’m also excited to continue revising the text during our next class and beyond. Bringing our three different writing styles together is another example of making pieces into a whole. It’s not about picking one person’s style to go with. Instead, it’s about making sure each individual’s voice is still heard, but also having a collective voice for the project so our audience doesn’t have to jump around between totally different kinds of writing too much.

Overall, things are moving in the right direction. And even with all these considerations to keep in mind as we begin the end of this process, I’m confident we’ll make something really cool!

Personal Blog: Stealing Company Time

Earlier this semester, on my suggestion, no, insistence, my team had a skill-share meeting where Sasha kindly walked the team through the technical side: the tools we needed to install on our computers and how we would use them to run our queries and store and share the results. My primary interest in this project was based on how it gave me the chance to work closely with a small dataset, nurturing it from its inception. I was also looking forward to a slow, critical encounter with LLMs. I was really eager to spend time with these tools and their outputs so that I could deepen my criticisms.

But I did not make it to that skill-share on the day it was scheduled. We have a standing meeting on Fridays to keep up with our production schedule; that is where I do and share my part of the work on outreach and web design, and where we troubleshoot and make decisions as a group. Most of the time I have spent on the project outside of class has been dedicated to doing my part to support Sasha with web design and Chris with research, and to making sure the workflows I own (design, outreach, etc.) are well tended. I have to admit I’ve only been mildly successful in carving out extra time to learn about the tools and methods we’re using to actually prompt the LLMs and document their responses. The few times I successfully did so were when I stole company time from my day job to spend a bit more time with the AI Hallucinations Project.

But I feel like I made up for it when we met at the library. While Chris was finalizing his prompts, Sasha kindly and generously walked me through the tools we would use: VSCode, Python, Homebrew, and the various APIs for each of the LLMs we’d test. I realized she could’ve done all of the querying and documenting herself in a fraction of the time it took her to walk me through it all. But it was genuinely enjoyable to troubleshoot together. It also fulfilled the purpose of the project, and I suppose this class too. Being able to contribute to this part of the process ensured this experience didn’t feel like three contractors throwing their work into a Google Doc and onto a website. Now I can say I tested these LLMs myself and watched them slowly respond to my prompts one by one. And they tested me: I already know I will have to revise some prompts and run them again. The project would exist, and likely succeed, without my intervening in this part of the process. But I would’ve learned way less from it all if I hadn’t.

Update: AI Hallucinations Project

Excitement and anxiety are definitely rising as we close in on the final stages of this project. I might just be speaking for myself, but I have a feeling my teammates relate: as the time arrived when we would have to write new prompts and actually produce the dataset, arguably the marquee feature of the AI Hallucinations project, we suddenly found ourselves oh-so-preoccupied with other things: setting up appointments with the digital fellow and other advisors, adding to and refining the website, double-checking our archives and prompt-writing guidelines.

Reality had its own plans. The original plan was for each of us to write about 15 prompts based on archival research on Puerto Rican and Diasporican histories, query the models, collect the data, and then repeat the process for an aimed total of 100 prompt responses. But in the class before last, Sasha suggested we slash the size of the new dataset in half, so that we only write 15 prompts each for an aimed total of 50 responses. We were so focused on refining our fact-checking process and prompt-writing criteria that we soon realized we did not have enough time to break up the prompting process into two stages like we had originally planned.

Turns out, writing the prompts took up more time than any of us had originally expected. On Friday afternoon, we met at the main branch of the Brooklyn Public Library, with varying degrees of springtime allergies. None of us had finished writing our allotted prompts as planned, so the first hour of our session was dedicated to finishing up that work together. Sasha and I finished writing our prompts quickly enough that Sasha was able to walk me through prompting the models and organizing the output. We did not have time to fact-check any of the outputs together, so we spent the last 20 minutes of our time together reorganizing our schedule and priorities for the coming week.

Since we are further along on the website development side, we reprioritized as follows: we will each fact-check our own model outputs individually during the week, and come together to troubleshoot in class. The other half of our time in class will be dedicated to making some final decisions regarding our web-hosting and domain because GoDaddy hates us. This is fact-checking week in more ways than one, as on Friday, we also confirmed two meetings with advisors: one with the Digital Fellow and another with Luke Walzer, both of whom have expressed an interest in going over our fact-checking procedures and methodologies. These meetings and our fact-checking of the responses during the week should put us in a good position to start concluding the fact-checking phase of the project. After this coming week, we will focus on finalizing the website and preparing our materials for sharing and launch.

Voces del Lunfardo Project Update

We continued this week with both a presentation on our project and the work of collecting and entering Lunfardo terms. The presentation was an opportunity to think about how to communicate the project to our classmates and, ultimately, other DHers at large. With the feedback from that experience we will adjust the presentation in preparation for the showcase at the end of the semester.

As for the substantive work, discussion of a couple of the terms prompted us to rethink their inclusion. We are trying to focus on terms that are in wide use today. Since the language developed in the late 19th and early 20th centuries, at least some of the words, though appearing in tangos and movies, have fallen out of favor. This just goes to show that even with a narrow scope and well-defined goals, we can still encounter surprises along the way! But this is also not a serious setback, and we are still on track to complete the text entry by the end of the course.

Over the next week, our goal is to ensure every term has a page conveying the same kind of information and affording the same kind of interaction, after which we can commence the remaining copy editing and other revisions.

We’re Scarily Close to Semester’s End

I’m happy to report that we are on track for this project! By “on track” I don’t mean that we’ve followed our initial work plan exactly; that almost never happens. Even the best plans don’t perfectly survive contact with the chaos of reality. But we have hit our milestones in that we’re pretty much at the point in the process now that we thought we’d be at. The focus of this coming week, in our work plan, was on debugging the website and refining the written materials. This is still basically our plan for this week: revising each other’s close readings, putting together final versions of the user-facing text around the data visualizations, and adding a few final pieces to the site, as well as fixing bugs. We might even start revising the presentation this week if there’s time – if not, that’s our focus for the next couple of weeks!

How we got here is a little different than how we expected, though. When we initially made our work plan, the close readings were just an idea for a potential add-on, so we didn’t explicitly include them in the plan. But as time went on, they became more of a core part of the project, and some time that we initially thought would be spent refining the visualizations went to working on those. We also shifted around some of the work between us, which I think was valuable, since it gave us all a chance to try out some different tasks, and not get too stuck banging our heads against the same things again and again.

Overall, we’re at a good place, and not only that, it’s around the place we planned to be at this point! I’m excited as we enter the final stretch, and I’m very confident in our group to bring this project home and make something interesting and valuable.

Temporal Distortion (and Presentations!)

Changing the very nature of time… what horrifying entity could have this power? Why, CUNY of course! Thanks to the archaic time power of our strange institution, this week no longer has a Tuesday, and instead has two Thursdays! This has the effect that after a long break with no class, we have three classes over the course of eight days. Talk about hitting the ground running.

My main task in this chaotic collection of class time was to put together a presentation walking the class through my group’s Pretty Terrifying Project and then present it on Tuesday. …Did I say Tuesday? I mean “Thursday”, obviously. This is fully my own fault – I volunteered to present, because honestly, I like giving presentations. Having a bunch of people watch you talk about something in a structured way feels a lot easier to me than talking to strangers one on one. Conversations can go in any strange direction – a presentation has its core flow. Even if things stray, you always have that flow to return to.

So how did this presentation go? There’s definitely room for improvement, but this wasn’t meant to be perfect, and thanks to you all, Team Pretty Terrifying has plenty of feedback to work with to improve it further. That’s why I think it’s really important that we had a chance to do a first pass like this. It’s a great way to get accustomed to this particular format of presentation and work out all the mistakes now so when the big day comes, the presentation is clean and ready. That being said, I did kind of jump into presenting this week and I absolutely want to keep the door open if anyone else from my group wants to take the lead for the final presentation. I do think any other presenter can learn from my experience and step in if they’d like.

The main thing the feedback has got me thinking about is the structure of our project’s story and how to position it in time while presenting. While our process involved a lot of thinking before actually resulting in any website or data visualization, people clearly wanted to see the website and visualizations earlier on in the presentation. So just like CUNY, maybe we can mess with time a bit. As our professor suggested, it might be worth showing what we did first and then backtracking to how we got there. And there are some other options too – maybe we show the simple bar chart first thing and discuss the early process alongside it – still telling the project’s story mostly in order, but weaving the visualizations in as visual aids rather than as indicators of what work was done when. Another thing to consider is what presentation structure facilitates addressing everything we want to address while still staying within the time limit. So many options, so much to present on, and so little time, both to finish up what we need for this project, and to actually present. But we know what we’re doing, and we now have a firmer idea of our approach thanks to this practice round. As much as we’re getting into crunch time, it’s also exciting, knowing how close we’re getting to the final presentation.

See you all on Thursday! Weird… it feels like we just had a Thursday.

Lunfardo Actividad: ¿Sabes usar …?

Title translation: Lunfardo Activity: Do you know how to use …?

As we enter the home stretch of the semester, I wanted to spend a little bit of time on something that I mentioned but hadn’t yet had a chance to detail, which is the quiz module we have in Voces del Lunfardo. Early in our discussions, Natalia and I were thinking about the interactive components we wanted to include in the project. Given the project’s strong pedagogical focus, we felt that tools to try out the words would be appropriate, and quizzes are a typical way of achieving this goal. We wanted to include some kind of quiz or interactive activity on each term’s page to reinforce learning.

So I set about looking for extensions that would match.

When we were still looking at Omeka, it became clear pretty quickly that there were no existing modules that supported a quiz-like affordance. That was one of the reasons we ultimately chose DokuWiki. The extensions ecosystem for DokuWiki is such that we suffered from the exact opposite problem: for almost anything you can think of, there are several options floating around, each with slightly different operations. An additional problem also arises, though, in that these little extensions vary in terms of their maintenance and compatibility with recent versions of DokuWiki. That can mean they don’t work at all, or it can mean there’s something off about them.

We chose and tested a couple different quiz modules, most of which were geared toward use as flash cards. That’s often an appropriate affordance for language learning, so the extensions seemed promising. The one that worked best, however, had some strange default behaviors that we couldn’t configure away, and styling work would have been required. I scratched my head for a bit about it but ultimately decided to just make a plugin myself. How hard could it be, right?

The main problem here is that my PHP skills are dated. DokuWiki is written in PHP, HTML, CSS, and JavaScript; its plugins are as well. I can read all of these, and I have worked with plenty of PHP before, but I also thought this was an interesting use case to farm out to an LLM. It is, after all, a very well scoped and defined problem, one that I could have spent a few hours or so tinkering around with, and I am well positioned to evaluate the outcome. And if it didn’t work, I could just chuck it in the bin with all the other experimental code I’ve ever produced.

NB: “AI” is not the first thing I generally reach for (never for writing like this, never for summarizing what I can read, and never to make art), but I have found the need to evaluate its code capabilities so I understand its strengths and many, many weaknesses. I have thoughts. In any case, you can consider this my disclosure that for this one purpose, I did use an LLM for assistance.

If you yourself have some experience with using LLMs to generate code, I probably will not surprise you in saying that it worked. Mostly. Even in the best-defined and scoped prompts, there is plenty of room for misalignment and misinterpretation, so it took a few tries to get everything I thought I needed, and I hand-tweaked the 5-10% it didn’t quite get. I have read through all the code in the plugin, and I compared the structure and code with that of existing quiz plugins for good measure. Presently it’s in a GitHub repository but has not been submitted to the DokuWiki community yet; this is under consideration. I set up the repository, including a GPL license, to be shared pending my own review of DokuWiki’s code provenance policies, if any. Oh, and I suppose I would need to rename it as well to avoid confusion with any existing products or trademarks.

The inclusion of a custom-developed module does raise some interesting questions aside from any ethical considerations of where the code came from. A main concern is that this wasn’t fully accounted for in the data management plan. The code is in GitHub now, and there’s no reason at the moment to suspect anything would happen to it, but it is now an artifact of this project and should be brought under the plan all the same. GitHub is not a repository in an archival sense, but it is nevertheless a convenient place to host a piece of open source software, especially if one wants to both maintain and distribute it. I’m including this concern here for the sake of thoroughness, but the question remains open.

Thankfully, there should be little left to develop or configure. Now on to the other work, entering and revising terms.