The past week was trying. Apart from dealing with serious allergy flare-ups, I spent more time engineering my prompts than I had anticipated before running them across three LLMs. The Hallucinations methodology requires each teammate to research and write prompts that pose historical questions about details attributable to a particular document. I had assumed that the task would be as simple as finding a letter, advertisement, document, or other source containing a fact that could easily become the basis of a prompt.
My assumptions were not entirely wrong, but I had not accounted for the challenge of arriving at a usable source. More specifically, I faced the prolonged task of reading and skimming long manuscripts, as well as letters between state officials that were merely conversations. I completed my 15 prompts, though I revised 2 of them; the changes were not drastic, since I did not deviate from the original source cited. Next, the prompts were run across three LLMs (Claude, Gemini, and ChatGPT), generating 45 responses in total. Receiving the responses brought a brief feeling of relief, knowing that a significant task was completed. However, the work was not entirely done.
The next objective is to solidify our data by fact-checking. I originally had the impression that this would mean reading through the responses, identifying the hallucinations, and describing them, while researching the historical details of Puerto Rico’s past that our prompts concern. As with my earlier assumption, I am certain this impression of the fact-checking job is not entirely wrong. However, meeting with GC Digital Fellows Eunah Cho and Lisa Rhody left me with thought-provoking questions about how to fact-check. One point that stayed with me: Rhody mentioned that an AI language model asking clarifying questions, and indicating when it does not have a definitive answer to a historical inquiry, is better than one confidently giving a wrong answer. Moreover, it is better for a language model to be honest about where it struggles or falls short, since it is trying to maintain trustworthiness as a source for learning history. There is more nuance to these perspectives, of course, but I pondered how they could be leveraged for the task. I was especially unsure how to apply a history-studying background to fact-checking AI responses. Cho and Rhody also gave the Hallucinations group a few questions, one of which asks us to figure out a range of error. In other words, at what point does an error begin to cause harm to belief, understanding, historical and socio-cultural consciousness, etc.? I want to start fact-checking as soon as possible, knowing that I will be out of state, but most importantly, I enter it with a clearer idea of how to confirm areas of accuracy and error, regardless of whether an LLM provided the desired responses.
Hopefully, the Visualization Data Lab will offer an additional perspective, especially since I plan to discuss with my teammates how to proceed with fact-checking.


