Personal Data Hygiene?

Last semester during one of the Intro to DH discussions, probably on archiving or something related, I recall mentioning my interest in personal digital archiving. Given that I work in a library and make library software, it’s natural to assume I have my own house in order on this front. I wouldn’t characterize it as “in order” so much as “here’s some stuff I’ve thought about and some stuff I practice”. It’s true that I have maybe more tools and established practices at hand than most, but I think that, like most anyone else, I live in fear that somewhere in this Rube Goldberg contraption of overlapping cloud and local copies of stuff, there’s a piece missing that I can’t see.

The concept of data hygiene comes from data collection practices, which those of you who have constructed datasets may have encountered. Within this practice, its main purpose is to ensure data is clean, consistent, organized, and well presented. As a practice, it represents an ongoing activity rather than a single event, something you have to revisit from time to time as a means of protecting that data.

Within the context of data management more generally, I see it as encompassing more than just the transactional quality of the data and its continued fit for purpose. Fundamentally, you can’t use data you can’t access. So in my view data hygiene must include ensuring continued accessibility of the data. That’s why so much of the Data Management Plan centers on where the data will be stored, what measures you will take to ensure that you can access it, and how you will ensure that some casual disaster doesn’t wipe it out.

Okay, so I don’t have a Data Management Plan for my own personal data. I could, I suppose. I suspect I know people who have one or would love to make one. I am not one of those people, despite the weird spreadsheets I use to track things only I’m interested in. But I’m just paranoid enough about data loss that I have some tools and some practices.

Perhaps the best encapsulation of those tools and practices is the Calibre Web server I run for myself. It houses the 25 gigabytes of tabletop role-playing game materials (mostly PDFs) I’ve collected to date. Its goals are to collect those materials and, later, to let me find them again. I search it regularly. And yet it’s still a mess of incomplete metadata and half-baked tagging and organization systems. I would need a dedicated project to properly catalog all of this material. And it’s not backed up regularly, in part because I can theoretically re-create it.

Elsewhere, of course, I have extensive collections of notes, photographs, documents, and more specialized files residing in folders created by art and music programs. I have a handle on only some of these, mostly the notes and photographs, which are regularly backed up via paid cloud services, and which I open regularly, occasionally migrate elsewhere, and otherwise keep at hand.

Is this sufficient? In looking at the contents of a Data Management Plan, and knowing what I know about business continuity and disaster recovery, I suspect it’s better than nothing, but insufficient. At the very least, going through the exercise of creating the plan gave me a chance to reflect on my own practices.