The sounds of silence

I've been reading World War One diaries and letters (getting distracted by sources is an occupational hazard in my research) as I look for sample primary sources for teaching crowdsourcing at the HILT summer school in Maryland next week and for my CENDARI fellowship later this year.

I noticed one line in the Diary of William Henry Winter WWI 1915 that manages to convey a lot without directly giving any information about his opinions or relationship with this person:

'Major Saunders is supposed to be on his way back here as well but I don't know as he is coming back to our Coy, I hope not any way. We have got a good man now.'

There's nothing in the rest of the entries online that provides any further background. It may be that sections of this correspondence either didn't survive, weren't held by the same person, or perhaps were edited before deposit with the library or during transcription (it's particularly hard to judge as the site doesn't have images of the original document), so this particular silence may not have been intentional.

Whatever the case, it's a good reminder that there are silences behind every piece of content. While it's an amazing time to research the lives of those caught up in WWI as more and more private and public material is digitised and shared, silences can be created in many ways – official archives privilege some voices over others, personal collections can be censored or remain tucked away in a shoebox, and large parts of people's experiences simply went unrecorded. Content hidden behind paywalls or inaccessible to search engines (whether inadvertently hidden behind a search box or through lack of text transcription or description) is effectively hushed, if not exactly silenced. Sources and information about WWI collected via community groups on Facebook may be lost the next time Facebook changes its terms and conditions, or only partially shared. Our challenge is to make the gaps and questions about what was collected visible (audible?) while also being careful not to render the undigitised or unsearchable invisible in our rush to privilege the easily accessible.

[Update: I've just realised that Winter might not have needed to provide further context as it seems many men in his unit were from the same region as him, and therefore his relationship with the Major may have pre-dated the war. Tacit knowledge is of course another example of the unrecorded, and one perhaps more familiar to us now than the unsayable.]

Why we need to save the material experience of software objects

Conversations at last month's Sustainable History: Ensuring today's digital history survives event [my slides] (and at the pub afterwards) touched on saving the data underlying websites as a potential solution for archiving them. This is definitely better than nothing, but as a human-computer interaction researcher and advocate for material culture in historical research, I don't think it's enough.

Just as people rue the loss of the information and experiential data conveyed by the material form of objects when they're converted to digital representations – size, paper and print/production quality, marks from wear through use and manufacture, access to their affordances, to name a few – future researchers will rue the information lost if we don't regard digital interfaces and user experiences as vital information about the material form of digital content and record them alongside the data they present.

Can you accurately describe the difference between using MySpace and Facebook in their various incarnations? There's no perfect way to record the experience of using Facebook in December 2013 so it could be compared with the experience of using MySpace in 2005, but usability techniques like screen-recording software linked to eyetracking or think-aloud tests would help preserve some of the tacit knowledge and context users bring to sites alongside the look-and-feel, algorithms and treatments of data the sites present to us. It's not a perfect solution, but a recording of the interactions and designs from both sites for common tasks like finding and adding a friend would tell future researchers infinitely more about changes to social media sites over eight years than simple screenshots or static webpages. But in this case we're still missing the notifications on other people's screens, the emails and algorithmic categorisations that fan out from simple interactions like these…

Even if you don't care about history, anyone studying software – whether websites, mobile apps, digital archives, instrument panels or procedural instructions embedded in hardware – still needs solid methods for capturing the dynamic and subjective experience of using digital technologies. As Lev Manovich says in The Algorithms of Our Lives, when we use software we're "engaging with the dynamic outputs of computation"; studying software culture requires us to "record and analyze interactive experiences, following individual users as they navigate a website or play a video game … to watch visitors of an interactive installation as they explore the possibilities defined by the designer—possibilities that become actual events only when the visitors act on them".

The Internet Archive does a great job, but in researching the last twenty years of internet history I'm constantly hitting the limits of its ability to capture dynamic content, let alone the nuance of interfaces. The paradox is that as more of our experiences are mediated through online spaces and the software contained within small boxy devices, we risk leaving fewer traces of our experiences than past generations.

It's Backup Saturday!

Ironically (?), the original image is no longer available

If Backup Saturday is too casual, call it Digital Preservation Saturday. Whatever you call it, it's time to do some digital housekeeping.

This post is an attempt to reduce the number of sad status updates or requests for help I see when people have lost years of personal photos, contacts or calendars when their laptop or phone died or was stolen, or when people can't recover that vital document for their research or tax return… There's never a perfect time to do it, so just back up your files now. Phones and laptops are particularly easy to lose and are more likely to have precious photos or important documents, so start with them.

If you don't have an external hard drive, order one online and in the meantime, burn to a CD or DVD or copy files to a USB stick. There's no harm in having lots of copies (barring confusion over different versions of docs), so if you want to be really careful, swap external drives with a friend so you've each got an off-site copy of your most important files. Use online services like Dropbox (my referral link, non-referral link) etc. to keep files on your computer backed up online, but don't rely on them alone. (The referral links give us each extra storage, which is nice.)
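If you'd rather script the copy than drag folders around, here's a minimal Python sketch that copies one folder into a new dated folder on an external drive, so each run adds a fresh copy rather than overwriting the last one. The function name and the example paths are hypothetical; adjust them for your own machine and drive.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source, destination_root) -> Path:
    """Copy `source` into a new timestamped folder under `destination_root`.

    Each run creates a separate copy, so older backups are left alone.
    """
    source = Path(source)
    destination_root = Path(destination_root)
    # Timestamp without colons so the folder name is valid on Windows too
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    target = destination_root / f"{source.name}_{stamp}"
    # copytree raises an error instead of overwriting an existing folder,
    # which protects earlier backups
    shutil.copytree(source, target)
    return target
```

For example, `backup_folder(Path.home() / "Documents", "/Volumes/BackupDrive")` on a Mac, or with `"E:/"` as the destination on Windows (both drive paths are placeholders). Running it twice within the same second fails rather than overwriting.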

Backup email

Things change all the time so always check for more recent advice (this goes for everything on the page), but this article covers some good options for backing up Gmail (or try GMVault) and here's information on backing up Thunderbird, and try this if you're stuck on Outlook. I download an old Yahoo account to Thunderbird via POP mail, which might be the easiest way to deal with YMail and Hotmail.
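For the curious, this is roughly what a POP download does, sketched with Python's standard poplib and mailbox modules. It assumes your provider still allows POP access (some make you switch it on in the account settings), and the function names are hypothetical. It copies every message into a single mbox file, a long-standing plain-text mail format that many mail programs and tools can read, and it only reads from the server, never deleting anything.

```python
import mailbox
import poplib


def connect(host: str, user: str, password: str) -> poplib.POP3_SSL:
    """Log in to a POP3 server over SSL on the standard POP3S port (995)."""
    pop = poplib.POP3_SSL(host, 995)
    pop.user(user)
    pop.pass_(password)
    return pop


def save_pop3_mailbox(pop, mbox_path: str) -> int:
    """Copy every message on the server into a local mbox file.

    `pop` is an already logged-in poplib connection.
    Returns the number of messages saved.
    """
    count, _total_size = pop.stat()
    box = mailbox.mbox(mbox_path)
    try:
        for number in range(1, count + 1):  # POP3 numbers messages from 1
            _response, lines, _octets = pop.retr(number)
            # retr() returns the message as a list of byte lines without endings
            box.add(b"\n".join(lines))
    finally:
        box.close()  # close() flushes the new messages to disk
    return count
```

Something like `save_pop3_mailbox(connect("pop.example.com", "me@example.com", "app-password"), "old-account.mbox")`, where the host name and credentials are placeholders for your provider's own settings. Remember to call `quit()` on the connection afterwards.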

While you're at it, back up your profile or preferences for your web browser – it's amazing how much information is stored in your browser history, bookmarks, etc. You can view saved passwords in Firefox and other browsers; saving screenshots of them is obviously a security risk, but it can also help you remember older passwords if you're locked out of software.

Backup social media

'Turbulence' seems to be the IT trend for this decade (and maybe every decade), so it's a good idea to regularly back up whatever social media sites you rely on.  I haven't tried services like Backupify (more info) – if you've got experience with them, let me know in the comments.  Check back over your registration emails to remind yourself which services you've signed up for and use that as a checklist.

Services that back up tweets and other social media come and go (like Twapperkeeper and Twitoaster), so it's a good idea not only to choose services that let you easily export your archive, but also to put a monthly note in your calendar to go in and actually run the export. Saved copies of web pages might not work later, so a really low-tech solution is to copy all the text in a page and dump it into a text file or e.g. a Word document. I use SearchHash to archive hashtags, but you have to get in quickly as the Twitter API often only provides access to the past few days' tweets. You can also archive tweets via Google spreadsheets.
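The copy-and-paste approach can also be sketched in a few lines of Python using the standard library's html.parser: it pulls the visible text out of a page you've saved from your browser and writes it to a plain text file. It's deliberately crude (no layout, no images), and the function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping scripts and styles."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into characters
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the page's text, one chunk per line, like a manual copy and paste."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_page_text(html: str, path: str) -> None:
    """Write the extracted text to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_to_text(html))
```

Plain text is less faithful than a saved web page, but it's also far less likely to become unreadable later.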

You can download your data from Facebook via the 'Download a copy of your Facebook data' link on your settings page – it's not perfect, but again, it's better than nothing. While Flickr is a good option for backing up images, you might also want to save the tags and comments that live on Flickr. There are a number of tools for backing up Flickr that are worth trying.

Backup websites

Most blogs will let you export your posts, but the exported file isn't usually 'human-readable' until you've imported it into another blog, and there's always a chance that you'll lose some information.
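If you want a readable copy without setting up another blog, a short script can pull the titles, dates and bodies out of the export file. This sketch assumes an RSS-based export like WordPress's, where post bodies sit in a content:encoded element; other platforms' formats differ, so check your own file first. The function name is hypothetical, bodies stay as HTML, and any pages or attachments in the export will show up as items too.

```python
import xml.etree.ElementTree as ET

# Namespace that RSS-based exports (e.g. WordPress) use for full post bodies;
# this is an assumption about your export file, so check its <rss> element.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def export_to_text(xml_text: str) -> str:
    """Turn an RSS-based blog export into plain, human-readable text.

    Each <item> becomes a block with its title, date and body.
    """
    root = ET.fromstring(xml_text)
    blocks = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        date = item.findtext("pubDate", default="")
        body = item.findtext(CONTENT_NS + "encoded", default="")
        blocks.append(f"{title}\n{date}\n\n{body.strip()}")
    return "\n\n----\n\n".join(blocks)
```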

An option that works well on all kinds of websites is HTTrack – I've used it for archiving sites and the results are good. It creates a locally-browseable static version of your site, preserving content and layouts. This isn't the same as backing up your code or databases, but if you're at that point I assume you know how to back these up yourself. Bonus points if you've tested restoring from backups to check that the process actually works!
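One way to earn those bonus points: restore a backup to a spare folder, then compare it file by file with the original. This Python sketch does the comparison with SHA-256 checksums from the standard hashlib module. The function names are hypothetical, and it reads each file into memory in one go, which is fine for a typical website but not for very large files.

```python
import hashlib
from pathlib import Path


def file_hashes(folder) -> dict:
    """Map each file's path (relative to `folder`) to its SHA-256 digest."""
    folder = Path(folder)
    hashes = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[path.relative_to(folder).as_posix()] = digest
    return hashes


def compare_restore(original, restored) -> dict:
    """Report files that are missing, unexpected or changed after a test restore."""
    before, after = file_hashes(original), file_hashes(restored)
    return {
        "missing": sorted(set(before) - set(after)),
        "extra": sorted(set(after) - set(before)),
        "changed": sorted(p for p in before.keys() & after.keys()
                          if before[p] != after[p]),
    }
```

If all three lists come back empty, the restored copy matches the original byte for byte.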

You can also submit links to the Internet Archive's Wayback Machine so it saves copies of your pages (and while you're at it, why not make a donation?).
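The Wayback Machine's 'Save Page Now' feature captures a page when you visit web.archive.org/save/ followed by the page's address. Here's a hedged Python sketch of that, assuming the endpoint still works this way (the Internet Archive may rate-limit or change it); the function names and the User-Agent string are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed 'Save Page Now' address; the page URL is appended unchanged
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Build the Save Page Now address for a web page."""
    parsed = urllib.parse.urlparse(page_url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a web address: {page_url!r}")
    return SAVE_ENDPOINT + parsed.geturl()


def request_capture(page_url: str, timeout: float = 60) -> int:
    """Ask the Internet Archive to capture `page_url`; returns the HTTP status."""
    request = urllib.request.Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "backup-saturday-sketch"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Captures can take a while, so be patient with long lists of pages and don't hammer the service.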

Backup devices

You can back up Apple products like iPods, iPhones, iPads with iTunes, but it doesn't hurt to download photos etc into other folders too – both MacOS and Windows have system apps that will download photos when you plug in the device – 'Image Capture' on my Mac and an Explorer window on my PC.

Nokia phones can be backed up with Nokia PC Suite on Windows or iSync on MacOS (can be tricky). I've used SMS to Text on Android – it saved a file to my phone's disk, then I copied it over to my computer.

Backup other specialist software

Whatever you do, you probably use specialist software.  If you use reference management software, back it up!  Here are instructions for backing up EndNote, Mendeley and Zotero to get you started…
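Reference managers often keep your library in a database file, and copying that file while the program is writing to it can give you a broken copy. This sketch uses the `backup` method from Python's standard sqlite3 module to make a consistent copy of a SQLite database. It assumes Zotero's usual layout (a zotero.sqlite file in a Zotero data folder in your home directory), which may differ on your machine, and that you close Zotero first, since it may hold a lock on the file. Attached PDFs live in the separate storage folder, so copy that too. The function name is hypothetical.

```python
import sqlite3
from pathlib import Path

# Assumed default location of Zotero's database; check Zotero's preferences
# for your actual data directory before relying on this.
DEFAULT_ZOTERO_DB = Path.home() / "Zotero" / "zotero.sqlite"


def backup_sqlite(db_path, backup_path) -> None:
    """Copy a SQLite database consistently using SQLite's online backup API."""
    db_path = Path(db_path)
    # sqlite3.connect would silently create an empty database, so check first
    if not db_path.is_file():
        raise FileNotFoundError(db_path)
    source = sqlite3.connect(str(db_path))
    target = sqlite3.connect(str(backup_path))
    try:
        source.backup(target)  # copies every page of the database
    finally:
        target.close()
        source.close()
```

For example, `backup_sqlite(DEFAULT_ZOTERO_DB, "/Volumes/BackupDrive/zotero-backup.sqlite")`, where the destination is a placeholder.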

More digital housekeeping…

If you've made it this far, why not check that your anti-virus software is up-to-date, and run a deep scan? If you haven't got anti-virus software, get some now – MoneySavingExpert has a useful guide to Free Antivirus Software. And speaking of money, if your bank doesn't keep all your bank statements online, or you're about to change cards, it's a good time to download your bank statements.

And if you've already done all that, why not offer to help a friend get their backup and anti-virus sorted?

'Go forth and digitise' – Bill Thompson at OpenTech 2010

I've realised events like OpenTech are a bit like geek Christmas – a brief intense moment of brilliant fun with inspiring people who not only get what you're saying, they'll give you an idea back that'll push you further… then it's back to the inching progress of everyday life, but hopefully with enough of that event energy to make it all easier. Anyway, enough rambling and onto my sketchy notes from the talk. Stuff in square brackets is me thinking aloud, any mistakes are mine, etc.

Giving the Enlightenment Another Five Hundred Years, Bill Thompson
Session 3, Track A #3A
[A confession – working in a museum, and a science museum at that, I have a long-standing interest in conserving enough of the past to understand the present and plan for the future, and just because it's fascinating. It was ace to hear from someone passionate about the role of archives and cultural heritage in the defence of reason, and even more ace to see the tweets flying around as other people got excited about it too.]

The importance of the scientific method; of asking hard questions and looking for refutation not confirmation.

But surely history is all about progress – what could go wrong? But imagine President Palin… History has shown that it's possible for progress to go backwards.

What can we do? He's not speaking on behalf of the BBC here, but his job is to figure out what you can do with the BBC's archive. [Video of seeing the BBC charter – the powerful impact of holding the actual physical object is reason enough to conserve things from the past, it's an oddly visceral connection to the people who made it that I've noticed again and again while working in museums and archaeology.]

We need to remember. To remember is to understand, to resist. We need to digitise. Remembering comes along with digitising; our experience of the world is so mediated by bits that unless we make archives digital in some form, there's a real danger that they will be forgotten, inaccessible. We also need to build mechanisms so that stuff that's created now is preserved alongside the records of the past. We need to do it all. If we do it well, we'll give current and future generations the evidence they need to resist the onslaught of ignorance, the tide of unreason that's sweeping the world. Need to create reasonable digitisation of solid artefacts too.

We need to do it soon 'because the kids may not want to'. The technology exists, but he thinks there's a real danger that if it's not done in the next ten years, it won't be done; people won't realise the value of the archives and understand why it has to be done. Kids who've grown up on Google will never do the deep research that will take them to the stuff that's not digitised; non-digital stuff will fall into disuse; conservation/preservation will stop.
Don't let Google do it, they don't value the right things.

Once it's in bits, preserve the data and the artefact; catalogue it, make it findable, make it usable – open data world meets open knowledge world. Access to APIs and datasets is important to make sure material can be found. If you know it's there you can ask for it to be digitised. Build layers on top of the assets that have been digitised.

Need to make it usable so have to sort out the rights fiasco… Need a place to put it all, not sure that exists yet. New tools, services, standards so it can be preserved forever and found in future. Not a trivial task but vitally important. The information in the archives supports true understanding. Possibility of doing something transformative at the moment. [He finished with:] 'Go forth and digitise. And don't forget the metadata'.

Crowdsourcing metadata seems like a good idea; V&A gets a shout-out for crowdsourcing image cropping [with an ad hoc description 'which one of these are in focus' – they might be horrified to hear their photography described like that. I got all excited that other people were excited about crowdsourcing metadata, because creating interfaces with game dynamics to encourage people to create content about collections is my MSc dissertation project.]

OCRing text in digitised images – amazing [I need to find a reference to that – if we can do it it'd instantly make our archives and 2D collections much more accessible and discoverable]

Question re Internet Archive – answer: it doesn't have enough curation – 'like throwing your archives down a well before the invaders arrive' – they might be there in a usable form when you come back for them, they might not be.

Question: preservation and digital archaeology are two different things, how closely are they aligned? [digital archaeology presumably not destructive though]

[And that's the end of my notes for that session, notes on the Guardian platform and game session to come]

Notes from 'Catch the Wind: Digital Preservation and the Real World' at MCG's Spring Conference

These are my notes from Nick Poole's presentation 'Catch the Wind: Digital Preservation and the Real World' at the MCG Spring Conference. There's some background to my notes about the conference in a previous post. If I've made any comments below they're in [square brackets].

Nick's slides for 'Catch the Wind: Digital Preservation and the Real World' are online.

The MDA is now the Collections Trust. Their belief is that "everybody everywhere should have the right to access and benefit from cultural collections". Their work includes standards, professional development and public programmes wherever collections are kept and cared for and they have a remit across collections management, including documentation, digitisation and digital preservation.

We need to think about capturing and preserving digital surrogates, etc, or we'll end up with a 'digital dark age'.

We need a convergence of standards and practice in museums, libraries and archives, and to develop a community of professional practice.

Nick was interested to know whether any museums are actively doing digital preservation. It turns out lots have some elements of digital preservation but it's not deeply embedded in the organisation. Nick sent a question to the Museums Computer Group (MCG) list: see the list archives for December 2007, or slide 6.

If you're not doing digital preservation, why not? And how do you decide whether and what is worth preserving? How do you preserve pieces of information or digital assets with the context needed for them to make sense?

Today is partly about the results of the enquiry begun with that email.

We know what we should be doing [slide 9, CHIN slide on workflow for 'Digital Preservation for Museums'.]

We know why we should be doing it:

The preservation and re-use of digital data and information forms both the cornerstone of future economic growth and development, and the foundation for the future of memory.

From "Changing Trains at Wigan: Digital Preservation and the Future of Scholarship" by Seamus Ross – the 'common-sense bible about digital preservation'.

And there are lots of programmes and diagrams (slides 11–15).

So if we know why and how we should be doing it, why aren't we doing it?

It's not necessarily about technology or money – is it about the culture in museums?
There's no funding imperative; project-funded digitisation seldom provides for (or requires) the kind of long-term embedded work that digital preservation requires.

It depends on the integration of workflows and systems which is still rare in museums. Some digital preservation principles fit more intuitively with an archival point of view than an object/artefact point of view.

Is it possibly also because museums aren't part of the scholarly/academic publishing loop which is giving rise to large scale digital preservation initiatives? e.g. Open Content Alliance.

We also don't have an expectation about the retrievability of non-object museum information that we do about collection information. [Too true, it doesn't seem to be valued the same way.]

We should learn from libraries and archives. We could mandate 'good enough' standards so digital assets can be migrated into stable environments in the future. There's so much going on that we'll never be able to draw a line in the sand and say 'standards happen now'. We need to tweak the way we work now, not introduce a whole new project.

A proposed national solution: could we aggregate 'just enough' metadata at a central point and preserve it there? But would organisations become disenfranchised from their own information, lose expertise in the curatorship of digital content, and would it blur the distinction between active and dormant records?

If not a national solution, then it must be local: but would it actually happen without statute, obligation or funding? Possibly through networks of people who support each other in digitisation work, but there are economic issues in developing infrastructure and expertise.

Museums seem oddly distant from current initiatives (e.g. Digital Preservation Coalition, Digital Curation Centre), and lack methodologies and tools that are specific to museum information. Do we need to develop collective approaches for digital preservation?

He hasn't got answers, just more questions.

We must start finding answers or the value of what we're doing right now will be lost in ten years' time.

Questions
Mike: there was a slide 'is this stuff worth preserving' – but that question wasn't answered – is there lots of stuff we should and can just chuck away? Nick: the archival world view is more like that.

Alan [?]: born digital stuff like websites is difficult to 'index and scope'. The V&A website is divorced from libraries and archives – internal databases don't link to website [to capture non-collections records?]. What are the units of information or assets within a website? It's impossible to define boundaries and therefore to catalogue and preserve… How do we capture this content? Nick: web archiving solutions are already out there but do museums have the money for it?

John: to what extent could digital repositories be out-sourced? Nick: look at examples like the Archaeology Data Service. But for whatever reason, we're not following those models.

David: preservation was in NOF Digitise in business plan but … didn't happen. He doesn't think archives are ahead in preservation services. Museums' use of collections management systems is different to academia using repositories – there's an interesting distinction between long-term archiving and day to day work.

Ian [?] – re-run what we've done with [digitising] object collections but think about information collections too [?]. Nick: there's a development path there in existing CollMS, possibly with hosted CollMS. We don't need entirely new systems, we already have digital asset management systems (DAMS), web software, CollMS.

[This reminds me about recent discussions we've had internally about putting older object captions and information records on our OAI repository – this might be a 'good enough' step towards digital preservation.]
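Once records are on an OAI repository, it's worth checking they can be harvested back out again. This Python sketch builds OAI-PMH ListRecords requests for simple Dublin Core (oai_dc) and pulls identifiers and titles from one response page, returning the resumption token so a caller can request the next page of a large set. The repository address is a placeholder and the function names are hypothetical; fetching each page (for example with urllib.request.urlopen) is left to the caller.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    if resumption_token:
        # A resumption token replaces all other arguments except the verb
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text: str) -> Tuple[List[Tuple[str, List[str]]], Optional[str]]:
    """Return (identifier, titles) pairs and the resumption token, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier", default="")
        titles = [t.text for t in record.iter(DC + "title") if t.text]
        records.append((identifier, titles))
    # An empty or absent token means this was the last page
    token = root.findtext(f".//{OAI}resumptionToken")
    return records, (token or None)
```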