If you’re interested in another perspective on dealing with user-generated tags or metadata, this blog post from last.fm, Fingerprinting and Metadata Progress Report, talks about how they’re trying to create ‘order from chaos’:
So far our fingerprint server identified 23 million unique tracks, from the 650 million fingerprint requests you’ve thrown at it. Who knows how many unique tracks there are out there.. We have a couple of hundred million tracks based on spelling alone – but not all of them are spelt correctly.
They have some interesting issues to deal with in cleaning up their data (i.e. your data, if you’re a last.fm user), especially when ‘the most popular spelling is not necessarily the correct one’. And what about bands that change their name (but are essentially the same band) or line-up (are they still the same band?) – when do you decide to create a new identifier?
They’re letting logged-in users vote on potential corrections to an artist name, effectively testing crowdsourcing for metadata corrections as well as for the original data creation. This model could work for museums – depending on the collection, some museums already get a lot of corrections when parts of their collections are published online. What would happen if we made that process transparent?
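To make the voting idea a bit more concrete, here is a rough sketch of how votes on a suggested correction might be tallied. This is not last.fm's actual method, which the post doesn't describe in detail; the function name `pick_correction` and the thresholds `min_votes` and `min_share` are assumptions made up for illustration. The point it tries to capture is that a suggestion should need both enough votes and a clear majority before it replaces the existing name, since the most popular spelling isn't always the right one.

```python
from collections import Counter


def pick_correction(current_name, votes, min_votes=5, min_share=0.6):
    """Return the winning spelling, or current_name if support is too weak.

    `votes` is a list of spellings, one per voter (hypothetical data model).
    `min_votes` and `min_share` are illustrative thresholds, not real values.
    """
    tally = Counter(votes)
    if not tally:
        # Nobody has voted yet, so keep what we have.
        return current_name

    candidate, count = tally.most_common(1)[0]
    total = sum(tally.values())

    # Require both a minimum number of votes and a clear share of all votes.
    if count >= min_votes and count / total >= min_share:
        return candidate
    return current_name


# Example: seven voters prefer one spelling, two prefer the existing one.
votes = ["The Beatles"] * 7 + ["The Beetles"] * 2
print(pick_correction("The Beetles", votes))
```

A museum version could use the same shape, with object records instead of artist names. Publishing the tally alongside each record would be one way of making the correction process transparent.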