Over the past few years we’ve seen an increasing number of projects that take the phrase ‘human-computer interaction’ literally (perhaps turning ‘HCI’ into human-computer integration), organising tasks done by people and by computers into a unified system. One of the most obvious benefits of crowdsourcing on digital platforms has been the ability to coordinate the distribution and validation of tasks. Increasingly, data manually classified through crowdsourcing is being fed into computers to improve machine learning so that computers can learn to recognise images or words almost as well as we do. I’ve outlined a few projects putting this approach to work below.
This creates new challenges for the future: if fun, easy tasks like image tagging and text transcription can be done by computers, what are the implications for cultural heritage and digital humanities crowdsourcing projects that used simple tasks as the first step in public engagement? After all, Fast Company reported that ‘at least one Zooniverse project, Galaxy Zoo Supernova, has already automated itself out of existence’. What impact will this have on citizen science and history communities? How might machine learning free us to fly further, taking on more interesting tasks with cultural heritage collections?
The Public Catalogue Foundation has taken tags created through Your Paintings Tagger and achieved impressive results in the art of computer image recognition: ‘Using the 3.5 million or so tags provided by taggers, the research team at Oxford ‘educated’ image-recognition software to recognise the top tagged terms’. All paintings tagged with a particular subject (e.g. ‘horse’) were fed into feature extraction processes to build an ‘object model’ of a horse (a set of characteristics that would indicate that a horse is depicted), which was then tested to see whether the system could correctly tag horses.
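To make the tag-to-model step more concrete, here is a minimal sketch in Python. It is not the Oxford team's method: it assumes each painting has already been reduced to a short list of numeric features, and it swaps in a very simple nearest-centroid classifier as the ‘object model’. The feature values, the tags and the function names (build_object_model, predict_tag) are all hypothetical.

```python
from statistics import mean


def build_object_model(feature_vectors):
    """Average the feature vectors of every image carrying one crowd tag."""
    return [mean(column) for column in zip(*feature_vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def predict_tag(features, models):
    """Return the tag whose object model lies closest to the features."""
    return min(models, key=lambda tag: distance(features, models[tag]))


# Hypothetical crowd-tagged training data: tag -> feature vectors.
tagged = {
    "horse": [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]],
    "boat": [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]],
}
models = {tag: build_object_model(vectors) for tag, vectors in tagged.items()}

# Held-out paintings with known tags, used to test the models.
held_out = [([0.88, 0.12], "horse"), ([0.12, 0.9], "boat")]
correct = sum(predict_tag(features, models) == tag for features, tag in held_out)
accuracy = correct / len(held_out)
```

Real image recognition relies on far richer features and models, but the shape of the process is the same: crowd tags supply the labels, a model is built per term, and held-out examples show whether the system tags correctly.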
The BBC World Service archive used an ‘open-source speech recognition toolkit to listen to every programme and convert it to text’ and keywords, then asked people to check the correctness of the data created (Algorithms and Crowd-Sourcing for Digital Archives, see also What we learnt by crowdsourcing the World Service archive).
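The general pattern of machine-generated data checked by people can be sketched as a simple vote count. This is an illustrative assumption rather than a description of the BBC's system: the keywords, the votes, the function name validate_keywords and the min_votes and threshold values are all hypothetical.

```python
def validate_keywords(votes, min_votes=3, threshold=0.5):
    """Sort machine-generated keywords by crowd verdict.

    votes maps each keyword to a list of booleans, one per person
    (True means the person judged the keyword to be correct).
    """
    verdicts = {}
    for keyword, ballots in votes.items():
        if len(ballots) < min_votes:
            # Too few people have looked at this keyword to decide.
            verdicts[keyword] = "undecided"
        elif sum(ballots) / len(ballots) > threshold:
            verdicts[keyword] = "accepted"
        else:
            verdicts[keyword] = "rejected"
    return verdicts


# Hypothetical keywords produced by software for one programme.
crowd_votes = {
    "apartheid": [True, True, True, False],
    "cricket": [False, False, True],
    "jazz": [True],
}
verdicts = validate_keywords(crowd_votes)
```

Requiring a minimum number of votes before accepting or rejecting a keyword is one way to stop a single person's judgement from settling the matter.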
The CUbRIK project combines ‘machine, human and social computation for multimedia search’ in their technical demonstrator, HistoGraph. The SOCIAM: The Theory and Practice of Social Machines project is looking at ‘a new kind of emergent, collective problem solving’, including ‘citizen science social machines’.
And of course the Zooniverse is working on this, most recently with Galaxy Zoo. A paper summarised on their Milky Way project blog outlines the powerful synergy between citizen scientists, professional scientists, and machine learning: ‘citizens can identify patterns that machines cannot detect without training, machine learning algorithms can use citizen science projects as input training sets, creating amazing new opportunities to speed-up the pace of discovery’, addressing the weaknesses of each approach if deployed alone.
Further reading: an early discussion of human input into machine learning is in Quinn and Bederson’s 2011 Human Computation: A Survey and Taxonomy of a Growing Field. You can get a sense of the state of the field from various conference papers, including ICML ’13 Workshop: Machine Learning Meets Crowdsourcing and ICML ’14 Workshop: Crowdsourcing and Human Computing. There’s also a mega-list of academic crowdsourcing conferences and workshops, though it doesn’t include much on the tiny corner of the world that is crowdsourcing in cultural heritage.
Last update: March 2015. This post collects my thoughts on machine learning and human-computer integration as I finish my thesis. Do you know of examples I’ve missed, or implications we should consider?