About ‘a practical guide to crowdsourcing in cultural heritage’

[Image: book cover]

Some time ago I wrote a chapter on ‘Crowdsourcing in cultural heritage: a practical guide to designing and running successful projects’ for the Routledge International Handbook of Research Methods in Digital Humanities, edited by Kristen Schuster and Stuart Dunn. As their blurb says, the volume ‘draws on both traditional and emerging fields of study to consider what a grounded definition of quantitative and qualitative research in the Digital Humanities (DH) might mean; which areas DH can fruitfully draw on in order to foster and develop that understanding; where we can see those methods applied; and what the future directions of research methods in Digital Humanities might look like’.

Inspired by a post from the authors of a chapter in the same volume (Opening the ‘black box’ of digital cultural heritage processes: feminist digital humanities and critical heritage studies by Hannah Smyth, Julianne Nyhan & Andrew Flinn), I’m sharing something about what I wanted to do in my chapter.

As the title suggests, I wanted to provide practical insights for cultural heritage and digital humanities practitioners. Writing for a Handbook of Research Methods in Digital Humanities was an opportunity to help researchers understand both how to apply the ‘method’ and how the ‘behind the scenes’ work affects the outcomes. As a method, crowdsourcing in cultural heritage touches on many other methods and disciplines. The chapter built on my doctoral research, and my ideas were road-tested at many workshops, classes and conferences.

Rather than crib from my introduction (which you can read in a pre-edited version online), I’ve included the headings from the chapter as a guide to the contents:

  • An introduction to crowdsourcing in cultural heritage
  • Key conceptual and research frameworks
  • Fundamental concepts in cultural heritage crowdsourcing
  • Why do cultural heritage institutions support crowdsourcing projects?
  • Why do people contribute to crowdsourcing projects?
  • Turning crowdsourcing ideas into reality
  • Planning crowdsourcing projects
  • Defining ‘success’ for your project
  • Managing organisational impact
  • Choosing source collections
  • Planning workflows and data re-use
  • Planning communications and participant recruitment
  • Final considerations: practical and ethical ‘reality checks’
  • Developing and testing crowdsourcing projects
  • Designing the ‘onboarding’ experience
  • Task design
  • Documentation and tutorials
  • Quality control: validation and verification systems
  • Rewards and recognition
  • Running crowdsourcing projects
  • Launching a project
  • The role of participant discussion
  • Ongoing community engagement
  • Planning a graceful exit
  • The future of crowdsourcing in cultural heritage
  • Thanks and acknowledgements

I wrote in the open on this Google Doc: ‘Crowdsourcing in cultural heritage: a practical guide to designing and running successful projects’, and benefited from the feedback I got during that process, so this post is also an opportunity to highlight and reiterate my ‘Thanks and acknowledgements’ section:

I would like to thank participants and supporters of crowdsourcing projects I’ve created, including Museum Metadata Games, In their own words: collecting experiences of the First World War, and In the Spotlight. I would also like to thank my co-organisers and attendees at the Digital Humanities 2016 Expert Workshop on the future of crowdsourcing. Especial thanks to the participants in courses and workshops on ‘crowdsourcing in cultural heritage’, including the British Library’s Digital Scholarship training programme, the HILT Digital Humanities summer school (once with Ben Brumfield) and scholars at other events where the course was held, whose insights, cynicism and questions have informed my thinking over the years. Finally, thanks to Meghan Ferriter and Victoria Van Hyning for their comments on this manuscript.


References for Crowdsourcing in cultural heritage: a practical guide to designing and running successful projects

Alam, S. L., & Campbell, J. (2017). Temporal Motivations of Volunteers to Participate in Cultural Crowdsourcing Work. Information Systems Research. https://doi.org/10.1287/isre.2017.0719

Bedford, A. (2014, February 16). Instructional Overlays and Coach Marks for Mobile Apps. Retrieved 12 September 2014, from Nielsen Norman Group website: http://www.nngroup.com/articles/mobile-instructional-overlay/

Berglund Prytz, Y. (2013, June 24). The Oxford Community Collection Model. Retrieved 22 October 2018, from RunCoCo website: http://blogs.it.ox.ac.uk/runcoco/2013/06/24/the-oxford-community-collection-model/

Bernstein, S. (2014). Crowdsourcing in Brooklyn. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Bitgood, S. (2010). An attention-value model of museum visitors (pp. 1–29). Retrieved from Center for the Advancement of Informal Science Education website: http://caise.insci.org/uploads/docs/VSA_Bitgood.pdf

Bonney, R., Ballard, H., Jordan, R., McCallie, E., Phillips, T., Shirk, J., & Wilderman, C. C. (2009). Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. A CAISE Inquiry Group Report (pp. 1–58). Retrieved from Center for Advancement of Informal Science Education (CAISE) website: http://caise.insci.org/uploads/docs/PPSR%20report%20FINAL.pdf

Brohan, P. (2012, July 23). One million, six hundred thousand new observations. Retrieved 30 October 2012, from Old Weather Blog website: http://blog.oldweather.org/2012/07/23/one-million-six-hundred-thousand-new-observations/

Brohan, P. (2014, August 18). In search of lost weather. Retrieved 5 September 2014, from Old Weather Blog website: http://blog.oldweather.org/2014/08/18/in-search-of-lost-weather/

Brumfield, B. W. (2012a, March 5). Quality Control for Crowdsourced Transcription. Retrieved 9 October 2013, from Collaborative Manuscript Transcription website: http://manuscripttranscription.blogspot.co.uk/2012/03/quality-control-for-crowdsourced.html

Brumfield, B. W. (2012b, March 17). Crowdsourcing at IMLS WebWise 2012. Retrieved 8 September 2014, from Collaborative Manuscript Transcription website: http://manuscripttranscription.blogspot.com.au/2012/03/crowdsourcing-at-imls-webwise-2012.html

Budiu, R. (2014, March 2). Login Walls Stop Users in Their Tracks. Retrieved 7 March 2014, from Nielsen Norman Group website: http://www.nngroup.com/articles/login-walls/

Causer, T., & Terras, M. (2014). ‘Many Hands Make Light Work. Many Hands Together Make Merry Work’: Transcribe Bentham and Crowdsourcing Manuscript Collections. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Causer, T., & Wallace, V. (2012). Building A Volunteer Community: Results and Findings from Transcribe Bentham. Digital Humanities Quarterly, 6(2). Retrieved from http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html

Cheng, J., Teevan, J., Iqbal, S. T., & Bernstein, M. S. (2015, April). Break It Down: A Comparison of Macro- and Microtasks. 4061–4064. https://doi.org/10.1145/2702123.2702146

Clary, E. G., Snyder, M., Ridge, R. D., Copeland, J., Stukas, A. A., Haugen, J., & Miene, P. (1998). Understanding and assessing the motivations of volunteers: A functional approach. Journal of Personality and Social Psychology, 74(6), 1516–30.

Collings, R. (2014, May 5). The art of computer image recognition. Retrieved 25 May 2014, from The Public Catalogue Foundation website: http://www.thepcf.org.uk/what_we_do/48/reference/862

Collings, R. (2015, February 1). The art of computer recognition. Retrieved 22 October 2018, from Art UK website: https://artuk.org/about/blog/the-art-of-computer-recognition

Crowdsourcing Consortium. (2015). Engaging the Public: Best Practices for Crowdsourcing Across the Disciplines. Retrieved from http://crowdconsortium.org/

Crowley, E. J., & Zisserman, A. (2016). The Art of Detection. Presented at the Workshop on Computer Vision for Art Analysis, ECCV. Retrieved from https://www.robots.ox.ac.uk/~vgg/publications/2016/Crowley16/crowley16.pdf

Csikszentmihalyi, M., & Hermanson, K. (1995). Intrinsic Motivation in Museums: Why Does One Want to Learn? In J. Falk & L. D. Dierking (Eds.), Public institutions for personal learning: Establishing a research agenda (pp. 66–77). Washington D.C.: American Association of Museums.

Dafis, L. L., Hughes, L. M., & James, R. (2014). What’s Welsh for ‘Crowdsourcing’? Citizen Science and Community Engagement at the National Library of Wales. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Das Gupta, V., Rooney, N., & Schreibman, S. (n.d.). Notes from the Transcription Desk: Modes of engagement between the community and the resource of the Letters of 1916. Digital Humanities 2016: Conference Abstracts. Presented at the Digital Humanities 2016, Kraków. Retrieved from http://dh2016.adho.org/abstracts/228

De Benetti, T. (2011, June 16). The secrets of Digitalkoot: Lessons learned crowdsourcing data entry to 50,000 people (for free). Retrieved 9 January 2012, from Microtask website: http://blog.microtask.com/2011/06/the-secrets-of-digitalkoot-lessons-learned-crowdsourcing-data-entry-to-50000-people-for-free/

de Boer, V., Hildebrand, M., Aroyo, L., De Leenheer, P., Dijkshoorn, C., Tesfa, B., & Schreiber, G. (2012). Nichesourcing: Harnessing the power of crowds of experts. Proceedings of the 18th International Conference on Knowledge Engineering and Knowledge Management, EKAW 2012, 16–20. Retrieved from http://dx.doi.org/10.1007/978-3-642-33876-2_3

DH2016 Expert Workshop. (2016, July 12). DH2016 Crowdsourcing workshop session overview. Retrieved 5 October 2018, from DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? website: https://docs.google.com/document/d/1sTII8P67mOFKWxCaAKd8SeF56PzKcklxG7KDfCRUF-8/edit?usp=drive_open&ouid=0&usp=embed_facebook

Dillon-Scott, P. (2011, March 31). How Europeana, crowdsourcing & wiki principles are preserving European history. Retrieved 15 February 2015, from The Sociable website: http://sociable.co/business/how-europeana-crowdsourcing-wiki-principles-are-preserving-european-history/

DiMeo, M. (2014, February 3). First Monday Library Chat: University of Iowa’s DIY History. Retrieved 7 September 2014, from The Recipes Project website: http://recipes.hypotheses.org/3216

Dunn, S., & Hedges, M. (2012). Crowd-Sourcing Scoping Study: Engaging the Crowd with Humanities Research (p. 56). Retrieved from King’s College website: http://www.humanitiescrowds.org

Dunn, S., & Hedges, M. (2013). Crowd-sourcing as a Component of Humanities Research Infrastructures. International Journal of Humanities and Arts Computing, 7(1–2), 147–169. https://doi.org/10.3366/ijhac.2013.0086

Durkin, P. (2017, September 28). Release notes: A big antedating for white lie – and introducing Shakespeare’s world. Retrieved 29 September 2017, from Oxford English Dictionary website: http://public.oed.com/the-oed-today/recent-updates-to-the-oed/september-2017-update/release-notes-white-lie-and-shakespeares-world/

Eccles, K., & Greg, A. (2014). Your Paintings Tagger: Crowdsourcing Descriptive Metadata for a National Virtual Collection. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Edwards, D., & Graham, M. (2006). Museum volunteers and heritage sectors. Australian Journal on Volunteering, 11(1), 19–27.

European Citizen Science Association. (2015). 10 Principles of Citizen Science. Retrieved from https://ecsa.citizen-science.net/sites/default/files/ecsa_ten_principles_of_citizen_science.pdf

Eveleigh, A., Jennett, C., Blandford, A., Brohan, P., & Cox, A. L. (2014). Designing for dabblers and deterring drop-outs in citizen science. 2985–2994. https://doi.org/10.1145/2556288.2557262

Eveleigh, A., Jennett, C., Lynn, S., & Cox, A. L. (2013). I want to be a captain! I want to be a captain!: Gamification in the old weather citizen science project. Proceedings of the First International Conference on Gameful Design, Research, and Applications, 79–82. Retrieved from http://dl.acm.org/citation.cfm?id=2583019

Ferriter, M., Rosenfeld, C., Boomer, D., Burgess, C., Leachman, S., Leachman, V., … Shuler, M. E. (2016). We learn together: Crowdsourcing as practice and method in the Smithsonian Transcription Center. Collections, 12(2), 207–225. https://doi.org/10.1177/155019061601200213

Fleet, C., Kowal, K., & Přidal, P. (2012). Georeferencer: Crowdsourced Georeferencing for Map Library Collections. D-Lib Magazine, 18(11/12). https://doi.org/10.1045/november2012-fleet

Forum posters. (2010–present). Signs of OW addiction … Retrieved 11 April 2014, from Old Weather Forum » Shore Leave » Dockside Cafe website: http://forum.oldweather.org/index.php?topic=1432.0

Fugelstad, P., Dwyer, P., Filson Moses, J., Kim, J. S., Mannino, C. A., Terveen, L., & Snyder, M. (2012). What Makes Users Rate (Share, Tag, Edit…)? Predicting Patterns of Participation in Online Communities. Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, 969–978. Retrieved from http://dl.acm.org/citation.cfm?id=2145349

Gilliver, P. (2012, October 4). ‘Your dictionary needs you’: A brief history of the OED’s appeals to the public. Retrieved from Oxford English Dictionary website: https://public.oed.com/history/history-of-the-appeals/

Goldstein, D. (1994). ‘Yours for Science’: The Smithsonian Institution’s Correspondents and the Shape of Scientific Community in Nineteenth-Century America. Isis, 85(4), 573–599.

Grayson, R. (2016). A Life in the Trenches? The Use of Operation War Diary and Crowdsourcing Methods to Provide an Understanding of the British Army’s Day-to-Day Life on the Western Front. British Journal for Military History, 2(2). Retrieved from http://bjmh.org.uk/index.php/bjmh/article/view/96

Hess, W. (2010, February 16). Onboarding: Designing Welcoming First Experiences. Retrieved 29 July 2014, from UX Magazine website: http://uxmag.com/articles/onboarding-designing-welcoming-first-experiences

Holley, R. (2009, March). Many Hands Make Light Work: Public Collaborative OCR Text Correction in Australian Historic Newspapers. Canberra: National Library of Australia.

Holley, R. (2010). Crowdsourcing: How and Why Should Libraries Do It? D-Lib Magazine, 16(3/4). https://doi.org/10.1045/march2010-holley

Holmes, K. (2003). Volunteers in the heritage sector: A neglected audience? International Journal of Heritage Studies, 9(4), 341–355. https://doi.org/10.1080/1352725022000155072

Kittur, A., Nickerson, J. V., Bernstein, M., Gerber, E., Shaw, A., Zimmerman, J., … Horton, J. (2013). The future of crowd work. Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 1301–1318. Retrieved from http://dl.acm.org/citation.cfm?id=2441923

Lambert, S., Winter, M., & Blume, P. (2014, March 26). Getting to where we are now. Retrieved 4 March 2015, from 10most.org.uk website: http://10most.org.uk/content/getting-where-we-are-now

Lascarides, M., & Vershbow, B. (2014). What’s on the menu?: Crowdsourcing at the New York Public Library. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Latimer, J. (2009, February 25). Letter in the Attic: Lessons learnt from the project. Retrieved 17 April 2014, from My Brighton and Hove website: http://www.mybrightonandhove.org.uk/page/letterintheatticlessons?path=0p116p1543p

Lazy Registration design pattern. (n.d.). Retrieved 9 December 2018, from UI Patterns website: http://ui-patterns.com/patterns/LazyRegistration

Leon, S. M. (2014). Build, Analyse and Generalise: Community Transcription of the Papers of the War Department and the Development of Scripto. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38(1), 43–52.

McGonigal, J. (n.d.). Gaming the Future of Museums. Retrieved from http://www.slideshare.net/avantgame/gaming-the-future-of-museums-a-lecture-by-jane-mcgonigal-presentation#text-version

Mills, E. (2017, December). The Flitch of Bacon: An Unexpected Journey Through the Collections of the British Library. Retrieved 17 August 2018, from British Library Digital Scholarship blog website: http://blogs.bl.uk/digital-scholarship/2017/12/the-flitch-of-bacon-an-unexpected-journey-through-the-collections-of-the-british-library.html

Mitra, T., & Gilbert, E. (2014). The Language that Gets People to Give: Phrases that Predict Success on Kickstarter. Retrieved from http://comp.social.gatech.edu/papers/cscw14.crowdfunding.mitra.pdf

Mugar, G., Østerlund, C., Hassman, K. D., Crowston, K., & Jackson, C. B. (2014). Planet Hunters and Seafloor Explorers: Legitimate Peripheral Participation Through Practice Proxies in Online Citizen Science. Retrieved from http://crowston.syr.edu/sites/crowston.syr.edu/files/paper_revised%20copy%20to%20post.pdf

Mugar, G., Østerlund, C., Jackson, C. B., & Crowston, K. (2015). Being Present in Online Communities: Learning in Citizen Science. Proceedings of the 7th International Conference on Communities and Technologies, 129–138. https://doi.org/10.1145/2768545.2768555

Museums, Libraries and Archives Council. (2008). Generic Learning Outcomes. Retrieved 8 September 2014, from Inspiring Learning website: http://www.inspiringlearningforall.gov.uk/toolstemplates/genericlearning/

National Archives of Australia. (n.d.). ArcHIVE – homepage. Retrieved 18 June 2014, from ArcHIVE website: http://transcribe.naa.gov.au/

Nielsen, J. (1995). 10 Usability Heuristics for User Interface Design. Retrieved 29 April 2014, from http://www.nngroup.com/articles/ten-usability-heuristics/

Nov, O., Arazy, O., & Anderson, D. (2011). Technology-Mediated Citizen Science Participation: A Motivational Model. Proceedings of the AAAI International Conference on Weblogs and Social Media. Barcelona, Spain.

Oomen, J., Gligorov, R., & Hildebrand, M. (2014). Waisda?: Making Videos Findable through Crowdsourced Annotations. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Paas, F., Renkl, A., & Sweller, J. (2003). Cognitive Load Theory and Instructional Design: Recent Developments. Educational Psychologist, 38(1), 1–4. https://doi.org/10.1207/S15326985EP3801_1

Part I: Building a Great Project. (n.d.). Retrieved 9 December 2018, from Zooniverse Help website: https://help.zooniverse.org/best-practices/1-great-project/

Preist, C., Massung, E., & Coyle, D. (2014). Competing or aiming to be average?: Normification as a means of engaging digital volunteers. Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, 1222–1233. https://doi.org/10.1145/2531602.2531615

Raddick, M. J., Bracey, G., Gay, P. L., Lintott, C. J., Murray, P., Schawinski, K., … Vandenberg, J. (2010). Galaxy Zoo: Exploring the Motivations of Citizen Science Volunteers. Astronomy Education Review, 9(1), 18.

Raimond, Y., Smethurst, M., & Ferne, T. (2014, September 15). What we learnt by crowdsourcing the World Service archive. Retrieved 15 September 2014, from BBC R&D website: http://www.bbc.co.uk/rd/blog/2014/08/data-generated-by-the-world-service-archive-experiment-draft

Reside, D. (2014). Crowdsourcing Performing Arts History with NYPL’s ENSEMBLE. Presented at the Digital Humanities 2014. Retrieved from http://dharchive.org/paper/DH2014/Paper-131.xml

Ridge, M. (2011a). Playing with Difficult Objects – Game Designs to Improve Museum Collections. In J. Trant & D. Bearman (Eds.), Museums and the Web 2011: Proceedings. Retrieved from http://www.museumsandtheweb.com/mw2011/papers/playing_with_difficult_objects_game_designs_to

Ridge, M. (2011b). Playing with difficult objects: Game designs for crowdsourcing museum metadata (MSc Dissertation, City University London). Retrieved from http://www.miaridge.com/my-msc-dissertation-crowdsourcing-games-for-museums/

Ridge, M. (2013). From Tagging to Theorizing: Deepening Engagement with Cultural Heritage through Crowdsourcing. Curator: The Museum Journal, 56(4).

Ridge, M. (2014, November). Citizen History and its discontents. Presented at the IHR Digital History Seminar, Institute for Historical Research, London. Retrieved from https://hcommons.org/deposits/item/hc:17907/

Ridge, M. (2015). Making digital history: The impact of digitality on public participation and scholarly practices in historical research (Ph.D., Open University). Retrieved from http://oro.open.ac.uk/45519/

Ridge, M. (2018). British Library Digital Scholarship course 105: Exercises for Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions. Retrieved from https://docs.google.com/document/d/1tx-qULCDhNdH0JyURqXERoPFzWuCreXAsiwHlUKVa9w/

Rotman, D., Preece, J., Hammock, J., Procita, K., Hansen, D., Parr, C., … Jacobs, D. (2012). Dynamic changes in motivation in collaborative citizen-science projects. Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, 217–226. https://doi.org/10.1145/2145204.2145238

Sample Ward, A. (2011, May 18). Crowdsourcing vs Community-sourcing: What’s the difference and the opportunity? Retrieved 6 January 2013, from Amy Sample Ward’s Version of NPTech website: http://amysampleward.org/2011/05/18/crowdsourcing-vs-community-sourcing-whats-the-difference-and-the-opportunity/

Schmitt, J. R., Wang, J., Fischer, D. A., Jek, K. J., Moriarty, J. C., Boyajian, T. S., … Socolovsky, M. (2014). Planet Hunters. VI. An Independent Characterization of KOI-351 and Several Long Period Planet Candidates from the Kepler Archival Data. The Astronomical Journal, 148(2), 28. https://doi.org/10.1088/0004-6256/148/2/28

Secord, A. (1994). Corresponding interests: Artisans and gentlemen in nineteenth-century natural history. The British Journal for the History of Science, 27(04), 383–408. https://doi.org/10.1017/S0007087400032416

Shakespeare’s World Talk #OED. (Ongoing). Retrieved 21 April 2019, from https://www.zooniverse.org/projects/zooniverse/shakespeares-world/talk/239

Sharma, P., & Hannafin, M. J. (2007). Scaffolding in technology-enhanced learning environments. Interactive Learning Environments, 15(1), 27–46. https://doi.org/10.1080/10494820600996972

Shirky, C. (2011). Cognitive surplus: Creativity and generosity in a connected age. London, U.K.: Penguin.

Silvertown, J. (2009). A new dawn for citizen science. Trends in Ecology & Evolution, 24(9), 467–71. https://doi.org/10.1016/j.tree.2009.03.017

Simmons, B. (2015, August 24). Measuring Success in Citizen Science Projects, Part 2: Results. Retrieved 28 August 2015, from Zooniverse website: https://blog.zooniverse.org/2015/08/24/measuring-success-in-citizen-science-projects-part-2-results/

Simon, N. K. (2010). The Participatory Museum. Retrieved from http://www.participatorymuseum.org/chapter4/

Smart, P. R., Simperl, E., & Shadbolt, N. (2014). A Taxonomic Framework for Social Machines. In D. Miorandi, V. Maltese, M. Rovatsos, A. Nijholt, & J. Stewart (Eds.), Social Collective Intelligence: Combining the Powers of Humans and Machines to Build a Smarter Society. Retrieved from http://eprints.soton.ac.uk/362359/

Smithsonian Institution Archives. (2012, March 21). Meteorology. Retrieved 25 November 2017, from Smithsonian Institution Archives website: https://siarchives.si.edu/history/featured-topics/henry/meteorology

Springer, M., Dulabahn, B., Michel, P., Natanson, B., Reser, D., Woodward, D., & Zinkham, H. (2008). For the Common Good: The Library of Congress Flickr Pilot Project (pp. 1–55). Retrieved from Library of Congress website: http://www.loc.gov/rr/print/flickr_report_final.pdf

Stebbins, R. A. (1997). Casual leisure: A conceptual statement. Leisure Studies, 16(1), 17–25. https://doi.org/10.1080/026143697375485

The Culture and Sport Evidence (CASE) programme. (2011, January). Evidence of what works: Evaluated projects to drive up engagement (p. 19). Retrieved from Culture and Sport Evidence (CASE) programme website: http://www.culture.gov.uk/images/research/evidence_of_what_works.pdf

Trant, J. (2009). Tagging, Folksonomy and Art Museums: Results of steve.museum’s research (p. 197). Retrieved from Archives & Museum Informatics website: https://web.archive.org/web/20100210192354/http://conference.archimuse.com/files/trantSteveResearchReport2008.pdf

United States Government. (n.d.). Federal Crowdsourcing and Citizen Science Toolkit. Retrieved 9 December 2018, from CitizenScience.gov website: https://www.citizenscience.gov/toolkit/

Van Merriënboer, J. J. G., Kirschner, P. A., & Kester, L. (2003). Taking the load off a learner’s mind: Instructional design for complex learning. Educational Psychologist, 38(1), 5–13.

Vander Wal, T. (2007, February 2). Folksonomy. Retrieved 8 December 2018, from Vanderwal.net website: http://vanderwal.net/folksonomy.html

Veldhuizen, B., & Keinan-Schoonbaert, A. (2015, February 11). MicroPasts: Crowdsourcing Cultural Heritage Research. Retrieved 8 December 2018, from Sketchfab Blog website: https://blog.sketchfab.com/micropasts-crowdsourcing-cultural-heritage-research/

Verwayen, H., Fallon, J., Schellenberg, J., & Kyrou, P. (2017). Impact Playbook for museums, libraries and archives. Europeana Foundation.

Vetter, J. (2011). Introduction: Lay Participation in the History of Scientific Observation. Science in Context, 24(02), 127–141. https://doi.org/10.1017/S0269889711000032

von Ahn, L., & Dabbish, L. (2008). Designing games with a purpose. Communications of the ACM, 51(8), 57. https://doi.org/10.1145/1378704.1378719

Wenger, E. (2010). Communities of practice and social learning systems: The career of a concept. In Social Learning Systems and communities of practice. Springer Verlag and the Open University.

Whitenton, K. (2013, December 22). Minimize Cognitive Load to Maximize Usability. Retrieved 12 September 2014, from Nielsen Norman Group website: http://www.nngroup.com/articles/minimize-cognitive-load/

WieWasWie Project informatie. (n.d.). Retrieved 1 August 2014, from VeleHanden website: http://velehanden.nl/projecten/bekijk/details/project/wiewaswie_bvr

Willett, K. (n.d.). New paper: Galaxy Zoo and machine learning. Retrieved 31 March 2015, from Galaxy Zoo website: http://blog.galaxyzoo.org/2015/03/31/new-paper-galaxy-zoo-and-machine-learning/

Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 17(2), 89–100.

Three ways you can help with ‘In their own words: collecting experiences of the First World War’ (and a CENDARI project update)

Somehow it’s a month since I posted about my CENDARI research project (in Moving forward: modelling and indexing WWI battalions) on this site. That probably reflects the rhythm of the project – less trying to work out what I want to do and more getting on with doing it. A draft post I started last month simply said, ‘A lot of battalions were involved in World War One’. I’ll do a retrospective post soon, and here’s a quick summary of on-going work.

First, a quick recap. My project has two goals – one, to collect a personal narrative for each battalion in the Allied armies of the First World War; two, to create a service that would allow someone to ask ‘where was a specific battalion at a specific time?’. Together, they help address a common situation for people new to WWI history who might ask something like ‘I know my great-uncle was in the 27th Australian battalion in March 1916, where would he have been and what would he have experienced?’.

I’ve been working on streamlining and simplifying the public-facing task of collecting a personal narrative for each battalion, and have written a blog post, Help collect soldiers’ experiences of WWI in their own words, that reduces it to three steps:

  1. Take one of the diaries, letters and memoirs listed on the Collaborative Collections wiki, and
  2. Match its author with a specific regiment or battalion.
  3. Send in the results via this form.

If you know of a local history society, family historian or anyone else who might be interested in helping, please send them along to this post: Help collect soldiers’ experiences of WWI in their own words.

Work on specifying the relevant data structures to support a look-up service (answering questions about a specific unit's location and activities at a specific time) largely moved to the wiki:

You can see the infobox structures in progress by flipping from the Talk to the Template tabs. You'll need to request an account to join in, but more views, sample data and edge cases would be really welcome.
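To make the kind of structure under discussion concrete, here is a minimal sketch of the date-indexed data model a ‘where was this unit on this date?’ look-up service implies. All class names, field names and dates below are illustrative assumptions for this post, not the project's actual wiki templates or data:

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Posting:
    start: date     # first day the unit is known to be at this location
    place: str      # place name
    activity: str   # e.g. 'training', 'front line', 'rest'

class UnitIndex:
    """Maps a unit name to a date-sorted list of postings."""
    def __init__(self):
        self.units = {}  # unit name -> list[Posting]

    def add(self, unit: str, posting: Posting) -> None:
        postings = self.units.setdefault(unit, [])
        postings.append(posting)
        postings.sort(key=lambda p: p.start)

    def where_was(self, unit: str, on: date) -> Optional[Posting]:
        """Return the posting in effect on a given date, if any is recorded."""
        postings = self.units.get(unit, [])
        starts = [p.start for p in postings]
        i = bisect_right(starts, on)
        return postings[i - 1] if i else None

# Illustrative data only (dates are invented for the example)
index = UnitIndex()
index.add('27th Australian Battalion', Posting(date(1916, 3, 1), 'Egypt', 'training'))
index.add('27th Australian Battalion', Posting(date(1916, 4, 7), 'Fleurbaix', 'front line'))
print(index.where_was('27th Australian Battalion', date(1916, 3, 15)).place)  # Egypt
```

The design choice worth noting is that each posting records only a start date: the look-up finds the most recent posting on or before the queried date, which matches how battalion movements tend to be recorded (arrivals rather than complete intervals) and copes gracefully with gaps in the sources.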

Populating the list of battalions and other units has been a huge task in itself, partly because very few cultural institutions have definitive lists of units they can (or want to) share, but it’s necessary to support both core goals. I’ve been fortunate to have help (see ‘Thanks and recent contributions’ on ‘How you can help’) but the task is on-going so get in touch if you can help!

So there are three different ways you can help with ‘In their own words: collecting experiences of the First World War’:

Finally, last week I was in New Zealand to give a keynote on this work at the National Digital Forum. The video for ‘Collaborative collections through a participatory commons’ is online, so you can catch up on the background for my project if you’ve got 40 minutes or so to spare. Should you be in Dublin, I’m giving a talk on ‘A pilot with public participation in historical research: linking lived experiences of the First World War’ at the Trinity Long Room Hub today (thus the poster).

And if you’ve made it this far, perhaps you’d like to apply for a CENDARI Visiting Research Fellowship 2015 yourself?

Moving forward: modelling and indexing WWI battalions

A super-quick update from my CENDARI Fellowship this week. I set up the wiki for In their own words: linking lived experiences of the First World War a week ago but only got stuck into populating it with lists of various national battalions this week. My current task list, copied from the front page is to:

If you can help with any of that, let me know! Or just get stuck in and edit the site.
I’ve started another Google Doc with very sketchy Notes towards modelling information about World War One Battalions. I need to test it with more battalion histories and update it iteratively. At this stage my thinking is to turn it into an infobox format to create structured data via the wiki. It’s all very lo-fi and much less designed than my usual projects, but I’m hoping people will be able to help regardless.

So, in this phase of the project, the aim is to find a personal narrative – a diary, letters, memoirs or images – for each military unit in the British Army. Can you help?

Looking for (crowdsourcing) love in all the right places

One of the most important exercises in the crowdsourcing workshops I run is the ‘speed dating’ session. The idea is to spend some time looking at a bunch of crowdsourcing projects until you find a project you love. Finding a project you enjoy gives you a deeper insight into why other people participate in crowdsourcing, and will see you through the work required to get a crowdsourcing project going. I think making a personal connection like this helps reduce some of the cynicism I occasionally encounter about why people would volunteer their time to help cultural heritage collections. Trying lots of projects also gives you a much better sense of the types of barriers projects can accidentally put in the way of participation. It’s also a good reminder that everyone is a nerd about something, and that there’s a community of passion for every topic you can think of.

If you want to learn more about designing history or cultural heritage crowdsourcing projects, trying out lots of projects is a great place to start. The more time you can spend on this the better – an hour is ideal – but trying just one or two projects is better than nothing. In a workshop I get people to note how a project made them feel – what they liked most and least about a project, and who they’d recommend it to. You can also note the input and output types to help build your mental database of relevant crowdsourcing projects.

The list of projects I suggest varies according to the background of workshop participants, and I’ll often throw in suggestions tailored to specific interests, but here’s a generic list to get you started.

10 Most Wanted http://10most.org.uk/ Research object histories
Ancient Lives http://ancientlives.org/ Humanities, language, text transcription
British Library Georeferencer http://www.bl.uk/maps/ Locating and georeferencing maps (warning: if it’s running, only hard maps may be left!)
Children of the Lodz Ghetto http://online.ushmm.org/lodzchildren/ Citizen history, research
Describe Me http://describeme.museumvictoria.com.au/ Describe objects
DIY History http://diyhistory.lib.uiowa.edu/ Transcribe historical letters, recipes, diaries
Family History Transcription Project http://www.flickr.com/photos/statelibrarync/collections/ Document transcription (Flickr/Yahoo login required to comment)
Herbaria@Home http://herbariaunited.org/atHome/ (for bonus points, compare it with Notes from Nature https://www.zooniverse.org/project/notes_from_nature) Transcribing specimen sheets (or biographical research)
HistoryPin Year of the Bay ‘Mysteries’ https://www.historypin.org/attach/project/22-yearofthebay/mysteries/index/ Help find dates, locations, titles for historic photographs; overlay images on StreetView
iSpot http://www.ispotnature.org/ Help ‘identify wildlife and share nature’
Letters of 1916 http://dh.tcd.ie/letters1916/ Transcribe letters and/or contribute letters
London Street Views 1840 http://crowd.museumoflondon.org.uk/lsv1840/ Help transcribe London business directories
Micropasts http://crowdsourced.micropasts.org/app/photomasking/newtask Photo-masking to help produce 3D objects; also structured transcription
Museum Metadata Games: Dora http://museumgam.es/dora/ Tagging game with cultural heritage objects (my prototype from 2010)
NYPL Building Inspector http://buildinginspector.nypl.org/ A range of tasks, including checking building footprints, entering addresses
Operation War Diary http://operationwardiary.org/ Structured transcription of WWI unit diaries
Papers of the War Department http://wardepartmentpapers.org/ Document transcription
Planet Hunters http://planethunters.org/ Citizen science; review visualised data
Powerhouse Museum Collection Search http://www.powerhousemuseum.com/collection/database/menu.php Tagging objects
Reading Experience Database http://www.open.ac.uk/Arts/RED/ Text selection, transcription, description.
Smithsonian Digital Volunteers: Transcription Center https://transcription.si.edu/ Text transcription
Tiltfactor Metadata Games http://www.metadatagames.org/ Games with cultural heritage images
Transcribe Bentham http://www.transcribe-bentham.da.ulcc.ac.uk/ History; text transcription
Trove http://trove.nla.gov.au/newspaper?q= Correct OCR errors, transcribe text, tag or describe documents
US National Archives http://www.amara.org/en/teams/national-archives/ Transcribing videos
What’s the Score at the Bodleian http://www.whats-the-score.org/ Music and text transcription, description
What’s on the menu http://menus.nypl.org/ Structured transcription of restaurant menus
What’s on the menu? Geotagger http://menusgeo.herokuapp.com/ Geolocating historic restaurant menus
Wikisource – random item link http://en.wikisource.org/wiki/Special:Random/Index Transcribing texts
Worm Watch http://www.wormwatchlab.org Citizen science; video
Your Paintings Tagger http://tagger.thepcf.org.uk/ Paintings; free-text or structured tagging

NB: crowdsourcing is a dynamic field, so some sites may be temporarily out of content or have otherwise settled in transit. Some sites require registration, so you may need to find another site to explore while you’re waiting for your registration email.

In which I am awed by the generosity of others, and have some worthy goals

Grand Canal Dock at night, Dublin

A quick update from my CENDARI fellowship working on a project that’s becoming ‘In their own words: linking lived experiences of the First World War‘. I’ve spent the week reading (again a mixture of original diaries and letters, technical stuff like ontology documentation and also WWI history forums and ‘amateur’ sites) and writing. I put together a document outlining a range of possible goals and some very sketchy tech specs, and opened it up for feedback. The goals I set out are copied below for those who don’t want to delve into detail. The commentable document, ‘Linking lived experiences of the First World War’: possible goals and a bunch of technical questions goes into more detail.

However, the main point of this post is to publicly thank those who’ve helped by commenting and sharing on the doc, on twitter or via email. Hopefully I’m not forgetting anyone, as I’ve been blown away by and am incredibly grateful for the generosity of those who’ve taken the time to at least skim 1600 words (!). It’s all helped me clarify my ideas and find solutions I’m able to start implementing next week. In no order at all – at CENDARI, Jennifer Edmond, Alex O’Connor, David Stuart, Benjamin Štular, Francesca Morselli, Deirdre Byrne; online Andrew Gray @generalising; Alex Stinson @ DHKState; jason webber @jasonmarkwebber; Alastair Dunning @alastairdunning; Ben Brumfield @benwbrum; Christine Pittsley; Owen Stephens @ostephens; David Haskiya @DavidHaskiya; Jeremy Ottevanger @jottevanger; Monika Lechner @lemondesign; Gavin Robinson ‏@merozcursed; Tom Pert @trompet2 – thank you all!

Worthy goals (i.e. things I’m hoping to accomplish, with the help of historians and the public; only some of which I’ll manage in the time)

At the end of this project, someone who wants to research a soldier in WWI but doesn’t know a thing about how armies were structured should be able to find a personal narrative from a soldier in the same bit of the army, to help them understand experiences of the Great War.

Hopefully these personal accounts will provide some context, in their own words, for the lived experiences of WWI. Some goals listed are behind-the-scenes stuff that should just invisibly make personal diaries, letters and memoirs more easily discoverable. This needs datasets that model relationships between people and documents; participatory interfaces for creating or enhancing information about contemporary materials (which feed into those supporting structures); and interfaces that use the data created.

More specifically, my goals include:

    • A personal account by someone in each unit linked to that unit’s record, so that anyone researching a WWI name would have at least one account to read. To populate this dataset, personal accounts (diaries, letters, etc) would need to be linked to specific soldiers, who can then be linked to specific units. Linking published accounts such as official unit histories would be a bonus. [Semantic MediaWiki]
    • Researched links between individual men and the units they served in, to allow their personal accounts to be linked to the relevant military unit. I’m hoping I can find historians willing to help with the process of finding and confirming the military unit the writer was in. [Semantic MediaWiki]
    • A platform for crowdsourcing the transcription and annotation of digitised documents. The catch is that the documents for transcription would be held remotely on a range of large and small sites, from Europeana’s collection to library sites that contain just one or two digitised diaries. Documents could be tagged/annotated with the names of people, places, events, or concepts represented in them. [Semantic MediaWiki??]
    • A structured dataset populated with the military hierarchy (probably based on The British order of battle of 1914-1918) that records the start and end dates of each parent-child relationship (an example of how much units moved within the hierarchy)
    • A published webpage for each unit, to hold those links to official and personal documents about that unit in WWI. In future this page could include maps, timelines and other visualisations tailored to the attributes of a unit, possibly including theatres of war, events, campaigns, battles, number of privates and officers, etc. (Possibly related to CENDARI Work Package 9?) [Semantic MediaWiki]
    • A better understanding of what people want to know at different stages of researching WWI histories. This might include formal data gathering, possibly a combination of interviews, forum discussions or surveys
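The dated parent-child relationships in the military hierarchy goal above could be sketched as a simple table of attachments with start and end dates. The units and dates below are invented examples, not actual order-of-battle data:

```python
from datetime import date

# Each tuple: (child unit, parent unit, relationship start, relationship end).
# Units and dates are illustrative only, not real order-of-battle records.
attachments = [
    ("1st Example Battalion", "10th Example Brigade",
     date(1914, 8, 4), date(1915, 12, 31)),
    ("1st Example Battalion", "11th Example Brigade",
     date(1916, 1, 1), date(1918, 11, 11)),
]

def parent_on(unit, when):
    """Return the parent unit a battalion belonged to on a given date."""
    for child, parent, start, end in attachments:
        if child == unit and start <= when <= end:
            return parent
    return None

print(parent_on("1st Example Battalion", date(1916, 7, 1)))  # 11th Example Brigade
```

Recording the dates on each edge, rather than a single static tree, is what lets the dataset answer ‘which brigade was this battalion in at the time this diary was written?’ despite units moving within the hierarchy.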


Goals that are more likely to drop off, or become quick experiments to see how far you can get with accessible tools:
    • Trained ‘named entity recognition’ and ‘natural language processing’ tools that could be run over transcribed text to suggest possible people, places, events, concepts, etc [this might drop off the list as the CENDARI project is working on a tool called Pineapple (PDF poster). That said, I’ll probably still experiment with the Stanford NER tool to see what the results are like]
    • A way of presenting possible matches from the text tools above for verification or correction by researchers. Ideally, this would be tied in with the ability to annotate documents
    • The ability to search across different repositories for a particular soldier, to help with the above.


Linking lived experiences of WWI through battalions?

Another update from my CENDARI Fellowship at Trinity College Dublin, looking at ‘In their own words: linking lived experiences of the First World War’, which is a small-scale, short-term pilot based on WWI collections. My first post is Defining the scope: week one as a CENDARI Fellow. Over the past two weeks I’ve done a lot of reading – more WWI diaries and letters; WWI histories and historiography; specialist information like military structures (orders of battle, etc). I’ve also sketched out lots of snippets of possible functions, data, relationships and other outcomes.

I’ve narrowed the key goal (or minimum viable product, if you prefer) of my project to linking personal accounts of the war – letters, diaries, memoirs, photographs, etc – to battalions, by creating links from the individual who wrote them to their military unit. Once these personal accounts are linked to particular military units, they can be linked to higher units – from the battalion, ship or regiment to brigade, corps, etc – and to particular places, activities, events and campaigns. The idea behind this is to provide context for an individual’s experience of WWI by linking to narratives written by people in the same situation. I’m still working out how to organise the research process of matching the right soldier to the right battalion/regiment/ship so that relevant personal stories are discoverable. I’m also still working out which attributes of a battalion are relevant, how granular the data will be, and how to design for the inevitable variation in data quality (for example, the availability of records for different armies varies hugely). Finally, I’m still working out which bits need computer science tools and which need the help of other historians.

Given the number of centenary projects, I was hoping to find more structured data about WWI entities. Trenches to Triples would be a useful source of permanent URLs and terms to train named entity recognition, but am I missing other sources?

There’s a lot of content, and so much activity around WWI records, but it’s spread out across the internet. Individual people and small organisations are digitising and transcribing diaries and letters. Big collecting projects like Europeana have lots of personal accounts, but they’re often not transcribed and they don’t seem to be linked to structured data about the item itself. Some people have painstakingly transcribed unit diaries, but they’re not linked from the official site, so others wouldn’t know there’s a more easily read version of the diary available. I’ve been wondering if you could crowdsource the process of transcribing records held elsewhere, and offer the transcripts back to sites. Using dedicated transcription software would let others suggest corrections, and might also make it possible to link sections of the text to external ‘entities’ like names, places, events and concepts.

Albert Henry Bailey. Image:
Sir George Grey Special Collections,
Auckland Libraries, AWNS-19150909-39-5

To help figure out the issues researchers face and the variations in available resources, I’m researching randomly selected soldiers from different Allied forces. I’ve posted my notes on Private Albert Henry Bailey, service number 13/970a. You’ll see that they’re in prose form, and don’t contain any structured data. Most of my research used digitised-but-not-transcribed images of documents, with some transcribed accounts. It would definitely benefit from deeper knowledge of military history – for a start, which battalions were in the same place as his unit at the same time?

This account of the arrival and first weeks of the Auckland Mounted Rifles at Gallipoli from the official unit history gives a sense of the density and specificity of local place names, as does the official unit diary, and I assume many personal accounts. I’m not sure how named entity recognition tools will cope, and ideally I’d like to find lists of places to ‘train’ the tools (including possibly some from the ‘Trenches to Triples’ project).
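Even before training a statistical NER model, a place list can be used directly as a gazetteer scanned against transcribed text. This is a deliberately naive sketch, not trained NER; the place names below are examples from Gallipoli accounts, and a real list might come from a source like the ‘Trenches to Triples’ vocabularies:

```python
import re

# A minimal gazetteer lookup: scan a fixed place-name list against text.
# Place names here are illustrative examples, not a curated vocabulary.
gazetteer = ["Anzac Cove", "Walker's Ridge", "Chunuk Bair", "Quinn's Post"]

def find_places(text, places):
    """Return (place, character offset) pairs for each gazetteer hit."""
    hits = []
    for place in places:
        for match in re.finditer(re.escape(place), text):
            hits.append((place, match.start()))
    return sorted(hits, key=lambda hit: hit[1])

passage = "The regiment landed at Anzac Cove and dug in below Walker's Ridge."
print(find_places(passage, gazetteer))
```

A gazetteer match like this produces candidate annotations cheaply; the hard cases (abbreviations, spelling variants, ambiguous names) are exactly where a trained tool, or human verification, would need to take over.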

If there aren’t already any structured data sources for military hierarchies in WWI, do I have to make one? And if so, how? The idea would be to turn prose descriptions like this Australian War Memorial history of the 27th AIF Battalion, this order of battle of the 2nd Australian Division and any other suitable sources into structured data. I can see some ways it might be possible to crowdsource the task, but it’s a big one. It would be worth it, though: providing a service that lets people look up which higher military units, places, activities and campaigns a particular battalion/regiment/ship was linked to at a given time would be a good legacy for my research.

I’m sure I’m forgetting lots of things, and my list of questions is longer than my list of answers, but I should end here. To close, I want to share a quote from the official history of the Auckland Mounted Rifles. The author said he ‘would like to speak of the splendid men of the rank and file who died during this three months’ struggle. Many names rush to the memory, but it is not possible to mention some without doing an injustice to the memory of others’. I guess my project is driven by a vision of doing justice to the memory of every soldier, particularly those ordinary men who aren’t as easily found in the records. I’m hoping that drawing on the work of other historians and re-linking disparate sources will help provide as much context as possible for their experiences of the First World War.


Update, 15 October 2014: if you’ve made it this far, you might also be interested in chipping in at ‘Linking lived experiences of the First World War’: possible goals and a bunch of technical questions.

It’s here! Crowdsourcing our Cultural Heritage is now available

My edited volume, Crowdsourcing our Cultural Heritage, is now available! My introduction (Crowdsourcing our cultural heritage: Introduction), which provides an overview of the field and outlines the contribution of the 12 chapters, is online at Ashgate’s site, along with the table of contents and index. There’s a 10% discount if you order online.

If you’re in London on the evening of Thursday 20th November, we’re celebrating with a book launch party at the UCL Centre for Digital Humanities. Register at http://crowdsourcingculturalheritage.eventbrite.co.uk.

Here’s the back page blurb: “Crowdsourcing, or asking the general public to help contribute to shared goals, is increasingly popular in memory institutions as a tool for digitising or computing vast amounts of data. This book brings together for the first time the collected wisdom of international leaders in the theory and practice of crowdsourcing in cultural heritage. It features eight accessible case studies of groundbreaking projects from leading cultural heritage and academic institutions, and four thought-provoking essays that reflect on the wider implications of this engagement for participants and on the institutions themselves.

Crowdsourcing in cultural heritage is more than a framework for creating content: as a form of mutually beneficial engagement with the collections and research of museums, libraries, archives and academia, it benefits both audiences and institutions. However, successful crowdsourcing projects reflect a commitment to developing effective interface and technical designs. This book will help practitioners who wish to create their own crowdsourcing projects understand how other institutions devised the right combination of source material and the tasks for their ‘crowd’. The authors provide theoretically informed, actionable insights on crowdsourcing in cultural heritage, outlining the context in which their projects were created, the challenges and opportunities that informed decisions during implementation, and reflecting on the results.

This book will be essential reading for information and cultural management professionals, students and researchers in universities, corporate, public or academic libraries, museums and archives.”

Massive thanks to the following authors of chapters for their intellectual generosity and their patience with up to five rounds of edits, plus proofing, indexing and more…

  1. Crowdsourcing in Brooklyn, Shelley Bernstein;
  2. Old Weather: approaching collections from a different angle, Lucinda Blaser;
  3. ‘Many hands make light work. Many hands together make merry work’: Transcribe Bentham and crowdsourcing manuscript collections, Tim Causer and Melissa Terras;
  4. Build, analyse and generalise: community transcription of the Papers of the War Department and the development of Scripto, Sharon M. Leon;
  5. What’s on the menu?: crowdsourcing at the New York Public Library, Michael Lascarides and Ben Vershbow;
  6. What’s Welsh for ‘crowdsourcing’? Citizen science and community engagement at the National Library of Wales, Lyn Lewis Dafis, Lorna M. Hughes and Rhian James;
  7. Waisda?: making videos findable through crowdsourced annotations, Johan Oomen, Riste Gligorov and Michiel Hildebrand;
  8. Your Paintings Tagger: crowdsourcing descriptive metadata for a national virtual collection, Kathryn Eccles and Andrew Greg;
  9. Crowdsourcing: Crowding out the archivist? Locating crowdsourcing within the broader landscape of participatory archives, Alexandra Eveleigh;
  10. How the crowd can surprise us: humanities crowdsourcing and the creation of knowledge, Stuart Dunn and Mark Hedges;
  11. The role of open authority in a collaborative web, Lori Byrd Phillips;
  12. Making crowdsourcing compatible with the missions and values of cultural heritage organisations, Trevor Owens.

Defining the scope: week one as a CENDARI Fellow

I’m coming to the end of my first week as a Transnational Access Fellow with the CENDARI project at the Trinity College Dublin Long Room Hub. CENDARI ‘aims to leverage innovative technologies to provide historians with the tools by which to contextualise, customise and share their research’, which dovetails with my PhD research incredibly well. This Fellowship gives me an opportunity to extend my ideas about ‘Enriching cultural heritage collections through a Participatory Commons‘ without trying to squish them into a history thesis, and is probably perfectly timed in giving me a break from writing up.

View over Trinity College Dublin

There are two parts to my CENDARI project ‘Bridging collections with a participatory Commons: a pilot with World War One archives’. The first involves working on the technical, data and cultural context/requirements for the ‘participatory history commons’ as an infrastructure; the second is a demonstrator based on that infrastructure. I’ll be working out how official records and ‘shoebox archives’ can be mined and indexed to help provide what I’m calling ‘computationally-generated context’ for people researching lives touched by World War One.

This week I’ve read metadata schema (MODS extended with TEI and a local schema, if you’re interested) and ontology guidelines, attended some lively seminars on Irish history, gotten my head around CENDARI’s work packages and the structure of the British army during WWI. I’ve started a list of nearby local history societies with active research projects to see if I can find some working on WWI history – I’d love to work with people who have sources they want to digitise and generally do more with, and people who are actively doing research on First World War lives. I’ve started to read sample primary materials and collect machine-readable sources so I can test out approaches by manually marking-up and linking different repositories of records. I’m going to spend the rest of the day tidying up my list of outcomes and deliverables and sketching out how all the different aspects of my project fit together. And tonight I’m going to check out some of the events at Discover Research Dublin. Nerd joy!

‘The cooperative archive’?

Finally, I’ve dealt with something I’d put off for ages. ‘Commons’ is one of those tricky words that’s less resonant than it could be, so I looked for a better name than the ‘participatory history commons’. I doodled around words like collation, congeries, cluster, demos, assemblage, sources, commons, active, engaged, participatory, opus, archive, digital, posse, mob, cahoots and phrases like collaborative collections, collaborative history, history cooperative, but eventually settled on ‘cooperative archive’. This appeals because ‘cooperative’ encompasses attitudes or values around working together for a common purpose, and it includes those who share records and those who actively work to enhance and contextualise them. ‘Archive’ suggests primary sources, and can be applied to informal collections of ‘shoebox archives’ and the official holdings of museums, libraries and archives.

What do you think – does ‘cooperative archive’ work for you? Does your first reaction to the name evoke anything like my thoughts above?

Update, October 11: following some market testing on Facebook, it seems ‘collaborative collections’ best describes my vision.

Does citizen science invite sabotage?

Q: Does citizen science invite sabotage?

A: No.

Ok, you may want a longer version. There’s a paper on crowdsourcing competitions that has lost some important context while doing the rounds of media outlets. For example, on Australia’s ABC, ‘Citizen science invites sabotage‘:

‘a study published in the Journal of the Royal Society Interface is urging caution at this time of unprecedented reliance on citizen science. It’s found crowdsourced research is vulnerable to sabotage. […] MANUEL CEBRIAN: Money doesn’t really matter, what matters is that you can actually get something – whether that’s recognition, whether that’s getting a contract, whether that’s actually positioning an idea, for instance in the pro and anti-climate change debate – whenever you can actually get ahead.’.

The fact that the research studies crowdsourcing competitions, which are fundamentally different from forms of crowdsourcing without a ‘winner takes all’ dynamic, is not mentioned. Nor are the years of practical and theoretical work on task validation, which make it quite difficult for someone to get enough bad data past various controls to significantly alter the results of crowdsourced or citizen science projects.
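One of the simplest of those validation controls is showing each task to several independent volunteers and only accepting an answer once enough of them agree. A minimal sketch of that idea, with an invented threshold and data:

```python
from collections import Counter

# Sketch of consensus-based task validation: accept the majority answer
# only if it clears an agreement threshold. The 0.6 threshold is arbitrary;
# real projects tune this per task type.
def consensus(answers, threshold=0.6):
    """Return the majority answer if it clears the threshold, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) >= threshold else None

# Three genuine transcriptions outweigh one attempt at sabotage.
print(consensus(["HMS Example", "HMS Example", "HMS Example", "xxxx"]))  # HMS Example
```

With controls like this, a saboteur has to out-vote several independent contributors on every task they want to corrupt, which is why isolated bad-faith contributions rarely reach the final dataset.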

You can read the full paper for free, but even the title, Crowdsourcing contest dilemma, and the abstract make the very specific scope of their study clear:

Crowdsourcing offers unprecedented potential for solving tasks efficiently by tapping into the skills of large groups of people. A salient feature of crowdsourcing—its openness of entry—makes it vulnerable to malicious behaviour. Such behaviour took place in a number of recent popular crowdsourcing competitions. We provide game-theoretic analysis of a fundamental trade-off between the potential for increased productivity and the possibility of being set back by malicious behaviour. Our results show that in crowdsourcing competitions malicious behaviour is the norm, not the anomaly—a result contrary to the conventional wisdom in the area. Counterintuitively, making the attacks more costly does not deter them but leads to a less desirable outcome. These findings have cautionary implications for the design of crowdsourcing competitions.

And from the paper itself:

‘We study a non-cooperative situation where two players (or firms) compete to obtain a better solution to a given task. […] The salient feature is that there is only one winner in the competition. […] In scenarios of ‘competitive’ crowdsourcing, where there is an inherent desire to hurt the opponent, attacks on crowdsourcing strategies are essentially unavoidable.’
From ‘Crowdsourcing contest dilemma’ by Victor Naroditskiy, Nicholas R. Jennings, Pascal Van Hentenryck and Manuel Cebrian, J. R. Soc. Interface, 6 October 2014, vol. 11, no. 99, 20140532, published 20 August 2014. doi: 10.1098/rsif.2014.0532

I don’t know about you, but ‘an inherent desire to hurt the opponent’ doesn’t sound like the kinds of cooperative crowdsourcing projects we tend to see in citizen science or cultural heritage crowdsourcing. The study is interesting, but it is not generalisable to ‘crowdsourcing’ as a whole.

If you’re interested in crowdsourcing competitions, you may also be interested in: On the trickiness of crowdsourcing competitions: some lessons from Sydney Design from May 2013. 

Helping us fly? Machine learning and crowdsourcing

Image of a man in a flying contraption powered by birds
Moon Machine by Bernard Brussel-Smith via Serendip-o-matic

Over the past few years we’ve seen an increasing number of projects that take the phrase ‘human-computer interaction’ literally (perhaps turning ‘HCI’ into human-computer integration), organising tasks done by people and by computers into a unified system. One of the most obvious benefits of crowdsourcing on digital platforms has been the ability to coordinate the distribution and validation of tasks. Increasingly, data manually classified through crowdsourcing is being fed into computers to improve machine learning so that computers can learn to recognise images or words almost as well as we do. I’ve outlined a few projects putting this approach to work below.

This creates new challenges for the future: if fun, easy tasks like image tagging and text transcription can be done by computers, what are the implications for cultural heritage and digital humanities crowdsourcing projects that used simple tasks as the first step in public engagement? After all, Fast Company reported that ‘at least one Zooniverse project, Galaxy Zoo Supernova, has already automated itself out of existence’. What impact will this have on citizen science and history communities? How might machine learning free us to fly further, taking on more interesting tasks with cultural heritage collections?

The Public Catalogue Foundation has taken tags created through Your Paintings Tagger and achieved impressive results in the art of computer image recognition: ‘Using the 3.5 million or so tags provided by taggers, the research team at Oxford ‘educated’ image-recognition software to recognise the top tagged terms’. All paintings tagged with a particular subject (e.g. ‘horse’) were fed into feature extraction processes to build an ‘object model’ of a horse (a set of characteristics that would indicate that a horse is depicted), then tested to see whether the system could correctly tag horses.
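The ‘object model’ idea can be illustrated with a toy nearest-centroid classifier: average the feature vectors of images crowd-tagged with a subject, then tag new images by the closest model. The 2-D vectors below are invented stand-ins for real image features, and the actual Oxford pipeline was far more sophisticated:

```python
# Toy sketch only: crowd-tagged training examples per subject, with
# invented 2-D 'features' in place of real image descriptors.
training = {
    "horse": [(0.9, 0.1), (0.8, 0.2)],
    "ship":  [(0.1, 0.9), (0.2, 0.8)],
}

def centroid(vectors):
    """Average the training vectors into one 'object model' per subject."""
    return tuple(sum(axis) / len(axis) for axis in zip(*vectors))

models = {label: centroid(vecs) for label, vecs in training.items()}

def classify(vector):
    """Tag a new image with the label of the closest object model."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(models, key=lambda label: dist(models[label], vector))

print(classify((0.85, 0.15)))  # horse
```

The crowd’s contribution is the labelled training set: without the 3.5 million human-provided tags there would be nothing to average into models in the first place.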

The BBC World Service archive used an ‘open-source speech recognition toolkit to listen to every programme and convert it to text’ and keywords then asked people to check the correctness of the data created (Algorithms and Crowd-Sourcing for Digital Archives, see also What we learnt by crowdsourcing the World Service archive).

The CUbRIK project combines ‘machine, human and social computation for multimedia search’ in their technical demonstrator, HistoGraph. The SOCIAM: The Theory and Practice of Social Machines project is looking at ‘a new kind of emergent, collective problem solving’, including ‘citizen science social machines’.

And of course the Zooniverse is working on this, most recently with Galaxy Zoo. A paper summarised on their Milky Way Project blog outlines the powerful synergy between citizen scientists, professional scientists and machine learning: ‘citizens can identify patterns that machines cannot detect without training, machine learning algorithms can use citizen science projects as input training sets, creating amazing new opportunities to speed-up the pace of discovery’, addressing the weakness of each approach if deployed alone.

Further reading: an early discussion of human input into machine learning is in Quinn and Bederson’s 2011 Human Computation: A Survey and Taxonomy of a Growing Field. You can get a sense of the state of the field from various conference papers, including ICML ’13 Workshop: Machine Learning Meets Crowdsourcing and ICML ’14 Workshop: Crowdsourcing and Human Computing. There’s also a mega-list of academic crowdsourcing conferences and workshops, though it doesn’t include much on the tiny corner of the world that is crowdsourcing in cultural heritage.

Last update: March 2015. This post collects my thoughts on machine learning and human-computer integration as I finish my thesis. Do you know of examples I’ve missed, or implications we should consider?