Crowdsourcing and design for participation

Crowdsourcing in cultural heritage, citizen science – recent updates

A small* collection of links from the past little while.

Projects

A new Zooniverse project, Decoding the Civil War, launched in June: 'Witness the United States Civil War by transcribing and deciphering messages and codes from the United States Military Telegraph'.
Another Zooniverse project, Camera CATalogue: 'Analyze Wildlife Photos to Help Panthera Protect Big Cats'.

Articles

Palmer, Stuart, and Deb Verhoeven, ‘Crowdfunding Academic Researchers–the Importance of Academic Social Media Profiles’, in ECSM 2016: Proceedings of the 3rd European Conference on Social Media (Academic Conferences and Publishing International, 2016), pp. 291–299
Preece, Jennifer, ‘Citizen Science: New Research Challenges for Human–Computer Interaction’, International Journal of Human-Computer Interaction, 32 (2016), 585–612 <http://dx.doi.org/10.1080/10447318.2016.1194153>
Dillon, Justin, Robert B. Stevenson, and Arjen E. J. Wals, ‘Introduction: Special Section: Moving from Citizen to Civic Science to Address Wicked Conservation Problems’, Conservation Biology, 30 (2016), 450–55 <http://dx.doi.org/10.1111/cobi.12689> – has an interesting new model, putting citizen sciences 'on a continuum from highly instrumental forms driven by experts or science to more emancipatory forms driven by public concern. The variations explain why citizens participate in CS and why scientists participate too. To advance the conversation, we distinguish between three strands or prototypes: science-driven CS, policy-driven CS, and transition-driven civic science.'

…

'We combined Jickling and Wals’ (2008) heuristic for understanding environmental and sustainability education (Jickling & Wals 2008) and M. Fox and R. Gibson's problem typology (Fig. 1) to provide an overview of the different possible configurations of citizen science (Fig. 2). The heuristic has 2 axes. We call the horizontal axis the participation axis, along which extend the possibilities (increasing from left to right) for stakeholders, including the public, to participate in setting the agenda; determining the questions to be addressed; deciding the mechanisms and tools to be used; choosing how to monitor, evaluate, and interpret data; and choosing the course of action to take. The vertical (goal) axis shows the possibilities for autonomy and self-determination in setting goals and objectives. The resulting quadrants correspond to a particular strand of citizen science. All three occupied quadrants are important and legitimate.'

A heuristic of citizen science based on Wals and Jickling (2008). From Dillon, Justin, Robert B. Stevenson, and Arjen E. J. Wals (2016)

* It's a short list this month as I've been busy and things seem quieter over the northern hemisphere summer.

Crowdsourcing workshop at DH2016 – session overview

A quick signal boost for the collaborative notes taken at the DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? (held in Kraków, Poland, on 12 July as part of the Digital Humanities 2016 conference, abstract below). We'd emphasised the need to document the unconference-style sessions (see FAQ) so that future projects could benefit from the collective experiences of participants. Since it can be impossible to find Google Docs or past tweets, I've copied the session overview below. The text is a summary of key takeaways or topics discussed in each session, created in a plenary session at the end of the workshop.

Participant introductions and interests – live notes
Ethics, Labour, sensitive material: Key takeaway – questions for projects to ask at the start; don't impose your own ethics on a project, discussing them is start of designing the project.
Where to start: Engaging volunteers, tips including online communities, being open to levels of contribution, being flexible, setting up standards, quality
Workflow, lifecycle, platforms: What people were up to, the problems with hacking systems together, iiif.io, flexibility and workflows
Public expertise, education, what’s unique to humanities crowdsourcing: The humanities are contestable! Responsibility to give the public back the results of the process in re-usable
Options, schemas and goals for text encoding: Encoding systems will depend on your goals; full-text transcription always has some form of encoding, data models – who decides what it is, and when? Then how are people guided to use it? Trying to avoid short-term solutions
UX, flow, motivation: Making tasks as small as possible; creating a sense of contribution; creating a space for volunteers to communicate; potential rewards, issues like badgefication and individual preferences. Supporting unexpected contributions; larger-scale tasks Project scale – thinking ahead to ending projects technically, and in terms of community – where can life continue after your project ends
Finding and engaging volunteers: Using social media, reliance on personal networks, super-transcribers, problematic individuals who took more time than they gave to the project. Successful strategies are very-project dependent. Something about beer (production of Itinera Nova beer with label containing info on the project and link to website).
Ecosystems and automatic transcription: Makes sense for some projects, but not all – value in having people engage with the text. Ecosystem – depending on goals, which parts work better? Also as publication – editions, corpora – credit, copyright, intellectual property
Plenary session, possible next steps – put information into a wiki. Based around project lifecycle, critical points? Publication in an online journal? Updateable, short-ish case studies. Could be categorised by different attributes. Flexible, allows for pace of change. Illustrate principles, various challenges. Short-term action: post introductions, project updates and new blog posts, research, etc to https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=CROWDSOURCING – a central place to send new conference papers, project blog posts, questions, meet-ups.

The workshop abstract:

Crowdsourcing – asking the public to help with inherently rewarding tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge – is reasonably well established in the humanities and cultural heritage sector. The success of projects such as Transcribe Bentham, Old Weather and the Smithsonian Transcription Center in processing content and engaging participants, and the subsequent development of crowdsourcing platforms that make launching a project easier, have increased interest in this area. While emerging best practices have been documented in a growing body of scholarship, including a recent report from the Crowd Consortium for Libraries and Archives symposium, this workshop looks to the next 5 – 10 years of crowdsourcing in the humanities, the sciences and in cultural heritage. The workshop will gather international experts and senior project staff to document the lessons to be learnt from projects to date and to discuss issues we expect to be important in the future.

Photo by Digital Humanities @DH_Western

The workshop is organised by Mia Ridge (British Library), Meghan Ferriter (Smithsonian Transcription Center), Christy Henshaw (Wellcome Library) and Ben Brumfield (FromThePage).

If you're new to crowdsourcing, here's a reading list created for another event.

April news in crowdsourcing, citizen science, citizen history

Another quick post with news on crowdsourcing in cultural heritage, citizen science and citizen history in April(ish) 2016…

Acceptances for our DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? have been sent out. If you missed the boat, don't panic! We're taking a few more applications on a rolling basis to allow for people with late travel approval for the DH2016 conference in July.

Probably the biggest news is the launch of citizenscience.gov, as it signals the importance of citizen science and crowdsourcing to the US government.

From the press release: 'the White House announced that the U.S. General Services Administration (GSA) has partnered with the Woodrow Wilson International Center for Scholars (WWICS), a Trust instrumentality of the U.S. Government, to launch CitizenScience.gov as the new hub for citizen science and crowdsourcing initiatives in the public sector.

CitizenScience.gov provides information, resources, and tools for government personnel and citizens actively engaged in or looking to participate in citizen science and crowdsourcing projects. … Citizen science and crowdsourcing are powerful approaches that engage the public and provide multiple benefits to the Federal government, volunteer participants, and society as a whole.'

There's also work to 'standardize data and metadata related to citizen science, allowing for greater information exchange and collaboration both within individual projects and across different projects'.

Other news:

Responses to questions about if the volunteers agreed that the Zooniverse… From Science Learning via Participation in Online Citizen Science

A new Zooniverse article is out! Science Learning via Participation in Online Citizen Science by Karen Masters, Eun Young Oh, Joe Cox, Brooke Simmons, Chris Lintott, Gary Graham, Anita Greenhill, Kate Holmes (PDF) – there's also a blog post summarising the research if reading scientific papers isn't your thing: 'We were also able to find evidence in the survey responses that project specific science knowledge correlated positively with measures of active engagement in the project. Put plainly, people who classified more on a given project we found to know more about the scientific content of that project'.
The US Holocaust Memorial Museum's 'U.S. Newspapers and the Holocaust' project has been getting lots of press
HistoryPin has new location and date suggestion features
Geoffrey Belknap has an article on Science Gossip in the Guardian: 'People power: how citizen science could change historical research'
Meghan Ferriter has blogged on Volunpeers: Hashtag, Identity, & Collaborative Engagement and Dreamy Digital Engagement at SXSW Interactive
On my reading list – a blog post on the importance of community research and participatory research practices, 'Why the future of social change belongs to community research' and another on 'Is citizen science about science or outreach?'
New Job Opening: Research Assistant/Developer in Extreme Citizen Science (closes May 22)
Slides on Human-Computer Collaboration at NYPL Labs by Mauricio Giraldo
The European Citizen Science Association (ECSA) Newsletter is a great source of news about events and publications, and the CitSci list has regular posts from practitioners and researchers.
The Journal of Science Communication has a series of articles on citizen science
WWW '16 Proceedings of the 25th International Conference on World Wide Web has a number of papers on technical aspects of crowdsourcing (nb: I haven't had time to actually read any of them!)
Citizen science and gene editing! Governance: Learn from DIY biologists by Todd Kuiken in Nature (March 2016)

Have I missed something important? Let me know in the comments or @mia_out.

SXSW, project anniversaries and more – news on heritage crowdsourcing

Photo of programme: our panel listing at SXSW

I've just spent two weeks in Texas, enjoying the wonderful hospitality and probing questions after giving various talks at universities in Houston and Austin before heading to SXSW. I was there for a panel on 'Build the Crowdsourcing Community of Your Dreams' (link to our slides and collected resources) with Ben Brumfield, Siobhan Leachman, and Meghan Ferriter. Siobhan, a 'super-volunteer' in more ways than one, posted her talk notes on 'How cultural institutions encouraged me to participate in crowdsourcing & the factors I consider before donating my time'.

In other news, we (me, Ben, Meghan and Christy Henshaw from the Wellcome Library) have had a workshop accepted for the Digital Humanities 2016 conference, to be held in Kraków in July. We're looking for people with different kinds of expertise for our DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? You can apply via this form.

One of the questions at our SXSW panel was about crowdsourcing in teaching, which reminded me of this recent post on 'The War Department in the Classroom' in which Zayna Bizri 'describes her approach to using the Papers of the War Department in the classroom and offers suggestions for those who wish to do the same'. In related news, the PWD project is now five years old! There's also this post on Primary School Zooniverse Volunteers.

The Science Gossip project is one year old, and they're asking their contributors to decide which periodicals they'll work on next and to start new discussions about the documents and images they find interesting.

The History Harvest project have released their Handbook (PDF).

The Danish Nationalmuseet is having a 'Crowdsource4dk' crowdsourcing event on April 9. You can also transcribe Churchill's WWII daily appointments, 1939 – 1945 or take part in Old Weather: Whaling (and there's a great Hyperallergic post with lots of images about the whaling log books).

I've seen a few interesting studentships and jobs posted lately, hinting at research and projects to come. There's a funded PhD in HCI and online civic engagement and a (now closed) studentship on Co-creating Citizen Science for Innovation.

And in old news, this 1996 post on FamilySearch's collaborative indexing is a good reminder that very little is entirely new in crowdsourcing.

From grey dots to trenches to field books – news in heritage crowdsourcing

Apparently you can finish a thesis but you can't stop scanning for articles and blog posts on your topic. Sharing them here is a good way to shake the 'I should be doing something with this' feeling.* This is a fairly random sample of recent material, but if people find it useful I can go back and pull out other things I've collected.

Victoria Van Hyning, ‘What’s up with those grey dots?’ you ask – brief blog post on using software rather than manual processes to review multiple text transcriptions, and on the interface challenges that brings.

Melissa Terras, 'Crowdsourcing in the Digital Humanities' – pre-print PDF for a chapter in A New Companion to Digital Humanities.

Richard Grayson, 'A Life in the Trenches? The Use of Operation War Diary and Crowdsourcing Methods to Provide an Understanding of the British Army’s Day-to-Day Life on the Western Front' – a peer-reviewed article based on data created through Operation War Diary.

The Impact of Coordinated Social Media Campaigns on Online Citizen Science Engagement – a poster by Lesley Parilla and Meghan Ferriter reported on the Biodiversity Heritage Library blog.

Ben Brumfield, Crowdsourcing Transcription Failures – a response to a mailing list post asking 'where are the failures?'

And finally, something related to my interest in participatory history commons – Martin Luther King Jr. Memorial Library – Central Library launches Memory Lab, a 'DIY space where you can digitize your home movies, scan photographs and slides, and learn how to care for your physical and digital family heirlooms'. I was so excited when I heard about this project – it's addressing such important issues. Jaime Mears is blogging about the project.

* How long after a PhD does it take for that feeling to go? Asking for a friend.

Exercises for 'The basics of crowdsourcing in cultural heritage'

I'm running a workshop (at a Knowledge Exchange event organised by the Scottish Network on Digital Cultural Resources Evaluation and the Museums Galleries Scotland Digital Transformation Network) to help people get started with crowdsourcing in cultural heritage. These exercises are designed to give participants some hands-on experience with existing projects while developing their ability to discuss the elements of successful crowdsourcing projects. They are also an opportunity to appreciate the importance of design and text in marketing a project, and the role of user experience design in creating projects that attract and retain contributors.

Exercise: compare front pages

Choose two of the sites below to review.

The most important question to keep in mind is: how effective is the front page at making you want to participate in a project? How does it achieve that?

NYPL Ensemble and NYPL Menus
Map Warper and Georeferencer
Transcribe Bentham and DIY History
Trove and Californian Newspapers (try a search for 'gold' and '1848' or something similar to get to articles)
PCF Tagger and Tiltfactor metadata games

Exercise: try some crowdsourcing projects

Try one of the sites listed above; others are listed in this post; non-English language sites are listed here. You can also ask for suggestions!

Attributes to discuss include:

The overall 'call to action'

Is the first step toward participating obvious?
Is the type of task, source material and output obvious?

Probable audience

Can you tell who the project wants to reach?
Does text relate to their motivations for starting, continuing?
How are they rewarded?
Are there any barriers to their participation?

Data input and data produced

What kinds of tasks create that data?
How are contributions validated?

How productive, successful does the site seem overall?

Exercise: lessons from game design

Go to http://git.io/2048
Spend 2 minutes trying it out
Did you understand what to do?
Did you want to keep playing?

Exercise: your plans

Some questions to help make ideas into reality:

Who already loves and/or uses your collections?
Which material needs what kind of work?
Do any existing platforms meet most of your needs?
What potential barriers could you turn into tasks?
How will you resource community interaction?
How would a project support your mission, engagement strategy and digitisation goals?

How an ecosystem of machine learning and crowdsourcing could help you

Back in September last year I blogged about advances in machine learning that mean fun, easy tasks like image tagging and text transcription could be done by computers, and about what that implies for cultural heritage and digital humanities crowdsourcing projects that use simple tasks as the first step in public engagement. (Broadly speaking, 'machine learning' is a label for technologies that allow computers to learn from the data available to them. It means they don't have to be specifically programmed to know how to do a task like categorising images – they can learn from the material they're given.)

One reason I like crowdsourcing in cultural heritage so much is that time spent on simple tasks can provide opportunities for curiosity, help people find new research interests, and help them develop historical or scientific skills as they follow those interests. People can notice details that computers would overlook, and those moments of curiosity can drive all kinds of new inquiries. I concluded that, rather than taking the best tasks from human crowdsourcers, 'human computation' systems that combine the capabilities of people and machines can free up our time for the harder tasks and more interesting questions.

I've been thinking about 'ecosystems' of crowdsourcing tasks since I worked on museum metadata games back in 2010. An ecosystem of tasks – for example, classifying images into broad types and topics in one workflow so that people can find text to transcribe on subjects they're interested in, and marking up that text with relevant subjects in a final workflow – means that each task can be smaller (and thereby faster and more enjoyable). Other workflows might validate the classifications or transcribed text, allowing participants with different interests, motivations and time constraints to make meaningful contributions to a project.

The New York Public Library's Building Inspector is an excellent example of this – they offer five tasks (checking or fixing automatically-detected building 'footprints', entering street numbers, classifying colours or finding place names), each as tiny as possible, which together result in a complete set of checked and corrected building footprints and addresses. (They've also pre-processed the maps to find the building footprints so that most of the work has already been done before they asked people to help.)

Screenshot: check building footprints in NYPL's Building Inspector

After teaching 'crowdsourcing cultural heritage' at HILT over the summer, where the concept of 'ecosystems' of crowdsourced tasks was put into practice as we thought about combining classification-focused systems like Zooniverse's Panoptes with full-text transcription systems, I thought it could be useful to give some specific examples of ecosystems for human computation in cultural heritage. If there are daunting data cleaning, preparation or validation tasks necessary before or after a core crowdsourcing task, computational ecosystems might be able to help. So how can computational ecosystems help pre- and post-process cultural heritage data for a better crowdsourcing experience?

While older ecosystems like Project Gutenberg and Distributed Proofreaders have been around for a while, we're only just seeing the huge potential for combining people + machines into crowdsourcing ecosystems. The success of the Smithsonian Transcription Center points to the value of 'niche' mini-projects, but breaking vast repositories into smaller sets of items about particular topics, times or places also takes resources. Machines can learn to classify source material by topic, by type, by difficulty or any other system that crowdsourcers can teach them. You can improve machine learning by giving systems 'ground truth' datasets with (for example) a crowdsourced transcription of the text in images, and as Ted Underwood pointed out on my last post, comparing the performance of machine learning and crowdsourced transcriptions can provide useful benchmarks for the accuracy of each method. Small, easy correction tasks can help improve machine learning processes while producing cleaner data.
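One rough way to benchmark machine and crowdsourced transcriptions against a 'ground truth' text is character error rate: the number of single-character edits needed to turn a transcription into the ground truth, divided by the length of the ground truth. The sketch below is only an illustration of that idea, not a method used by any project mentioned here; the function names and the sample strings are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep a match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(candidate, ground_truth):
    """Edits needed to fix the candidate, as a share of the ground truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(candidate, ground_truth) / len(ground_truth)


# Invented sample data, for illustration only
ground_truth = "arrived at the harbour"
machine = "arrivcd at tbe harbour"   # two misread characters
crowd = "arrived at the harbor"      # one missing character

machine_cer = character_error_rate(machine, ground_truth)
crowd_cer = character_error_rate(crowd, ground_truth)
print(f"machine: {machine_cer:.3f}, crowd: {crowd_cer:.3f}")
```

A real comparison would need decisions this sketch skips, such as how to treat spelling normalisation, abbreviations and line breaks, and whether to measure errors per word rather than per character.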

Computational ecosystems might be able to provide better data validation methods. Currently, tagging tasks often rely on raw consensus counts when deciding whether a tag is valid for a particular image. This is a pretty crude measure – while three non-specialists might apply terms like 'steering' to a picture of a ship, a sailor might enter 'helm', 'tiller' or 'wheelhouse', but their terms would be discarded if no-one else enters them. Mining discipline-specific literature for relevant specialist terms, or finding other signals for subject-specific expertise, would make more of that sailor's knowledge.
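To make the limits of raw consensus concrete, here is a minimal sketch of consensus-count validation, with an optional list of specialist terms that are accepted even when only one person enters them. Everything in it is an assumption for illustration: the function name, the two-vote threshold and the hand-written vocabulary (which stands in for terms mined from specialist literature) are hypothetical, not taken from any real tagging platform.

```python
from collections import Counter


def validate_tags(submissions, min_votes=2, specialist_terms=frozenset()):
    """Return the set of tags accepted for one image.

    submissions: a list of tag lists, one list per volunteer.
    min_votes: hypothetical consensus threshold.
    specialist_terms: hypothetical vocabulary of terms trusted without consensus.
    """
    counts = Counter()
    for tags in submissions:
        # Count each tag once per volunteer, ignoring case and stray whitespace
        counts.update({tag.strip().lower() for tag in tags})

    accepted = set()
    for tag, votes in counts.items():
        if votes >= min_votes or tag in specialist_terms:
            accepted.add(tag)
    return accepted


# Invented example: three non-specialists and one sailor tag the same image
submissions = [
    ["steering"],
    ["Steering", "ship"],
    ["steering"],
    ["helm", "wheelhouse"],
]

consensus_only = validate_tags(submissions)
with_vocabulary = validate_tags(
    submissions, specialist_terms={"helm", "tiller", "wheelhouse"}
)
print(sorted(consensus_only))
print(sorted(with_vocabulary))
```

With consensus alone, the sailor's terms are dropped along with the one-off 'ship'; with the vocabulary, 'helm' and 'wheelhouse' survive. A fixed word list is the simplest possible signal of expertise, and a real system would probably combine several.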

Computational ecosystems can help at the personal, as well as the project level. One really exciting development is computational assistance during crowdsourcing tasks. In Transcribing Bentham … with the help of a machine?, Tim Causer discusses TSX, a new crowdsourced transcription platform from the Transcribe Bentham and tranScriptorium projects. You can correct transcriptions generated computationally by handwritten text recognition (HTR), which is a big advance in itself. Most importantly, you can also request help if you get stuck transcribing a specific word. Previously, you'd have to find a friendly human to help with this task. And from here, it shouldn't be too difficult to combine HTR with computational systems to give people individualised feedback on their transcriptions. The potential for helping people learn palaeography is huge!

Better validation techniques would also improve the participants' experience. Providing personalised feedback on the first tasks a participant completes would help reassure them while nudging them to improve weaker skills.

Most science and heritage projects working on human computation are very mindful of the impact of their choices on the participants' experience. However, there's a risk that anyone who treats human computation like a computer science problem (for example, computationally assigning tasks to the people with the best skills for them) will lose sight of the 'human' part of the project. Individual agency is important, and learning or mastering skills is an important motivation. Non-profit crowdsourcing should never feel like homework. We're still learning about the best ways to design crowdsourcing tasks, and that job is only going to get more interesting.


Crowdsourcing the world's heritage

NB: this post was last updated 17 March 2025. In general, I add new sites but don't remove old sites that are no longer live. This post is now supplemented with another on National approaches to crowdsourcing / citizen science. I've also shared a 2015 list of 'participatory digital heritage sites' that includes many crowdsourcing sites. Contact me via my main website contact page to suggest a site.

It's all too easy to overlook international crowdsourcing projects in languages other than English so I thought I'd collect some projects related to cultural heritage, history and science here (following my definition of crowdsourcing in cultural heritage as 'asking the public to help with tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge'). This list is drawn from my PhD research, but this is a fast-moving field and I was focusing on early modern England, so inevitably this list will be missing loads of examples. Please suggest links to help people discover new projects! Also, I'm often taking my best guess at the correct translation for terms, so please correct me if I've misunderstood.

If you're interested in crowdsourcing in cultural heritage, my edited volume has chapters with lessons learnt from a range of projects.

The Zooniverse platform has a post on projects that have been translated into languages including Arabic, Bangla, Chinese, Czech, Dutch, French, German, Greek, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish and Ukrainian.
AfroCrowd is 'an outreach initiative and Wikimedia usergroup which seeks to increase awareness of the Wikimedia and free knowledge, culture, and software movements among potential editors of African descent' with links to Haitian, Igbo, Twi, Yoruba, Garifuna, French, Spanish Wikipedia and more
Moravian Lives offers text transcription in English, German and Swedish. Thanks @KatherineFaull for sharing!
DigiTalkoot has been and gone (launched February 2011, closed November 2012) but was a great example of tasks that helped correct scanned text for the Historical Newspaper Library of The National Library of Finland.
The National Library of France was involved in a pilot project called 'Correct' to correct errors in scanned documents. Further information: Josse, Isabelle. La bnF engagée dans un projet de R&D pour la conception de la plateforme Correct (Correction et enrichissement collaboratifs de textes). Bulletin des bibliothèques de France. [en ligne], n° 5, 2013, http://bbf.enssib.fr/consulter/bbf-2013-05-0037-008. ISSN 1292-8399
The French version of WikiSource has lots of books to be transcribed.
There's a German-language transcription project for the Digitale Edition Nachlasses Franz Brümmer and related Refine!Editor – it looks like it was designed for student participation and that interested people can register to transcribe via the contact page. Via Simone Waidmann, “Erschließung Historischer Bestände Mittels Crowdsourcing: Eine Analyse Ausgewählter Aktueller Projekte,” Perspektive Bibliothek 3, no. 1 (2014): 33–58, http://journals.ub.uni-heidelberg.de/index.php/bibliothek/article/view/14020.
ARTigo is a German project, with English, French and German-language interfaces. Tag images of artworks through six different games! They also have an active German-language blog.
Red een Portret ('Save a portrait') from the Amsterdam City Archives – help identify photographs or donate money to support the project
Ajapaik is an Estonian project asking for help identifying historical images.
Transcriptorium has several non-English datasets you can review to help train their handwriting recognition software
Ancient Lives is the site for you if you want to learn the ancient Greek alphabet while transcribing papyri.
Arthur Schnitzler digital 'is using the Transcribo software to produce a digital transcription and annotation of both typescript and manuscript material'.
The Bracero History Archive is collecting oral histories in both English and Spanish.
Cymru1900Wales and Cynefin are both working on Welsh maps and have Welsh and English-language interfaces
Danish Demographic Database includes transcriptions from volunteers.
Europeana 1914-1918 and Europeana 1989 are collecting records in many European languages. Wir Waren So Frei is also collecting records about the fall of the Berlin Wall.
You can index records in many languages on Ancestry's World Archives Project.
You can also help Improve Google Translate (not really a heritage project but it helps other projects). Similarly, you can help translate the crowdsourcing platform Pybossa into Italian or learn a language while translating text with Duolingo.
You can 'use the site's comment features to share any supplements (such as citations to published works, transcription of notes not yet addressed, authorial attribution for a particular text, etc.) or remarks on the significance of the manuscript codices and contents' to help Islamic Manuscripts at Michigan.
Itinera Nova has volunteer transcribers
You can help correct and annotate records from 'more than 100 European archives' in the Monasterium.Net collaborative archive.
Help transcribe Dutch natural history collections with Naturalis.
Transcribe Swedish census records from 1760 with Stockholms Stad.
Help index Dutch records with Vele Handen.
The Norwegian The Digital Inn is for 'sources/documents digitised by institutions, associations or persons outside the organisation of the National Archives of Norway' – a fantastic way of collecting the work that community historians are doing
The Danish Politiets registerblade – help transcribe records from the city register.
The Croatian Museum of Broken Relationships
Dry stone walls crowdsourced
The British Library's LibCrowds Convert-a-Card card catalogue transcription project has Pinyin and Indonesian cards for transcription
The National Library of Israel has a crowdsourcing project in Hebrew (via this Pybossa post)
Sefaria, 'a living library of Jewish texts', 'building a free living library of Jewish texts and their interconnections, in Hebrew and in translation'
Footprints, Jewish books through time and place
La Grande Collecte is collecting French records about the First World War
KB Kranten – Editor, help correct the OCR of digitized newspapers. A collaboration between the Dutch national library & Meertens Institute
Edvard Munchs tekster
Demogen, from the State Archives of Belgium
The Estonian Digitalgud – digital 'working bees' to collect information about historical images
Index records about Estonian soldiers in the two World Wars via Eestlased Esimeses maailmasõjas
An L-Crowd Project: TranscribeJP@Japanese Association for DigitalHumanities and Microtasks
Estoria de Espanna and Estoria de Espanna Project blog, 'aiming to transcribe these 13th-century manuscripts, tagging them (especially for person names and toponyms) so as to reconstruct afterwards biographies and itineraries'.
Les herbonautes, a French herbarium transcription project led by the Paris Natural History Museum
Loki is a Finnish project on maritime, coastal history
Swedish Species Information Centre 'Species Observations' (hat tip Sanja Halling)
sandbyborg.se, http://www.platsr.se, http://www.crowdculture.se (hat tip Max Valentin)
Donald Sturgeon @donaldsturgeon said: '@chinesetextproj has an active Wiki section in which Chinese texts are transcribed/OCR post-corrected & annotated: http://ctext.org/wiki.pl?if=en'. Find out more about transcribing, proof-reading, translations, discussion and other forms of contribution on their 'Ways to Help' page.
Danish Family Search projects include indexing church, school and census records, recording street names and categorising professions.
Danish National Archives crowdsourcing https://cs.sa.dk/?locale=en and overview page (suggested by Alex Mendes)
Crowd-correction platform Kokos was 'built to improve the OCR quality of the digitized yearbooks of the Swiss Alpine Club (SAC) from the 19th century', working with French and German
J. Hocker @julianhocker said, 'take a look at interlinking.bbf.dipf.de, it is a project about a encyclopedia for children that was printed in the 19th century'
@BenWBrum pointed me to a Chinese character transcription project on the Smithsonian's platform then @TranscribeSI pointed out some additional Chinese and Japanese-language projects
VinKo ('Varieties in Contact') is an online questionnaire developed at the Universities of Trento and Verona to gather information about the minority languages and dialects spoken in the area between Innsbruck and Verona
@BenWBrum's From the Page platform has French and Spanish language pages from the Louisiana Historical Center at the New Orleans Jazz Museum for transcription
@Lisa_Chupin shared Noms de Vendée, aiming to deepen engagement as well as enrich and correct archival records.
Judaica DH at Penn @judaicadh shared, 'Scribes of the Cairo Geniza classifies/transcribes Hebrew & Arabic fragments' https://www.scribesofthecairogeniza.org/
http://openbolshoi.ru/ (Russian)
Sweden's Digitala forskarsalen ('digital research hall') includes indexing and transcription projects
The Dutch hetvolk.org set of tools / projects (thanks Enno Meijers)
'Maak de Surinaamse slavenregisters openbaar', crowdfunding/crowdsourced transcription project c 2017 (original instructions page) using hetvolk
China – the Shengxuanhuai Manuscript Transcription Initiative, aka the Transcribe Sheng project
The French RECITAL (Contribuez librement à une expérience de transcription participative des REgistres de la Comédie-ITALienne de Paris au XVIIIe siècle: 'contribute freely to an experiment in participatory transcription of the registers of the Comédie-Italienne in Paris in the 18th century'). 'These unique documents prompt a revision of what is known about the economics of the performing arts and the whole cultural history of the 18th century. Your help is invaluable to us' https://recital.univ-nantes.fr/
Kino in der DDR at the University of Erfurt collects information, experience and documents on the cinema history of East Germany. Interview with the project leaders (in German).
Also possibly other academic German citizen humanities projects
Nikola Dyordyevich shared the Serbian 'Улице Панчева' / 'Streets of Panchevo' project with old maps, images, etc. Serbian site: https://улицепанчева.срб. English site: https://ulicepanceva.in.rs/en/
“All Tolstoy in one click” was a Russian language crowdsourcing project that asked volunteers to correct OCR layouts and transcription. Technical details; main site https://readingtolstoy.ru.
The Czech/German (Bavarian) PhotoStruk, crowdsourcing information related to archival photographs of now-destroyed sites on the Czech–Bavarian border. More in: ‘Geoinformatics and Crowdsourcing in Cultural Heritage: A Tool for Managing Historical Archives’. Agris On-Line Papers in Economics and Informatics https://doi.org/10.7160/aol.2018.100207.

Crowdsourcing Wien, a platform from the Austrian Wien Museum and Wienbibliothek im Rathaus. Collections include playbills and letters.

English-language projects tend to be easier to find, but for completeness:

UK – irecord.org.uk/ (thanks Rita Singer @_bydbach_)

USA – archives.gov/citizen-archivist and weather.gov/cle/CWOP (thanks @BuffaloResearch), crowd.loc.gov, transcription.si.edu/

'Your project goes here' – what have I missed?

reseau-correct.fr – correcting text from the Bibliothèque nationale de France on 'Correct'.

Three ways you can help with 'In their own words: collecting experiences of the First World War' (and a CENDARI project update)

Somehow it's a month since I posted about my CENDARI research project (in Moving forward: modelling and indexing WWI battalions) on this site. That probably reflects the rhythm of the project – less trying to work out what I want to do and more getting on with doing it. A draft post I started last month simply said, 'A lot of battalions were involved in World War One'. I'll do a retrospective post soon, and here's a quick summary of on-going work.

First, a quick recap. My project has two goals – one, to collect a personal narrative for each battalion in the Allied armies of the First World War; two, to create a service that would allow someone to ask 'where was a specific battalion at a specific time?'. Together, they help address a common situation for people new to WWI history who might ask something like 'I know my great-uncle was in the 27th Australian battalion in March 1916, where would he have been and what would he have experienced?'.

I've been working on streamlining and simplifying the public-facing task of collecting a personal narrative for each battalion, and have written a blog post, Help collect soldiers’ experiences of WWI in their own words, that reduces it to three steps:

Take one of the diaries, letters and memoirs listed on the Collaborative Collections wiki, and
Match its author with a specific regiment or battalion.
Send in the results via this form.

If you know of a local history society, family historian or anyone else who might be interested in helping, please send them along to this post: Help collect soldiers’ experiences of WWI in their own words.

Work on specifying the data structures needed for a look-up service that can answer questions about a specific unit's location and activities at a specific time has largely moved to the wiki:

Talk:British battalions and regiments in World War I
Talk:British Army Hierarchies
Template talk:Battalion – what information should be recorded on every battalion/unit page?
Template talk:Infobox command structure – what structured data should be recorded about military hierarchies?
Template talk:Infobox theatre of war/doc – what structured data should be recorded about a unit's activities and engagements in the war?
Template talk:Infobox military unit – what structured data should be recorded about a battalion/unit?

You can see the infobox structures in progress by flipping from the talk to the Template tabs. You'll need to request an account to join in but more views, sample data and edge cases would be really welcome.
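As a rough illustration of the kind of question the look-up service is meant to answer ('where was a specific battalion at a specific time?'), here is a minimal Python sketch. It is not the wiki's actual data model: the `Posting` and `Unit` classes, their field names and the idea of storing dated postings per unit are all assumptions made for the example, and the sample locations are placeholders rather than historical facts.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Posting:
    """One dated record of where a unit was (hypothetical structure)."""
    start: date      # first day the unit is recorded at this location
    location: str    # free-text place name
    activity: str    # e.g. 'in reserve', 'front line', 'in transit'


@dataclass
class Unit:
    """A battalion or other unit with a chronological list of postings."""
    name: str
    postings: List[Posting] = field(default_factory=list)

    def add_posting(self, posting: Posting) -> None:
        # Keep postings sorted by start date so look-ups can use bisection.
        self.postings.append(posting)
        self.postings.sort(key=lambda p: p.start)

    def where_on(self, day: date) -> Optional[Posting]:
        """Return the latest posting starting on or before `day`, if any."""
        starts = [p.start for p in self.postings]
        index = bisect_right(starts, day)
        return self.postings[index - 1] if index else None


# Placeholder data only: these locations are NOT real historical records.
unit = Unit("27th Australian Battalion")
unit.add_posting(Posting(date(1916, 4, 1), "Placeholder location B", "placeholder activity"))
unit.add_posting(Posting(date(1916, 1, 1), "Placeholder location A", "placeholder activity"))

found = unit.where_on(date(1916, 3, 15))
print(found.location if found else "No record for that date")
```

In this sketch a posting stays in force until the next one begins, which is a simplification; real unit histories have gaps, overlapping detachments and uncertain dates, which is exactly the sort of edge case the Talk pages are for.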

Populating the list of battalions and other units has been a huge task in itself, partly because very few cultural institutions have definitive lists of units they can (or want to) share, but it's necessary to support both core goals. I've been fortunate to have help (see 'Thanks and recent contributions' on 'How you can help') but the task is on-going so get in touch if you can help!

So there are three different ways you can help with 'In their own words: collecting experiences of the First World War':

collect diaries linked to specific battalions;
help check or complete the lists of Australian battalions, British battalions and regiments, Canadian battalions and regiments, Indian battalions, Italian battalions and New Zealand battalions in World War I;
review and contribute to the data structures needed to record information about military units in the Talk and Template pages above

Finally, last week I was in New Zealand to give a keynote on this work at the National Digital Forum. The video for 'Collaborative collections through a participatory commons' is online, so you can catch up on the background for my project if you've got 40 minutes or so to spare. Should you be in Dublin, I'm giving a talk on 'A pilot with public participation in historical research: linking lived experiences of the First World War' at the Trinity Long Room Hub today (thus the poster).

And if you've made it this far, perhaps you'd like to apply for one of the CENDARI Visiting Research Fellowships 2015 yourself?

Moving forward: modelling and indexing WWI battalions

A super-quick update from my CENDARI Fellowship this week. I set up the wiki for In their own words: linking lived experiences of the First World War a week ago but only got stuck into populating it with lists of various national battalions this week. My current task list, copied from the front page, is to:

Populate list of military units: Australian battalions in World War I, British battalions and regiments in World War I, Canadian battalions in World War I, Indian battalions in World War I, Italian battalions in World War I, New Zealand battalions in World War I. A list of battalions is needed to form the basis for the collecting process. (I'm starting with a list of divisions because I can get it from Wikipedia, but I know this is problematic)
Collate lists of personal diaries, letters, memoirs that can be linked to units through their authors
Collate lists of official unit diaries and histories
Collate resources on researching World War One records to help researchers know where to start
Create a sample battalion page as a demonstrator to show how personal accounts can be linked
Collate information about private letters, diaries and memoirs

If you can help with any of that, let me know! Or just get stuck in and edit the site.

I've started another Google Doc with very sketchy Notes towards modelling information about World War One Battalions. I need to test it with more battalion histories and update it iteratively. At this stage my thinking is to turn it into an InfoBox format to create structured data via the wiki. It's all very lo-fi and much less designed than my usual projects, but I'm hoping people will be able to help regardless.
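To make the InfoBox idea a little more concrete, here is a small Python sketch that turns a dictionary of fields into a template call. It assumes the wiki uses MediaWiki-style template syntax (`{{Name | key = value}}`); the field names (`name`, `country`, `parent_unit`) and the values are hypothetical placeholders, not the fields the battalion infobox will actually end up with.

```python
from typing import Dict


def render_infobox(template_name: str, fields: Dict[str, str]) -> str:
    """Render a MediaWiki-style template call from a mapping of field names to values."""
    lines = ["{{" + template_name]
    for key, value in fields.items():
        # One '| key = value' line per field, in insertion order.
        lines.append(f"| {key} = {value}")
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical field names and placeholder values, for illustration only.
example = render_infobox(
    "Infobox military unit",
    {
        "name": "Example Battalion",
        "country": "Example country",
        "parent_unit": "Example Brigade",
    },
)
print(example)
```

Keeping the data in a plain mapping like this would make it easy to test the model against more battalion histories before committing to a fixed set of infobox fields.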

So, in this phase of the project, the aim is to find a personal narrative – a diary, letters, memoirs or images – for each military unit in the British Army. Can you help?