How to… use crowdsourcing as a research tool
A crowd in Paris, copyright James Cridland, http://James.Cridland.net
“Crowds disturb, escalate and then threaten the social order… They come to the fore in contexts of grievance, protest and rising uncertainty. Crowds are agents of change.” (Wexler, 2011, p. 12)
So wrote one sociologist when reflecting on the ways his discipline has viewed the crowd in the past, contrasted with the new perception of the crowd as “a form of collective intelligence that solves problems”.
This latter phenomenon is known as crowdsourcing and is the subject of this article.
What is crowdsourcing, why is it used and how does it work?
Crowdsourcing involves three elements:
- A problem or task is assigned to a wide and random audience (the crowd), rather than to a selection of experts.
- The crowd then generates shared content, on a voluntary basis (although there may be some small payment), which may be data to resolve a particular problem (for example, Galaxy Zoo where people are asked questions about images), or a collection of memorabilia associated with a particular community (for example, East London Lives, which documents the effects of the Olympics on the lives of East Londoners).
- The above activities are facilitated by online technology.
Projects meeting the first two criteria are not new: for example, the Oxford English Dictionary used volunteers to provide citations of word use.
However, the term “crowdsourcing” was first used by Jeff Howe in a 2006 Wired magazine article, to describe the phenomenon of online content created by amateurs.
Howe subsequently expanded the idea into a book, Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business (Howe, 2008), which looks at online examples of crowdsourcing from the business and scientific worlds.
Why use crowdsourcing?
Why use the crowd rather than experts, and why should people volunteer their time to provide content to a website when it is so much easier just to browse what is there?
A cynical response to the former question is that it comes down to funding. Goodchild (2007), discussing the phenomenon of volunteered geography, claims that citizen input to mapping helps fill a hole created by the decline in government-funded mapping (p. 217).
The challenges of funding, as well as the ease with which anyone can upload content to the Internet, have, according to the UK’s Joint Information Systems Committee (JISC), changed the way in which online collections are created and used (JISC, 2010).
There may, however, be genuine statistical reasons why the crowd, diverse by its nature, performs better than experts. Take prediction, for example: when the disparate guesses of a large number of people are averaged out, the result is often more accurate than that of an individual expert.
It’s a phenomenon which has been used by researchers at George Mason University to study ways in which crowdsourcing can help in intelligence gathering.
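The averaging effect is easy to demonstrate with a small simulation. The sketch below is a toy model only (the true value, crowd size and noise level are all invented for illustration, and this is not the George Mason methodology): each guesser is noisy but unbiased, so individual errors largely cancel when guesses are averaged.

```python
import random

# Toy simulation of the "wisdom of crowds" averaging effect
# (an illustrative sketch with invented numbers).
random.seed(42)

TRUE_VALUE = 100.0  # the quantity being estimated (hypothetical)
N_GUESSERS = 1000   # size of the crowd

# Each guesser is noisy but unbiased, so individual errors cancel out.
guesses = [random.gauss(TRUE_VALUE, 20.0) for _ in range(N_GUESSERS)]

crowd_estimate = sum(guesses) / len(guesses)
crowd_error = abs(crowd_estimate - TRUE_VALUE)

# Typical error of a single guesser, for comparison.
mean_individual_error = sum(abs(g - TRUE_VALUE) for g in guesses) / len(guesses)

print(f"crowd error:           {crowd_error:.2f}")
print(f"mean individual error: {mean_individual_error:.2f}")
```

The crowd's averaged estimate lands far closer to the true value than a typical individual guess does, which is the statistical heart of the claim above.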
And in some cases, the crowd may have better knowledge than the expert. According to Goodchild (2007), locals may be able to provide early warning of natural disasters because they are familiar with the area, whilst satellites only pass infrequently.
As to why people volunteer their time, one explanation is the desire to contribute to the common good, and to gain recognition from others, as is the case with the academic community (JISC, 2010, p. 7).
Upshall (2011) suggests other reasons: interest in the subject, ease of contribution, and some form of reward, which may or may not be financial.
Academic uses of crowdsourcing
Apart from citizen science projects such as Galaxy Zoo, the main applications of crowdsourcing in academe lie in the areas of geospatial data, digital content collections around a particular theme, and the crowdsourcing of simple but non-automatable tasks, often connected with heritage documents. Another interesting use is in the area of intelligence.
According to Andy Hudson-Smith of the Centre for Advanced Spatial Analysis (CASA) at University College London (Hudson-Smith, 2011), crowdsourcing methods have a particular application to geography through Volunteered Geographic Information.
CASA uses a number of different techniques to represent space-time data in pursuit of its research on city systems. The data is explored through a wide range of methods including social physics, statistical models, augmented reality – and crowdsourcing.
Working with colleagues at the School of Geography at the University of Leeds, CASA has been developing large-scale toolkits to facilitate the collection of data by crowdsourcing.
One such tool is SurveyMapper, which enables any user to quickly set up a survey and collect a large amount of data which can be shown on a map.
Surveys can be on any topic, from complex world issues such as climate change to the more personal (how happy are you?) and pragmatic (how fast is your broadband? how do you get to work?).
Despite, or because of, its ease of use, SurveyMapper has been used extensively by the academic community, as well as the BBC.
Example of a survey from SurveyMapper
CASA is developing a number of other toolkits suitable for large data capture:
- GEMMA, a tool which enables novice users to create complex maps. Publicly available or crowd-sourced data on any topic can be overlaid on freely available mapping services such as Google Maps or Open Street Map.
- TALISMAN: geospaTial datA anaLysIS & SiMulAtioN. This project will develop methods for geospatial data analysis, looking specifically at interactions which reflect potential flows in and between locations.
- The "Tweet-o-Meter", which can mine and analyse data from Twitter, linking it to 16 global cities.
Quercia et al. (2012) provide an example of Twitter being used for social research: they analysed tweets according to "topic models" and were able to demonstrate a correlation between topic and community deprivation.
Community collections

These involve members of communities uploading content – photos, letters, descriptive accounts, etc. – to a particular online collection point on a particular theme. The collection may be managed by a library, museum or university, but the items in the collection are supplied by volunteers.
Two important such collections, both funded by the UK’s JISC, are:
- East London Lives, which is a "living archive" documenting the experience of Londoners around and in the run-up to the 2012 Olympics and Paralympics. Residents, schools and taxi drivers were interviewed, and there was a particular focus on the Olympic bid pledges of community regeneration and greater health and well-being.
- The Great War Archive: the public were invited to send in digital memorabilia, or family stories handed down through generations. The result was a unique collection of primary material which tells the story of the First World War from the point of view of the ordinary soldier.
Another community-based collection, this time of research literature, international in scope and not funded by JISC, is Mendeley’s database. The Mendeley software is scaled to handle 120 million uploaded documents and seven terabytes of data.
Projects which crowdsource a task
There are a number of projects which use crowdsourcing to outsource a particular task or set of tasks.
Most such projects are cultural or heritage-based. Two examples are:
- The Transcribe Bentham project, where volunteers transcribe the rough notebooks of the famous philosopher – heavily annotated and corrected – from scratch, adding TEI XML tags.
- Diderot’s and d’Alembert’s Encyclopédie, the collaborative translation of which is being managed by the University of Michigan.
The complexity of the above projects places great demands on volunteers: in the case of the Encyclopédie, the requirement to understand 18th-century French has limited the take-up.
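To give a sense of what volunteers are asked to produce, a Transcribe Bentham submission might look roughly like the TEI fragment below. This is a hedged sketch only: the wording is invented and the project's actual transcription guidelines may prescribe a different element set, but `<del>`, `<add>` and `<unclear>` are standard TEI elements for recording deletions, insertions and illegible passages.

```xml
<!-- Hypothetical sketch of volunteer transcription markup; the
     project's actual guidelines and element choices may differ. -->
<p>The greatest happiness of the
  <del rend="strikethrough">community</del>
  <add place="superlinear">greatest number</add>
  is the <unclear>measure</unclear> of right and wrong.</p>
```

Markup like this captures not just the text but the manuscript's layers of revision, which is precisely what makes the task too complex to automate.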
A rather unusual application of crowdsourcing is being deployed by the US intelligence community. Several groups of researchers are looking at ways of harnessing the power of crowds to predict future events.
The project, DAGGRE (Decomposition-based Elicitation and Aggregation), is led by a team based at George Mason University.
Researchers recruited their crowd through blog postings and Twitter, asking participants to comment on and predict particular world events, for example the stability of Kim Jong Il’s regime in North Korea.
The predictions are then broken down into variables, and the questions themselves into smaller parts, so that each respondent can answer the part that best matches his or her knowledge.
The idea is to have a “heads up” on momentous world events, such as, say, the Arab Spring.
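The decomposition idea can be sketched with a toy calculation. The sub-questions and all the probability estimates below are invented for illustration, and this is not DAGGRE's actual algorithm: the point is simply that averaged crowd answers to small conditional questions can be recombined, via the law of total probability, into an estimate for the hard question nobody was asked directly.

```python
# Toy sketch of decomposition-based elicitation (illustrative only,
# with invented numbers -- not DAGGRE's actual algorithm). Rather than
# asking one hard question directly, the crowd answers smaller
# sub-questions whose averaged estimates are recombined.

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical probability estimates from different respondents.
p_crisis = mean([0.30, 0.25, 0.40, 0.35])              # P(succession crisis)
p_collapse_if_crisis = mean([0.50, 0.60, 0.45])        # P(collapse | crisis)
p_collapse_if_stable = mean([0.05, 0.10, 0.08, 0.02])  # P(collapse | no crisis)

# Law of total probability:
# P(collapse) = P(collapse|crisis)P(crisis) + P(collapse|no crisis)P(no crisis)
p_collapse = (p_collapse_if_crisis * p_crisis
              + p_collapse_if_stable * (1 - p_crisis))

print(f"combined crowd estimate: {p_collapse:.2f}")  # about 0.21
```

Each respondent only needs to be knowledgeable about one sub-question, yet the recombined figure draws on the whole crowd.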
How to succeed at crowdsourcing
There is no shortage of advice on crowdsourcing projects. The following is a digest of some of the main points.
Before you start
- Make your challenge big (JISC, 2010, p. 19). On the other hand, the problem or goal should be comprehensible and clearly achievable (Upshall, 2011).
- Question your assumptions: crowdsourcing is about the blurring of lines between the expert and non-expert, the professional and the amateur, and academe and the world outside. Believe that expertise can reside in the “amateur”, and in return, you will be able to achieve larger objectives and gain new insights.
- Upshall (2011) also suggests that some projects yield poor results because expectations are too high, particularly of volunteers’ skill levels. An example of this would be the Encyclopédie project referred to above, which requires knowledge of 18th-century French; however, it is difficult to see how that project could have proceeded without securing the right skills.
Managing the project
- Remember, crowdsourcing is not just about the crowd: it is a two-way engagement. Be a strong part of your community, in particular by creating a strong online presence. Keep the site active, as in the following example from the Transcribe Bentham project, which gives a progress update:
The Transcribe Bentham home page, http://www.ucl.ac.uk/transcribe-bentham/
- Make the site user friendly: people should be able to contribute and upload items fairly easily, without too many checks and barriers. For example, Galaxy Zoo displays a photo of a galaxy and then asks the user to classify it according to a number of illustrated options. If the user can upload material without knowledge of mark-up, as in the case of the Encyclopédie, that is an advantage.
- Provide training materials and resources for the user – both the above examples do this, and Galaxy Zoo has an excellent tutorial.
The Galaxy Zoo tutorial
- Think about how you will publicize your project. The DAGGRE project at George Mason used blog postings and Twitter; the Great War Archive was advertised through posters and road shows.
Quality control will be very important for anyone using crowdsourcing to capture research data, or to complete a research task: the information must be acceptable to the academic community.
- The RunCoCo project (RunCoCo, 2011) suggests that quality assurance happens through peer review: the community itself evaluates, corrects and challenges.
- Most projects establish some sort of protocol, and some require particular skills from participants – an example of the latter being the Encyclopédie project, with its requirement for 18th-century French. Another citizen science project, the Christmas Bird Count, has detailed guidelines on how to run a count, contact a regional editor, etc.
The guidelines page from the Christmas Bird Count website
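Another common quality-control device is redundancy: giving the same item to several volunteers and keeping only answers that win broad agreement (Galaxy Zoo, for instance, collects multiple classifications per galaxy). The sketch below is a minimal illustration of that idea; the labels, the `consensus` helper and the 60 per cent threshold are all invented, and real projects use more elaborate weighting.

```python
from collections import Counter

# Toy quality control by redundancy: each item is classified by several
# volunteers and the majority label is kept. (Illustrative sketch with
# a hypothetical threshold -- real projects weight volunteers' answers
# in more sophisticated ways.)
def consensus(labels, min_agreement=0.6):
    """Return the majority label if enough volunteers agree, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None

print(consensus(["spiral", "spiral", "elliptical", "spiral"]))  # spiral
print(consensus(["spiral", "elliptical"]))                      # None: no consensus
```

Items that fail to reach consensus can be routed back to the crowd or escalated to an expert, so non-expert input still yields data the academic community can trust.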
Copyright is always a thorny issue, and crowdsourced projects need to be aware of ownership and intellectual property issues of community-generated content.
The East London Lives project addressed the problem by treating contributors as researchers, which meant that all rights were owned by the University of East London (JISC, 2010). It also used a Creative Commons licence for non-commercial use, and an Open Educational licence (material could be used for research and teaching).
Software tools

There is quite a lot of open source software, some of which was mentioned in the previous section in connection with geospatial projects.
RunCoCo offers CoCoCo open source software, which provides a web interface for searching and browsing approved content.
Another crowdsourcing tool is TypeWright, which lets volunteers correct the machine-transcribed text of scanned page images to produce reliably searchable texts.
Conclusion

This article has examined various academic uses of crowdsourcing, including geographical information, community collections, and outsourced tasks for heritage projects.
The nature of the outputs of a crowdsourced project can work to the disadvantage of an academic, whose success is measured in terms of peer reviewed journal articles (Terras, 2011). On the other hand, the large amount of data generated can be mined for research, the results of which can then be published.
Whatever the output, it is important not to work alone – to share information and to build on the expertise of others. RunCoCo can be helpful here: it is building a support network, and also provides training. It’s a matter of tapping into the wisdom of the crowd!
References

Goodchild, M.F. (2007), “Citizens as sensors: the world of volunteered geography”, GeoJournal, Vol. 69, pp. 211-221
Howe, J. (2008), Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business, Crown Business, New York, NY
Hudson-Smith, A. (2011), “Crowdsourcing opening up new opportunities”, MethodsNews Summer 2011, National Centre for Research Methods, available at http://eprints.ncrm.ac.uk/1843/, accessed January 24th 2012
Joint Information Systems Committee, JISC (2010), Capturing the Power of the Crowd and the Challenge of Community Collections, available at: http://www.jisc.ac.uk/publications/jiscinform/2010/~/link.aspx?_id=0ED8E52991054128B38E24FA31C8E9A4&_z=z, accessed January 24th 2012
Quercia, D., Ó Séaghdha, D. and Crowcroft, J. (2012), “Talk of the City: Our Tweets, Our Community Happiness”, available at http://www.bartlett.ucl.ac.uk/casa/events/2012-01-25-daniele-quercia, accessed January 22nd 2012
RunCoCo (2011), RunCoCo: How to run a community collection online, available at http://projects.oucs.ox.ac.uk/runcoco/resources/RunCoCo_Report.pdf, accessed January 24th 2012
Upshall, M. (2011), “Crowdsourcing for education”, paper presented to Online Information 2011, 29 November – 1 December, Olympia National Hall and Conference Centre, London, available at http://www.online-information.co.uk, accessed January 23rd 2012
Wexler, M. N. (2011), “Reconfiguring the sociology of the crowd: exploring crowdsourcing”, International Journal of Sociology and Social Policy, Vol. 31 No. 1/2.