How to...
Use digital tools for research

Introduction

In 2000, two behavioural accounting researchers urged their colleagues to use the Web for surveys, pointing out the way the practice had taken hold in the marketing discipline, and the effectiveness of the medium for collecting data and gathering subjects (Herron and Young, 2000).

Twenty years on, the use of web-based surveys is well accepted, but e-research has moved considerably beyond the use of a single instrument to capture data. There has been a real explosion of digital tools which can store, search and retrieve information, and analyse and compare data in ways that produce new information at a granular level which would have been unthinkable ten years ago.

Thus we can learn about obscure members of the eighteenth century underworld from parish and criminal records, correct old maps by comparing them with Google, and read up on complaints about litter on a particular street.

Yet other tools help with collaboration and community creation, both in the "official" sense of research groups working together across institutions and in the more informal sense of "crowdsourcing", which involves using social networks to find people with similar research interests. (Some of these tools are explored in the guide, "How to... use social software tools for research".)

None of this changes the essence of research – which is to come up with good ideas and questions, and use sound methodology to investigate them. It does, however, change the way research is done, open up new methods based on information and communication technology, and vastly increase the number of questions that can be asked.

"Growing Knowledge: The evolution of research" is an exhibition held from 12 October 2010 to 16 July 2011 at the British Library in London which shows the way innovative projects – selected from a wide range of disciplines – use new technologies in research to reveal new types of knowledge.

The idea is to inspire researchers to cross disciplinary and media boundaries and consider how they can re-purpose the tools used by the projects for their own purposes.

Digital research & the social sciences

Digital technologies are being used in the social sciences to such an extent that the term "e-social science" has been coined. There are also a number of large-scale projects on the subject being undertaken, especially in the UK, Australia, Canada and New Zealand.

The website of the University of Canterbury, New Zealand, defines e-social science as:

  • the ability to link databases in different locations across the globe, integrate data sets and perform comparative analyses;
  • the ability to use the Web for surveys and experiments; and
  • the ability to have research teams collaborate in virtual laboratories (University of Canterbury, n.d.).

Examples of projects include:

  • The Manchester eResearch Centre, which has the twin objectives of creating an e-infrastructure for social science and carrying out social studies of the development of e-science. It builds on the work of the UK's National Centre for e-Social Science, which it coordinated.
  • The Oxford e-Social Science Project, which was set up to study the impact of the new digital approaches and their ethical, legal and institutional implications. The first part of this project (2005-2008) looked at a number of case studies and drew out common themes, while the second phase (2008-2011) is looking at these themes in depth.
  • The development of a hub for e-social science in New Zealand, in the form of an advanced social statistics data service.
  • In India and South Asia, a portal has been set up to offer access to social science journals, as well as to provide scholars with the opportunity to showcase working papers, and discuss policy applications.

How can digital tools help social science researchers?

There are four main ways in which digital technology can help research:

  1. Community and collaboration – new technologies enable the creation of online communities. These are explored in "How to... use social software tools for research" (particularly the section on virtual research environments). Other tools such as Twitter and social bookmarking can be used to help share information. And whereas previously you would have had to scan the relevant literature to find people with similar interests, you can now do this through crowdsourcing – using social software tools to stimulate interest.
  2. Search and retrieval of information – information retrieval tools such as catalogues and databases have become much more powerful and user friendly, allowing you to refine your results and tag items. Examples include the British Library catalogue and the British Library's own Management and Business Studies Portal. All these resources can also be explored from the comfort of one's own home, without the need to visit the physical library.
  3. The ability to use data in new and creative ways, by overlaying one data set on another. This is one of the most interesting aspects of digital research and is explored more fully in the next section.
  4. The ability to capture paintings, ancient texts, manuscripts, and other objects in digital form. Images of artefacts can also be seen in 3D through Polynomial Texture Mapping (see http://materialobjects.com/ptm/). The main application is to literary and historical subjects, but the ability to view artefacts has a potential application in anthropology.

The world of linked data

The Semantic Web is the web of linked data, as opposed to documents; social science is, like science, essentially about the examination of data. (For more on the Semantic Web, see the information management viewpoint, "The Semantic Web – a new tool for libraries?".)

The linking of data means that every item of data within a data set, in, say, a spreadsheet or a database, is given a web address, a uniform resource identifier or URI, which is machine readable and enables the data item to be located independently of its set. Crucially, it also means that data sets can be combined.
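The idea of giving each item its own web address, and then combining sets, can be sketched in a few lines of Python. This is a minimal illustration only: the base address http://example.org, the school keys and all of the values are invented, and real linked data projects would use an RDF toolkit and published vocabularies rather than plain tuples.

```python
from collections import defaultdict

BASE = "http://example.org"  # hypothetical domain, for illustration only


def to_triples(rows, entity_type):
    """Turn keyed rows into (subject URI, property URI, value) statements."""
    triples = []
    for key, properties in rows.items():
        subject = f"{BASE}/id/{entity_type}/{key}"
        for name, value in properties.items():
            triples.append((subject, f"{BASE}/def/{name}", value))
    return triples


def combine(*triple_sets):
    """Merge statements from several data sets, grouped by subject URI."""
    merged = defaultdict(dict)
    for triples in triple_sets:
        for subject, predicate, value in triples:
            merged[subject][predicate] = value
    return dict(merged)


# Two separately published data sets about the same schools (invented values).
pupils = {"001": {"pupils": 420}, "002": {"pupils": 310}}
wards = {"001": {"ward": "Northgate"}, "003": {"ward": "Riverside"}}

linked = combine(to_triples(pupils, "school"), to_triples(wards, "school"))
for uri, facts in linked.items():
    print(uri, facts)
```

Because both sets use the same address for school 001, its pupil count and its ward end up attached to one item, even though neither original set held both facts.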

This means that the researcher can ask many more questions, and obtain far more detailed answers. Examples include:

London Lives

The London Lives project (http://www.londonlives.org/index.jsp) is a tool for researching lives of people who lived in London in the eighteenth century, whose details are not easily available in more conventional sources. Subtitled "Crime, poverty and social policy in the metropolis", it comprises a searchable database of 14 archives, 15 further data sets, and 240,000 manuscripts.

The archives include parish archives, as well as records from criminal proceedings, coroners, hospitals and guilds, workhouses, income tax payments, salaries, etc. This all adds up to considerable biographical information on hundreds of thousands of individual Londoners.

Government data

Another example is the way that governments at both local and national level are making their data sets publicly available. The US Federal Government launched Data.gov in 2009, and the UK followed with data.gov.uk in 2010. Other countries have open data projects, notably Australia, New Zealand, the Netherlands, Sweden, Spain, Austria and Denmark, as do cities such as Vancouver and London (Sheridan and Tennison, 2010).

Both projects have as their stated aims greater transparency, democratic accountability, and participation. Providing detailed information on health, education, transport, taxation, spending, crime, etc. helps citizens be better informed and able to see how their own particular area of local government compares with others (Shadbolt, 2011).

Public bodies can be held accountable (e.g. by publishing death rates per hospital). Those in government who monitor spending can assess value for money and make better investment and procurement decisions, so that public service delivery can be improved.

The above examples show how the general public can use data; however, it is not difficult to see how government data sets can benefit researchers working in the areas of social policy, sociology, geography, and political science.

There is another benefit, however: the data.gov.uk project set high standards for linked data, which enable the publisher to retain control and help prevent misuse. This in turn sets a standard for other linked data projects, which will benefit research in other areas.

For an account of the technical development of data.gov.uk, see Sheridan and Tennison, 2010.

Geo-spatial information

data.gov.uk paid particular attention to two types of data: statistical and geo-spatial. This was because practically all data sets contained statistics, such as the number of pupils of a certain age in school, and references to a real world location (Sheridan and Tennison, 2010).

Linked data for geo-spatial information enables data to be compared across physical space. The EU has decreed that European countries should be able to exchange spatial information, and in the UK, the Ordnance Survey has opened up its data for public use. The challenge, however, is to provide identifiers for spatial objects that represent objects in the real world, and also take account of changes, such as those to boundaries.
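One way to picture the identifier problem is to keep a stable address for a place and attach dated versions of its boundary to it. The sketch below assumes that approach; the ward URI, the dates and the areas are all invented, and it is not a description of how the Ordnance Survey or data.gov.uk actually model boundary change.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BoundaryVersion:
    uri: str          # address of this particular version of the boundary
    valid_from: date  # first day on which this boundary applies
    area_km2: float


def version_at(versions, when):
    """Return the boundary version in force on a given date, or None."""
    candidates = [v for v in versions if v.valid_from <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.valid_from)


WARD = "http://example.org/id/ward/northgate"  # hypothetical stable identifier
history = [
    BoundaryVersion(f"{WARD}/2001", date(2001, 4, 1), 3.2),
    BoundaryVersion(f"{WARD}/2009", date(2009, 4, 1), 4.1),
]

print(version_at(history, date(2005, 1, 1)))
```

A statistic recorded in 2005 can then be tied to the boundary that applied at the time, rather than to whatever the ward looks like today.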

How researchers analyse linked data

The ability to link different data sets, and to have access to a large volume of data, changes the methods researchers can use. In particular, it necessitates a different approach to the study of text. Many researchers (in the humanities at least) are used to reading every word of text, and even if they skim, they usually do so in a linear fashion.

However, given the large corpora that digital collections make possible, and the powerful new software tools now available, this approach is both limited and impractical. The best way to undertake digital research is to look for patterns in data.

Visualization tools

Visualization is a good way of revealing patterns, and there are a number of software tools that help.

Tableau Public, for example, takes data in text form and, provided it is formatted correctly (i.e. with a left-hand column of category values, which it interprets as dimensions, while numeric columns are treated as measures), turns it into a visualization of the user's choice. Similar tools include Many Eyes and Google Visualization API.
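The kind of layout such tools expect can be illustrated with a short Python sketch. It assumes a simple rule (a column is numeric if every value parses as a number, otherwise it is a category) and uses invented complaint counts; it is not Tableau's own logic, only a picture of the split between category columns and numeric columns.

```python
import csv
import io
from collections import defaultdict

# Invented sample data: a category column on the left, numbers on the right.
RAW = """borough,complaints
Camden,120
Camden,95
Hackney,80
Hackney,110
"""


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_columns(rows):
    """Label each column 'measure' if all its values are numeric, else 'dimension'."""
    return {
        name: "measure" if all(is_number(r[name]) for r in rows) else "dimension"
        for name in rows[0]
    }


def total_by(rows, dimension, measure):
    """Sum a numeric column within each value of a category column."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[dimension]] += float(r[measure])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(RAW)))
kinds = classify_columns(rows)
totals = total_by(rows, "borough", "complaints")
print(kinds)
print(totals)
```

The totals per category are what a bar chart of complaints by borough would then plot.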

GeoVUE: Geographic Virtual Urban Environments uses visualization techniques and GIS to create new models of cities: virtual urban environments. Currently it has three projects: one which links publicly available data to non-proprietary maps; one which is specifically concerned with pollution data; and one in which the user can build their own simulation.

Simulation

Social scientists, like economists, want to be able to forecast scenarios in a number of areas such as housing, health care, education and transport. A good way of doing this is to create simulation tools, to which variables can be applied and outputs generated.

Social simulation is thus an expanding field, creating a demand for tools, services and research communities. The National eInfrastructure for Social Simulation has been set up to cater for this demand.
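In its simplest form, a simulation tool of this kind is a function that takes variables and returns projected outputs. The following sketch assumes a deliberately crude model of school places (constant annual growth, fixed capacity) with invented figures; real social simulations are far richer, and nothing here reflects the tools offered by the National eInfrastructure for Social Simulation.

```python
def simulate_demand(initial_pupils, growth_rate, capacity, years):
    """Project pupil numbers year by year and report any shortfall in places."""
    results = []
    pupils = initial_pupils
    for year in range(1, years + 1):
        pupils = pupils * (1 + growth_rate)
        shortfall = max(0.0, pupils - capacity)
        results.append(
            {"year": year, "pupils": round(pupils), "shortfall": round(shortfall)}
        )
    return results


# Compare two scenarios by changing a single variable (all figures invented).
low_growth = simulate_demand(1000, 0.02, 1100, 5)
high_growth = simulate_demand(1000, 0.10, 1100, 5)

for low, high in zip(low_growth, high_growth):
    print(low["year"], low["shortfall"], high["shortfall"])
```

Running the same model under different assumptions is what allows scenarios to be compared before any policy decision is taken.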

Text mining

Text mining offers the opportunity to describe a piece of text in statistical terms. For example, text can be analysed and repetitions identified. Each repetition can be given a unique identifier, and a map created of where the repetitions occur in the text.
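The repetition map described above can be sketched as follows. This is a minimal illustration, assuming that a "repetition" means any run of three consecutive words that occurs more than once; the sample sentence and the identifier scheme (R1, R2, ...) are invented for the example.

```python
import re
from collections import defaultdict


def repetition_map(text, n=3):
    """Find repeated n-word phrases and record the word positions where they start."""
    words = re.findall(r"[a-z']+", text.lower())
    positions = defaultdict(list)
    for i in range(len(words) - n + 1):
        positions[" ".join(words[i:i + n])].append(i)
    # Keep only phrases seen more than once, in order of first appearance.
    repeated = [(p, pos) for p, pos in positions.items() if len(pos) > 1]
    repeated.sort(key=lambda item: item[1][0])
    return {
        f"R{k}": {"phrase": phrase, "positions": pos}
        for k, (phrase, pos) in enumerate(repeated, start=1)
    }


sample = "The prisoner at the bar struck her. The witness saw the prisoner at the bar."
for identifier, entry in repetition_map(sample).items():
    print(identifier, entry["phrase"], entry["positions"])
```

Each identifier then stands for one repeated phrase, and its list of positions is the map of where that phrase recurs.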

The project Data Mining with Criminal Intent applies text mining to historical sources, and in particular to the history of crime, using a form of compression analysis. Dr Tim Hitchcock explains how text mining can help research nineteenth-century domestic violence, allusions to which are scarce as it was not then classed as a crime.

There were, however, a total of 1,200 trials involving spouse murder, which themselves define a context for domestic violence. Thus by studying accounts of these trials, it is possible to build up a model of what domestic violence looked like, use the model to define other cases, then refine the model and apply it to newspapers and novels.
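The form of compression analysis used by the project is not described here. One generic compression-based measure is the normalized compression distance, which treats two texts as similar if compressing them together takes little more space than compressing one alone. The sketch below assumes that measure, uses Python's standard zlib module, and works on invented sentences rather than real trial accounts.

```python
import zlib


def compressed_size(text):
    """Number of bytes the text occupies after zlib compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a, b):
    """Normalized compression distance: near 0 for similar texts, near 1 for unrelated ones."""
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_both = compressed_size(a + " " + b)
    return (size_both - min(size_a, size_b)) / max(size_a, size_b)


# Invented examples standing in for a known case and two candidate texts.
known_case = (
    "The prisoner struck his wife with a poker after a quarrel, "
    "and she died of the wound the following morning."
)
similar = (
    "The prisoner struck his wife with a poker after a quarrel, "
    "and the surgeon said she died of the wound."
)
unrelated = (
    "The goods were taken from the warehouse at night and sold "
    "to a dealer in Whitechapel for three shillings."
)

candidates = {"similar": similar, "unrelated": unrelated}
ranked = sorted(candidates, key=lambda name: ncd(known_case, candidates[name]))
print(ranked)
```

Ranking candidate texts by their distance from known cases is one way a model built from trial accounts might be applied to other sources, though short texts give noisy scores and a real study would need much longer passages.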

Text mining can also be used to ease the burden of producing systematic reviews, made more difficult by the deluge of information (Ananiadou et al., 2007). Text mining techniques can be used in query expansion, document screening (topic clustering), and synthesizing (sentences selected from documents based on the most significant terms and classification techniques).

Qualitative longitudinal analysis

Perhaps one of the most interesting methods of analysis associated with linked data (although not uniquely) is the combination of qualitative with longitudinal methods: the ability to measure change over time.

Qualitative longitudinal (QL) analysis combines the richness of qualitative data ("detailed, contextualised data that can answer 'how' and 'why' questions about the social world"), with longitudinal research, with its dynamic view of social processes (University of Leeds, n.d.).

Timescapes is the first major qualitative longitudinal study to be funded in the UK. Run by the University of Leeds, it explores how personal and family relationships change over time.

A major feature of the project is Timescapes' digital resource centre and archive, which comprises data generated by the various projects (seven in all, encompassing all generations). The archive is open to all, the idea being to benefit the growing international community of QL researchers.

The existence of so much rich data at one location also benefits those using secondary analysis – innovative analytical strategies and different perspectives brought to bear on existing data, seen with new eyes.

However, to conduct secondary analysis successfully, it is important to have to hand not just the participants' data, but also other relevant information such as the interview schedules and the motivation behind the questions, as well as field notes and data tables for each project, and details of the researchers (Baker, 2010).

Longitudinal research is a very powerful form of research, according to Timescapes' director Bren Neale, because we can see how change happens. However, it requires the build-up of large, rich data sets, and these are best hosted in a digital archive with good search and retrieval facilities that can be open to all.

To read more about QL, and secondary analysis, see www.timescapes.leeds.ac.uk/events-dissemination/publications.php.

Conclusion

E-social science offers the potential to hold and structure data in a more systematic way, combining and integrating data sets so that it is possible to cut through information overload and find the precise details you want.

For the social sciences, it also offers the possibility of modelling scenarios, measuring and observing change over time, and linking facts to a particular place.

This is big research involving big data sets and the necessity for cross-institutional e-infrastructure, but the reward is new forms of knowledge which will yield greater understanding about ourselves, and the possibility for more targeted, relevant and cost-effective policymaking.

References

Ananiadou, S., Procter, R., Rea, B., Sasaki, Y. and Thomas, J. (2007), "Supporting systematic reviews using text mining", in Proceedings of the 3rd International Conference on e-Social Science, available at: http://www.ncess.ac.uk/events/conference/2007/papers/paper208.pdf [accessed 23 March 2011].

Baker, S. (2010), "Reflections on secondary analysis of the 'Siblings and Friends' data", secondary analysis report on the Siblings and Friends: The Changing Nature of Children's Lateral Relationships project, available at: http://www.timescapes.leeds.ac.uk/research-projects/projects/siblings-f… [accessed 23 March 2011].

Herron, T.L. and Young, G.R. (2000), "E-research: Moving behavioral accounting research into cyberspace", in Hunton, J.E. (Ed.), Advances in Accounting Behavioral Research, Vol. 3, Emerald Group Publishing Limited, UK, pp. 265-280.

Shadbolt, N. (2011), "A year of data.gov.uk", Guardian.co.uk, Datablog [blog], 21 January, available at: http://www.guardian.co.uk/news/datablog/2011/jan/21/data-gov-nigel-shad… [accessed 23 March 2011].

Sheridan, J. and Tennison, J. (2010), "Linking UK government data", in Bizer, C., Heath, T., Berners-Lee, T. and Hausenblas, M. (Eds), Proceedings of the WWW2010 Workshop on Linked Data on the Web (LDOW 2010), 27 April, Raleigh, North Carolina, available at: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-628/l… [accessed 23 March 2011].

University of Canterbury (n.d.), "e-Social Science emerging research and agendas", available at: http://www.ssrc.canterbury.ac.nz/about/e_social_science.shtml [accessed 22 March 2011].

University of Leeds (n.d.), "Timescapes: Methods and ethics", available at: http://www.timescapes.leeds.ac.uk/methods-ethics/ [accessed 22 March 2011].