How to... use secondary data and archival material

What is secondary data and archival material?

Primary and secondary data

All research will involve the collection of data. Much of this data will be collected directly through some form of interaction between the researcher and the people or organization concerned, using such methods as interviews, focus groups, surveys and participant observation. Such methods involve the collection of primary data, and herein lies the opportunity for the researcher to develop and demonstrate the greatest skill.

However, sometimes the researcher will use data which has already been collected for other purposes – in other words, he or she is going to an existing source rather than directly interacting with people. The data may have been:

  • Deliberately collected and analysed, for example for some official survey such as the UK Labour Market Trends (now published as Economic & Labour Market Review (ELMR)) or General Household Survey.
  • Created in a more informal sense as a record of people's activities, for example, letters or other personal items, household bills, company records, etc. At some point, they may have been deliberately collected and organized into an archive.

Either way, such material is termed secondary data.

Rather confusingly, the latter form of secondary data is also referred to as primary source material.

"Primary resources are sources that are usually created at the time of an event. Primary resources are the direct evidence or first hand accounts of historical events without secondary analysis or interpretation."
(York University Libraries Archival Research Tutorial)

This distinguishes them from secondary sources which describe, analyse and refer to the primary sources.

The above definitions and distinctions can be described diagrammatically as follows:

Image: What kind of source is it? Primary sources: 1. data collected by observation, interview, focus group or survey, and 2. secondary data in the form of records left by people of their activities. Secondary sources: 1. secondary data collected with a particular research design, and 2. secondary literature which critically analyses data. Tertiary sources: 1. tertiary sources which can locate secondary sources and data sets.

Types of secondary data

Secondary data is found in print or electronic form; if the latter, it may be on CD-ROM, in an online computer database, or on the Internet. Furthermore, it can take the form of statistics collected by governments, trade associations or organizations that exist to collect and sell statistical data, or simply of plain documents in archives or company records.

A crucial distinction is whether or not the data has been interpreted, or whether it exists in raw form.

  • Raw data, also referred to as documentary or archival data, will exist in the form in which it was originally intended, for example meeting minutes, staff records, reports on new markets, accounts of sales of goods/services etc.
  • Interpreted data, which may also be referred to as survey data, will have been collected for a particular purpose, for example, to analyse spending patterns.

Because interpreted data will have been collected deliberately, the plan behind its collection and interpretation will also have been deliberate – that is, it will have been subjected to a particular research design. (See "Using published datasets" section in this guide.)

By contrast, raw data will not have been processed, and will exist in its original form. (See "Using archival data" section in this guide.)

When and why to use secondary data

There are various reasons for using secondary data:

  • A particularly good collection of data already exists.
  • You are doing a historical study – that is, your study begins and ends at a particular point in time.
  • You are covering an extended period, and analysing development over that period – a longitudinal study.
  • The unit that you are studying may be difficult, or simply too large, to study directly.
  • You are doing a case study of a particular organization/industry/area, and it is important to look at the relevant documents.

You should pay particular attention to the place of secondary documents within your research design. How prominent a role you give to this method may depend on your subject: for example, if you are researching in the area of accounting, finance or business history, secondary documentary sources are likely to play an important part. Otherwise, use of secondary data is likely to play a complementary part in your research design. For example, if you are studying a particular organization, you would probably want to supplement observation/interviews with a look at particular documents produced by that organization.


In "Learning lessons? The registration of lobbyists at the Scottish parliament" (Journal of Communication Management, Vol. 10 No. 1), the author uses archival research at the Scottish parliament as a supplementary research method (along with the media and focus groups), his main methods being interviews and participant observation of meetings.

This point is further developed in the "Secondary data as part of the research design" section of this guide. Reasons for using the different types of secondary data are further developed in the individual sections.

NB If you are doing a research project/dissertation/thesis, check your organization's view of secondary data. Some organizations may require you to use primary data as your principal research method.

Advantages and disadvantages of secondary data collection

The advantages of using secondary data are:

  1. Much information already exists in documented form, whether deliberately processed or not. The researcher cannot ignore such information, and using it generally saves the time and effort of collecting the same data directly. In particular:
    • Many existing data sets are enormous, and far greater than the researcher would be able to collect him or herself, with a far larger sample.
    • The data may be particularly good quality, which can apply both to archival data (e.g. a complete collection of records on a particular topic) and to published data sets, particularly those which come from a government source, or from one of the leading commercial providers of data and statistics.
  2. You can access information which you may otherwise have had to secure in a more obtrusive manner.
  3. Existence of a large amount of data can facilitate different types of analysis, such as:
    • longitudinal or international analysis of information which would have otherwise been difficult to collect due to scale.
    • manipulation of data within the particular data set, including the comparison of particular subsets.
  4. Unforeseen discoveries can be made – for example, the link between smoking and lung cancer was made by analysing medical records.
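
As an illustration of point 3 above (comparing particular subsets within one data set), the following is a minimal sketch in Python. The records, the field names (region, year, weekly_spend) and the helper function mean_by_group are all invented for illustration and do not come from any real survey; a published data set would normally be loaded from a file supplied by the data provider.

```python
from statistics import mean

# Hypothetical records standing in for rows of a published data set.
# The field names and values are invented for illustration only.
records = [
    {"region": "North", "year": 2001, "weekly_spend": 310.0},
    {"region": "North", "year": 2002, "weekly_spend": 330.0},
    {"region": "South", "year": 2001, "weekly_spend": 400.0},
    {"region": "South", "year": 2002, "weekly_spend": 420.0},
]


def mean_by_group(rows, group_key, value_key):
    """Return the mean of value_key for each distinct value of group_key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


# Compare subsets of the same data set along two different dimensions.
by_region = mean_by_group(records, "region", "weekly_spend")
by_year = mean_by_group(records, "year", "weekly_spend")

print(by_region)
print(by_year)
```

The same records are grouped first by region and then by year, which is the kind of subset comparison that a single large data set makes possible without any further data collection.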

The disadvantages of secondary data collection are:

  1. There may be a cost to acquiring the data set.
  2. You will need to familiarize yourself with the data, and if you are dealing with a large and complex data set, it will be hard to manage.
  3. The data may not match the research question: there may be too much data, or there may be gaps, or the data may have been collected for a completely different purpose.
  4. The measures, for example between countries/states/historical periods, may not be directly comparable. (See the "Secondary data as part of the research design" section of this guide for a further development of this topic.)
  5. The researcher has no control over the quality of the data, which may not be seen as being as rigorous and reliable as data which is specifically collected by the researcher, who has adopted a specific research design for the question.
  6. Collecting primary data builds up more research skills than collecting secondary data.
  7. Company data in particular may be seen as commercially sensitive, and it may be difficult to gain access to company archives, which may be spread across different departments or held on the company intranet.