How to...
Use secondary data & archival material

Find out what secondary data is – as opposed to primary data – and how to go about collecting and using it.

What is secondary data & archival material?

Primary & secondary data

All research will involve the collection of data. Much of this data will be collected directly through some form of interaction between the researcher and the people or organisation concerned, using such methods as interviews, focus groups, surveys and participant observation. Such methods involve the collection of primary data, and herein lies the opportunity for the researcher to develop and demonstrate the greatest skill.

However, sometimes the researcher will use data which has already been collected for other purposes – in other words, he or she is going to an existing source rather than directly interacting with people. The data may have been:

Deliberately collected and analysed, for example for some official survey such as the UK Labour Market Trends (now published as Economic & Labour Market Review (ELMR)) or General Household Survey.
Created in a more informal sense as a record of people's activities, for example, letters or other personal items, household bills, company records, etc. At some point, they may have been deliberately collected and organised into an archive.

Either way, such material is termed secondary data.

Rather confusingly, the latter form of secondary data is also referred to as primary source material.

"Primary resources are sources that are usually created at the time of an event. Primary resources are the direct evidence or first hand accounts of historical events without secondary analysis or interpretation."
(York University Libraries Archival Research Tutorial)

This distinguishes them from secondary sources which describe, analyse and refer to the primary sources.

The above definitions and distinctions can be described diagrammatically as follows:

Types of secondary data

Secondary data is found in print or electronic form; if the latter, it may be on CD-ROM, in an online computer database, or on the Internet. Furthermore, it can be in the form of statistics collected by governments, trade associations, organisations that exist to collect and sell statistical data, or just as plain documents in archives or company records.

A crucial distinction is whether or not the data has been interpreted, or whether it exists in raw form.

Raw data, also referred to as documentary or archival data, will exist in the form in which it was originally intended, for example meeting minutes, staff records, reports on new markets, accounts of sales of goods/services etc.
Interpreted data, which may also be referred to as survey data, will have been collected for a particular purpose, for example, to analyse spending patterns.

Because interpreted data will have been collected deliberately, the plan behind its collection and interpretation will also have been deliberate – that is, it will have been subjected to a particular research design.

By contrast, raw data will not have been processed, and will exist in its original form. (See "Using archival data" section in this guide.)

When and why to use secondary data

There are various reasons for using secondary data:

A particularly good collection of data already exists.
You are doing a historical study – that is, your study begins and ends at a particular point in time.
You are covering an extended period, and analysing development over that period – a longitudinal study.
The unit that you are studying may be difficult, or simply too large, to study directly.
You are doing a case study of a particular organisation/industry/area, and it is important to look at the relevant documents.

You should pay particular attention to the place of secondary documents within your research design. How prominent a role you give to this method may depend on your subject: for example, if you are researching in the area of accounting, finance or business history, secondary documentary sources are likely to play an important part. Otherwise, use of secondary data is likely to play a complementary part in your research design. For example, if you are studying a particular organisation, you would probably want to supplement observation/interviews with a look at particular documents produced by that organisation.

Example

In "Learning lessons? The registration of lobbyists at the Scottish parliament" (Journal of Communication Management, Vol. 10 No. 1), the author uses archival research at the Scottish parliament as a supplementary research method (along with the media and focus groups), his main method being interviews and participant observation of meetings.

This point is further developed in the "Secondary data as part of the research design" section of this guide. Reasons for using the different types of secondary data are further developed in the individual sections.

NB If you are doing a research project/dissertation/thesis, check your organisation's view of secondary data. Some organisations may require you to use primary data as your principal research method.

Advantages and disadvantages of secondary data collection

The advantages of using secondary data are:

The fact that much information exists in documented form – whether deliberately processed or not – means that such information cannot be ignored by the researcher, and generally saves time and effort collecting data which would otherwise have to be collected directly. In particular:

Many existing data sets are enormous, and far greater than the researcher would be able to collect him or herself, with a far larger sample.
The data may be particularly good quality, which can apply both to archival data (e.g. a complete collection of records on a particular topic) and to published data sets, particularly those which come from a government source, or from one of the leading commercial providers of data and statistics.

You can access information which you may otherwise have had to secure in a more obtrusive manner.
Existence of a large amount of data can facilitate different types of analysis, such as:

longitudinal or international analysis of information which would have otherwise been difficult to collect due to scale.
manipulation of data within the particular data set, including the comparison of particular subsets.

Unforeseen discoveries can be made – for example, the link between smoking and lung cancer was made by analysing medical records.

The disadvantages of secondary data collection are:

There may be a cost to acquiring the data set.
You will need to familiarise yourself with the data, and if you are dealing with a large and complex data set, it will be hard to manage.
The data may not match the research question: there may be too much data, or there may be gaps, or the data may have been collected for a completely different purpose.
The measures, for example between countries/states/historical periods, may not be directly comparable. (See the "Secondary data as part of the research design" section of this guide for a further development of this topic.)
The researcher has no control over the quality of the data, which may not be seen as being as rigorous and reliable as data specifically collected by the researcher, who has adopted a specific research design for the question.
Collecting primary data builds up more research skills than collecting secondary data.
Company data particularly may be seen as commercially sensitive, and it may be difficult to gain access to company archives, which may be stored in different departments or on the company intranet, to which access may be difficult.

Using published data sets

What are they?

As discussed in the previous section, these are sources of data which have already been collected and worked on by someone else, according to a particular research design. Other points to note are:

Mostly they will have been collected by means of a survey, which may be:

a census, which is an "official count", normally carried out by the government, with obligatory participation, for example the UK population censuses carried out every ten years
a repeated survey, which involves collecting information at regular intervals, for example government surveys about household expenditure
an ad hoc survey, done just once for a particular purpose, such as a market research survey.

Interpreted data referring to a particular social unit is termed a data set.
A database is a structured data set, produced as a matrix with each social unit having a row, and each variable a column.
Sometimes, different data sets are combined to produce multiple source secondary data: for example, the publication Business Statistics of the United States: Patterns of Economic Change contains data on virtually all aspects of the US economy from 1929 onwards. Such multiple source data sets may have been compiled on:

a time series basis, that is they are based on repeated surveys (see above) or on comparable variables from different surveys to provide longitudinal data
a geographical basis, providing information on different areas.
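The matrix structure described above, and the difference between a time series basis and a geographical basis, can be sketched in a few lines of Python. This is only an illustration: the regions, years and figures are invented, and the `tabulate` helper is a hypothetical name, not part of any published data set or tool.

```python
# Hypothetical data set: each row is one observation of a social unit,
# and each key is a variable (a column). All figures are invented.
rows = [
    {"region": "North", "year": 2004, "households": 120, "avg_spend": 310.0},
    {"region": "North", "year": 2005, "households": 125, "avg_spend": 325.0},
    {"region": "South", "year": 2004, "households": 200, "avg_spend": 290.0},
    {"region": "South", "year": 2005, "households": 210, "avg_spend": 300.0},
]


def tabulate(rows, key, value):
    """Group one variable by another: {key value: [values in row order]}."""
    table = {}
    for row in rows:
        table.setdefault(row[key], []).append(row[value])
    return table


# Time series basis: one series of spending per region, ordered by year.
by_region = tabulate(sorted(rows, key=lambda r: r["year"]), "region", "avg_spend")

# Geographical basis: spending across regions for each year.
by_year = tabulate(sorted(rows, key=lambda r: r["region"]), "year", "avg_spend")

print(by_region)
print(by_year)
```

The same rows serve both purposes; what changes is which variable you group by, which is worth bearing in mind when deciding whether a published data set can be re-tabulated to suit your research question.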

Key considerations

There are a number of points to consider when using data sets, some practical and others associated with the research design (yours and theirs).

Practical considerations relate to cost and use:

Whilst much data is freely available, there may be a charge. For example, Business Statistics of the United States: Patterns of Economic Change is priced at US$147. So, when deciding what data to use, it's a good idea to check what's already in your library.
Is the data available in computerised form, or will you have to enter it manually? If it is available in computerised form, is it in a form suitable to your research design (see below) or will you have to tabulate the data in a different form?

Research considerations include:

Is the data set so important to your research that you cannot ignore it? For example, if you were doing a project which involved top corporations, you could not afford to ignore the publications which provided data and statistics, such as Europe's 15,000 Largest Companies 2006.
Does the data generally cover the research question?

Is the coverage relevant, or does it leave out areas (e.g. only Asia as opposed to Australasia) or time periods (e.g. only starting in 1942 when you wanted data from 1928)?
Are the variables relevant, for example if you are interested in household expenditure does it break down the households in ways relevant to your project?
Are the measures used the same, for example, is growth in sales expressed as an amount or a percentage?
In the case of data from different countries, has the data been collected in the same way? For example, workers affected by strikes may include those directly affected in one country, and those indirectly affected in another.
Is the data reliable, and current? Note that data from government, and reputable commercial sources, is likely to be trustworthy but you should be wary of information on the Internet unless you know its source. Data from trustworthy sources is likely to have been collected by a team of experts, with good quality research design and instruments.
The advantage of survey data in particular is that you have access to a far larger sample than you would otherwise have been able to collect yourself.
There is an obvious advantage to using a large data source; however, you need to allow for the time needed to extract what you want, and to re-tabulate the data in a form suitable for your research.
How has the data been collected: for example, is it longitudinal or geographical? This will affect the type of research question it can help with; for example, if you were comparing France and Germany, you would obviously want geographical data.
How intrinsic to your research design will the use of secondary data be? Beware of relying on it entirely, but it may be a useful way of triangulating other research, for example if you have done a survey of shopping habits, you can assess how generalisable your findings are by looking at a census.
While use of secondary data sets may not be seen as being as rigorous as collecting data yourself, the big advantage is that they are in a permanently available form and can be checked by others, which is an important point for validity.
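The question of whether the measures used are the same can often be settled with simple arithmetic. As a minimal sketch, suppose one source reports growth in sales as an amount and another as a percentage; the figures below are invented and the function name is hypothetical.

```python
def growth_as_percentage(previous_sales, absolute_growth):
    """Express growth given as an amount as a percentage of the previous period's sales."""
    if previous_sales == 0:
        raise ValueError("percentage growth is undefined when previous sales are zero")
    return 100.0 * absolute_growth / previous_sales


# Hypothetical sources: A reports growth as an amount, B as a percentage.
source_a = {"previous_sales": 2_000_000, "growth_amount": 150_000}
source_b = {"growth_percent": 6.0}

a_percent = growth_as_percentage(source_a["previous_sales"], source_a["growth_amount"])
print(f"Source A: {a_percent:.1f}%  Source B: {source_b['growth_percent']:.1f}%")
```

Note that the conversion is only possible because source A also gives the previous period's sales; if a source reports growth as an amount without a base figure, the two measures cannot be put on the same footing.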

And finally...

Will the benefits you gain from using secondary data sets as a research method outweigh the costs of acquiring the data, and the time spent sorting out what is relevant?

Sources

Producers of published secondary data include:

Governments and intergovernmental organisations, who produce a wide variety of data. For example, from the US Government come such titles as Budget of the United States Government, Business Statistics of the United States: Patterns of Economic Change, County and City Extra (source of data for every state), and Handbook of U.S. Labor Statistics.
Trade associations and organisations representing particular interests, such as the American Marketing Association. These may have data and information relevant to their particular interest group.
Commercial providers. These are often the best source of information for commercial and financial data. Their focus may be:
- company information: for example AMADEUS provides pan-European information on companies that includes balance sheets, profit and loss, ratios, descriptive etc., while FAME does a similar job for companies in the UK and Ireland.
- market research: for example, Mintel specialises in consumer, media and market research and publishes reports into particular market sectors, whilst Key Note "boasts one of the most comprehensive databases available to corporations in the UK", having published almost 1,000 reports spanning 30 industry sectors.

Where to find such information? The key is to have a very clear idea of what it is you are trying to find: what particular aspects of the research question are you attempting to answer?

You may well find sources listed in your literature review, or your tutor may point you in certain directions, but at some point you will need to consult the tertiary literature, which will point you in the direction of archives, indexes, catalogues and gateways. Your library will probably have Subject Guides covering your areas of interest. The following is a very basic list:

UK Economic and Social Data Services (ESDS). Contains links to: UK Data Archive (University of Essex); Institute for Social and Economic Research (University of Essex); Manchester Information and Associated Services (University of Manchester); and Cathie Marsh Centre for Census and Survey Research (University of Manchester). These contain access to a wide range of national and international data sets.
http://epp.eurostat.ec.europa.eu. Statistics of the European Union.
University of Michigan. Gateway to statistical resources on the Web.
D&B Hoovers. Company information on US and international companies.

Using archival data

What are they?

Archival, or documentary secondary data, are documentary records left by people as a by-product of their everyday activity. They may be formally deposited in an archive or they may just exist as company records.

Historians make considerable use of archival material as a key research technique, using a wide range of personal documents such as letters, diaries, household bills, which are often stored in some sort of formal "archive".

Business researchers talk about "archival research" because they use many of the same techniques for recording and analysing information. Companies, by their very nature, tend to create records, both officially in the form of annual reports, declarations of share value etc., and unofficially in the e-mails, letters, meeting minutes and agendas, sales data, employee records etc. which are the by-product of their daily activities.

If you are studying a business and management related subject, you may make use of archival material for a number of reasons:

Your research takes a historical perspective, and you want to gain insight into management decisions outside the memories of those whom you interview.
Archival research is an important tool in your particular discipline – for example, finance and accounting.
You wish to undertake archival research as part of qualitative research in order to triangulate with interviews, focus groups etc., or perhaps as exploratory research prior to the main research.
You may be undertaking a case study, or basing your research project on your own organisation; in either case, you should look at company documents as part of this research.

Examples

In "Financial reporting and local government reform – a (mis)match?" (Qualitative Research in Accounting & Management, Vol. 2 No. 2), Robyn Pilcher uses archival research – "Data was obtained from annual reports provided electronically to the DLG and checked against hard copies of these reports and supporting notes" – and interviews as exploratory research to investigate use of flawed financial figures by political parties, before carrying out a detailed examination of a few councils.

"Coalport Bridge Tollhouse, 1793-1995" (Structural Survey, Vol. 14 No. 4) is a historical study of this building drawing on such documents as maps, plans, photos, account books, meeting minutes, legal opinions and census records.

As distinct from published data sets, you will have to record and process the data yourself, in order to create your own data set.

Sometimes this archival material will be stored in "official" archives, such as the UK Public Record Office. Mostly however, it will be company specific, stored in official company archives or perhaps in smaller collections in individual departments or business units. Records can exist in physical or electronic form – the latter commonly on the company intranet.

Example

Whatever the company's archiving policy, there is no doubt that businesses provide a rich source of data. Here is a (non exhaustive) list of the forms that data can take:

Organisational records – for example HR, accounts, pay roll data etc.
Data referring to the sales of goods or services
Project files
Organisation charts
Letters
E-mails
Faxes
Meeting minutes and agendas
Reports
Diaries
Sales literature: catalogues, copies of adverts, brochures etc.
Annual reports
Reports to shareholders
Transcripts of speeches
Non textual material: maps and plans, videos, tapes, photographs.

Management Information Systems can hold a considerable amount of data. For example, the following HR records may be held:

data on recruitment, e.g. details of vacancies, dates, job details and criteria
staff employment details, for example job analysis and evaluation, salary grades, terms and conditions of employment, job objectives, job competencies, performance appraisals
data relevant to succession and career planning, e.g. the effects of not filling jobs
management training and development, e.g. training records showing types of training.

Source: Peter Kingsbury (1997), IT Answers to HR Questions, CIPD.

The media (newspapers, magazines, advertisements, television and radio programmes, books, the Internet) can also throw valuable light on events, and media sources should not be ignored.

Key considerations

There are a number of points to consider when using archival material:

You will need to gain access to the company, and this may prove difficult (see the "Gaining access to, and using, archives" section in this guide). On the other hand, if you are doing a report/project on your own organisation, access may be a lot easier, although even here you should gain agreement to access and use of material.
Even if you are successful in gaining access to the company, it may be difficult and time-consuming to locate all the information you need, especially if the company does not have a clear archiving policy, and you may need to go through a vast range of documents.
The data may be incomplete, and may not answer your research question – for example, there may be a gap in records, correspondence may be one-sided and not include responses.
The data may be biased, in other words it will be written by people who have a particular view. For example, meeting minutes are the "official" version and often things go on in meetings which are not recorded; profitability in annual reports may be reported in such a way as to show a positive rather than a true picture.
Informal and verbal interactions cannot be captured.
Archival research is time-consuming, both in locating and in recording documents, so for that reason may not be feasible for smaller projects.
You will also need to decide how to record data: historians are used to laboriously copying out documents considered too frail to photocopy, and business researchers may need to resort to this if (as is likely) company documents are considered confidential, although in such cases, note-taking may also be out. You will also need to find a suitable way of coding and referring to particular documents.
Finally, you will need to construct your own data set, for which you will need to have a particular research method.
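One possible way of coding and referring to particular documents is to keep a simple register in which every document receives a code built from its type and date. The sketch below is an assumption about how such a register might look, not an established archival standard: the `ArchiveDocument` class, the prefixes and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArchiveDocument:
    doc_type: str      # e.g. "minutes", "letter", "report"
    date: str          # ISO date "YYYY-MM-DD", so that codes sort chronologically
    description: str
    location: str      # where the original is held


# Hypothetical prefixes for a hypothetical coding scheme.
TYPE_PREFIX = {"minutes": "MIN", "letter": "LET", "report": "REP"}


def assign_codes(documents):
    """Return {code: document}, with codes such as 'MIN-1998-03-12-01'."""
    register = {}
    counters = {}
    for doc in sorted(documents, key=lambda d: (d.doc_type, d.date)):
        stem = f"{TYPE_PREFIX[doc.doc_type]}-{doc.date}"
        # Number documents of the same type and date in the order recorded.
        counters[stem] = counters.get(stem, 0) + 1
        register[f"{stem}-{counters[stem]:02d}"] = doc
    return register


documents = [
    ArchiveDocument("minutes", "1998-03-12", "Board meeting", "Box 4, folder 2"),
    ArchiveDocument("letter", "1998-03-12", "Chair to finance director", "Box 4, folder 3"),
    ArchiveDocument("minutes", "1998-03-12", "Sales meeting", "Box 5, folder 1"),
]

register = assign_codes(documents)
for code, doc in register.items():
    print(code, "|", doc.description, "|", doc.location)
```

Recording the location alongside the code means that a reference in your notes can always be traced back to the original document, which matters when copying is not permitted.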

Example

In "Participatory group observation – a tool to analyse strategic decision-making" (Qualitative Market Research, Vol. 5 No. 1), Christine Vallaster and Oliver Koll highlight the benefit of multiple methods for studying complex issues, it being thus possible to supplement the weaknesses of one method with the strengths of another and study a phenomenon from a diversity of views, and achieve a high degree of validity. In the case in question, archival research was used to analyse documents (organisation charts, company reports, memos, meeting minutes), and whilst the limitations in terms of incompleteness, selectivity, and not being authored by interviewees were acknowledged, so was their supporting value to interviews, and the same textual analysis method was used for both methods.

Secondary data as part of the research design

We have already mentioned, as part of our discussion of the two main types of secondary data, some considerations in respect to how they are used as part of the research. In this section, we shall look more generally at how secondary data can fit in to the overall research design.

Theoretical framework

Researchers take different views of the facts they are researching. For some, facts exist as independent reality; others admit the possibility of interpretation by the actors concerned. The two views, and their implication for the documents and data concerned, can be summed up as follows:

Positivists see facts as existing independently of interpretation, so documents are an objective reflection of reality.
Interpretivists, and even more so realists, see reality as influenced by the social environment, open to manipulation by those who are part of it. A document must be seen in its social context, and an attempt must be made to make sense of that context.

Some examples would be:

minutes of a sales meeting the purpose of which was to monitor sales, with sales being affected by external influences
brochure or flyer which was created for a particular item, and designed to appeal to current fashions
training records of people doing National Vocational Qualifications (used in the UK to acknowledge the value of existing skills).

Reliability and validity

Reliability and validity are important to any research design, and an important consideration with secondary data is the extent to which it relates to the research question, in other words how reliably it can answer it. You need to consider the fit very carefully before deciding to proceed. Some questions which may help here are:

How reliable is the data?

In the case of published data, you will be able to make a judgement by looking at its provenance: does it come from the government, or from a reputable commercial source? The same applies to the Internet – what is the source? Look for publisher information and copyright statements. How up to date is the material?

You also need to make intrinsic judgements, however: what is the methodology behind the survey, and how robust is it? How large was the sample and what was the response rate?

There are fewer obvious external measures you can use to check unpublished, archival material: that from businesses can be notoriously inconsistent and inaccurate. Records can be incomplete with some documents missing; sometimes, whole archives can disappear when companies are taken over. In addition, some documents such as letters, reports, e-mails, meeting minutes etc. have a subjective element, reflecting the view of the author, or the perceived wishes of the recipient. For example, meeting minutes may not reflect a controversial discussion that took place but only the agreed action points; a report on sales may be intended to put a positive spin on a situation and disguise its real seriousness. It helps when assessing reliability to consider who the intended audience is.

If you are using media reports, be aware that these may only include what they consider to be the most pertinent points.

Measurement validity

One of the biggest problems with secondary data is to do with the measurements involved. These may just not be the same as the ones you want (e.g. sales given in revenue rather than quantity), they may deliberately be distorted (e.g. non recording of minor accidents, sick leave etc.), or they may be different for different countries. If the measures are inexact, you need to take a view as to how serious the problem is and how you can address it.

Coverage

Does the data cover the time frame, geographical area, and variable in which you are interested? For example, if you are studying a particular period in a company, do you have meeting minutes to cover that period, or do they stop/start at a time within the boundaries of that period? Do you have the sales figures for all the countries you are interested in, and all the product types?
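A coverage check of this kind is easy to make explicit. The following sketch assumes a hypothetical study period and hypothetical holdings of meeting minutes, and simply lists the years the source fails to cover; the same set comparison would work for countries or product types.

```python
def coverage_gaps(needed_years, available_years):
    """Return the years the study needs that the source does not cover."""
    return sorted(set(needed_years) - set(available_years))


# Hypothetical study period and hypothetical holdings of meeting minutes.
needed = range(1990, 2000)  # 1990 to 1999 inclusive
minutes_held = [1990, 1991, 1992, 1995, 1996, 1997, 1998, 1999]

missing = coverage_gaps(needed, minutes_held)
share_covered = 1 - len(missing) / len(needed)

print("Missing years:", missing)
print(f"Share of period covered: {share_covered:.0%}")
```

Whether a gap of this size matters is a judgement for the researcher: two missing years in the middle of a period of change may be more damaging than the raw share suggests.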

You can greatly increase the validity and reliability of your use of secondary data if you triangulate with another research method. For example if you are seeking insights into a period of change within a company, you can use documentary records to compare with interviews with key informants.

Examples

"Leading beyond tragedy: the balance of personal identity and adaptability" (Leadership & Organization Development Journal, Vol. 26 No. 6) is a case study of the Norwegian company Wilhelmson's Lines and its loss of key employees in a plane crash, and uses archival research along with on-site interviews and participant observation as the tools of case study analysis.

"The human resource management practice of retail branding: an ethnography within Oxfam Trading Division" (International Journal of Retail & Distribution Management, Vol. 33 No. 7) uses an ethnographic approach and includes scanning the company intranet along with participant observation and interviews.

Quantitative or qualitative?

Documentary data can be used as part of a qualitative or quantitative research design.

Much data, whether from company archives or from published data sets, is statistical, and can therefore be used as part of a quantitative design, for example how many sales were made of a particular item, what were reasons for absenteeism, company profitability etc.

One way of using secondary data in quantitative research is to compare it with data you have collected yourself, probably by a survey. For example, you can compare your own survey data with that from a census or other published survey, which will inevitably have a much larger sample, thereby helping you generalise, and/or triangulate, your findings.
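As a rough sketch of such a comparison, the code below sets the age profile of a hypothetical survey sample against hypothetical census shares, and also computes the Pearson goodness-of-fit statistic. All figures and names are invented for illustration.

```python
# Hypothetical figures: share of the population in each age band according
# to a census, and the number of respondents in each band in our own survey.
census_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
survey_counts = {"18-34": 90, "35-54": 70, "55+": 40}


def compare_with_census(survey_counts, census_share):
    """Return {band: (survey share, census share, difference)}."""
    total = sum(survey_counts.values())
    comparison = {}
    for band, count in survey_counts.items():
        share = count / total
        comparison[band] = (share, census_share[band], share - census_share[band])
    return comparison


def chi_square_statistic(survey_counts, census_share):
    """Pearson goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    total = sum(survey_counts.values())
    return sum(
        (count - total * census_share[band]) ** 2 / (total * census_share[band])
        for band, count in survey_counts.items()
    )


for band, (sample, census, diff) in compare_with_census(survey_counts, census_share).items():
    print(f"{band}: survey {sample:.0%}, census {census:.0%}, difference {diff:+.0%}")
print(f"Chi-square statistic: {chi_square_statistic(survey_counts, census_share):.2f}")
```

In this invented case younger respondents are over-represented and older ones under-represented. The usual next step would be to compare the statistic with a chi-square distribution with one fewer degrees of freedom than there are bands, and to consider weighting the sample or qualifying how far the findings can be generalised.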

Textual data can also be used qualitatively, for example marketing literature can be used as backup information on marketing campaigns, and e-mails, letters, meeting minutes etc. can throw additional light on management decisions.

Content analysis is often quoted as a method of analysis: this involves analysing the occurrence of key concepts and ideas and either drawing statistical inferences or carrying out a qualitative assessment, looking at the main themes that emerge.
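The counting side of content analysis can be illustrated with a very small sketch. It assumes a hypothetical coding frame in which each concept is signalled by a handful of indicator words, and the "minutes" are invented extracts; real content analysis would need a carefully developed and tested coding frame.

```python
import re
from collections import Counter

# Hypothetical coding frame: each concept is signalled by a few indicator words.
CONCEPTS = {
    "cost": {"cost", "costs", "budget", "savings"},
    "quality": {"quality", "defect", "defects", "complaints"},
}


def concept_counts(text, concepts=CONCEPTS):
    """Count how many words in the text signal each concept."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for concept, indicators in concepts.items():
            if word in indicators:
                counts[concept] += 1
    return counts


# Invented extracts standing in for meeting minutes.
documents = {
    "minutes_jan": "Budget overrun discussed. Costs must fall. Quality unchanged.",
    "minutes_feb": "Customer complaints rising; defects traced to supplier. Quality review agreed.",
}

for name, text in documents.items():
    print(name, dict(concept_counts(text)))
```

Counts like these can feed a statistical comparison between documents or periods, but simple word matching ignores context (for instance negation), so it is best treated as a starting point for the qualitative reading rather than a substitute for it.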

Gaining access to, and using, archives

Archives may be found in national collections, such as the UK's Public Record Office, or as smaller collections associated with national, local or federal government organisations, academic libraries, professional or trade associations, or charities; they may also be found in companies. The latter are generally closely controlled; the former are most likely to be publicly available. This page gives a brief overview of how to gain access to archival collections, and what you can expect when you get there.

Preparation

An archival collection, even an open one, is not like a library where you can just turn up. You need to establish opening hours, and then make arrangements to visit.

It is best to write ahead explaining:

Your project
Precisely what it is you are looking for.

In order to be clear about the second point, you will need to know not only the precise scope of your research but also how this particular collection can help you. You will therefore need to spend time researching the collection (or perhaps more than one), so make sure that this is allowed for in your research plan.

You also need to understand the key difference between libraries and archives:

Archives are collections of unpublished material, housed in closed stacks, organised according to the principles of the original collector. You can only access the material in situ, and you will need to handle the collection with special care.
Libraries contain published material, in open stacks, classified according to a particular system, and you may be able to take the material out on loan.

Locating sources

Bibliographic databases are good sources for finding archival collections: you can search by subject, keyword, personal or geographical name. Whilst not containing records of each item, catalogue records of archival collections are generally lengthier than for published materials and may include a summary of materials contained in the collection.

More detailed information about the collection, usually at the level of the box or folder, is found in Finding Aids.

You can find suitable databases through your library's Subject Guides.

Gaining access to commercial collections

As indicated above, commercial archival or document collections are more tightly controlled than public ones, access to which will depend upon a clearly stated request and proof of identity.

Commercial sources, by contrast, may require more negotiation, and more convincing, because of the perceived sensitivity of their material and the fact that they exist for their customers and shareholders, and not as an archival collection. Companies understandably count the opportunity cost of time spent "helping a researcher with their enquiries", not to mention opening up possibly sensitive documents to the prying eyes of an outsider.

This can cause problems for the researcher: if the research project is based on one or a few companies and access is denied, the overall validity of the research will be prejudiced. Given the likelihood that other research methods, such as interview, survey etc., are also being used, it is best to approach access in the widest sense, and stress the benefits to the organisation, the credibility of the researcher, and assurance of confidentiality.
