Why collect usage statistics?
Usage is about how, and particularly how much, the resources and services a library has made available to its customers are being utilized. Whereas user studies look at human information behaviour, usage is more concerned with the resources and services themselves, and with obtaining quantifiable, statistical information.
Usage is much more easily measured in an electronic environment: users leave digital footprints which make it comparatively easy to see who has used what, and when. Vendors and publishers can provide information on use down to the level of page viewed, whereas in a print environment it is much more difficult to measure usage systematically and therefore obtain reliable performance data. Collecting usage statistics, or e-metrics as they are often known, has therefore become an important task for librarians, helping them to ensure that they are providing the right resources and getting good value out of their budget.
Because so much user data are available from so many different sources, information overload is a real danger. It therefore helps to know the purpose for which you are collecting statistics. Is it to assess the value of a particular deal? To see the times of the highest usage? To observe the impact of marketing a particular resource? To justify expenditure?
Conyers (2006a and 2006b) presents a number of reasons for libraries to collect usage statistics:
- Because of the availability of reliable sources. Much work has been done in recent years by publishers and vendors to produce reliable, consistent and credible statistics. The organization COUNTER (Counting Online Usage of NeTworked Electronic Resources) has produced codes of practice for journals, databases, books and reference works with which more and more publishers are complying on an international basis.
- In order to assess the value of resources. When considering renewing an item, you can find out the level of use, and the cost per request. Particularly in an era of bundled deals, electronic resources do not come cheap, and it is important to know that they are being used. Do people really use a particular database in sufficient numbers and across the range of titles? If there are not that many users would the money be better spent on something else? If only selected titles are being used, and there is a long tail of unused titles, would it be better to negotiate a deal around those titles? Knowledge of how resources are used can help not only justify expenditure, but also inform purchasing decisions, and find better purchasing models.
- For the purposes of library promotion and publicity. If a particular resource has been heavily used, or if there has been a large percentage increase in, say, use of the library website or OPAC, letting people know is good public relations.
- To illustrate and examine trends. For example, are there fewer visits to the library, but more use of the website? Are book issues going down, but full-text article downloads on the increase? In their study of Newcastle University Library, Taylor-Roe and Spencer (2005) found that usage statistics showed declines in interlibrary loans and photocopying of over 60 per cent, but a significant increase in full-text downloads.
- To help plan infrastructure. If customers are still coming to the library, although in smaller numbers, we need to know why in order to provide appropriate resources: is it the computers, or the study space, which may elsewhere be at a premium?
- To determine the level of help needed by users. It might be that a particular resource needs more publicity – do marketing students know about the acquisition of an e-textbook? What are the most popular ways of accessing resources – through the catalogue, the library web pages, etc.?
- Because it is a requirement. There is an "e-measures" question as part of the statistics which the Society of College, University and National Libraries (SCONUL) requires libraries to produce. In fact, the reason for collecting statistics is not purely to justify your own library’s performance, but to add to the wider picture of what libraries are doing both nationally and internationally. How do the metrics you produce for your library compare with those produced by other libraries? What are the emerging trends?
Publishers and other vendors also benefit from usage statistics in that the knowledge gained will help them adjust their marketing mix. For example, they can experiment with new pricing models, assess which distribution channels are the most productive, make informed product development decisions and generally benefit from improved market analysis (Shepherd, 2006).
What statistics should be collected?
The e-measures project, funded by the Higher Education Funding Council for England, part of whose remit was to develop statistical and performance indicators for electronic resources, defined six types of electronic service:
- electronic journals
- databases
- e-books
- digital documents
- visits to the library website
- electronic enquiries.
Measures have been developed by COUNTER for the first three and the statistical reports required are outlined below. Guidance is given by SCONUL: see http://www.sconul.ac.uk/topics_issues/performance_improvement/ and particularly the related publication, SCONULguidance.doc.
COUNTER has produced a number of codes of practice on what usage information should be given, the latest being issued in March 2008 (in draft form). For journals, the reports required are:
- Journal Report 1: Number of Successful Full-Text Article Requests by Month and Journal. (This includes whether the article was downloaded in PDF or HTML.)
- Journal Report 1a: Number of Successful Full-Text Article Requests from an Archive by Month and Journal
- Journal Report 2: Turnaways (unsuccessful logins to an electronic service due to exceeding the simultaneous user limit allowed by the licence) by Month and Journal
JR1, the number of successful full-text article requests, has come to be seen as a standard measure, and JR1a, full-text articles by archive, is a 2008 innovation.
E-metrics for journals have been around for a few years and are quite sophisticated and reliable. The problem lies in the amount of time these statistics take to compile, and difficulty in getting a complete picture, despite the greater degree of COUNTER compliance on the part of publishers (see part 3, "How to collect statistics, and where from"). Most libraries also use, in addition to databases from publishers, hosting services such as Ingenta, or gateways such as SwetsWise, which also produce e-metrics: some publishers include these statistics, others do not. Publishers themselves also frequently swap their titles around, which adds to the confusion.
The latest draft release of the COUNTER standards for databases was issued alongside that for journals, in March 2008. The reports required are:
- Database Report 1: Total Searches and Sessions by Month and Database.
- Database Report 2: Turnaways by Month and Database.
- Database Report 3: Total Searches and Sessions by Month and Service.
Searches which go through federated search engines (which search multiple sources simultaneously) and other automated search engines must now be reported separately.
If the purchase has been through a consortium, then both aggregated and individual reports are required, for both journals and databases.
As there is less COUNTER compliance for databases than for journals, database statistics are generally inconsistent and therefore less reliable.
COUNTER published its code of practice for online books and reference works in 2006, and so far this has not been updated. One problem in collecting e-metrics for online books is the lack of a standard basic unit: although most books are divided up into sections, chapters etc., the division is not as uniform as the article is for the journal. For example, some publishers digitize the whole book, whereas others provide separate files of individual chapters. There are five separate usage reports (Shepherd, 2006):
- Book Report 1: number of successful title requests by month and title.
- Book Report 2: number of successful section requests by month and title.
- Book Report 3: number of turnaways by month and title.
- Book Report 4: total searches and sessions by month and title.
- Book Report 5: total searches and sessions by month and service.
As with databases, data are often incomplete, and an additional problem is lack of standard terminology, although COUNTER tries to use standard terms, such as chapter, section, and entry (for a reference work).
Libraries may also wish to obtain statistics on other services they provide, for example workstations with access to the Internet.
An example is the study of the use of workstations to access the Internet in Conwy Public Libraries, Wales, where the workstations were provided as part of a UK government initiative. The method used was a self-completion questionnaire, and efforts were made to ensure participation across different age groups and a range of social classes. A significant increase in usage was reported from April 2003 to April 2004; most of the users, however, were already members of the library. The conclusion was that the library had not been totally successful in finding new users (Roberts and Evans, 2006).
How to collect statistics, and where from
The most usual source of statistics for journals, e-books and databases is the publisher or the vendor. Statistics are provided on a monthly basis and obtained through password-controlled access to a website. As several different sites will inevitably need to be accessed, the task of obtaining statistics involves a lot of work.
Sometimes, libraries may provide access to e-journals via a gateway service such as SwetsWise, which links with the relevant vendor site or service, or an aggregator such as ProQuest or LexisNexis, which hosts content from multiple publishers. Such services will normally provide data on usage, giving rise to the problem mentioned above of the inconsistency with which publishers report these usage figures. The authentication service Athens also provides statistics, including ones about the user, for example whether he or she is on or off campus.
For visits to the library website, access to in-house digital documents, and electronic enquiries, web logging software is available. Library management systems will also provide usage information.
However, by far the biggest source of statistics will be publishers and vendors. The reliability of their figures has been greatly enhanced by the work of COUNTER, whose international standards for journals, databases and books are described in part 2, "What statistics should be collected?".
COUNTER’s work built on a number of initiatives. The International Coalition of Library Consortia was set up in 1996 by a group of library consortia in North America, Australia, Asia and Africa to develop guidelines for statistical measures. The Association of Research Libraries e-metrics project had similar aims for the USA, as did the e-measures project.
As more publishers have become COUNTER compliant, usage statistics have become more reliable: as of May 2008, 94 publishers, from both sides of the Atlantic, have signed up to JR1. Far fewer, however, have signed up to the book or database standards.
The Codes of Practice also specify how the data should be cleaned up (for example, only intended use must be recorded), and how often to report and in what form.
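As an illustration of what such cleaning might involve, the sketch below collapses repeated requests for the same article by the same user within a short time window, so that an accidental double click is counted once. The function name, the record layout and the 10-second window are all assumptions made for this example; the Code of Practice itself should be consulted for the actual rules.

```python
from datetime import datetime, timedelta


def filter_double_clicks(requests, window_seconds=10):
    """Keep one request per burst of repeated clicks.

    `requests` is an iterable of (user, article_id, timestamp) tuples.
    A request is dropped when the same user asked for the same article
    no more than `window_seconds` earlier (an illustrative threshold).
    """
    window = timedelta(seconds=window_seconds)
    last_seen = {}  # (user, article_id) -> time of the most recent request
    kept = []
    for user, article, when in sorted(requests, key=lambda r: r[2]):
        key = (user, article)
        previous = last_seen.get(key)
        if previous is None or when - previous > window:
            kept.append((user, article, when))
        last_seen[key] = when
    return kept


# Hypothetical raw requests: the second click by u1 follows 3 seconds after the first.
t0 = datetime(2008, 3, 10, 9, 0, 0)
raw = [
    ("u1", "article-1", t0),
    ("u1", "article-1", t0 + timedelta(seconds=3)),
    ("u2", "article-1", t0 + timedelta(seconds=4)),
    ("u1", "article-1", t0 + timedelta(seconds=60)),
]
cleaned = filter_double_clicks(raw)
print(len(raw), "raw requests,", len(cleaned), "after cleaning")
```

In practice this filtering is done by the publisher or vendor before the report is issued, so a library would rarely need to run it; the sketch simply shows why cleaned figures are lower than raw server counts.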
SCONUL provides a list of databases, serials and e-books, as well as a guide to the sources of statistics: http://www.sconul.ac.uk/topics_issues/performance_improvement/.
Obtaining existing statistics is not the only possible data collection method: some studies have used deep log analysis (records of when, and what, was viewed from the log data generated by the server), or surveys. Consider the following examples:
- The LAIRAH project looked at use of web-based resources by humanities researchers, using both deep log analysis and a questionnaire, the object of the latter being to compare what people thought they were doing with what the log reported that they did (Warwick et al., 2008).
- A web-based survey was carried out at the University of Denver's Penrose Library to assess knowledge about and usage of the library's extensive collection of e-books, and generated 2,067 responses (Levine-Clark, 2007).
- A survey of over 9,000 library users was carried out in four disparate academic health science libraries in the USA between 1999 and 2002, to establish among other things usage of electronic resources (Franklin and Plum, 2002).
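The first step of a log-based study can be sketched as follows. This is a minimal example, assuming the server writes its log in the widely used Common Log Format; the sample lines, paths and function names are invented for illustration, and real deep log analysis goes much further (sessions, user groups, navigation paths).

```python
import re
from collections import Counter
from datetime import datetime

# One Common Log Format line: host, identity, user, [time], "request", status, size.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (timestamp, path, status) for a log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return when, match["path"], int(match["status"])


def views_by_resource(lines):
    """Count successful (status 200) requests for each path."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip malformed lines
        _, path, status = parsed
        if status == 200:
            counts[path] += 1
    return counts


# Hypothetical log extract.
sample_log = [
    '192.0.2.1 - - [10/Mar/2008:09:15:02 +0000] "GET /ebooks/123 HTTP/1.1" 200 5120',
    '192.0.2.2 - - [10/Mar/2008:09:16:40 +0000] "GET /ebooks/123 HTTP/1.1" 200 5120',
    '192.0.2.1 - - [10/Mar/2008:09:17:05 +0000] "GET /ebooks/999 HTTP/1.1" 404 312',
    'not a log line',
]
views = views_by_resource(sample_log)
for path, count in views.most_common():
    print(path, count)
```

The timestamp returned by `parse_line` is not used in the count above, but it is what would allow usage to be broken down by hour, day or term.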
Analysis of usage data
Statistics on their own can illustrate certain trends, but in order to gain a fuller picture, variables need to be analysed in greater depth, and compared against information from other sources. The NESLi2 study, funded by the UK Joint Information Systems Committee (JISC), is a good example of how such analysis can be carried out. This project took usage data from four publishers participating in the NESLi2 licence initiative, for 17 UK libraries in the years 2003 and 2004. The libraries were a representative selection of small, medium and large, from both pre- and post-1992 universities.
The basic unit of measure was the COUNTER JR1 report (the number of successful full-text article requests). In addition, further information was obtained about the cost of the deal, the list of subscribed titles, the number of full-time equivalent users, and the total library serials budget, and used to analyse the following variables (Conyers and Dalton, 2005; Conyers, 2006a and 2006b):
Usage range: title requests were sorted into the following groups, which enabled identification of high and low use titles:
- nil and low range (under 10 requests)
- medium range (10-99 requests)
- high range (100 or more requests).
Price band: grouped as below, which enabled the relationship between usage and price band to be identified:
- low price (under £200)
- medium price (£200-£399)
- high price (£400-£999)
- very high price (£1,000 and over).
Subject category: these were divided into STM (science, technology and medicine), and HSS (humanities and social sciences).
Subscribed or unsubscribed: by identifying to which category titles belong, it is possible to find which are most used.
By manipulating these variables it was possible to assess value for money and in particular to look at the average cost per request (Conyers, 2006a, 2006b).
Unsurprisingly, the NESLi2 study found that STM titles were the ones with the highest use (Conyers and Dalton, 2005). Incidentally, other studies show that humanities researchers tend to rely more on books, and on physical information retrieval methods – visiting the library in person, browsing the shelves and the catalogue (Levine-Clark, 2007, and Warwick et al., 2008).
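The groupings and the cost calculation described above can be sketched in a few lines of Python. This is an illustration only, not the method used in the study: the titles, request counts, prices and deal cost are invented, and treating exactly 100 requests as high range and exactly £1,000 as very high price is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Title:
    name: str
    requests: int     # annual JR1 full-text requests (hypothetical figure)
    price_gbp: float  # list price in pounds (hypothetical figure)


def usage_range(requests: int) -> str:
    """Assign a title to a usage range (100 itself is assumed to count as high)."""
    if requests < 10:
        return "nil and low"
    if requests < 100:
        return "medium"
    return "high"


def price_band(price_gbp: float) -> str:
    """Assign a title to a price band (£1,000 itself is assumed to count as very high)."""
    if price_gbp < 200:
        return "low"
    if price_gbp < 400:
        return "medium"
    if price_gbp < 1000:
        return "high"
    return "very high"


def cost_per_request(deal_cost_gbp: float, total_requests: int) -> Optional[float]:
    """Average cost per request for a deal; None when there were no requests."""
    if total_requests == 0:
        return None
    return deal_cost_gbp / total_requests


# Invented example data for a three-title deal costing £2,500.
titles = [
    Title("Journal A", 0, 150.0),
    Title("Journal B", 45, 350.0),
    Title("Journal C", 455, 1200.0),
]
total_requests = sum(t.requests for t in titles)
for t in titles:
    print(t.name, "|", usage_range(t.requests), "|", price_band(t.price_gbp))
print("Cost per request:", cost_per_request(2500.0, total_requests))  # 5.0
```

Cross-tabulating the two groupings (for instance, counting how many high-priced titles fall in the nil and low usage range) is what allows the relationship between price and use to be examined.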
Other findings from the NESLi2 study (many of which were mirrored in the study conducted at Newcastle University Library, see Taylor-Roe and Spencer, 2005) were:
- the number of full-text requests was higher in the larger, older universities;
- the cost per request was similar overall, and low in relation to interlibrary loans;
- a small percentage of titles generated the highest usage, and these were generally in the highest priced band, while the less used titles were normally in the low-priced or unpriced band;
- subscribed titles attracted a greater degree of use than unsubscribed titles;
- usage was affected by the time of the academic year, by promotional activities undertaken by the library, by use of MetaLib and SFX, and by publishing factors such as delay in acquiring a title;
- although there was a long tail of little used journals, many of these were titles that appeared in publishers' lists but had in fact either been deleted or would only be available in the future.
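The concentration of use in a small number of titles, and the long tail behind them, can be checked with a simple calculation. The sketch below is an illustration with invented figures; the 80 per cent threshold and the function names are assumptions, not taken from the study.

```python
def titles_for_share(requests_by_title, share=0.8):
    """Return the most-used titles that together reach `share` of all requests."""
    total = sum(requests_by_title.values())
    if total == 0:
        return []
    ranked = sorted(requests_by_title.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for title, count in ranked:
        if running >= share * total:
            break
        selected.append(title)
        running += count
    return selected


def unused_titles(requests_by_title):
    """Return the titles with no recorded requests, in alphabetical order."""
    return sorted(t for t, count in requests_by_title.items() if count == 0)


# Hypothetical annual JR1 totals for six titles in a deal.
usage = {"A": 700, "B": 200, "C": 60, "D": 40, "E": 0, "F": 0}
core = titles_for_share(usage)
tail = unused_titles(usage)
print(f"{len(core)} of {len(usage)} titles account for 80 per cent of requests: {core}")
print("Titles with no use:", tail)
```

Titles that appear in the unused list are worth checking against the publisher's current holdings before drawing conclusions, since, as noted above, some may have been deleted or not yet released.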
Perhaps the greatest value of detailed and intelligently manipulated usage statistics is that they give a good picture across title and subject area, and can therefore be used to analyse critically the whole purchasing deal.
Newcastle University Library came to the conclusion that they needed an onion-shaped arrangement comprising a small core of big deal packages, the next layer being desirable subject clusters, then individual subscriptions, and finally, pay as you go. They also reported concern about the large tail of little used journals, speculating that a subject approach might suit them better. E-metrics also enabled them to analyse the success of piloted degrees, as they were able to record the levels of use of resources.
Other applications of usage data
The potential value of usage data lies beyond helping assess library performance and value for money. There have even been suggestions that they replace citations as a measure of journal esteem. The measure would be the usage factor, calculated as total usage from JR1 data divided by total number of articles published online.
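The calculation itself is simple, as the following sketch shows. The figures are invented, and the function name is an assumption made for this example; how the period of usage and the set of articles would be defined in practice is left open here, as it is in the description above.

```python
def usage_factor(total_jr1_requests: int, articles_published_online: int) -> float:
    """Total usage (from JR1 data) divided by the number of articles published online."""
    if articles_published_online <= 0:
        raise ValueError("the number of articles published online must be positive")
    return total_jr1_requests / articles_published_online


# Hypothetical journal: 12,000 full-text requests for 300 articles published online.
example_factor = usage_factor(12000, 300)
print(example_factor)  # 40.0
```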
Response to a UKSG-sponsored study (Shepherd, 2007) was fairly positive to this suggestion, provided that usage statistics were reliable (interestingly, the main argument in favour of citations is that the method is well established). Reedijk and Leiden (2008) also mention usage factors as a possible alternative, but one that needs more research.
Data about usage can also be used to enrich metadata, or descriptions of content which librarians use to facilitate search. Ferran et al. (2007) describe an experiment in which they examined raw server log data generated from students’ interactions with learning objects, using studies of information and navigational behaviour to generate data to describe content.
Conclusion and references
Usage data, when intelligently used and analysed, can provide highly useful information which can help libraries plan resources and services, assess value for money, and promote their services. However, usage data come at a cost: preparing usage reports from various publishers’ websites is very resource intensive, and requires quite a high level of IT skills.
One recommendation of the NESLi2 study was that "JISC should consider setting up a portal site for NESLi2 publishers to deposit their national NESLi2 COUNTER compliant usage statistics" (Conyers and Dalton, 2005). While this has not happened, JISC has been fully supportive of COUNTER for NESLi2 deals, which has helped consistency.
There are a number of organizations and initiatives providing support to libraries, notably:
- ScholarlyStats, which offers a single point of access to vendor usage statistics, see https://www.scholarlystats.com/sstats/default.htm.
- The SUSHI protocol (Standardized Usage Statistics Harvesting Initiative), designed to help automate the transfer of usage data from one system to another, see http://www.niso.org/workrooms/sushi.
- Evidence Base, at Birmingham City University, which was responsible for the NESLi2 study and runs Measuring up: Analysing publisher deals 2, which helps libraries analyse usage statistics from some of the big deals.
So, there is help available – which is a good thing as usage statistics are likely to be a definite part of the library landscape for the foreseeable future.
Conyers, A. (2006a), "Building on sand? Using statistical measures to assess the impact of electronic services", Performance Measurement and Metrics, Vol. 7 No. 1, pp. 37-44.
Conyers, A. (2006b), "Usage statistics and online behaviour", The E-Resources Management Handbook, available from http://www.uksg.org/serials#handbook [accessed May 8 2008].
Conyers, A. and Dalton, P. (2005), "NESLI2 analysis of usage statistics", Evidence Base, available from http://www.ebase.bcu.ac.uk/projects/NESLi2.htm [accessed May 13 2008].
Ferran, N., Casadesús, J., Krakowska, M. and Minguillón, J. (2007), "Enriching e-learning metadata through digital library usage analysis", The Electronic Library, Vol. 25 No. 2, pp. 148-165.
Franklin, B. and Plum, T. (2002), "Networked electronic services usage patterns at four academic health sciences libraries", Performance Measurement and Metrics, Vol. 3 No. 3, pp. 123-133.
Levine-Clark, M. (2007), "Electronic books and the humanities: a survey at the University of Denver", Collection Building, Vol. 26 No. 1, pp. 7-14.
Reedijk, J. and Leiden, J. (2008), "Is the impact of journal impact factors decreasing?", Journal of Documentation, Vol. 64 No. 2, pp. 183-192.
Roberts, R. and Evans, G. (2006), "Users’ experiences of the People’s Network workstations in Conwy Public Libraries", Aslib Proceedings, Vol. 58 No. 6, pp. 537-552.
Shepherd, P. (2006), "COUNTER: Usage statistics for performance measurement", Performance Measurement and Metrics, Vol. 7 No. 3, pp. 142-152.
Shepherd, P. (2007), "The feasibility of developing and implementing journal usage factors: a research project sponsored by UKSG", Serials, Vol. 20 No. 2, pp. 117-123.
Taylor-Roe, J. and Spencer, C. (2005), "A librarian’s view of usage metrics: through a glass darkly?", Serials, Vol. 18 No. 2, pp. 124-131.
Warwick, C., Terras, M., Galina, I., Huntington, P. and Pappa, N. (2008), "Library and information resources and users of digital resources in the humanities", Program: electronic library and information systems, Vol. 42 No. 1, pp. 5-27.