An interview with David Nicholas

Interview by: Margaret Adolphus

Professor David Nicholas is the director of the School of Library, Archive and Information Studies at University College London (UCL), where he is also professor of library and information studies and director of the UCL Centre for Publishing and the research group Centre for Information Behaviour and the Evaluation of Research (CIBER).

His chief research interests are the digital consumer, the digital transition and the virtual scholar. Recently he has been involved with two major research projects that use deep log analysis (DLA) to provide detailed information-seeking portraits of user communities. He has also been editor of the Emerald journal Aslib Proceedings since 1996. Margaret Adolphus interviewed Professor Nicholas at the Online Information 2007 conference.

From your talk at Online Information 2007, I detected a certain pessimism about libraries. What do you consider to be the main threats to librarians?

E-books are a big threat, because storing, conserving and issuing books is the big, high profile thing that librarians do. Once you take away books, especially textbooks for which there is the biggest demand, what do you do with all those huge spaces largely justified on the need for books and as somewhere to read them?

E-books are likely to have a much greater impact on the virtual scholarly community than e-journals because the biggest sector of that community – students – are more likely to use books, particularly textbooks, than journal articles, which are mostly used at postgraduate level and above. Undergraduates want consolidated information, not the latest research.

In the past libraries had a captive market, but now staff don’t go to libraries because they can get information at their desktop, and if students can get e-books on tap, they are not going to queue up in the library for textbooks on short loan, reference, etc. They hate all that, they want an open information world.

Another threat is proprietary databases: people read journals in Ebsco Host or other databases – my footprints appear in a publisher’s space such as Elsevier and not in a library space. OK, so librarians still negotiate licences and control access to databases, but I think most would want to do a bit more than negotiate deals, give out ID numbers and generally police the situation.

About ten years ago, everybody would have been talking about disintermediation at this conference. You don’t hear that any more, yet it has happened. There is a real danger that things will dissolve into a culture of Google Scholar and Amazon.

What I’m basically pointing out is a change in paradigm: have libraries woken up to the fact that most information has become totally virtual? And what happens to those huge physical spaces where information was stored? Do they want them to become just a memorial to the past?

And we are in an environment where everyone is worried about costs. Vice chancellors make decisions based on outcomes, they are running a whole university and have many competing demands to consider.

How can librarians influence this situation?

By keeping close to their users, seeing what they do. I don’t think there’s one department of user studies in any university library in the UK. Why not? I bet there are departments of digital curation, institutional repositories and the like. Librarians have always been bad at user studies, it’s a familiar theme in the literature, almost as if admitting it means that you don’t have to do it. But everyone is saying, you can’t stop the train, you’ve got to watch where it’s going. Libraries are part of a consumer marketplace, they need to respond to their users. Can you imagine another industry, or even the Government, having so little information on its users? Look at supermarkets, Tesco for example, they find out whatever they can about the customer, and respond.

What about the role of librarians as teachers of information literacy?

That sounds good, and we would all like to believe that was a rich area to explore. But the message coming from all libraries is that kids think that finding information is simple, also they want simple information, served up in bite-sized chunks. So the task of libraries is to demonstrate that if people really do crack the literature, go through literacy programmes, know about the sources, what is authoritative and what isn’t, then they will get a better degree. There has to be a correlation, that’s why we are desperately in need of outcomes data, hard information which says that, if you attend this literacy programme, if you really search the library’s databases, and don’t just use Google, it will make a difference and you will end up with a higher grade. We need this data because you are not going to get funding for resources just because it sounds like a good idea.

Students too are interested in outcomes – will I get a cheaper deal? A better degree? Students ask me those sorts of questions all the time: they say, we are thinking of doing your module, how many people pass it? (Well, it depends how hard you work!) These are top students, but they are also consumers. Librarians have to fit into this consumer mindset, and not just maintain the library as a quiet place for study.

You said earlier that librarians need data. Can you tell us about your work on deep log analysis (DLA) and why it is better than other methods of data collection?

Together with colleagues from the Universities of Tennessee and North Carolina, CIBER took part in a three-year project, "Maximizing library investments in digital collections through better data gathering and analysis (MaxData)", which looked at three methods of data collection: library data analysis (obtained from vendors’ reports and standard library systems such as link resolvers), DLA and surveys.

DLA takes the raw transaction logs from library servers or publishers and analyses them using SPSS, thus processing a huge volume of usage and search data, which can be related to user demographics to provide an extraordinarily rich portrait of the user and the way he or she behaves. Log reports provided by vendors and library systems such as link resolvers are summaries, rather than actual reports of log data, and can only show fairly standard analyses of page views and full text downloads, for example.

For the MaxData project, we led on the DLA, and analysed the usage logs from the OhioLink consortium (of libraries in the state of Ohio), gathering information on the number of pages viewed and when, number of sessions conducted per day, number of pages (site penetration) and journals viewed per session, time spent viewing online, type of page viewed (e.g. article, abstract, TOCs, etc.), top journals viewed, publication year of pages viewed, navigational approach used, subject of journals viewed, and type of user (i.e. staff, student, etc.).
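
[Editorial illustration: to give a feel for how measures such as pages per session and time spent online might be derived from a raw transaction log, here is a minimal sketch in Python. It is not CIBER's method (the interview says the analysis was done in SPSS). The record layout, the sample data and the 30-minute inactivity cut-off used to split sessions are all assumptions made for the example.]

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log records: (user id, ISO timestamp, page type, journal).
LOG = [
    ("u1", "2007-03-01T09:00:00", "toc", "Journal A"),
    ("u1", "2007-03-01T09:02:00", "abstract", "Journal A"),
    ("u1", "2007-03-01T09:05:00", "article", "Journal B"),
    ("u1", "2007-03-01T14:00:00", "article", "Journal A"),
    ("u2", "2007-03-01T10:00:00", "abstract", "Journal C"),
]

# Assumed inactivity cut-off: a longer pause starts a new session.
SESSION_GAP = timedelta(minutes=30)


def build_sessions(log, gap=SESSION_GAP):
    """Group each user's page views into sessions split by inactivity."""
    by_user = defaultdict(list)
    for user, stamp, page_type, journal in log:
        by_user[user].append((datetime.fromisoformat(stamp), page_type, journal))

    sessions = []
    for user, views in by_user.items():
        views.sort(key=lambda view: view[0])  # chronological order
        current = [views[0]]
        for view in views[1:]:
            if view[0] - current[-1][0] > gap:
                sessions.append((user, current))
                current = [view]
            else:
                current.append(view)
        sessions.append((user, current))
    return sessions


def summarise(session):
    """Return per-session measures similar to those listed in the interview."""
    user, views = session
    return {
        "user": user,
        "pages": len(views),  # site penetration
        "journals": len({journal for _, _, journal in views}),
        "seconds": int((views[-1][0] - views[0][0]).total_seconds()),
        "page_types": sorted({page_type for _, page_type, _ in views}),
    }


SUMMARIES = [summarise(session) for session in build_sessions(LOG)]
for row in SUMMARIES:
    print(row)
```

[In this sketch, time spent is measured from the first to the last page view in a session, so a single-page session counts as zero seconds. Real log studies have to decide how to estimate the time spent on the final page.]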

What we are trying to do – and in this it’s similar to the evidence-driven tests that they do in medicine – is collect information on people’s behaviour. This leads to greater understanding and helps us ask the right questions. For example, why are departments not using this resource, is it because the library is not convenient? Why are other departments whacking the resources, is it because students have had digital literacy training?

Another question it helps to answer is, should we be spending money on resources such as the Emerald database? Vice provosts have this dilemma, should they spend money on the car park, or on providing more access at the library? They know what the car park delivers in raw terms, so the question becomes, what would the Emerald database deliver to my students that they wouldn’t otherwise get? How can we measure that those resources are better?

Has anybody ever done a study which says if you had no electronic resources, you would actually be ten times worse off? There’s a temptation to get into a sort of wishful thinking kind of scenario which says, it’s good for you. But now in education, they measure us. I can’t just say, hey, we’re UCL, we must be good, because they are looking for measurable outputs and inputs.

A study that we might be doing in future for the Research Information Network (RIN) will involve looking at the usage of a number of, say, departments of economics in universities throughout the UK who all use the same platform. We will then profile people’s information seeking, whether they are active or not, how many people are involved, do they look deeply, do they look through lots of journals, is their behaviour what you would expect? We could end up with a set of very different behaviours in a number of different departments. We might find one that conformed exactly to publishers’ and librarians’ expectations. We would then look at that department in more detail, and try and tie down a model of the best information seeking behaviour that can then be replicated in other departments. That way we can help people learn and not just leave things to the market.

What sort of information behaviour have you observed from your use of DLA?

We are trying to sort people into behavioural groups. For example, a bouncer is someone who lands on a page, and then bounces straight out again, doesn’t come back, and spends virtually no time on that page. The largest group of users in the population are bouncers. They are bouncing in to a site just to go somewhere else; they are promiscuous users, who keep wandering round the site, because they can’t make up their mind, or because Google has served them up with rubbish after they only put in one search term.

The opposite are persistent users, these are the people that go back to a resource, whack it, spend a lot of time on it, and do a lot of downloads. Persistent users are what most people would call core users. We can find out exactly who these core users are, whether they are students, or staff, whether they are physicists, social scientists etc., whether they are younger people or older people, women or men.
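
[Editorial illustration: the two behavioural groups described above could be separated with a simple rule-based classifier along the following lines. This is a sketch only. The per-user totals, the field names and every threshold are hypothetical, and the interview does not say what cut-offs CIBER uses.]

```python
# Hypothetical per-user totals drawn from a usage log.
USERS = {
    "u1": {"visits": 1, "pages": 1, "seconds": 4, "downloads": 0},
    "u2": {"visits": 12, "pages": 90, "seconds": 5400, "downloads": 25},
    "u3": {"visits": 2, "pages": 6, "seconds": 300, "downloads": 1},
}


def classify(stats, max_bounce_seconds=10, min_visits=5, min_downloads=10):
    """Label a user as 'bouncer', 'persistent' or 'other'.

    All thresholds are illustrative assumptions, not published values.
    """
    # Bouncer: one visit, one page, virtually no time spent, never returns.
    if (
        stats["visits"] == 1
        and stats["pages"] == 1
        and stats["seconds"] <= max_bounce_seconds
    ):
        return "bouncer"
    # Persistent: keeps coming back and downloads heavily.
    if stats["visits"] >= min_visits and stats["downloads"] >= min_downloads:
        return "persistent"
    return "other"


GROUPS = {user: classify(stats) for user, stats in USERS.items()}
print(GROUPS)
```

[Once users are labelled, the groups can be cross-tabulated against whatever demographic fields are available, such as staff or student status and subject.]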

This is the sort of research that could make librarians very useful, empower them. They can no longer claim to access the widest range of information – Google swallows them up on this count. OK, literacy training is fine, but it has to be proven to work!

How can a librarian start to use DLA?

Librarians already have access to usage data by means of COUNTER [a set of standards and protocols governing the recording and exchange of online data]. This data is very thin, it doesn’t tell you who uses the resources, but just that there’s some activity, so it’s a bit like saying we can feel a pulse, but we don’t know whose it is.

Hopefully DLA will receive some publicity through the SuperBook or the JISC national e-books project – seeing what we do with the data, other librarians will think, we need that data, can we do it ourselves? It’s not technically difficult; it’s just having that concentrated ability to look at usage data and to understand it.

Moving from an information providing model to an evaluation one allows the librarian to gain more powerful territory than simply trying to (poorly) replicate what Google Scholar does. But, to do this you have to have collaborative relations with publishers which extend beyond COUNTER compliance, and to remember that we’re all working together in this same crowded space, along with Google. The adversarial model which exists between publishers and librarians has to go, what we are trying to do is to break down barriers, as those barriers no longer exist.

You mentioned the SuperBook project earlier. Can you tell us about this?

SuperBook is a research project, funded by Emerald and Wiley, that looked at usage of e-books (Oxford Scholarship Online, Wiley and Taylor and Francis) in the UCL information environment, using DLA to observe user behaviour. Over a period of three months we measured number of pages viewed, number of sessions, view time and printing of pages, and looked at differences in patterns of use between different types of users and e-books, for example whether they accessed on or off campus, how they arrived at the site, where their computers were located, and the subject of the e-book. The main findings were that usage was high, with nearly 11,000 pages viewed and most users viewing a relatively high number of pages (higher than for e-journals). Over two-thirds of usage took place on site, and usage was concentrated on a few titles (over 12 per cent on two books and 43 per cent on 20 books). Subject usage varied, possibly according to module timetables. Older e-books (three to six years from date of publication) were as popular as current ones, and catalogued titles were most viewed.
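
[Editorial illustration: a concentration figure of the kind quoted above (the share of all page views that falls on the most-used titles) can be computed from a list of page views as sketched below. The titles and counts are invented for the example and are not SuperBook data.]

```python
from collections import Counter


def top_share(page_views, n):
    """Return the fraction of all page views that fall on the n most-viewed titles."""
    counts = Counter(page_views)
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Invented sample: one entry per page view, naming the title viewed.
VIEWS = ["Title A"] * 5 + ["Title B"] * 3 + ["Title C", "Title D"]

print(f"Top 2 titles: {top_share(VIEWS, 2):.0%} of page views")
```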

What were the main conclusions of the SuperBook project?

The main conclusion is that we should do the study on a bigger scale! E-books will go like wildfire. I know, as an academic, that the first issue at the staff-student meeting is that there are not enough textbooks. That alienates the students immediately, they think, what kind of university are you running, you call yourself global, you are sixth in the world… We might say to them, hang on guys, you all want the same thing at the same time, but consumers don’t see it that way. They are e-shoppers and want to read the book at the time they want.

There’s an interesting theory at the moment that needs to be tested: the only serious reading anybody does in hard copy any more is novels on the beach on holiday or at Christmas. But at the moment, we are only looking at academic books and there’s no doubt it’ll go wild. At the moment, there’s a bit of a hold off because publishers are worried that if they publish e-books, people won’t buy the hard copies. But that will get sorted when people see what can be done with those books, the sheer explosion of learning and knowledge discovery through links to other books, websites and journal articles. So potentially, the impact of e-books will be even greater than that of e-journals. If I were a librarian, I would be getting close to e-books, that’s where the learning agenda is.

There’s also a lot of evidence of power browsing in the scholarly world: people collect, pick and mix. They see the Internet as a shopping mall, where they can move around, that’s why they are called navigators. All the data is pointing to people searching horizontally, not spending a lot of time, just picking up chunks, not reading deeply. That’s a total change in behaviour because in the past you didn’t have the choice, you could only go one way. Now, all the evidence suggests students don’t like going deep down.

I’ve been trying to puzzle this out, because it looks like dumbing down. There’s anecdotal evidence that students are now less information literate than they were. I’m reminded of watching my 16-year-old daughter watching the TV, and channel-hopping with the remote. I said to her, can’t you make up your mind what you are watching? She said, dad, I’m watching it all. She argues that she’s really good at picking what she wants, and when her attention span goes, she will move on to something else, so she’s always watching something that’s relevant to her. People behave in a similar way in the online environment, they want to see lots of different things, but they avoid reading.

And then there’s the whole question of plagiarism, people grabbing something and using it. Some people are bailing out completely, and saying, if they can find it, they must be good. I’m thinking, do I represent a different set of values, are the values I believe in defendable any more? I think it comes down to, OK, people can pick and mix, flick and bounce, but you are losing something, that deeper use.

Maybe people flick through until they find what they want, and deep read then?

Maybe this is what’s happening, but the jury’s out as to whether or not it’s happening with the student population. It’s the sort of question that we shall be looking at over the next couple of years with both e-journals and e-books. Librarians are worried that they could be out of a job, but some publishers have succeeded in the new virtual environment. An example of such success, and one which I would use as a model, is Oxford Scholarship Online: OUP has created a cross searchable full-text database from over 18,000 of its scholarly books, organized according to subject.

In the 1990s the issue was disintermediation, but now it’s e-books and we are close to the tipping point. Just to show you how serious it is, I went to the Frankfurt Book Fair in October of 2007 and talked to 200 chief executive officers of STM publishers about SuperBook. There was not one seat empty and not one person fell asleep. I thought, I don’t have to tell them it’s the tipping point, they know it is!

So things are happening but I don’t know what exactly. The only way to survive is to stay close to the user!

Editorial notes

Oxford Scholarship Online can be seen at http://www.oxfordscholarship.com/oso/public/index.html.

More information about the SuperBook project can be found at http://www.publishing.ucl.ac.uk/superbook.html.

Visit http://www.jiscebooksproject.org/ to learn more about the JISC national e-books observatory project.