By Margaret Adolphus
What are institutional repositories?
When asked about the future of libraries, David Nicholas, professor of library and information science at University College London, and editor of the journal Aslib Proceedings, suggested that librarians are moving into publishing to offset what he saw as their shrinking conventional role (Nicholas, 2007). Institutional repositories (IRs) can be seen as an aspect of that trend.
An IR is a digital collection of the research output of a scholarly institution, most commonly a university: an ongoing archive of its intellectual capital.
Here are some definitions of IRs:
" ... a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution" (Lynch, 2003, quoted in Barwick, 2007).
" ... a digital archive of intellectual product created by the faculty, research staff, and students of an institution and accessible to end users both within and outside of the institution, with few if any barriers to access" (Rajshekar, 2005, quoted in Doctor, 2008).
IRs enhance the visibility of both the university and the individual scholar, who thereby stands to increase the number of times he or she is cited, an all-important measure of academic performance. The whole culture is one of openness, of making knowledge as freely and widely available as possible.
Concern that scholarly communication is dysfunctional, restricting access to the institutions that can afford to pay subscriptions to publishers' databases, has led to the open access movement, self-archiving and the growth of open access journals. IRs are a logical development, and one that provides greater structure and searchability than simply uploading documents to departmental web pages.
IRs contain a wide variety of document types, depending on the policy of the institution. Most common are the outputs of research: journal articles, pre-prints, conference papers, technical reports, working papers, theses, book chapters, computer programs, presentations, technical manuals, etc. Grey literature is as important as published outputs.
Some also contain other items, such as convocation addresses and student handbooks, as well as teaching materials. In fact, Tjiek (2007) quotes sources which suggest that a repository should be integrated with the university's course management system and display e-learning features. In practice, however, most institutions will probably be content to provide a basic repository which concentrates on research.
While most repositories are institution-specific, some are subject-based: for example, Saarland University's SciDok for science and PsyDok for psychology (see Herb and Muller, 2008). Others include, or specifically concentrate on, local resources. Desa Informasi (Information Village) is an IR created by Petra Christian University in Surabaya, Indonesia, part of whose remit is to collect information about the locality (Tjiek, 2007).
Textual material, usually in PDF format, predominates, but IRs also contain digital images (for example, Desa Informasi contains photographs of old Surabaya) as well as audio and video.
Where are institutional repositories?
IRs are one of the biggest and fastest growing developments in scholarly communications, and a worldwide movement. The OpenDOAR project seeks to provide a directory of repositories, and its database has grown substantially, from approximately 1,000 repositories at the end of 2007 to over 1,400 in July 2009.
Figure 1 below shows the distribution of IRs by continent. Almost half are in Europe; North America accounts for just over a quarter; and Asia is the largest of the other regions. In Europe, the country with the largest number of repositories is the UK, followed by Germany; in Asia, Japan has the most, followed by India; China has only seven, most of which are in Hong Kong (figures taken from the OpenDOAR website, July 2009).
Figure 1. IRs by continent
According to Chen and Hsiang (2009), quoting research from 2005, over 90 per cent of universities in the US have either taken steps or are contemplating steps to construct a repository system, while almost every university in Germany, the Netherlands and Norway has some sort of IR.
Creating an institutional repository
Shirley Yearwood-Jackman, who is IR librarian at the University of Liverpool, maintains that her role is both technical, dealing with the look and functionality of the repository, and marketing, persuading people to supply content to upload.
As with all pieces of software, it is important first to analyse the requirements:
- Policies must be established: from whom will you collect material?
- What type of document – just research output, or other documents?
- Published items only, or grey literature as well?
- Will you require full text or will you accept metadata only in some instances, as when an author does not have permission for reuse of an article from the publisher? (Note that in the latter instance some software provides for a button to request a copy of the paper from the author.)
Examples of IR policies
At Hong Kong University of Science and Technology, the policy is to limit coverage to published material and grey literature, and eschew that which is ephemeral, such as course notes, popular works, or newspaper articles (Lam and Chan, 2008).
UPSpace is the IR of the University of Pretoria Library in South Africa. It accepts all kinds of research output created by researchers who are members of the University. Where a faculty/department is the publisher of a journal (for example, Verbum et Ecclesia), the repository also hosts the journal in e-format, including articles by non-UP members in order to keep the online version complete, along with a letter of consent from the editor. The same applies to conferences hosted by faculties/departments within the University of Pretoria. All types of document are accepted: for example, research articles (in line with publishers' policies), mini-dissertations, dissertations, theses, research reports, images/photos, video clips/vidcasts, sound clips/podcasts, e-books (born digital and originally print), conference papers, proceedings, posters, 3D images, and data sets.
The DRIVER (Digital Repository Infrastructure Vision for European Research) Project is an initiative which seeks to create a Europe-wide infrastructure for repositories. It will accept only full-text content.
Each data type will, according to Shirley Yearwood-Jackman, need its own business process:
"You need to match the content to the process, and understand how that particular content needs to work for the organization. All stakeholder requirements must be satisfied."
For example, a thesis will require approval by the university before going live.
Acquiring and loading content
Many repositories will begin small, perhaps running a pilot with a few departments, or picking "low hanging fruit" from departmental web pages.
When Hong Kong University of Science and Technology launched its IR in 2003, librarians took a proactive approach, harvesting research papers on personal, departmental or campus-based research institute web pages, as well as searching open access databases (Lam and Chan, 2008).
The responsibility for content, however, must rest with the academic: getting their involvement is a matter of advocacy, which is explored in Part 4.
Choice of software
Part of the impetus behind the IR movement is the existence of open source software that is relatively easy to install, maintain and customize. The three main players are:
- DSpace, originally developed by MIT and Hewlett-Packard,
- EPrints, originally developed by the University of Southampton, and
- Fedora, originally developed at Cornell University.
DSpace is probably the most popular piece of open source software, appreciated for its good web interface, and the way it allows for hierarchical organization, by community (department) and collection (subset of the department).
According to Ina Smith, digital research repository (UPSpace) manager and e-application specialist at the University of Pretoria Library, DSpace:
" ... allows you to group your research output according to your institution's needs. In our case UPSpace is the umbrella IR, which contains an individual IR for each faculty which they can link to, and within each faculty each department has its own little IR to which they can link".
Figure 2. Screenshot of the home page of UPSpace
Smith also mentions DSpace's focus on preservation, and its strong search function, down to an individual PDF.
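The community and collection hierarchy described above can be pictured as a simple tree. The following is a minimal sketch in plain Python, not DSpace's own data model or API; the class names, the `count_items` helper and the sample faculty and department names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collection:
    """A set of items within a community, e.g. a department's theses."""
    name: str
    items: List[str] = field(default_factory=list)


@dataclass
class Community:
    """A unit such as a faculty or department; may nest sub-communities."""
    name: str
    subcommunities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)

    def count_items(self) -> int:
        """Total items in this community and everything beneath it."""
        own = sum(len(c.items) for c in self.collections)
        return own + sum(s.count_items() for s in self.subcommunities)


# Hypothetical example data, not taken from UPSpace itself.
department = Community(
    name="Department A",
    collections=[
        Collection("Research articles", ["article-1", "article-2"]),
        Collection("Theses", ["thesis-1"]),
    ],
)
faculty = Community(name="Faculty X", subcommunities=[department])
repository = Community(name="Umbrella repository", subcommunities=[faculty])

print(repository.count_items())
```

Because each level is itself a community, a faculty or department can be browsed, counted or linked to as if it were a small repository in its own right, which is the arrangement Smith describes.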
Also popular is EPrints, particularly in British universities. For the universities of Liverpool and Bournemouth, a major advantage was that they could get support from its developers at the University of Southampton.
The University of Bournemouth Library found it useful to be able to work with EPrints' developers to create customized features in their repository, Bournemouth University Research Online (BURO). For example, they created a browse by author feature, which enables authors to find themselves easily. They also created a tool to help academic staff become autonomous users and update their own items.
Figure 3. Screenshot of BURO's browse by author feature
There are, however, some pieces of IR software that are not open source, for example, Digital Commons, which has been used by a number of academic institutions, including the University of California.
The Catherwood Library at Cornell's School of Industrial and Labor Relations (ILR) used Digital Commons for its repository, DigitalCommons@ILR, despite the fact that Cornell was already using DSpace, because of the good training and technical backup from ProQuest (Cohen and Schmidle, 2007), and the need to match the branding of the ILR School. The cleanness of the interface is impressive.
Figure 4. Screenshot of the home page of DigitalCommons@ILR
Metadata and search
For any database, whether of an institution or a publisher, to be of any use, its items must be easily retrievable. To ensure this, each item entered must have the relevant metadata – for example, title, date created, author/creator, type (article, pre-print, podcast, thesis, etc.), keywords, language, etc.
To maximize search and retrieval, metadata should follow two conventions: Dublin Core, a general-purpose set of metadata elements designed for interoperability, and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which allows records to be harvested and searched across collections of repositories.
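As a rough illustration of how these two conventions fit together, the sketch below maps the descriptive fields mentioned earlier onto Dublin Core elements and builds an OAI-PMH harvesting request. The `ListRecords` verb, the `oai_dc` metadata prefix and the two XML namespaces come from the published standards; the field names, the sample record and the base URL are hypothetical, and the mapping is one reasonable choice rather than the scheme used by any particular repository.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed mapping from local field names to Dublin Core elements.
FIELD_TO_DC = {
    "title": "title",
    "author": "creator",
    "date_created": "date",
    "type": "type",
    "keywords": "subject",
    "language": "language",
}


def to_oai_dc(record: dict) -> str:
    """Serialize a simple record as an oai_dc XML fragment."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field_name, dc_element in FIELD_TO_DC.items():
        values = record.get(field_name, [])
        if isinstance(values, str):
            values = [values]  # allow a single value or a list of values
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


def list_records_url(base_url: str, from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date  # harvest only records changed since then
    return f"{base_url}?{urlencode(params)}"


# Hypothetical record and endpoint, for illustration only.
sample = {
    "title": "Sample working paper",
    "author": ["A. Author", "B. Author"],
    "date_created": "2009-07-01",
    "type": "pre-print",
    "keywords": ["open access", "repositories"],
    "language": "en",
}
print(to_oai_dc(sample))
print(list_records_url("https://repository.example.org/oai", "2009-01-01"))
```

A harvester such as a search service would issue requests of this kind against each registered repository and index the Dublin Core records it receives.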
IRs need to be searchable in academic search engines such as Scirus, Elsevier's free search engine for scientific information, Google Scholar, and OAIster. They should also be registered with the directories: OpenDOAR, the Register of Open Access Repositories (ROAR), as well as DRIVER.
According to Ina Smith of the University of Pretoria, the metadata in their repository is of very high quality because metadata editors form part of the electronic workflow, assigning Library of Congress subject headings and adding value to each record. This complements the work already done by publishers and makes the item even more retrievable.
Figure 5. Results from a Google Scholar search, showing the repository entry for the item listed higher than the same item in the publisher's database
One of the difficulties for those maintaining institutional repositories is that many of the most valuable items, i.e. articles published in peer-reviewed journals, are owned not by the academic but by the publisher. Authors are usually required to sign a copyright transfer agreement, which defines and restricts their rights to re-use their work.
Recognizing the inevitable move to open access, publishers have become more willing to allow authors to archive in institutional repositories, but the problem lies in the variation of their policies. SHERPA RoMEO is a useful source of information about publishers' policies. It divides publishers into colours:
- white for publishers who do not encourage archiving,
- yellow for those who only allow archiving of pre-print,
- blue for post-print,
- green for pre- and post-print. (Pre-print is generally taken to refer to pre-refereed papers, post-print those revised after being refereed.)
Emerald is RoMEO green, but requests that the author use their own version rather than the publisher's PDF.
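The colour scheme above lends itself to a simple lookup when checking permissions at deposit time. The following is a minimal sketch, not the SHERPA RoMEO service or its API: the names `Version`, `ROMEO_PERMITS` and `may_archive` are invented for illustration, and treating white as "no version may be archived" is a simplifying assumption. Publisher-specific conditions, such as a request to use the author's own version rather than the publisher's PDF, are not modelled and would still need to be checked by hand.

```python
from enum import Enum


class Version(Enum):
    PRE_PRINT = "pre-print"    # before refereeing
    POST_PRINT = "post-print"  # revised after refereeing


# Which versions each RoMEO colour allows an author to archive.
ROMEO_PERMITS = {
    "white": frozenset(),
    "yellow": frozenset({Version.PRE_PRINT}),
    "blue": frozenset({Version.POST_PRINT}),
    "green": frozenset({Version.PRE_PRINT, Version.POST_PRINT}),
}


def may_archive(colour: str, version: Version) -> bool:
    """Return True if a publisher of this colour permits the given version."""
    try:
        return version in ROMEO_PERMITS[colour.lower()]
    except KeyError:
        raise ValueError(f"Unknown RoMEO colour: {colour!r}") from None


print(may_archive("green", Version.POST_PRINT))
print(may_archive("yellow", Version.POST_PRINT))
```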
Institutional repositories have considerable resource implications, which must be considered when making a case for funding. Not only is there the initial technical expertise needed to set up the repository, there is also the ongoing work of liaising with faculty, acquiring and processing the documents, including checking permissions, and inputting data.
The Catherwood Library serves the needs of Cornell's School of Industrial and Labor Relations (ILR). Its repository, DigitalCommons@ILR, required a large investment of library staff resources. A web and digital projects manager chose the software platform and then oversaw all aspects of the repository, with the support of 1.5 full-time equivalent support staff, one to upload the documents and enter the metadata, and one (part-time) to check copyright permissions. In addition, library staff were assigned to different ILR school groups, and the ILR web team (separate from the Library) advised on technical and design issues (Cohen and Schmidle, 2007).
Some libraries outsource the more complex technical aspects of the work, others use the expertise of their web team, others may hire in a person with the necessary knowledge, on a short- or long-term basis.
One way around the cost of getting an IR up and running is to band together with other institutions. Many libraries have formed themselves into consortia, and this way they can get better deals with publishers, and also provide shared services such as a central location for digital projects.
The University of New Orleans (UNO) is in Louisiana, a fairly poor state even before the devastation of Hurricane Katrina, and funding digital platforms for libraries is challenging. However, the libraries were members of LOUIS, the Louisiana Library Network, a consortium of nearly 30 academic libraries. LOUIS had its own digital platform run from a server at Louisiana State University, and UNO was able to use this platform for its repository (Kelly, 2007).
Even more economies of scale can be gained by consortia building a joint IR. Guidelines, policies and expertise can be shared, and the repository will have more varied content and create possibilities of knowledge sharing across institutions. The drawback can be the difficulty of each institution maintaining its own identity.
ALADIN Research Commons is the shared IR of the Washington Research Library Consortium. It hosts scholarly or educational material from all member institutions. Figure 6 below shows the home page, with links to the various member institutions (Hulse et al., 2007).
Figure 6. Screenshot of ALADIN Research Commons home page
Despite the hype about open access, some academics may hold back through fear of upsetting good relationships with their publishers, and reluctance to take on yet another administrative task. So librarians must not only build the IR, but also sell it across their institution.
Emma Crowley is the IR manager of The Sir Michael Cobham Library's research repository at Bournemouth University. BURO played a major part in the winning of the outstanding library team award at the Times Higher Education (THE) Leadership and Management Awards in 2009. It is the 12th largest repository in the UK out of a total of 85 listed in OpenDOAR, with over 7,400 items currently – "we punch well above our weight".
A library initiative, BURO was extensively marketed to Bournemouth University's various schools. The deans and deputy deans of research and enterprise were approached to act as advocates, and there were also roadshows and hands-on workshops, and library staff were employed to help enter material for the Research Assessment Exercise and the Research Excellence Framework.
Most IR librarians find it important to work closely with individual departments, which means that the best model of content submission can be developed on a local basis.
Shirley Yearwood-Jackman currently works closely with five departments in the University of Liverpool: she finds the partnership model works best because each department has its own challenges, and therefore needs individual solutions:
"We create a dialogue with the department about e-content. We talk to them about how they manage their own research both from a departmental and an individual perspective. In so doing, we can make them aware at what point in the scholarly communications cycle you will need to get access to the article so that they keep the right version. We can also make them aware of open access funding, and of the funding councils' policies on open access – and help them to see that this is a worldwide movement, and not one just confined to Liverpool."
Having a single point of contact between the department and the library is helpful: it could be a departmental administrator or research assistant, for example, who can be trained to upload material.
If there is a form for submission, it helps if it is as simple as possible. Emma Crowley reckons that submitting an item to BURO should not take more than ten minutes. It requires the item's title, the author's name, e-mail address and department, and, for a journal article, basic metadata and the all-important digital object identifier (DOI).
However, Crowley finds it important for a library staff member to check the item to ensure bibliographic accuracy and consistency, and Yearwood-Jackman always checks copyright transfer agreements to ensure compliance with publishers.
Getting academics to self-archive can be easier if there is a mandate, but even here compliance cannot be assumed, and persuasion and tact are necessary at all times.
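A short submission form of the kind described can be checked automatically before a librarian reviews it. The sketch below is an illustration under stated assumptions, not BURO's actual form or EPrints code: the `Submission` fields and the `find_problems` helper are hypothetical, and the DOI pattern is only a rough shape check (a "10." prefix, a registrant code, a slash and a suffix), not proof that the DOI exists.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Rough shape of a DOI: "10.", a numeric registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass
class Submission:
    title: str
    author_name: str
    author_email: str
    department: str
    item_type: str = "journal article"
    doi: Optional[str] = None


def find_problems(sub: Submission) -> List[str]:
    """Return a list of problems; an empty list means the form is complete."""
    problems = []
    for name in ("title", "author_name", "author_email", "department"):
        if not getattr(sub, name).strip():
            problems.append(f"missing {name}")
    if sub.author_email.strip() and "@" not in sub.author_email:
        problems.append("author_email does not look like an e-mail address")
    if sub.item_type == "journal article":
        if not sub.doi:
            problems.append("journal articles need a DOI")
        elif not DOI_PATTERN.match(sub.doi):
            problems.append("DOI is not in the expected format")
    return problems


# Hypothetical submission; 10.1000 is a prefix commonly used in DOI examples.
form = Submission(
    title="Sample article",
    author_name="A. Author",
    author_email="a.author@example.org",
    department="Conservation Sciences",
    doi="10.1000/xyz123",
)
print(find_problems(form))
```

Automated checks of this kind catch only missing or malformed fields; bibliographic accuracy and copyright compliance still need the human review described above.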
When Queensland University of Technology in Australia mandated staff to put material in its repository, the Library offered sessions on the new policy, advised on copyright, and explained the process of depositing material. As an incentive, when a researcher's total reaches a major milestone, he or she is sent a congratulatory e-mail (Cochrane and Callan, 2007).
The benefits of repositories
Visibility and administrative convenience are key benefits of repositories. Having an IR showcases the institution's research. Jayakanth et al. (2008) point out that one of the benefits of this increased visibility is that research is available more widely, and more quickly, than by the normal publishing route. This in turn increases impact and citations, a key metric in the research league tables.
Bournemouth University Library invested in BURO in order to support the University's research, one of its four core missions, and such was its success that it was chosen as a case study on data collection for the forthcoming Research Excellence Framework. This fact contributed to the Library's THE award.
Ina Smith of the University of Pretoria library believes that UPSpace can possibly result in an increase in citations:
"better rankings for our individual researchers (H-index, etc.), as well as for our University on Webometrics, the Shanghai University list of top universities, etc."
Studies still need to be conducted to prove this, though.
However, there is also a moral incentive behind the idea of repositories: increased visibility and open access. In some cases, this can just mean making the research available earlier, and providing another opportunity to view. But to poorer institutions, which cannot afford publishers' subscriptions, it can often provide the only means of access.
According to Ina Smith:
"Often researchers publish in journals/databases to which we don't have access. The research is often funded by the University and tax payers' money. It is not fair that they do not have access to their own research. We want to take back ownership of our research output. Research output is much more visible since we provide open access, it helps increase the profile of researchers and the University. Open access results in our research being used by more people (even poor institutions who cannot subscribe to the very expensive databases can have access), and this results in our research being cited even more."
Similarly, researchers in emerging economies may be poorly represented in scholarly journals, so IRs provide a vital window for them to showcase their research (Jayakanth et al., 2008).
And the visibility of repositories can also help with outreach and connecting the library with its community, as in the case of the Surabaya Memory collection at Desa Informasi in Indonesia, to which local people have supplied items (Tjiek, 2007).
IRs also provide researchers with a central location in which they can deposit their research, along with its metadata, so that they can readily extract data in an appropriate format, and have the latest version available.
According to Shirley Yearwood-Jackman:
"Many of us believe that the way forward for IRs is single deposit, multiple use, being able to deposit your content in a repository and use it in varying ways: the information is there and can be extracted in an appropriate format."
At Queensland University of Technology, researchers can have their own page in the repository. This not only provides them with their own personal showcase, but also a central location to store their research so that they can easily extract information for grant applications or research reporting requirements (Cochrane and Callan, 2007).
Do repositories make publishers redundant?
"Scientists cannot and do not want to abandon the publication of their articles in scientific journals. The main reason for this is the peer review process guaranteed by the publishers. This quality control and the attendant gain in reputation linked to this process provides an incentive for academic authors to publish preferably in such journals. Repositories – especially the multidisciplinary repositories of universities – cannot offer a comparable mechanism for quality control" (Herb and Muller, 2008).
While institutional repositories may appear among the threats in publishers' SWOT (strengths, weaknesses, opportunities and threats) analyses, the above quote makes it clear that the threat cannot be that great, simply because a repository is a different animal from a journal publisher.
Publishing is essentially about quality control, which is the expertise of publishers. The situation is summed up as follows by Rebecca Marsh, publishing director of Emerald:
"Institutional repositories seek to be a complete holding of an institution's research: everything from reports and conference papers to journal articles and book chapters, and so there is a lot of grey literature as well as published content. What publishers add in the information chain is the management of the peer review, the administration of the publication process, and continued investment in new ways of delivering and disseminating knowledge. The core activities for an academic are research and teaching; therefore, the coordination of the quality control process is not their main priority. For us as publishers, it is our business; we invest in it and constantly look at ways to innovate and develop the publishing processes. So I don't think repositories present a great threat to publishers and we can work alongside one another quite effectively."
We have seen how institutional repositories can provide an elegant solution, both to the wider academic community, by providing a timely distribution mechanism which can increase impact, and to the institution, by offering a central place to park research in a way that can easily be retrieved. But academic journals are by their very nature communities of researchers working across institutions. Without peer review, the quality would not be assured, and the whole point of the enterprise lost. The scholarly world needs publishers to work together with IRs to ensure that research has the impact it deserves.
References
Barwick, J. (2007), "Building an institutional repository at Loughborough University: some experiences", Program: electronic library and information systems, Vol. 41 No. 2, pp. 113-123.
Chen, K. and Hsiang, J. (2009), "The unique approach to institutional repository: Practice of National Taiwan University", The Electronic Library, Vol. 27 No. 2, pp. 204-221.
Cochrane, T. and Callan, P. (2007), "Making a difference: implementing the eprints mandate at QUT", OCLC Systems & Services, Vol. 23 No. 3, pp. 262-268.
Cohen, S. and Schmidle, D. (2007), "Creating a multipurpose digital institutional repository", OCLC Systems & Services, Vol. 23 No. 3, pp. 287-296.
Doctor, G. (2008), "Capturing intellectual capital with an institutional repository at a business school in India", Library Hi Tech, Vol. 26 No. 1, pp. 110-125.
Herb, U. and Muller, M. (2008), "The long and winding road: Institutional and disciplinary repository at Saarland University and State Library", OCLC Systems & Services, Vol. 24 No. 1, pp. 22-29.
Hulse, B., Cheverie, J.F. and Dygert, C.T. (2007), "ALADIN Research Commons: a consortial institutional repository", OCLC Systems & Services, Vol. 23 No. 2, pp. 158-169.
Jayakanth, F., Minj, F., Silva, U. and Jagirdir, S. (2008), "ePrints@IISc: India's first and fastest growing institutional repository", OCLC Systems & Services, Vol. 24 No. 1, pp. 59-70.
Kelly, J.C. (2007), "Creating an institutional repository at a challenged institution", OCLC Systems & Services, Vol. 23 No. 2, pp. 142-147.
Lam, K-T. and Chan, D.L.H. (2008), "Building an institutional repository: sharing experiences at the HKUST Library", OCLC Systems & Services, Vol. 23 No. 3, pp. 310-323.
Nicholas, D. (2007), "Meet the editor of ... Aslib Proceedings", Emerald, UK.
Tjiek, L.T. (2007), "Desa Informasi: a virtual village of 'new' information resources and services in Indonesia", Program: electronic library and information systems, Vol. 41 No. 2, pp. 276-290.
I would like to thank the following people for their help with this article:
- Emma Crowley, subject librarian – conservation sciences, BURO (institutional repository) manager, The Sir Michael Cobham Library, University of Bournemouth.
- Rebecca Marsh, publishing director, Emerald Group Publishing Limited.
- Ina Smith, digital research repository (UPSpace) manager and e-application specialist, Department of Library Services, University of Pretoria.
- Shirley Yearwood-Jackman, institutional repository librarian at the University of Liverpool.