Institutional repositories

Creating an institutional repository

Shirley Yearwood-Jackman, who is IR librarian at the University of Liverpool, maintains that her role is both technical, dealing with the look and functionality of the repository, and marketing, persuading people to supply content to upload.

Requirements analysis

As with any piece of software, it is important first to analyse the requirements and establish policies:

  • From whom will you collect material?
  • What types of document will you accept: just research output, or other documents as well?
  • Will you take published items only, or grey literature as well?
  • Will you require full text, or will you accept metadata only in some instances, as when an author does not have the publisher's permission to reuse an article? (Note that in the latter instance some software provides a button to request a copy of the paper from the author.)

Examples of IR policies

At Hong Kong University of Science and Technology, the policy is to limit coverage to published material and grey literature, and eschew that which is ephemeral, such as course notes, popular works, or newspaper articles (Lam and Chan, 2008).

UPSpace is the IR of the University of Pretoria Library in South Africa. It accepts all kinds of research output created by researchers who are members of the University. Where a faculty or department publishes a journal (for example, Verbum et Ecclesia), UPSpace also hosts the journal in e-format, including articles by non-UP members in order to keep the online version complete, along with a letter of consent from the editor. The same applies to conferences hosted by faculties or departments within the University of Pretoria. All types of document are accepted: for example, research articles (in line with publishers' policies), mini-dissertations, dissertations, theses, research reports, images/photos, video clips/vidcasts, sound clips/podcasts, e-books (born digital and originally print), conference papers, proceedings, posters, 3D images, and data sets.

The DRIVER (Digital Repository Infrastructure Vision for European Research) Project is an initiative which seeks to create a Europe-wide infrastructure for repositories. It will only accept full-text content.

Each data type will, according to Shirley Yearwood-Jackman, need its own business process:

"You need to match the content to the process, and understand how that particular content needs to work for the organization. All stakeholder requirements must be satisfied."

For example, a thesis will require approval by the university before going live.

Acquiring and loading content

Many repositories will begin small, perhaps running a pilot with a few departments, or picking "low hanging fruit" from departmental web pages.

When Hong Kong University of Science and Technology launched its IR in 2003, librarians took a proactive approach, harvesting research papers on personal, departmental or campus-based research institute web pages, as well as searching open access databases (Lam and Chan, 2008).

The responsibility for content, however, must rest with the academic: getting their involvement is a matter of advocacy, which is explored in Part 4.

Choice of software

Part of the impetus behind the IR movement is the existence of open source software that is relatively easy to install, maintain and customize. The three main players are:

  1. DSpace, originally developed by MIT and Hewlett-Packard,
  2. EPrints, originally developed by the University of Southampton, and
  3. Fedora, originally developed at Cornell University.

DSpace is probably the most popular open source repository software, appreciated for its good web interface and for the way it allows hierarchical organization by community (department) and collection (subset of the department).

According to Ina Smith, digital research repository (UPSpace) manager and e-application specialist at the University of Pretoria Library, DSpace:

" ... allows you to group your research output according to your institution's needs. In our case UPSpace is the umbrella IR, which contains an individual IR for each faculty which they can link to, and within each faculty each department has its own little IR to which they can link".
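The nesting Smith describes (an umbrella repository containing faculties, each containing departments with their own collections) can be pictured as a simple tree. The sketch below is only an illustration of that idea, not DSpace's actual data model or API; the class names, field names and example entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collection:
    """A subset of a department's output, holding item titles."""
    name: str
    items: List[str] = field(default_factory=list)


@dataclass
class Community:
    """A faculty or department; may hold sub-communities and collections."""
    name: str
    subcommunities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)

    def count_items(self) -> int:
        """Count items here and in everything nested beneath this community."""
        own = sum(len(c.items) for c in self.collections)
        return own + sum(s.count_items() for s in self.subcommunities)


# Hypothetical names, for illustration only.
theses = Collection("Theses", ["Thesis A", "Thesis B"])
articles = Collection("Research articles", ["Article A"])
department = Community("Example Department", collections=[theses, articles])
faculty = Community("Example Faculty", subcommunities=[department])
repository = Community("Umbrella IR", subcommunities=[faculty])

print(repository.count_items())
```

Because each level is itself a community, a faculty or department can be linked to and reported on independently while still rolling up into the umbrella repository.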

Figure 2. Screenshot of the home page of UPSpace


Smith also mentions DSpace's focus on preservation, and its strong search function, down to an individual PDF.

Also popular is EPrints, particularly in British universities. For the universities of Liverpool and Bournemouth, a major advantage was that they could get support from its developers at the University of Southampton.

Bournemouth University Library found it useful to be able to work with EPrints' developers to create customized features in its repository, Bournemouth University Research Online (BURO). For example, they created a browse by author feature, which enables authors to find themselves easily. They also created a tool to help academic staff become autonomous users and update their own items.

Figure 3. Screenshot of BURO's browse by author feature


There are, however, some pieces of IR software that are not open source, for example, Digital Commons, which has been used by a number of academic institutions, including the University of California.

The Catherwood Library at Cornell's School of Industrial and Labor Relations (ILR) used Digital Commons for its repository, DigitalCommons@ILR, despite the fact that Cornell was already using DSpace, because of the good training and technical backup from ProQuest (Cohen and Schmidle, 2007), and the need to match the branding of the ILR School. The cleanness of the interface is impressive.

Figure 4. Screenshot of the home page of DigitalCommons@ILR


Metadata and search

For any database, whether an institution's or a publisher's, to be of use, its items must be easily retrievable. To ensure this, each item entered must have the relevant metadata: for example, title, date created, author/creator, type (article, pre-print, podcast, thesis, etc.), keywords and language.

To maximize search and retrieval, metadata needs to comply with two standards: the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), so that records can be harvested into, and searched across, collections of repositories; and Dublin Core, a general-purpose set of metadata elements designed for interoperability.
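As a rough illustration of what a Dublin Core description looks like in practice, the sketch below builds a simple record in the oai_dc container format that OAI-PMH repositories expose. The two namespace URIs are the standard ones; the helper function build_dc_record and the example item are hypothetical, and repository software normally generates such records for you.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_dc_record(fields: Dict[str, List[str]]) -> ET.Element:
    """Wrap simple Dublin Core elements in an oai_dc:dc container."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            # Dublin Core elements are repeatable, e.g. several creators.
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Hypothetical item, for illustration only.
record = build_dc_record({
    "title": ["An example thesis"],
    "creator": ["Author, A.", "Author, B."],
    "date": ["2008"],
    "type": ["Thesis"],
    "subject": ["Institutional repositories"],
    "language": ["en"],
})

# Use the conventional prefixes when serializing.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)
print(ET.tostring(record, encoding="unicode"))
```

The element names used here (title, creator, date, type, subject, language) correspond to the metadata fields listed earlier in this section.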

IRs need to be searchable in academic search engines such as Scirus (Elsevier's free search engine for scientific information), Google Scholar and OAIster. They should also be registered with the directories OpenDOAR and the Registry of Open Access Repositories (ROAR), as well as with DRIVER.
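Harvesters such as OAIster gather records by sending OAI-PMH requests to a repository's base URL. The sketch below only assembles such a request; it does not contact any server. The verb ListRecords and the parameters metadataPrefix, set and from are defined by the protocol, while the base URL and the function name list_records_url are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; a real repository publishes its own base URL.
BASE_URL = "https://repository.example.org/oai/request"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Restrict harvesting to one set, e.g. a single collection.
        params["set"] = set_spec
    if from_date:
        # Only records added or changed on or after this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, from_date="2008-01-01")
print(url)
```

Requesting the oai_dc prefix asks for simple Dublin Core, which every OAI-PMH repository is required to support.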

According to Ina Smith of the University of Pretoria, the metadata in their repository is of very high quality, since metadata editors also form part of the electronic workflow, assigning Library of Congress subject headings and adding more value to each record. This complements the work already done by publishers and makes the item even more retrievable.

Figure 5. Results from a Google Scholar search, showing the repository entry for the item listed higher than the same item in the publisher's database



One of the difficulties for those maintaining institutional repositories is that many of the most valuable items, i.e. articles published in peer-reviewed journals, are owned not by the academic but by the publisher. Authors are usually required to sign a copyright transfer agreement, which defines and restricts their rights to re-use their work.

Recognizing the inevitable move to open access, publishers have become more willing to allow authors to archive in institutional repositories, but their policies vary widely. SHERPA RoMEO is a useful source of information about publishers' policies. It classifies publishers by colour:

  • white for publishers who do not encourage archiving,
  • yellow for those who only allow archiving of pre-print,
  • blue for post-print,
  • green for pre- and post-print. (Pre-print is generally taken to refer to pre-refereed papers, post-print those revised after being refereed.)

Emerald is RoMEO green, but requests that the author use their own version rather than the publisher's PDF.
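For staff checking permissions, the colour scheme above amounts to a small lookup table. The sketch below encodes it as described in this section; the names ROMEO_COLOURS and can_archive are hypothetical, and treating white as "no version may be archived" is a simplifying assumption. A real check should always consult the publisher's current policy, including conditions such as Emerald's on which file may be used.

```python
# Versions each RoMEO colour permits, as described in the list above.
# Assumption: "white" is treated as permitting no archiving at all.
ROMEO_COLOURS = {
    "white": set(),
    "yellow": {"pre-print"},
    "blue": {"post-print"},
    "green": {"pre-print", "post-print"},
}


def can_archive(colour: str, version: str) -> bool:
    """Return True if the given colour permits archiving this version."""
    return version in ROMEO_COLOURS[colour.lower()]


print(can_archive("green", "post-print"))
print(can_archive("yellow", "post-print"))
```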

Resource issues

Institutional repositories have considerable resource implications, which must be considered when making a case for funding. Not only is there the initial technical expertise needed to set up the repository, there is also the ongoing work of liaising with faculty, acquiring and processing the documents, including checking permissions, and inputting data.

The Catherwood Library serves the needs of Cornell's School of Industrial and Labor Relations (ILR). Its repository, DigitalCommons@ILR, required a large investment of library staff resources. A web and digital projects manager chose the software platform and then oversaw all aspects of the repository, with the support of 1.5 full-time equivalent support staff, one to upload the documents and enter the metadata, and one (part-time) to check copyright permissions. In addition, library staff were assigned to different ILR school groups, and the ILR web team (separate from the Library) advised on technical and design issues (Cohen and Schmidle, 2007).

Some libraries outsource the more complex technical aspects of the work; others use the expertise of their web team; still others hire a person with the necessary knowledge, on a short- or long-term basis.

One way around the cost of getting an IR up and running is to band together with other institutions. Many libraries have formed consortia, which lets them negotiate better deals with publishers and provide shared services such as a central location for digital projects.

The University of New Orleans (UNO) is in Louisiana, which was a fairly poor state even before the devastation of Hurricane Katrina, and funding digital platforms for libraries is challenging. However, the libraries were members of LOUIS, the Louisiana Library Network, a consortium of nearly 30 academic libraries. LOUIS had its own digital platform, run from a server at Louisiana State University, and UNO was able to use this platform for its repository (Kelly, 2007).

Even more economies of scale can be gained by consortia building a joint IR. Guidelines, policies and expertise can be shared, and the repository will have more varied content and create possibilities of knowledge sharing across institutions. The drawback can be the difficulty of each institution maintaining its own identity.

ALADIN Research Commons is the shared IR of the Washington Research Library Consortium. It hosts scholarly or educational material from all member institutions. Figure 6 below shows the home page, with links to the various member institutions (Hulse et al., 2007).

Figure 6. Screenshot of ALADIN Research Commons home page