The Semantic Web – a new tool for libraries?


By Margaret Adolphus

The Web is enormous – there are currently (as of December 2009) more than 20 billion pages, 230 million web servers and 681 million hosts (Hall and Shadbolt, 2009). It pervades every area of life: we use it for keeping up with old friends, buying presents, booking holidays, browsing library catalogues, reading academic journals, and much, much more.

And it keeps evolving, to the point where we can do more and more with it.

Initially, it was a collection of documents, so we used it to look up information and make purchases. Then along came Web 2.0 and we could upload our own content in the form of blogs, social networking sites, etc.

Now there is potential for the Web to be an even greater source of information. Imagine, for example, that I want to find out the names and locations of special schools near my home. This is what I can find by going to the portal, EduBase:

Figure 1. Map of schools linked to postcode © Crown copyright all rights reserved (Ordnance Survey Licence number 1000384332009)

Data from EduBase have been merged with data from the Ordnance Survey, so I can tell exactly where each school is in relation to my home; back on EduBase I can consult data on the individual schools.

The Web has now evolved to the point where it is possible to extract information in a meaningful way to meet our immediate needs. This is what is meant by the Semantic Web – semantics being the science of meaning.

This is what Tim Berners-Lee said about the Semantic Web, at the turn of the millennium:

"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps in weaving the Semantic Web into the structure of the existing Web are already under way. In the near future, these developments will usher in significant new functionality as machines become much better able to process and 'understand' the data that they merely display at present" (Berners-Lee et al., 2001; quoted in Macgregor, 2008).

The basic goal of the Semantic Web is to tighten the Web's structure and so create an even more powerful fabric of information. For example, if you are planning a trip to Paris, you will probably need to search several different sites for airlines, hotels, museums and other tourist attractions. Imagine if an intelligent software agent could do that for you – wouldn't that make the task simpler?

And the Semantic Web application EduBase lets you locate and consult nearby institutions in a single operation, not two.

What makes all this possible is that the Web has evolved from being a collection of documents to one of data, so one can search over several databases, and combine datasets.
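As a toy illustration of combining datasets, the sketch below joins an invented schools dataset to an invented coordinates dataset on a shared key (the postcode), much as EduBase data is merged with Ordnance Survey data. All names and values here are made up:

```python
# Toy illustration: merging a schools dataset with a locations dataset
# by a shared key (postcode), in the spirit of combining EduBase data
# with Ordnance Survey coordinates. All data below is invented.

schools = [
    {"name": "Hillside Special School", "postcode": "AB1 2CD"},
    {"name": "Riverview School", "postcode": "EF3 4GH"},
]

coordinates = {
    "AB1 2CD": (51.501, -0.142),
    "EF3 4GH": (51.509, -0.128),
}

def merge_by_postcode(schools, coordinates):
    """Attach latitude/longitude to each school via its postcode."""
    merged = []
    for school in schools:
        latlon = coordinates.get(school["postcode"])
        if latlon is not None:
            merged.append({**school, "lat": latlon[0], "lon": latlon[1]})
    return merged

for row in merge_by_postcode(schools, coordinates):
    print(row["name"], row["lat"], row["lon"])
```

The point of the sketch is that once both sources expose structured, machine-readable records, the join is trivial; the hard part the Semantic Web addresses is getting heterogeneous sources into such a form in the first place.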

There is nothing new in combining different data: in 1854, the London physician John Snow mapped data on cholera cases against data on water sources and discovered that the disease was waterborne, not airborne. What is new is the immediacy with which the Web can retrieve a large amount of information.

Data can be collected from anywhere on the Web, from any type of resource (publications, multimedia, databases or scientific workflows, for example) and in any format (text, HTML, XML, Excel, etc.). A standardized, structured syntax renders documents in these different formats machine readable.
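The standardized syntax in question is RDF (Resource Description Framework), which expresses every statement as a subject–predicate–object triple. A minimal sketch in plain Python tuples (not a real RDF library; the URIs and property names below are invented for illustration):

```python
# Minimal sketch of RDF-style data: every statement is a
# (subject, predicate, object) triple. URIs/properties are invented.

triples = [
    ("http://example.org/school/42", "rdf:type", "ex:School"),
    ("http://example.org/school/42", "ex:name", "Hillside Special School"),
    ("http://example.org/school/42", "ex:postcode", "AB1 2CD"),
]

def objects_for(subject, predicate, triples):
    """Return every object asserted for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_for("http://example.org/school/42", "ex:name", triples))
```

Because everything reduces to the same triple shape, statements harvested from completely different sources and formats can be poured into one pool and queried uniformly.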

Content can be extracted because it is possible to retrieve not just pages, but objects – restaurants, schools, people, books, and so on. In other words, there is immediate access to the "thing" itself, rather than to the page where the "thing" is mentioned.

All this has the potential for a greatly improved search experience. At the moment, search engines such as Google reach only the tip of the Web's iceberg; much lies hidden behind authentication walls in publishers' databases, at low levels in deep hierarchies, or in difficult-to-search formats such as text documents. (Read more about the invisible Web.)

Traditional information retrieval is also haphazard: the ranking of sites depends not on their real relevance, but on their popularity, i.e. the number of hits and links. Moreover, search is based on keywords, which may mean different things. For example, a search on "libraries in Brazil" may bring up a public library in a US town called Brazil rather than libraries in the South American country.

The Semantic Web attempts to take some of the drudgery out of human search and hand it over to machines. Code enables pages to be read by machines and software agents search over multiple databases to extract information relevant to a very specific query.
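As a rough sketch of what such a software agent does, the toy code below runs one specific query over several machine-readable "databases" and merges the answers. The sources and their records are all invented:

```python
# Hypothetical "software agent": run one specific query across
# several machine-readable sources and merge the answers.
# The sources and records below are invented for illustration.

source_a = [{"thing": "school", "name": "Hillside", "town": "Oxton"}]
source_b = [{"thing": "school", "name": "Riverview", "town": "Oxton"},
            {"thing": "library", "name": "Oxton Central", "town": "Oxton"}]

def query(sources, thing, town):
    """Collect every record of the requested type in the given town."""
    results = []
    for source in sources:
        for record in source:
            if record["thing"] == thing and record["town"] == town:
                results.append(record["name"])
    return results

print(query([source_a, source_b], "school", "Oxton"))
```

A real agent would use SPARQL over RDF stores rather than Python lists, but the principle is the same: one precise query, answered from many sources at once.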

Some of the principles of the Semantic Web are similar to those of Web 2.0: the latter uses informal "folksonomies" to tag objects, and lets users create their own mashups of data. The difference, however, is that the Semantic Web uses standardized languages and vocabularies: where mashups may loosely combine sites holding different types of data, semantic applications share common formats, and personally selected tags are replaced by formal ontologies, ensuring that terms are used consistently.

Semantic search is also less haphazard than keyword search, in that if terms with similar meanings are grouped together under an ontology, the results are more likely to be relevant.

Take, for example, the search query "telecom company" Europe director: a semantic search engine would match not only the actual words of the search string, but also related terms – different sorts of telecom companies, cities in Europe, and different sorts of director. So a page describing the appointment of a chief technical officer at a mobile company in London would be retrieved (Davies, 2009).
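The Davies example can be sketched as query expansion over an ontology: each query term is widened to its narrower and related terms before matching. The ontology below is a toy invented for illustration:

```python
# Toy ontology-driven query expansion: each query term is widened to
# its narrower/related terms before matching, so a page about a
# "chief technical officer" at a "mobile company" in "London" matches
# the query "telecom company Europe director". Ontology is invented.

ontology = {
    "telecom company": {"mobile company", "fixed-line operator"},
    "europe": {"london", "paris", "berlin"},
    "director": {"chief technical officer", "managing director"},
}

def expand(term):
    """Widen one query term to its ontology neighbours (plus itself)."""
    return ontology.get(term, set()) | {term}

def matches(document, query_terms):
    """True if the document contains a term from each expanded group."""
    text = document.lower()
    return all(any(t in text for t in expand(q)) for q in query_terms)

doc = "Mobile company appoints chief technical officer in London"
print(matches(doc, ["telecom company", "europe", "director"]))
```

A production system would use a formal ontology language such as OWL and reason over subclass and related-term relations, but the effect is the one sketched here: relevance judged on meaning rather than on literal keyword overlap.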