Information seeking: the advantages and disadvantages of web tracking
Web information seeking behaviour is considered a subset of information behaviour, an umbrella term that includes information seeking, information searching and information retrieval. Web tracking, also known as web log analysis, web logging and web log file analysis, seems to be a very popular method for studying information seeking.
Web tracking yields machine-generated records of users' activities when they interact with a web space such as a web page, website, database, library catalogue, discussion group, search engine, intranet, electronic book, electronic journal, electronic newspaper or portal. Activities may include information searching, information seeking, the capturing of information, and the beginning and ending of search sessions.
The actual information collected by the logs depends partly on the software used and how the server was configured. Such information may include the length of search sessions, referral sources (URLs), search queries, the rate at which web pages were visited, how users visited old and new web pages, distance in terms of URLs between repeated web page visits, frequency of web page visits, extent of browsing in one cluster of web pages, repeated sequences of “path-following behaviour”, total number of hits, the most frequently requested documents, total number of unique addresses making requests, peak usage, number of user sessions and the system's responses. User activity is recorded on a real-time, continuous basis.
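As a rough illustration of how a few of these figures (total hits, unique addresses, most frequently requested documents, referral sources) might be derived from a server log, the following Python sketch parses a handful of made-up log lines. It assumes the widely used Common/Combined Log Format; the sample lines, the `summarize` function and the field names are hypothetical and are not taken from any of the tools discussed in this article.

```python
import re
from collections import Counter

# Assumed layout (Common/Combined Log Format):
# host ident user [time] "method path protocol" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "[^"]*")?'
)

# Made-up sample data for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Mar/2007:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043 '
    '"http://search.example/?q=catalogue" "Mozilla/4.0"',
    '192.0.2.1 - - [10/Mar/2007:10:00:40 +0000] "GET /catalogue.html HTTP/1.0" 200 5120 '
    '"http://library.example/index.html" "Mozilla/4.0"',
    '192.0.2.7 - - [10/Mar/2007:10:02:10 +0000] "GET /index.html HTTP/1.0" 200 1043 '
    '"-" "Mozilla/4.0"',
    '192.0.2.7 - - [10/Mar/2007:10:03:00 +0000] "GET /help.html HTTP/1.0" 200 880',
]


def summarize(lines):
    """Return a few basic usage statistics for an iterable of log lines."""
    hits = 0
    hosts = set()
    documents = Counter()
    referrers = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        hits += 1
        hosts.add(match["host"])
        documents[match["path"]] += 1
        referrer = match["referrer"]
        if referrer and referrer != "-":  # "-" means no referrer was sent
            referrers[referrer] += 1
    return {
        "total_hits": hits,
        "unique_addresses": len(hosts),
        "top_documents": documents.most_common(3),
        "top_referrers": referrers.most_common(3),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Note that a unique address is not the same as a unique user: several people may share one address, and one person may appear under several.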
Value of web tracking
Web tracking collects large amounts of data unobtrusively so that the data can be analysed at a later stage. It can collect data 24/7 without the researcher's presence and without interrupting the information seeker. It can collect data to generate a variety of statistics, for example on referral URLs, navigation moves, number of visits to a website, visitors to a website, general information seeking/searching behaviour, information channels used, and the users' interaction with the system.
Web tracking can:
- Help to identify areas for database or website maintenance. Failed searches for common terms may also reveal data entry errors in the bibliographic records.
- Yield information on the types of semantic relationships that users posit among terms and headings, which can add to the body of cataloguing and classification research. Subject access and the use of metadata can thus be improved.
- Be used to offer feedback to users on their use of the information retrieval system, which is especially important when information searching is part of their professional activities and when they need to improve their skills.
- Help to discover user needs by collecting data on the subjects and topics searched, and give an idea of user searching patterns.
- Be used to reveal repetitive problems in patterns of searching.
- Collect data on the search length, the number of search terms used, and the search techniques used.
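Several of the points above (failed searches, number of search terms, search techniques) could be approximated from a search log along the following lines. This is only a sketch under stated assumptions: it presumes the log can be reduced to pairs of query text and number of results returned, treats a search with zero results as "failed", and treats the presence of an upper-case AND, OR or NOT as a sign of a Boolean search technique. The record layout, the `query_statistics` function and the sample queries are hypothetical.

```python
from collections import Counter
from statistics import mean

# Assumed markers of a Boolean search technique.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}

# Made-up (query, number_of_results) pairs for illustration only.
SAMPLE_SEARCHES = [
    ("information seeking", 42),
    ("informaton seeking", 0),
    ("web AND tracking", 17),
    ("Informaton seeking", 0),
    ("catalogue", 8),
]


def query_statistics(records):
    """Summarise (query, result_count) pairs taken from a search log."""
    if not records:
        return {"searches": 0, "mean_terms": 0,
                "boolean_searches": 0, "failed_queries": []}
    term_counts = []
    boolean_searches = 0
    failed = Counter()
    for query, result_count in records:
        terms = query.split()
        term_counts.append(len(terms))  # operators are counted as terms here
        if BOOLEAN_OPERATORS.intersection(terms):
            boolean_searches += 1
        if result_count == 0:
            failed[query.lower()] += 1  # fold case so repeats are grouped
    return {
        "searches": len(records),
        "mean_terms": mean(term_counts),
        "boolean_searches": boolean_searches,
        "failed_queries": failed.most_common(),
    }


if __name__ == "__main__":
    print(query_statistics(SAMPLE_SEARCHES))
```

A frequently repeated failed query such as the misspelling in the sample data may point either to a common user error or to a data entry error in the records; the log alone cannot say which.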
Web tracking can also:
- Support the development and improvement of the user interface and software by noting navigation and browsing behaviour.
- Be used to test the efficacy of changes to the system, as well as user preferences for experimental changes.
- Be used to anticipate the evolution of system use and demands. Shifts in users' searching behaviour over time can be picked up. It can give an aggregate view of the use of system resources.
- Act as a decision-making tool for networks and consortia.
- Be used to monitor usage of particular resources/channels, or moves.
- Be used to improve the human (and system) understanding of how the systems are used by the information seekers.
- Give information on the use of menus and help screens.
- Give information on system response time and user “think time”.
- Provide useful information to site owners about sources of new visitors.
Whilst transaction log analysis has considerable value as a data collection method, it also has its limitations and it is best used in conjunction with a method which captures data regarding users' real information needs, comments and reactions whilst using a system.
Web tracking limitations
The main disadvantage of web tracking is that it only sheds light on the actual physical moves made by the information seeker. It cannot offer any information on the information seeker's real intentions, motivation, rationale for decisions, emotional experiences, or any background on personal characteristics, learning and personality styles. Existing metrics really just skate the surface and rarely deliver the quality information that web managers and sponsors are looking for to appraise their investments.
Further limitations of web tracking include:
- User groups are often undefined, without distinction between novice users and information intermediaries trained to use a system.
- Users' levels of information literacy, education and experience with the system or the subject domain of the search strategy are not indicated.
- Reasons for the search or search strategies are not indicated.
- Users' beliefs about the information retrieval system that are logical preconditions for a search, such as that the system is a possible place to find what is wanted, are not explored.
- The social aspects of search behaviour are not revealed.
- Transaction logs sometimes do not correlate with the users' observations of their behaviour. Logs may show results from the point of view of the system, but may not accurately capture the users' experience and perceptions. They can help to identify only certain types of errors.
- It is hard (or even impossible) to identify users' real information needs.
- Actual uses of search results are unknown.
- Boundaries of searches (starting and ending points) are unclear if public computer workstations are used. It is also difficult to identify individual search sessions.
- Users' perceptions of and satisfaction with their searches are not recorded, and the logs cannot measure the information needs that users are unable to express in their search statements.
- Statistics/data gathered should be interpreted with great care. A number of factors can influence and skew the data (e.g. internal site architecture).
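The difficulty of identifying individual search sessions can be made concrete with a small sketch. A common heuristic, used here purely as an assumption, is to start a new session whenever the same address has been inactive for longer than some cut-off; the 30-minute value, the `split_sessions` function and the sample requests below are illustrative and do not come from the article.

```python
from datetime import datetime, timedelta

# Assumed inactivity cut-off; the right value depends on the system studied.
SESSION_TIMEOUT = timedelta(minutes=30)

# Made-up (address, timestamp) pairs for illustration only.
SAMPLE_REQUESTS = [
    ("192.0.2.1", datetime(2007, 3, 10, 10, 0)),
    ("192.0.2.7", datetime(2007, 3, 10, 10, 2)),
    ("192.0.2.1", datetime(2007, 3, 10, 10, 5)),
    ("192.0.2.1", datetime(2007, 3, 10, 11, 30)),
]


def split_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group request timestamps into sessions per address.

    A new session starts when the gap since the previous request
    from the same address exceeds the timeout.
    """
    sessions = {}
    for address, timestamp in sorted(requests):
        address_sessions = sessions.setdefault(address, [])
        if address_sessions and timestamp - address_sessions[-1][-1] <= timeout:
            address_sessions[-1].append(timestamp)  # continue current session
        else:
            address_sessions.append([timestamp])  # start a new session
    return sessions


if __name__ == "__main__":
    for address, address_sessions in split_sessions(SAMPLE_REQUESTS).items():
        print(address, len(address_sessions), "session(s)")
```

The heuristic shows the limitation rather than solving it: on a public workstation, two people searching one after the other within the cut-off would be merged into a single session, while one person who pauses for longer would be split into two.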
Evaluating web tracking software
There are many commercial programs available. Some are very expensive and can be complex and difficult to manage; WebTrends live enterprise edition, for example, can cost as much as a thousand US dollars a month. Others, such as SurfSpy, are inexpensive, easy to use and almost invisible, but the information they retrieve is minimal. Software that has been noted as potentially useful for web information seeking studies includes, amongst others:
- Lotus' ScreenCam (www.lotus.com/home.nsf/welcome/screencam).
- NetTracker (www.sane.com).
- SpeedTracker (www.speedtracker.de).
- Spotfire (www.spotfire.com).
- NetSnitch (www.netsnitch.com).
- SurfSpy (www.bysoft.se/sureshot/surfspy/index.html).
- WebTrends (www.netiq.com/webtrends/default.asp).
- Spector Pro 5.0 (www.spectorsoft.com).
- TheCounter.com (www.thecounter.com).
- DeepMetrix (www.deepmetrix.com).
When evaluating web tracking software you need to distinguish between software that monitors activities on individual computers, including computers connected to a LAN or company network (e.g. Spector Pro 5.0), and software that monitors activities on a specific website or browser server (e.g. TheCounter.com or DeepMetrix). In the latter case the data is sometimes referred to as visitor intelligence.
Before applying criteria, the researcher should consider:
- What does he/she want to do with the data collected (e.g. increase website use, increase visitors to the website)?
- The target group.
- The experimental situation (e.g. real-life or a controlled situation).
- Can all the data be collected through web tracking, or should it be complemented by other methods?
- Which problems are foreseen that might be addressed by the software?
- How much funding is available for purchasing the software? (Perhaps this should be the first question to ask.)
This is a shortened version of “Information seeking: an overview of web tracking and the criteria for tracking software”, which originally appeared in Aslib Proceedings: New Information Perspectives, Volume 59 Number 3, 2007.
The authors are Ina Fourie and Theo Bothma.