WIT Press


Statistical Analysis Of Pageviews On Web Sites

Price

Free (open access)

Volume

28

Pages

Published

2002

Size

550 kb

Paper DOI

10.2495/DATA020941

Copyright

WIT Press

Author(s)

P H A J M van Gelder, G Beijer & M Berger

Abstract

Pageview statistics are useful to describe and predict the behaviour of clients on internet sites. From a theoretical point of view, the number of pageviews during a day should be Poisson distributed. However, violations of the stationarity assumption usually cause other distribution types to fit pageview data much better. This paper describes a procedure for homogenising the data (with detrending techniques) that admits several distribution functions as candidates. A goodness-of-fit test selects the optimal distribution for the given dataset. Particular attention is paid to the occurrence probabilities of large numbers of pageviews on different types of slightly correlated websites. The paper furthermore presents models for forecasting the number of pageviews during the rest of the day (given the number of pageviews earlier that day) and for attaching uncertainty intervals to that forecast.

1 Introduction

Pageview statistics are useful to describe and predict the behaviour of clients on internet sites. Typical questions related to visitor behaviour concern the frequency and length of visits during a certain time period, the entrance and exit locations of visitors, the percentage of visitors who reach key pages (such as a sign-up page or cash register), the paths they take, the traffic trend, the prediction of traffic spikes, the accommodation of server space for increased traffic, the adjustment for browser technology, the evaluation of behaviour variations among subsets of customers, the change during sales, etc. However, these questions are difficult to answer because of several boundary conditions: human behaviour is highly stochastic, and data can be incomplete or noisy owing to proxy servers, firewalls, caching,

Keywords