Description

We created the data by sampling and processing the www.microsoft.com logs. The data records the use of www.microsoft.com by 38000 anonymous, randomly-selected users. For each user, the data lists all the areas of the web site (Vroots) that user visited in a one week timeframe. Users are identified only by a sequential number, for example, User #14988, User #14989, etc. The file contains no personally identifiable information. The 294 Vroots are identified by their title (e.g. "NetShow for PowerPoint") and URL (e.g. "/stream"). The data comes from one week in February, 1998.

Related Papers

  • Kristin P. Bennett and Erin J. Bredensteiner. Geometry in Learning. Department of Mathematical Sciences Rensselaer Polytechnic Institute. [link]
  • W. Nick Street and Yoo-Hyon Kim. A streaming ensemble algorithm (SEA) for large-scale classification. KDD. 2001. [link]
  • Dmitry Pavlov and Darya Chudova and Padhraic Smyth. Towards scalable support vector machines using squashing. KDD. 2000. [link]
  • Dmitry Pavlov and Jianchang Mao and Byron Dom. Scaling-Up Support Vector Machines Using Boosting Algorithm. ICPR. 2000. [link]
  • [link]
  • [link]

Related datasets