WEBSRC401 Dataset


Description

WEBSRC401 is a search results clustering (SRC) dataset based on the ClueWeb09 Category B text collection (CCB) and the TREC Web Track 2012.
Rather than retrieving relevant Web pages, we are interested in obtaining relevant clusters. We therefore transformed the data available in the TREC Web Track 2012 into a typical SRC format, following these steps:

Content

WEBSRC401 is released as a package that contains five files and one folder:

Download

WEBSRC401.zip

Evaluation Tools

To replicate the experiments, we suggest the following evaluation tools:

Paper