WEBSRC401 Dataset


WEBSRC401 is a search results clustering (SRC) dataset based on the ClueWeb09 Category B text collection (CCB) and the TREC Web Track 2012.
Instead of retrieving relevant Web pages, we aim to obtain relevant clusters. To this end, we transformed the data available in the TREC Web Track 2012 into a typical SRC format through the following steps:
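
As a hedged illustration only, and not the authors' actual procedure, here is one plausible reading of this transformation. It assumes the TREC Web Track 2012 diversity relevance judgments (qrels), which have one judgment per line: topic number, subtopic number, document ID, and relevance grade. Each topic's relevant documents are grouped by subtopic, so that every subtopic becomes one gold-standard cluster. The function name `qrels_to_clusters` and the `min_judgment` threshold are hypothetical and are not part of the WEBSRC401 release.

```python
from collections import defaultdict


def qrels_to_clusters(lines, min_judgment=1):
    """Group each topic's relevant documents by subtopic (hypothetical helper).

    Assumes TREC diversity qrels lines of the form
    "topic subtopic docid judgment". Documents whose judgment is below
    min_judgment (e.g. 0 = not relevant, negative = spam/junk) are dropped.
    Returns {topic: {subtopic: [docid, ...]}}.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, subtopic, docid, judgment = parts
        if int(judgment) >= min_judgment:
            # each subtopic of a topic acts as one reference cluster
            clusters[topic][subtopic].append(docid)
    # convert the nested defaultdicts into plain dicts
    return {topic: dict(subs) for topic, subs in clusters.items()}


if __name__ == "__main__":
    # hypothetical sample lines; real document IDs are ClueWeb09 identifiers
    sample = [
        "151 1 docA 1",
        "151 2 docB 2",
        "151 1 docC 0",
        "152 1 docD 1",
    ]
    print(qrels_to_clusters(sample))
```

Under this reading, a document judged relevant to several subtopics of the same topic appears in several clusters, so the resulting reference clustering may overlap.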

WEBSRC401 is released as a package that contains five files and one folder:


Evaluation Tools

To replicate the experiments, we suggest the following evaluation tools:


Paper