Data Set Generators - Validation On Artificial Data

ClustEval can generate new data sets from archetypes of data sets, so-called data set generators.

Each generator exposes a set of parameters that the user can set and that determine in detail how an instance of the data set is generated. For instance, every data set generator must have a parameter for the number of objects the resulting data set should contain. Another parameter, which some generators provide, is the dimensionality of the data set.
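To make the split between mandatory and generator-specific parameters concrete, here is a minimal Python sketch. It is not ClustEval code: the classes `GeneratorParams` and `GaussianBlobParams`, the function `generate_gaussian_blobs`, and the Gaussian-blob data model are all hypothetical. The shared base holds the object count that every generator needs, and one illustrative generator adds a dimensionality and a cluster count.

```python
import random
from dataclasses import dataclass


@dataclass
class GeneratorParams:
    """Parameters shared by every (hypothetical) generator."""
    num_objects: int  # mandatory for all generators: size of the data set
    seed: int = 0     # fixes the random numbers for reproducible output


@dataclass
class GaussianBlobParams(GeneratorParams):
    """Additional parameters of one hypothetical generator."""
    dimensionality: int = 2  # number of features per object
    num_clusters: int = 3


def generate_gaussian_blobs(params: GaussianBlobParams) -> list[list[float]]:
    """Draw num_objects points around num_clusters random centers."""
    if params.num_objects < 1 or params.dimensionality < 1 or params.num_clusters < 1:
        raise ValueError("num_objects, dimensionality and num_clusters must be positive")
    rng = random.Random(params.seed)
    centers = [[rng.uniform(-10.0, 10.0) for _ in range(params.dimensionality)]
               for _ in range(params.num_clusters)]
    # Objects are assigned to centers round-robin, with unit Gaussian noise.
    return [[rng.gauss(mu, 1.0) for mu in centers[i % params.num_clusters]]
            for i in range(params.num_objects)]
```

For example, `generate_gaussian_blobs(GaussianBlobParams(num_objects=100, dimensionality=5))` yields 100 five-dimensional objects. Because the seed is a parameter, the same settings always reproduce the same data set.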

Depending on its implementation, a generator can also generate a gold standard corresponding to the generated data set.

Furthermore, a generator should create appropriate configuration files for the generated data set and, if it produced one, for the gold standard.
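The outputs described above (a data set, an optional gold standard, and a configuration file for each) can be sketched in Python as follows. This is an illustration under stated assumptions, not ClustEval's actual file handling: the function `write_generated_files`, the tab-separated file layouts, the `.dsconfig`/`.gsconfig` file names, and the config keys are hypothetical placeholders.

```python
import os


def _write_config(path, entries):
    """Write `key = value` lines to path and return the path."""
    with open(path, "w") as f:
        for key, value in entries.items():
            f.write(f"{key} = {value}\n")
    return path


def write_generated_files(objects, labels, out_dir, name):
    """Write a generated data set, its gold standard (if labels is not None),
    and one configuration file for each. Returns a dict of written paths.
    File layouts and config keys are illustrative, not ClustEval's format."""
    os.makedirs(out_dir, exist_ok=True)
    written = {}

    # Data set: one object per line, an id followed by tab-separated values.
    data_path = os.path.join(out_dir, f"{name}.txt")
    with open(data_path, "w") as f:
        for i, obj in enumerate(objects):
            f.write("\t".join([f"id{i}"] + [repr(x) for x in obj]) + "\n")
    written["dataset"] = data_path
    written["dataset_config"] = _write_config(
        os.path.join(out_dir, f"{name}.dsconfig"),
        {"datasetName": name, "datasetFile": os.path.basename(data_path)},
    )

    # Gold standard is optional: only written when the generator produced labels.
    if labels is not None:
        if len(labels) != len(objects):
            raise ValueError("need exactly one label per object")
        gs_path = os.path.join(out_dir, f"{name}_gs.txt")
        with open(gs_path, "w") as f:
            for i, label in enumerate(labels):
                f.write(f"id{i}\t{label}\n")
        written["goldstandard"] = gs_path
        written["goldstandard_config"] = _write_config(
            os.path.join(out_dir, f"{name}.gsconfig"),
            {"goldstandardName": name, "goldstandardFile": os.path.basename(gs_path)},
        )
    return written
```

Passing `labels=None` models a generator that cannot produce a gold standard. In that case only the data set and its configuration file are written.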

For a list of all available generators, head over here.

See Writing Data Set Generators for more information on how to extend ClustEval with new generators.