c++ - How to create n-dimensional test data for cluster analysis?
I'm working on a C++ implementation of k-means and therefore need n-dimensional test data. For the beginning, 2D points are sufficient, since they can be visualized in a 2D image, but I'd prefer a general approach that supports n dimensions.
There was an answer here on Stack Overflow that proposed concatenating sequential vectors of random numbers with different offsets and spreads, but I'm not sure how to create those without including a 3rd-party library.
Below is the method declaration I have so far; it contains the parameters that should vary. The signature can be changed if necessary, with the exception of data, which needs to be a pointer type since I'm using OpenCL.
auto populatetestdata(float** data, uint8_t dimension, uint8_t clusters, uint32_t elements) -> void;
Another problem that came to mind is the efficient detection/avoidance of collisions when generating random numbers. Couldn't that become a performance bottleneck, e.g. if one is generating 100k numbers in a domain of 1M values, i.e. if the ratio between the number of generated values and the size of the number space isn't small enough?
Question: How can I efficiently create n-dimensional test data for cluster analysis? What concepts do I need to follow?
It's possible to use the C++11 (or Boost) random facilities to create clusters, but it's a bit of work.
1. std::normal_distribution can generate univariate normal distributions with 0 mean.
2. Using 1., you can sample a normal vector (just create an n-dimensional vector of such samples).
3. If you take a vector n from 2. and output A n + b, you've moved the center to b and modified the shape by A. (In particular, for 2 and 3 dimensions it's easy to build A as a rotation matrix.) So, repeatedly sampling 2. and performing this transformation gives you a sample centered at b.
4. Choose k pairs of A, b, and generate your k clusters.
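The steps above can be sketched roughly as follows, using only the standard library. This is a minimal sketch, not a drop-in replacement for the declaration in the question: the names ClusterSpec and generateClusters, the row-major layout of A, the flat point-after-point output layout, and the explicit seed parameter are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical description of one cluster: every point is y = A * n + b,
// where n is a vector of independent standard normal samples.
struct ClusterSpec {
    std::vector<float> A;  // dim x dim matrix, stored row-major
    std::vector<float> b;  // cluster center, length dim
};

// Generates elementsPerCluster points for every spec and returns them in one
// flat buffer, point after point (specs.size() * elementsPerCluster * dim floats).
std::vector<float> generateClusters(const std::vector<ClusterSpec>& specs,
                                    std::size_t dim,
                                    std::size_t elementsPerCluster,
                                    unsigned seed)
{
    std::mt19937 rng(seed);
    std::normal_distribution<float> stdNormal(0.0f, 1.0f);  // mean 0, stddev 1

    std::vector<float> out;
    out.reserve(specs.size() * elementsPerCluster * dim);
    std::vector<float> n(dim);

    for (const ClusterSpec& spec : specs) {
        for (std::size_t e = 0; e < elementsPerCluster; ++e) {
            // Sample a standard normal vector n.
            for (float& x : n) {
                x = stdNormal(rng);
            }
            // Apply the affine transformation y = A * n + b, row by row.
            for (std::size_t r = 0; r < dim; ++r) {
                float y = spec.b[r];
                for (std::size_t c = 0; c < dim; ++c) {
                    y += spec.A[r * dim + c] * n[c];
                }
                out.push_back(y);
            }
        }
    }
    return out;
}
```

Because the result is one contiguous buffer, out.data() yields a plain float pointer, which may be easier to hand to OpenCL than a float**. Since the coordinates are continuous floats, exact duplicates are unlikely, so no explicit collision check is done here.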
Notes
You can generate different clustering scenarios by using different types of A matrices. E.g., if A is a non-length-preserving matrix multiplied by a rotation matrix, you can get "paraboloid" clusters (it's interesting to make them wider along the vectors connecting the centers).
You can either hardcode the "center" vectors b, or generate them using a distribution like the one used for the n vectors above (perhaps a uniform one, though).
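As a rough illustration of the two notes above, the sketch below builds a 2D matrix A as a rotation multiplied by a non-uniform scaling, and draws cluster centers from a uniform distribution. The helper names makeStretchedRotation and makeUniformCenters, the row-major 2x2 layout, and the choice of one common range [lo, hi) for every coordinate are assumptions for illustration only.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Builds A = R(theta) * diag(sx, sy) as a row-major 2x2 matrix.
// With sx != sy the cluster is stretched along one axis and then rotated.
std::array<float, 4> makeStretchedRotation(float theta, float sx, float sy)
{
    const float c = std::cos(theta);
    const float s = std::sin(theta);
    return { c * sx, -s * sy,
             s * sx,  c * sy };
}

// Draws `clusters` center vectors of length `dim`, each coordinate uniform
// in [lo, hi). The result is flat: center after center.
std::vector<float> makeUniformCenters(std::size_t clusters, std::size_t dim,
                                      float lo, float hi, std::mt19937& rng)
{
    std::uniform_real_distribution<float> coord(lo, hi);
    std::vector<float> centers(clusters * dim);
    for (float& v : centers) {
        v = coord(rng);
    }
    return centers;
}
```

Uniformly drawn centers can land close together, so for well-separated clusters it may be simpler to hardcode b or to redraw centers that fall too near an existing one.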