ee.Clusterer.wekaKMeans

Cluster data using the k-means algorithm. Either the Euclidean distance (default) or the Manhattan distance can be used. If the Manhattan distance is used, centroids are computed as the component-wise median rather than the mean. For more information see:
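
The difference between the two centroid rules can be illustrated with a minimal pure-Python sketch. It is not the Weka or Earth Engine implementation; the `centroid` helper and the sample points are hypothetical and only show the component-wise mean versus the component-wise median.

```python
from statistics import mean, median


def centroid(points, distance_function="Euclidean"):
    """Component-wise centroid of a cluster (hypothetical helper).

    Uses the mean for Euclidean distance and the median for Manhattan
    distance, as described in the text above.
    """
    aggregate = median if distance_function == "Manhattan" else mean
    # zip(*points) groups the values of each component (column) together.
    return tuple(aggregate(column) for column in zip(*points))


# Illustrative cluster with one outlying value in the first component.
cluster = [(1.0, 2.0), (2.0, 4.0), (9.0, 3.0)]

print(centroid(cluster))               # (4.0, 3.0)
print(centroid(cluster, "Manhattan"))  # (2.0, 3.0)
```

In this example the median-based centroid is pulled less by the outlying value 9.0 than the mean-based one.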

D. Arthur, S. Vassilvitskii: k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 1027-1035, 2007.
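
The seeding idea from the cited paper (selected here with init = 1) can be sketched as follows. This is a simplified illustration under stated assumptions, not the code Earth Engine or Weka runs: the function names are hypothetical, squared Euclidean distance is assumed, and the input is assumed to contain at least k distinct points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=10):
    """Pick k initial centers using k-means++ style seeding (sketch).

    Assumes `points` holds at least k distinct points; otherwise all
    weights can become zero and random.choices raises ValueError.
    """
    rng = random.Random(seed)
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest
        # center chosen so far; already-chosen points get weight 0.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = kmeans_pp_seeds(points, k=2)
print(seeds)
```

Points far from the existing centers are more likely to be picked next, which tends to spread the initial centers across the data.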

| Usage | Returns |
| --- | --- |
| `ee.Clusterer.wekaKMeans(nClusters, init, canopies, maxCandidates, periodicPruning, minDensity, t1, t2, distanceFunction, maxIterations, preserveOrder, fast, seed)` | Clusterer |

| Argument | Type | Details |
| --- | --- | --- |
| `nClusters` | Integer | Number of clusters. |
| `init` | Integer, default: 0 | Initialization method to use: 0 = random, 1 = k-means++, 2 = canopy, 3 = farthest first. |
| `canopies` | Boolean, default: false | Use canopies to reduce the number of distance calculations. |
| `maxCandidates` | Integer, default: 100 | Maximum number of candidate canopies to retain in memory at any one time when using canopy clustering. The T2 distance, together with data characteristics, determines how many candidate canopies are formed before periodic and final pruning are performed, which might result in excess memory consumption. This setting avoids large numbers of candidate canopies consuming memory. |
| `periodicPruning` | Integer, default: 10000 | How often to prune low-density canopies when using canopy clustering. |
| `minDensity` | Integer, default: 2 | Minimum canopy density, when using canopy clustering, below which a canopy will be pruned during periodic pruning. |
| `t1` | Float, default: -1.5 | The T1 distance to use when using canopy clustering. A value < 0 is taken as a positive multiplier for T2. |
| `t2` | Float, default: -1 | The T2 distance to use when using canopy clustering. Values < 0 cause a heuristic based on attribute standard deviation to be used. |
| `distanceFunction` | String, default: "Euclidean" | Distance function to use. Options are: Euclidean and Manhattan. |
| `maxIterations` | Integer, default: null | Maximum number of iterations. |
| `preserveOrder` | Boolean, default: false | Preserve order of instances. |
| `fast` | Boolean, default: false | Enables faster distance calculations, using cut-off values. Disables the calculation/output of squared errors/distances. |
| `seed` | Integer, default: 10 | The randomization seed. |
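
A call from the Earth Engine Python client might look like the sketch below. The keyword names follow the arguments listed above; the chosen values, the `KMEANS_ARGS` dictionary, and the `make_clusterer` helper are illustrative assumptions. The `ee` module is passed in as a parameter so that the block can be read and run without an authenticated Earth Engine session.

```python
# Illustrative argument values; only the names come from the argument list.
KMEANS_ARGS = {
    "nClusters": 5,
    "init": 1,                        # 1 = k-means++ initialization
    "distanceFunction": "Manhattan",  # centroids become component-wise medians
    "maxIterations": 50,
    "seed": 10,
}


def make_clusterer(ee_module, args=None):
    """Build a k-means clusterer (hypothetical helper).

    `ee_module` is expected to be the imported `ee` package after
    authentication and initialization have been completed.
    """
    if args is None:
        args = KMEANS_ARGS
    # Arguments that are not supplied keep their documented defaults.
    return ee_module.Clusterer.wekaKMeans(**args)
```

Assuming an initialized session (`import ee` followed by `ee.Initialize()`), `make_clusterer(ee)` returns an untrained Clusterer. It would then typically be trained on a feature collection with `train(...)` before being applied to an image or collection; the training data is not shown here.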