[null,null,["最終更新日 2025-02-25 UTC。"],[[["\u003cp\u003eK-means clustering is generally efficient and useful for large datasets, but it has drawbacks regarding its sensitivity to initial centroid values and difficulty handling varying data densities and outliers.\u003c/p\u003e\n"],["\u003cp\u003eGeneralizing k-means can improve performance on complex datasets with varying cluster characteristics, though this requires more advanced techniques.\u003c/p\u003e\n"],["\u003cp\u003eChoosing the optimal number of clusters (k) remains a manual process and significantly impacts the results.\u003c/p\u003e\n"],["\u003cp\u003eHigh-dimensional data can pose challenges for k-means due to the "curse of dimensionality," which can be mitigated using dimensionality reduction techniques like PCA and spectral clustering.\u003c/p\u003e\n"],["\u003cp\u003eOutliers can distort k-means results, suggesting pre-processing steps like outlier removal or clipping for improved performance.\u003c/p\u003e\n"]]],[],null,["# Advantages and disadvantages of k-means\n\nK-means is useful and efficient in many machine learning contexts, but has\nsome distinct weaknesses.\n\nAdvantages of k-means\n---------------------\n\nRelatively simple to implement.\n\nScales to large data sets.\n\nAlways converges.\n\nAllows warm-starting the positions of centroids.\n\nSmoothly adapts to new examples.\n\nCan be generalized to clusters of different shapes and sizes, such as elliptical clusters.\n\n### Generalizing k-means\n\nA straightforward implementation of k-means can struggle with clusters of\ndifferent densities and sizes. The left side of Figure 1 shows the clusters\nwe'd expect to see, while the right side shows the clusters proposed by k-means.\n**Figure 1: Ungeneralized k-means example.**\n\nFor better performance on imbalanced clusters like the ones shown in Figure 1,\nyou can generalize, that is, adapt, k-means. Figure 2 shows three different\ndatasets clustered with two different generalizations. The first dataset shows\nk-means without generalization, while the second and third allow for clusters to\nvary in width.\n**Figure 2: k-means clustering with and without generalization.**\n\nThis course doesn't cover how to generalize k-means, but those interested\nshould see [Clustering -- k-means Gaussian mixture\nmodels](http://www.cs.cmu.edu/%7Eguestrin/Class/10701-S07/Slides/clustering.pdf)\nby Carlos Guestrin from Carnegie Mellon University.\n\nDisadvantages of k-means\n------------------------\n\n\\\\(k\\\\) must be chosen manually.\n\nResults depend on initial values.\n\nFor low \\\\(k\\\\), you can mitigate this dependence by running k-means several\ntimes with different initial values and picking the best result. As \\\\(k\\\\)\nincreases, you need **k-means seeding** to pick better initial\ncentroids For a full discussion of k-means seeding, see\n[\"A Comparative\nStudy of Efficient Initialization Methods for the K-means Clustering\nAlgorithm,\"](https://arxiv.org/abs/1209.1960) by M. Emre Celebi, Hassan A. Kingravi, and Patricio A. Vela.\n\nDifficulty clustering data of varying sizes and densities without generalization.\n\nDifficulty clustering outliers.\n\nCentroids can be dragged by outliers, or outliers might get their own cluster\ninstead of being ignored. Consider removing or clipping outliers before\nclustering.\n\nDifficulty scaling with number of dimensions.\n\nAs the number of dimensions in the data increases, a distance-based similarity\nmeasure converges to a constant value between any given examples. 
**Difficulty clustering data of varying sizes and densities without
generalization.**

**Difficulty clustering outliers.**

Centroids can be dragged by outliers, or outliers might get their own cluster
instead of being ignored. Consider removing or clipping outliers before
clustering.

**Difficulty scaling with number of dimensions.**

As the number of dimensions in the data increases, a distance-based similarity
measure converges to a constant value between any given pair of examples.
Reduce dimensionality either by using
[**PCA**](https://wikipedia.org/wiki/Principal_component_analysis)
on the feature data or by using **spectral clustering** to modify the
clustering algorithm.

### Curse of dimensionality and spectral clustering

In the three plots of Figure 3, notice how, as dimensions increase, the
standard deviation in distance between examples shrinks relative to the mean
distance between examples. This convergence means that k-means becomes less
effective at distinguishing between examples as the dimensionality of the data
increases. This is referred to as the **curse of dimensionality**.

**Figure 3: A demonstration of the curse of dimensionality. Each plot shows the
pairwise distances between 200 random points.**

You can avoid this diminishment in performance with **spectral clustering**,
which adds pre-clustering steps to the algorithm. To perform spectral
clustering:

1. Reduce the dimensionality of feature data by using PCA.
2. Project all data points into the lower-dimensional subspace.
3. Cluster the data in this subspace using your chosen algorithm.

A minimal code sketch of these steps appears after the key terms below.

See [*A Tutorial on Spectral
Clustering*](https://github.com/petermartigny/Advanced-Machine-Learning/blob/master/DataLab2/Luxburg07_tutorial_4488%5B0%5D.pdf)
by Ulrike von Luxburg for more information on spectral clustering.

| **Key terms:**
|
| - spectral clustering
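As a rough illustration of the three steps above, the sketch below reduces
high-dimensional feature data with PCA and then clusters in the projected
subspace, assuming scikit-learn (the library, the synthetic data, and the
chosen dimensions are assumptions for illustration). Note that classical
spectral clustering, as described in von Luxburg's tutorial, embeds the data
using eigenvectors of a graph Laplacian rather than PCA; scikit-learn also
provides a `SpectralClustering` estimator for that variant.

```python
# Illustrative sketch only; assumes scikit-learn and a synthetic dataset.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# High-dimensional feature data, where distance-based similarity degrades.
X, _ = make_blobs(n_samples=500, centers=4, n_features=100, random_state=0)

# Steps 1 and 2: reduce dimensionality with PCA and project every point
# into the lower-dimensional subspace.
X_low = PCA(n_components=10, random_state=0).fit_transform(X)

# Step 3: cluster in the subspace with your chosen algorithm (k-means here).
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_low)
print(labels[:20])
```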