Numerical data: Conclusion
A machine learning (ML) model's health is determined by its data. Feed your model healthy data and it will thrive; feed your model junk and its predictions will be worthless.
Best practices for working with numerical data:
- Remember that your ML model interacts with the data in the feature vector, not the data in the dataset.
- Normalize most numerical features.
- If your first normalization strategy doesn't succeed, consider a different way to normalize your data.
- Binning, also referred to as bucketing, is sometimes better than normalizing.
- Considering what your data *should* look like, write verification tests to validate those expectations. For example:
  - The absolute value of latitude should never exceed 90. You can write a test to check whether a latitude value greater than 90 appears in your data.
  - If your data is restricted to the state of Florida, you can write tests to check that the latitudes fall between 24 and 31, inclusive.
- Visualize your data with scatter plots and histograms. Look for anomalies.
- Gather statistics not only on the entire dataset but also on smaller subsets of the dataset. Aggregate statistics sometimes obscure problems in smaller sections of a dataset.
- Document all your data transformations.
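As a minimal sketch of the normalization, binning, and verification-test practices above, using the Florida latitude example. The dataset, column names, and bin boundaries are illustrative, not from the original module:

```python
import pandas as pd

# Hypothetical dataset of Florida latitudes (column name is illustrative).
df = pd.DataFrame({"latitude": [25.8, 28.5, 30.3, 26.1, 27.9]})

# Z-score normalization: rescale the feature to mean 0, standard deviation 1.
mean = df["latitude"].mean()
std = df["latitude"].std()
df["latitude_z"] = (df["latitude"] - mean) / std

# Binning (bucketing): replace the raw value with a discrete bucket index.
# pd.cut uses right-inclusive intervals: (24, 26], (26, 28], (28, 30], (30, 32].
bins = [24, 26, 28, 30, 32]
df["latitude_bin"] = pd.cut(df["latitude"], bins=bins, labels=False)

# Verification test: Florida latitudes should fall between 24 and 31, inclusive.
assert df["latitude"].between(24, 31).all(), "latitude out of expected range"
```

In a real pipeline, the normalization statistics (mean and standard deviation) would be computed on the training set and reused unchanged at serving time, so the model sees features scaled the same way in both places.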
Data is your most valuable resource, so treat it with care.
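The subset-statistics advice above can also be sketched briefly. In this hypothetical example (the `region` column and readings are invented for illustration), the overall mean looks plausible while per-subset means expose a broken slice of the data:

```python
import pandas as pd

# Hypothetical sensor readings: the "west" subset is clearly broken (all zeros),
# but the aggregate mean alone doesn't make that obvious.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "west", "west"],
    "reading": [10.2, 9.8, 11.1, 10.7, 0.0, 0.0],
})

overall_mean = df["reading"].mean()               # aggregate statistic
per_region = df.groupby("region")["reading"].mean()  # per-subset statistics

print(overall_mean)   # looks like a reasonable value
print(per_region)     # the zeroed-out "west" subset stands out immediately
```

The same idea extends to other statistics (standard deviation, missing-value counts, min/max) computed per subset rather than only globally.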
Additional information

- The *Rules of Machine Learning* guide contains a valuable Feature Engineering section.

What's next
Congratulations on finishing this module!
We encourage you to explore the various MLCC modules at your own pace and interest. If you'd like to follow a recommended order, we suggest that you move to the following module next: Representing categorical data.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-11-10 (UTC).