Data traps

Estimated time: 1.5 hours
Learning objectives

In this module, you will learn to:

- Investigate potential issues underlying raw or processed datasets, including collection and quality issues.
- Identify biases, invalid inferences, and rationalizations.
- Find common issues in data analysis, including correlation, relatedness, and irrelevance.
- Examine a chart for common problems, misperceptions, and misleading display and design choices.
ML motivation

While not as glamorous as model architectures and other downstream model work, data exploration, documentation, and preprocessing are critical to ML work. ML practitioners can fall into what Nithya Sambasivan et al. called [data cascades](https://research.google/blog/data-cascades-in-machine-learning/) in their [2021 ACM paper](https://dl.acm.org/doi/10.1145/3411764.3445518) if they do not deeply understand:

- the conditions under which their data is collected
- the quality, characteristics, and limitations of the data
- what the data can and can't show
It's very expensive to train models on bad data and only find out at the point of low-quality outputs that there were problems with the data. Likewise, a failure to grasp the limitations of data, human biases in collecting data, or mistaking correlation for causation can result in over-promising and under-delivering results, which can lead to a loss of trust.
This course walks through common but subtle data traps that ML and data practitioners may encounter in their work.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-07-26 (UTC).