Production ML systems: Questions to ask
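This lesson focuses on the questions you should ask about your data and model in production systems.
Is each feature helpful?
Continuously monitor your model and remove features that contribute little or nothing to its predictive ability. Every feature you keep is a dependency: if the input data for that feature abruptly changes, your model's behavior might also abruptly change in undesirable ways.
Also consider the following related question:
- Does the usefulness of the feature justify the cost of including it?
It is always tempting to add more features to the model. For example, suppose you find a new feature whose addition makes your model's predictions slightly better. Slightly better predictions certainly seem preferable to slightly worse ones; however, the extra feature adds to your maintenance burden.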
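One way to weigh a feature's usefulness against its cost is an ablation check: evaluate the model with and without each feature and see how much the metric drops. The following is a minimal sketch under stated assumptions, not a prescribed method. The nearest-centroid classifier, the tiny dataset, the function names, and the `min_gain` threshold are all hypothetical choices made for illustration.

```python
from statistics import mean


def centroid_accuracy(train, test, feature_idx):
    """Accuracy of a toy nearest-centroid classifier using only feature_idx."""
    def project(row):
        return [row[i] for i in feature_idx]

    # One centroid per label: the per-feature mean of that label's rows.
    centroids = {}
    for label in {y for _, y in train}:
        rows = [project(x) for x, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        p = project(x)
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )

    return mean(1.0 if predict(x) == y else 0.0 for x, y in test)


def ablation_report(train, test, n_features, min_gain=0.01):
    """For each feature, measure how much accuracy is lost when it is removed."""
    all_idx = list(range(n_features))
    baseline = centroid_accuracy(train, test, all_idx)
    report = {}
    for i in all_idx:
        without = [j for j in all_idx if j != i]
        gain = baseline - centroid_accuracy(train, test, without)
        # A feature is worth keeping only if removing it costs at least min_gain.
        report[i] = {"gain": gain, "keep": gain >= min_gain}
    return baseline, report


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
train = [([0.0, 1.0], 0), ([1.0, 3.0], 0), ([4.0, 2.0], 1), ([5.0, 3.0], 1)]
test = [([0.5, 3.0], 0), ([4.5, 1.0], 1), ([1.2, 1.5], 0), ([3.8, 3.0], 1)]

baseline, report = ablation_report(train, test, n_features=2)
for idx, row in report.items():
    print(f"feature {idx}: gain={row['gain']:.2f} keep={row['keep']}")
```

In this toy setup, removing the noise feature costs nothing, so it is flagged for removal. In practice the threshold would also reflect the maintenance cost of the feature, which a single accuracy number does not capture.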
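Is your data source reliable?
Some questions to ask about the reliability of your input data:
- Is the signal always going to be available, or is it coming from an unreliable source? For example:
  - Is the signal coming from a server that crashes under heavy load?
  - Is the signal coming from humans who go on vacation every August?
- Does the system that computes your model's input data ever change? If so:
  - How often?
  - How will you know when that system changes?
Consider creating your own copy of the data you receive from the upstream process. Then, advance to the next version of the upstream data only when you are certain that it is safe to do so.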
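Keeping a private copy and advancing versions deliberately can be sketched as a small snapshot store. This is an illustrative sketch only: the `SnapshotStore` class, the `PINNED.json` file name, the content-hash versioning scheme, and the header check are assumptions, not part of the lesson.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class SnapshotStore:
    """Keeps private copies of upstream data plus one pinned version."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.pin_file = self.root / "PINNED.json"

    def ingest(self, payload: bytes) -> str:
        """Store a copy keyed by content hash; a new hash signals a change."""
        version = hashlib.sha256(payload).hexdigest()[:12]
        path = self.root / f"{version}.bin"
        if not path.exists():
            path.write_bytes(payload)
        return version

    def pinned(self):
        """Return the currently pinned version, or None if nothing is pinned."""
        if not self.pin_file.exists():
            return None
        return json.loads(self.pin_file.read_text())["version"]

    def promote(self, version: str, validate) -> bool:
        """Move the pin to `version` only if the validation check passes."""
        payload = (self.root / f"{version}.bin").read_bytes()
        if not validate(payload):
            return False
        self.pin_file.write_text(json.dumps({"version": version}))
        return True

    def load_pinned(self) -> bytes:
        """Training reads only from the pinned copy, never from upstream."""
        return (self.root / f"{self.pinned()}.bin").read_bytes()


def has_expected_header(payload: bytes) -> bool:
    # Hypothetical safety check: the CSV header must match what training expects.
    lines = payload.decode("utf-8").splitlines()
    return bool(lines) and lines[0] == "user_id,clicks"


with tempfile.TemporaryDirectory() as tmp:
    store = SnapshotStore(Path(tmp) / "snapshots")
    v1 = store.ingest(b"user_id,clicks\n1,3\n")
    ok_v1 = store.promote(v1, has_expected_header)
    # Upstream silently renames its columns; the copy is stored but not promoted.
    v2 = store.ingest(b"uid,click_count\n1,3\n")
    ok_v2 = store.promote(v2, has_expected_header)
    still_pinned = store.pinned()
    training_bytes = store.load_pinned()
```

Because versions are derived from content, any upstream change produces a new version identifier, which gives one answer to "how will you know when that system changes?". A real validation step would likely check far more than the header.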
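Is your model part of a feedback loop?
Sometimes a model can affect its own training data. For example, the results from some models, in turn, become (directly or indirectly) input features to that same model.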
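A toy simulation can make this self-reinforcement concrete. The sketch below assumes a hypothetical recommender that always promotes the most-purchased item, and a hypothetical fixed sales boost for whatever it recommends. The item names and all numbers are invented for illustration.

```python
def simulate_popularity_loop(counts, rounds, organic=10, boost=5):
    """Each round, recommend the current best seller, then record new sales.

    The recommendation is computed from purchase counts, and the extra sales
    it generates are added back into those same counts.
    """
    counts = dict(counts)
    history = []
    for _ in range(rounds):
        recommended = max(counts, key=counts.get)  # model output
        for title in counts:
            counts[title] += organic               # sales that happen anyway
        counts[recommended] += boost               # sales caused by the model
        history.append((recommended, dict(counts)))
    return counts, history


start = {"book_a": 101, "book_b": 100, "book_c": 100}
final, history = simulate_popularity_loop(start, rounds=20)

gap_start = start["book_a"] - start["book_b"]
gap_final = final["book_a"] - final["book_b"]
print(f"lead of book_a over book_b: {gap_start} -> {gap_final}")
```

A one-sale head start is enough for the same item to be recommended in every round, and the gap widens by the boost each time. The growth here comes entirely from the model's own output being fed back as input, not from any difference in underlying appeal.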
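Sometimes a model can affect another model. For example, consider two models for predicting stock prices:
- Model A, which is a bad predictive model.
- Model B.
Since Model A is buggy, it mistakenly decides to buy Stock X. Those purchases drive up the price of Stock X. Model B uses the price of Stock X as an input feature, so Model B can come to false conclusions about the value of Stock X. Model B could, therefore, buy or sell shares of Stock X based on the buggy behavior of Model A. Model B's behavior, in turn, can affect Model A, possibly triggering a tulip mania (https://wikipedia.org/wiki/Tulip_mania) or a slide in Company X's stock.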
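The interaction between the two models can be sketched with a deliberately simplified price simulation. Everything here is an assumption made for illustration: the linear price impact, the rule that the buggy Model A buys one unit every step, and the rule that Model B follows the most recent price move. The sketch covers only the direction from Model A to Model B; it does not model Model B's effect back on Model A.

```python
def simulate_price(steps, with_model_b, start=100.0, impact=0.5):
    """Return the price path when Model A trades alone or alongside Model B."""
    prices = [start]
    last_change = 0.0
    for _ in range(steps):
        demand_a = 1  # buggy Model A: buys one unit no matter what
        demand_b = 0
        if with_model_b:
            # Model B treats the last price move as a signal about value.
            if last_change > 0:
                demand_b = 1
            elif last_change < 0:
                demand_b = -1
        new_price = prices[-1] + impact * (demand_a + demand_b)
        last_change = new_price - prices[-1]
        prices.append(new_price)
    return prices


a_only = simulate_price(steps=10, with_model_b=False)
a_and_b = simulate_price(steps=10, with_model_b=True)
print(f"Model A alone:     {a_only[0]} -> {a_only[-1]}")
print(f"Models A and B:    {a_and_b[0]} -> {a_and_b[-1]}")
```

Under these assumptions, Model B's purchases are caused entirely by price moves that Model A's bug created, so the combined price drifts further from its starting point than it would with Model A alone.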
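Exercise: Check your understanding
Which three of the following models are susceptible to a feedback loop?
- A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.
  Some beachgoers are likely to base their plans on the traffic forecast. If there is a large beach crowd and traffic is forecast to be heavy, many people may make alternative plans. This may depress beach turnout, resulting in a lighter traffic forecast, which then may increase attendance, and the cycle repeats.
- A book-recommendation model that suggests novels its users may like based on their popularity (that is, the number of times the books have been purchased).
  Book recommendations are likely to drive purchases, and these additional sales are fed back into the model as input, making it more likely to recommend these same books in the future.
- A university-ranking model that rates schools in part by their selectivity (the percentage of students who applied that were admitted).
  The model's rankings may drive additional interest to top-rated schools, increasing the number of applications they receive. If these schools continue to admit the same number of students, selectivity will increase (the percentage of students admitted will go down). This boosts these schools' rankings, which further increases prospective student interest, and so on.
- An election-results model that forecasts the winner of a mayoral race by surveying 2% of voters after the polls have closed.
  If the model does not publish its forecast until after the polls have closed, its predictions cannot affect voter behavior.
- A housing-value model that predicts house prices, using size (area in square meters), number of bedrooms, and geographic location as features.
  It is not possible to quickly change a house's location, size, or number of bedrooms in response to price forecasts, making a feedback loop unlikely. However, there is potentially a correlation between size and number of bedrooms (larger homes are likely to have more rooms) that may need to be teased apart.
- A face-attributes model that detects whether a person is smiling in a photo, which is regularly trained on a database of stock photography that is automatically updated monthly.
  There is no feedback loop here, as model predictions don't have any impact on the photo database. However, versioning of the input data is a concern, as these monthly updates could have unforeseen effects on the model.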
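Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-27 UTC.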