Assembling an ML team
ML projects require teams with members who have a range of skills, expertise, and responsibilities related to machine learning. These are the most common roles found on typical ML teams:
| Role | Knowledge and skills | Main deliverable |
|---|---|---|
| ML product manager | ML product managers have a deep understanding of ML strengths and weaknesses and of the ML development process. They align business problems to ML solutions by working directly with the ML team, end users, and other stakeholders. They create the product vision, define use cases and requirements, and plan and prioritize projects. | Product requirements document (PRD). |
| Engineering manager | Engineering managers achieve business goals by setting, communicating, and achieving team priorities. Like ML product managers, they align ML solutions to business problems. They set clear expectations for team members, conduct performance evaluations, and assist with career and professional development. | Design docs, project plans, and performance evaluations. |
| Data scientist | Data scientists use quantitative and statistical analysis to extract insights and value from data. They help identify and test features, prototype models, and help with model interpretability. | Reports and data visualizations that answer business questions through statistical analysis. |
| ML engineer | ML engineers design, build, productionize, and manage ML models. They are strong software engineers with a deep understanding of ML technologies and best practices. | A deployed model with sufficient prediction quality to meet business goals. |
| Data engineer | Data engineers build data pipelines for storing, aggregating, and processing large amounts of data. They develop the infrastructure and systems for collecting raw data and transforming it into useful formats for model training and serving. Data engineers are responsible for the data across the entire ML development process. | Fully productionized data pipelines with the necessary monitoring and alerting. |
| Developer operations (DevOps) engineer | DevOps engineers develop, deploy, scale, and monitor the serving infrastructure for ML models. | An automated process for serving, monitoring, testing, and alerting on a model's behavior. |
Successful ML projects have teams with each role well represented. In smaller teams, individuals will need to handle the responsibilities of multiple roles.
Establish team practices
Because the roles, tools, and frameworks vary widely in ML development, it's critical to establish common practices through excellent process documentation. For example, one engineer might think that simply getting the right data is enough to start training a model, while a more responsible engineer will validate that the dataset is anonymized correctly and document its metadata and provenance. Making sure engineers share common definitions for processes and design patterns reduces confusion and increases the team's velocity.
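To make that concrete, here is a minimal sketch of the kind of shared pre-training check a team might standardize: it records a dataset's metadata and provenance and flags column names that look like raw identifiers. Everything here (the file path, the `PII_LIKE_COLUMNS` list, and the record fields) is an illustrative assumption, not part of any particular library or tooling.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical list of column names the team treats as likely PII;
# a real team would maintain and version this in its process docs.
PII_LIKE_COLUMNS = {"email", "phone", "full_name", "ip_address"}

def record_dataset_provenance(csv_path: str, source: str, owner: str) -> dict:
    """Builds a small metadata and provenance record for a CSV training dataset."""
    path = Path(csv_path)
    header = path.read_text(encoding="utf-8").splitlines()[0]
    columns = [c.strip() for c in header.split(",")]

    # Flag columns whose names suggest raw (non-anonymized) identifiers.
    suspicious = sorted(PII_LIKE_COLUMNS.intersection(c.lower() for c in columns))

    return {
        "file": path.name,
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        "columns": columns,
        "suspicious_columns": suspicious,
        "source": source,      # where the data came from
        "owner": owner,        # who is responsible for it
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # Example usage with a hypothetical file path.
    record = record_dataset_provenance(
        "training_data.csv", source="clickstream export", owner="data-eng"
    )
    if record["suspicious_columns"]:
        raise ValueError(f"Possible non-anonymized columns: {record['suspicious_columns']}")
    print(json.dumps(record, indent=2))
```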
Process documentation
Process docs should define the tools, infrastructure, and processes the team will use for ML development. Good process docs help align new and current team members. They should answer the following types of questions:
- How is the data generated for the model?
- How do we examine, validate, and visualize the data?
- How do we modify an input feature or label in the training data?
- How do we customize the data generation, training, and evaluation pipeline?
- How do I change the model architecture to accommodate changes in input features or labels?
- How do we obtain testing examples?
- What metrics will we use to judge model quality? (A minimal sketch follows this list.)
- How do we launch our models in production?
- How will we know if something is wrong with our model?
- What upstream systems do our models depend on?
- How do I make my SQL maintainable and reusable?
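For the question about judging model quality, a process doc will often point to a small agreed-upon snippet so that every model is reported the same way. The sketch below is one possible version, assuming a binary classifier and scikit-learn; the function name and the toy inputs are illustrative placeholders.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

def standard_metrics(y_true, y_pred, y_score) -> dict:
    """Computes the metrics the team has agreed to report for every model."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),  # uses predicted probabilities
    }

# Example usage with toy labels, hard predictions, and probabilities.
print(standard_metrics(
    y_true=[0, 1, 1, 0, 1],
    y_pred=[0, 1, 0, 0, 1],
    y_score=[0.2, 0.9, 0.4, 0.1, 0.8],
))
```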
More potential questions
Model
- Can I train models on different datasets in the same pipeline, for example for fine-tuning?
- How do I add a new test dataset to my pipeline?

Training
- How do I check the model's prediction on a hand-crafted example?
- How do I find, examine, and visualize examples where the model made mistakes?
- How do I determine which feature was most responsible for a given prediction?
- How do I understand which features have the most impact on predictions within a given sample? (A small sketch follows this list.)
- How do I compute or plot model predictions on a chosen dataset or sample?
- How do I compute standard metrics for my model's predictions on a chosen dataset?
- How do I develop and compute custom metrics?
- How do I compare my model with other models offline?
- Can I perform meta-analysis of multiple model evaluations in a single development environment?
- Can I compare the current model with the one from 10 months ago?
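For the two feature-influence questions above, one answer a process doc might standardize on is permutation importance, sketched below with scikit-learn. Note that this measures influence over a chosen sample rather than attribution for a single prediction, and the model, data, and feature names are toy placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Toy data: only the first feature actually carries signal.
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0).astype(int)
feature_names = ["f0", "f1", "f2", "f3"]  # illustrative names

model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure how much the score degrades.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, importance in sorted(
    zip(feature_names, result.importances_mean), key=lambda p: -p[1]
):
    print(f"{name}: {importance:.3f}")
```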
Productionization, monitoring, and maintenance
- I think I created a good model. How can I launch it in production?
- How do I verify that my new model is running correctly in production?
- Can I get the history of model evaluations over time?
- How will I know when something is wrong with the model?
- I got assigned a page or bug mentioning something about the model. What should I do?
Pipelines
- How can I customize the data generation, training, and evaluation pipeline?
- When and how should I create a completely new pipeline?
SQL
- I need SQL to generate some data. Where should I put it?
Infrastructure
- How does our model serving work? Is there a diagram?
- What upstream systems does my model depend on that I should be aware of?
Communication
- I can't figure something out. Who (and how) should I contact?
Keep in mind
What constitutes "ML best practices" can differ between companies, teams, and individuals. For example, some team members might consider experimental Colabs the main deliverable, while others want to work in R. Some might have a passion for software engineering, someone else thinks monitoring is the most important thing, and yet another person knows good feature productionization practices but wants to use Scala. Everyone is "right" from their own perspective, and if steered correctly the mix will be a powerhouse. If not, it can be a mess.
Establishing the tools, processes, and infrastructure the team will use before writing a line of code can be the difference between the project failing after two years and successfully launching a quarter ahead of schedule.
Performance evaluations
Because of the ambiguity and uncertainty inherent in ML, people managers need to set clear expectations and define deliverables early.
When determining expectations and deliverables, consider how they'll be evaluated if a project or approach isn't successful. In other words, it's important that a team member's performance isn't tied directly to the success of the project. For example, it's not uncommon for team members to spend weeks investigating solutions that are ultimately unsuccessful. Even in these cases, their high-quality code, thorough documentation, and effective collaboration should contribute positively toward their evaluation.
Check Your Understanding
What is the primary reason for having excellent process documentation and establishing common practices?
Increase project velocity.
Correct. Good process documentation and established common practices reduce confusion and streamline the development process.
Establish best practices across a company.
Because ML development varies from project to project, teams typically establish their own sets of best practices to work effectively and increase their velocity.
Ensure all engineers on the team have the same level of expertise.
ML teams typically have engineers with a variety of skills and knowledge. Process documentation helps engineers align on best practices to increase their velocity.