Assembling an ML team

ML projects require teams whose members cover a range of skills, expertise, and responsibilities. These are the most common roles found on ML teams:

ML product manager
  Knowledge and skills: ML product managers have a deep understanding of ML strengths and weaknesses and of the ML development process. They align business problems to ML solutions by working directly with the ML team, end users, and other stakeholders. They create the product vision, define use cases and requirements, and plan and prioritize projects.
  Main deliverable: Product requirements document (PRD).

Engineering manager
  Knowledge and skills: Engineering managers achieve business goals by setting, communicating, and achieving team priorities. Like ML product managers, they align ML solutions to business problems. They set clear expectations for team members, conduct performance evaluations, and assist with career and professional development.
  Main deliverable: Design docs, project plans, and performance evaluations.

Data scientist
  Knowledge and skills: Data scientists use quantitative and statistical analysis to extract insights and value from data. They help identify and test features, prototype models, and support model interpretability.
  Main deliverable: Reports and data visualizations that answer business questions through statistical analysis.

ML engineer
  Knowledge and skills: ML engineers design, build, productionize, and manage ML models. They are strong software engineers with a deep understanding of ML technologies and best practices.
  Main deliverable: Deployed model with sufficient prediction quality to meet business goals.

Data engineer
  Knowledge and skills: Data engineers build pipelines for storing, aggregating, and processing large amounts of data. They develop the infrastructure and systems for collecting and transforming raw data into useful formats for model training and serving. Data engineers are responsible for the data across the entire ML development process.
  Main deliverable: Fully productionized data pipelines with the necessary monitoring and alerting.

Developer operations (DevOps) engineer
  Knowledge and skills: DevOps engineers develop, deploy, scale, and monitor the serving infrastructure for ML models.
  Main deliverable: An automated process for serving, monitoring, testing, and alerting on a model's behavior.

Successful ML projects have teams with each role well represented. On smaller teams, individuals may need to take on the responsibilities of multiple roles.

Establish team practices

Because the roles, tools, and frameworks in ML development vary widely, it's critical to establish common practices through excellent process documentation. For example, one engineer might assume that having the right data is sufficient to begin training a model, while a more careful engineer will first validate that the dataset is correctly anonymized and document its metadata and provenance. Making sure engineers share common definitions for processes and design patterns reduces confusion and increases the team's velocity.
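To make this concrete, a convention like "every training dataset ships with a provenance record" can be codified in a small shared utility. The following is a minimal sketch, assuming a hypothetical `DatasetMetadata` record and `validate` helper; the field names are illustrative, not part of any standard library.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DatasetMetadata:
    """Hypothetical provenance record the team agrees to ship with every dataset."""
    name: str
    source: str            # upstream table or export job the data came from
    snapshot_date: date
    anonymized: bool       # set True only after the PII scrub has been verified
    owners: list[str] = field(default_factory=list)

def validate(meta: DatasetMetadata) -> None:
    """Fail fast if a dataset is missing the metadata the team agreed on."""
    if not meta.anonymized:
        raise ValueError(f"{meta.name}: dataset has not been anonymized")
    if not meta.owners:
        raise ValueError(f"{meta.name}: dataset needs at least one owner")

# Usage: record provenance once, then validate before training starts.
meta = DatasetMetadata(
    name="clicks_2024_q1",
    source="warehouse.events.clicks",
    snapshot_date=date(2024, 3, 31),
    anonymized=True,
    owners=["data-eng@example.com"],
)
validate(meta)
```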

Process documentation

Process docs should define the tools, infrastructure, and processes the team will use for ML development. Good process docs help align new and current team members. They should answer the following types of questions:

  • How is the data generated for the model?
  • How do we examine, validate, and visualize the data?
  • How do we modify an input feature or label in the training data?
  • How do we customize the data generation, training, and evaluation pipeline?
  • How do I change the model architecture to accommodate changes in input features or labels?
  • How do we obtain testing examples?
  • What metrics will we use to judge model quality? (See the metrics sketch after this list.)
  • How do we launch our models in production?
  • How will we know if something is wrong with our model?
  • What upstream systems do our models depend on?
  • How do I make my SQL maintainable and reusable?
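For example, the answer to the metrics question above might pin down a single, shared definition of "standard metrics." The snippet below is a sketch using scikit-learn for a binary classifier; the `standard_metrics` helper and the specific metric choices are assumptions for illustration.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

def standard_metrics(y_true, y_pred, y_score):
    """Hypothetical team-wide definition of 'standard metrics'.

    Keeping one shared definition makes model comparisons apples-to-apples.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),  # uses scores, not hard labels
    }

# Usage with toy labels, hard predictions, and predicted probabilities.
print(standard_metrics([0, 1, 1, 0], [0, 1, 0, 0], [0.2, 0.9, 0.4, 0.1]))
```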

More potential questions

Model
  • Can I train models on different datasets in the same pipeline, like for fine-tuning?
  • How do I add a new test dataset to my pipeline? (See the configuration sketch after this list.)
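A process doc might answer both questions with a declarative convention, for example a configuration where adding a test dataset is a one-line change. The shape below is assumed for illustration and isn't any real framework's schema.

```python
# Hypothetical declarative pipeline config. Every team member adds an
# evaluation dataset the same way: by appending one entry to "eval".
PIPELINE_CONFIG = {
    "train": {
        "dataset": "clicks_2024_q1",
        "fine_tune_from": "clicks_2023_base",  # supports fine-tuning runs
    },
    "eval": [
        {"name": "holdout", "dataset": "clicks_2024_q1_holdout"},
        {"name": "fresh_traffic", "dataset": "clicks_2024_q2_sample"},  # new test set
    ],
}
```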

Training
  • How do I check the model's prediction on a hand-crafted example?
  • How do I find, examine, and visualize examples where the model made mistakes?
  • How do I determine which feature was most responsible for a given prediction?
  • How do I understand which features have the most impact on predictions within a given sample?
  • How do I compute or plot model predictions on a chosen dataset or sample?
  • How do I compute standard metrics for my model's predictions on a chosen dataset?
  • How do I develop and compute custom metrics?
  • How do I compare my model with other models offline? (See the evaluation sketch after this list.)
  • Can I perform meta-analysis for multiple model evaluations in a single development environment?
  • Can I compare the current model with the one from 10 months ago?
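The custom-metric and offline-comparison questions are another place where a shared convention pays off. Below is a minimal sketch, assuming a hypothetical `top_decile_lift` metric and a `compare_offline` harness built on scikit-learn; neither name is an established API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def top_decile_lift(y_true, y_score):
    """Hypothetical custom metric: positive rate in the top 10% of scores,
    relative to the overall positive rate."""
    y_true = np.asarray(y_true)
    k = max(1, len(y_score) // 10)
    top = np.argsort(y_score)[::-1][:k]  # indices of the highest-scored examples
    return y_true[top].mean() / y_true.mean()

def compare_offline(models, X, y):
    """Score several candidate models on the same held-out data."""
    return {name: top_decile_lift(y, m.predict_proba(X)[:, 1])
            for name, m in models.items()}

# Usage on synthetic data with a single baseline candidate.
X, y = make_classification(n_samples=200, random_state=0)
print(compare_offline({"baseline": LogisticRegression().fit(X, y)}, X, y))
```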

Productionization, monitoring, and maintenance
  • I think I created a good model. How can I launch it in production?
  • How do I verify that my new model is running correctly in production?
  • Can I get the history of model evaluations over time?
  • How will I know when something is wrong with the model? (See the monitoring sketch after this list.)
  • I got assigned a page or bug mentioning something about the model. What should I do?
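For the monitoring questions, the doc can spell out exactly what "wrong" means. The check below is a deliberately simple sketch that compares the live prediction mean against a training-time baseline; the baseline value, threshold, and alerting hook are all assumptions.

```python
import numpy as np

BASELINE_MEAN = 0.12    # prediction mean observed at training time (assumed)
ALERT_THRESHOLD = 0.05  # absolute drift that should page the on-call (assumed)

def check_prediction_drift(recent_predictions) -> bool:
    """Return True and alert if the live prediction mean drifts from baseline."""
    drift = abs(float(np.mean(recent_predictions)) - BASELINE_MEAN)
    if drift > ALERT_THRESHOLD:
        # In production this would page the on-call or file a bug automatically.
        print(f"ALERT: prediction mean drifted by {drift:.3f} from baseline")
        return True
    return False

check_prediction_drift([0.1, 0.3, 0.4, 0.35])  # drifts by ~0.17, triggers alert
```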

Pipelines
  • How do I customize the data generation, training, and evaluation pipeline?
  • When and how should I create a completely new pipeline?

SQL
  • I need SQL to generate some data. Where should I put it?

Infrastructure
  • How does our model serving work? Is there a diagram?
  • What upstream systems does my model depend on that I should be aware of?

Communication
  • I can't figure something out. Who should I contact, and how?

Keep in mind

What constitutes "ML best practices" can differ between companies, teams, and individuals. For example, some team members might consider experimental Colabs the main deliverable, while others prefer to work in R. Some might have a passion for software engineering, others might believe monitoring is the most important thing, and still others might know good feature productionization practices but want to use Scala. Everyone is "right" from their own perspective, and if steered correctly, the mix will be a powerhouse. If not, it can be a mess.

Establishing the tools, processes, and infrastructure the team will use before writing a line of code can be the difference between a project failing after two years and launching a quarter ahead of schedule.

Performance evaluations

Due to the ambiguity and uncertainty inherent in ML, people managers need to set clear expectations and define deliverables early.

When determining expectations and deliverables, consider how team members will be evaluated if a project or approach isn't successful. In other words, a team member's performance shouldn't be tied directly to the project's success. For example, it's not uncommon for team members to spend weeks investigating solutions that are ultimately unsuccessful. Even in these cases, their high-quality code, thorough documentation, and effective collaboration should contribute positively toward their evaluation.

Check Your Understanding

What is the primary reason for having excellent process documentation and establishing common practices?

  • Increase project velocity.
    Correct. Having good process documentation and establishing common practices reduces confusion and streamlines the development process.
  • Establish best practices across a company.
    Incorrect. Because ML development varies from project to project, teams typically establish their own sets of best practices to work effectively and increase their velocity.
  • Ensure all engineers on the team have the same level of expertise.
    Incorrect. ML teams typically have engineers with a variety of skills and knowledge. Process documentation helps engineers align on best practices to increase their velocity.