Efficiently Structure Data Science Teams to Achieve Company Goals
Mitra Goswami (PagerDuty Senior Director Data Science) is a machine learning professional with experience working in Astrophysics, Media, Martech, and the Financial Services Industry. Her expertise and interests lie in building scalable data-centric teams, creating cost-effective data engineering solutions to empower data science and machine learning in organizations, and leading machine learning innovation driving business value.
Data science continues to be an evolving field that encourages diversity in educational backgrounds, team structures, and software development processes. Given this, there are multiple industry approaches to structuring data science teams efficiently within organizations. This is not an easy task: it requires introspection on company goals, an awareness of the data infrastructure’s maturity, and an understanding of the nuances of data science feature development. In this blog post, I will explain how to structure data science teams efficiently to deliver data science features in the product, based on these key factors.
Differences in Feature Development
So what’s the difference between typical feature development and data science feature development? The question is a natural one, because the two differ in several ways. For example, while regular feature development can be driven entirely by the engineering team with a product collaborator, data science feature development requires engineering to collaborate closely with the data science team. In most cases, a large share of the core work is driven by the data scientists.
Another differentiator is infrastructure. In the absence of proper infrastructure that supports data unification, cleaning, and mining, the data science feature development life cycle can take almost two to three times as long as a regular feature development cycle. Finally, a data science feature is often layered with complexities and has an experimental component. This type of feature typically requires customer feedback, A/B testing, etc., to bridge the gap between accuracy and ease of use for successful adoption by customers.
In light of all these collaborations between teams, the commitments on infrastructure, and the roadmap for launching data science features into the product, my questions for leaders are:
- What do you want to do with data science in your organization? Do you want to disrupt and innovate to make data science your core engine, or do you want sequential data science feature launches aligned to your product growth? Check out the centralized & embed models below.
- Where are you in the data science journey? Do you have the infrastructure (end-to-end pipeline from data storage -> cleaning -> training -> deployment) to perform machine learning, or do you have an expedient solution to run models, which can be improved by an incremental implementation of a long-term vision?
Let’s dive into the centralized & embed models below.
Embed Model
In this model, data scientists are embedded into the product teams, although data scientists from different engineering teams can also report to a central data science head. In collaboration with the data science team, each engineering manager is in charge of scoping out the requirements necessary for a data science project that the engineering team has the bandwidth to support (e.g., release in production).
If you are still in the infrastructure development phase and value sequential data science feature launches aligned to the product, the embed model is the way to go; however, single-person embed models are generally not very efficient. In most organizations, there is a mix of senior and junior data scientists and engineers. I define a senior data scientist as someone who can understand business requirements, code algorithms, and describe results in a way that tells a story. Similarly, a senior data engineer is someone who can understand the algorithm and “productionize” it, keeping in mind real-time vs. weekly/monthly training, cost efficiency, and performance.
As a rule of thumb, one senior data scientist and one senior data engineer make a good team that can work together within the embed model. What we want in the embed model is a small team that has all the aforementioned skill sets and is aligned to the engineering team and the product counterpart. With this approach, the embed team can develop the model in production and create a “handshake point” for integration with the engineering team, while the engineering team works in parallel on its end towards a successful integration.
If you are keen on increasing the velocity of data science feature development and making data science an innovation hub in the future, make sure you invest in building infrastructure with data engineers at the same time as the embed teams develop product features.
Centralized Model
In a centralized model, the data science team takes the lead, while product teams perform market research, identify big bets, and prototype the solution. Once the prototype or proof of concept is ready, the engineering team implements the prototype into the product.
If your company has the proper infrastructure, you can develop an innovation hub internally. In this scenario, a centralized model would provide you and your teams the most value. Working together as a unit, with a product counterpart collaborating on competitive research, is a dream job for almost any data scientist.
Incremental, product-driven feature development also becomes super easy at this point. However, I have also seen the centralized model with a robust architecture struggle at some organizations. Product-driven feature development requires product managers and data scientists to work in tandem to explore the possibilities and nuances of data science, and the exercise demands clarity in the roadmap for engineering and data science to collaborate on the implementation side. This is also a critical piece to ensure the success of various data science investments.
In summary, if you would like to increase your ROI on data science investments, go “EASY”:
- Establish infrastructure
- Access multiple service datasets efficiently and in a unified manner
- Socialize among the product managers & engineers and incentivize them to collaborate on market research for data science features
- Yield clarity in the organization-wide roadmap to deliver data science jointly with the engineering organization
Are you facing similar challenges in your organization? Are you able to map your location in the data science journey? Stay tuned for our next blog on data science.