
How to build a data science and machine learning roadmap in 2022




Closing the gap between their organization’s decision to invest in a data science and machine learning (DSML) strategy and business units’ need for results will dominate data and analytics leaders’ priorities in 2022. Despite growing enthusiasm for DSML’s core technologies, results from DSML strategies remain elusive for enterprises.

Market forecasts reflect enterprises’ early optimism for DSML. IDC estimates that worldwide revenues for the artificial intelligence (AI) market, including software, hardware, and services, will grow 15.2% year over year in 2021 to $341.8 billion and accelerate further in 2022 with 18.8% growth, reaching $500 billion by 2024. In addition, 56% of global enterprise executives said their adoption of DSML and AI is growing, up from 50% in 2020, according to McKinsey.

Gartner notes that organizations undertaking DSML initiatives rely on low-cost, open-source, and public cloud service provider offerings to build their knowledge and expertise and to test use cases. The remaining challenge is how best to productize models so they can be deployed and managed at scale.

DSML is delivering uneven value in enterprises today

Data science teams in financial services, health care, and manufacturing tell VentureBeat their enterprises’ DSML strategies are most effective when they anticipate and plan for uneven initial results by business unit. The teams also say producing models at scale using MLOps is fundamentally different from producing mainstream internal apps with DevOps. They add that the more complex the operating model of a business unit, the steeper the MLOps learning curve. DSML’s contributions to business units vary with the availability of reliable data and how clearly problem statements are defined.


O’Reilly found that “enterprise AI won’t have matured until development and operations groups can engage in practices like continuous deployment, until results are repeatable (at least in a statistical sense), and until ethics, safety, privacy, and security are primary rather than secondary concerns.”

Kaggle indicated that 80.3% of respondents use linear or logistic regression algorithms, followed by decision trees and random forests (74.1%) and gradient boosting machines (59.5%). Enterprises are just scratching the surface of DSML’s potential, with adoption slowed by several factors that need to improve in 2022.

How and where DSML will improve in 2022

Getting the foundational elements of a DSML platform right accelerates the accuracy, speed, and quality of decision-making. As the latest Gartner Magic Quadrant shows, DSML platform providers are making strides in providing more flexible, scalable infrastructures that have governance designed to support multiple personas’ needs at scale combined with extensibility. Enterprises that McKinsey considers to be “high performers” use cloud infrastructure much more than their peers do, with 64% of their AI workloads running on public or hybrid cloud, compared with 44% of their peers. In addition, McKinsey notes that this group relies on public cloud infrastructure to access a wider range of AI capabilities and techniques.

DSML strategies will see growing adoption across organizations in 2022. Organizations and platform providers can work together to improve outcomes by covering the following areas on their 2022 roadmaps:

  • Adaptive ML shows potential for improving cybersecurity, remote site security, quality management in manufacturing, and fine-tuning industrial robotics systems.

Look for adaptive ML to see increased adoption across use cases where contextual data, conditions, and actions change rapidly. For example, utility companies already run adaptive ML models in production that combine cyber risk and remote site risk assessments. Adaptive ML’s greatest gains could come from manufacturing, where combining telemetry data from visual IoT sensors with adaptive ML-based applications can identify defective products immediately and pull them from the production line. Saving customers the hassle of returning defective products can increase customer loyalty while reducing costs. Given the chronic labor shortage manufacturers face, combining adaptive ML techniques with robotics can help manufacturers keep meeting customers’ needs for products consistently. Adaptive ML is also the basis of self-driving vehicle systems and collaborative, smart robots that quickly learn how to complete simple tasks together through iteration. DSML platform vendors known for their adaptive ML expertise include Cogitai, Google, Guavus, IBM, Microsoft, SAS, Tazi, and others.
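To make the idea of a model that adapts as conditions change more concrete, here is a minimal sketch of a streaming defect detector. It is not how any of the vendors above implement adaptive ML; it swaps in a simple exponentially weighted baseline as a stand-in. The class name, the parameters (`alpha`, `threshold`, `warmup`), and the sensor readings are all hypothetical.

```python
import math


class AdaptiveDefectDetector:
    """Flags readings that drift far from a continuously updated baseline."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline adapts (0..1)
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # readings to observe before flagging
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, reading):
        """Score one reading, then fold it into the baseline if it looks normal."""
        self.count += 1
        if self.count == 1:
            self.mean = reading
            return False
        std = math.sqrt(self.var)
        deviation = reading - self.mean
        flagged = (
            self.count > self.warmup
            and std > 0
            and abs(deviation) / std > self.threshold
        )
        if not flagged:
            # Exponentially weighted update: recent readings count more,
            # so the baseline follows gradual changes on the line.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return flagged


# Hypothetical telemetry from one sensor; 14.0 stands in for a defective part.
readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 14.0, 10.0]
detector = AdaptiveDefectDetector()
flags = [detector.update(r) for r in readings]
```

With these made-up readings, only the 14.0 value is flagged. Flagged readings are deliberately kept out of the baseline so that a single defect does not distort what the detector considers normal.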

  • Collaborative workflow support in DSML platforms becomes table stakes for competing in the market.

Data scientists tell VentureBeat that working around DSML platforms whose collaboration workflows were not designed to flex and adapt to their needs can cost weeks of model development time. Collaboration tools and workflows need to get beyond simple question-and-answer forums and provide more effective cross-modal data and code repositories that each collaborator can securely use across an enterprise. There also needs to be support for data and model visualization and the option for exporting models. The must-haves for collaboration to meet data scientist requirements include communication and code sharing across each step in the modeling process, data lineage and model tracking, and version control and model lineage analysis. DSML platform vendors offering collaborative workflow support include Domino, Dataiku, Google, Microsoft, SAS, TIBCO, RapidMiner, and others.

  • MLOps will have a breakout year as organizations gain more experience scaling models for deployment faster while tracking business outcomes for greater results.

Reducing the cycle times for creating and launching new models is one of the key metrics by which DSML projects are evaluated in enterprises today. Every DSML platform vendor offers its version of MLOps support. Enterprises considering a DSML strategy need to review how each platform of interest handles model creation, management, maintenance, model and code reuse, updates, and governance. Look for every DSML platform vendor to continue fine-tuning its MLOps support to provide greater model scalability and security in 2022. DSML platform vendors will rely on MLOps differentiators, including model taxonomies, version control, model maintenance, monitoring, and code and model reuse. The best DSML platforms also ensure their MLOps workflows have the option of tying back to measuring business outcomes using metrics and key performance indicators (KPIs) relevant to financial decision-makers and line-of-business owners.

  • Privacy concerns will force every organization creating sensor-connected products and the services supporting them to use synthetic data to build, test, and refine models.

The current and next generation of connected devices with embedded sensors that capture biometric data pose some of the most challenging machine learning modeling problems today. Startups creating AI-based worker safety systems are finding it necessary to create and fine-tune synthetic data so they can predict, for example, when, where, and how accidents can potentially occur. The Wall Street Journal provides a fascinating glimpse into how effective synthetic data is and how pervasive it’s becoming in AI and ML model development. The article explains how American Express improves its fraud prediction models using generative adversarial networks, a much-used technique for creating synthetic data of randomized fraud patterns. Autonomous vehicle companies, including Aurora, Cruise, and Waymo, also use synthetic data to train the perception systems that guide their cars.
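As a rough illustration of what generating synthetic data means in practice, the sketch below fits simple per-column statistics to a handful of sensor readings and samples new records from them. This is a much simpler stand-in for the generative adversarial networks mentioned above: it treats each column independently and ignores correlations between them. The field names, the values, and both helper functions are hypothetical.

```python
import random
import statistics

# Hypothetical "real" readings from a wearable safety sensor (made-up values).
real_rows = [
    {"heart_rate": 72.0, "skin_temp_c": 33.1},
    {"heart_rate": 80.0, "skin_temp_c": 33.6},
    {"heart_rate": 95.0, "skin_temp_c": 34.4},
    {"heart_rate": 88.0, "skin_temp_c": 34.0},
    {"heart_rate": 76.0, "skin_temp_c": 33.3},
    {"heart_rate": 101.0, "skin_temp_c": 34.9},
]


def fit_column_stats(rows):
    """Estimate the mean and standard deviation of each numeric column."""
    stats = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        stats[col] = (statistics.mean(values), statistics.stdev(values))
    return stats


def sample_synthetic(stats, n, seed=None):
    """Draw n synthetic records, sampling each column from a normal distribution."""
    rng = random.Random(seed)
    return [
        {col: rng.gauss(mu, sigma) for col, (mu, sigma) in stats.items()}
        for _ in range(n)
    ]


stats = fit_column_stats(real_rows)
synthetic_rows = sample_synthetic(stats, n=1000, seed=42)
```

The synthetic records follow the overall distribution of the originals without copying any individual's readings, which is the property that makes the approach attractive when privacy is a concern. Production systems would need a generator that also preserves relationships between columns.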

  • DSML platform providers need to automate the entire ML workflow at scale.

Providers have multiple generations of model development tools, and their experience shows in the maturity of the workflows they can support. The goal for 2022 is to improve model deployment and management and to integrate zero trust into MLOps workflows while retaining the flexibility to customize workflows. AutoML will see greater adoption as enterprises look to accelerate their ML workflows, with data scientists skilled in its techniques in high demand. Automating ML workflows will deliver greater reusability of ML code components, trim cycle times for model testing and validation, and increase the productivity of data science teams in the process.

  • Transfer learning will see rapid adoption across enterprises with DSML strategies operating at scale and in production today.

The essence of transfer learning is reusing existing trained machine learning models to get a head start on new model development. It’s particularly useful for data science teams working with supervised machine learning algorithms that require labeled data sets to deliver accurate analyses. Instead of starting over on a new supervised machine learning model, data scientists can use transfer learning to customize models for a given business goal quickly. In addition, transfer learning modules are becoming more relevant across process-centric industries that rely on computer vision because of the scale they provide for labeled data. Leading DSML platform providers who offer transfer learning include Alteryx, Google, IBM, SAS, TIBCO, and others.
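The reuse described above can be sketched in a few lines. In this toy example, a fixed set of weights plays the role of a model already trained on some earlier task. Those weights stay frozen, and only a small new classifier "head" is trained on a handful of labeled examples for the new task. Everything here is an assumption for illustration: the weight values, the data, and the function names are hypothetical, and real projects would typically start from a pretrained network in a deep learning framework.

```python
import math

# Hypothetical "pretrained" base: fixed weights standing in for layers learned
# on a large source task. They are never updated below (frozen).
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x):
    """Frozen base: a fixed linear layer followed by tanh."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)))
        for row in PRETRAINED_WEIGHTS
    ]


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on the frozen features."""
    features = [extract_features(x) for x in samples]
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = bias + sum(w * fi for w, fi in zip(weights, f))
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * fi for w, fi in zip(weights, f)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    """Classify a new input using the frozen base plus the trained head."""
    f = extract_features(x)
    return 1 if bias + sum(w * fi for w, fi in zip(weights, f)) > 0 else 0


# Small labeled set for the new task (made-up): label 1 when x[0] > x[1].
samples = [[2.0, 1.0], [3.0, 0.0], [1.0, 2.0], [0.0, 3.0], [1.5, 0.5], [0.5, 1.5]]
labels = [1, 1, 0, 0, 1, 0]
head_weights, head_bias = train_head(samples, labels)
```

Because the base already produces useful features, six labeled examples are enough to fit the head in this toy setting. That is the head start the technique offers when labeled data is scarce.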

  • Organizations need to focus on use cases and metrics first and realize that exceptional model accuracy may not deliver business value.

One of the most common challenges when building supervised machine learning models, especially when there is an abundance of telemetry data from sensors and endpoints, is the tendency to keep tweaking models for one more degree of accuracy. Telemetry data from manufacturing shop floors can be sporadic and varies by cycle count, frequency, and the speed of a given machine, among many other factors. It’s easy to get caught up in what real-time telemetry data from the shop floor says about the machines, but the primary goal needs to stay in focus: what the data says about shop floor productivity and its impact on margins.
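A small worked example shows why the more accurate model is not always the more valuable one. The sketch below compares two hypothetical defect-detection models on the same 1,000 parts, 50 of which are defective. All counts and dollar costs are invented for illustration, with a shipped defect assumed to cost far more than a good part pulled by mistake.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_pos: int   # defects caught
    false_pos: int  # good parts wrongly pulled
    false_neg: int  # defects shipped to customers
    true_neg: int   # good parts passed

    @property
    def accuracy(self):
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return (self.true_pos + self.true_neg) / total

    def cost(self, cost_false_pos, cost_false_neg):
        """Total cost of the model's mistakes under the given unit costs."""
        return self.false_pos * cost_false_pos + self.false_neg * cost_false_neg


# Hypothetical results for two models on the same 1,000 parts.
model_a = ConfusionCounts(true_pos=20, false_pos=5, false_neg=30, true_neg=945)
model_b = ConfusionCounts(true_pos=45, false_pos=60, false_neg=5, true_neg=890)

# Assumed unit costs: $10 per good part pulled, $500 per defect shipped.
cost_a = model_a.cost(cost_false_pos=10, cost_false_neg=500)
cost_b = model_b.cost(cost_false_pos=10, cost_false_neg=500)
```

Under these assumptions, model A is 96.5% accurate but its mistakes cost $15,050, while model B is 93.5% accurate and its mistakes cost $3,100. The model with lower accuracy does more for margins because it misses far fewer of the expensive errors.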

DSML strategies must be grounded on business outcomes

Organizations pursuing DSML strategies need to go into 2022 with a clear roadmap of what they want to accomplish from a business case perspective first, anchored in measurable customer outcomes. The speed and variety of innovations that DSML platform providers plan to announce in the next twelve months will revolve around five key areas. First, providers will democratize ML model creation, making model building and fine-tuning available to more business professionals. Second, DSML platforms’ multi-persona support will improve in the next twelve months, further supporting greater adoption. Third, automating ML workflows end-to-end will help accelerate MLOps cycles in 2022, driving the fourth area: improved line-of-business reporting tied to model performance. Fifth, enterprises want much faster time-to-value for their DSML investments, and DSML platform vendors will need to quantify their value with greater precision and real-time insights to hold onto customers and attract new ones.
