Bench Talk for Design Engineers | The Official Blog of Mouser Electronics


Moving Your ML Proof of Concept to Production Part 6: Ensuring Smooth Sailing Post-Production

By Becks Simpson

(Source: TensorSpark/stock.adobe.com); generated with AI

After fortifying the machine learning (ML) proof of concept (POC) and launching it into production, the next important steps are planning for its future and ensuring its continued success. Although the bulk of the effort and resources for a project like this will go into the first steps that we covered previously in the series—including determining metrics and objectives, building the dataset, establishing an experiment environment, and developing and deploying the POC model and code—more tasks remain after production.

This final blog covers what happens at the “end,” once the project might seem finished. As the model begins to be used, various elements should be tracked, such as input and output data, performance, and relevant metrics like latency or compute usage. These help identify performance issues or data drift, which can be corrected, when necessary, by retraining and redeploying the model once it is established that the new version performs as well as or better than previous ones. This blog highlights important elements to track post-production, some useful tools for this type of monitoring, and strategies for deciding when and how to upgrade model versions and launch them successfully.

Tracking Models

Once a model is in production, as with any piece of software, the first question is what to monitor, particularly in terms of performance and failures. With ML, however, monitoring goes further than collecting information about resource usage or latency. Post-production ML monitoring that goes beyond these low-level metrics is termed “observability” and is vital for determining not only when and how a model has gone astray but also how to fix it. Tracking for ML observability gives developers the ability to probe the underlying reasons why a model isn’t performing as expected in production. In addition to identifying model drift (when the model’s performance gradually worsens over time), observability tracking can help uncover causes, in particular data drift, which occurs when the input data in production no longer match the data on which the model was trained. For example, a vision model trained to extract information from scanned documents in a fixed format may suddenly fail because newer documents use a different layout.
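As a sketch of how data drift might be caught in practice, a simple per-feature check compares a recent window of production inputs against the training distribution using a two-sample Kolmogorov-Smirnov test (one common choice among many). The window sizes, threshold, and synthetic data below are illustrative assumptions, not part of any particular platform.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(train_values, prod_values, alpha=0.05):
    """Flag drift when a two-sample Kolmogorov-Smirnov test finds the
    production distribution significantly different from training."""
    statistic, p_value = ks_2samp(train_values, prod_values)
    return bool(p_value < alpha)

# Hypothetical windows: training reference vs. recent production inputs
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=5000)
prod_shifted = rng.normal(0.8, 1.0, size=5000)  # population has changed

drifted = detect_feature_drift(train, prod_shifted)
```

Running a check like this per feature on a schedule (say, daily) turns drift from a surprise into an alert that can trigger investigation or retraining.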

Beyond the data collected for observability and model troubleshooting, there are also lower-level software-type metrics and higher-level metrics based on business key performance indicators (KPIs). Examples of each type include the following:

  • Software health—disk usage, error rates, memory, latency, and compute usage
  • Business value—linked to initial metrics chosen at the start of the POC development (e.g., number of purchases, loan approval rates, and cost savings)
  • ML observability—typically data-related metrics (e.g., percentage of missing values, type mismatches, or changes in value distributions) or model-related metrics (e.g., precision and recall for classification, mean absolute error or root mean squared error for regression, or top-k accuracy for ranking)

Note that to track model performance or accuracy, the true label or "correct answer" for what the model was trying to predict is also needed. Hence, incorporating a method for users to rate or correct the model's output is vital for capturing that information and ensuring that decisions about model retraining and redeployment can be made.
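As an illustration, once user confirmations or corrections are logged alongside each prediction, performance metrics can be recomputed directly from those pairs. This minimal sketch assumes a binary classifier and a hypothetical log of (prediction, corrected_label) tuples; real systems would read these from a feedback store.

```python
from collections import Counter

def precision_recall(logged_pairs):
    """Compute precision and recall for a binary classifier from logged
    (prediction, user_corrected_label) pairs collected in production."""
    counts = Counter(logged_pairs)
    tp = counts[(1, 1)]  # model said 1, user confirmed 1
    fp = counts[(1, 0)]  # model said 1, user corrected to 0
    fn = counts[(0, 1)]  # model said 0, user corrected to 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical feedback log
feedback = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (1, 1)]
p, r = precision_recall(feedback)  # 0.75, 0.75
```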

This may seem like a lot to track post-production, but fortunately these monitoring and observability features no longer need to be built from scratch. Solutions exist at every level of monitoring. For software-type metrics, tools like Grafana or Datadog integrate easily and collect the necessary data, typically with a helpful user interface on top. Other platforms, such as Neptune, Evidently AI, and Arize, cover the more complex ML observability metrics, from tracking model performance to providing features that help uncover issues like data drift.
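Before reaching for a full platform, the software-health side can start as simply as wrapping each model call to record latency and errors; tools like Grafana or Datadog can then ingest such logs or metrics. A minimal sketch, with hypothetical function names:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model-service")

def observe(fn):
    """Record latency and failures for each model call, the kind of
    software-health signal a metrics dashboard would ingest."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            logger.exception("prediction failed")
            raise
        finally:
            latency_ms = (time.perf_counter() - start) * 1000
            logger.info("call=%s latency_ms=%.2f", fn.__name__, latency_ms)
    return wrapper

@observe
def predict(x):
    return x * 2  # stand-in for a real model call

result = predict(3)
```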

Updating Models and Launching Changes

Updating ML models and launching changes in production require a strategic approach to ensure optimal performance. Triggers for model updates include significant shifts in input data patterns, declining model accuracy, or the introduction of new features that could enhance predictions. When working with pre-trained or foundation models, such as large language models, retraining might involve fine-tuning the model on a smaller, domain-specific dataset or using transfer learning to adapt it to new tasks. To safeguard against performance regressions, A/B testing and canary rollouts are effective methods for evaluating the new model. By gradually exposing a small subset of users to the updated model, teams can monitor its performance closely against the existing version, ensuring that the new model meets or exceeds benchmarks before a full deployment. This systematic approach not only mitigates risk but also fosters confidence in the reliability of the ML system. Additionally, monitoring KPIs throughout the process and keeping a rollback plan in place to revert quickly to the previous model are crucial.
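A canary rollout can be as simple as deterministically routing a small, fixed fraction of traffic to the candidate model. The sketch below is illustrative only; the version names and the 5 percent fraction are assumptions, and hashing the user ID keeps each user on the same version across requests so their experience stays consistent.

```python
import hashlib

def canary_route(user_id, canary_fraction=0.05):
    """Route a fixed fraction of users to the candidate model.
    Hashing makes the assignment deterministic per user."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "candidate" if bucket < canary_fraction * 100 else "stable"

# Roughly 5% of a large user population lands on the candidate model
versions = [canary_route(f"user-{i}") for i in range(10_000)]
share = versions.count("candidate") / len(versions)
```

Comparing the candidate cohort's metrics against the stable cohort's, then raising the fraction stepwise (or dropping it to zero on regression), gives the gradual exposure and rollback path described above.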

Conclusion

As you transition your successful ML POC to a fully operational production model, establishing a comprehensive strategy for ongoing monitoring and updates is essential. In particular, the key elements to monitor post-production include various metrics across several levels, from software performance to ML observability to business value, with particular attention toward ML-critical aspects such as data drift and model performance. Leveraging existing observability tools will accelerate the post-production work as well. Strategies for determining when to retrain models and best practices for testing new versions will ensure that your ML system remains effective and resilient over time.

Throughout this series, we’ve explored the critical steps necessary for a successful ML project, from setting goals and metrics to preparing datasets and developing the initial POC model. With this final blog, you should have all the knowledge and guidance you need to bring your ML ideas to fruition.





Becks Simpson is a Machine Learning Lead at AlleyCorp Nord, where developers, product designers, and ML specialists work alongside clients to bring their AI product dreams to life. In her spare time, she also works with Whale Seeker, another startup using AI to detect whales so that industry and these gentle giants can coexist profitably. She has worked across the spectrum of deep learning and machine learning, from investigating novel deep learning methods and applying research directly to real-world problems, to architecting pipelines and platforms for training and deploying AI models in the wild, to advising startups on their AI and data strategies.

