Master Data Management Insights: PIM, MDM, & Data Governance

Trust the Machine: Making AI Automation Reliable in MDM

Written by Jesper Grode | Apr 4, 2025 8:00 AM

If you’re involved with master data management (MDM) at a large organization, you’re constantly wrestling with supplier data flows, regulatory compliance, sustainability metrics and marketplace sharing – all while keeping costs down and satisfaction up.

With machine learning (ML), you can automate repetitive, time-consuming tasks at incredible speeds – but trusting AI predictions is a real challenge.

When your ML model suggests incorrect categories for products or misidentifies relationships, those errors create downstream problems that can erase your efficiency gains.

After all, if you need to manually verify every AI decision, are you really saving resources?

At Stibo Systems, we’re combining the speed of ML with mathematical verification, enabling you to achieve true automation with the level of accuracy your business needs.

And in this blog post, I’ll share how we arrived there and how it works. You’ll get the context AND the takeaways.

Why do we need machines to help with MDM?

Your data management workload probably feels like it's expanding faster than your ability to handle it. You're not alone. Today's data managers navigate a complex landscape where accuracy, speed and volume all compete for priority.

  • Onboarding massive product assortments from suppliers with inconsistent data formats and quality
  • Maintaining governance processes that keep your data trustworthy across systems
  • Staying compliant with constantly evolving regulatory requirements
  • Managing an increasing array of sustainability and ESG data points
  • Distributing clean, consistent data to various channels and marketplaces

Traditional approaches to these challenges often rely heavily on manual processes, business rules and human verification.

This can work at a smaller scale, but as your data volumes grow, you get bottlenecks. A product manager who could once verify 50 new items per day now faces batches of thousands, making manual review impractical.

And it’s not just about saving time – it's about maintaining competitive advantage. When your competitors can onboard new products faster, update information more quickly and distribute data more effectively, they gain crucial market advantages. Manual processes simply can't scale to meet these demands without significant resource investments.

What makes these challenges particularly suited for technological intervention is their repetitive, pattern-based nature. It’s often about recognizing similarities, applying consistent rules and making evidence-based decisions.

Exactly the kind of work where ML excels.

How ML has changed the game completely

ML aligns perfectly with data management's most labor-intensive tasks.

At its core, ML excels at pattern recognition, categorization and prediction – precisely what you need when managing large volumes of complex data. Take these, for example:

  • Product categorization

    Automatically assigning new products to the correct categories in your taxonomy.
  • Data matching and deduplication

    Identifying when different records represent the same real-world entity.
  • Attribute mapping

    Connecting supplier-specific attributes to your standardized data model.
  • Data quality scoring

    Predicting completeness and accuracy levels without manual review.
  • Anomaly detection

    Flagging unusual data patterns that might indicate errors or opportunities.
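To make one of these tasks concrete, data matching is often driven by pairwise similarity scores over record fields. Here is a minimal sketch using only the Python standard library – the threshold and function names are illustrative assumptions, not Stibo Systems’ implementation:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_duplicates(records, threshold=0.85):
    """Flag record pairs that likely describe the same real-world entity."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((records[i], records[j]))
    return pairs

# The two hammer records pair up; the wrench stays unmatched.
print(likely_duplicates(["Claw Hammer 16oz", "Claw hammer 16 oz", "Pipe Wrench"]))
```

Production matching uses far richer features than a single string, but the shape is the same: score candidate pairs, then act on the scores.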

The efficiency gains can be remarkable. Tasks that once took days can be done in minutes. And it’s not just speed:

ML brings consistency

Human categorization naturally varies between individuals – and even for the same person at different times. ML applies the same logic every time (within its statistical limits), reducing the variability that causes data issues downstream.

ML adapts as your business evolves

With traditional, rule-based systems, you need explicit reprogramming when your business conditions change. ML models, on the other hand, can spot shifting patterns in your data and adjust accordingly. This adaptability means your data management processes stay current with minimal intervention.

But these benefits come with an important caveat: The predictions are only as good as the model's accuracy. And that's where many data management teams hit a roadblock on their automation journey.

If you can’t trust your ML, it all falls apart

ML predictions come with an inherent uncertainty. While ML models are good at recognizing patterns, they don't give you the certainty of mathematical proof. This creates a trust challenge that can significantly limit automation potential.

The accuracy dilemma

Every ML model produces some level of inaccuracy, typically manifesting in two forms:

  1. False negatives, where valid matches aren't recognized
  2. False positives, where incorrect matches are made

For data management, false positives create the bigger problem. When a model incorrectly categorizes a product or incorrectly matches two different customers, these errors propagate through your systems, giving you data quality issues that can affect business operations.
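The two error types correspond directly to the familiar precision/recall split. A minimal sketch over hypothetical match decisions (the record IDs are made up for illustration):

```python
def error_profile(predicted: set, actual: set):
    """Break prediction errors into false positives and false negatives."""
    false_positives = predicted - actual   # wrong matches made
    false_negatives = actual - predicted   # valid matches missed
    true_positives = predicted & actual
    precision = len(true_positives) / len(predicted) if predicted else 1.0
    recall = len(true_positives) / len(actual) if actual else 1.0
    return false_positives, false_negatives, precision, recall

# Hypothetical pairwise match decisions between record IDs.
predicted = {("a1", "a2"), ("b1", "b2"), ("c1", "c2")}
actual = {("a1", "a2"), ("b1", "b2"), ("d1", "d2")}
fp, fn, precision, recall = error_profile(predicted, actual)
print(fp, fn, precision, recall)
```

For MDM, the false positives (`predicted - actual`) are the records that silently corrupt downstream systems, which is why precision usually matters more than recall here.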

Partial automation isn't enough

Many organizations respond to this challenge with a hybrid approach: using ML for initial processing, then manually reviewing the results. While it still beats fully manual processes, this approach:

  • Creates review bottlenecks during high-volume periods
  • Still needs significant human resources
  • Limits the scalability advantages you’re looking for with ML
  • Delays time-sensitive processes

The verification gap

The core issue isn't that ML makes mistakes – it's the lack of a reliable mechanism to verify which predictions you can trust.

Without knowing which predictions are certainly correct, you're forced to verify everything or accept a level of error in your master data.

This verification gap represents the critical barrier between assisted processing and true automation. And to close the gap, you need a fundamentally different approach that combines ML's pattern recognition capabilities with methods that can provide mathematical certainty about prediction accuracy.

After all: You don't need to know that your model is 95% accurate overall. You need to know which 95% of predictions you can trust.

This is indeed a difficult problem to tackle – one that varies in nature across use cases. But at Stibo Systems we’re relentless in solving such challenges, and to tackle this one, we’re starting with verification of AI Assistance Classification Recommendations – known in research as ontology (classification) mapping.

How to build mathematical verification mechanisms into your ML model

Our research with the Technical University of Denmark (DTU) – one of Europe’s leading technical universities – has led to a breakthrough approach.

It combines the speed of ML with the certainty of mathematical verification.

The verification challenges we’re solving

When using AI to map between classification systems – for example, matching "Hand tools" in one product taxonomy to "Handheld tools" in another – ML models can make impressive predictions.

But those predictions always carry statistical uncertainty that can undermine trust. And trust, in this context, is kind of binary: Either you can trust your results, or you can’t (and have to double-check).

The key issue is consistency. How can you verify that the mapping relationships make logical sense across entire classification structures?

Mathematical certainty meets practical application

In our solution, we apply formal mathematical methods to validate that recommended mappings are logically consistent. Mathematical proof confirms which predictions are correct, not just statistically likely.

Specifically, we use propositional logic and “Horn clauses.”

What are Horn Clauses?

They might sound complex, but they're simply logical statements that follow an "if-then" pattern. For example, "if a product is a hammer AND hammers belong to hand tools, THEN the product belongs to hand tools." By applying these logical rules across classification systems, we can mathematically verify whether mappings make sense.
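As an illustration (not Stibo Systems’ actual implementation), consistency over Horn clauses can be checked by forward chaining: start from known facts, fire every “if-then” rule whose premises hold, and reject a mapping if it derives a contradiction. A minimal sketch with hypothetical taxonomy facts:

```python
def forward_chain(facts, rules):
    """Derive every consequence of Horn rules, given as (premises, conclusion)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def mapping_is_consistent(facts, rules, forbidden):
    """A mapping is inconsistent if it derives any forbidden conclusion."""
    return not (forward_chain(facts, rules) & forbidden)

# Hypothetical taxonomy knowledge: "if a product is a hammer AND hammers
# belong to hand tools, THEN the product belongs to hand tools."
rules = [
    ({"product_is_hammer", "hammers_in_hand_tools"}, "product_in_hand_tools"),
    ({"product_in_hand_tools", "map_hand_tools_to_power_tools"}, "product_in_power_tools"),
]
facts = {"product_is_hammer", "hammers_in_hand_tools"}
# Hand tools and power tools are disjoint, so this conclusion is forbidden.
forbidden = {"product_in_power_tools"}

print(mapping_is_consistent(facts, rules, forbidden))  # → True (mapping holds)
print(mapping_is_consistent(facts | {"map_hand_tools_to_power_tools"},
                            rules, forbidden))         # → False (contradiction)
```

Because Horn clauses have at most one positive conclusion per rule, this kind of check runs efficiently even over large rule sets – which is what makes them attractive for verifying classification mappings at scale.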

But you might wonder: If formal methods provide the certainty we need, why use ML at all?

The answer lies in computational efficiency. Applying formal methods alone to large classification structures would be too computationally expensive for practical use.

By combining approaches, we get the best of both worlds:

  • Fast ML predictions narrow down the possible mappings.
  • Mathematical verification confirms which predictions are definitively correct.
  • Only uncertain predictions need human review.

The hybrid approach dramatically reduces verification workload while maintaining accuracy, making true automation possible.
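The resulting triage is simple to picture. In this rough sketch, `ml_candidates` and `verify` are placeholders for any ranking model and any formal checker – not Stibo Systems’ actual interfaces:

```python
def triage_mappings(items, ml_candidates, verify):
    """Auto-accept the first verified candidate per item; queue the rest.

    ml_candidates(item) -> ranked candidate mappings (hypothetical model)
    verify(item, mapping) -> True only when proven logically consistent
    """
    accepted, needs_review = {}, []
    for item in items:
        for mapping in ml_candidates(item):
            if verify(item, mapping):
                accepted[item] = mapping
                break
        else:  # no candidate could be verified
            needs_review.append(item)
    return accepted, needs_review

# Toy stand-ins: the model proposes two mappings; the checker proves only one.
demo_model = lambda item: ["hand tools", "power tools"]
demo_verify = lambda item, mapping: mapping == "hand tools"
accepted, review = triage_mappings(["hammer"], demo_model, demo_verify)
print(accepted, review)  # → {'hammer': 'hand tools'} []
```

Only the `needs_review` queue ever reaches a human, which is where the workload reduction comes from.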

Understand all this at a deeper level: This work has been published in “The Practice of Formal Methods,” a collection of essays in honor of Cliff Jones.

At Stibo Systems, we’re applying this in real life

The mathematical foundations are rock-solid – formal logic is either correct or incorrect, with no middle ground. The exciting part involves applying these principles to real-world MDM challenges.

Our implementation currently focuses on high-volume scenarios where verification creates significant bottlenecks.

For example, for retailers onboarding thousands of supplier items regularly, the ability to automatically validate AI-suggested categorizations creates tremendous value throughout the process.

But we work with companies in many industries, with potential use cases everywhere:

  • Financial services firms matching transaction categories across different systems
  • Healthcare organizations aligning medical coding taxonomies
  • Manufacturing companies standardizing part classifications from multiple suppliers
  • Public sector agencies harmonizing service categories across departments

Any situation involving taxonomy alignment, classification matching or standardization across systems could benefit from verified predictions.

Current status on our implementation

We're currently testing prototypes in real environments.

While still early in implementation, the mathematical verification works exactly as expected – it correctly identifies which predictions can be trusted. The next phase involves refining the user experience and expanding the verification capabilities to more data management scenarios.

Let’s sum up

Trust remains the biggest hurdle to true automation in data management. Our combination of ML with mathematical verification gives us a practical path forward that doesn't force you to choose between speed and accuracy.

The human role will always matter in data management, but we can now direct that expertise where it adds the most value – not on routine tasks that verified AI can handle reliably.

When you can trust your ML predictions, you can finally automate with confidence.