At any large organization, if you’re involved with master data management (MDM), you’re constantly wrestling with supplier data flows, regulatory compliance, sustainability metrics and marketplace sharing, all while keeping costs down and satisfaction up.
With machine learning (ML), you can automate repetitive, time-consuming tasks at incredible speeds, but trusting AI predictions is a real challenge.
When your ML model suggests incorrect categories for products or misidentifies relationships, those errors create downstream problems that can erase your efficiency gains.
After all, if you need to manually verify every AI decision, are you really saving resources?
At Stibo Systems, we’re combining the speed of ML with mathematical verification, enabling you to achieve true automation with the level of accuracy your business needs.
And in this blog post, I’ll share how we arrived there and how it works. You’ll get the context AND the takeaways.
Your data management workload probably feels like it's expanding faster than your ability to handle it. You're not alone. Today's data managers navigate a complex landscape where accuracy, speed and volume all compete for priority.
Traditional approaches to these challenges often rely heavily on manual processes, business rules and human verification.
This can work on a smaller scale, but when your data volumes grow, you get bottlenecks. A product manager who could once verify 50 new items per day now faces batches of thousands, making manual review impractical.
And it’s not just about saving time – it's about maintaining competitive advantage. When your competitors can onboard new products faster, update information more quickly and distribute data more effectively, they gain crucial market advantages. Manual processes simply can't scale to meet these demands without significant resource investments.
What makes these challenges particularly suited for technological intervention is their repetitive, pattern-based nature. It’s often about recognizing similarities, applying consistent rules and making evidence-based decisions.
Exactly the kind of work where ML excels.
ML aligns perfectly with data management's most labor-intensive tasks.
At its core, ML excels at pattern recognition, categorization and prediction – precisely what you need when managing large volumes of complex data. Take these, for example:
The efficiency gains can be remarkable. Tasks that once took days can be done in minutes. And it’s not just speed:
Human categorization naturally varies between individuals, and even for the same person at different times. A trained ML model applies the same logic to every record, so its behavior is consistent in a statistical sense. This limits the variability that causes data issues downstream.
With traditional, rule-based systems, you need explicit reprogramming when your business conditions change. ML models, on the other hand, can spot shifting patterns in your data and adjust accordingly. This adaptability means your data management processes stay current with minimal intervention.
But these benefits come with an important caveat: The predictions are only as good as the model's accuracy. And that's where many data management teams hit a roadblock on their automation journey.
ML predictions come with an inherent uncertainty. While ML models are good at recognizing patterns, they don't give you the certainty of mathematical proof. This creates a trust challenge that can significantly limit automation potential.
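Every ML model produces some level of inaccuracy, typically manifesting in two forms: false positives, where the model asserts a category or match that is wrong, and false negatives, where it misses one that is right.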
For data management, false positives create the bigger problem. When a model incorrectly categorizes a product or incorrectly matches two different customers, these errors propagate through your systems, giving you data quality issues that can affect business operations.
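To make the two error types concrete, here is a minimal Python sketch that counts them for a customer-matching task. The record IDs and the "actual" matches are invented for illustration only, and the helper names `confusion_counts` and `precision` are hypothetical, not part of any product API.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

def confusion_counts(predicted: Set[Pair], actual: Set[Pair]) -> Dict[str, int]:
    """Compare predicted matches with the known-correct ones."""
    return {
        "true_positives": len(predicted & actual),   # predicted and correct
        "false_positives": len(predicted - actual),  # predicted but wrong
        "false_negatives": len(actual - predicted),  # correct but missed
    }

def precision(counts: Dict[str, int]) -> float:
    """Share of predicted matches that are actually correct."""
    made = counts["true_positives"] + counts["false_positives"]
    return counts["true_positives"] / made if made else 0.0

# Hypothetical data: pairs of customer records the model says are the same.
predicted = {("cust-001", "cust-002"), ("cust-003", "cust-004"), ("cust-005", "cust-006")}
# Hypothetical ground truth, as a human reviewer would confirm it.
actual = {("cust-001", "cust-002"), ("cust-003", "cust-004"), ("cust-007", "cust-008")}

counts = confusion_counts(predicted, actual)
print(counts, round(precision(counts), 2))
```

Note that the counts only exist because a reviewer supplied the correct answers. The numbers tell you how many predictions were wrong, not which ones to distrust on new data.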
Many organizations respond to this challenge with a hybrid approach: using ML for initial processing, then manually reviewing the results. While it still beats fully manual processes, this approach:
The core issue isn't that ML makes mistakes – it's the lack of a reliable mechanism to verify which predictions you can trust.
Without knowing which predictions are certainly correct, you're forced to verify everything or accept a level of error in your master data.
This verification gap represents the critical barrier between assisted processing and true automation. And to close the gap, you need a fundamentally different approach that combines ML's pattern recognition capabilities with methods that can provide mathematical certainty about prediction accuracy.
After all: You don't need to know that your model is 95% accurate overall. You need to know which 95% of predictions you can trust.
This is indeed a difficult problem to tackle – one that varies in nature across use cases. But at Stibo Systems we’re relentless in solving such challenges, and to tackle this one, we’re starting with verification of AI Assistance Classification Recommendations – in research also known as Ontology (classification) Mapping.
Our research with the Technical University of Denmark (DTU) – one of the leading technical universities in Europe, also named best technical university in Denmark – has led to a breakthrough approach.
It combines the speed of ML with the certainty of mathematical verification.
When using AI to map between classification systems – for example, matching "Hand tools" in one product taxonomy to "Handheld tools" in another – ML models can make impressive predictions.
But those predictions always carry statistical uncertainty that can undermine trust. And trust, in this context, is kind of binary: Either you can trust your results, or you can’t (and have to double-check).
The key issue is consistency. How can you verify that the mapping relationships make logical sense across entire classification structures?
In our solution, we apply formal mathematical methods to validate that recommended mappings are logically consistent. Mathematical proof confirms which predictions are correct, not just statistically likely.
Specifically, we use propositional logic and “Horn clauses.”
What are Horn Clauses?
They might sound complex, but they're simply logical statements that follow an "if-then" pattern: any number of conditions, and a single conclusion. For example, "if a product is a hammer AND hammers belong to hand tools, THEN the product belongs to hand tools." By applying these logical rules across classification systems, we can mathematically verify whether mappings make sense.
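If you'd like to see the "if-then" pattern in action, here is a minimal Python sketch of forward chaining, a standard way to reason with Horn clauses: keep applying rules until nothing new can be derived. The atom names (such as `product_is_hammer`) and the second rule are made up for illustration, and this is not a description of our production implementation.

```python
from typing import FrozenSet, List, Set, Tuple

# A Horn rule: if every atom in the body holds, then the head holds.
Rule = Tuple[FrozenSet[str], str]

def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Return every atom derivable from the facts using the rules."""
    derived = set(facts)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived

# Illustrative rules; the first mirrors the hammer example in the text.
rules: List[Rule] = [
    (frozenset({"product_is_hammer", "hammers_in_hand_tools"}), "product_in_hand_tools"),
    (frozenset({"product_in_hand_tools", "hand_tools_in_tools"}), "product_in_tools"),
]
facts = {"product_is_hammer", "hammers_in_hand_tools", "hand_tools_in_tools"}

closure = forward_chain(facts, rules)
print("product_in_tools" in closure)
```

Every conclusion in `closure` follows from the facts by explicit rules, so it can be traced back step by step. That traceability is what separates a derivation from a statistical guess.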
But you might wonder: If formal methods provide the certainty we need, why use ML at all?
The answer lies in computational efficiency. Applying formal methods alone to large classification structures would be too computationally expensive for practical use.
By combining approaches, we get the best of both worlds:
The hybrid approach dramatically reduces verification workload while maintaining accuracy, making true automation possible.
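As a rough illustration of the division of labor, the Python sketch below takes candidate mappings (standing in for ML output) and checks them against a single, hypothetical consistency rule: if a category and its parent are both mapped, the parent's target must sit at or above the child's target. The taxonomies, the rule and the `verify` function are assumptions made for this example. The actual verification rules are not spelled out in this post.

```python
from typing import Dict, List, Optional, Set, Tuple

Tree = Dict[str, Optional[str]]  # child -> parent; None marks a root
Mapping = Tuple[str, str]        # (source category, target category)

# Hypothetical taxonomies for illustration.
SOURCE: Tree = {"Tools": None, "Hand tools": "Tools", "Hammers": "Hand tools"}
TARGET: Tree = {
    "Equipment": None,
    "Handheld tools": "Equipment",
    "Claw hammers": "Handheld tools",
    "Garden furniture": "Equipment",
}

def ancestors_or_self(node: str, tree: Tree) -> Set[str]:
    """Collect the node and everything above it in the tree."""
    seen: Set[str] = set()
    current: Optional[str] = node
    while current is not None:
        seen.add(current)
        current = tree[current]
    return seen

def verify(candidates: List[Mapping], source: Tree, target: Tree):
    """Split candidates into consistent mappings and ones needing review.

    Assumes each source category appears at most once in the candidates.
    """
    mapping = dict(candidates)
    conflicted: Set[str] = set()
    for src, tgt in candidates:
        parent = source[src]
        if parent in mapping and mapping[parent] not in ancestors_or_self(tgt, target):
            # The rule is violated, so at least one of the two is wrong.
            conflicted.update({src, parent})
    verified = [m for m in candidates if m[0] not in conflicted]
    review = [m for m in candidates if m[0] in conflicted]
    return verified, review

# Candidate mappings as an ML model might propose them (made up here).
candidates = [
    ("Tools", "Equipment"),
    ("Hand tools", "Handheld tools"),
    ("Hammers", "Garden furniture"),
]
verified, review = verify(candidates, SOURCE, TARGET)
print(verified, review)
```

In this sketch the check is deliberately conservative: a violated rule only proves that one of the two mappings is wrong, so both go to review. Passing this one rule is also far weaker than a full proof. The point is only to show where the cheap ML step ends and the logical check begins.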
To understand all this at a deeper level: This work has been published in “The Practice of Formal Methods,” a collection of essays in honor of Cliff Jones.
The mathematical foundations are rock-solid: a formal proof either holds or it doesn't, with no middle ground. The exciting part is applying these principles to real-world MDM challenges.
Our implementation currently focuses on high-volume scenarios where verification creates significant bottlenecks.
For example, for retailers onboarding thousands of supplier items regularly, the ability to automatically validate AI-suggested categorizations creates tremendous value throughout the process.
But we work with companies in many industries, with potential use cases everywhere:
Any situation involving taxonomy alignment, classification matching or standardization across systems could benefit from verified predictions.
We're currently testing prototypes in real environments.
While still early in implementation, the mathematical verification works exactly as expected – it correctly identifies which predictions can be trusted. The next phase involves refining the user experience and expanding the verification capabilities to more data management scenarios.
Trust remains the biggest hurdle to true automation in data management. Our combination of ML with mathematical verification gives us a practical path forward that doesn't force you to choose between speed and accuracy.
The human role will always matter in data management, but we can now direct that expertise where it adds the most value – not on routine tasks that verified AI can handle reliably.
When you can trust your ML predictions, you can finally automate with confidence.