We have never fully committed to core data fundamentals. Until we do, AI adoption and success will continue to underwhelm.
Kevin Roose recently published a lengthy article in the NYT on AGI’s likely near-term arrival. AI models are continually improving, reasoning layers are increasing their value, and the pace of change is accelerating. Those closest to AI are sounding the alarm that we’re not ready from a societal or regulatory standpoint.
While those concerns are certainly valid, there are other key issues we still haven’t addressed. Specifically, we’re missing strong data management foundations in data quality, governance, and compliance. These foundations are essential to wrangling and leveraging unstructured and structured data together, which is the key to trusted AI outputs. If we don’t address these issues, AI in all its forms will continue to underdeliver.
In conversation after conversation, I’m hearing from data leaders that they are struggling to consistently understand, access, or manage the unstructured data that exists just within their own four walls. Add the complexity of third-party data, such as firmographics for marketing and sales, supplier data for procurement, or recruitment data for HR, and the situation quickly spirals out of control.
Missing the Boat on Unstructured Data
We’ve been wrestling with these issues for decades. I recently revisited an article I wrote almost 10 years ago (!), and the first topic it raised was mapping data usage, which is still a huge gap when it comes to leveraging unstructured data.
Organizations need a better foundational approach to capturing, evaluating, and leveraging unstructured data, one that scales with the business. In a hoped-for future of widespread automated actions, where most jobs have a co-pilot, our quality, governance, and compliance gaps significantly increase risk. Organizations are well aware of this. Consider a garden-variety AI use case:
- A bank uses machine learning and AI to comb through defaults (structured data).
- They then analyze associated social media accounts to see whether someone who defaulted went on vacation (unstructured data).
- They then blend this data to create a score to act on.
This simple example is full of potential risks that keep lawyers up at night.
Can we trust the data sources? Is the data valid? Do we trust how we evaluated and transformed the data to produce the score? Is it even ethical to use this data under our customer and business policies? What other data related to this client (likely sitting somewhere in a silo) should we consider before acting?
If an AI agent automatically acted on this score without proper guardrails or context, the bank would likely introduce bias against a customer segment. That bias can damage relationships and reputation, and potentially expose the organization to regulatory action and lawsuits.
Or consider how AI could be used in a procurement process. You might decide to pull the key information from master service agreements (MSAs) and statements of work (SOWs) and push it into a vector database with a chatbot experience, to help automate tasks like comparing net payment terms or checking renewal contracts against the terms of previous contracts.
But what if the vendor master data is outdated or incorrect? Is the most recent data from the MSA (unstructured) being reconciled with the vendor master (structured)? How do you know whether the information you’re pulling from this unstructured data differs from your structured reference or master data? And as additional entities and attributes are extracted from the documents, can you write them back into your reference data or reference records?
Further, do we track the cost of getting the terms wrong?
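To make the reconciliation question concrete, here is a minimal Python sketch, assuming contract terms have already been extracted from the MSAs into structured records. The record shapes, field names (`vendor_id`, `net_payment_days`), and the `reconcile` function are hypothetical illustrations, not any specific product's API. It compares each extracted term with the vendor master, flags mismatches and unknown vendors, and marks a write-back candidate only when the document is newer than the master record.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record shapes: in practice these would come from a vendor
# master system (structured) and an extraction step run over MSA/SOW
# documents (unstructured). Field names are illustrative only.
@dataclass
class VendorMaster:
    vendor_id: str
    net_payment_days: int
    last_updated: date

@dataclass
class ExtractedTerms:
    vendor_id: str
    net_payment_days: int
    source_doc: str
    effective_date: date

@dataclass
class Discrepancy:
    vendor_id: str
    field: str
    master_value: object
    document_value: object
    source_doc: str
    suggest_write_back: bool  # True when the document looks newer than master

def reconcile(master: dict[str, VendorMaster],
              extracted: list[ExtractedTerms]) -> list[Discrepancy]:
    """Compare terms pulled from documents against the vendor master."""
    issues = []
    for terms in extracted:
        record = master.get(terms.vendor_id)
        if record is None:
            # Vendor appears in a contract but has no master record at all.
            issues.append(Discrepancy(terms.vendor_id, "vendor_id", None,
                                      terms.vendor_id, terms.source_doc, True))
            continue
        if record.net_payment_days != terms.net_payment_days:
            issues.append(Discrepancy(
                terms.vendor_id, "net_payment_days",
                record.net_payment_days, terms.net_payment_days,
                terms.source_doc,
                suggest_write_back=terms.effective_date > record.last_updated,
            ))
    return issues

# Illustrative data only.
master = {"V001": VendorMaster("V001", 30, date(2023, 1, 15))}
docs = [
    ExtractedTerms("V001", 45, "msa_v001_2024.pdf", date(2024, 3, 1)),
    ExtractedTerms("V002", 60, "msa_v002.pdf", date(2024, 5, 1)),
]
issues = reconcile(master, docs)
for issue in issues:
    print(issue)
```

Flagging write-back candidates for review, rather than overwriting the master automatically, keeps a human in the loop until the extraction quality has earned trust.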
These simple examples are exactly why legal and compliance teams are taking longer to move proofs of concept (POCs) into production. Without clear and consistent data management that ensures quality, governance, and policy compliance, there’s simply no way to know whether the company is acting against its own best interests or exposing itself to legal and ethical risks.
This is a major reason why AI today remains largely confined to productivity-oriented use cases rather than revenue-generating ones.
The Data Wake-Up Call
Companies need to treat the lack of unstructured data availability for AI as a wake-up call. They need coordinated councils at all levels to enable consistent data and AI literacy and policy enforcement. And because conditions change rapidly, there needs to be ongoing monitoring and continual fine-tuning.
They also need to extend quality and cataloging processes beyond unstructured data and back into structured data, so there is harmony across both types. And they need a continuous learning system that extracts critical information from unstructured data and feeds it back into the reference sources they’re building out.
We also need to ensure that interaction data from any agent interfaces, along with feedback on the quality of those agents’ results and answers, is captured and becomes part of the data foundation. This is the true opportunity to create a Business 360 data strategy and platform: the utopia moment we have been talking about for the last 25+ years.
Enabling a true Business 360 view will allow us to look at our business from outside in, inside out and beyond. This includes strategic analytics to see forward, operational analytics to know today’s performance, and process insights to tweak execution and meaningfully steer an organization towards its ambitions.
That’s a tall order for even the most experienced data team. Luckily, there’s a way forward.
The Path to Trust – Data and AI Control Towers
A data and AI control tower approach streamlines the ability to scale data and AI with confidence. It provides crucial visibility, plus scalable automation for quality, compliance, and policies that keeps pace with the data. With a control tower approach, teams have a timely understanding of how all data is being moved and transformed, its quality score, how it’s being used across the business, and whether and how policies are being implemented. With this visibility, teams can:
- Design and execute strategies that automatically enforce policies
- Flag and remediate data quality and governance issues against those policies
- Be alerted to model performance anomalies so humans can immediately intervene
- Enable real-time monitoring of model performance for fine-tuning
- Design business-impact-driven controls and alerts that create visibility into the end-to-end data journey, regardless of whether it’s stitched together or held together with duct tape!
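As a rough illustration of two of these capabilities, policy enforcement and anomaly alerting, here is a minimal Python sketch. The `DatasetProfile` fields, the example policies, and the three-standard-deviation threshold are all assumptions made for illustration; a real control tower would pull this metadata from your own catalog, lineage, and monitoring tools rather than hard-coding it.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Callable

# Hypothetical metadata a control tower might hold for each dataset.
@dataclass
class DatasetProfile:
    name: str
    quality_score: float  # 0.0 to 1.0, from whatever profiling you run
    contains_pii: bool
    approved_uses: set[str] = field(default_factory=set)

@dataclass
class Policy:
    name: str
    check: Callable[[DatasetProfile, str], bool]  # True means compliant

# Illustrative policies only; real ones would come from your governance council.
POLICIES = [
    Policy("min_quality", lambda d, use: d.quality_score >= 0.8),
    Policy("approved_use", lambda d, use: use in d.approved_uses),
    Policy("no_pii_in_marketing",
           lambda d, use: not (d.contains_pii and use == "marketing")),
]

def evaluate(dataset: DatasetProfile, use: str) -> list[str]:
    """Return the names of policies the dataset violates for an intended use."""
    return [p.name for p in POLICIES if not p.check(dataset, use)]

def is_anomalous(history: list[float], latest: float,
                 threshold: float = 3.0) -> bool:
    """Flag a model metric more than `threshold` std devs from its history."""
    if len(history) < 2:
        return False  # not enough history to judge
    sigma = stdev(history)
    if sigma == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > threshold

# Illustrative usage with made-up values.
crm = DatasetProfile("crm_contacts", quality_score=0.72, contains_pii=True,
                     approved_uses={"sales"})
violations = evaluate(crm, "marketing")
accuracy_history = [0.91, 0.92, 0.90, 0.91, 0.92]
drift_alert = is_anomalous(accuracy_history, 0.78)
print(violations, drift_alert)
```

The z-score check is only one possible anomaly test. It assumes the metric's history is reasonably stable, and teams would typically tune the threshold per model before routing alerts to a human.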

Data management strategies, especially for unstructured data, haven’t kept up with AI’s pace of change. Companies need to take a step back and understand where, how and why data is being used, and be able to track the changes along the way. Only with those foundations in place will everybody in the business feel comfortable using AI in their daily work.