Is Your Data Good Enough for Agentic AI?


ABSTRACT: Most organizations believe their data needs to be perfect before they can start with agentic AI. That belief is slowing them down. This article argues that the standard most organizations are holding themselves to — complete, clean, and fully modeled data — is reasonable in principle but becomes a blocker when applied too strictly. It introduces the concept of "good enough" data, a threshold defined not by perfection but by whether the data is interpretable enough for an agent to act on reliably.

Can your existing data conditions support agentic AI? This is a question data leaders like you are asking right now.

Organizations have always managed by working around the gaps. Leaders and their teams make decisions with incomplete data, and more often than not, they get it right. Analysts reconcile numbers across systems, data engineers trace where figures diverge, and teams fall back on colleagues who carry the institutional knowledge the data never captured. It is that collective effort of stitching together data and context from across the organization that keeps things moving.

But today, enterprises want to embrace agentic AI, and the stakes for data quality are higher than ever. Leaders face a dilemma: they hold themselves to a standard of perfect data before they begin, and that standard blocks their ambitions of automating data analysis and business processes with agents. Traditional BI tools, dashboards, and early generative AI assistants surface information or answer questions but leave the acting to humans. Agents take automation a step further: they can act without waiting for a human to intervene.

For example, rather than telling a procurement manager that inventory is running low, an agent identifies the shortfall, evaluates suppliers, and initiates a purchase order without waiting for instructions. Agents execute multi-step workflows, interact with external systems, and make decisions to deliver a significant ROI to the business.

For agents to do that reliably, the data they work with needs to meet a certain threshold. That threshold is what we call 'good enough' data. This article explores what that actually means in practice.

Common Assumptions About Data Readiness

When planning agentic systems, organizations believe that certain foundational elements need to be in place before meaningful work can begin. Business leaders and sponsors set the expectations: they define what success looks like and the conditions they need to feel confident investing in the resources that agentic AI requires. Data engineers, AI engineers, and data architects set the requirements: they scope the data conditions and technical standards needed to build a reliable data foundation. These positions are reasonable. The challenge arises when they are applied so strictly that they become blockers rather than guideposts.

The data must be complete and accessible

Before committing resources to an agentic system, business leaders and sponsors want to ensure the agent has access to every relevant data source, such as order management, inventory, and customer records. For them, incomplete access is a risk. If the agent is working from a partial view, the outcomes cannot be trusted and accountability for those outcomes becomes difficult to establish. The expectation, therefore, is that all relevant systems must be connected and available before the agent can operate meaningfully. Incomplete access genuinely does limit what an agent can do. But data sources are always changing, expanding, and evolving. Waiting for total access before starting means the project may never get off the ground.

The data must be clean and standardized

Business and technical roles both hold this requirement, but for different reasons. Business stakeholders want outputs they can trust and present with confidence. Data engineers know from experience that building on inconsistent data creates compounding problems downstream. Duplicate records and missing values that are manageable in a report become critical failure points for an agent. Both arrive at the same position: the data needs to be clean and standardized before implementation begins. No production data environment ever fully reaches that standard, though, and waiting for it means waiting indefinitely.

The data must be modeled and structured

Data architects and engineers drive this requirement from a technical standpoint. Before they build, they want clear definitions and well-connected relationships between data elements locked down. Without that structure, the system will interpret the same data differently depending on where it pulls from. In turn, it will produce inconsistent results that are difficult to diagnose, and even harder to fix, once it is in production. Data models are never fully complete, though; they evolve as the business evolves. Treating the model as a prerequisite rather than a parallel workstream can stall an initiative before it begins.

Rethinking Data Readiness

The question is not whether the standards for data readiness are worth pursuing. They are. The question is whether they need to be fully met before an organization can start. It is a distinction Matt Gordon and Carlos Bossy drew directly in our recent Datalere webinar, "Pitfalls to Avoid When Planning Agentic AI Products."

Matt Gordon acknowledged that data environments are always messy, incomplete, and inconsistent, and are rarely fully visible. Building on that, Carlos Bossy offered a practical test: “Can you semantically define your data well enough so that a human could use it and get the right answer in a simple use case? If yes, it is ‘good enough’ for an agent as well. If a human can understand it, the agent can understand it too.”

This introduces a shift in how organizations evaluate data readiness:

The three questions on the right are not a new checklist. They are a way of testing whether the data you already have passes the “good enough” threshold. Here are some examples of what that looks like in practice:

Making Data Interpretable

In a SaaS organization, a customer success team and a finance team both track churn but define it differently. Customer success marks a customer as churned when they stop logging in to the platform or drop below a minimum usage threshold for its core features. Finance records churn when a contract is formally canceled. As a result, a customer who has been disengaged for months but is still paying can appear churned in one system and active in the other. A team member pulling a churn report already knows which definition applies to their purpose. They navigate that ambiguity intuitively. Semantically defining that distinction for the agent is what makes the data interpretable and gives the agent a consistent basis to work from.
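One way to picture a semantic definition of churn is as a pair of named, documented rules, each tied to the team it serves. The sketch below is illustrative only: the field names, the 30-day threshold, and the function names are assumptions, and it simplifies the customer success rule to login activity alone.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields; real systems will name these differently.
    customer_id: str
    days_since_last_login: int
    contract_status: str  # "active" or "canceled"


# Assumed threshold for illustration; each team would set its own.
ENGAGEMENT_CHURN_DAYS = 30


def is_engagement_churned(c: Customer) -> bool:
    """Customer success definition (simplified): the customer stopped logging in."""
    return c.days_since_last_login >= ENGAGEMENT_CHURN_DAYS


def is_contract_churned(c: Customer) -> bool:
    """Finance definition: the contract has been formally canceled."""
    return c.contract_status == "canceled"


# Maps each purpose to the definition that applies, so the choice is explicit.
CHURN_DEFINITIONS = {
    "customer_success": is_engagement_churned,
    "finance": is_contract_churned,
}

# A customer who is disengaged but still paying.
quiet_payer = Customer("c-001", days_since_last_login=120, contract_status="active")
churn_view = {team: rule(quiet_payer) for team, rule in CHURN_DEFINITIONS.items()}
print(churn_view)
```

Here the same customer is churned under one definition and active under the other. Neither answer is wrong; what matters is that the agent is told which rule belongs to which purpose.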

Defining the Context

A finance team wants an agent to support their month-end close process. Their financial reporting tool pulls revenue figures directly from a transaction system that records every sale the moment it processes, including pending fulfillment, initiated returns, and adjustments awaiting approval. A finance analyst knows immediately that those revenue figures do not reflect realized revenue for month-end close. They know which data source captures only cleared and verified transactions and why that is the right input for this specific process. Semantically defining that distinction — which version of revenue, for which process, from which source — is what makes the data actionable for the agent.
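The "which version, for which process, from which source" idea can be sketched as a small lookup that an agent consults before it touches any numbers. Everything named here is hypothetical: the source names, the `daily_sales_monitoring` process, and the `revenue_source_for` helper are assumptions made for illustration.

```python
# Hypothetical catalog: which revenue source applies to which process.
REVENUE_SOURCES = {
    "month_end_close": {
        "source": "cleared_transactions",
        "includes": "cleared and verified transactions only",
    },
    "daily_sales_monitoring": {
        "source": "transaction_system",
        "includes": "every sale as it processes, including pending items",
    },
}


def revenue_source_for(process: str) -> str:
    """Return the source name for a process, or fail loudly if none is defined."""
    try:
        return REVENUE_SOURCES[process]["source"]
    except KeyError:
        raise ValueError(f"No revenue definition for process: {process!r}") from None


close_source = revenue_source_for("month_end_close")
print(close_source)
```

Raising an error for an undefined process is a deliberate choice in this sketch: an agent that cannot find a definition should stop rather than quietly fall back to whichever source is nearest.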

Establishing Meaning Through Relationships

A hotel operations team wants an agent to identify why revenue per available room — RevPAR — is underperforming at a specific property. A senior analyst does not look at RevPAR in isolation. They know that a low RevPAR could reflect heavily discounted rates pushed through third-party booking channels, a shift in the mix of room types being sold, or a drop in average length of stay. Each of those explanations lives in a different data set: rate data, booking channel data, reservation records. The analyst already knows how those data sets connect and what each one explains. Semantically defining those relationships of how rate data, channel data, and reservation records combine to explain RevPAR gives the agent the same navigational ability the analyst already has.
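A rough sketch of what defining those relationships might look like follows. It relies on the standard identity that RevPAR equals average daily rate (ADR) multiplied by occupancy. The pairing of each explanation with a data set is an assumption made for illustration, as are all of the names and sample figures.

```python
# Hypothetical map from each candidate explanation to the data sets that test it.
# The pairings are assumed for illustration; a real team would define their own.
REVPAR_DRIVERS = {
    "discounted rates via third-party channels": ["rate_data", "booking_channel_data"],
    "shift in room-type mix": ["reservation_records"],
    "drop in average length of stay": ["reservation_records"],
}


def revpar_breakdown(room_revenue: float, rooms_sold: int, rooms_available: int) -> dict:
    """Split RevPAR into average daily rate (ADR) and occupancy."""
    adr = room_revenue / rooms_sold
    occupancy = rooms_sold / rooms_available
    return {"adr": adr, "occupancy": occupancy, "revpar": adr * occupancy}


def data_sets_to_check() -> list:
    """List every data set named by at least one explanation, without repeats."""
    seen = []
    for sources in REVPAR_DRIVERS.values():
        for name in sources:
            if name not in seen:
                seen.append(name)
    return seen


# Made-up figures for one property on one night.
breakdown = revpar_breakdown(room_revenue=6000.0, rooms_sold=50, rooms_available=100)
print(breakdown)
print(data_sets_to_check())
```

Splitting the metric this way tells the agent whether to look at price (ADR) or volume (occupancy) first, and the map tells it which data sets hold the evidence for each explanation.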

Making “Good Enough” Work in Practice

Having the right data foundation is only part of the equation. How agentic systems are built on top of that foundation determines whether “good enough” data actually delivers reliable outcomes.

The starting point is understanding how large language models behave by default. As Matt Gordon observed, these models are not naturally going to consider a broad range of data or multiple points of view. They will look at the smallest possible sample set to give a response. Left unchecked, that tendency leads to hallucination, outright errors, and outputs that are overly optimistic in tone. Building agentic systems means engineering around those risks from the start.

In practice, that engineering takes several forms:

Internal validation — “True agentic systems,” as Matt described them, implement what he called “chain of debate” and “multi-model generation and audit.” These are mechanisms that do internal data validation and ask whether an answer makes sense before it ever comes out of the agentic tool. Rather than returning the first plausible result it finds, the system interrogates its own output across multiple models before surfacing a response. That internal scrutiny is what separates a reliable agentic system from one that confidently returns the wrong answer.

Test-driven design — Matt’s advice was direct: “Have your problem defined and build your tests first.” By that he meant knowing the anticipated outcome before writing a single line of code, including how the process will be instrumented and measured and where the observability sits, so the team knows whether they are getting the right results and, if not, why. Enterprises, he noted, have spent years over-promising and under-delivering on technology by chasing what is new rather than what solves a defined business problem with a measurable outcome. “You have to measure before you build.” Agentic AI is no different.
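The two practices above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the mechanism Matt described: a simple majority vote stands in for "multi-model generation and audit," the "models" are plain stand-in functions rather than LLM calls, and the inventory cases, names, and 0.6 agreement threshold are all made up.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical signature: a "model" maps (stock, reorder_point) to an action.
Model = Callable[[int, int], str]

# Step 1, tests first: expected outcomes written before any agent logic.
# Each case is (stock, reorder_point, expected_action); the values are made up.
EXPECTED_OUTCOMES = [
    (40, 100, "create_purchase_order"),
    (100, 100, "no_action"),
    (250, 100, "no_action"),
]


# Step 2, internal validation: surface an answer only when enough models agree.
def generate_and_audit(models: Sequence[Model], stock: int, reorder_point: int,
                       min_agreement: float = 0.6) -> Optional[str]:
    """Return the majority action, or None to escalate to a human."""
    answers = [model(stock, reorder_point) for model in models]
    if not answers:
        return None
    top_answer, votes = Counter(answers).most_common(1)[0]
    return top_answer if votes / len(answers) >= min_agreement else None


# Step 3, measurement: score any decision function against the expected outcomes.
def pass_rate(decide: Callable[[int, int], Optional[str]]) -> float:
    passed = 0
    for stock, reorder_point, expected in EXPECTED_OUTCOMES:
        actual = decide(stock, reorder_point)
        if actual == expected:
            passed += 1
        else:
            print(f"MISS stock={stock}: expected {expected}, got {actual}")
    return passed / len(EXPECTED_OUTCOMES)


# Stand-in "models" for illustration; real ones would call an LLM.
def rule_based(stock: int, reorder_point: int) -> str:
    return "create_purchase_order" if stock < reorder_point else "no_action"


def second_opinion(stock: int, reorder_point: int) -> str:
    return "no_action" if stock >= reorder_point else "create_purchase_order"


def overeager(stock: int, reorder_point: int) -> str:
    return "create_purchase_order"  # Always wants to buy.


def audited(stock: int, reorder_point: int) -> Optional[str]:
    return generate_and_audit([rule_based, second_opinion, overeager], stock, reorder_point)


solo_score = pass_rate(overeager)
audited_score = pass_rate(audited)
print(solo_score, audited_score)
```

Because the expected outcomes exist before the system does, the same `pass_rate` function measures the single overeager model and the audited ensemble alike, and each miss is logged with the inputs that caused it.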

Matt summed up the relationship between data quality and system design plainly: “Get your data in shape, and you will spend less time engineering around the next wrong thing your AI does.”

Where This Leaves Organizations

Strong data foundations continue to shape how effectively systems operate. As data becomes more complete, consistent, and well-structured, both interpretation and outcomes improve.

“Good enough” is not a compromise, nor is it the ultimate goal. It marks the point where data already supports meaningful outcomes, even as teams continue to refine definitions, extend relationships, and optimize systems. For organizations introducing agentic AI, it is the right starting point.

In some cases, a limited set of clearly defined variables is all a team needs to begin with to support a reliable decision. In others, gaps in context or inconsistencies in meaning will limit how far the system can go without additional structure. The threshold is not fixed. It moves with the use case and the level of clarity in the data itself.

The data does not need to be perfect to start. What matters is where it already supports meaningful action, and building a thoughtful system around it while continuing to strengthen the underlying foundation.

Want to go deeper? Watch our webinar, "Pitfalls to Avoid When Planning Agentic AI Products," featuring Matt Gordon and Carlos Bossy.

This piece was written by the author with support from AI tools for drafting and refinement. The perspectives and interpretations are based on the webinar discussion and the author’s own editorial judgment.

Abdul Fahad Noori

Fahad enjoys overseeing all marketing functions ranging from strategy to execution. His areas of expertise include social media, email marketing, online events, blogs, and graphic design. With more than...
