AI Pilots Are Easy. Decisions Are Hard. What the Gartner Supply Chain Symposium Made Clear.
May 7, 2026 • 7 mins

When Gartner's research and three years of enterprise deployments point to the same gap, the gap is real.
We were on the main stage at Gartner's Supply Chain Symposium in Orlando twice this week: on May 4th with Suketu Gandhi from Kearney, and on May 5th with Ken Kodger and Suja Chandrasekaran. Gartner's program team wrote the session titles: 'AI Pilots Are Easy. Decisions Are Hard. Here's the Infrastructure Gap.' and 'The Exception Is the Rule: AI Teams That Handle the 80% No One Wants.'
Those titles are precise. AI pilots are easy because they are bounded — you pick a workflow, define success narrowly, control the inputs. An AI that handles the full invoice population, including the exceptions, including the verbal agreements your ops team made eighteen months ago that nobody wrote down — that is the decision problem. Most enterprise AI deployments are nowhere near it.
“More than 50% of generative AI projects fail. The failure mode is not the model. It is everything the model was supposed to work with.”
What Gartner's research confirms
Gartner's Predicts 2026 report is direct: by 2031, 60% of supply chain disruptions will be resolved without human intervention. By 2028, 40% of supply chains will use AI for proactive, data-driven cost management — up from roughly 5% today. The CSCO Essentials report names why those trajectories are hard: most organizations are dominated by ongoing pilots, poor data quality, and misaligned processes that limit the ability to scale AI. The ambition-to-action gap.
That language is exactly right. And it is not a model gap. The models are good enough. It is an information architecture gap. Most enterprises have not built the infrastructure that AI needs to take decisions rather than surface them.

When AI pilots fail, what actually failed
The May 4th session with Suketu was built around one diagnostic question: when a supply chain AI pilot works in the controlled test but fails to scale to production, what failed? After enough of these post-mortems, the answer is almost never the model. It is one of three things.
Data fragmentation: the AI was trained on clean test data and deployed against fifteen ERP instances, three TMS systems, and a carrier base that invoices in six formats. The model was fine. The data architecture was not.
Missing context: the AI reads the invoice against the rate card but cannot read the email where ops agreed to a different detention calculation for this carrier during peak season. The error rate looks like a model problem. It is a context problem.
No accountability architecture: when an AI agent takes a decision that costs real money, who is accountable? What is the audit trail? Most deployments were not designed to answer this. The enterprise reverts to human-in-the-loop because the alternative is operationally untenable.
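To make the second and third failure modes concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any production system: the carrier name, the rates, the `OPS_AGREEMENTS` store, and the `audit` function are all hypothetical. It shows the same invoice being flagged as an exception when only the rate card is visible and approved once the off-system agreement is available, with every decision appended to an audit trail that records what it was based on.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Invoice:
    carrier: str
    charge_type: str   # e.g. "detention"
    hours: float
    billed: float


@dataclass(frozen=True)
class Decision:
    outcome: str       # "approve" or "exception"
    expected: float    # amount the auditor computed
    basis: str         # which source the rate came from
    decided_at: str    # UTC timestamp for the audit trail


# Hypothetical rate card: (carrier, charge type) -> hourly rate.
RATE_CARD = {("ACME", "detention"): 50.0}

# Hypothetical store of agreements made outside the rate card,
# e.g. extracted from an ops email. Values are (rate, source).
OPS_AGREEMENTS = {
    ("ACME", "detention"): (65.0, "ops email: peak-season detention terms"),
}


def audit(invoice, agreements, trail):
    """Check an invoice, preferring a known agreement over the rate card."""
    key = (invoice.carrier, invoice.charge_type)
    if key in agreements:
        rate, basis = agreements[key]
    else:
        rate, basis = RATE_CARD[key], "rate card"
    expected = round(rate * invoice.hours, 2)
    outcome = "approve" if abs(expected - invoice.billed) < 0.01 else "exception"
    decision = Decision(outcome, expected, basis,
                        datetime.now(timezone.utc).isoformat())
    trail.append(decision)  # every decision is recorded, approved or not
    return decision


trail = []
invoice = Invoice("ACME", "detention", hours=4, billed=260.0)
without_context = audit(invoice, {}, trail)             # rate card only
with_context = audit(invoice, OPS_AGREEMENTS, trail)    # agreement visible
print(without_context.outcome, with_context.outcome)    # exception approve
```

The model logic is identical in both calls. Only the context it can see changes the outcome, and the `basis` field is what lets someone later answer why the decision was made.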

The exception is not the edge case
Ken and Suja's May 5th session was about the exceptions — the cases where AI handles the happy path cleanly and falls apart the moment something is unusual. A carrier references a rate amendment never loaded into the TMS. A multi-leg shipment has accessorial charges governed by a contract clause nobody has located in two years. The exception queue builds because the AI knows something is wrong but lacks the context to resolve it.
The insight from that session: exceptions are not the tail. In freight audit and payment at enterprise volume, the exception rate before an intelligent system is trained on your data runs between 40 and 60 percent of invoices. At the top of that range, an AI that handles the 40% of clean invoices beautifully while routing the other 60% to a human queue has not automated the process. It has sorted the easy cases.
“The exception is not the edge case. In enterprise freight, it is most of the volume. Routing 60% to a human queue is not automation. It is triage.”
What production looks like
At the companies where Freehand is operating, the recurring exception rate drops by 70% within the first 90 days. Not because the AI gets smarter. Because it has read enough decision traces — enough of the institutional reasoning that governs how your organization handles exceptions — to apply those patterns consistently. The cases that remain are genuinely novel. Those still need a human. But the recurring exceptions stop recurring, and the queue shrinks to the size it should always have been.
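The idea of applying past decision traces to recurring exceptions can be sketched as a simple lookup. This is a toy illustration, not a description of how Freehand works: the `DecisionTraces` class, the signature tuples, the `min_precedents` threshold, and the example resolutions are all assumptions made for the sketch. An exception is resolved automatically only when the same kind of case has been resolved the same way before; anything without a consistent precedent still goes to a human.

```python
from collections import Counter, defaultdict


class DecisionTraces:
    """Hypothetical store of how past exceptions were resolved."""

    def __init__(self, min_precedents=2):
        self.min_precedents = min_precedents
        self._traces = defaultdict(Counter)  # signature -> resolution counts

    def record(self, signature, resolution):
        self._traces[signature][resolution] += 1

    def resolve(self, signature):
        """Return a resolution only if precedent is sufficient and unanimous."""
        seen = self._traces.get(signature)
        if not seen:
            return None  # genuinely novel: no precedent at all
        resolution, count = seen.most_common(1)[0]
        if len(seen) == 1 and count >= self.min_precedents:
            return resolution
        return None  # conflicting or too few precedents


def route(exceptions, traces):
    """Split exceptions into auto-resolved cases and a human queue."""
    auto, human = [], []
    for signature in exceptions:
        resolution = traces.resolve(signature)
        if resolution is None:
            human.append(signature)
        else:
            auto.append((signature, resolution))
    return auto, human


traces = DecisionTraces()
# Two past human decisions on the same recurring exception.
traces.record(("ACME", "detention_mismatch"), "apply peak-season rate")
traces.record(("ACME", "detention_mismatch"), "apply peak-season rate")

queue = [("ACME", "detention_mismatch"), ("ZED", "missing_rate_amendment")]
auto, human = route(queue, traces)
print(len(auto), len(human))  # 1 1
```

Requiring unanimous precedent is a deliberately conservative choice for the sketch: a case with conflicting history stays in the human queue rather than being resolved by majority vote.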
What the Symposium confirmed, across two sessions and a hundred hallway conversations, is that the industry understands the destination. The path is the infrastructure gap. And closing it is not a model choice. It is an architecture decision that most enterprises have not made yet.






