Most "AI limitations" articles are written defensively, with caveats that protect the vendor more than the reader. This one is the version I wish I'd had when I started running an AI agent marketplace. Ten real failure modes, what causes them, and what to do about each. Including the 2026-specific challenges (the Amazon Agent Policy, the Perplexity lawsuit, the Ads MCP integration risks) that didn't exist a year ago.
This is the honest counterweight to the rest of this hub. Agents are genuinely useful (the rest of the hub covers that). They also fail in specific, knowable ways. Knowing the failure modes upfront is what separates sellers who scale their AI use from sellers who get burned.
The agent confidently produces output that isn't true. Invents a product feature. Cites a fake competitor. Quotes a statistic that doesn't exist. Hallucinations are the most-named AI limitation, and they're real, but they're not the most common cause of bad agent output in production.
What causes it: the model is predicting plausible text, not retrieving facts. When asked something its training data doesn't cleanly cover, it generates a confident-sounding guess.
What to do: use agents that ground facts in tool calls instead of model memory. Demand that agents cite where data came from. For high-stakes output, human-in-the-loop review.
SP-API times out. The Amazon Ads API returns rate-limit errors. A third-party data provider serves stale data. The agent treats the bad response as truth and proceeds.
What causes it: the agent doesn't know how to recognize and recover from tool failures. Poor error handling in the orchestrator.
What to do: only use agents whose vendors can explain their retry logic and graceful degradation. For Amazon SP-API specifically, vendors should handle 429 (rate limit) and 5xx errors with backoff. Ask the question explicitly.
A loop that doesn't terminate. An agent that calls the same tool 50 times because it's stuck. A retry mechanism that doesn't back off. One bad run that costs $50 in model tokens instead of $1.
What causes it: missing step limits in the orchestrator. No timeout on tool calls. Poor exit conditions.
What to do: every production agent must cap iterations (typically 10-25 per run). Vendors should publish max-iteration limits. If pricing is per-token instead of per-run, demand a hard cap.
The agent gradually optimizes for the wrong thing. Starts optimizing keyword density when it was supposed to improve conversion. Ends up gaming the metric instead of producing the outcome.
What causes it: the original goal isn't re-stated each iteration. The agent's most recent feedback drowns out the top-level objective.
What to do: ask vendors how they prevent goal drift. Strong system prompts keep the goal at the top of context in every iteration. Anthropic's Building Effective Agents covers this in detail.
The agent produces output that sounds right but is subtly wrong. The bullets are well-written. The keywords seem reasonable. Three weeks later you realize the agent confused two of your product variants and your listings have been mis-targeted.
What causes it: models are trained to produce fluent output. They aren't trained to flag uncertainty. You get plausible-sounding answers regardless of confidence level.
What to do: spot-check agent output even after the pilot ends. Schedule monthly reviews of agent runs. Don't let "the agent is working" become "the agent is invisible."
A long-running agent accumulates context until it exceeds the model's context window. Older information gets dropped silently. The agent loses track of the original goal partway through.
What causes it: no summarization between iterations. No selective retrieval to keep context lean.
What to do: for short-run agents (one-shot tasks like listing optimization), this rarely matters. For long-running agents (multi-day investigations, multi-hour audits), ask the vendor how they handle context compression.
Your brand voice document gets included in a model API call. The model provider logs it. In rare cases (mis-configured RAG), one customer's data leaks into another customer's context.
What causes it: sloppy data handling at the vendor level. Sharing model API keys across tenants. Not isolating customer data in retrieval systems.
What to do: ask vendors about data isolation. Read their privacy policy. For sensitive data (private financial models, unreleased product info), avoid agents that train on customer data or share data across tenants.
You build workflows around Vendor A's agent. Vendor A raises prices, deprecates a feature, gets acquired, or changes the product direction. Switching means re-onboarding to Vendor B, retraining yourself, possibly losing accumulated agent memory.
What causes it: proprietary agents and proprietary memory formats.
What to do: prefer vendors that export your data on request. Pay-per-run pricing (which doesn't lock you in via subscription) reduces switching cost. Don't put all your eggs in one vendor's basket, especially for critical workflows.
Amazon's Business Solution Agreement was updated effective March 4, 2026 to include explicit rules about how AI agents can act on a seller's Amazon account. The policy defines what counts as an "Agent," what permissions agents need, and what actions require explicit seller approval.
What it means in practice: agents that auto-execute actions on your behalf (auto-bidding, auto-pricing, auto-messaging) now need to comply with stricter permission models. Some older third-party tools became non-compliant overnight when the policy took effect.
What to do: ask every Amazon-seller agent vendor: "Are you compliant with the March 4, 2026 Amazon BSA Agent Policy?" If they don't know what you're talking about, walk away. If they say yes, ask how. We'll cover the policy in detail in our Amazon-specific hub (launching as Stage 3 of this rebuild).
In November 2025, Amazon filed suit against Perplexity over Comet, Perplexity's agent that browsed Amazon on behalf of users without authorization. The case is widely covered (search "Amazon Perplexity Comet lawsuit"), and it set the stage for the March 4, 2026 Agent Policy.
What it signals: Amazon is willing to legally pursue agents that act on its platform without explicit authorization. The risk isn't just account suspension. It's now potentially legal liability for vendors operating outside Amazon's policy framework.
What to do: stick with agents that use authorized API access (SP-API, Amazon Ads API, MCP-based access via the Amazon Ads MCP Server). Avoid agents that scrape Amazon pages or simulate user actions without proper authorization. The cost of "saving money with an unauthorized agent" is much higher than the savings.
This isn't a technical limitation. It's a strategic one. AI agents handle specific repetitive tasks within roles. They don't replace the human judgment that decides which products to launch, how to respond to a crisis, what brand voice to develop.
Sellers who shrink their team because "AI will handle it" tend to discover the agent's outputs need more review, not less, and the team they kept is now overworked. The right frame is "AI handles execution work, humans handle judgment work, and the team shifts toward more judgment work."
Quick reference. The ten failure modes, by severity and frequency.
| Failure | Severity | Frequency | Mitigation |
|---|---|---|---|
| Hallucinations | Medium | Medium | Ground facts in tools |
| Tool failures | High | High | Vendor retry logic |
| Cost runaway | High | Low | Step limits |
| Goal drift | High | Medium | Goal re-statement per iteration |
| Confident wrong | High | Medium | Spot-check schedule |
| Context overflow | Medium | Low (short runs) | Summarization |
| Data leaks | Very high | Low | Vetted vendors, data isolation |
| Vendor lock-in | Medium | Always present | Data export, pay-per-run |
| BSA Agent Policy | Very high | Pre-deploy check | Compliance verification |
| Unauthorized agents | Very high | Avoidable | Use authorized APIs only |
AI agents in 2026 are useful, real, and worth deploying for the right jobs. They also have failure modes that aren't going away. The right posture is "informed user," not "true believer" and not "skeptic." Use them where they work. Plan for the failures. Don't let vendors talk you out of the safety habits.
The most common production failure is not hallucinations but tool failures: SP-API timeouts, Amazon Ads API rate-limit errors, stale third-party data. Other high-severity failures include cost runaway (loops that don't terminate), goal drift (the agent optimizes for the wrong thing), confident-wrong output (sounds right, subtly mis-targeted), and context overflow on long runs. Plan for these explicitly rather than assuming they won't happen.
Hallucinations happen because the model predicts plausible text rather than retrieving facts. When asked something its training data does not cleanly cover, it generates a confident-sounding guess. Mitigate by using agents that ground facts in tool calls (SP-API, Amazon Ads MCP) instead of model memory, demanding source citations, and requiring human-in-the-loop review for high-stakes output.
Amazon's Business Solution Agreement was updated effective March 4, 2026 to add explicit rules on how AI agents can act on a seller's Amazon account. The policy defines what counts as an 'Agent,' what permissions agents need, and what actions require explicit seller approval. Some older third-party tools became non-compliant overnight. Always ask vendors directly: are you compliant with the March 4, 2026 BSA Agent Policy?
In November 2025 Amazon filed suit against Perplexity over Comet, Perplexity's agent that browsed Amazon on behalf of users without authorization. The case set the stage for the March 4, 2026 BSA Agent Policy. The signal: Amazon is willing to legally pursue agents that act on its platform without explicit authorization. Stick with agents that use authorized API access (SP-API, Amazon Ads API, the Amazon Ads MCP Server), not page-scraping or unauthorized browser automation.
Every production agent must cap iterations, typically 10-25 per run. Tool calls need timeouts. The orchestrator must have explicit exit conditions. Ask vendors to publish their max-iteration limits. If pricing is per-token instead of per-run, demand a hard spend cap so a single broken loop cannot turn a $1 run into a $50 run.
Every SellerShorts agent has step limits, capability scoping, observability, and Amazon-policy-aware design. The failure modes on this page are what the marketplace's infrastructure is built to protect against.
Browse SellerShorts agents