Who Gives a @!#$ About Schema?

Why structured data still matters in the age of AI search.

A recent article titled “The Whole Point Was the Mess” makes a compelling argument against the growing wave of “GEO” and “AI optimization” snake oil flooding the SEO industry.

On a lot of points, it’s right.

Large language models were trained on the chaos of the internet. They don’t need perfectly structured markup to understand human language. They can interpret slang, fragmented sentences, terrible formatting, and the beautifully unhinged mess that is the modern web.

That’s true.

But somewhere between “LLMs can read messy text” and “schema doesn’t matter,” the conversation went off the rails.

Because nobody serious ever claimed that schema markup teaches ChatGPT how to read, or that schema was a magic bullet for “GEO” success.

That was never the point.

The problem with the anti-schema argument is that it treats AI systems as if they exist in a vacuum. But modern search and AI discovery systems are much larger than the LLM itself.

And while the model may not care about your schema, the systems feeding the model absolutely might.

AI Search Is Not Just “The Model”

The current discourse around AI search tends to flatten everything into one giant black box called “the LLM.”

But AI discovery systems are actually stacks made up of multiple layers:

  • Crawlers
  • Parsers
  • Entity extraction systems
  • Retrieval pipelines
  • Ranking systems
  • Vector databases
  • Knowledge graphs
  • Citation systems
  • Metadata layers
  • Context assembly systems
  • Safety and verification systems  

This matters because structured data can influence many of those layers.

Schema may not directly affect token prediction inside GPT-4. But it can help machines classify, organize, and connect information before the LLM generates a response.

That distinction is important.

Because the debate shouldn’t be:

“Does the transformer literally read JSON-LD?” 

The debate should be:

“Does structured information help machines retrieve, classify, trust, and organize content?”

And the answer to that is obviously yes.

Schema Was Never About Teaching Machines Language

This is where the anti-schema argument feels strangely reductive.

Schema isn’t valuable because machines are incapable of reading paragraphs. Machines can already do that.

Schema is valuable because ambiguity is expensive.

Take a simple example:

  • Apple the company
  • apple the fruit

Humans infer meaning instantly through context. Machines can too, but not always consistently, efficiently, or accurately at scale.

Structured data helps reduce uncertainty.

It helps systems distinguish:

  • products from articles
  • recipes from blog posts
  • events from local businesses
  • people from organizations
  • reviews from editorial content  

That becomes increasingly useful in:

  • search indexing
  • knowledge graph construction
  • shopping experiences
  • local discovery
  • citation systems
  • entity reconciliation
  • retrieval pipelines  

Schema is less about “understanding language” and more about understanding context.

Those are not the same thing.
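To make the Apple example concrete, here is a minimal sketch of how a page about Apple the company might declare which entity it means. The `Organization` type and the `name`, `url`, and `sameAs` properties are schema.org vocabulary; the helper names `organization_jsonld` and `as_script_tag` are hypothetical, and the URLs are illustrative choices rather than anything prescribed by the original article.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build a schema.org Organization description as a plain dict.

    Hypothetical helper for illustration; only the keys are schema.org terms.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Organization",   # a company, not a fruit
        "name": name,
        "url": url,
        "sameAs": list(same_as),   # external pages describing the same entity
    }


def as_script_tag(data):
    """Wrap the dict in the script tag commonly used to embed JSON-LD in HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Illustrative values, assumed for this example.
apple_company = organization_jsonld(
    name="Apple",
    url="https://www.apple.com/",
    same_as=["https://en.wikipedia.org/wiki/Apple_Inc."],
)

print(as_script_tag(apple_company))
```

The text on the page can still say just “Apple.” The markup states the context explicitly, so a system reading it does not have to infer it.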

The Web Was Messy. The Industry Is Trying to Organize It Anyway.

One of the central themes of “The Whole Point Was the Mess” is that transformers succeeded precisely because they learned from unstructured, messy data.

Fair enough.

But if messiness were truly the ideal state, why is every major AI company aggressively trying to structure the web afterward?

Why are they building:

  • knowledge graphs
  • entity layers
  • grounding systems
  • citation systems
  • product feeds
  • vector databases
  • metadata pipelines
  • retrieval frameworks
  • structured commerce ecosystems  

The models may have been trained on chaos.

But the products being built around them are obsessed with organization.

That’s not a contradiction. It’s an acknowledgment that retrieval quality improves when systems can better classify and connect information.

The internet is messy because humans are messy.

Machines still prefer structure.

Retrieval Systems Benefit From Structured Signals

This becomes especially relevant in retrieval-based AI systems.

Modern AI answers are increasingly powered by retrieval pipelines that:

  • locate documents
  • score relevance
  • identify entities
  • assemble context
  • determine source confidence
  • attribute citations  

Structured data can assist these systems by providing cleaner signals about what a page actually represents.

This is particularly important for:

  • ecommerce
  • medical information
  • recipes
  • events
  • local businesses
  • product specifications
  • reviews
  • authorship
  • FAQs  

Could a model figure all of this out from raw text alone?

Probably.

But “probably” becomes a dangerous word at internet scale.

Structured signals reduce friction.

They reduce ambiguity.

And they help machines organize information more reliably.
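As a rough sketch of what a “cleaner signal” could look like to a pipeline, the snippet below pulls the declared `@type` values out of a page's JSON-LD using only the Python standard library. It is an assumption about how such a step might work, not a description of any real search or AI system; `JsonLdExtractor`, `declared_types`, and the sample HTML are all made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def declared_types(html):
    """Return the top-level @type values a page declares about itself.

    Simplification: nested items and "@graph" containers are not traversed.
    """
    parser = JsonLdExtractor()
    parser.feed(html)
    types = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed markup is skipped, not trusted
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                value = item["@type"]
                types.extend(value if isinstance(value, list) else [value])
    return types


# Hypothetical page: the prose alone says "Apple pie"; the markup says "Recipe".
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Apple pie"}
</script>
</head><body><p>Apple pie, the way my grandmother made it.</p></body></html>
"""

print(declared_types(SAMPLE_HTML))  # ['Recipe']
```

A lookup like this is cheap and gives the same answer every time for the same markup, which is the practical appeal compared with inferring the page type from prose on every pass.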

The Real Problem Is Fake GEO Certainty

Now to be fair, the original article correctly calls out a growing problem in the SEO industry: fake precision.

There’s a wave of vendors claiming things like:

  • “Schema increases AI visibility by 37%”
  • “This formatting boosts citation likelihood”
  • “Optimize your chunking for LLMs”
  • “AI readability scores”  

Much of this is recycled SEO malarkey wrapped in AI terminology.

Generative systems are probabilistic. Outputs vary by:

  • prompts
  • retrieval state
  • personalization
  • timing
  • model updates
  • grounding systems
  • randomness  

Anyone claiming deterministic control over AI citations is overselling.

But that still doesn’t mean structure is irrelevant.

There’s a massive difference between:

  • “schema guarantees AI visibility”
    and 
  • “structured data improves machine interpretability”  

One claim is marketing nonsense.

The other is basic information architecture.

AI Search Is Still a Tiny Percentage of Traffic

There’s another practical problem with dismissing schema outright:

Most websites still get the overwhelming majority of their traffic from traditional search.

And traditional search features still lean heavily on structured data.

Schema continues to support:

  • rich results
  • product listings
  • review stars
  • recipe carousels
  • local SEO
  • video indexing
  • merchant feeds
  • event visibility
  • knowledge panels  

Even if AI search becomes dominant eventually, that transition has not happened yet.

For most businesses:

  • Google still drives revenue
  • Shopping feeds still matter
  • Rich snippets still matter
  • Local visibility still matters  

AI search may be the future.

Traditional search still pays the bills.

The False Binary

The strangest part of the anti-schema narrative is the implied binary:

  • either LLMs read schema directly
  • or schema is useless  

But technology ecosystems don’t work that way.

Nobody argues that schema replaces:

  • authority
  • expertise
  • usefulness
  • quality writing
  • original insights  

Good content still wins.

But good content plus machine-readable structure is usually better than good content alone.

That’s not some radical “GEO” philosophy.

That’s just how information systems evolve.

So… Who Gives a @!#$ About Schema?

Search engines do.

Shopping systems do.

Knowledge graphs do.

Retrieval systems do.

Entity resolution systems do.

Local discovery systems do.

Citation systems probably do more than people realize.

And increasingly, the infrastructure surrounding AI systems does too.

No, schema is not magic.

It won’t force ChatGPT to cite you.

It won’t override weak content.

And it certainly won’t save bad websites.

But pretending structured data no longer matters because transformers can interpret messy language is like saying databases became obsolete because humans can read paragraphs.

The web may be messy.

Machines are still trying to organize it.
