GEO: how to be cited by ChatGPT, Perplexity, Claude, and Google AI Overviews in 2026

TL;DR: Search has split. There is still a traditional search engine returning ten blue links to a query, and that traffic is not zero. But increasingly, the answer to the query never opens a tab.

Search has split. There is still a traditional search engine returning ten blue links to a query, and that traffic is not zero. But increasingly, the answer to the query never opens a tab. It is generated inside the AI surface: a conversational answer with citations. The question that matters now is not "how do I rank for this term?" but "how do I get cited inside the answer?" That discipline is Generative Engine Optimisation, and it is meaningfully different from SEO as it was practised for the last fifteen years.

This piece lays out what we've learned running GEO across our own domain network — the heuristics, the structural moves, the schema discipline, and the things that turn out to be either over-rated or counter-productive. It assumes you know the basics of SEO; if you don't, GEO is a much harder lift on top of unsolved fundamentals.

What an AI search engine is actually doing

Strip the marketing away and a generative AI search engine does a fairly mechanical thing on each query:

1. Decompose the user's question into sub-queries.
2. Run those sub-queries against a search index — sometimes a custom one, sometimes a major search engine's API.
3. Retrieve a handful of pages or page sections per sub-query.
4. Synthesise an answer from the retrieved material, ideally with citations back to source pages.

Your job in GEO is to be the page that gets retrieved at step 3, and the page that gets cited at step 4. These are related but separable problems. You can be retrieved without being cited (the model decides your section isn't useful), and you can be cited without being retrieved (the model already "knows" your content from its training data).

The optimisation surface is therefore: be in the index, be retrievable for the relevant queries, and be the most useful piece of source material for the model to lean on when it composes the answer.
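The decompose, retrieve, and cite flow can be sketched as a toy pipeline. This is an illustration under loud assumptions, not how any particular AI surface works: the `Section` type, the split-on-"and" decomposition, and the word-overlap scoring are all hypothetical stand-ins for model-driven components.

```python
from dataclasses import dataclass


@dataclass
class Section:
    url: str
    heading: str
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercased word set; a stand-in for real embedding-based retrieval.
    return {w.strip(".,?!:;\"'()").lower() for w in text.split()} - {""}


def decompose(question: str) -> list[str]:
    # Hypothetical decomposition: split on " and ". Real systems use a model.
    return [part.strip() for part in question.split(" and ") if part.strip()]


def retrieve(sub_query: str, index: list[Section], k: int = 2) -> list[Section]:
    # Rank sections by word overlap with the sub-query and keep the top k.
    q = tokenize(sub_query)
    scored = [(len(q & tokenize(s.heading + " " + s.text)), s) for s in index]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def answer(question: str, index: list[Section]) -> dict:
    retrieved, cited = [], []
    for sub_query in decompose(question):
        hits = retrieve(sub_query, index)
        retrieved.extend(hits)
        if hits:
            cited.append(hits[0])  # only the best hit per sub-query is cited
    return {
        "retrieved": sorted({s.url for s in retrieved}),
        "cited": sorted({s.url for s in cited}),
    }
```

In this sketch a section can land in `retrieved` without landing in `cited`, which is the first half of the distinction drawn above. The second half (cited from training data without retrieval) is outside what a retrieval toy can show.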

The structural property that matters most

Across the AI surfaces, the single property that correlates most strongly with citation-rate is extractability: how easy it is for a model to pull a clean, self-contained answer to a specific question out of your page.

That sounds obvious. Most pages still fail at it. They bury the answer in conversational prose halfway down. They cover three loosely related questions on the same page without clear delineation. They reference things "as discussed above" or "see below" — phrases that don't survive being chunked into a 500-token retrieval window.

The fix is structural. Every page should have one primary question it answers and a small set of explicit secondary questions. The answers should be in self-contained sections with strong headings. The headings should literally be the questions, or close to them. The first paragraph after each heading should give a complete answer that survives extraction. Detail comes after, not before.
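One way to catch the "as discussed above" problem before publishing is a small lint over page sections. The sketch below is a minimal example, assuming the page has already been split into a heading-to-body mapping; the phrase list is illustrative and far from exhaustive.

```python
import re

# Phrases that point outside the current section. Illustrative list only.
BACK_REFERENCES = [
    r"as (?:discussed|mentioned|noted|described) (?:above|earlier|previously)",
    r"see (?:above|below)",
    r"as we saw",
]
PATTERN = re.compile("|".join(f"(?:{p})" for p in BACK_REFERENCES), re.IGNORECASE)


def find_back_references(sections: dict[str, str]) -> dict[str, list[str]]:
    """Map each heading to the back-reference phrases found in its body."""
    findings = {}
    for heading, body in sections.items():
        hits = [m.group(0) for m in PATTERN.finditer(body)]
        if hits:
            findings[heading] = hits
    return findings


sections = {
    "What is GEO?": "GEO is the practice of getting cited inside AI answers.",
    "Why does schema matter?": "As discussed above, schema helps. See below for details.",
}
print(find_back_references(sections))
```

A section that comes back clean is not guaranteed to be self-contained, but a section that is flagged almost certainly is not.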

Schema as the disambiguator

JSON-LD schema markup is no longer optional. AI search engines parse it heavily as a disambiguation signal. The schema types that move the needle:

Article + Person author — every long-form piece. Author profile must be substantial, with sameAs links to other authoritative surfaces about the same person.
FAQ schema — for any page that contains a meaningful Q&A block. The answers in the schema should match the answers on the page, not be summarised away.
HowTo schema — for any genuinely procedural content. Use it sparingly and accurately — it's a strong citation signal but a strong de-ranking signal when misused.
Speakable schema — call out the sections of the page most suitable for spoken answers. Voice surfaces care about this.
Organization + WebSite — site-wide entity context.
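As a rough sketch of what the first two types look like in practice, the helpers below build Article + Person and FAQPage JSON-LD as Python dictionaries and wrap them in a script tag. The property names come from the schema.org vocabulary; the function names and any values passed in are hypothetical placeholders.

```python
import json


def article_jsonld(headline, url, date_modified, author_name, author_url, same_as):
    """Build Article + Person JSON-LD as a plain dict (schema.org vocabulary)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "dateModified": date_modified,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,
            "sameAs": list(same_as),
        },
    }


def faq_jsonld(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def to_script_tag(data):
    # Escape "</" so text inside the JSON cannot close the script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"
```

Generating the markup from the same source as the page copy, rather than hand-editing it, is one way to keep the two from drifting apart.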

The discipline is to keep the schema honest and current. Schema that doesn't match the page is detected and penalised. Schema that goes stale because the page got updated and the schema didn't is the most common failure mode.
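A simple guard against stale schema is to check, at build or publish time, that every FAQ answer in the markup still appears in the page text. The sketch below assumes the FAQPage JSON-LD has been parsed into a dict and the visible page text has been extracted to a string; both function names are hypothetical.

```python
def normalise(text: str) -> str:
    # Collapse whitespace and case so formatting differences do not count as drift.
    return " ".join(text.split()).lower()


def stale_faq_answers(faq_schema: dict, page_text: str) -> list[str]:
    """Return the questions whose schema answer no longer appears on the page."""
    page = normalise(page_text)
    stale = []
    for entity in faq_schema.get("mainEntity", []):
        answer = entity.get("acceptedAnswer", {}).get("text", "")
        # An empty answer is treated as stale rather than silently passing.
        if not answer or normalise(answer) not in page:
            stale.append(entity.get("name", ""))
    return stale
```

This only catches verbatim drift. An answer that has been reworded on the page but left unchanged in the schema will be flagged, which is the intended behaviour; a page whose meaning changed while the sentence stayed intact will not.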

Authority signals in 2026

Old-school authority signals (backlinks, domain age, traffic) still matter in the underlying search index that AI surfaces query. New signals matter more for citation:

Entity binding: are your organisation, your authors, and your products represented as actual entities in the knowledge graphs the AI surfaces draw from? This means consistent sameAs across LinkedIn, GitHub, Wikipedia where applicable, your own About pages, and your client-facing surfaces. The more consistent, the stronger the binding.
Citation density on the open web: being mentioned by other authoritative sources on the topic, by name, with context. This is hard to game and the AI surfaces know it.
First-party data and original analysis: pages that present novel data, charts the model can quote, and conclusions not present elsewhere are disproportionately cited. The model is biased toward sources that contribute new material rather than rehearse existing material.
Author authority: the AI surfaces are getting noticeably better at evaluating who wrote a piece. Consistent author identity, authoritative author profile pages, and real credentials all matter.
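Consistency of sameAs links is something that can be checked mechanically. The following is a minimal sketch, assuming you have already collected the sameAs links each of your surfaces declares for one entity; the `same_as_report` name and the input shape are hypothetical.

```python
def same_as_report(profiles: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each surface, list the profile links it is missing.

    `profiles` maps a surface (for example an author page URL) to the set of
    sameAs links that surface declares for one entity.
    """
    all_links = set().union(*profiles.values())
    return {
        surface: all_links - links
        for surface, links in profiles.items()
        if all_links - links
    }


profiles = {
    "https://example.com/about/jane": {
        "https://www.linkedin.com/in/jane-example",
        "https://github.com/jane-example",
    },
    "https://example.com/blog/geo": {
        "https://www.linkedin.com/in/jane-example",
    },
}
print(same_as_report(profiles))
```

Here the blog post's author block is reported as missing the GitHub link that the About page declares. An empty report means every surface declares the same set.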

The freshness question

Different AI surfaces handle freshness differently. Some rely heavily on training-data knowledge with sparse retrieval; others lean hard on real-time retrieval. The robust strategy is to be retrievable and to keep your content current. "Last updated" dates that are honest and recent are read as a freshness signal across most surfaces.

The discipline that pays here: publish less, update more. A small number of canonical pieces, kept genuinely current, will out-perform a sprawl of unmaintained content over a year. The unit of compounding is the page that keeps getting better, not the new page added to the pile.
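"Update more" is easier to sustain with a queue of what is overdue. The sketch below assumes each canonical page records an ISO `dateModified` string; the 180-day threshold is an arbitrary assumption for illustration, not a figure from any AI surface.

```python
from datetime import date


def stale_pages(pages: dict[str, str], today: date, max_age_days: int = 180) -> list[tuple[str, int]]:
    """Return (url, age_in_days) for pages older than max_age_days, oldest first."""
    aged = []
    for url, date_modified in pages.items():
        age = (today - date.fromisoformat(date_modified)).days
        if age > max_age_days:
            aged.append((url, age))
    return sorted(aged, key=lambda item: item[1], reverse=True)


pages = {
    "https://example.com/geo": "2026-05-01",
    "https://example.com/schema": "2025-06-01",
    "https://example.com/chunking": "2025-11-01",
}
print(stale_pages(pages, today=date(2026, 6, 1)))
# → [('https://example.com/schema', 365), ('https://example.com/chunking', 212)]
```

The list is a prompt for a real content review, not a reason to bump the date. A changed date on an unchanged page is exactly the dishonest freshness signal to avoid.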

What we measure

Measuring GEO is harder than measuring SEO because the AI surfaces don't expose dashboards the way Search Console does. The metrics we run with:

Citation tracking — manual queries on a panel of representative questions across the major AI surfaces every two weeks. Look for explicit citations of our domain. Track over time.
Brand-mention tracking — monitor whether AI answers reference our brand or our authors by name even without an explicit citation. "As Josh Weir notes…" is a strong signal even if no link is present.
Referrer logs from AI surfaces — these are spotty (some surfaces send referrers, some don't), but they're real signal when present.
Topic-level retrievability — for our key pillar topics, run the kind of queries a target reader would type, see what gets cited, see whether we're in the panel of sources at all.

This is more qualitative than SEO measurement. Get used to it. The trend is what matters; obsessing over the spot-reading on any given day will drive you mad.
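Because the readings are collected by hand, a small amount of structure helps turn them into a trend. This is a minimal sketch of one way to log the fortnightly panel and compute a citation rate per run; the `Observation` fields and surface labels are assumptions, not a standard format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    run_date: str   # ISO date of the fortnightly run
    surface: str    # label for the AI surface queried, e.g. "perplexity"
    question: str   # the panel question that was asked
    cited: bool     # True if our domain appeared as an explicit citation


def citation_rate_by_run(observations: list[Observation]) -> dict[str, float]:
    """Share of (surface, question) checks per run in which our domain was cited."""
    totals, hits = defaultdict(int), defaultdict(int)
    for obs in observations:
        totals[obs.run_date] += 1
        hits[obs.run_date] += int(obs.cited)
    return {run: hits[run] / totals[run] for run in sorted(totals)}
```

Comparing the rate across runs, rather than reading any single run, matches the point above: the trend is the signal. The same log can be grouped by `surface` or `question` to see where citations are coming from.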

What does not work

Spammy GEO content factories: auto-generated pages targeting AI surfaces. These get filtered fast. The AI search engines have become good at detecting machine-generated content, and they de-rank or exclude it.
Cramming keywords: irrelevant in the GEO era. The surfaces optimise for semantic relevance, not term frequency.
Hidden text or cloaking: works briefly, gets caught, then your domain takes a hit. Not worth it.
Treating GEO as separate from SEO: they share the same retrieval index foundations. Good SEO is the floor; GEO is the layer on top.
Ignoring the audio surface: voice queries are growing. Pages that work in spoken-word answers (clear, short, structured) get cited more by voice surfaces than text-dense competitors.

The minimum viable GEO checklist

Every important page targets one primary question and a small set of clearly delineated secondary questions.
Headings are the questions. First paragraphs are complete answers.
Article + Person + FAQ schema deployed and validated.
Author profiles substantial, with consistent sameAs across the open web.
Last-updated dates honest and current.
One canonical piece per topic; updated rather than duplicated.
Original data or original synthesis on each piece — not just a rephrase of consensus.
Citation tracking running every two weeks across a query panel.

How chunking actually works inside the AI surfaces

The mental model that makes all of this make sense is that the AI surface is not retrieving your page; it is retrieving chunks of your page. Each surface has its own chunking strategy — typically 200–800 tokens per chunk with some overlap — and each chunk is independently scored against the query for relevance. The chunk that wins is the one cited.
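The chunk-then-score idea can be sketched in a few lines. This is a simplified illustration, not any surface's actual strategy: whitespace-separated words stand in for model tokens, and shared-word counting stands in for a real relevance score.

```python
def chunk(tokens: list[str], size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the page
    return chunks


def best_chunk(page: str, query: str, size: int = 500, overlap: int = 50) -> tuple[int, str]:
    """Return (index, text) of the chunk sharing the most distinct words with the query."""
    chunks = chunk(page.split(), size, overlap)
    if not chunks:
        return -1, ""
    query_words = {w.lower() for w in query.split()}
    scores = [len(query_words & {w.lower() for w in c}) for c in chunks]
    winner = max(range(len(chunks)), key=scores.__getitem__)
    return winner, " ".join(chunks[winner])
```

Note that each chunk is scored in isolation: nothing outside the winning window is available to explain a pronoun or a term defined earlier on the page.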

This implies a structural rule that most content authors miss: each meaningful section of your page must be self-contained at the chunk level. Pronouns that refer back to earlier sections, statements that depend on earlier definitions, conclusions that rest on premises stated three headings ago — all of these survive the human read but get destroyed by chunking.

The fix is not to repeat yourself constantly. The fix is to write each section so that a reader landing on it cold can extract a useful answer. Define terms inline when they're load-bearing. Restate the question the section answers in the first paragraph. Use specific subjects rather than "it" and "this". The same discipline that makes content extractable for AI also tends to make it more usable for human skim-readers; it's not a tax, it's better writing.

The publishing cadence question

One of the more interesting consequences of GEO is that traditional content-velocity playbooks stop working. The SEO content-factory move — publish a thousand thin pages targeting long-tail keywords — backfires. The AI surfaces actively de-rank thin content. They prefer fewer, better, more comprehensive sources.

The cadence that works in 2026 is something like: ship a small number of canonical pieces per pillar (five to ten per pillar over a year), and spend the rest of your editorial budget on keeping those pieces current and deepening their authority. Each canonical piece becomes an asset that compounds. Every update is logged via the last-updated property; every fresh data point added strengthens the piece's standing.

This is harder than the volume game, because it forces you to actually have something useful to say on each topic. The teams that win at GEO are typically the teams that already had topical authority and just needed to translate it into the new shape. The teams that lose are the ones who optimised for SEO volume and built a sprawl of mediocre pages they now need to consolidate.

The shift from SEO to GEO is real but not catastrophic. The fundamentals that made content useful to humans still make it useful to AI surfaces — clarity, structure, original insight, honest authority. What changes is the level of structural discipline required to make extraction reliable, the seriousness of the schema layer, and the new cadence of measurement.

The teams that win the GEO era will be the ones that treat each canonical piece as a living asset, optimised for both human reading and machine extraction, with the entity layer maintained as carefully as the editorial. That has been true in SEO for a long time. It is more true now.
