Perplexity cited my pages three times before I had any idea why, and the first two were the wrong claims. One answer attributed a multi-agent routing pattern to AIdeazz that I'd described as a failure in a post — the model pulled the workaround sentence and dropped the "this broke in production" context around it. That's when I stopped treating generative engine optimization as a marketing chore and started treating it as a data-modeling problem with the same rigor I'd give an API contract.
If you're a developer who has shipped something and you're allergic to AI hype, here's the honest framing: GEO is not SEO with new vocabulary. SEO optimizes for a ranked list of ten blue links where the user does the synthesis. GEO optimizes for a machine that does the synthesis for the user and decides whether your name survives the compression. The unit of success changed from "ranked position" to "did the model quote you correctly and attribute it." Those require different inputs.
The mechanical difference nobody states plainly
A search engine indexes your page and ranks it. A generative engine retrieves chunks, feeds them to an LLM, and the LLM writes a paragraph that may or may not name you. Three things happen in that pipeline that don't happen in classic search:
1. Chunking destroys context. Retrieval splits your page into 200–800 token fragments. If your claim and its qualifier live in different chunks, the model gets one without the other. My Perplexity misattribution happened because "we route latency-sensitive calls to Groq" survived as a clean chunk and "...but this caused state desync we later reverted" was 600 tokens away in a different paragraph.
2. The model needs a citation anchor. Perplexity, ChatGPT search, and Claude with web access prefer to cite a discrete factual statement tied to a clear source. Vague prose with no extractable fact gets summarized but not attributed. No attribution means no link, no referral, no authority signal.
3. Authorship is a signal, not decoration. When two sources make the same claim, the engine leans toward the one with structured authorship and a track record on a domain it has seen before. This is where most technical founders lose — they publish brilliant threads on platforms they don't control, and the platform gets the citation.
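To make the chunking failure in point 1 concrete, here is a minimal sketch that splits a page into fixed-size word chunks and checks whether a claim and its qualifier land in the same chunk. It is an illustration under stated assumptions: real retrieval pipelines chunk by tokens, often with overlap, and the helper names `chunk_words` and `co_located` and the filler text are hypothetical.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words.

    Assumption: words stand in for tokens; a real chunker counts tokens.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def co_located(chunks: list[str], claim: str, qualifier: str) -> bool:
    """True if at least one chunk contains both the claim and its qualifier."""
    return any(claim in chunk and qualifier in chunk for chunk in chunks)


CLAIM = "we route latency-sensitive calls to Groq"
QUALIFIER = "this caused state desync we later reverted"

# Filler stands in for the paragraphs that separate claim and qualifier.
filler = " ".join(["filler"] * 400)

# Qualifier far from the claim, as in the misattributed post.
split_page = f"{CLAIM}. {filler} but {QUALIFIER}."

# Qualifier in the same sentence as the claim.
tight_page = f"{CLAIM}, but {QUALIFIER}. {filler}"

split_ok = co_located(chunk_words(split_page, 300), CLAIM, QUALIFIER)
tight_ok = co_located(chunk_words(tight_page, 300), CLAIM, QUALIFIER)
print(split_ok, tight_ok)
```

With a 300-word chunk size, the first page puts the claim and the qualifier in different chunks, while the second keeps them together regardless of where the chunk boundaries fall later in the page.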
So the core of generative engine optimization is structured data, citations, and durable pages on domains you own. Everything below is how I implemented that for AIdeazz running on Oracle Cloud with a multi-agent stack, zero paid distribution.
Structured facts beat prose for retrieval
The single highest-leverage change: I rewrote my technical pages so every load-bearing claim is self-contained at the chunk level. Practically:
- Each claim carries its own qualifier in the same sentence or the sentence immediately after. Not "we use Groq for speed" but "we route Telegram agent calls under a 2-second SLA to Groq's Llama 3.3 70B; calls needing tool-use reliability go to Claude Sonnet, accepting ~1.8s added latency."
- Numbers live inline, not in a chart three paragraphs down. A chart is invisible to a text chunker.
- I added FAQPage and TechArticle JSON-LD with the actual claim text mirrored in schema.org fields, so the structured layer reinforces the prose layer.
Here's what the schema looks like for one of my agent pages:
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Routing LLM calls across Groq and Claude in a multi-agent system",
  "author": {
    "@type": "Person",
    "name": "Elena Revicheva",
    "url": "https://aideazz.xyz/portfolio",
    "knowsAbout": ["multi-agent systems", "LLM routing", "Oracle Cloud Infrastructure"]
  },
  "datePublished": "2024-11-02",
  "dateModified": "2025-01-14",
  "about": "LLM provider routing under latency constraints",
  "citation": "Production deployment on Oracle Cloud Free Tier ARM instances"
}
Does JSON-LD directly make Perplexity cite you? Not provably: they don't publish their retrieval weights. But it does two useful things. First, it gives Google's own AI Overviews a clean entity to attach the claim to, and AI Overviews referrals showed up in my Search Console within five weeks of adding it. Second, it forces you to state who the author is, what they know, and when the claim was last verified, which is exactly the metadata generative engines reward whether they read your JSON or your visible text.
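The FAQPage half of that markup can be generated from the same claim text that appears on the page, which keeps the structured layer and the prose from drifting apart. A minimal sketch, assuming the schema.org `FAQPage` / `Question` / `Answer` shape; the helper names `faq_jsonld` and `as_script_tag` and the example question wording are hypothetical.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data: dict) -> str:
    """Serialize JSON-LD for embedding in server-rendered HTML."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # "\/" is a valid JSON escape; it stops a stray "</script>" inside an
    # answer from closing the tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# The answer mirrors the visible claim text; the question is illustrative.
pairs = [
    (
        "Which provider handles latency-sensitive agent calls?",
        "We route Telegram agent calls under a 2-second SLA to Groq's "
        "Llama 3.3 70B; calls needing tool-use reliability go to Claude "
        "Sonnet, accepting ~1.8s added latency.",
    )
]
faq = faq_jsonld(pairs)
tag = as_script_tag(faq)
print(tag)
```

Keeping the answer string identical to the sentence on the page is the point: the claim and its qualifier travel together in both layers.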
Authorship signals: the part developers skip
Most technical founders treat the byline as an afterthought. The generative engines don't. A claim from "a person who demonstrably builds the thing" carries more weight than the same claim from an anonymous content page.
What I did concretely:
- Every page links author → /portfolio, and the portfolio page links back to the real artifacts: the Telegram bot you can actually message, the WhatsApp agent, the GitHub. Bidirectional links between author and proof of work form an entity the engine can resolve.
- I consolidated. Before, my writing was scattered across Medium, a Substack, and dev.to. The citations, when they came, named those platforms. I moved the canonical version of every technical post to aideazz.xyz and left pointer-stubs elsewhere with rel="canonical" back to my domain. Within two months the Perplexity citations shifted from "medium.com/@..." to my own domain.
- I added sameAs to the Person schema, linking the GitHub, LinkedIn, and the live agents. This is how engines disambiguate "Elena who builds agents" from any other Elena.
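A pointer-stub only helps if its canonical link really points home, so it is worth checking the HTML each platform actually serves. A minimal sketch using only the standard library; the `CanonicalFinder` class, the `canonical_host` helper, and the post path in the example are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def canonical_host(html: str) -> Optional[str]:
    """Return the hostname the page declares as canonical, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonical:
        return None
    return urlparse(finder.canonical).hostname


# Hypothetical stub as a platform might serve it.
stub = (
    "<html><head>"
    '<link rel="canonical" href="https://aideazz.xyz/posts/llm-routing">'
    "</head><body>Short version. Full post on my own domain.</body></html>"
)
print(canonical_host(stub))
```

If the result is the platform's own host, or `None`, the platform has overridden or dropped the tag and the citation will keep accruing there.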
The lesson for a builder: you already have the strongest GEO asset there is — running software with your name on it. The job is making the link between the claim and the proof machine-readable. An LLM can't message your Telegram bot, but it can read "the author maintains a production Telegram agent at @AIdeazzBot" sitting next to the claim, and that co-location raises the odds your version is the one quoted.
Durable pages on domains you control
This is the part where I disagree sharply with the prevailing advice to "be everywhere." Be everywhere with pointers. Be canonical on what you own.
Generative engines re-crawl and re-rank. A claim cited today can vanish if the source moves, the platform changes its robots policy, or the post gets buried under platform reorganization. I've watched it happen: a dev.to post that Perplexity cited for six weeks stopped appearing after dev.to adjusted its tag pages. I didn't control the URL structure, so I couldn't fix it.
On aideazz.xyz I control:
- URL permanence. A claim's URL doesn't change. When I update the claim, I update dateModified, not the path.
- Robots and crawl access. I explicitly allow PerplexityBot, ClaudeBot, GPTBot, and Google-Extended in robots.txt. Yes, I let the AI crawlers in, because being crawlable is the price of being citable. If your strategy is to block them and be cited, you've designed a contradiction.
- Server-rendered content. This bit caught me. Some of my pages were client-rendered React, and the AI crawlers got an empty shell. I moved the factual content to server-rendered HTML. On Oracle Cloud's ARM free-tier instances this was a 4-hour change and it roughly doubled the number of pages that showed up in retrieval-driven referrals.
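The empty-shell problem can be caught before a crawler finds it by checking whether a claim appears in the raw HTML response, with scripts ignored. A minimal sketch, assuming the crawler does not execute JavaScript; the `VisibleText` class, the `claim_in_raw_html` helper, and both sample pages are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script> and <style>, roughly what a
    crawler that does not run JavaScript can read."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def claim_in_raw_html(html: str, claim: str) -> bool:
    """True if the claim text is present in the HTML without running any JS."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Join without separators so inline tags inside a sentence do not
    # break the match, then normalize whitespace on both sides.
    text = " ".join("".join(parser.parts).split())
    return " ".join(claim.split()) in text


claim = "We route Telegram agent calls to Groq"

# Client-rendered shell: the claim exists only inside a script.
client_shell = (
    '<html><body><div id="root"></div>'
    '<script>render("We route Telegram agent calls to Groq")</script>'
    "</body></html>"
)

# Server-rendered page: the claim is in the markup itself.
server_page = (
    "<html><body><article><p>We route Telegram agent calls to "
    "<strong>Groq</strong>.</p></article></body></html>"
)

print(claim_in_raw_html(client_shell, claim), claim_in_raw_html(server_page, claim))
```

In practice the HTML would come from a plain HTTP fetch of the page rather than a string literal, and any page that returns `False` for its load-bearing claims is a candidate for server rendering.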
The robots.txt fragment, for the skeptics who want the exact thing:
User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /
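A fragment like this is easy to verify with Python's standard `urllib.robotparser` before deploying it. The sketch below parses the rules from a string and asks whether each agent may fetch a URL; the `allowed_agents` helper is hypothetical, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /
"""

AI_AGENTS = ["PerplexityBot", "GPTBot", "ClaudeBot", "Google-Extended"]


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each user agent to whether the given rules let it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


open_result = allowed_agents(ROBOTS_TXT, "https://aideazz.xyz/portfolio", AI_AGENTS)

# Variant for the opposite choice: keep the others, block GPTBot.
blocked_txt = ROBOTS_TXT.replace(
    "User-agent: GPTBot\nAllow: /", "User-agent: GPTBot\nDisallow: /"
)
blocked_result = allowed_agents(blocked_txt, "https://aideazz.xyz/portfolio", AI_AGENTS)

print(open_result)
print(blocked_result)
```

Running the same check against the variant shows exactly which agents a single `Disallow` line removes, which makes the crawl policy a tested artifact instead of a guess.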
There's a real tradeoff here. Letting GPTBot crawl means OpenAI may train on your content with no citation back. I decided the citation upside outweighs the training leakage, because my moat isn't the prose — it's the running agents the prose points to. If your moat is the text itself, block training crawlers and accept fewer citations. State the tradeoff to yourself honestly; don't pretend it isn't one.
What actually moved the needle, ranked
After roughly four months of this, measured by referral traffic and manual checks of how the engines describe AIdeazz:
1. Self-contained factual claims with inline numbers — biggest single effect. The misattribution problem disappeared once each chunk could stand alone.
2. Canonical consolidation onto my domain — shifted citations from platforms to me. Skipping it does damage you can reverse later, but every week you wait is a week of citations accruing to the platform; do it early.
3. Server-rendered content + open AI robots — turned invisible pages visible. Pure infrastructure, no writing.
4. JSON-LD structured data — measurable for Google AI Overviews, plausible-but-unproven for Perplexity. Low cost, so worth doing.
5. Bidirectional author-to-artifact links — slow but compounding; it built the entity over weeks, not days.
What did not help: keyword density, posting frequency for its own sake, and "comprehensive" long-form pages that buried the citable fact under 3,000 words of throat-clearing. The generative engines reward extractable precision, and a tight 900-word page with five clean claims out-cited a 4,000-word page every time I tested it.
The honest limits
I can't show you a clean attribution chart, because none of these engines give you one. Perplexity has no Search Console. My evidence is referral logs, manual spot-checks of how Perplexity and ChatGPT describe AIdeazz, and the disappearance of the misattribution. That's directional, not deterministic. Anyone selling you a "GEO score" is selling you a number they invented.
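For the referral-log part of that evidence, a small tally of which pages receive visits from AI engines is enough to see direction over time. A minimal sketch under stated assumptions: the referrer hosts in `AI_REFERRERS` are assumed values to verify against your own logs, the rows are assumed to be already parsed into (path, referrer) pairs, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlparse

# Assumed referrer hosts; confirm them against real log entries.
AI_REFERRERS = {
    "perplexity.ai": "Perplexity",
    "chatgpt.com": "ChatGPT",
}


def engine_for(referrer: str) -> Optional[str]:
    """Return the engine name for a referrer URL, or None if it is not listed."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, engine in AI_REFERRERS.items():
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def count_ai_referrals(rows: Iterable[tuple[str, str]]) -> Counter:
    """Count visits per (engine, path) from (path, referrer) pairs."""
    counts: Counter = Counter()
    for path, referrer in rows:
        engine = engine_for(referrer)
        if engine:
            counts[(engine, path)] += 1
    return counts


# Illustrative rows, not real traffic.
rows = [
    ("/portfolio", "https://www.perplexity.ai/search?q=llm+routing"),
    ("/portfolio", "https://chatgpt.com/"),
    ("/portfolio", "https://www.google.com/"),
    ("/portfolio", "-"),
]
counts = count_ai_referrals(rows)
print(counts)
```

This only counts clicks that carry a referrer, so it undercounts answers that named the page without anyone following the link. It supports the directional reading, nothing stronger.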
The other limit: this is unstable ground. The retrieval and ranking behavior of these engines changes without notice. The principles — own your domain, state facts cleanly, prove authorship, stay crawlable — are durable because they describe what any retrieval system needs. The tactics will shift. Build on the principles, treat the tactics as disposable.
Frequently Asked Questions
Q: Does adding JSON-LD structured data actually get you cited by Perplexity, or is it Google-only theater?
A: For Google AI Overviews the effect is observable in Search Console within weeks. For Perplexity it's unproven — they don't disclose whether they parse schema.org. I add it anyway because it costs an hour and it forces you to state author, claim, and verification date, which the visible text needs regardless. Treat it as discipline that happens to also be a signal.
Q: If I let GPTBot and ClaudeBot crawl, aren't I just training their models for free with no return?
A: Yes, that's the actual tradeoff and you shouldn't pretend otherwise. I accept it because my defensible asset is the running agents my content points to, not the prose. If your text itself is the product — proprietary research, paywalled analysis — block training crawlers and accept that you'll be cited less. Decide based on where your moat actually lives.
Q: How do I know my chunks are self-contained without access to the engine's chunker?
A: Paste any 300-word slice of your page into a fresh LLM with no other context and ask it to state your claim and its caveat. If it can't reconstruct the qualifier from that slice alone, your context is split across chunks and retrieval will lose it. I run this check on every load-bearing paragraph; it caught the routing misattribution after the fact.
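That check is easier to run on every page if the slices and prompts are generated for you. A minimal sketch that cuts a page into overlapping 300-word windows and wraps each in a prompt to paste into a fresh model session; the window and step sizes, the prompt wording, and the helper names are all assumptions, and no model API is called here.

```python
def word_slices(text: str, size: int = 300, step: int = 150) -> list[str]:
    """Return overlapping windows of at most `size` words, advancing by `step`."""
    words = text.split()
    if not words:
        return []
    slices = []
    for start in range(0, len(words), step):
        slices.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the page
    return slices


PROMPT = (
    "Using only the text below, state the main claim and any caveat "
    "attached to it. If no caveat appears in the text, answer "
    "'no caveat found'.\n\n{slice}"
)


def build_prompts(text: str, size: int = 300, step: int = 150) -> list[str]:
    """One prompt per slice, ready to paste into a context-free session."""
    return [PROMPT.format(slice=s) for s in word_slices(text, size, step)]


# Stand-in page of 700 numbered words.
page = " ".join(f"w{i}" for i in range(700))
slices = word_slices(page)
prompts = build_prompts(page)
print(len(slices), len(prompts))
```

The overlap is there so a claim near a window edge still appears whole in a neighboring slice. Any slice where the model reports no caveat for a claim that has one marks a paragraph to rewrite.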
Q: Is consolidating onto my own domain worth losing the existing reach of Medium or dev.to?
A: Keep the reach, move the canonical. Publish the full version on your domain, leave a shorter version or stub on the platform with rel="canonical" pointing home. You retain platform discovery and the citation accrues to a URL you control. The cost is a few minutes per post; the upside is that platform reorganizations can't kill your citations.
Q: Does any of this matter for a B2B tool with maybe 50 potential buyers?
A: Less than it matters for broad-audience content, but the buyers who do use Perplexity to research vendors get an answer that names you or names a competitor. With 50 buyers, every individual mention is high-value. The work is cheap enough — schema, server rendering, clean claims — that the threshold for "worth it" is low even at small scale.