To get cited across Perplexity AI, ChatGPT Search, and Google Gemini, you need three separate technical setups — one per platform — plus a shared content foundation. Allow PerplexityBot, OAI-SearchBot, and Google-Extended in robots.txt so each AI crawler can reach your pages. Verify your site in Bing Webmaster Tools and submit your sitemap — ChatGPT Search draws exclusively from Bing's index, not Google's. Then build the shared content layer: question-format headings followed immediately by 40–70 word direct-answer paragraphs, named author attribution with verifiable credentials, specific statistics with named sources, and FAQPage + Article schema markup. Each platform then has additional weights: Perplexity rewards factual precision and recency; ChatGPT Search rewards publisher transparency and task-completion framing; Google Gemini rewards E-E-A-T signals and Knowledge Graph entity recognition. A Princeton/Georgia Tech/Allen Institute study found proper GEO implementation boosts AI visibility by up to 40% — and AI-referred sessions grew 527% year-over-year in H1 2025.
How AI Search Engines Select Sources
The four-stage source selection process common to all three platforms
Generative Engine Optimisation (GEO) is the discipline of structuring, formatting, and signalling content to maximise its probability of being selected as a cited source in AI-generated answers produced by Perplexity AI, ChatGPT Search, Google Gemini, and other LLM-powered search systems. A Princeton, Georgia Tech, and Allen Institute for AI study published at ACM SIGKDD 2024 demonstrated that properly implemented GEO tactics can boost AI visibility by up to 40%. AI-referred sessions jumped 527% year-over-year in H1 2025 (Previsible AI Traffic Report). Sources: Aggarwal et al., ACM SIGKDD 2024; Previsible AI Traffic Report, 2025.
The four-stage AI source selection process
1. Query understanding: intent is classified as informational, comparative, transactional, or conversational. This determines which source types are eligible.
2. Retrieval: sources are fetched from the platform's index. If you're not indexed, you're invisible — regardless of quality.
3. Ranking: candidate sources are ranked by relevance, credibility, and structure. Domain traffic is the strongest single predictor (SHAP 0.63).
4. Synthesis: top sources are passed to the LLM. Pages with organised headings are 2.8× more likely to earn citations (AirOps).
When I started tracking AI citation patterns in May 2024, the conventional wisdom was that ranking position 1–3 on Google was a reliable proxy for getting cited. In my first 30-site audit cohort, I found 11 out of 30 sites ranking in Google's top 5 had zero Perplexity citations — almost always because PerplexityBot was blocked in robots.txt. Technical access comes first, content quality second.
The Three AI Search Engines: How They Differ
Index sources, citation algorithms, and audience profiles compared
🔍 Perplexity AI
- Index: Independent real-time web crawl + proprietary index
- Crawler: PerplexityBot
- Monthly queries (May 2025): 780M (+239% from Aug 2024)
- Primary citation driver: Factual precision, source credibility, recency
- Schema benefit: Low direct — focus on content quality
- User base: 80% graduates, 30% senior leaders (WARC)
💬 ChatGPT Search
- Index: Bing's web index + OAI-SearchBot crawl
- Crawler: GPTBot, OAI-SearchBot, ChatGPT-User
- Weekly active users: 800M+ (late 2025, TechCrunch)
- Primary citation driver: Bing authority, OGP metadata, publisher transparency
- Schema benefit: Medium — Article + FAQPage help
- AI referral share: 87.4% of all AI traffic (Conductor 2026)
✨ Google Gemini
- Index: Google's full web index (same as organic search)
- Crawler: Googlebot + Google-Extended
- AI Overviews: 25.11% of all Google searches (Conductor Q1 2026)
- Primary citation driver: E-E-A-T signals, schema, Knowledge Graph
- Schema benefit: Very high — FAQPage, HowTo, Article all matter
- Reach: 200+ countries, 40+ languages
Technical Foundation: Crawlability & robots.txt
Before any content work — make sure each AI crawler can reach your pages
Press Gazette research in 2025 found nearly 80% of top news publishers now block at least one major AI training crawler — often creating unintended citation gaps on their public content. No amount of content work compensates for being uncrawlable.
| AI Platform | Crawler User-Agent(s) | robots.txt Action | Priority |
|---|---|---|---|
| Perplexity AI | PerplexityBot | Ensure PerplexityBot is not blocked. Add explicit Allow for priority pages. | CRITICAL |
| ChatGPT / OpenAI | GPTBot, OAI-SearchBot, ChatGPT-User | Allow OAI-SearchBot + ChatGPT-User for Search citation. GPTBot can be blocked separately to exclude training data. | CRITICAL |
| Google Gemini | Googlebot, Google-Extended | Ensure Google-Extended is NOT blocked. It specifically feeds AI Overviews — Googlebot alone is insufficient. | CRITICAL |
| Microsoft Copilot | Bingbot, Adidxbot | Ensure Bing crawlers are allowed. Submit sitemap to Bing Webmaster Tools. | HIGH |
Across 150+ robots.txt audits, the most common mistake isn't deliberate blocking — it's an old User-agent: * Disallow: / rule left from a legacy migration, or a security plugin that added an aggressive crawl block. I once audited a SaaS company with genuinely excellent content that had zero Perplexity citations simply because Disallow: /blog/ under User-agent: * from a dev environment was never cleaned up. Five minutes to fix; months of lost citations.
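A legacy-rule audit like the one above can be scripted before anything else. This is a minimal sketch using only Python's standard-library robots.txt parser; the robots.txt content and URLs are illustrative, reproducing the leftover dev-environment rule described above:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt with a leftover dev-environment rule:
# everything under /blog/ is disallowed for all user-agents.
LEGACY_ROBOTS = """
User-agent: *
Disallow: /blog/
"""

AI_CRAWLERS = ["PerplexityBot", "OAI-SearchBot", "GPTBot", "Google-Extended", "Bingbot"]

def crawler_access(robots_txt: str, url: str) -> dict:
    """Return {crawler: can_fetch} for each AI crawler token against one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}

access = crawler_access(LEGACY_ROBOTS, "https://example.com/blog/geo-guide")
for bot, allowed in access.items():
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

Because no crawler-specific group exists, every AI bot inherits the wildcard Disallow — exactly the failure mode in the SaaS audit above. Running the same check against each production site's live robots.txt catches it in seconds.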
Bing indexing: The hidden ChatGPT prerequisite
ChatGPT Search draws from Bing's index. Pages not indexed by Bing are invisible to ChatGPT Search regardless of content quality. ChatGPT accounts for 87.4% of all AI referral traffic (Conductor 2026) — Bing indexing gaps are an outsized strategic problem.
🔧 Bing Webmaster Tools — ChatGPT Search Setup
- Create a Bing Webmaster Tools account at webmaster.bing.com and verify ownership
- Submit your XML sitemap — monitor submission status (indexing within 1–2 weeks)
- Check Index Coverage for crawl errors, blocked pages, and soft 404s
- Use Diagnostics → Robots.txt Tester → test Bingbot user-agent on key URLs
- Use URL Inspection: confirm rendered HTML shows key content (not just JS shell)
- Implement IndexNow for real-time update notifications to Bing
- Expect 4–8 weeks for new pages to fully propagate into Bing's index
- Never block Bingbot or OAI-SearchBot via robots.txt or WAF — eliminates ChatGPT citation entirely
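The IndexNow step in the checklist reduces to a single JSON POST. A minimal sketch, not a full client: the host and key are placeholder values (your real key must also be hosted at the keyLocation URL), and the request is built but deliberately not sent here:

```python
import json
from urllib.request import Request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_request(host: str, key: str, urls: list) -> Request:
    """Build (but do not send) an IndexNow submission for a batch of URLs."""
    payload = {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",  # key file must exist here
        "urlList": urls,
    }
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

req = build_indexnow_request(
    "example.com",
    "0123456789abcdef",                      # placeholder key
    ["https://example.com/blog/geo-guide"],  # updated pages to notify Bing about
)
```

Sending it is one `urllib.request.urlopen(req)` call; Bing and other IndexNow-participating engines then pick up the changed URLs without waiting for a scheduled crawl.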
Optimising for Perplexity AI
Real-time crawl · Factual precision · Source credibility · Recency
Perplexity citation ranking factors
| Factor | Why It Matters | How to Optimise | Priority |
|---|---|---|---|
| Crawlability (PerplexityBot) | If PerplexityBot cannot crawl the page, it cannot be cited — the most fundamental requirement. | Allow PerplexityBot in robots.txt. Avoid JS-only content rendering. Ensure TTFB <800ms. | CRITICAL |
| Factual precision & specificity | Perplexity's core value prop is "answers with cited sources," and it prefers specific, named statistics over vague claims. With roughly 94% citation accuracy, it can evaluate whether a claim is specifically attributed. | Include specific stats with source attribution. Replace "many companies" with named, cited data. Name the study and year for every quantitative claim. | HIGHEST |
| Content recency & freshness | Perplexity heavily weights recently published or updated content. Outdated content is systematically deprioritised. 40–60% of cited sources rotate monthly (Semrush AI Visibility Index). | Visible, accurate publication date and "last updated" date on all pages. Update statistics annually. Use dateModified in Article schema. | HIGHEST |
| Domain credibility | Domain traffic is the strongest single predictor of AI citation (SHAP value: 0.63, SE Ranking 2.3M-page study). | Build domain authority through authoritative backlinks and third-party brand mentions. | HIGH |
| Answer-first paragraph structure | Perplexity extracts inline citations from short, self-contained passages. | Key claims in 40–80 word standalone paragraphs leading with the main answer. | HIGH |
| Original research & primary data | Perplexity actively favours primary data — if a statistic lives only on your site, it has no choice but to cite you. | Publish original surveys and proprietary analysis. Label data clearly with methodology and sample size. | HIGH |
In monthly citation audits for 12 active clients, I see this consistently: sites that publish specific statistics with named sources — even on relatively low-authority domains — regularly displace higher-authority domains that publish vague generalisations. A mid-size B2B software site earned a Perplexity citation by publishing "47% of our 200-respondent customer survey preferred X" over a well-known industry publication that said "many businesses prefer X." Specificity beats authority more often than you'd expect on Perplexity.
🔍 Perplexity optimisation checklist
- PerplexityBot allowed in robots.txt with access to all key pages
- Every page has a visible, accurate publication date and "last updated" date
- Key claims include specific statistics with named sources
- Vague phrases like "research shows" replaced with named, cited sources
- Core content paragraphs are 40–80 words, leading with the main claim
- Article schema includes datePublished and dateModified properties
- At least one piece of original research per topic cluster
- TTFB under 800ms — Perplexity's live crawl has a short timeout window
- Avoid thin pages — content with fewer than 400 words of substance is unlikely to be cited
- Avoid JavaScript-only rendering of core article content — Perplexity's crawler may not execute JS
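The Article-schema checklist items (datePublished plus dateModified) reduce to a small JSON-LD block. A minimal sketch with placeholder names, dates, and URLs, built with Python's json module so the required structure is easy to read:

```python
import json

def article_jsonld(headline, author_name, author_url, published, modified):
    """Minimal Article JSON-LD covering the recency and authorship signals."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": published,   # ISO 8601 date
        "dateModified": modified,     # bump whenever statistics are refreshed
    }

schema = article_jsonld(
    "How AI Search Engines Select Sources",
    "Jane Doe",                      # placeholder author
    "https://example.com/authors/jane-doe",
    "2025-05-01",
    "2026-01-15",
)
tag = f'<script type="application/ld+json">{json.dumps(schema)}</script>'
```

The rendered tag goes in the page head; keeping dateModified honest (only bumping it when content genuinely changes) matters, since a visible on-page date that disagrees with schema undercuts the recency signal.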
Optimising for ChatGPT Search
Bing indexing · Browse tool · Footnote citations · Custom GPTs · Voice
ChatGPT vs ChatGPT Search: the distinction most SEOs miss
🤖 ChatGPT (Base Model)
- Answers from training data only — knowledge cutoff applies
- No real-time web retrieval — cannot cite current web content
- Cannot be influenced by SEO
- Memory-based citations are often hallucinated
- Used for writing, coding, brainstorming, conversation
🔍 ChatGPT Search (Browse)
- Augmented with real-time Bing web retrieval
- Live content from your pages is retrieved and synthesised
- Directly SEO-targetable
- Numbered footnote citations with verified source URLs
- Triggered for research, current events, product, and how-to queries
How the Browse tool works: the Bing-powered pipeline
ChatGPT Search is a retrieval-augmented generation (RAG) system built on GPT-4o combined with Microsoft Bing's web index. When Browse activates, it submits a search query to Bing, retrieves top-ranking pages from Bing's organic index — not Google's — and passes them to GPT-4o for passage extraction and synthesis.
What triggers the Browse tool?
📰 Recent Events & Updates
"What's new in [field]", "[company] recent news" — time-sensitive by definition.
✅ Task-Completion Queries
"How do I [accomplish specific goal]", "what steps do I take to [achieve outcome]".
📚 Multi-Source Research
"Compare X, Y, and Z for [context]", "best [options] for [use case]".
🔧 Troubleshooting
"Why is [thing] not working", "how to fix [specific error]".
🛒 Product Evaluation
"Is [product] worth it in 2026", "reviews of [service]", "pricing for [tool]".
⚖️ Decision-Support
"Should I use X or Y", "what's better for [use case]".
ChatGPT Search citation signal strength
📊 Relative citation signal influence — ChatGPT Search
Content formatting for ChatGPT Search citation
Browse targets the first complete sentence cluster after a heading that directly answers what the heading asks. The ideal paragraph opens with a declarative statement, runs 40–70 words, and makes sense without any surrounding context — slightly longer than the 40–60 word target for Google AI Overviews, since ChatGPT responses run longer.
For one client, I restructured 15 blog posts so each H2 section opened with a 45–65 word declarative paragraph that directly answered the heading. Six weeks after Bing re-crawled the updated pages, ChatGPT Search referrals went from 34 sessions/month to 218 — a 541% increase. The actual content didn't change, just the structure and the opening paragraph format. That's the highest-impact structural change I've tested across the whole 47-site portfolio.
Three additional formatting principles specific to ChatGPT:
1. ChatGPT matches heading text against actual query phrasing. "What's the difference between X and Y?" outperforms "X vs. Y: A Comparison." Write headings the way a real person would type the question into ChatGPT — not the compressed keyword format that worked in traditional SEO.
2. Numbered lists with descriptive step titles extract cleanly. Each step should stand alone: ChatGPT can cite step 4 of a 10-step guide independently if step 4 makes sense without the surrounding steps.
3. Definition sentences ("X is [definition]. It works by [mechanism].") are among the most reliably extracted content units. Don't let the definition emerge from surrounding prose — put it in a clean sentence at the top of the section.
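The 40–70-word direct-answer target is easy to audit in bulk before a restructure. A rough sketch, using plain whitespace word counts and the bounds from the guidance above as assumptions:

```python
def answer_paragraph_ok(paragraph: str, lo: int = 40, hi: int = 70):
    """Word-count check for a direct-answer paragraph; returns (count, in_range)."""
    words = paragraph.split()
    return len(words), lo <= len(words) <= hi

# Sample opening paragraph for an H2, paraphrasing the Browse pipeline section.
sample = (
    "ChatGPT Search is a retrieval-augmented generation system that combines "
    "GPT-4o with Bing's web index. When the Browse tool activates, it queries "
    "Bing, retrieves top-ranking pages, and passes extracted passages to the "
    "model for synthesis, attaching numbered footnote citations that link "
    "directly back to each cited source page."
)
count, ok = answer_paragraph_ok(sample)
print(count, "words, in range:", ok)
```

Run the same check over the first paragraph under every H2 in a content export and you get a prioritised list of sections to rewrite.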
The ChatGPT footnote citation format
ChatGPT Search drops numbered footnote markers — [1], [2], [3] — right at the specific sentence drawing from each source. At the bottom of the response, each expands to show the page title, domain, a brief excerpt, and a direct link to your URL. This creates a direct, one-click path to your page from the cited claim — unlike Google AI Overviews, where attribution is less visible. Across 12 client sites, average ChatGPT session engagement time was 3m 42s — 61% higher than the same sites' average organic Google session.
Source: IndexCraft analysis, 12 client sites, 2025–2026.
Custom GPTs and the GPT Store as a citation channel
Custom GPTs — purpose-built AI assistants in the ChatGPT ecosystem — represent an underexplored citation surface. As of Q1 2026, the GPT Store hosts over 3 million Custom GPTs. When a Custom GPT has Browse enabled, it can cite web sources — including your pages — in response to user queries in that GPT's domain. In my HR SaaS client audit, roughly 30% of the ChatGPT referral traffic growth was attributable to Custom GPT sources, not the main ChatGPT Search interface. Actual ChatGPT-driven traffic is meaningfully higher than chatgpt.com referral numbers alone suggest. Source: OpenAI platform data; IndexCraft internal analysis.
ChatGPT Voice: optimising for spoken AI responses
ChatGPT's Advanced Voice Mode (available to Plus users since September 2024, with ChatGPT Search integration confirmed December 2024) extends Browse into spoken conversation. Voice responses are 50–150 words — Browse extracts a single concise passage. The opening sentence of each direct-answer paragraph must be a complete, standalone statement: "The answer to X is Y, which works by Z." Also write out percentage figures in prose ("forty percent" rather than "40%") and avoid abbreviations on first use — these adjustments have minimal impact on the reading experience but meaningfully improve spoken citation quality.
Technical requirements unique to ChatGPT Search
| Requirement | Why ChatGPT-Specific | How to Verify |
|---|---|---|
| Bing index coverage | Google AI Overviews uses Google's index. ChatGPT Search uses Bing's. Google-indexed ≠ ChatGPT-eligible. | Bing WMT → Index Explorer → check key pages |
| Bingbot not blocked | Many sites block non-Google bots unintentionally. Any Bingbot block eliminates ChatGPT Search eligibility for those pages. | Bing WMT → Diagnostics → Robots.txt Tester |
| OAI-SearchBot not blocked | OpenAI's crawler supplements Bing retrieval for ChatGPT Search. | Check robots.txt for Disallow rules; verify in server logs |
| Server-rendered HTML | Bingbot renders JavaScript less reliably than Googlebot. Client-side-only content is absent from Bing's index. | Bing WMT → URL Inspection → View Rendered Page |
| Fast TTFB for Bingbot | Bingbot's crawl timeout is shorter than Googlebot's. TTFB above 2 seconds risks incomplete crawl. | Pingdom / GTmetrix — target <500ms TTFB |
| WAF not blocking ChatGPT user-agent | Some WAF rules block ChatGPT's fetch user-agent during Browse, preventing real-time retrieval even for Bing-indexed pages. | Check Cloudflare bot management settings for OpenAI user-agent blocks |
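The robots and WAF checks in the table can be spot-verified from server access logs. A rough sketch that scans log lines for AI crawler user-agent substrings; the sample lines are fabricated for illustration, but real combined-format logs carry the same trailing user-agent field:

```python
AI_UA_MARKERS = ["OAI-SearchBot", "GPTBot", "ChatGPT-User", "bingbot", "PerplexityBot"]

def ai_crawler_hits(log_lines):
    """Count access-log lines per AI crawler user-agent marker."""
    hits = {marker: 0 for marker in AI_UA_MARKERS}
    for line in log_lines:
        for marker in AI_UA_MARKERS:
            if marker.lower() in line.lower():
                hits[marker] += 1
    return hits

# Fabricated sample lines in roughly combined log format.
sample_log = [
    '66.249.0.1 - - [10/Jan/2026] "GET /blog/geo HTTP/1.1" 200 "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '40.77.0.2 - - [10/Jan/2026] "GET /blog/geo HTTP/1.1" 200 "Mozilla/5.0 (compatible; bingbot/2.0)"',
    '203.0.113.5 - - [10/Jan/2026] "GET /pricing HTTP/1.1" 403 "Mozilla/5.0 (compatible; GPTBot/1.1)"',
]
hits = ai_crawler_hits(sample_log)
```

Two things to look for in real output: crawlers that never appear at all (possible robots or WAF block), and crawler lines with 403 status codes like the GPTBot sample above — that is the WAF-block signature worth investigating.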
Tracking ChatGPT Search referral traffic in GA4
GA4 → Reports → Acquisition → Traffic Acquisition. Add "Session source" as secondary dimension. Filter for chatgpt.com and openai.com. Note: 19–28% of ChatGPT-sourced visits may appear as Direct due to referrer stripping, especially from mobile apps (Ahrefs, 2025). OpenAI appends utm_source=chatgpt.com to citation links since June 2025, improving desktop attribution.
GA4 → Admin → Channel Groups → Create channel. Rule: Session source contains "chatgpt.com" OR "openai.com" OR "perplexity.ai" OR "bing.com/chat". This consolidates AI search referral traffic into a single trackable channel.
In GA4 Explorations, segment by first session source = chatgpt.com / openai.com. Look at engagement time, scroll depth, and conversions. Across 12 client sites, average ChatGPT conversion rate was 2.3× the same sites' organic Google rate.
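The channel-group rule amounts to a substring match over session sources, which is useful to replicate offline when auditing exported GA4 data. A minimal sketch, with the source list taken from the rule above and the session rows fabricated as examples:

```python
AI_SOURCES = ("chatgpt.com", "openai.com", "perplexity.ai", "bing.com/chat")

def classify_channel(session_source: str) -> str:
    """Mirror the GA4 rule: any listed substring means 'AI Search', else 'Other'."""
    src = session_source.lower()
    return "AI Search" if any(s in src for s in AI_SOURCES) else "Other"

# Fabricated session-source values as they appear in a GA4 export.
sessions = ["chatgpt.com", "google", "perplexity.ai", "(direct)", "openai.com"]
by_channel = {}
for src in sessions:
    by_channel.setdefault(classify_channel(src), []).append(src)
```

Note the "(direct)" row: per the referrer-stripping caveat above, a share of genuine AI traffic will land in Other/Direct no matter how good the rule is, so treat the AI Search channel as a floor, not a census.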
💬 ChatGPT Search — complete optimisation checklist
- Bing Webmaster Tools account created and site verified
- XML sitemap submitted to Bing Webmaster Tools
- Key pages confirmed as indexed in Bing via URL Inspection
- Bingbot confirmed "Allowed" in robots.txt via Bing's Robots.txt Tester
- OAI-SearchBot not blocked in robots.txt or WAF rules
- Key content visible in Bing's rendered HTML view (not JS-dependent)
- Server TTFB under 500ms for primary content pages
- Section headings rewritten in conversational question format
- Direct-answer paragraphs (40–70 words) added below each H2
- Definition sentences added for all key concepts
- Complete OGP tags: og:title, og:description, og:type, og:url, og:image, og:site_name
- Named author byline with linked bio page on all target pages
- Article and FAQPage schema validated via Google Rich Results Test
- GA4 custom "AI Search" channel group configured
- Do not assume Bing WMT sitemap submission alone resolves all crawl issues — verify individual page index status
- Never block Bingbot or OAI-SearchBot — eliminates ChatGPT Search candidacy entirely
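The OGP checklist item expands to six meta tags. A small generator sketch in which every value is a placeholder:

```python
from html import escape

def ogp_tags(title, description, url, image, site_name, og_type="article"):
    """Render the six OGP meta tags ChatGPT Search reads for citation display."""
    fields = {
        "og:title": title,
        "og:description": description,
        "og:type": og_type,
        "og:url": url,
        "og:image": image,
        "og:site_name": site_name,
    }
    return "\n".join(
        f'<meta property="{prop}" content="{escape(value, quote=True)}" />'
        for prop, value in fields.items()
    )

head_block = ogp_tags(
    "How AI Search Engines Select Sources",
    "A four-stage model of AI source selection.",
    "https://example.com/blog/geo-guide",
    "https://example.com/img/geo-guide.png",
    "Example Blog",
)
```

The escaping step matters in practice: descriptions frequently contain quotes and ampersands, and a malformed content attribute silently breaks OGP parsing.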
Optimising for Google Gemini
E-E-A-T · Schema markup · Knowledge Graph · Core Web Vitals
Google Gemini powers both AI Overviews in Google Search and the standalone Gemini assistant. As of Q1 2026, AI Overviews appear in 25.11% of all Google searches (Conductor, 21.9M queries) — up from 13.14% in March 2025. Ahrefs' August 2025 analysis found 76.1% of AI Overview citations also rank in Google's top 10 — confirming that strong traditional Google SEO remains the most reliable path to Gemini citation. Sources: Conductor 2026 AEO/GEO Benchmarks; Ahrefs, August 2025.
Gemini citation ranking factors
| Factor | Why Gemini Weights It | How to Optimise | Priority |
|---|---|---|---|
| Google indexing & organic position | Gemini draws from Google's index. 76.1% of AI Overview citations rank in Google's top 10 (Ahrefs, 2025). Very few citations come from pages beyond position 20. | Standard Google SEO: technical excellence, strong backlinks, topical authority, Core Web Vitals. | CRITICAL |
| E-E-A-T signals | Google's Quality Rater Guidelines and Gemini's source selection both weight E-E-A-T above all other quality factors. 90% of AI citations driving brand visibility originate from earned and owned media (Edelman). | Name authors with verified credentials. Add author bio pages linking to professional profiles. Include "Experience" signals: testing data, case studies. Add editorial policy. | HIGHEST |
| Schema markup (FAQPage, HowTo, Article, Organization) | Schema is the clearest structural signal to Gemini's extraction pipeline. FAQPage marks Q&A pairs for AI extraction; HowTo marks step sequences; Article identifies content type and authorship. | Implement FAQPage on all Q&A sections. HowTo on all tutorials. Article schema with named author and datePublished. Organization + WebSite in global header. Validate in Google's Rich Results Test. | HIGHEST |
| Knowledge Graph entity recognition | Brands in the top 25% for web mentions get 10× more AI visibility. Top 50 brands receive 28.9% of all AI Overview mentions (Ahrefs). | Build Knowledge Graph presence via Wikipedia/Wikidata, consistent NAP across directories, Google Business Profile, social profile verification. | HIGH |
| Google-Extended crawler allowance | Google-Extended specifically feeds Gemini AI Overviews. Blocking it prevents Gemini citation even when Googlebot can access the page. | Audit robots.txt for Google-Extended Disallow rules. Remove to enable AI Overview citation on public content. | CRITICAL |
| Direct answer structure & extractability | Gemini's extraction pipeline looks for direct, concise answers in the first 60 words of each section. Pages with well-organised headings are 2.8× more likely to earn citations (AirOps). | Question-format H2/H3 headings. Lead every section with direct answer (40–70 words). Use "X is defined as…" pattern. | HIGH |
| Topical authority & cluster completeness | Gemini evaluates pages within their topical context. Comprehensive topic cluster coverage signals expertise to Gemini's evaluation. | Build topic clusters with a pillar page and 8–15 cluster pages. Strong internal linking from cluster to pillar. | HIGH |
The E-E-A-T signal I've found most reliably measurable on Gemini is adding a named author with a linked author page, author schema, and a verifiable external credential. In 8 sites where I added proper author attribution and Person schema to previously anonymous content in late 2025, 6 saw a measurable increase in AI Overview impressions in Google Search Console within 6 weeks. The changes weren't major rewrites — just proper authorship infrastructure. Proper author attribution is the single most undervalued GEO investment I see in the market.
✨ Google Gemini optimisation checklist
- Googlebot and Google-Extended both allowed in robots.txt
- FAQPage schema on all pages with Q&A sections
- HowTo schema on all tutorial and step-based content
- Article/BlogPosting schema with named author, datePublished, dateModified
- Organization and WebSite schema in global site header
- All content has a named author with credential description and author page
- Wikidata or Wikipedia entity entry exists for the organisation
- Core Web Vitals pass Google's thresholds (LCP <2.5s, INP <200ms, CLS <0.1)
- Every section uses question-format H2/H3 headings followed by a direct 40–70 word answer
- Topic cluster architecture in place: pillar pages linked to comprehensive cluster pages
- AI Overview trigger rate should be checked manually per target keyword — high AIO trigger = lower organic CTR but higher citation opportunity
- Avoid keyword stuffing and thin content — Gemini's E-E-A-T evaluation is particularly effective at identifying low-quality content
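The HowTo item in the checklist maps step-based content onto a fixed JSON-LD shape: one HowToStep per tutorial step. A minimal sketch in which the tutorial name and step texts are placeholders:

```python
import json

def howto_jsonld(name, steps):
    """Minimal HowTo JSON-LD: one HowToStep per (title, text) pair."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": i, "name": title, "text": text}
            for i, (title, text) in enumerate(steps, start=1)
        ],
    }

schema = howto_jsonld(
    "How to verify a site in Bing Webmaster Tools",  # placeholder tutorial name
    [
        ("Create an account", "Sign up at webmaster.bing.com and add your site."),
        ("Verify ownership", "Add the verification DNS record or meta tag."),
        ("Submit the sitemap", "Submit your XML sitemap and monitor index status."),
    ],
)
jsonld = json.dumps(schema, indent=2)
```

Because each HowToStep carries its own name and text, the structure directly supports the standalone-step extraction behaviour described in the ChatGPT section as well.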
Entity SEO & Semantic Search for AI Citation
Knowledge Graph recognition · Entity building · Semantic relationships
Why entities matter for AI citation
AI systems like Gemini don't just match keywords — they understand the relationships between entities (people, organisations, topics, products). When your brand is a recognised entity in the Knowledge Graph, Gemini treats content from your domain as more authoritative for topics where that entity has established relevance. This translates directly into higher citation probability for your content.
Semantic search — the understanding of meaning and relationships between concepts rather than just keyword matching — is the foundation of how all three AI search engines evaluate topical authority. Content that demonstrates deep semantic relationships between concepts (through internal linking, comprehensive topic coverage, and consistent use of related terms) outperforms content that optimises for isolated keywords.
Entity building tactics for AI search visibility
🌐 Wikidata & Wikipedia entity establishment
Create or claim Wikidata entries for your organisation and key personnel. Use Organization and Person schema with sameAs properties linking to Wikidata, Wikipedia, LinkedIn, and other authoritative profiles. This connects your digital presence to entities Google can resolve in its Knowledge Graph — directly feeding Gemini's entity recognition.
🏷️ Entity-first schema implementation
Use Organization schema with complete sameAs arrays in your site header. Use Person schema for all named authors with knowsAbout, jobTitle, and sameAs to professional profiles. Use DefinedTerm schema for glossary content. These explicitly signal entity relationships to Gemini's extraction pipeline.
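A sketch of that entity-first pattern: Organization plus author Person, each with a sameAs array. Every name and URL here is a placeholder to be replaced with real, resolvable profiles:

```python
import json

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
    "sameAs": [                       # ties the brand to resolvable entities
        "https://www.wikidata.org/wiki/Q000000",
        "https://www.linkedin.com/company/example-co",
        "https://www.crunchbase.com/organization/example-co",
    ],
}

author = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Head of Search",
    "knowsAbout": ["generative engine optimisation", "technical SEO"],
    "sameAs": ["https://www.linkedin.com/in/jane-doe"],
}

header_jsonld = "\n".join(
    f'<script type="application/ld+json">{json.dumps(entity)}</script>'
    for entity in (organization, author)
)
```

The Organization block belongs in the global site header; the Person block belongs on the author page and in each article's author reference.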
📰 Third-party brand citations & mentions
Ahrefs data: top 25% of brands by web mentions receive 10× more AI visibility; top 50 brands receive 28.9% of all AI Overview mentions. Focus on earning brand mentions in industry publications, analyst reports, expert roundups, and press coverage — not just backlinks. Each third-party mention adds a data point to your entity's authority profile.
🔗 Semantic content clustering
Build comprehensive topic clusters that cover the full semantic field of your primary topics — definitions, subtopics, related concepts, applications, comparisons. Internal linking should connect semantically related pages with descriptive anchor text (not "click here"). This communicates topical semantic coverage to all three AI search engines, not just Gemini.
📋 Cross-web entity consistency
Consistent Name, Address, and Phone (NAP) across Google Business Profile, LinkedIn, Crunchbase, Clutch, G2, industry directories, and social profiles signals entity coherence. Inconsistent brand information across platforms weakens the entity signal Gemini uses to validate publisher identity. Audit all brand profile pages annually.
🏆 Semantic topical authority
AI engines evaluate the topical context a page appears in. A page from a site with comprehensive semantic coverage of a subject area is rated more authoritatively than the same page from a generalist blog. Brands in the top 25% for web mentions receive 10× more AI visibility (Ahrefs). Topic depth is more important than topic breadth for entity authority signalling.
Semantic SEO for AI search: practical implementation
Semantic SEO for AI search goes beyond keyword research. The goal is to create a content ecosystem where each piece of content is semantically connected to related content, entities, and concepts — so AI engines can recognise the full topical authority of your domain.
1. Identify the core entities (organisations, people, products, topics) your content covers. For each entity, ensure: (a) a dedicated page or section defines the entity with direct-answer structure; (b) schema markup explicitly identifies the entity type; (c) internal links connect entity pages to related topical content.
2. Structure headings to reflect the semantic hierarchy of your topic — from broad concept (H2) to specific subtopic (H3) to component details (H4). This mirrors how Knowledge Graphs represent entity relationships and how AI engines parse content structure.
3. Apply DefinedTerm and SpeakableSpecification schema. DefinedTerm marks up glossary definitions for Gemini extraction; SpeakableSpecification marks sections particularly suited for voice responses. Both explicitly signal semantic content structure to AI extraction pipelines.
4. Write anchor text that reflects the semantic relationship between pages: "entity optimisation techniques" linking to an entity SEO guide conveys semantic meaning; "click here" or "read more" does not. Descriptive, keyword-rich anchor text strengthens topical authority signals for all three AI platforms.
Universal Content Structure for Cross-Platform AI Citation
One structure that works across Perplexity, ChatGPT, and Gemini
1. Question-format heading. Write the heading as the exact question the target user would search for. "What is email marketing?" not "Email Marketing Overview." This creates direct alignment between the query and the extraction anchor — all three AI engines match query phrasing to heading phrasing when selecting extraction targets.
2. Direct-answer opening paragraph. The first paragraph after every question heading must be a complete, direct answer. Use the pattern: "[Subject] is [definition/action/fact]." Write it to stand alone — a reader (or AI) who reads only this paragraph should receive a complete, accurate answer to the heading question.
3. Supporting depth. Follow with paragraphs providing context, nuance, examples, and specifics. Include at least one specific data point or named example per section. This depth drives clicks after an AI citation — users who see your answer cited click through for more detail when the cited answer signals depth is available.
4. Structured sub-element. Where content permits, include a structured sub-element. Comparison tables are particularly valuable: 32.5% of AI citations cite comparison articles (Princeton/Georgia Tech/Allen Institute, ACM SIGKDD 2024), and they serve both Perplexity (inline extraction) and Gemini (AI Overview synthesis).
5. FAQ section with schema. Every significant piece of content should end with 6–10 FAQ items targeting related queries. Each FAQ item: question heading → 60–100 word direct answer. Apply FAQPage schema. The FAQ section is the single highest-density GEO investment per piece of content: it addresses multiple queries simultaneously and provides clean extraction targets for all three AI engines.
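The FAQ pattern maps mechanically onto FAQPage schema: one Question/Answer pair per item. A minimal generator sketch with placeholder questions and answers:

```python
import json

def faqpage_jsonld(qa_pairs):
    """FAQPage JSON-LD: one Question/Answer entity per FAQ item."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in qa_pairs
        ],
    }

schema = faqpage_jsonld([
    ("What is GEO?",
     "Generative Engine Optimisation is the practice of structuring content "
     "to be selected and cited in AI-generated answers."),
    ("Does ChatGPT Search use Google's index?",
     "No — ChatGPT Search retrieves pages from Bing's index, so Bing indexing "
     "is a prerequisite for citation."),
])
jsonld = json.dumps(schema, indent=2)
```

Keep the JSON-LD answer text identical to the visible on-page answer; a mismatch between markup and rendered content undermines the trust signal the schema is meant to provide.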
E-E-A-T & Authority Signals Across All Three Platforms
Expertise · Experience · Authoritativeness · Trustworthiness
According to Edelman research, 90% of AI citations driving brand visibility come from earned and owned media — the direct product of genuine expertise, experience, authoritativeness, and trustworthiness. E-E-A-T signals pay off across all three platforms, with the highest weight assigned by Gemini and meaningful weight on ChatGPT Search. Source: Edelman, cited in Superlines AI Search Statistics, 2026.
Named author credentials
Every piece of content should be attributed to a named human author with a brief credential description — job title, years of experience, relevant expertise. Author attribution lifts citation probability on all three platforms. In every citation audit I've run, attributed content consistently outperforms anonymous content from the same domain. Author pages with professional biography, publication history, and links to external profiles strengthen the signal further.
Original data and primary research
Original surveys, proprietary tests, case studies, and original analysis tend to earn citations across all three AI platforms in a way few other content types do. Perplexity actively favours primary data — if a specific statistic lives only on your site, it has no choice but to cite you or leave the answer unsourced. SE Ranking's 2.3M-page study confirmed domain traffic driven by backlinks is the strongest single predictor of AI citation, and original research is the most reliable way to build it organically. Source: SE Ranking, 2025.
Organisational trust signals
About pages, editorial policies, contact information, privacy policies, and terms of service are the "trust infrastructure" AI engines evaluate when assessing publisher credibility. ChatGPT Search is particularly sensitive to publisher transparency. All three platforms elevate content from publishers with clearly disclosed organisational identity, mission, and editorial standards over anonymous or opaquely attributed content.
Third-party brand mentions and citations
Ahrefs data shows brands in the top 25% for web mentions get 10× more AI visibility than others; the top 50 brands receive about 28.9% of all AI Overview mentions. This is about brand entity citations, mentions in industry publications, expert roundup inclusions, and press coverage — not just backlinks. Each third-party mention adds a data point to your domain's authority profile across all AI engine indexes.
Schema Markup Strategy for AI Citation
FAQPage · HowTo · Article · Organization · Person · DefinedTerm
| Schema Type | Gemini | ChatGPT | Perplexity | Priority |
|---|---|---|---|---|
| FAQPage | Very High — direct extraction signal for Q&A content | Medium — helps ChatGPT classify Q&A structure | Low direct benefit | Mandatory — all Q&A sections |
| HowTo | Very High — preferred format for process queries | Medium — enables richer step display | Low direct benefit | Mandatory — all tutorials |
| Article / BlogPosting | High — authorship and date feed E-E-A-T evaluation | High — author schema feeds credibility evaluation | Medium — datePublished feeds recency signal | Mandatory — all editorial content |
| Organization / WebSite | Very High — feeds Knowledge Graph entity profile | High — publisher identity signal | Medium — domain identity signal | Mandatory — site header |
| Person (Author) | Very High — Experience signal; links to Knowledge Graph | Very High — explicit author credential signal | Medium — expert attribution | Mandatory — all author pages |
| DefinedTerm | High — explicitly signals definitional content for Gemini extraction | Low | Low | Implement on glossary content |
| Review / AggregateRating | Medium | Medium — rich citation display | Medium — product query citation | Implement on review content |
| BreadcrumbList | Medium — site structure signal | Medium — structural clarity | Low | Implement on all pages — low effort |
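The highest-priority type in the table, FAQPage, can be generated programmatically from existing Q&A content. A minimal sketch, using a hypothetical question/answer pair:

```python
import json

def faq_schema(qa_pairs):
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }

markup = faq_schema([
    ("What is GEO?",
     "Generative Engine Optimisation is the practice of structuring content "
     "to earn citations in AI-generated answers."),
])
# Embed in the page as: <script type="application/ld+json">...</script>
print(json.dumps(markup, indent=2))
```

Validate the output in Google's Rich Results Test before deploying; the same pattern extends to HowTo and Article by swapping the `@type` and properties.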
Original Research: The Highest-Value GEO Investment
Why primary data earns disproportionate AI citations across all three platforms
Original research earns more AI citations than comparable content for three reasons:

- Uniqueness: AI engines can't source the data anywhere else; if you publish the only current survey for your industry, every AI answering that query has to cite you.
- Factual density: a single research piece generates multiple citable statistics, each a separate citation opportunity.
- Backlink generation: original research earns authoritative inbound links that strengthen authority signals across all three AI engines over time.

Source: SE Ranking, 2.3M-page study, 2025; Semrush AI Visibility Index.
The most citation-productive original research I've produced for a client was a survey of ~300 people in a specific professional role, asking questions that had clear answers practitioners would find useful but that no existing source had quantified. The finding that landed was counterintuitive: the assumption most people held about the primary barrier to a workflow turned out to be wrong. Counterintuitive-but-verifiable is the formula for linkable and citable research. Within the first month after publication, it had been cited in Perplexity responses on three different query variants we hadn't specifically targeted. — Rohit Sharma
Search your target queries across all three AI platforms. When you see "research shows" or "according to studies" with no specific source — or outdated data — you've identified a citation gap. Creating a current study that fills this specific data gap almost guarantees AI citation because the AI has an established need for the data and no satisfactory source to cite. 40–60% of cited sources rotate monthly (Semrush) — stale data gets replaced continuously.
Publish each key finding as a standalone paragraph or section with the statistic in the opening sentence, methodology described clearly, and a definitions section. Label each section with a question heading that anticipates the query and answers it directly. The Princeton/Georgia Tech/Allen Institute study found that adding statistics ("Statistics Addition") was one of the highest-performing GEO tactics, improving AI visibility by up to 40%.
Research that earns AI citations also earns backlinks — the two reinforce each other. Distribute to industry newsletters, journalist contact lists, and relevant communities. Reach out to sites that currently cite outdated data on your research topic and offer your updated findings. Each backlink from an authoritative domain increases citation probability across all three AI engines, creating a compounding authority effect.
Measuring AI Citation Performance
Manual checks · GA4 tracking · GSC signals · Third-party tools
AI referral traffic currently accounts for 1.08% of all website traffic (Conductor 2026) — small in absolute terms, but converting at 2× the rate of traditional organic search (Knotch via Conductor) and growing at ~1% month over month.
| Metric | Platform(s) | How to Measure | Frequency |
|---|---|---|---|
| Direct citation check | All three | Manually search your 20–30 target queries in each AI engine. Record: cited (yes/no), citation position, context. 40–60% of cited sources rotate monthly (Semrush) — monthly cadence is essential. | Monthly |
| AI referral traffic (GA4) | Perplexity, ChatGPT | GA4 → Traffic Acquisition → filter by perplexity.ai and chatgpt.com/openai.com. Create custom AI Search channel group. Note: 19–28% of ChatGPT traffic may appear as Direct due to referrer stripping. | Weekly trend; monthly review |
| Branded search volume (GSC) | All three | GSC → Search results → filter by brand name. Month-over-month branded impressions growth is the downstream proxy for AI visibility building brand recognition. | Monthly |
| GSC impressions (informational queries) | Gemini | GSC → filter informational keywords with AI Overview. Impression growth without click growth indicates AI Overview exposure. | Monthly |
| Bing Search referral traffic | ChatGPT | Bing referral traffic growth is a proxy for Bing authority — and thus ChatGPT citation potential. | Monthly |
| Third-party AI citation tools | All three | Semrush AI Toolkit, BrightEdge Generative Parser, Profound. Valuable at 50+ target queries. Average tool cost: $337/month (Conductor, Rankability). | Weekly automated |
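The GA4 channel grouping in the table can be mirrored in offline analysis scripts. A sketch that classifies session referrer hostnames into the same AI Search bucket (the hostname list matches the table; everything else here is an illustrative assumption):

```python
from urllib.parse import urlparse

# Referrer hostnames treated as AI search traffic, per the GA4 row above
AI_SEARCH_HOSTS = {
    "perplexity.ai", "www.perplexity.ai",
    "chatgpt.com", "openai.com",
}

def classify_referrer(referrer_url):
    """Return 'AI Search', 'Direct', or 'Other' for a session referrer.
    Note: 19-28% of ChatGPT visits arrive with no referrer and land in Direct."""
    if not referrer_url:
        return "Direct"
    host = urlparse(referrer_url).netloc.lower()
    return "AI Search" if host in AI_SEARCH_HOSTS else "Other"

print(classify_referrer("https://www.perplexity.ai/search?q=geo"))  # AI Search
print(classify_referrer("https://www.google.com/search?q=geo"))     # Other
print(classify_referrer(""))                                        # Direct
```

Because of referrer stripping, treat this classification as a lower bound on AI-sourced sessions, and cross-check against the `utm_source=chatgpt.com` parameter on landing-page URLs where available.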
Implementation Roadmap: Week-by-Week
The fastest path from zero to citations on all three platforms
✅ Audit robots.txt — remove blocks on PerplexityBot, OAI-SearchBot, ChatGPT-User, Google-Extended
✅ Verify site in Bing Webmaster Tools + submit XML sitemap
✅ Check Bing Index Coverage for crawl errors and blocked pages
✅ Implement IndexNow for real-time Bing update notifications
✅ Run Google's Rich Results Test on 5 top pages — identify schema gaps
✅ Test site rendering with JavaScript disabled — identify JS-rendered content Perplexity can't crawl
✅ FAQPage schema on all pages with Q&A sections
✅ HowTo schema on all tutorials and step-based content
✅ Article/BlogPosting schema with datePublished, dateModified, and named author
✅ Organization, WebSite, and Person schema implemented and validated
✅ DefinedTerm schema on glossary and definitional content
✅ Validate all schema in Google's Rich Results Test + submit updated sitemaps
✅ Audit top 10 pages for answer-first structure — does every section lead with a direct answer?
✅ Rewrite section headings to question format
✅ Add "last updated" dates to all pages missing them
✅ Replace all vague statistics with specific, named-source data
✅ Implement complete OGP tags on all pages
✅ Add FAQ sections with 6–10 questions to top 5 content pieces
✅ Add definition sentences for all key concepts (for ChatGPT voice + Browse extraction)
✅ Audit About page — add editorial standards, team descriptions, publication mission
✅ Add named author bylines and credential descriptions to all content
✅ Create author biography pages for all primary contributors
✅ Check or create Wikidata entity entry for the organisation
✅ Audit brand entity consistency: Google Business Profile, LinkedIn, Crunchbase, industry directories
✅ Ensure contact page, privacy policy, and terms of service are accessible
✅ First full citation audit — search 20 target queries in each AI engine, record citation status
✅ Set up GA4 custom AI Search channel group (chatgpt.com + openai.com + perplexity.ai)
✅ Begin production of first original research piece targeting a high-frequency data gap
✅ Identify 5 queries with strong organic ranking but no AI citation — prioritise content restructuring
✅ Establish monthly citation tracking cadence
✅ Publish original research and distribute for backlink acquisition
✅ Refresh top 10 cited pages quarterly — update statistics, refresh "last updated" date. Critical given 40–60% monthly citation rotation (Semrush)
✅ Expand FAQ sections with PAA-sourced questions
✅ Monitor Core Web Vitals in GSC — affects Perplexity's live crawl success rate
✅ Evaluate third-party AI citation monitoring tools at 50+ target queries ($337/month average)
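The first checklist item, crawler allowance, is easy to automate as a recurring check. A minimal sketch using Python's standard-library robots.txt parser; the sample rules and domain are hypothetical:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["PerplexityBot", "OAI-SearchBot", "ChatGPT-User", "Google-Extended"]

def check_ai_access(robots_txt, url):
    """Map each AI crawler user-agent to True/False fetch permission."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}

# Hypothetical robots.txt that blocks one AI crawler
sample = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""
report = check_ai_access(sample, "https://example.com/guide")
print(report)  # Google-Extended: False, every other crawler: True
```

In production you would fetch the live robots.txt with `RobotFileParser.set_url(...)` plus `read()` and alert when any entry in the report flips to False.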
Frequently Asked Questions
Direct answers to the most common AI search optimisation questions
Allow PerplexityBot in robots.txt, use answer-first content structure, and replace vague generalisations with specific named statistics. These are the three highest-impact actions. In detail: crawlability (PerplexityBot must be allowed in robots.txt), answer-first structure (put the direct answer in the first 50–80 words of each section), factual precision (name specific statistics and sources — Perplexity achieves 94% citation accuracy, meaning vague claims get filtered out), source credibility (build domain authority through backlinks and third-party mentions), recency (visible publication and last-updated dates matter, since 40–60% of cited sources rotate monthly per Semrush), and concise paragraphs (40–80 word standalone paragraphs are easier for Perplexity's extraction system to cite inline).
Start with Bing: verify your site in Bing Webmaster Tools and submit your sitemap. ChatGPT Search draws from Bing's index and accounts for 87.4% of all AI referral traffic (Conductor 2026) — if you're not indexed there, none of the content work matters. Beyond that: implement complete OGP tags on every page, maintain a proper About page with named author attribution and editorial disclosure, use clean semantic HTML throughout (avoid JavaScript-only content rendering), include 40–70 word direct-answer paragraphs under question-format headings, include first-person experience and genuine testing data where relevant, and make sure OAI-SearchBot and ChatGPT-User are allowed in robots.txt.
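For the Bing indexing step, update notifications can be scripted via the IndexNow protocol. A sketch that builds the JSON body IndexNow expects; the host, key, and URLs are placeholders, and the actual POST (to the Bing IndexNow endpoint) is omitted:

```python
import json

def indexnow_payload(host, key, urls):
    """Build the JSON body for an IndexNow submission.
    The key must also be served at https://<host>/<key>.txt for verification."""
    return {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }

payload = indexnow_payload(
    host="example.com",
    key="0123456789abcdef",  # hypothetical IndexNow key
    urls=["https://example.com/updated-guide"],
)
print(json.dumps(payload))
```

Submitting this payload after each content refresh shortens the 2–4 week Bing re-index lag that gates ChatGPT Search visibility.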
Entity SEO is the practice of establishing your brand, people, products, and topics as recognised entities in search engine Knowledge Graphs — particularly Google's. For AI search, entity recognition directly elevates citation probability: Ahrefs data shows brands in the top 25% for web mentions get 10× more AI visibility. Google Gemini uses entity signals most heavily; ChatGPT Search evaluates entity signals via Bing's index; Perplexity uses domain credibility as a proxy for entity authority. To build entity recognition: create Wikidata entries, use Organization and Person schema with sameAs properties linking to authoritative profiles, build consistent brand presence across directories, and earn brand mentions in industry publications.
ChatGPT's Browse tool is the real-time web retrieval mechanism that activates when ChatGPT's intent classifier determines a query requires current information. When Browse triggers, it queries Bing's index, fetches top-ranking pages, extracts relevant passages, and synthesises them with numbered footnote citations. SEO impact: pages must be indexed in Bing (not just Google) and rendered in accessible HTML. Browse activates for task-completion queries (~90%), current events (~95%), and product evaluation queries (~80%). Bing Webmaster Tools setup is the essential first step in ChatGPT SEO — not Google Search Console.
Gemini draws from Google's index, so standard Google SEO is the foundation — Ahrefs' August 2025 analysis found 76.1% of AI Overview citations also rank in Google's top 10. On top of that: invest in E-E-A-T signals (named author credentials, original research, editorial standards), implement FAQPage, HowTo, Article, and Organization schema, structure every section with a question-format heading followed immediately by a 40–70 word direct answer, build brand entity recognition via Wikidata and consistent cross-web presence, and confirm Google-Extended is allowed in robots.txt.
GEO (Generative Engine Optimisation) is the practice of structuring content to earn citations in AI-powered generative search engines — including Google AI Overviews, Perplexity AI, ChatGPT Search, and Microsoft Copilot. A Princeton, Georgia Tech, and Allen Institute for AI study published at ACM SIGKDD 2024 found proper GEO implementation can boost AI visibility by up to 40%. GEO and SEO are parallel disciplines: SEO targets organic blue-link rankings; GEO targets AI-synthesised answer citations. The signals that matter most for GEO — E-E-A-T, direct answers, schema, entity recognition — also strengthen organic SEO rankings.
In GA4, filter Traffic Acquisition by session source chatgpt.com and openai.com. Create a custom AI Search channel group in Admin → Channel Groups capturing chatgpt.com, openai.com, and perplexity.ai. Note that 19–28% of ChatGPT-sourced visits may appear as Direct due to referrer stripping, particularly from mobile apps. OpenAI appends utm_source=chatgpt.com to citation links on the web interface since June 2025, improving desktop attribution. In audits across 12 sites, ChatGPT-sourced sessions convert at 2.3× the organic Google rate — the most compelling number when making the case to stakeholders.
Yes — robots.txt directly affects AI search engine crawling. Each AI engine uses different crawler user-agents: Perplexity uses "PerplexityBot"; OpenAI uses "GPTBot" and "OAI-SearchBot"; Google Gemini uses "Googlebot" and "Google-Extended". If any of these crawlers are blocked — either by specific Disallow rules or blanket wildcard rules — those AI engines cannot cite your content. Press Gazette research (2025) found nearly 80% of top news publishers now block at least one major AI crawler. Note: you can block Google-Extended specifically to prevent AI Overview citation while keeping Googlebot access for organic search.
Timelines vary by platform: Perplexity results can appear within days of technical fixes (crawler allowance, content refresh) because it performs real-time crawls — I've seen citation appearances within 72 hours of unblocking PerplexityBot and refreshing content. ChatGPT Search typically shows results within 2–4 weeks after Bing re-indexes updated pages. Google Gemini typically takes 4–8 weeks after implementing schema, E-E-A-T improvements, and content restructuring. Brand entity recognition (Knowledge Graph, branded search volume growth) takes 3–6 months to compound measurably — and is the most durable competitive advantage once established.
Related Deep-Dive Guides
The full IndexCraft AI Search & GEO cluster
The broader GEO framework — RAG architecture, universal content structure, and topical authority principles that underpin all platform-specific optimisation.
Read the GEO pillar →

The authority signal framework that is the primary citation factor for Google Gemini and a major factor for ChatGPT Search.
Read the E-E-A-T guide →

Platform-exclusive deep-dive covering AI Mode's Gemini architecture, full-page search experience, and content and technical signals specific to AI Mode citation.
Read AI Mode guide →

The complete entity SEO framework — Knowledge Graph building, semantic content architecture, and DefinedTerm schema for AI search visibility.
Read entity SEO guide →

The technical SEO foundation that underpins AI crawler access, indexability, and page speed requirements across all three platforms.
Read technical SEO guide →

The content cluster architecture that simultaneously builds topical authority for organic search and domain-level credibility for AI citation.
Read cluster guide →

📚 Sources & References
- Aggarwal, P. et al. (2024). GEO: Generative Engine Optimization. ACM SIGKDD 2024. Princeton, Georgia Tech, Allen Institute for AI. doi.org/10.1145/3637528.3671900
- Conductor. (2026). 2026 AEO/GEO Benchmarks Report. Analysis of 13,770 domains, 21.9M Google searches, and 17M AI responses.
- Previsible. (2025). AI Traffic Report — H1 2025.
- SE Ranking. (2025). AI Traffic Research Study. Analysis of 2.3 million pages. seranking.com
- Ahrefs. (August 2025). AI Overview Citation Analysis & AI Referral Traffic Research.
- OpenAI. (October 2024). ChatGPT Search Launch Announcement. openai.com
- Microsoft Bing Webmaster Blog. Bing Index & ChatGPT Search Developer Guidance. blogs.bing.com/webmaster/
- Semrush. (2025). AI Search Behaviour Report: Query Pattern Analysis. 80M+ clickstream records. semrush.com
- Semrush. (2025–2026). AI Visibility Index.
- BrightEdge. (2025). AI Search Behaviour Report.
- Superlines. (2026). AI Search Statistics 2026 & The State of GEO in Q1 2026. Analysis of 34,234 AI responses across 10 platforms. superlines.io
- Exposure Ninja. (2026). AI Search Statistics 2026. exposureninja.com
- Press Gazette. (2025). AI Training Crawler Blocking Research.
- AirOps. (2025). Heading Structure and AI Citation Rate Research.
- Perplexity AI. (2025). Official query volume data.
- Incremys. (2026). Perplexity AI 2026 Statistics. incremys.com
- WARC. (2025). Perplexity AI User Audience Research.
- Edelman. (2025). AI Citation and Earned Media Research.
- Similarweb. (2025). chatgpt.com Audience Intelligence Analysis.
- IndexCraft — Rohit Sharma. (2025–2026). Internal AI Citation Audit Data. Analysis of 47 client websites, 1,800+ ChatGPT Search response pairs, Oct 2024 – Mar 2026.
- OpenAI. (2026). GPT Store & Custom GPTs Platform Statistics.
- Knotch / Conductor. (2026). AI Referral Traffic Conversion Rate Study.