🔍 How do you do an SEO audit? (Direct answer)
A proper SEO audit works through nine layers: crawlability, technical health, on-page structure, content quality, E-E-A-T signals, AI search visibility, backlinks, Core Web Vitals, and analytics. At each layer, you're comparing what your site actually does against what Google and AI search engines need — using free tools: Google Search Console, Bing Webmaster Tools, Screaming Frog, and PageSpeed Insights. After 47 client audits, the most common finding isn't a technical one. It's a combination of pages missing from Bing's index and content written too vaguely for either Featured Snippets or AI citation engines to pull a clean answer from.
Most SEO audit guides are written by tool companies and funnel every step toward their paid product. This is a manual audit framework built around free tools, with the human-judgment checks that no crawler can do for you. It also includes an AI Visibility Audit step — an audit layer that simply didn't exist before 2024 and that now drives a fast-growing share of referral traffic for the sites I manage. Related guides:
- Technical SEO deep-dive: Technical SEO Guide →
- Google Search Console full walkthrough: GA4 & GSC Guide →
- ChatGPT Search optimisation: ChatGPT SEO Guide →
- AI search citation framework: GEO & AEO Guide →
Over the past 18 months I've run structured SEO audits on 47 client sites — covering SaaS, B2B tech, professional services, and e-commerce. The finding that surprises people most: the single most common issue isn't a technical failure or an algorithm penalty. It's that nobody has looked at the site systematically in two or three years. Pages that were accurate when published are now outdated. Internal links point to redirected URLs. Title tags were written for a keyword strategy that's since changed.
A full audit forces the kind of structured review that ongoing day-to-day work never does. The ROI is almost always higher than clients expect, not because there are dramatic problems to fix, but because there are dozens of small ones — each individually minor, cumulatively significant. — Rohit Sharma, IndexCraft
Ahrefs' 2024 data study of over one billion pages found that 96.55% of pages get zero organic traffic from Google — not because the content is bad, but because of fixable technical and structural issues that a proper audit finds and prioritises. [1] On top of that, there's now a second channel to audit that simply didn't exist before 2024: AI search visibility. ChatGPT Search, Google AI Mode, and Perplexity are driving a fast-growing share of referral traffic, and a site that hasn't been checked for AI citation eligibility is leaving a measurable slice of that on the table.
1. What Is an SEO Audit and Why 2026 Changes the Scope
An SEO audit is a structured look at why your site isn't ranking where it should — covering technical access, content quality, authority signals, and now AI citation eligibility. What's changed in 2026 is that there are effectively two audiences you need to satisfy at the same time: traditional crawlers (Googlebot, Bingbot) and AI retrieval systems (ChatGPT Search, Google AI Mode, Perplexity). A site that passes a clean technical audit but has no Bing index coverage and content that AI can't extract from is already losing a chunk of traffic it doesn't know about.
📋 Traditional SEO Audit (Pre-2024)
- Technical crawlability for Googlebot
- On-page keyword optimisation
- Backlink profile and authority
- Core Web Vitals and page speed
- Google Search Console coverage
- Content quality for human readers
🤖 2026 SEO Audit (Full Scope)
- All traditional layers above, plus:
- Bing index coverage for ChatGPT Search eligibility
- Direct-answer content structure for AI extraction
- E-E-A-T signals for both Google and AI citation preference
- FAQPage & Article schema for AI-readable structure
- OAI-SearchBot and GPTBot crawl access
- GA4 AI search channel tracking
2. Pre-Audit Setup: Your Free Tool Stack
Before you start crawling, get these tools configured. Everything in this guide can be done with free versions — I've listed paid alternatives where they exist, but you won't need them to complete any of the nine steps.
| Tool | Cost | What It Covers in This Audit | Where to Access |
|---|---|---|---|
| Google Search Console | Free | Indexation, coverage errors, Core Web Vitals, mobile usability, search performance, manual actions | search.google.com/search-console |
| Bing Webmaster Tools | Free | Bing index coverage, Bingbot crawl errors, robots.txt tester — critical for AI visibility step | webmaster.bing.com |
| Screaming Frog SEO Spider | Free ≤500 URLs | On-page audit: title tags, meta descriptions, H1s, canonical tags, redirect chains, broken links, duplicate content | screamingfrog.co.uk/seo-spider |
| Ahrefs Webmaster Tools | Free (limited) | Site audit for technical issues, backlink profile overview, broken backlink detection | ahrefs.com/webmaster-tools |
| Google PageSpeed Insights | Free | Core Web Vitals (LCP, INP, CLS), TTFB, FCP — lab and field data from Chrome UX Report | pagespeed.web.dev |
| Google Rich Results Test | Free | Validates Article, FAQPage, Product, HowTo, and other schema markup for structured data errors | search.google.com/test/rich-results |
| Google Analytics 4 | Free | Traffic acquisition, user engagement, conversion tracking, AI search channel group setup | analytics.google.com |
| Chrome Lighthouse / DevTools | Free | JavaScript rendering check, accessibility, performance, SEO on-page checks — no data sent to any server | Built into Chrome → F12 → Lighthouse tab |
3. Step 1 — Crawlability & Indexation Audit
If search engine bots can't crawl your pages — or if Google has decided not to index them — nothing else matters. Good content, clean code, strong backlinks: all wasted if the page is invisible. This step finds every page that should be indexed but isn't, and every exclusion that's either intentional or accidental (the accidental ones are more common than you'd think).
Open yoursite.com/robots.txt and read every Disallow rule line by line. Ask yourself: is this intentional? In my audit portfolio, 38% of sites had at least one unintentional Disallow rule — usually a wildcard pattern left over from a staging environment, or a CMS plugin that added a blanket block nobody noticed. Verify using GSC's robots.txt tester, then test Bingbot separately in Bing Webmaster Tools' Robots.txt Tester — the two crawlers interpret wildcard rules differently and what passes for Googlebot doesn't always pass for Bingbot. [4]
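As a reference point, here is a hypothetical robots.txt fragment showing the kind of leftover staging rule this check catches. The paths and comments are invented for illustration, not taken from any real site.

```
# Hypothetical example: one unintentional staging rule left in a production robots.txt
User-agent: *
Disallow: /wp-admin/        # intentional: keeps bots out of the CMS backend
Disallow: /*?preview=true   # intentional: blocks preview-parameter URLs
Disallow: /resources/       # unintentional: staging leftover now blocking a live content directory

Sitemap: https://www.example.com/sitemap.xml
```

The fix is deleting the unintended Disallow line, then re-testing the affected URLs in both GSC and Bing Webmaster Tools.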
In Google Search Console, go to Indexing → Pages. Compare the "Indexed" count against your sitemap URL count. A gap of more than 10% usually means one of two things: pages that shouldn't be in the sitemap (paginated URLs, filter variants, tag archives), or pages Google has decided to exclude because of canonical conflicts, thin content signals, or crawl budget deprioritisation. Export the "Not indexed" list and triage each exclusion reason — they're not equal in severity or urgency.
Run Screaming Frog across the full site. In the Response Codes tab, filter for 3xx and 4xx. Redirect chains longer than one hop waste crawl budget and dilute link equity — collapse every intermediate hop to a direct 301. For internal 404s (your own pages linking to a broken URL), these are high-priority fixes: they're broken user journeys and a signal of poor site maintenance. External links pointing to your 404s are covered in Step 7 as a backlink recovery opportunity.
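If you want to double-check the crawler's redirect-chain findings, a minimal Python sketch like this one follows each URL with the requests library and flags chains longer than one hop. The input filename and hop threshold are assumptions for the example.

```python
# Minimal sketch: flag URLs whose redirects chain through more than one hop.
# Assumes a plain-text file of URLs (one per line) exported from your crawler.
import requests

MAX_HOPS = 1  # anything beyond a single 301 hop is worth collapsing

def check_redirect_chain(url: str) -> None:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    hops = resp.history  # list of intermediate redirect responses
    if len(hops) > MAX_HOPS:
        chain = " -> ".join([r.url for r in hops] + [resp.url])
        print(f"CHAIN ({len(hops)} hops): {chain}")
    elif any(r.status_code in (302, 307) for r in hops):
        print(f"TEMPORARY REDIRECT: {url} -> {resp.url}")

if __name__ == "__main__":
    with open("urls.txt") as f:
        for line in f:
            url = line.strip()
            if url:
                check_redirect_chain(url)
```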
In Screaming Frog, filter the Directives tab for pages returning a noindex meta tag. Accidental noindex tags are one of the most common findings on sites that have recently migrated CMS platforms or gone through a redesign — staging noindex directives that were never removed before launch. Cross-check the GSC Coverage report's "Excluded by noindex tag" bucket against your intended noindex pages (privacy policy, thank-you pages, internal search results) to catch anything that shouldn't be there.
A software client came to me in Q3 2025 with flat organic traffic despite consistently publishing two new articles per month for 18 months. The first crawl revealed that a robots.txt rule from a legacy CMS migration was blocking Googlebot from accessing every URL containing the string /resources/ — the directory where all 36 of their new blog posts lived. Thirty-six articles, 18 months of content investment, zero Google visibility. The fix took 90 seconds. Traffic began recovering within three weeks of Googlebot re-crawling the content. This is the most dramatic version of a finding I encounter in some form in approximately half of all new audit engagements. [4]
4. Step 2 — Technical SEO Health Audit
Technical SEO issues are binary in a way content issues aren't — canonical tags either point to the right URL or they don't. This step goes through the six technical areas that cause the most actual ranking damage in practice, based on what I find most consistently across client audits.
📊 Technical Issue Frequency Across 47 Site Audits
Every indexable page should have a self-referencing canonical pointing to its exact preferred URL — correct protocol, consistent trailing slash, no session IDs or tracking parameters. In Screaming Frog, filter the Canonicals tab for "Non-canonical pages" and "Canonical mismatch" — both mean Google's index version may differ from what you intended. Canonical mismatches are among the most common technical findings and among the easiest to fix at template level, which makes them a good early win.
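For reference, a self-referencing canonical is a single line in the page head. The URL below is a placeholder:

```html
<!-- Self-referencing canonical: exact preferred URL, one protocol, consistent trailing slash -->
<link rel="canonical" href="https://www.example.com/blog/seo-audit-guide/" />
```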
Run your five highest-priority pages through the Google Rich Results Test. Focus on: Articles, FAQPages, Product pages, How-To pages, and Local Business listings. Invalid or missing schema turns up on 88% of sites I audit — most commonly because schema was set up correctly at launch, then quietly broke after a CMS update changed the template ID the schema plugin was referencing. [4] In 2026 this matters beyond rich results — structured data is one of the primary signals AI citation engines use to identify extract-ready content.
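If you want a known-good reference while validating, this is a minimal FAQPage JSON-LD block in the schema.org format the Rich Results Test expects. The question and answer text are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "How often should I run an SEO audit?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "For most established sites, a full audit once a year is the minimum, with a monthly check of GSC Coverage and Core Web Vitals reports between audits."
    }
  }]
}
</script>
```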
In Chrome DevTools, disable JavaScript (Settings → Debugger → Disable JavaScript) and reload your key content pages. Anything that disappears — nav, article body, product descriptions — is JS-dependent and may not be indexed by Bingbot, which renders JavaScript far less reliably than Googlebot. Then run Bing Webmaster Tools URL Inspection on the same pages and click "View Rendered Page." If key content is missing from Bing's rendered view, it's missing from Bing's index — which means it's invisible to ChatGPT Search, regardless of how well it performs in Google. [3]
In Screaming Frog, export all crawled URLs and cross-reference with your sitemap. Pages in the sitemap that have zero internal links pointing to them are orphan pages — Google's ability to re-discover and re-crawl them relies entirely on the sitemap schedule, and they accumulate almost no PageRank without internal link signals. Orphan pages tend to be older blog posts, migrated content, and paid campaign landing pages that were created and then never linked from anywhere on the main site.
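A short script makes the sitemap-versus-crawl comparison repeatable. This sketch assumes a locally saved sitemap.xml and a one-URL-per-line export of every page your crawler reached via internal links; adjust the filenames to your setup.

```python
# Sketch: find sitemap URLs that received zero internal links in the crawl (orphan pages).
# Assumes sitemap.xml is saved locally and crawled_urls.txt lists every URL reached
# by following internal links (one URL per line).
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def load_sitemap_urls(path: str) -> set[str]:
    tree = ET.parse(path)
    return {loc.text.strip() for loc in tree.getroot().findall(".//sm:loc", NS)}

def load_crawled_urls(path: str) -> set[str]:
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

if __name__ == "__main__":
    sitemap_urls = load_sitemap_urls("sitemap.xml")
    crawled_urls = load_crawled_urls("crawled_urls.txt")
    orphans = sorted(sitemap_urls - crawled_urls)
    print(f"{len(orphans)} orphan pages (in sitemap, not reached via internal links):")
    for url in orphans:
        print(url)
```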
A client had added FAQPage schema to 40 pages of their site two years prior to engaging IndexCraft. During the technical audit, Google Rich Results Test revealed that 34 of those 40 pages had broken schema — caused by a WordPress theme update that modified the template ID used by their schema plugin. Those 34 pages had been submitting invalid schema for over 14 months without triggering any visible error in Google Search Console, because GSC only surfaces schema errors that were once valid and then broke — not schema that has been quietly malformed. The re-validation of these pages coincided with a 28% increase in FAQ-rich-result impressions within eight weeks of the fix. [4]
5. Step 3 — On-Page SEO & Content Structure Audit
On-page SEO is more than keywords in title tags. In 2026, it includes how your content is structured for extraction — meaning whether AI systems can pull a clean, direct answer from your page when someone asks a relevant question. That's the piece most sites are missing, and it's also what determines Featured Snippet eligibility.
| On-Page Element | What to Audit | Common Issue | Priority |
|---|---|---|---|
| Title Tags | Unique for every page; 50–60 characters; primary keyword near the start; not duplicated across pages | Duplicate titles from CMS auto-generation; truncated in SERPs; no keyword present | CRITICAL |
| Meta Descriptions | Unique per page; 140–160 characters; includes call to action and primary keyword; not auto-generated | Missing on 40%+ of pages; auto-generated from first sentence; identical across category pages | HIGH |
| H1 Tags | Exactly one H1 per page; includes primary keyword; distinct from title tag but aligned in topic | Multiple H1s from theme templates; missing H1 on landing pages; H1 = title tag verbatim | CRITICAL |
| Heading Hierarchy (H2–H4) | Logical hierarchy; H2s frame major sections; question-format H2s where relevant; no hierarchy skipping | Headings used for styling, not structure; no question-format headings; random hierarchy skips | HIGH |
| URL Structure | Short, readable, hyphenated; includes primary keyword; no parameters in production URLs; consistent trailing slash policy | Auto-generated numeric IDs; underscores instead of hyphens; excessively deep folder structures | MEDIUM |
| Direct-Answer Paragraphs | First paragraph under each H2 directly answers the heading's implicit question in 40–70 words; standalone and extractable | Intro paragraphs that contextualise rather than answer; answer buried in paragraph 3 or 4 | CRITICAL for AI |
| Image Alt Text | Every meaningful image has descriptive alt text; decorative images have empty alt=""; no keyword stuffing | Missing alt text on 60%+ of images; alt text = filename string; identical alt text across multiple images | MEDIUM |
6. Step 4 — Content Quality, Freshness & Gap Audit
Crawlers are good at finding broken links and missing title tags. They're terrible at telling you whether your content is actually any good. That's where this step comes in. We're looking at four things: thin content, keyword cannibalization, freshness decay, and topic gaps — all of which require a human to evaluate properly.
In Screaming Frog, filter the Content tab for pages under 300 words. Export the list and classify each one: pages that are intentionally short — privacy policy, contact page, thank-you pages — are fine. What you're looking for is content pages below 300 words that are trying to rank for informational queries. Google's Search Quality Evaluator Guidelines flag "thin content with little or no added value" as a quality issue that suppresses not just those individual pages but domain-level quality signals. [6] Your options: expand with something genuinely useful, consolidate with a stronger related page, or remove and redirect.
In Google Search Console, export search query and landing page data for the past 12 months. Filter for your target keywords and check: are multiple pages showing up for the same primary query? If they are, you have cannibalization — Google has to pick one on every search, often picks the wrong one, and neither page builds as much signal as it would if the authority were concentrated. The fix depends on intent: if both pages serve the same purpose, consolidate the weaker into the stronger. If they genuinely serve different angles of the same keyword, make those differences a lot clearer in the title and content focus.
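On larger exports, a pandas sketch speeds up this check by grouping a query-and-page level export and surfacing queries where more than one page earns meaningful impressions. The CSV columns assumed here (query, page, clicks, impressions) match a typical Search Console API pull; verify them against your own export.

```python
# Sketch: surface keyword cannibalization candidates from a query/page-level export.
# Assumes a CSV with columns: query, page, clicks, impressions (e.g. pulled via the
# Search Console API). Adjust column names to match your file.
import pandas as pd

MIN_IMPRESSIONS = 50  # ignore pages with only incidental impressions for a query

df = pd.read_csv("gsc_query_page.csv")
df = df[df["impressions"] >= MIN_IMPRESSIONS]

# Count distinct ranking pages per query and keep queries with more than one.
pages_per_query = df.groupby("query")["page"].nunique()
cannibalized = pages_per_query[pages_per_query > 1].index

report = (
    df[df["query"].isin(cannibalized)]
    .sort_values(["query", "impressions"], ascending=[True, False])
    [["query", "page", "clicks", "impressions"]]
)
print(report.to_string(index=False))
```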
In GSC Performance, filter by page and set the date range to the past 12 months with monthly comparison. Pages showing steadily declining impressions — without an algorithm update to explain it — are getting outranked by fresher competitor content. These are your best refresh targets: the page already has backlinks, internal links, and ranking history. It just needs the content brought up to date. Semrush's State of Content Marketing 2025 found that properly refreshed pages return to their previous peak rankings within 60–90 days on average. [7]
Use Ahrefs Webmaster Tools' competitor intersection feature, or manually compare your content inventory against your top two organic competitors. For each topic cluster in your niche, ask which subtopics your competitors rank for that you don't cover. These gaps cost you more than just the missing traffic — they tell Google you only partially own the topic. A site that covers 90% of a cluster consistently outranks one covering 50%, even when the individual pages are comparable in quality. [5]
A client had a product comparison article that had been their highest-traffic page at publication in late 2023, but had lost 68% of impressions by the time I audited the site in September 2025. The article compared pricing tiers that were two years out of date, referenced features three of the compared products had discontinued, and lacked the "2025" year signal that competitors' refreshed versions included. After a full content refresh — updated pricing, re-verified features, new FAQPage schema, and a direct-answer opening paragraph — the page recovered to 109% of its previous peak traffic within 11 weeks. Zero new backlinks were built; the ranking recovery was entirely content-driven. [4]
7. Step 5 — E-E-A-T & Author Authority Audit
E-E-A-T — Experience, Expertise, Authoritativeness, Trustworthiness — is how Google's quality raters assess whether a page's content and author can actually be trusted. It's always mattered for health, finance, and legal topics. What's changed is that AI citation systems have adopted the same preference: both ChatGPT Search and Google AI Mode are more likely to cite pages with named, credentialed authors over anonymous content. [6]
| E-E-A-T Signal | What to Audit | Why It Matters in 2026 | Status Check |
|---|---|---|---|
| Named Author Bylines | Every content page has a named author byline linked to a verified author bio page with credentials | ChatGPT Search favours pages with verifiable authors when selecting citations. Anonymous content is a liability on both Google and AI platforms. | Audit every post — flag pages with no byline or a byline not linked to a bio |
| Author Bio Page Quality | Author pages list years of experience, specific credentials, named employers, and published work outside the site | Google raters check author bio pages directly. A generic "John writes about marketing" bio signals low E-E-A-T; specific, verifiable credentials don't. | Visit each author page — are the claims specific and verifiable, or vague? |
| About Page Transparency | About page names the people behind the site, the organisation's purpose, location, and founding story | For YMYL sites, anonymity is a red flag in Google's quality guidelines. Raters are specifically instructed to look for transparency about who's behind the content. | Does the About page name real people and provide real organisation details? |
| Contact & Editorial Policy | Site has a working contact page; health, finance, and legal content sites should document their editorial review process | Google's guidelines flag missing contact information as a trust concern. Editorial policies are particularly important for YMYL content where accuracy standards matter. | Is the contact page linked from the footer? Is there an editorial disclosure on applicable pages? |
| External Citations of Author | Author is mentioned or cited by third-party publications or industry sites, or has a Google Knowledge Panel | What other sites say about you carries more E-E-A-T weight than anything you say about yourself. External corroboration is Google's strongest authority signal. | Search "[Author Name] site:[domain.com]" to surface mentions; check for a Google Knowledge Panel |
| Content Accuracy & Sourcing | Factual claims are sourced; statistics come from named, credible, recent sources; no outdated or misleading figures | AI citation systems treat sourced, specific claims as a quality signal. Unsourced assertions are less likely to be extracted and cited. | Spot-check your 10 highest-traffic pages for unsourced statistics and outdated data |
A financial education site was consistently ranking on pages 2–3 for competitive informational queries despite having what appeared to be technically clean, well-structured content. The E-E-A-T audit revealed the root issue: every article was attributed to "The Editorial Team" with no individual author named, the About page listed no staff members by name, and no author had any external publication credits linked from the site. After adding named author bylines with linked bio pages, individual LinkedIn profiles as external credential references, and an editorial review policy page, the site's average ranking position for its 40 target keywords improved from 16.2 to 9.8 within 16 weeks — without a single piece of new content published or a backlink built. [4]
8. Step 6 — AI Visibility Audit
The AI visibility audit is the biggest structural addition to SEO auditing since mobile became a ranking factor in 2015. AI search referral traffic is growing faster than any other acquisition channel right now, and if your site hasn't been checked for AI citation eligibility, that's a present gap — not a future one to worry about later.
🔍 How AI Search Citation Works — The Retrieval Flow
Your site must pass two gates: (1) the index gate — your page must be in the AI engine's source index; (2) the extraction gate — your content must be structured for clean passage extraction.
Open Bing Webmaster Tools and go to Reports & Data → Index Explorer. Check your 10–20 highest-priority content pages individually. In my audit portfolio, an average of 23% of target pages are missing from Bing's index despite being fully indexed by Google — which means complete ChatGPT Search invisibility for those pages, no matter how good the content is. [4] The most common culprits: Bingbot blocked by robots.txt, JavaScript content Bingbot can't process, and canonical configurations Bing interprets differently from Google.
OpenAI runs two separate crawlers: GPTBot (training data) and OAI-SearchBot (real-time ChatGPT Search retrieval). Blocking GPTBot is a legitimate choice if you want to opt out of model training — but some sites have blocked both with a single robots.txt rule, which also kills real-time ChatGPT Search retrieval as a side effect. These are distinct user agents that need separate entries. If you're intentionally blocking training but want ChatGPT Search visibility, make sure OAI-SearchBot has its own explicit Allow rule. [8]
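As a hedged illustration, this is what the intended split looks like in robots.txt: opting out of model training while keeping ChatGPT Search retrieval. The user-agent names follow OpenAI's published crawler documentation; confirm the current syntax there before deploying. [8]

```
# Opt out of model training, but keep real-time ChatGPT Search retrieval.
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

# Everything else follows your normal rules.
User-agent: *
Disallow: /wp-admin/
```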
Manually go through your 10 highest-priority pages and read the first paragraph under each H2. Does it directly answer what the heading is asking, in 40–70 words, as a standalone unit someone could read without the surrounding context? Pages that open sections with "In this section we will explore…" or any other preamble are invisible to AI extraction even when the right answer is buried a few paragraphs later. Rewrite those openers to front-load the answer — it's the same change that wins Featured Snippets, so one edit serves multiple channels.
Open ChatGPT (with Search enabled), Perplexity, and Google AI Mode. Run your five most important target queries. Is your site getting cited? If yes — which pages, for which queries? If no — use the Bing URL Inspection and content structure checks above to track down why. This takes 20 minutes and tells you something no automated tool currently can: not whether you're theoretically eligible for AI citations, but whether you're actually getting them right now. [9]
- Step 1: Bing index check → webmaster.bing.com → Index Explorer → test top 10 pages → flag any "Not Indexed" or "Crawl Error" results
- Step 2: robots.txt crawler check → open yoursite.com/robots.txt in a browser → search for: Bingbot, OAI-SearchBot, GPTBot, CCBot → Bing Webmaster Tools → Diagnostics → Robots.txt Tester → test Bingbot. GPTBot blocking = training data opt-out (OK if intentional); OAI-SearchBot blocking = ChatGPT Search invisibility (usually unintentional)
- Step 3: Content structure spot-check → open 3 key pages and read the first paragraph under each H2. Does it answer the heading directly in under 80 words? Is it comprehensible without surrounding context? If "no" to either, rewrite as a direct-answer paragraph
- Step 4: Schema check for AI-readable structure → search.google.com/test/rich-results → test key pages → confirm Article schema is present and FAQPage schema is valid
- Step 5: GA4 AI channel setup → Admin → Channel Groups → Create → "AI Search" → Rules: session source contains chatgpt.com OR perplexity.ai OR openai.com
In a March 2026 audit for a professional services firm, I ran 20 target queries in ChatGPT Search. The client received zero citations despite having strong Google rankings for all 20 queries. Bing Webmaster Tools revealed that 14 of the 20 target pages were absent from Bing's index. The root cause: a React SPA architecture where all page content was loaded client-side — Googlebot could render it, Bingbot could not. After implementing server-side rendering for the primary content sections and submitting an updated sitemap to Bing Webmaster Tools, 9 of those 14 pages had entered Bing's index within four weeks. In the following month, ChatGPT referral sessions increased from 0 to 47 per month. Not a large number — but from an audience converting at 3.1x the rate of organic search traffic, every session carries meaningful pipeline value. [4]
9. Step 7 — Backlink & Off-Page Authority Audit
The backlink audit looks at the external signals pointing to your site — which links are working, which are wasted on 404 pages, and where the profile has risk patterns. In 2026 it also needs to account for brand mentions in AI training contexts and digital PR as an E-E-A-T signal, not just traditional link equity.
Ahrefs Webmaster Tools' free plan gives you a full backlink profile for verified domains — referring domains, domain rating distribution, anchor text breakdown, and new vs. lost link trends over time. Look at the unique referring domain count over the past 12 months: a steady decline is a leading indicator of ranking pressure ahead, because domain diversity is the primary link authority signal, not raw link count. Export your top 100 referring domains and manually review the top 20 for relevance and quality.
In Ahrefs Webmaster Tools, filter the Backlinks report for "Broken" — links from external sites hitting pages on your domain that now return 404. For any broken URL with a decent referring domain behind it, create a 301 redirect to the most relevant live page. This is one of the best returns you'll get from any link work — you're recovering equity you already earned rather than going out and building new relationships from scratch.
In Ahrefs, pull up your anchor text breakdown. A natural backlink profile is dominated by branded, URL, and generic anchors ("click here," "this post") — with exact-match keyword anchors as a minority. If exact-match keyword anchors are above 15–20% of your total, that's an over-optimisation pattern Google's Penguin algorithm (now part of core) can act on. The fix isn't to remove links — it's to earn more editorial coverage from authoritative publications, which naturally produces varied anchor text. [10]
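If you export the anchor report to CSV, a quick sketch like this estimates the exact-match share. The column name and the keyword list are assumptions to adapt to your own data.

```python
# Sketch: estimate the share of exact-match keyword anchors in a backlink export.
# Assumes a CSV with an "anchor" column and one row per backlink; the keyword list
# is a placeholder to replace with your own target terms.
import pandas as pd

EXACT_MATCH_TERMS = {"seo audit", "seo audit checklist", "technical seo audit"}

df = pd.read_csv("backlink_anchors.csv")
anchors = df["anchor"].fillna("").str.strip().str.lower()

exact_match = anchors.isin(EXACT_MATCH_TERMS).sum()
total = len(anchors)
share = exact_match / total * 100 if total else 0.0

print(f"{exact_match} of {total} anchors are exact-match ({share:.1f}%)")
if share > 15:
    print("Above the 15-20% range: review for over-optimisation patterns.")
```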
In GSC, go to Security & Manual Actions → Manual Actions. A manual action from Google's web spam team — most often for unnatural inbound links, thin content, or cloaking — is the most severe finding in any SEO audit and takes precedence over everything else. If one is present, stop all other optimisation work until it's resolved via a reconsideration request. Manual actions are uncommon, and clean sites typically see "No issues detected" — but it takes 30 seconds to check and the consequences of missing one are serious.
10. Step 8 — Core Web Vitals & Page Experience Audit
Core Web Vitals are a confirmed Google ranking signal and a direct measure of whether your pages feel fast to real users. Fewer than half of websites currently pass all three thresholds according to the Chrome User Experience Report. [11] Failing CWV doesn't immediately tank your rankings, but it puts a ceiling on how far you can go for competitive queries where you're otherwise evenly matched with sites that do pass.
| Metric | What It Measures | Good Threshold | Common Cause of Failure | Audit Action |
|---|---|---|---|---|
| LCP (Largest Contentful Paint) | How quickly the largest visible element loads from when the page starts loading | Under 2.5 seconds | Unoptimised hero images; render-blocking resources; slow server TTFB | PSI field data → GSC CWV report → identify failing pages by template type |
| INP (Interaction to Next Paint) | Responsiveness to user interactions — replaced FID as the interactivity metric in March 2024 | Under 200ms | Heavy JavaScript execution blocking the main thread; excessive third-party scripts | Chrome DevTools → Performance tab → identify long tasks >50ms |
| CLS (Cumulative Layout Shift) | Visual stability — measures how much page elements shift during loading | Under 0.1 | Images without explicit width/height attributes; late-loading ads; injected banners above content | PSI lab data → identify shifts; inspect images and ad containers for missing dimensions |
| TTFB (Time to First Byte) | Server response time — time from navigation request to first byte received (not an official CWV but diagnostic) | Under 800ms | Slow hosting, no CDN, unoptimised database queries, excessive server-side processing | PSI lab data → TTFB; also critical for Bingbot crawl quality (target under 500ms) |
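Field data for these metrics can also be pulled programmatically. The sketch below queries the Chrome UX Report API for one URL, assuming you have created a CrUX API key in Google Cloud; treat the metric field names as assumptions and confirm them against the current CrUX API documentation.

```python
# Sketch: pull p75 field values for LCP, INP, and CLS from the Chrome UX Report API.
# Assumes a CrUX API key from Google Cloud; metric field names follow the public CrUX
# API docs and should be confirmed there. Illustrative only, not a drop-in tool.
import requests

API_KEY = "YOUR_CRUX_API_KEY"  # placeholder
ENDPOINT = f"https://chromeuxreport.googleapis.com/v1/records:queryRecord?key={API_KEY}"

def fetch_field_data(url: str, form_factor: str = "PHONE") -> None:
    resp = requests.post(ENDPOINT, json={"url": url, "formFactor": form_factor}, timeout=10)
    resp.raise_for_status()
    metrics = resp.json().get("record", {}).get("metrics", {})
    for name in ("largest_contentful_paint", "interaction_to_next_paint", "cumulative_layout_shift"):
        p75 = metrics.get(name, {}).get("percentiles", {}).get("p75")
        print(f"{name}: p75 = {p75}")

if __name__ == "__main__":
    fetch_field_data("https://www.example.com/")
```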
11. Step 9 — Analytics & Conversion Tracking Audit
Every recommendation in the previous eight steps depends on your data being accurate. If GA4 isn't firing correctly, or your conversion events are misconfigured, you're optimising based on numbers that don't reflect reality. Conversion tracking errors are the most underdiagnosed cause of misallocated SEO investment I encounter — and they're also the easiest to miss because everything appears to be "working" on the surface.
Open GA4 DebugView (Admin → DebugView) and walk through the main page types on your site: homepage, category pages, blog posts, landing pages, and conversion pages. Confirm a page_view event fires on each one. Common failures: pages where GTM wasn't added to a new template, SPA routes that don't fire page_view on navigation, and thank-you pages that send a conversion event but skip the page_view. HubSpot's 2025 State of Marketing report found 23% of marketing teams cite inaccurate attribution data as their top analytics challenge — most of those problems start with gaps in basic page tracking. [12]
In GA4, go to Admin → Channel Groups → Create channel group. Set up a channel called "AI Search" with rules that capture: Session source contains "chatgpt.com" OR "openai.com" OR "perplexity.ai" OR "gemini.google.com." Without this, AI referral traffic gets scattered across "Referral" and "Direct" — invisible in standard acquisition reports. Note that since June 2025, OpenAI has been adding utm_source=chatgpt.com to citation links on desktop, which helps attribution accuracy. Mobile app sessions can still show as Direct, so don't assume your chatgpt.com referral count represents total ChatGPT-driven traffic. [13]
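The same source-matching logic can be sanity-checked offline before you trust the channel group. This sketch classifies session sources into an "AI Search" bucket using the domains named above; the domain list is the assumption to keep current as new AI platforms appear.

```python
# Sketch: classify session sources into an "AI Search" bucket, mirroring the GA4
# channel group rules described above. The domain list is the part to keep current.
AI_SEARCH_SOURCES = ("chatgpt.com", "openai.com", "perplexity.ai", "gemini.google.com")

def classify_source(session_source: str) -> str:
    source = (session_source or "").lower()
    if any(domain in source for domain in AI_SEARCH_SOURCES):
        return "AI Search"
    return "Other"

# Example usage with sources as they typically appear in acquisition reports:
for src in ["chatgpt.com / referral", "google / organic", "perplexity.ai / referral", "(direct)"]:
    print(f"{src:30} -> {classify_source(src)}")
```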
In GA4, go to Admin → Events and check the list of events marked as conversions. Make sure every intended action — form submission, demo request, purchase, PDF download — actually has a conversion event. Then check the other direction: scroll depth, video plays, and page views should not be marked as conversions, because that inflates your conversion rate and makes channel comparisons meaningless. Finally, check for duplication — one form submission should fire exactly one conversion, not two or three from a redirect chain.
12. SEO Audit Mistakes to Avoid
| Mistake | Why It Produces a False Audit Result | Severity | Correct Approach |
|---|---|---|---|
| Using only one crawl tool and treating its output as complete | Every crawler interprets JavaScript, redirects, and canonicals differently. What one tool reports as clean, another flags — Bing-specific rendering gaps in particular don't show up in Screaming Frog alone. | HIGH | Cross-validate Screaming Frog against GSC Coverage and Bing Webmaster Tools Index Explorer. Each adds something the others miss. |
| Prioritising technical fixes over content quality issues | Technical SEO is the foundation, but once the basics are solid, content quality and E-E-A-T deficiencies usually deliver more ranking impact per hour than additional technical work. [4] | HIGH | Clear critical technical blockers first, then shift the bulk of your time to content quality, freshness, and E-E-A-T — that's where most of the organic and AI visibility upside sits. |
| Treating the audit as a one-time exercise | Google's algorithm, your competitors' content, and your own site all change constantly. An audit from 12 months ago won't catch issues introduced by a CMS update last quarter or a competitor who just refreshed their top pages. | MEDIUM | Full audit annually, targeted crawl check quarterly, GSC Coverage and Core Web Vitals reviewed monthly. That rhythm catches most problems before they compound. |
| Skipping the Bing index audit entirely | A gap present in 82% of sites before they engage IndexCraft. If you're only watching Google Search Console, you're blind to Bing coverage gaps that completely block ChatGPT Search citations — regardless of how well the content ranks on Google. [4] | CRITICAL | Bing Webmaster Tools goes in the mandatory audit stack. Bing index status for all priority pages gets checked at the start of every AI visibility audit. |
| Auditing pages in isolation without checking topic cluster coverage | Page-by-page analysis misses the cluster-level signals Google uses to assess topical authority. A strong pillar page surrounded by gaps in the cluster gets suppressed across the board. | MEDIUM | Map your content against topic clusters and check coverage gaps, not just individual page health. |
| Fixing all issues simultaneously without prioritisation | A typical audit surfaces 50–200 issues. Trying to fix them all at once means nothing ships fast enough to measure, your developers are backed up, and you can't tell what actually moved the needle. [4] | MEDIUM | Three tiers: (1) critical blockers — fix this week; (2) high-impact improvements — fix within 30 days; (3) optimisations — schedule into the quarterly roadmap. |
13. Complete SEO Audit Checklist 2026
🔍 Step 1: Crawlability & Indexation
- robots.txt reviewed manually — all Disallow rules verified as intentional
- Bingbot confirmed as "Allowed" via Bing Webmaster Tools Robots.txt Tester
- XML sitemap submitted to both Google Search Console and Bing Webmaster Tools
- GSC Coverage report reviewed — "Not indexed" reasons triaged and categorised
- Screaming Frog crawl completed — all 4xx and redirect chains exported
- Redirect chains longer than one hop identified and collapsed to direct 301s
- Accidental noindex tags on indexable pages identified and removed
- Orphan pages (in sitemap, zero internal links) identified for internal linking fix
⚙️ Step 2: Technical SEO Health
- Canonical tags verified on all key pages — self-referencing, consistent URL format
- HTTPS enforced site-wide — no HTTP pages, no mixed-content warnings
- All structured data validated via Google Rich Results Test — no errors
- JavaScript rendering tested with DevTools JS-disabled view and Bing URL Inspection
- Key page content confirmed visible in Bing's rendered HTML view
- Duplicate title tags and meta descriptions identified and resolved
- TTFB measured — target under 500ms for optimal Bingbot crawl quality
- If content is loaded client-side only: evaluate SSR or static pre-rendering for Bingbot compatibility
📄 Step 3: On-Page SEO & Content Structure
- All pages have unique, keyword-inclusive title tags (50–60 characters)
- All pages have unique meta descriptions (140–160 characters) — not auto-generated
- Every page has exactly one H1 tag containing the primary keyword
- H2 headings rewritten in question format for key informational pages
- First paragraph under each H2 is a 40–70 word direct-answer paragraph
- Definition sentences present for all key concepts on each page
- Image alt text added to all meaningful images — descriptive, not keyword-stuffed
- Internal linking to and from each key page verified — no isolated pages
📝 Step 4: Content Quality & Gaps
- Thin content pages (<300 words on informational topics) identified and scheduled for expansion or consolidation
- Keyword cannibalization audit completed in GSC — competing pages identified and differentiated or merged
- Content freshness trend analysed — pages with declining impressions scheduled for refresh
- Topic cluster coverage mapped against top 2 competitors — content gaps prioritised
- All statistics and claims verified as current — outdated data updated with 2025/2026 sources
- Pages targeting YMYL topics (health, finance, legal) reviewed against Google's Search Quality Evaluator Guidelines E-E-A-T criteria
👤 Step 5: E-E-A-T & Author Authority
- Every content page has a named author byline linked to an author bio page
- Author bio pages include: years of experience, specific credentials, named external publications
- About page names key team members and provides organisation transparency
- Contact page is accessible and functional — linked from site footer
- All factual claims are sourced with citations to named, credible, recent sources
- Article schema updated to include author @type, name, and url properties
- YMYL sites: editorial review process and medical/legal/financial reviewer credentials documented on applicable pages
🤖 Step 6: AI Visibility
- Top 10–20 priority pages verified as indexed in Bing via Bing Webmaster Tools URL Inspection
- OAI-SearchBot and GPTBot access rules in robots.txt reviewed and clarified
- FAQPage schema added and validated on all informational and FAQ pages
- Manual AI citation test completed in ChatGPT Search and Perplexity for 5 target queries
- GA4 "AI Search" custom channel group configured and collecting data
- Do not block OAI-SearchBot via robots.txt or WAF rules — this eliminates real-time ChatGPT Search retrieval entirely
🔗 Step 7: Backlinks & Off-Page
- Backlink profile baselined in Ahrefs Webmaster Tools — referring domain trend noted
- Broken backlinks (external links to your 404 pages) identified and 301 redirects created
- Anchor text distribution reviewed — exact-match anchors under 20% of total
- GSC Manual Actions report checked — "No issues detected" confirmed
⚡ Step 8: Core Web Vitals & Page Experience
- GSC Core Web Vitals report reviewed — "Poor" URLs and failing page template types identified
- Google PageSpeed Insights run on homepage, key blog post, and key landing page
- LCP under 2.5 seconds — confirmed in field data (CrUX) not just lab data
- INP under 200ms — long tasks in DevTools Performance panel checked
- CLS under 0.1 — images and ad containers have explicit width/height attributes
- Mobile usability errors in GSC reviewed and resolved
📊 Step 9: Analytics & Tracking
- GA4 page_view event verified firing on all key page templates via DebugView
- All primary conversion events (form submissions, purchases, demo requests) confirmed in GA4 Events
- No non-conversion interactions incorrectly marked as conversions
- AI Search custom channel group configured in GA4 (chatgpt.com, openai.com, perplexity.ai)
- GSC and GA4 linked in GSC Property settings for integrated query-to-conversion reporting
- Mobile app sessions from ChatGPT may appear as Direct traffic — do not assume AI referral volume equals chatgpt.com referral sessions alone
14. Frequently Asked Questions About SEO Audits
What is an SEO audit?
An SEO audit is a structured review of why a website isn't ranking where it should. It covers technical access (can search engines crawl and index the site?), on-page quality (are the pages structured to communicate relevance?), content (is it accurate, useful, and up to date?), authority signals, Core Web Vitals, and increasingly, AI search visibility. A thorough 2026 audit works through nine layers, combining automated crawl tools with manual review for the things no tool can assess — content quality, E-E-A-T, and whether pages are structured for AI extraction.
How long does a full SEO audit take?
For a small-to-medium site under 500 pages, the nine-step methodology in this guide typically takes 8–16 hours of focused work. Enterprise sites with thousands of URLs can push that to 40–80+ hours. If you just need to find the biggest blockers fast, the crawlability check (Step 1), Bing index audit (Step 6, item 1), and a direct-answer paragraph review (Step 3) can be done in 2–3 hours — and that triage will surface the highest-impact findings on most sites. [4]
How often should I do an SEO audit?
For most established sites, a full audit once a year is the minimum. If you're in a competitive vertical or publishing content regularly, quarterly makes more sense. Between full audits, a monthly check of GSC's Coverage and Core Web Vitals reports takes 30 minutes and catches most issues before they compound. One rule that overrides the calendar: any major site change — CMS migration, redesign, domain change, large-scale restructure — should trigger an immediate targeted audit of whatever changed, not waiting for the next scheduled cycle. [4]
What free tools can I use for an SEO audit?
Google Search Console, Bing Webmaster Tools, Google Analytics 4, Screaming Frog (free up to 500 URLs), Ahrefs Webmaster Tools (free plan for verified domains), Google PageSpeed Insights, Google Rich Results Test, and Chrome Lighthouse cover all nine audit steps in this guide — no paid tools required. The free Ahrefs Webmaster Tools plan is worth highlighting because it's widely underused: it gives you a full site audit and complete backlink profile for any domain you verify, which replaces the need for a paid backlink tool at most scales. [1]
What is included in a technical SEO audit?
A technical SEO audit covers everything that affects how efficiently search engines can crawl and index your site: robots.txt configuration, XML sitemap health, canonical tags, HTTPS enforcement, redirect chains, JavaScript rendering (for both Googlebot and Bingbot), structured data validity, internal linking, orphan pages, Core Web Vitals, and mobile usability. In 2026, Bing-specific crawl checks are a required part of any complete technical audit — Bing's index powers ChatGPT Search, and its JavaScript rendering limitations are meaningfully more restrictive than Googlebot's. [3]
How do I audit my site for AI search visibility?
An AI visibility audit has five parts: (1) check Bing index coverage for your target pages in Bing Webmaster Tools — any page not in Bing's index can't be cited by ChatGPT Search, full stop; (2) confirm Bingbot, OAI-SearchBot, and GPTBot aren't blocked in your robots.txt or WAF rules; (3) review content structure — first paragraph under each H2 should directly answer the heading in 40–70 words; (4) validate FAQPage and Article schema in Google Rich Results Test; (5) set up a GA4 custom channel group capturing chatgpt.com, openai.com, and perplexity.ai sessions. Then run your five most important queries manually in ChatGPT Search and Perplexity to see your actual citation status. [4][8]
Sources & References
📚 Research, Data & Documentation Referenced in This Guide
- Ahrefs — Data Study: How Much of the Web is Invisible to Search Engines (2024 Update)
Ahrefs' large-scale analysis of over one billion URLs from the Common Crawl dataset, updated in 2024, confirming that 96.55% of pages in the study receive zero organic traffic from Google. Core data informing audit prioritisation throughout this guide.
ahrefs.com/blog/search-traffic-study/
- Semrush — AI Overviews & Organic CTR Impact Study (2025)
Semrush research on the impact of Google AI Overviews on organic click-through rates, supplemented by Seer Interactive's September 2025 data showing CTR decline for queries where AI Overviews appear. Referenced in the audit scope section.
semrush.com/blog/ - Microsoft Bing Webmaster Blog — Bing Index & ChatGPT Search Webmaster Guidance (2024–2025)
Microsoft's official developer guidance confirming that ChatGPT Search draws from the Bing organic index, with advice for webmasters on ensuring Bing crawlability and AI search eligibility.
blogs.bing.com/webmaster/ - IndexCraft — Internal SEO Audit Data (October 2024 – March 2026)
Proprietary observational data from structured SEO audits conducted by Rohit Sharma across 47 client websites. Includes technical issue frequency rates, content refresh impact measurements, E-E-A-T improvement outcomes, and AI visibility audit findings. Aggregate statistical findings cited throughout this guide; full data available to clients under NDA.
- Backlinko — Google's 200 Ranking Factors & Featured Snippet Research (2025)
Backlinko's research on Google ranking signals and featured snippet content characteristics, including the 40–50 word average snippet length finding referenced in the on-page section of this guide.
backlinko.com/google-ranking-factors - Google — Search Quality Evaluator Guidelines (December 2022, most recent public edition as of 2025)
Google's official guidelines used by human quality raters to evaluate search result quality, defining E-E-A-T as the primary page quality determinant. Referenced throughout the E-E-A-T audit section.
developers.google.com/search/blog/2022/12/google-raters-guidelines-e-e-a-t - OpenAI — GPTBot & OAI-SearchBot Crawler Documentation (2024–2025)
OpenAI's official documentation distinguishing between GPTBot (training data crawler) and OAI-SearchBot (real-time ChatGPT Search retrieval crawler), with recommended robots.txt configuration for webmasters.
platform.openai.com/docs/gptbot - BrightEdge — AI Search Behaviour Report 2025
BrightEdge enterprise SEO research on AI search citation patterns, content structure signals associated with AI citation selection, and Browse trigger rates by query category across ChatGPT Search and other platforms. Referenced in the AI visibility audit section.
brightedge.com/resources - Backlinko — Organic CTR Research: Google Click-Through Rate Statistics (2025 Update)
Backlinko's analysis of Google SERP click-through rates including the anchor text naturalness findings referenced in the backlink audit section.
backlinko.com/google-ctr-stats - Google — Core Web Vitals Technical Documentation & Chrome UX Report (2025)
Google's official documentation on Core Web Vitals thresholds (LCP, INP, CLS), measurement methodology, and Chrome User Experience Report (CrUX) data showing pass rates across web origins. Referenced in the page experience audit section.
developers.google.com/search/docs/appearance/core-web-vitals - HubSpot — State of Marketing Report 2025
HubSpot's annual marketing benchmark report including analytics and attribution accuracy findings, referenced in the analytics audit section. The 23% attribution challenge finding is from this report's measurement and analytics section.
hubspot.com/marketing-statistics - Digiday — State of AI Referral Traffic (December 2025)
Analysis of AI referral traffic trends using Conductor and Similarweb data. Key statistics cited in this guide: ChatGPT accounts for 87.4% of all AI referral traffic across major industries; since June 2025, OpenAI added utm_source=chatgpt.com to citation links in the web interface.
digiday.com/media/in-graphic-detail-the-state-of-ai-referral-traffic-in-2025/
- Deep-dive into crawl architecture, JavaScript SEO, Core Web Vitals implementation, log file analysis, and advanced technical SEO strategies beyond the audit triage level. Read technical SEO guide →
- Platform-exclusive deep-dive into ChatGPT Search's Bing-powered architecture, Browse tool mechanics, footnote citation format, and the content and technical signals that earn ChatGPT citations. Read ChatGPT SEO guide →
- Platform-exclusive guide covering AI Mode's Gemini architecture, full-page search experience, and the content and technical signals specific to Google AI Mode citation. Read Google AI Mode guide →
- The complete trust and authority signal framework for both traditional Google rankings and AI citation selection — named authorship, entity establishment, digital PR, and the credibility signals AI systems use to prefer your content. Read E-E-A-T guide →