The TL;DR

A passing automated accessibility scan confirms only that a specific subset of WCAG issues is absent. It does not verify real usability, full conformance, or ADA compliance.

This question comes up in almost every audit conversation we have. An agency has run axe DevTools or Siteimprove or a similar tool. The scan came back clean — or close to clean. Zero violations, or a handful of minor ones. And now the IT director or the web manager wants to know: does that mean we are done?

The answer is no. A passing automated scan score does not mean a website is accessible. It does not mean a website is WCAG 2.1 AA conformant. And it absolutely does not mean an agency has a defensible ADA Title II compliance program.

The gap between a clean automated scan and actual WCAG conformance is significant — approximately 60 to 70 percent of real-world accessibility failures are invisible to automated tools. The gap between WCAG conformance and a defensible ADA compliance program is even larger — conformance is a technical state that automated tools partially measure; a compliance program is a governance structure with documentation, monitoring, remediation tracking, and executive accountability that automated tools measure not at all.

Understanding these gaps — what automated tools catch, what they miss, and what they cannot even see — is essential for any agency that is trying to make informed decisions about its compliance posture. A false sense of security based on a passing scan score is not just a compliance risk. It is the specific kind of compliance risk that produces the largest surprises when an enforcement inquiry opens.

This guide explains exactly where automated accessibility testing ends and where the compliance obligation continues.

 

What Automated Accessibility Scans Actually Measure

Automated accessibility scanners — Deque's axe, WAVE, Siteimprove, Google Lighthouse, and others — test for WCAG success criteria that can be evaluated programmatically. This means criteria where conformance or failure can be determined by examining the page's code, structure, and computed properties without any human judgment about meaning, context, or user experience.

The criteria that automated tools can reliably detect:

Color contrast ratios. Tools can calculate the contrast ratio between a text element's color and its background color and compare it to the WCAG 2.1 AA threshold (4.5:1 for normal text, 3:1 for large text). This is entirely mathematical: the tool finds a ratio that is either above or below the threshold. (A worked sketch of the calculation follows this list.)

Missing alt text attributes. Tools can detect when an <img> element has no alt attribute, or when the alt attribute is empty on an image that does not appear to be decorative based on its context. This is a code check.

Missing form label associations. Tools can check whether every <input>, <select>, and <textarea> element has an associated <label> element with a matching for/id pair, or an aria-label, or an aria-labelledby reference. This is a code check.

Page language. Tools can check whether the <html> element has a lang attribute specifying the page language. This is a code check.

Heading hierarchy. Tools can detect when heading levels are skipped — an H1 followed immediately by an H3 with no H2 — by examining the heading elements present in the DOM.

Duplicate IDs. Tools can detect when multiple elements on the page share the same id attribute value, which can cause ARIA attribute references to behave unpredictably.

ARIA attribute validity. Tools can check whether ARIA attributes have valid values and whether required attributes for given ARIA roles are present. A role="combobox" without a required aria-expanded attribute is flagged.

Link text descriptiveness. Tools can flag links whose text is "click here," "read more," or other generic phrases that do not describe the link destination.

Button name. Tools can detect when a <button> element has no accessible name — no text content, no aria-label, no aria-labelledby reference.

Frame titles. Tools can check whether <iframe> elements have a title attribute.
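
Each of these checks reduces to arithmetic or a DOM query. As a worked example, here is the color contrast calculation from the first item above, sketched in TypeScript. The formula follows the WCAG 2.1 definition of relative luminance; the function names and sample colors are illustrative only.

```typescript
// Relative luminance and contrast ratio as WCAG 2.1 defines them.
function srgbChannel(value: number): number {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(r: number, g: number, b: number): number {
  return 0.2126 * srgbChannel(r) + 0.7152 * srgbChannel(g) + 0.0722 * srgbChannel(b);
}

function contrastRatio(fg: [number, number, number], bg: [number, number, number]): number {
  const l1 = relativeLuminance(...fg);
  const l2 = relativeLuminance(...bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// #767676 text on a white background comes out at roughly 4.54:1, which clears
// the 4.5:1 AA threshold for normal-size text. The tool reports pass or fail;
// it has no opinion on whether the text is readable in practice.
console.log(contrastRatio([0x76, 0x76, 0x76], [0xff, 0xff, 0xff]).toFixed(2));
```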

These checks are valuable. Catching these failures reliably across a large site is something only automated tools can do efficiently. Automated scanning should absolutely be part of every accessibility program.

The problem is not with automated scanning. The problem is with what happens when a passing automated scan score is treated as a complete accessibility evaluation rather than as a first pass that catches a specific, limited category of failures.

 

What Accessibility Scanners Miss (Critical Failures)

The following categories of accessibility failures are either invisible to automated tools or require human judgment that automated tools cannot apply. These are the failures that determine whether a website is actually usable by residents with disabilities — and they are the failures that generate the most complaints and the most enforcement attention.

1. Keyboard Navigation Failures

Automated tools can check whether certain elements are in the tab order and whether they have visible focus indicators. They cannot navigate through a page using keyboard commands, experience the tab sequence from a user's perspective, or determine whether the tab order is logical and usable.

What tools miss:

Keyboard traps — points in the page where keyboard focus enters an interactive component and cannot exit. An automated tool might scan a date picker and confirm that it has appropriate ARIA roles. It cannot discover that once focus enters the date picker's calendar, pressing Escape does not close it and Tab does not exit it — leaving the user unable to move forward without a mouse.

Tab order that is technically present but practically disorienting. An automated tool sees that all interactive elements are in the tab order. It cannot perceive that the tab sequence jumps from a field in the left column to a button in the footer to a field in the right column — a sequence that is technically valid but completely disorients keyboard users.

Dropdown navigation that cannot be operated by keyboard. An automated tool can flag a navigation menu that has no keyboard interaction implemented. It cannot detect a navigation menu that has partial keyboard interaction — some submenus open with arrow keys, others require mouse interaction — because the partial implementation passes the automated check while failing real users.

The real-world consequence: An agency whose permit application form has a keyboard trap in the file upload component will pass an automated scan with flying colors. Every form field has a label. The upload button has an accessible name. The error messages reference the correct field IDs. None of this tells the scan that a keyboard user cannot actually complete the application because they cannot exit the file browser dialog.
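
For readers who want to see what a keyboard trap looks like in code, here is a sketch of focus handling in a hypothetical upload dialog. Every element in it can carry valid roles and labels, so an automated scan reports nothing; the failure only appears when someone tries to Tab and Escape their way out.

```typescript
// Sketch of focus handling that creates a keyboard trap in a hypothetical
// file-upload dialog. The static markup can pass every automated check.
const dialog = document.querySelector<HTMLElement>('.upload-dialog'); // hypothetical selector

dialog?.addEventListener('keydown', (event) => {
  if (!dialog || event.key !== 'Tab') return; // note: no Escape handling at all

  const focusable = dialog.querySelectorAll<HTMLElement>('button, [href], input, select');
  const first = focusable[0];
  const last = focusable[focusable.length - 1];

  // Tab and Shift+Tab are forced to wrap inside the dialog...
  if (event.shiftKey && document.activeElement === first) {
    event.preventDefault();
    last.focus();
  } else if (!event.shiftKey && document.activeElement === last) {
    event.preventDefault();
    first.focus();
  }
  // ...and because there is no Escape branch and no close control in the tab
  // order, keyboard focus can never leave once it enters.
});
```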

2. Screen Reader Compatibility

Automated tools analyze the accessibility tree — the structured representation of the page that screen readers use. They cannot simulate how a screen reader actually reads and navigates a page. The difference is significant.

What tools miss:

ARIA implementations that are technically valid but produce confusing or meaningless announcements. An automated tool can confirm that a live region has a valid aria-live value. It cannot determine that the live region is announcing content changes at the wrong moment, in a confusing order, or in a way that interrupts the screen reader user's current reading position disruptively.

Form error handling that announces errors in a technically compliant way but at the wrong point in the user's navigation flow. A tool can confirm that error messages are associated with their fields via aria-describedby. It cannot simulate a user submitting a form, the errors appearing, the user trying to navigate back to the first errored field, and the error message not being read in the expected order because the focus management moved to the error summary before the user was ready.

Custom component announcements that are syntactically correct but semantically wrong. An automated tool can confirm that a custom accordion has role="button" and aria-expanded attributes. It cannot determine that the text of the button announcement — "Section 3 button collapsed" — is confusing because the button text is just the section title and the "button collapsed" portion is added by screen readers interpreting the role and state, producing an announcement that sounds grammatically wrong to the user navigating by headings.

Interactive state changes that are not announced. An automated tool scans the page at a point in time. It cannot trigger interactions, observe state changes, and verify whether those changes are announced to screen readers. A tab interface where clicking a tab loads new content — but the content change fires no announcement to the screen reader — will pass an automated scan because the structure of the tabs is correct. The failure only manifests when a screen reader user activates a tab and waits for content that they cannot perceive has changed.
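
A sketch of that last failure mode, using a hypothetical tab widget: both versions below sit behind identical static markup and identical scan results, and only a manual screen reader test distinguishes them.

```typescript
// Both handlers assume markup with valid role="tab" / role="tabpanel"
// attributes, which is all an automated scan evaluates.

// What the scan cannot flag: content swaps silently, with no state change
// and no focus management, so a screen reader user perceives nothing.
function activateTabSilently(panel: HTMLElement, html: string): void {
  panel.innerHTML = html;
}

// What a manual test confirms: selection state is updated and the reading
// position moves to the new content (one common approach among several).
function activateTab(tabs: HTMLElement[], selected: HTMLElement, panel: HTMLElement, html: string): void {
  tabs.forEach(tab => tab.setAttribute('aria-selected', String(tab === selected)));
  panel.innerHTML = html;
  panel.setAttribute('tabindex', '-1');
  panel.focus();
}
```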

3. Cognitive Accessibility Issues Scanners Ignore

WCAG 2.1 AA includes success criteria addressing cognitive accessibility — consistent navigation, error prevention, accessible authentication — but many cognitive accessibility barriers are not detectable by automated tools because they require human judgment about comprehension, clarity, and cognitive load.

What tools miss:

Reading level and plain language. Automated tools cannot evaluate whether the text on a page is comprehensible to a user with a cognitive disability, a user with limited literacy, or a user who is under stress. A permit application with instructions written at a college reading level with complex sentence structures and regulatory jargon passes every automated accessibility check because the text is technically formatted correctly.

Form instructions that are technically present but practically inadequate. A tool can confirm that a phone number field has a label and that format hints are present as aria-describedby text. It cannot determine that the format hint — "Please enter your phone number in the required format" — tells the user nothing about what the required format is.

Error messages that identify the problem but do not explain the solution. A tool can confirm that error messages are associated with their fields and announced by screen readers. It cannot evaluate whether "Invalid input" provides sufficient guidance for a user with a cognitive disability to understand what they entered incorrectly and what to do instead.

Complex multi-step form flows where the relationship between steps is not clearly communicated. A tool can scan each step of a multi-step form and confirm that the elements on each step are correctly labeled and keyboard accessible. It cannot evaluate whether the user understands that they are on step 3 of 7, what information is still needed, or how to return to a previous step to correct an entry.

4. Alternative Text Quality

Automated tools can detect the presence or absence of alt text. They cannot evaluate whether the alt text is meaningful, accurate, or equivalent to the information the image conveys.

What tools miss:

Filename-based alt text that passes the presence check. An image with alt="img-2024-0315-council.jpg" passes the automated check: the alt attribute is present and has content. But the alt text is a filename, not a description. A screen reader user hears "graphic img-2024-0315-council.jpg" and receives no useful information about the image. (A heuristic for catching this specific pattern appears after this list.)

Alt text that describes the visual appearance rather than the information conveyed. An image of a bar chart with alt="Bar chart with blue and orange bars" passes the automated check. But it communicates nothing about the data the chart shows. The automation cannot evaluate whether the alt text is equivalent to the visual content.

Decorative images with incorrect alt text. An automated tool flags images with empty alt attributes as potentially missing alt text — because it cannot determine definitively whether the image is decorative. A human tester evaluating the page in context can determine that the background texture image is decorative and the empty alt is correct, or that the data visualization is informational and the empty alt is wrong. The automated tool cannot make that determination.

AI-generated alt text that is plausible but incorrect for the specific content. Overlay products and some CMS plugins auto-generate alt text using image recognition. "Bar graph showing colored bars of different heights" may be what the image recognition model sees. It is not accessible alt text for a budget allocation chart where the specific data values and categories are what a blind user needs.
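
The crudest of these quality failures can be caught with a heuristic, which is worth running but only narrows the problem. The sketch below (any DOM environment; the regular expression is illustrative) flags filename-style alt text; whether the remaining alt text is actually equivalent to the image's information still requires a human reviewer.

```typescript
// Flags alt text that looks like a filename or camera export name. Passing
// this heuristic says nothing about whether the alt text is meaningful.
const filenameLike = /\.(jpe?g|png|gif|webp|svg)$|^(img|dsc|image|photo)[-_]?\d/i;

const flagged = Array.from(document.querySelectorAll<HTMLImageElement>('img[alt]'))
  .filter(img => filenameLike.test(img.alt.trim()));

flagged.forEach(img => console.warn('Filename-like alt text:', img.alt, img.currentSrc || img.src));
```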

5. Document and PDF Accessibility

With limited exceptions, automated website accessibility scanners do not evaluate the accessibility of PDF documents linked from the website. The link to the document may be correctly labeled — the automated tool confirms that the link text is descriptive — but the document itself, which is where the actual accessibility failure lives, is not evaluated.

A government website can pass an automated accessibility scan with a green score while linking to hundreds of inaccessible scanned PDFs. The scan sees the links. It does not open the documents.

6. Third-Party Embedded Tools

Automated scanners typically do not evaluate content inside <iframe> elements from different origins — the cross-origin security restrictions prevent the scanner from accessing the iframe's DOM. A payment portal embedded in an iframe on the agency website is invisible to the scan that runs on the agency website. The scanner sees a correctly titled iframe. It does not see the inaccessible form inside it.
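
The limitation comes from the browser's same-origin policy rather than from any particular tool. A scanner running in the page context cannot reach a cross-origin document at all, as this sketch shows (the selector is hypothetical):

```typescript
// For a cross-origin <iframe>, contentDocument is null and touching
// contentWindow.document throws a SecurityError, so an in-page scanner has
// nothing to inspect beyond the iframe element itself.
const frame = document.querySelector<HTMLIFrameElement>('iframe#payment-portal'); // hypothetical id

if (frame && frame.contentDocument === null) {
  console.log('Cross-origin embed: the form inside is invisible to this scan.');
}
```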

This is one of the most consequential limitations of automated testing for government websites, where third-party embedded tools — payment portals, permit systems, scheduling tools — are often the most critical service delivery surfaces and the most inaccessible components.

7. Mobile Accessibility

Most automated accessibility scans are conducted on desktop browser configurations. Mobile-specific accessibility failures — touch target sizes, gesture navigation, viewport scaling behavior, mobile screen reader compatibility — are not evaluated.

WCAG 2.1 includes several success criteria specifically relevant to mobile and touch interfaces. A website that passes automated scanning on desktop may have significant mobile accessibility failures that only manifest on touch devices.
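
Viewport scaling is the one item in that list that is easy to spot-check without a device: a viewport meta tag that disables pinch-to-zoom prevents low-vision users from magnifying the page. Here is a sketch, assuming it runs in the page context; touch target sizes, gestures, and VoiceOver or TalkBack behavior still require testing on real hardware.

```typescript
// Flags a viewport meta tag that blocks or caps zooming, a common mobile
// accessibility failure.
const viewport = document.querySelector<HTMLMetaElement>('meta[name="viewport"]');
const content = viewport?.getAttribute('content') ?? '';

if (/user-scalable\s*=\s*(no|0)/i.test(content) || /maximum-scale\s*=\s*1(\.0+)?\s*(,|$)/i.test(content)) {
  console.warn('Viewport meta tag restricts zooming:', content);
}
```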

 

The Detection Rate: What the Research Actually Shows

Independent research on the detection rate of automated accessibility tools has produced consistent findings over many years.

The WebAIM Million project — an annual analysis of the accessibility of the top one million websites — uses automated scanning to identify WCAG failures. The project's methodology consistently notes that automated testing detects only a fraction of all real-world accessibility failures. The specific figure varies by study and methodology, but the research consensus places automated detection at approximately 30 to 40 percent of WCAG 2.1 AA success criteria violations.

A 2020 study by Deque Systems — the company behind the axe tool — found that automated testing detects approximately 32 percent of accessibility issues. A 2022 study by researchers at Carnegie Mellon University estimated that automated tools detect between 25 and 37 percent of WCAG failures depending on the tool and the page type.

The most cited figure in professional accessibility practice is that automated tools catch approximately one-third of WCAG failures. The other two-thirds require manual testing — keyboard navigation testing, screen reader testing, and human evaluation of content quality and cognitive accessibility.

This does not mean automated testing is not valuable. Catching one-third of failures reliably at scale is a meaningful contribution to an accessibility program. It means that automated testing cannot be the whole program. The agency that relies exclusively on automated scanning is missing the majority of its real accessibility failures.

 

What a Passing Automated Scan Score Actually Means

A passing automated scan score means your website does not have any of the specific, programmatically detectable WCAG failures that the tool checks for. It means:

Your text contrast ratios are above threshold — at least where the tool was able to evaluate them.

Your images have alt attributes — though the quality of those attributes is not evaluated.

Your form fields have label associations — though whether those labels are announced correctly by screen readers is not evaluated.

Your page has a language attribute — which is meaningful for screen reader pronunciation.

Your heading structure has no skipped levels — though whether the headings actually describe the page's content structure is not evaluated.

A passing scan score means your site has passed a first-pass automated check of a specific subset of WCAG criteria. It does not mean:

Your site can be navigated using only a keyboard from any starting point to any transaction completion.

Your site's interactive components are compatible with screen readers across different screen reader and browser combinations.

Your site's error handling is accessible to screen reader users.

Your PDFs are accessible.

Your third-party embedded tools are accessible.

Your cognitive accessibility is adequate.

Your mobile accessibility is adequate.

And it definitely does not mean your agency has an ADA compliance program — because a compliance program is a governance structure, not a technical state.

 

What a DOJ Reviewer Evaluates vs. What an Automated Scan Evaluates

When an ADA Title II enforcement inquiry opens, the DOJ is evaluating whether the agency has a defensible compliance program. The evaluation criteria look nothing like an automated scan.

What an automated scan evaluates: The current technical state of specific, programmatically testable WCAG criteria on the pages scanned, at the moment of the scan.

What a DOJ reviewer evaluates:

Does the agency have a current baseline audit documenting where WCAG failures exist across the full digital environment — including PDFs, embedded tools, and transactional workflows?

Does the agency have a remediation log that shows what failures have been identified, when, and what has been done about them?

Does the agency have a risk-based prioritization framework that shows the highest-impact failures were addressed first?

Does the agency have an accessibility statement that honestly describes its current conformance status and provides a working complaint contact?

Does the agency have documentation that residents who reported accessibility barriers received responses?

Does the agency have training records showing that staff who create and publish content have been trained on accessibility?

Does the agency have monitoring records showing that it has been evaluating its compliance posture on an ongoing basis — not just at a point in time?

Does the agency have vendor documentation — VPATs, testing records — showing that third-party tools have been evaluated for accessibility?

Does the agency have executive reporting records showing that leadership is informed of and accountable for accessibility compliance?

Not one of these questions is answered by a passing automated scan score. Not one. The scan tells you about the current technical state of a subset of WCAG criteria on the pages you scanned. It tells you nothing about documentation, governance, vendor management, training, remediation history, or the dozens of other elements that constitute a defensible compliance program.

 

The Compliance Program vs. The Scan Score: A Direct Comparison

To make this concrete, here is what two agencies look like from a compliance standpoint — one that has a clean scan score with no compliance program, and one that has failing scan results but a functioning compliance program.

Agency A: Clean Scan, No Program

Agency A's web manager runs axe DevTools monthly. The results are consistently clean — a handful of minor warnings, no critical violations. The agency does not have a baseline audit. It does not have a remediation log. It has never tested keyboard navigation or screen reader compatibility on any of its transactional workflows. Its PDF library contains 1,400 documents, none of which have been evaluated for accessibility. Its billing portal and permit system are third-party tools whose VPATs have never been requested. It does not have an accessibility statement. Its web manager has never received accessibility training.

If an enforcement inquiry opens, Agency A can show clean scan results. It cannot show anything else that the DOJ evaluates. Its permit portal, which is completely keyboard inaccessible, has never been tested. Its PDFs are entirely inaccessible. It has no complaint contact, no training records, no remediation history, no monitoring records beyond the scan reports.

Agency A has a passing scan score and a completely indefensible compliance posture.

Agency B: Failing Scans, Real Program

Agency B has been building a compliance program for 18 months. Its automated scans still show violations — color contrast failures in some legacy content, a few heading hierarchy issues in older pages, some alt text gaps in the media library. Its scan score is not clean. But:

Agency B conducted a baseline audit 16 months ago that documented all identified failures across the website, transactional workflows, document library, and embedded vendor tools.

Agency B has a remediation log with 87 entries showing issues identified, actions taken, and validation dates. Its permit application workflow — identified in the baseline audit as having a keyboard trap — was remediated 14 months ago and validated with screen reader testing.

Agency B's billing portal vendor was contacted six months ago with a formal list of identified accessibility gaps. The vendor has committed to a remediation timeline. An accessible phone payment alternative is documented on the billing portal page.

Agency B has an accessibility statement with a working complaint contact. It has received four accessibility complaints and responded to all four within 10 business days with documented resolutions.

Agency B's web team and department coordinators received accessibility training eight months ago with completion records documented.

Agency B sends quarterly accessibility status reports to the city manager's office.

Agency B has failing automated scan results. It also has a documented, ongoing, defensible compliance program.

If an enforcement inquiry opens, Agency A produces scan reports. Agency B produces audit documentation, remediation logs, vendor correspondence, complaint records, training records, and executive reports.

Which agency is in a better compliance position?

 

The Right Role for Automated Scanning in a Compliance Program

Automated scanning is valuable. It just needs to be positioned correctly within a compliance program rather than treated as the compliance program.

The right role for automated scanning:

Monthly monitoring. Running automated scans on a monthly schedule across primary website templates and high-traffic pages provides continuous monitoring for regressions: new failures introduced by content updates, CMS changes, or template modifications. This is the monitoring function that keeps issues from accumulating undetected. (A minimal sketch of a scheduled scan appears after this list.)

Scope prioritization for manual testing. Automated scan results can help prioritize which pages and templates to focus manual testing efforts on. Pages with high automated violation counts may have underlying structural issues that manual testing can investigate more deeply.

Regression detection after changes. Running an automated scan before and after a significant website change — a CMS update, a template modification, a new component deployment — provides a quick check for regressions that should prompt more thorough manual testing if new violations appear.

Baseline documentation. Automated scan results from a specific date provide documented evidence that the agency assessed its compliance posture on that date. This is meaningful documentation even when it shows failures — because the agency that assessed and found failures is in a better position than the agency that never assessed.
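
Here is a minimal sketch of the monthly monitoring item, assuming a Playwright test runner with the @axe-core/playwright package; the domain, page list, and file name are placeholders an agency would replace with its own templates and high-traffic pages.

```typescript
// monthly-scan.spec.ts: a sketch of scheduled automated scanning. Run it from
// CI on a monthly schedule and archive the output as monitoring evidence.
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

const pages = ['/', '/permits/apply', '/pay-my-bill']; // placeholder high-traffic pages

for (const path of pages) {
  test(`no detectable WCAG 2.1 A/AA violations on ${path}`, async ({ page }) => {
    await page.goto(`https://www.example.gov${path}`); // placeholder domain

    const results = await new AxeBuilder({ page })
      .withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'])
      .analyze();

    // A clean run means only that this subset of checks found nothing; log the
    // result either way so the remediation log has a dated record.
    console.log(JSON.stringify({ path, violations: results.violations.length }));
    expect(results.violations).toEqual([]);
  });
}
```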

What automated scanning is not: a substitute for manual keyboard navigation testing, screen reader testing, document accessibility evaluation, vendor tool testing, or any of the governance elements that constitute a compliance program.

 

Building a Complete Testing Program

The complete accessibility testing program for a government agency uses automated scanning as one component of a multi-method approach.

Automated monthly scans — axe DevTools, WAVE, or Siteimprove running monthly across primary templates and high-traffic pages. Results documented and reviewed. New violations logged in the remediation log.

Quarterly manual keyboard navigation testing — a structured protocol testing all transactional workflows end to end using keyboard only. Results documented. Failures logged.

Quarterly manual screen reader testing — NVDA on Chrome for primary transactional workflows. Results documented. Failures logged.

Annual document accessibility evaluation — a sample of the document library evaluated against the priority matrix. Scanned documents identified. Untagged exports identified. Priority remediation queue updated.

Annual vendor tool testing — keyboard navigation and screen reader testing of each embedded third-party tool. VPAT review for any tools with updated versions.

Event-triggered testing — keyboard and automated testing following any significant website change, new component deployment, or vendor platform update.

This is the monitoring and testing infrastructure of a compliance program. Automated scanning is one line in the schedule — monthly, important, documented, and insufficient on its own.

 

Related: 

How to Make a PDF Accessible

ADA Compliance Checklist

Accessibility Remediation Log

WCAG 2.1 AA Explained

How to Train Your Government Staff on Accessibility

 

FAQ: Automated Accessibility Scanning and ADA Compliance

Does passing an automated accessibility scan mean a website is WCAG 2.1 AA compliant? No. Automated accessibility tools detect approximately 30 to 40 percent of WCAG 2.1 AA failures — the failures that can be identified by examining page code and computed properties without human judgment. The majority of real-world accessibility failures — keyboard navigation failures, screen reader incompatibilities, cognitive accessibility issues, document accessibility failures, and third-party embedded tool failures — require manual testing to identify and are invisible to automated scanners. A passing automated scan means a website has passed a first-pass check of a specific subset of WCAG criteria. It does not mean the website is accessible to users with disabilities who rely on assistive technology.

What percentage of accessibility failures do automated tools actually catch? Research consistently places automated detection at approximately 30 to 40 percent of WCAG 2.1 AA success criteria violations. Studies by Deque Systems (the company behind the axe tool), Carnegie Mellon University, and WebAIM have produced estimates in this range across different tool configurations and page types. The remaining 60 to 70 percent of failures require manual testing — keyboard navigation testing, screen reader testing with NVDA and VoiceOver, and human evaluation of content quality and cognitive accessibility.

What are the most important accessibility failures that automated tools cannot detect? The most consequential failures invisible to automated tools are: keyboard navigation failures including keyboard traps and illogical tab order in transactional workflows; screen reader incompatibilities in interactive components including inaccessible dynamic content updates, incorrect ARIA state announcements, and broken focus management; alt text quality failures where alt attributes are present but contain filenames, auto-generated descriptions, or visual descriptions rather than meaningful content equivalents; PDF and document accessibility failures (most scanners do not evaluate linked documents); and third-party embedded tool failures (most scanners cannot access cross-origin iframe content).

What does a DOJ reviewer evaluate that automated scanning does not measure? A DOJ enforcement review evaluates whether an agency has a defensible, documented compliance program — not whether its automated scan score is clean. The review looks at: a current baseline audit documenting failures across the full digital environment; a remediation log showing identified issues and actions taken; an accessibility statement with a working complaint contact; complaint intake records and response documentation; staff training records; vendor accessibility documentation including VPATs; monitoring records showing ongoing evaluation; and executive reporting records. None of these elements are produced by or evaluated by an automated accessibility scan.

Should government agencies stop using automated accessibility scanning? No. Automated scanning is a valuable component of a compliance program — it catches a reliable subset of failures efficiently at scale and provides continuous monitoring for regressions. It should be run monthly across primary templates and documented carefully. The problem is not with automated scanning — it is with treating automated scanning as sufficient for compliance purposes. Automated scanning should be positioned as one component of a multi-method testing program that also includes quarterly manual keyboard testing, quarterly screen reader testing, annual document evaluation, and annual vendor tool testing. The compliance program is the combination of testing, remediation, documentation, monitoring, and governance — not any single testing method.

What should an agency do if its automated scan results are clean but it has not done manual testing? A clean automated scan with no manual testing is an incomplete picture of the site's accessibility status. The next steps are: conduct manual keyboard navigation testing on the five highest-traffic pages and all primary transactional workflows; conduct screen reader testing with NVDA on Chrome on the same surfaces; evaluate a sample of the document library for PDF accessibility; test embedded third-party tools for keyboard navigation; and review vendor VPATs for all embedded tools. The results of these tests — likely revealing failures that the clean scan did not detect — should be documented in a remediation log and addressed in priority order starting with transactional workflow failures. A compliance program built from this point forward will be defensible in ways that a scan-score-only program is not.
