PDFs often inaccessible on most visited websites
However, the accessibility of these documents, which are available in large quantities on Luxembourg's most visited public portals in 2023, is steadily improving.
Friday, April 28, 2023
The prevalence of inaccessible PDF files on public websites is a significant problem for people with disabilities. These accessibility issues can completely prevent access to vital information and hinder the completion of administrative procedures, particularly when forms are concerned. In this article, we look at the accessibility of PDF files on the 17 most frequently visited public websites in Luxembourg.
When a PDF document is digitised as an image or not tagged, a blind or partially sighted user has no access at all to its content (for more details on this subject, see the article "PDF and accessibility, the false good idea").
The study
In April 2023, the SIP analysed a sample of PDF files from the 17 most visited public websites in Luxembourg, according to Google's Top 1 Million:
- adem.lu,
- cita.lu,
- gouvernement.lu,
- govjobs.lu,
- guichet.lu,
- impotsdirects.public.lu,
- inll.lu,
- inondations.lu,
- itm.lu,
- lod.lu,
- luxembourg.lu,
- map.geoportail.lu,
- meteolux.lu,
- mobiliteit.lu,
- petitions.lu,
- portal.education.lu,
- vdl.lu.
The analysis focused on the three most blocking accessibility problems reported in the results: missing tags, scanned documents without optical character recognition, and protection against assistive technologies. There are, of course, many other potential accessibility problems (see the RAPDF PDF accessibility assessment framework for all the criteria to be met for producing accessible PDFs), but tagging is a prerequisite. If it is missing, a PDF is immediately considered not accessible.
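To give an idea of what an automated tagging check can look at: in the PDF specification, a tagged document's catalog carries a `/StructTreeRoot` entry and a `/MarkInfo` dictionary whose `/Marked` flag is true. The sketch below assumes the catalog has already been parsed into a plain Python dictionary by a PDF library of your choice. The function name `looks_tagged` and the sample catalogs are hypothetical, and this is not necessarily how the SIP's tests work.

```python
from typing import Any, Mapping


def looks_tagged(catalog: Mapping[str, Any]) -> bool:
    """Return True if a parsed PDF catalog carries the markers of a tagged PDF.

    Assumption: `catalog` is the document catalog already parsed into a
    plain mapping (with keys such as "/MarkInfo") by a PDF library.
    """
    mark_info = catalog.get("/MarkInfo") or {}
    has_structure_tree = catalog.get("/StructTreeRoot") is not None
    return bool(mark_info.get("/Marked")) and has_structure_tree


# Hypothetical catalogs, for illustration only.
tagged = {"/MarkInfo": {"/Marked": True}, "/StructTreeRoot": {"/K": []}}
untagged = {"/Pages": {}}
print(looks_tagged(tagged), looks_tagged(untagged))
```

A positive result only means that tags are present. It says nothing about their quality, which is why tagging is a prerequisite rather than a guarantee of accessibility.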
The results
General
We analysed 25,398 PDF files, representing a volume of 42 GB and over 471,000 pages. PDF documents represent 95% of the office files downloaded from the sites analysed. The remaining 5% were mainly documents from the Microsoft Office suite.
Of all the PDF documents available for download, 46% are a priori exempt from the accessibility obligation, as they were published before 23 September 2018 (exemption provided for in the law of 28 May 2019). In the remainder of this article, we will only consider PDF documents that are subject to the accessibility obligation, i.e. forms and documents published after 23 September 2018.
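The scoping rule above can be sketched as a simple filter. This is a minimal illustration, assuming that forms stay in scope whatever their date and that a document published on 23 September 2018 itself is not exempt. Both points are our reading of the rule, and the name `is_in_scope` is hypothetical.

```python
from datetime import date

# Cut-off stated in the article: documents published before this date
# are a priori exempt from the accessibility obligation.
EXEMPTION_CUTOFF = date(2018, 9, 23)


def is_in_scope(published: date, is_form: bool) -> bool:
    """Return True if a PDF is subject to the accessibility obligation.

    Assumptions: forms are kept regardless of their date, and a document
    published on the cut-off day itself is treated as not exempt.
    """
    return is_form or published >= EXEMPTION_CUTOFF


print(is_in_scope(date(2017, 1, 1), is_form=False))  # old brochure
print(is_in_scope(date(2017, 1, 1), is_form=True))   # old form
```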
With regard to the accessibility of these documents, we have detected that 59% are untagged. Of these untagged documents, 9% are forms and 16% are scanned documents on which no optical character recognition has been performed.
Looking instead at tagging by document type, around 10% of PDF documents are forms. These documents are particularly important because they support active administrative procedures, yet 52% of them are not tagged.
On a positive note, very few documents are protected against the use of assistive technologies (0.03%).
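One way such protection can be expressed in an encrypted PDF is through the permission flags stored in the `/P` entry of the encryption dictionary: bit 10 (counting from 1, as the PDF specification does) governs text extraction for accessibility purposes. The sketch below assumes the `/P` value has already been read as a signed integer by a PDF library; the function name is hypothetical and the check is illustrative rather than a description of the tests used in the study.

```python
# Bit 10 of the /P permissions value (bits are numbered from 1 in the PDF
# specification) allows text extraction in support of accessibility.
ACCESSIBILITY_EXTRACTION_BIT = 1 << 9


def allows_assistive_technology(permissions: int) -> bool:
    """Return True if the /P value grants extraction for accessibility.

    Assumption: `permissions` is the signed 32-bit integer read from the
    encryption dictionary of a PDF.
    """
    return bool(permissions & ACCESSIBILITY_EXTRACTION_BIT)


print(allows_assistive_technology(-1))     # every permission bit set
print(allows_assistive_technology(-3904))  # bit 10 cleared
```

Python's bitwise AND treats negative integers as two's complement values, so the signed `/P` value can be tested directly without conversion.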
Evolution over time
Based on the last modification date of the files studied, we can identify some interesting trends over the last four years. While the number of documents published per year has been increasing since 2019, the share of untagged PDFs is decreasing (from 64% in 2019 to 53% in 2022).
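A yearly share of this kind can be computed by grouping files by the year of their last modification date. The sketch below shows the arithmetic on made-up records; the function name and the sample values are hypothetical and are not the study's data.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def untagged_share_by_year(files: Iterable[Tuple[date, bool]]) -> Dict[int, float]:
    """Map each modification year to the percentage of untagged files.

    Each item is a (last_modified, is_tagged) pair.
    """
    totals: Dict[int, int] = defaultdict(int)
    untagged: Dict[int, int] = defaultdict(int)
    for last_modified, is_tagged in files:
        totals[last_modified.year] += 1
        if not is_tagged:
            untagged[last_modified.year] += 1
    return {year: 100 * untagged[year] / totals[year] for year in sorted(totals)}


# Made-up records for illustration; these are not the study's data.
sample = [
    (date(2019, 3, 1), False),
    (date(2019, 6, 1), True),
    (date(2022, 1, 15), True),
    (date(2022, 2, 1), True),
    (date(2022, 9, 9), False),
    (date(2022, 11, 30), True),
]
print(untagged_share_by_year(sample))
```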
While the number of untagged PDFs is still far too high, particularly for forms, the general trend is towards a gradual improvement in the accessibility of downloadable documents.
Comparison of the main sites
There are significant differences between the sites in terms of the proportion of tagged PDFs. Considering only the sites that offer more than a hundred PDF files, guichet.lu leads the way with 82% of its PDFs tagged, while meteolux.lu reaches just 3%.
These results must be qualified, however, as our automatic tests do not allow us to determine whether the documents in question are exempt from the obligation to comply with accessibility standards. This is because a document may be issued by a third party and not be under the control of the publishing organisation, or an accessible alternative may be available. These two exceptions are provided for in the law.
Impact of production method
We then wanted to know the origin of tagged and untagged files. Fortunately, the PDF format has "Creator" and "Producer" metadata that can be used to identify the source.
Below are the Top 5 software and hardware products that our tests identified as producing the most tagged and untagged documents:
Top 5 producers of tagged PDFs
- Microsoft Word
- Adobe Acrobat PDFMaker
- Adobe InDesign
- Adobe LiveCycle Designer
- Microsoft PowerPoint
Top 5 producers of untagged PDFs
- Adobe InDesign
- Konica Minolta
- Pscript5.dll (Acrobat Distiller or Ghostscript)
- Microsoft Print to PDF
- Adobe Acrobat
There is still a significant proportion (35%) of files whose origin cannot be identified via their metadata.
The main producers of untagged PDFs are InDesign DTP software, scanners and the PDF printing functionality included in most recent operating systems.
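Grouping files by production method from their "Creator" and "Producer" strings can be sketched with simple keyword matching. The keyword list, the family names and the function `classify_origin` below are hypothetical, and assigning Pscript5.dll to the printing family is our assumption. A real analysis would need a much longer list built from the metadata values actually encountered.

```python
from typing import Optional

# Hypothetical keyword-to-family mapping, checked in order.
FAMILIES = [
    ("indesign", "DTP software"),
    ("konica minolta", "scanner"),
    ("print to pdf", "PDF printing"),
    ("pscript5.dll", "PDF printing"),  # assumption: print-driver output
    ("word", "office suite"),
    ("powerpoint", "office suite"),
]


def classify_origin(creator: Optional[str], producer: Optional[str]) -> str:
    """Guess the production method from the Creator and Producer strings."""
    text = f"{creator or ''} {producer or ''}".lower()
    for keyword, family in FAMILIES:
        if keyword in text:
            return family
    # Missing or unrecognised metadata ends up here.
    return "unidentified"


# Example metadata strings are illustrative, not taken from the study.
print(classify_origin("Adobe InDesign (Windows)", "Adobe PDF Library"))
print(classify_origin(None, "Microsoft: Print To PDF"))
print(classify_origin(None, None))
```

Files with empty or unrecognised metadata fall into the "unidentified" group, which corresponds to the share of files whose origin cannot be determined.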
The prevalence of documents digitised as images on public sites varies. Their number is very low on a site such as guichet.lu (2% of untagged PDFs on this site) but very high on that of the City of Luxembourg (52% of untagged PDFs on the site).
Analysis of accessibility statements
All public websites are required to publish an accessibility statement. This is generally available via an "Accessibility" link in the footer of each page. The organisations in charge of these sites must describe the level of accessibility achieved and any accessibility problems of which they are aware. We wanted to find out whether these organisations are aware of any accessibility problems with the PDF files they publish.
11 of the 17 sites studied have an accessibility statement. Of these:
- 8 invoke an exemption provided for by law for old documents (4) or for documents originating from third parties (7).
- 7 invoke a derogation for disproportionate burden: the work involved in bringing their PDF documents into compliance would be too costly in relation to the estimated benefit for citizens.
- 3 mention PDFs as a non-compliance that will be corrected.
None of these statements makes it possible to identify precisely which PDFs on the site concerned are not accessible.
Most of the organisations responsible for these sites are therefore aware of the problem, but are not necessarily in a position to resolve it in a simple way.
How can PDF documents be made accessible?
As we saw above, the top three producers of untagged PDFs are DTP software, scanners and the PDF printing function. That is why we believe it is important to raise awareness and to train the teams in charge of the production of brochures. If this is outsourced, it should be possible to include accessibility in the request (see our page on specifications).
It would also be appropriate to put in place processes to manage the accessibility of digitised documents: OCR and tagging stages, or provision of an accessible alternative such as the source document before printing and digitisation. Finally, administrations should be made aware that the PDF printing function is best avoided wherever possible in favour of PDF export, which produces tagged documents.
To go further and work on the accessibility of the PDF documents produced, the SIP is making available the RAPDF framework, which sets out all the criteria to be met, and is offering a training course for the public sector entitled "Accessibility of PDF documents in practice". If you are interested, don't hesitate to sign up.