In January I've removed tens of thousands of web pages on www.debian.org. Have you noticed it?

In the past

From 1997 onwards, we had web pages for security announcements. We had to manually prepare a .data and a .wml file which then generated a web page for each security announcement (DSA or DLA). We have listed the 6 most recent messages in a short list that was created from these files. Most of the work that went into the Debian web pages was creating these files.

Our search engine often listed the pages with security announcements instead of a more relevant web page for a particular topic.

Preparation

At DebConf Kosovo (2022) I started with a proof of concept and wrote a script, that generates this list without using the .data/.wml files in the Git repository, but instead reading the primary sources of security information[1]. This new list now includes links to the security tracker and the email of the announcement.

Following web pages and scripts were also using these .data and .wml files:

  • OVAL files
  • RSS feeds for security announcements (and LTS)
  • Apache config file for mapping URLs from dsa-NNN to YEAR/dsa-NNN
  • A huge list of crossreferences between DSA and CVE numbers

Before I could remove all the security web pages, I had to adjust the scripts, that create the above information.

When I looked at the OVAL files and the apache logs of our web server, I saw that more than 99% of the web traffic was generated by these XML files (134TB of 135TB total in two weeks). They were not compressed and were around 50MB in size. With the help of Carsten Schönert we managed to modify the python scripts that generate this OVAL file without using the .data/.wml files and now we only provide bzip2 compressed XML files[2].

The RSS feeds are created by the new Perl script which reads the DSA/DLA list the security tracker and determines the URL of the email of all entries. This script also generates the list of the most recent DSA/DLA entries. Currently we show the last 350 entries which covers more than the last year and includes links to the announcement email and the security tracker.

The huge list of crossreferences is not needed any more, since the mapping of CVE to DSA is already included in the DSA list[3] of the security tracker.

The amount of translations of the DSA/DLA was very different. French translations were almost all done, but all other languages did translations for a couple of months or years only. E.g. in 2022, Italian had 2 translations, Russian 15, Danish 212, French and English each 279. But from 2023 on only French translations were made. By generating the list of DSA/DLA we lost the ability to translate these web pages, but since these announcements are made of simple, identical sentences it is easy to use an automatic translation service if needed.

Now the translation statistics of all web pages are more accurate. Instead of 12200 pages that need to be translated (including all these old DSA/DLA) there are now only 2500 pages to translate[4]. Languages that had a lot of old translations of DSA/DLA lost some percentage but languages that are doing translations of newer web pages won in the statistics of how many pages are translated. Examples:

Before

German (de)   3501  28.5%
Italian (it)  1005   8.2%
Danish (da)   6336  51.7%

After

German (de)   1486  59.0%
Italian (it)   909  36.1%
Danish (da)    982  39.0%

Cleanup of all the security web pages

Finally in January, I could remove all web pages of the security announcements in one git commit[5]. Using several git rm -rf commands this commit removed 54335 files, including around 9650 DSA/DLA data files, 44189 wml files, nearly 500 Makefiles.

Outcome

No more manual work is needed for the security team and we now have direct links from a DSA-NNN/DLA-NNN to the email in our mailing list archive. This was not possible before. The search results became more accurate.

But we still host a lot of other old content on the Debian web pages which may be removed in the future.

[1] https://www.debian.org/security/#infos

[2] https://www.debian.org/security/oval/

[3] https://salsa.debian.org/security-tracker-team/security-tracker/-/raw/master/data/DSA/list

[4] https://www.debian.org/devel/website/stats

[5] https://salsa.debian.org/webmaster-team/webwml/-/commit/2aa73ff15bfc4eb2afd85c