LinkMedic
A Python script for checking links and resources used in local static webpages (.htm, .html). With optional dependencies, it can also work with OpenDocument files (.odt, .odp, .ods), single OpenDocument XML files (.fodt, .fodp, .fods), and user-defined XML files.
linkmedic starts a test web server, requests an entry page from the server, and crawls all local pages. It checks all links within specific HTML tags (by default: <a>, <img>, <script>, <link>, <iframe>, and <event-listener>) and reports any "dead" links found. If a link appears on multiple pages, it is tested only once. By default, links to external websites are ignored. If there is a .linkignore file in the website's root, links matching the regular expressions listed in this file (one pattern per line; see below for examples) are also ignored during testing. After checking all the links, if any dead links are discovered, linkmedic exits with a non-zero status code.
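The collect-once behaviour described above can be illustrated with Python's standard library. This is a minimal sketch and not linkmedic's actual implementation: the names `LinkCollector`, `collect_local_links`, and `LINK_ATTRIBUTES`, as well as the tag-to-attribute mapping, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical mapping of tags to the attribute that carries the link.
LINK_ATTRIBUTES = {
    "a": "href",
    "img": "src",
    "script": "src",
    "link": "href",
    "iframe": "src",
}


class LinkCollector(HTMLParser):
    """Collect link targets from the tags listed in LINK_ATTRIBUTES."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRIBUTES.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def collect_local_links(pages):
    """Return each local link once, in first-seen order, skipping external ones."""
    seen = {}
    for html in pages:
        parser = LinkCollector()
        parser.feed(html)
        for link in parser.links:
            if urlparse(link).netloc:  # a host part means an external link
                continue
            seen.setdefault(link, None)
    return list(seen)


pages = [
    '<a href="/about.html">About</a><img src="/logo.png">',
    '<a href="/about.html">About</a><a href="https://example.com/">Ext</a>',
]
print(collect_local_links(pages))  # ['/about.html', '/logo.png']
```

Although /about.html appears on both sample pages, it is listed only once, and the external link is skipped, mirroring the default behaviour described above.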
For testing links in dynamic HTML content (e.g., using JavaScript template engines) or other document formats, you must first convert your files (using a third-party tool) to static HTML and then run linkmedic.
Quick start
Install prerequisites
Depending on your operating system, you may have multiple options for installing the prerequisites. For a typical installation you will need:
- Python: linkmedic is only tested on officially supported Python versions.
- A Python package installer: For example, pip or pipx
Install linkmedic
You can install linkmedic using your favorite Python package installer. For example, using pipx, you can download it from PyPI:
pipx install linkmedic
Run
To start a test web server with files at /var/www, crawl the pages, and test all the links starting from the /var/www/index.html page, run:
linkmedic --root=/var/www
Distribution Options
As linkmedic is a Python script, it requires a working Python interpreter to be executed. Open-source implementations like CPython and PyPy support multiple operating systems and hardware architectures. Below are the available options for using this package, along with their requirements and details, sorted by size.
Container image
- Installation: Use a container engine like docker or podman to pull the image from Quay or MPCDF GitLab container registry:
podman pull quay.io/meisam/linkmedic:latest
podman pull gitlab-registry.mpcdf.mpg.de/tbz/linkmedic:latest
- Availability: All versions
- Content (latest): Alpine Linux Base OS with system packages, Python packages, and linkmedkit.
- Requirements: Container engine
- Compatibility: Works with open-source engines like podman (which support multiple operating systems such as Linux, macOS, and Windows, and multiple hardware architectures)
- Size (version 0.9.1.dev91+g7752c53): ~32 MB
- Automatic update: Supported via tools like podman-auto-update
- Notes: Original distribution method. Includes bash shell with globstar enabled for CI environments. Uses musl library instead of GNU libc. Access specific versions using tags (e.g., linkmedic:v0.7.4). Available tags: Quay
Mount website files when using containers:
podman run --volume /www/public:/test quay.io/meisam/linkmedic:latest linkmedic --root=/test
The --volume flag maps /www/public to /test inside the container.
Static single binary
- Installation: Download the file with name ending in -static from this page, set execute permissions, and run.
- Availability: Latest version only (UNSTABLE)
- Content: linkmedic scripts, all dependencies, Python interpreter, and required system packages in compressed format
- Requirements: None
- Compatibility: Most Linux distributions (x86-64 only)
- Size (version 0.9.1.dev91+g7752c53): ~17 MB
- Automatic update: Not supported
- Notes: Minimal reproducible method. This distribution method may not suit packages with large data or short execution times.
Dynamically-linked single binary
- Installation: Download the file whose name does not end in -static from this page, set execute permissions, and run.
- Availability: Latest version only (UNSTABLE)
- Content: linkmedic scripts, dependencies, and Python interpreter
- Requirements: glibc and related system packages
- Compatibility: Linux with recent glibc (x86-64 only)
- Size (version 0.9.1.dev91+g7752c53): ~16 MB
- Automatic update: Not supported
Build from source
- Installation: Install directly from Git:
pipx install git+https://gitlab.mpcdf.mpg.de/tbz/linkmedic.git
- Availability: All versions
- Content: Git snapshot with submodules
- Requirements: Git, a Python package manager (for example, pipx), and Python
- Compatibility: Any Python-compatible environment
- Size (version 0.9.1.dev91+g7752c53): ~2.1 MB
- Automatic update: Supported
- Notes: Supports PEP 517 and PEP 621 builders. Tested with pdm-backend, flit-core, and hatchling.
Released Python package
- Installation: Install from PyPI or MPCDF GitLab:
pipx install linkmedic
pipx install linkmedic --index-url https://gitlab.mpcdf.mpg.de/api/v4/projects/5763/packages/pypi/simple
- Availability: All versions
- Content: Pure-Python wheel built by PDM using PDM-backend
- Requirements: A Python package manager (for example, pipx) and Python
- Compatibility: Any Python-compatible environment
- Size (version 0.9.0): ~20 KB
- Automatic update: Supported
Package with Optional Dependencies
To test OpenDocument files (.odt, .odp, .ods) or single OpenDocument XML files (.fodt, .fodp, .fods), install with:
pipx install linkmedic[odf]
User's Guide
CI/CD
You can also use the container image in your CI/CD pipelines. For example, for GitLab CI, in the .gitlab-ci.yml file:
test_internal_links:
  image: quay.io/meisam/linkmedic:latest
  script:
    - linkmedic --root=/var/www/ --entry=index.html --warn-http --with-badge
  after_script:
    - gitlab_badge_sticker.sh
or for Woodpecker CI in the .woodpecker.yml file:
test_internal_links:
  image: quay.io/meisam/linkmedic:latest
  commands:
    - linkmedic --root=/var/www/ --entry=index.html --warn-http
If you want to check the external links of your website in your CI pipeline, you must avoid running multiple tests in a short period of time, e.g., on each commit to the development branches. Otherwise, the IP address of your CI runners may get banned by external web servers. For example, in GitLab CI, you can limit the external link checks to only the default branch of your Git repository:
test_external_links:
  image: quay.io/meisam/linkmedic:latest
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - linkmedic --root=/var/www/ --ignore-local --with-badge
  after_script:
    - gitlab_badge_sticker.sh
  allow_failure: true
Please note that the gitlab_badge_sticker.sh script used in these examples requires an API access token CI_API_TOKEN with maintainer permission to modify the GitLab repository badges. See the linkmedkit documentation for more details.
CLI reference
- Display help: This will show all the command-line options and their default values.
linkmedic -h
- Start the web server with the current directory as the root path of the server. Starting from index.html, crawl the pages and test all the links.
linkmedic
- Start the web server with ./tests/public1/ as the root path of the server. Starting from index.html, crawl the pages and test all the links.
linkmedic --root=./tests/public1/
- Start the web server with ./tests/public1/ as the root path of the server. Starting from index2.html, crawl the pages and test all the links. The entry point should be relative to the server root. (In the example, index2.html should be accessible at ./tests/public1/index2.html.)
linkmedic --root=./tests/public1/ --entry=index2.html
- Configure the test web server not to redirect missing local pages (e.g., from /directory/page to /directory/page.html).
linkmedic --no-local-redirect
- Check links to external websites.
⚠️ IMPORTANT: You must avoid running the link checker on external links multiple times in a short period, e.g., on each commit to the development branch. Otherwise, the IP address of your machine (or CI runners) may get banned by the CDN or the DoS mitigation solution of the external web servers. See the CI/CD section for a possible solution.
linkmedic --check-external
- Do not follow external link redirections and consider the redirecting link to be alive. Depending on the configuration of external web servers, this option can result in some dead links not being detected, for example when the web server asks the client to load another page instead of returning a 404 page directly.
linkmedic --no-external-redirects
- Ignore local dead links and activate external link checking.
linkmedic --ignore-local
- Do not consider external links that return HTTP status codes 403 and 503 as dead links.
linkmedic --ignore-status 403 503
- Check links in an OpenDocument file (e.g., .odt, .odp, .ods), or a single OpenDocument XML file (e.g., .fodt, .fodp, .fods).
linkmedic --entry=./presentation.odp
- Show warning for HTTP links.
linkmedic --warn-http
- If any link to mydomain.com is encountered, treat it as an internal link and resolve it locally.
linkmedic --domain=mydomain.com
- Start the web server on port 3000. If the web server cannot be started on the requested port, the initializer will automatically try the next available ports.
linkmedic --port=3000
- Generate badge information file. Depending on the type of diagnosis, this file will be named badge.dead_internal_links.json, badge.dead_external_links.json, or badge.dead_links.json. If the --warn-http flag is used, a badge file for the number of discovered HTTP links will also be written to the badge.http_links.json file. These files can be used to generate badges (see linkmedkit scripts) or to serve as a response for the shields.io endpoint.
linkmedic --with-badge
- Check the links but always exit with code 0.
linkmedic --exit-zero
- Log the output at a different level of verbosity. If more than one of these flags is defined, the most restrictive one will be in effect.
  - --verbose: log debug information
  - --quiet: log only errors
  - --silent: completely silence the output logs
- Read guidelines overrides from linkmedic.guides.ini instead of the default path (.linkmedic.ini).
linkmedic --guidelines-override-file=linkmedic.guides.ini
- Dump all guidelines to log.linkmedic.ini and exit. You can use this file as a template to write overrides for the default values.
linkmedic --guidelines-dump-file=log.linkmedic.ini
- Dump the crawler links list to a file. The filename is set in guidelines ([crawler] -> links_dump_file). If the --domain flag has not been set, local links will be referenced from the website root as /your/path/page.html.
linkmedic --dump-links
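When calling linkmedic from another Python script, the flags listed above can be assembled into a command line and the exit status inspected afterwards. The following is a hedged sketch: the helper names `build_command` and `has_dead_links` are hypothetical, only flags documented in this section are used, and `has_dead_links` assumes that linkmedic is installed and available on the PATH.

```python
import subprocess


def build_command(root=".", entry=None, check_external=False, exit_zero=False):
    """Assemble a linkmedic command line from flags documented above."""
    command = ["linkmedic", f"--root={root}"]
    if entry is not None:
        command.append(f"--entry={entry}")
    if check_external:
        command.append("--check-external")
    if exit_zero:
        command.append("--exit-zero")
    return command


def has_dead_links(command):
    """Run the command; a non-zero exit status signals dead links or another failure."""
    return subprocess.run(command, check=False).returncode != 0


print(build_command(root="./tests/public1/", entry="index2.html"))
# ['linkmedic', '--root=./tests/public1/', '--entry=index2.html']
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.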
Guidelines
You can override the internal guidelines (configuration) of linkmedic by adding a .linkmedic.ini file with your desired values. This file is parsed using the configparser module from Python's standard library. You can choose a different name for this file using the --guidelines-override-file flag. The default values will be used for any options that are missing in the override file.
The guidelines are logged to the output while running in verbose mode and can be saved to a file using the --guidelines-dump-file flag.
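The override mechanism can be pictured with a short configparser sketch. It is illustrative only: the default value `links.txt` and the helper name `load_guidelines` are assumptions made for this example, and the real defaults are best obtained with --guidelines-dump-file. Only the [crawler] section and its links_dump_file option are taken from this document.

```python
import configparser

# Hypothetical default; the real default values come from linkmedic itself.
DEFAULTS = {"crawler": {"links_dump_file": "links.txt"}}


def load_guidelines(override_text):
    """Load defaults first, then let values from the override text replace them."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)
    parser.read_string(override_text)
    return parser


overrides = """
[crawler]
links_dump_file = my_links.txt
"""

guidelines = load_guidelines(overrides)
print(guidelines["crawler"]["links_dump_file"])  # my_links.txt
```

Options that are absent from the override text keep the value loaded from the defaults.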
.linkignore
Each line in the .linkignore file specifies a regex pattern for addresses that should be ignored during link checks. Note that in a regex, . matches any character (use \. to match only a literal .), and the leading / is considered when matching local links.
/ignore/.*/this
/invalidfile\.tar\.gz
/will_add/later\.html
https://not\.accessible\.com
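The effect of escaping the dot can be tried out with Python's `re` module. This is a sketch under an assumption: it uses `re.search`, so a pattern may match anywhere in the link, and linkmedic's exact matching mode may differ. The helper name `is_ignored` is hypothetical.

```python
import re

# The example patterns from the .linkignore file above.
patterns = [
    r"/ignore/.*/this",
    r"/invalidfile\.tar\.gz",
    r"/will_add/later\.html",
    r"https://not\.accessible\.com",
]
compiled = [re.compile(pattern) for pattern in patterns]


def is_ignored(link):
    """Assumption: a link is ignored if any pattern matches somewhere in it."""
    return any(regex.search(link) for regex in compiled)


print(is_ignored("/ignore/some/dir/this"))  # True
print(is_ignored("/invalidfile.tar.gz"))    # True
print(is_ignored("/invalidfileXtar.gz"))    # False, because \. matches only a literal dot

# Without escaping, the dot matches any character:
print(re.search(r"/invalidfile.tar.gz", "/invalidfileXtar.gz") is not None)  # True
```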
Reporting Issues
Please report bugs and code-related issues here. If you have an MPCDF account, use the upstream repository instead.
Development and Maintenance
This repository is frequently used as a template for configuring Python development environments and CI/CD pipelines. It is intentionally designed with strict boundaries while prioritizing scalability and maintainability. Third-party dependencies are minimized to support this goal.
Code coverage is intentionally not 100%. While a few testing approaches are demonstrated, the focus is on showcasing practical methods rather than exhaustive coverage.
The design goal is to show the possibility of having the entire development toolchain in a PDM-managed virtual environment, and also in a CI container, demonstrating multiple methods developers can use to run CI pipelines locally. PDM tracks exact Python dependency versions, which are detailed in its PEP 751 lock file pylock.toml.
Versioning is dynamic, based on Git tags. Project documentation is versioned, and its HTML output is automatically built and deployed here.
CI and release container recipes are versioned, with OS packages sourced from the latest minor version of their base OS image at build time. The development toolchain includes:
- Development, release
  - git: source versioning
  - flit_core (Flit backend): ensures packaging script compatibility
  - hatchling (Hatch backend): ensures packaging script compatibility
  - pdm: packaging and dependency managing
  - pdm-backend: packaging
  - pre-commit: linting before each commit
  - pyinstaller: converting packages to dynamic single binaries
  - staticx: converting dynamic binaries into static
  - sync-pre-commit-lock: syncing pre-commit tool versions from lockfile
- Linting, styling
  - bandit: security analysing
  - black: Python script styling
  - codespell: fixing common spelling errors
  - licensecheck: checking licenses used by
  - mypy: static typing
  - pip-audit: auditing dependency security
  - pydoclint: docstring linter
  - pylint: linting Python scripts
  - restructuredtext-lint
  - reuse: testing compliance with the REUSE recommendations
  - ruff: Python linting and code formatting
  - shellcheck-py: shell script linting
  - shfmt: shell script styling
  - toml-sort: TOML file sorting
  - trivy: security scanner
  - vermin: checking minimum compatible Python version
- Testing
  - bash shell for scripts
  - check-jsonschema: validating JSON schema
  - coverage: measuring test coverage
  - gitlabci-local: for running GitLab CI jobs locally
  - jq: parsing JSON files
  - pytest: testing framework
- Documentation
Refer to the developer's guide for code development details. See the maintainer's guide for maintenance and release checklists.
History
The original idea for this project came from Dr. Klaus Reuter (MPCDF). Fruitful discussions with Dr. Sebastian Kehl (MPCDF) facilitated the packaging and release of this project.
Accompanying tools for linkmedic have been moved to a separate repository (linkmedkit) starting with version 0.7.
License
- Copyright 2021-2023 M. Farzalipour Tabriz, Max Planck Computing and Data Facility (MPCDF)
- Copyright 2023-2025 M. Farzalipour Tabriz, Max Planck Institute for Physics (MPP)
All rights reserved.
This software may be modified and distributed under the terms of the 3-Clause BSD License. See the LICENSE file for details.