We run a travel blog (joyofexploringtheworld.com) on Docker Compose with Traefik v3 and Cloudflare. One morning Google Search Console showed every page blocked from indexing with “Failed: Robots.txt unreachable”. The site was working fine in a browser, so what was going on?
Two separate issues were conspiring to break Googlebot’s ability to fetch /robots.txt. Here’s what we found and how we fixed both.
## The setup
Our WordPress service sits on two Docker networks: app-network (shared with Traefik, Redis, imgproxy) and db-network (shared with MariaDB). We run two scaled WordPress containers behind Traefik’s load balancer.
```yaml
wordpress:
  build: .
  scale: 2
  networks:
    - app-network
    - db-network
  labels:
    - 'traefik.enable=true'
    - 'traefik.http.routers.wordpress.entrypoints=websecure'
    # ...
```
## Problem 1: Traefik routing to the wrong network
When a Docker service is attached to multiple networks, Traefik discovers an IP address on each of them. Our two WordPress containers each had an IP on app-network (reachable from Traefik) and another on db-network (not reachable from Traefik). Traefik load-balanced across all of them, so requests sent to a db-network IP timed out, producing intermittent 504 Gateway Timeouts.
We confirmed this by querying Traefik’s API:
```shell
curl -s http://127.0.0.1:8081/api/http/services/wordpress@docker \
  | jq '.loadBalancer.servers'
```
The output showed four server entries — two on 172.18.x.x (app-network, reachable) and two on 172.19.x.x (db-network, unreachable). Every other request was timing out.
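For illustration, the server list had this shape (the IP addresses here are made up; the actual subnets depend on your Docker network configuration):

```json
[
  { "url": "http://172.18.0.5:80" },
  { "url": "http://172.18.0.6:80" },
  { "url": "http://172.19.0.4:80" },
  { "url": "http://172.19.0.5:80" }
]
```

With Traefik's default round-robin balancing, half of all requests land on targets it cannot reach.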
## The fix
Add a single label telling Traefik which Docker network to use:
```yaml
wordpress:
  labels:
    - 'traefik.docker.network=wordpress_app-network'
```
The value is the Docker network name as it appears in docker network ls — typically <project>_<network>, so wordpress_app-network for a project directory called wordpress. After restarting the WordPress containers, Traefik’s API showed only the two correct app-network IPs.
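If you'd rather not depend on the `<project>_<network>` naming convention, you can pin the network name explicitly in the Compose file. A sketch (the `name` key is part of the Compose spec; the `internal` flag is an optional hardening we're suggesting, not from the original setup):

```yaml
networks:
  app-network:
    name: wordpress_app-network  # fixed name, regardless of project directory
  db-network:
    internal: true  # optional: the database network never needs outside access
```

With an explicit `name`, the `traefik.docker.network` label keeps matching even if the project directory is renamed.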
## Problem 2: Wrong Content-Type on robots.txt
With routing fixed, we expected Google Search Console’s Live Test to pass immediately. It didn’t. We dug into the response headers:
```shell
curl -sI https://joyofexploringtheworld.com/robots.txt | grep -i content-type
```
The response came back as `text/html; charset=UTF-8` instead of `text/plain`. Googlebot may reject a robots.txt served with the wrong MIME type; the Robots Exclusion Protocol (RFC 9309) expects robots.txt to be served as `text/plain`.
WordPress generates `/robots.txt` dynamically via PHP rewrite rules, so Apache's `FilesMatch "\.(html|htm|php)$"` rule was setting `text/html` headers. Since there's no physical robots.txt file on disk, matching the filename with `FilesMatch` doesn't work; we needed to match the request URI instead.
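This is a consequence of WordPress's stock rewrite block, which sends any request that matches no file or directory on disk, including `/robots.txt`, to `index.php` (shown here without `RewriteBase` lines for brevity):

```apache
# Standard WordPress .htaccess rewrites
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteRule ^index\.php$ - [L]
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteRule . /index.php [L]
</IfModule>
```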
## The fix
We added an <If> directive to our Apache config that matches the original request line:
```apache
<IfModule mod_headers.c>
  <If "%{THE_REQUEST} =~ m#/robots\.txt#">
    Header set Content-Type "text/plain; charset=UTF-8"
  </If>
</IfModule>
```
We use `%{THE_REQUEST}` (the original HTTP request line) rather than `%{REQUEST_URI}` because WordPress rewrites the URI to `/index.php` before headers are finalised. `THE_REQUEST` preserves the original path.
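For contrast, this is the variant that silently fails under WordPress rewrites, shown only to illustrate the pitfall:

```apache
<IfModule mod_headers.c>
  # Does NOT work here: by the time this <If> is evaluated, mod_rewrite
  # has already mapped /robots.txt to /index.php, so the pattern never matches
  <If "%{REQUEST_URI} =~ m#/robots\.txt#">
    Header set Content-Type "text/plain; charset=UTF-8"
  </If>
</IfModule>
```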
While we were in this file, we also added proper caching for sitemaps. Our no-cache rule for .php files was preventing Cloudflare from caching Rank Math’s dynamic XML sitemaps:
```apache
<IfModule mod_headers.c>
  <If "%{THE_REQUEST} =~ /sitemap.*\.xml/">
    Header set Cache-Control "public, max-age=3600, s-maxage=3600"
    Header unset Pragma
    Header unset Expires
  </If>
</IfModule>
```
## Verifying the fix
After restarting the containers and reloading Apache, we ran a quick stress test to confirm there were no more intermittent failures:
```shell
for i in $(seq 1 20); do
  docker compose exec -T wordpress \
    curl -so /dev/null -w '%{http_code} %{content_type}\n' \
    -H 'User-Agent: Googlebot' http://localhost/robots.txt
done
```
All 20 requests returned `200 text/plain; charset=UTF-8`. Google Search Console's Live Test passed shortly after, and the "Robots.txt unreachable" warning cleared within 24 hours.
## What you can do
- Always set `traefik.docker.network` when your service connects to multiple Docker networks. Without it, Traefik may route to an unreachable IP and cause intermittent 504s.
- Check the `Content-Type` on `/robots.txt`: it must be `text/plain`. A dynamic `robots.txt` (generated by WordPress, Rank Math, Yoast, etc.) is served through PHP, so filename-based Apache rules won't match it.
- Use `%{THE_REQUEST}` in Apache `<If>` directives when WordPress rewrites are involved; `%{REQUEST_URI}` will show `/index.php`, not the original path.
- Don't panic at GSC delays: even after a fix, Google's Live Test can take minutes to hours to reflect reality. Verify with `curl` first.
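The robots.txt check is easy to automate. A minimal sketch for a cron-style monitor (the `check_robots` helper and its input format are ours, not from any standard tool):

```shell
#!/bin/sh
# check_robots expects "<status> <content-type>", the format produced by:
#   curl -so /dev/null -w '%{http_code} %{content_type}' https://example.com/robots.txt
check_robots() {
  case "$1" in
    "200 text/plain"*) echo "ok" ;;
    *)                 echo "FAIL: $1" ;;
  esac
}

check_robots "200 text/plain; charset=UTF-8"  # healthy response
check_robots "200 text/html; charset=UTF-8"   # the bug described in this post
```

Wire the real `curl` output into `check_robots` and alert on any `FAIL` line, and you'll catch a regression long before Search Console does.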
See also: Running a WordPress Travel Blog on a Budget VPS: The Full Stack | Rank Math Sitemap Not Loading with Traefik | SEO Housekeeping: Focus Keywords and Sitemaps That Match
Built for a travel blog on a budget. This stack powers Joy of Exploring the World — curated travel itineraries, restaurant reviews, and destination guides. If you're planning your next trip, come explore with us.
All config files from this post are in the companion GitHub repo.