My Tech-Notes

Block Google (and other search engines) from crawling your Koha OPAC safely and properly

Koha’s OPAC usually runs on Apache or another web server. You can add a robots.txt file to the OPAC’s root directory.

Example content to block all crawlers:

User-agent: *
Disallow: /

This means: all user agents (crawlers) are disallowed from crawling anything.
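To sanity-check rules like these before deploying them, Python's standard-library urllib.robotparser can evaluate robots.txt text against a crawler name and a URL. This is a minimal sketch: the allowed() helper and the OPAC URL are placeholders for illustration. robotparser's matching is also simpler than Google's own parser (for example, it ignores * and $ wildcards inside paths), so treat it as a rough check.

```python
from urllib.robotparser import RobotFileParser

# The "block all crawlers" rules from above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a crawler identifying as `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder OPAC URL; substitute your own domain.
    url = "https://your-opac-domain/cgi-bin/koha/opac-search.pl?q=history"
    for agent in ("Googlebot", "Bingbot"):
        print(f"{agent}: {'allowed' if allowed(BLOCK_ALL, agent, url) else 'blocked'}")
```

Both crawler names come back blocked, because the * group applies to every user agent and Disallow: / matches every path.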

Block Google only:

User-agent: Googlebot
Disallow: /

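Save either version as a file named robots.txt in your Koha OPAC root directory: typically /usr/share/koha/opac/htdocs/, or wherever your Koha OPAC is served from. Crawlers only look for it at the root of the OPAC host, so check that it loads at:

https://your-opac-domain/robots.txt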
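The same module can confirm that the Google-only rules block Googlebot while leaving other crawlers alone, and it can also read the deployed file over HTTP. A sketch: the helper names are hypothetical and your-opac-domain is a placeholder. Pass the short product token ("Googlebot"), not a full User-Agent string, because robotparser only compares the part of the name before the first "/".

```python
from urllib.robotparser import RobotFileParser

# The "block Google only" rules from above.
GOOGLE_ONLY = """\
User-agent: Googlebot
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check `agent` against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def allowed_live(robots_url: str, agent: str, url: str) -> bool:
    """Download a deployed robots.txt and check `agent` against it."""
    parser = RobotFileParser(robots_url)
    parser.read()  # a 404 response is treated as "everything allowed"
    return parser.can_fetch(agent, url)
```

Once the file is in place, allowed_live("https://your-opac-domain/robots.txt", "Googlebot", "https://your-opac-domain/") should return False. If it returns True, check that the file is really served from the root of the OPAC host, since read() treats a missing (404) file as permission to crawl everything.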

If you also want to keep OPAC pages out of search results (not just uncrawled), send this HTTP header with them:

Header set X-Robots-Tag "noindex, nofollow"

How to do this in Apache:

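Add the directive inside the OPAC’s existing <VirtualHost> block in your Koha site config (/etc/apache2/sites-available/koha.conf or equivalent):

<VirtualHost ...>
    # ...existing OPAC directives stay as they are...
    Header set X-Robots-Tag "noindex, nofollow"
</VirtualHost>

Set it at the virtual-host level rather than in a <Directory /usr/share/koha/opac/htdocs> block: Koha generates OPAC pages with CGI scripts (or Plack) that live outside htdocs, so a Directory block there would only add the header to static files.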

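Make sure mod_headers is enabled (the Header directive comes from it), check the configuration, then reload Apache:

sudo a2enmod headers
sudo apachectl configtest
sudo systemctl reload apache2

If a2enmod had to enable the module, use sudo systemctl restart apache2 instead of reload.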
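After reloading, it is worth confirming that generated OPAC pages, not just static files, actually carry the header. A minimal sketch using only the Python standard library; the function names and the URL are placeholders, and agent-specific values such as "googlebot: noindex" are only lower-cased here, not interpreted.

```python
import urllib.request


def parse_x_robots_tag(values: list[str]) -> set[str]:
    """Split one or more X-Robots-Tag header values into lower-cased directives."""
    directives = set()
    for value in values:
        for part in value.split(","):
            part = part.strip().lower()
            if part:
                directives.add(part)
    return directives


def fetch_x_robots_tag(url: str, timeout: float = 10.0) -> set[str]:
    """GET `url` and return the directives from its X-Robots-Tag header(s).

    urlopen raises HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_x_robots_tag(resp.headers.get_all("X-Robots-Tag") or [])
```

For example, fetch_x_robots_tag("https://your-opac-domain/cgi-bin/koha/opac-main.pl") should include "noindex" and "nofollow" if the header reaches the CGI-generated pages.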

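You can also add this tag to the <head> section of the OPAC pages:

<meta name="robots" content="noindex, nofollow">

Putting it in opac-main.tt only affects the OPAC home page; to cover every OPAC page, it has to go in a template include that all pages share.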
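Because a meta tag only appears on pages rendered from the template you edited, a small parser helps check individual pages. A sketch using html.parser from the standard library; robots_meta() is a hypothetical helper, and you would pass it HTML you have fetched yourself (for example with urllib.request).

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.contents: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            self.contents.append(attributes.get("content") or "")


def robots_meta(html: str) -> list[str]:
    """Return the content of each robots meta tag found in `html`."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.contents
```

An empty list for a page means the tag did not make it into that page's template.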

In practice, robots.txt plus the X-Robots-Tag header is usually enough; the meta tag is optional.

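robots.txt is only advisory: well-behaved crawlers such as Googlebot respect it, but bad bots may ignore it.
The X-Robots-Tag header controls indexing rather than crawling, and because it travels in the HTTP response it works for any content type, not just HTML. It is also only honored by well-behaved crawlers, and only on pages they are allowed to fetch: if robots.txt disallows a URL, Google never reads its noindex, and a URL linked from elsewhere can still appear in results without a description.
Make sure you edit the OPAC virtual host, not the staff client (intranet) one; Koha serves them from separate virtual hosts, and the goal here is to block only the OPAC.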
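The interaction between robots.txt and noindex can be modeled in a few lines. This is a toy model, not how any search engine is implemented: it assumes a compliant crawler that reads the X-Robots-Tag header only on responses it was allowed to fetch, and noindex_seen() is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def noindex_seen(robots_txt: str, agent: str, url: str, x_robots_tag: str) -> bool:
    """Return True if a compliant crawler would fetch `url` and so read its noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        # Disallowed: the page is never requested, so its header is never seen.
        return False
    directives = {part.strip().lower() for part in x_robots_tag.split(",")}
    return "noindex" in directives
```

With Disallow: / in place the function returns False even though the header says noindex. To get pages that are already indexed removed, a common approach is to leave them crawlable with the noindex header until they drop out of results, and only then add the Disallow rule.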