Type: Suggestion
Resolution: Unresolved
Component/s: Content - Labels, Macros - Content by Label
CtB - Improve Existing
When a public-facing Confluence instance has pages with labels, crawler bots
can reach the Labeled Content page at /label/<labelname>.
From that page, the "Related Labels" section (top-right) displays links to
all other labels that co-occur with the current label(s). Crucially, clicking
any "Related Label" appends it to the URL (e.g. /label/foo+bar+baz), and the
new page again shows its own "Related Labels", creating a near-infinite
combination of crawlable URLs.
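To give a sense of scale, the sketch below counts how many distinct /label/ URLs a crawler could reach from n labels. It is an upper bound that assumes every label co-occurs with every other one; real instances will expose fewer combinations. The KB log example lists its labels in alphabetical order, which suggests (but does not confirm) that Confluence normalizes label order, in which case count_label_sets (non-empty subsets) is the relevant figure. count_label_sequences shows the bound if order were preserved in the URL. Both function names are illustrative only.

```python
from math import comb, perm


def count_label_sets(n: int) -> int:
    """Non-empty subsets of n labels: one URL per set if label order is normalized."""
    return sum(comb(n, k) for k in range(1, n + 1))  # equals 2**n - 1


def count_label_sequences(n: int) -> int:
    """Ordered sequences of distinct labels: one URL per sequence if order is kept."""
    return sum(perm(n, k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (5, 9, 15):
        print(n, count_label_sets(n), count_label_sequences(n))
```

If the nine labels in the KB example all co-occurred, that would already be 511 label sets, or 986,409 ordered sequences.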
This is confirmed in the Atlassian Support KB article:
https://support.atlassian.com/confluence/kb/web-crawler-bots-and-confluence-how-public-access-can-lead-to-performance-issues/
which shows real-world log examples of bots (PanguBot, bingbot, etc.)
generating requests like:
GET /label/aggregate+coverage+database_management+estimation+eu+intra_regional_trade+qa+territory+world
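How deep these chains go can be gauged from an access log by counting the "+"-separated labels in each /label/ request path. A minimal sketch, assuming request lines of the form "GET <path> ..." as in the example above; the regex and the label_depths function are hypothetical and not part of Confluence or the KB article.

```python
import re
from collections import Counter

# Hypothetical pattern: matches "GET /label/<labels>" and captures the label part.
LABEL_REQUEST = re.compile(r"GET /label/([^\s?#]+)")


def label_depths(log_lines):
    """Return a Counter mapping 'number of labels in the URL' -> request count."""
    depths = Counter()
    for line in log_lines:
        match = LABEL_REQUEST.search(line)
        if match:
            # Labels are joined with '+' in the path, e.g. /label/foo+bar+baz
            labels = [part for part in match.group(1).split("+") if part]
            depths[len(labels)] += 1
    return depths


sample = [
    "GET /label/kb-how-to-article HTTP/1.1",
    "GET /label/aggregate+coverage+database_management+estimation+eu"
    "+intra_regional_trade+qa+territory+world HTTP/1.1",
]
print(label_depths(sample))
```

A long tail of requests with many labels per URL is the signature of the combinatorial crawling described here.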
These bots:
- Use unique user-agent strings
- Come from unique IP addresses
- Actively ignore robots.txt directives
The result is a near-infinite set of URL combinations whose traffic degrades
performance and can cause outages on public Confluence Data Center instances.
This issue was partially addressed in CONFSERVER-11940 (fixed in 2.8.2)
which added rel="nofollow" to label links, but the fix does not appear
to cover the "Related Labels" links on the /label/ Labeled Content page
in modern versions of Confluence Data Center.
STEPS TO REPRODUCE
1. Set up a public-facing Confluence Data Center instance with anonymous access
2. Add labels to several pages (e.g. "kb-how-to-article", "troubleshooting")
3. Visit /label/kb-how-to-article
4. Inspect the HTML of the "Related Labels" section in the top-right
5. Observe that the label links do NOT have rel="nofollow"
6. A crawler bot will follow each Related Label link, landing on a new
/label/ page with its own Related Labels, generating combinatorial
URL explosion
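Steps 4 and 5 can be automated with a small standard-library script that lists every link to a /label/ URL lacking rel="nofollow" and reports whether the page carries a robots meta tag. This is a sketch only: it assumes label links are <a> anchors with relative hrefs starting with "/label/" (an instance with a context path such as /confluence would need the prefix adjusted), and it does not try to isolate the "Related Labels" section, whose exact markup is not described here. The sample HTML is invented for illustration; fetching the live page is left out so the example stays self-contained.

```python
from html.parser import HTMLParser


class LabelLinkAudit(HTMLParser):
    """Collects /label/ links without rel="nofollow" and any robots meta tag."""

    def __init__(self):
        super().__init__()
        self.followable = []     # hrefs of /label/ links lacking rel="nofollow"
        self.robots_meta = None  # content of <meta name="robots">, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and (attrs.get("href") or "").startswith("/label/"):
            # rel may hold several space-separated tokens, e.g. "tag nofollow"
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" not in rel_values:
                self.followable.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_meta = attrs.get("content")


# Invented sample markup standing in for the Labeled Content page.
sample_html = """
<html><head><title>Labeled content</title></head><body>
  <a href="/label/kb-how-to-article+troubleshooting">troubleshooting</a>
  <a href="/label/kb-how-to-article+faq" rel="nofollow">faq</a>
</body></html>
"""

audit = LabelLinkAudit()
audit.feed(sample_html)
print("Followable label links:", audit.followable)
print("Robots meta tag:", audit.robots_meta)
```

On an affected instance, every Related Labels link would be expected to appear in the followable list and no robots meta tag to be reported.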
EXPECTED BEHAVIOR
- The "Related Labels" links on the /label/ Labeled Content page should
have rel="nofollow" and/or the page should include a
<meta name="robots" content="noindex,nofollow"> tag, preventing bots
from following the combinatorial label URL chains.
OR alternatively:
- Provide an admin-level option to disable the "Related Labels" feature
entirely, or restrict it to logged-in users only.
ACTUAL BEHAVIOR
- Related Labels links are fully followable by crawlers, with no
nofollow attribute, creating a near-infinite crawl loop.
WORKAROUND (per Atlassian KB):
- Add "Disallow: /label" to robots.txt. This does NOT work against bots
that ignore robots.txt.
- Block IPs at the firewall level. This is impractical when bots use
thousands of unique IP addresses.
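The robots.txt workaround can be checked with Python's urllib.robotparser, which models how a compliant crawler interprets the rule. The sketch shows that "Disallow: /label" blocks the combinatorial URLs for any bot that honors robots.txt, which is also why it offers no protection against the bots described above: nothing forces a crawler to perform this check. The host name and the /display/ page path are placeholders. Note that the rule is a prefix match, so it covers every path beginning with /label.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content from the KB workaround; applies to every user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /label
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder host; only the path matters for the rule check.
base = "https://confluence.example.com"
for path in ("/label/kb-how-to-article",
             "/label/kb-how-to-article+troubleshooting",
             "/display/KB/Some+Page"):
    allowed = parser.can_fetch("PanguBot", base + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

The two /label/ paths are reported as blocked and the page path as allowed, but only for crawlers that choose to consult robots.txt.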
RELATED TICKETS
- CONFSERVER-11940: Add nofollow to label links (Fixed in 2.8.2, but
appears incomplete for modern DC versions)
- CONFSERVER-12011: Multiple-label filter generates redundant URLs (Closed)
- CONFSERVER-8749: Make Confluence more configurable for web crawlers
- CONFCLOUD-82811: Disable "Show Details" for anonymous users (incl. labels)
- Atlassian Support KB: https://support.atlassian.com/confluence/kb/web-crawler-bots-and-confluence-how-public-access-can-lead-to-performance-issues/