robots.txt with Headless SXA in XM Cloud

Some features always end up being implemented towards the end of a project, when you start thinking about going live.

robots.txt is one of them.

In general, robots.txt tells search engine crawlers which parts of a website they may crawl and which ones they should skip. Compliance is voluntary, so that does not mean every crawler respects it. If you want to read more: https://developers.google.com/search/docs/crawling-indexing/robots/intro

Luckily Headless SXA provides that feature so you don't have to implement something yourself.

 

Where to maintain Robots.txt?

Within the Settings item of your site, you will find the Robots content field in the Robots section. It is a free-text field, so you can enter basically anything.


 

Default Output

When the field is blank, calling your site's /robots.txt shows the following:

User-agent: *
Disallow: /
Sitemap: http://xmcloudcm.localhost/sitemap.xml

 

What happens when maintained?


When you add a string to the field, such as "This is my robots content", the output looks like this after saving the item:

This is my robots content
Sitemap: http://xmcloudcm.localhost/sitemap.xml

As you can see, the default is overwritten by the values I provided. Only the reference to the sitemap.xml is kept.
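The two outputs above can be summarised in a small TypeScript sketch. It only models the behaviour observed in this post and is not Sitecore's implementation. The function name buildRobotsTxt and its parameters are made up for illustration, and treating a whitespace-only field as blank is an assumption.

```typescript
// Illustrative model of the observed behaviour, NOT Sitecore's actual code.
// `buildRobotsTxt`, `robotsField` and `sitemapUrl` are hypothetical names.
function buildRobotsTxt(robotsField: string, sitemapUrl: string): string {
  const custom = robotsField.trim();
  // Assumption: a blank (or whitespace-only) field falls back to "disallow everything".
  const body = custom === '' ? 'User-agent: *\nDisallow: /' : custom;
  // In both cases the sitemap reference is appended as the last line.
  return `${body}\nSitemap: ${sitemapUrl}`;
}

const sitemap = 'http://xmcloudcm.localhost/sitemap.xml';
console.log(buildRobotsTxt('', sitemap));
console.log(buildRobotsTxt('This is my robots content', sitemap));
```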

Note: This is just an example and not useful robots content.

When running in a cloud setup rather than locally, don't forget to publish the item so that it takes effect on your rendering host.

 

How does it work?

The API route that returns the content of the field can be found at this path:

\src\Project\Sugcon\SugconAnzSxa\src\pages\api\robots.ts

[Screenshot: source code of robots.ts]

(Code example taken from https://github.com/Sitecore/XM-Cloud-Introduction)
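Since the screenshot is not reproduced here, the following is a minimal sketch of what such an API route does. It is not the file from the repository. As far as I know, the starter's route gets the text from the JSS GraphQLRobotsService; here that call is replaced by a hypothetical fetchRobots function, and the request and response types are reduced stand-ins for the Next.js ones so the sketch runs on its own.

```typescript
// Reduced stand-ins for NextApiRequest / NextApiResponse (assumption: only these members are needed).
interface RobotsRequest {
  headers: Record<string, string | undefined>;
}
interface RobotsResponse {
  setHeader(name: string, value: string): void;
  status(code: number): RobotsResponse;
  send(body: string): void;
}

// Hypothetical: resolves the robots text for a host. The real route queries Sitecore instead.
type FetchRobots = (hostName: string) => Promise<string>;

function createRobotsHandler(fetchRobots: FetchRobots) {
  return async (req: RobotsRequest, res: RobotsResponse): Promise<void> => {
    // Drop a trailing port so "xmcloudcm.localhost:3000" resolves like "xmcloudcm.localhost".
    const hostName = (req.headers['host'] ?? 'localhost').replace(/:\d+$/, '');
    const robots = await fetchRobots(hostName);
    // robots.txt must be served as plain text.
    res.setHeader('Content-Type', 'text/plain');
    res.status(200).send(robots);
  };
}
```

Injecting fetchRobots keeps the handler independent of how the field is read, which makes it easy to try out without a running Sitecore instance.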

The route is registered by the following Next.js config plugin:

\src\Project\Sugcon\SugconAnzSxa\src\lib\next-config\plugins\robots.js

[Screenshot: source code of robots.js]

(Code example taken from https://github.com/Sitecore/XM-Cloud-Introduction)
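A plugin of this kind typically adds a rewrite so that a request to /robots.txt is answered by the API route. The sketch below shows the idea in TypeScript. It is not the repository's robots.js. The types Rewrite and NextConfigLike are simplified stand-ins, the /api/robots destination is assumed from the file path above, and the sketch assumes any existing rewrites are returned as a plain array.

```typescript
// Simplified stand-ins for the Next.js config types (assumption: array-style rewrites only).
interface Rewrite {
  source: string;
  destination: string;
}
interface NextConfigLike {
  rewrites?: () => Promise<Rewrite[]>;
  [key: string]: unknown;
}

// Wraps an existing config and appends the robots rewrite to whatever rewrites it already has.
function robotsPlugin(nextConfig: NextConfigLike = {}): NextConfigLike {
  return {
    ...nextConfig,
    async rewrites(): Promise<Rewrite[]> {
      const existing = nextConfig.rewrites ? await nextConfig.rewrites() : [];
      return [
        ...existing,
        // Serve /robots.txt from the API route (destination assumed from pages/api/robots.ts).
        { source: '/robots.txt', destination: '/api/robots' },
      ];
    },
  };
}
```

Because the plugin keeps the rewrites that are already configured, it can be chained with other plugins without one overwriting the other.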

Created: 4.10.2022

XM Cloud NextJs JSS SXA Headless SXA