SEO Fix: Robots.txt — Protecting Crawl Budget and Invite Tokens
Release: v1.0.358 | Control: SEO-07 | Category: SEO / Crawlability
Overview
Version 1.0.358 introduces a robots.ts file to the application. This is a Next.js-native way of serving a robots.txt file to search engine crawlers, giving them explicit instructions about which routes they should and should not index.
Prior to this release, no robots file existed. Crawlers were free to follow any link they discovered — including internal API routes, onboarding flows, and time-limited invite URLs containing sensitive tokens.
Why This Matters
Crawl Budget
Search engines allocate a finite crawl budget to each domain. Without a robots.txt, crawlers spend that budget on routes that provide no indexing value — API endpoints, authentication pages, and redirect flows. This crowds out pages that should be indexed, like the homepage, pricing, and policy pages.
Invite Token Exposure
Invite links follow the patterns /invite/[token] and /agent-invite/[token]. If a crawler follows one of these links and indexes it, the token could appear in search engine caches or third-party SEO tools. Disallowing these paths in robots.txt stops compliant crawlers from fetching them. Note that robots.txt controls crawling, not indexing — a blocked URL that is linked elsewhere can still surface in results — so the time limit on invite tokens remains the primary safeguard.
Auth and Onboarding Routes
Routes such as /sign-in/, /sign-up/, and /onboarding/ serve no purpose in a public index. Including them creates noise in search results and may surface incomplete or transient application states to users who find them via search.
Implementation
The file uses Next.js's built-in metadata file convention: a robots.ts file under src/app/ exports an object typed as MetadataRoute.Robots, which Next.js automatically serialises into a valid robots.txt response served at /robots.txt.
```typescript
// src/app/robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: '*',
        allow: ['/', '/pricing', '/privacy', '/terms', '/ropa'],
        disallow: [
          '/api/',
          '/dashboard/',
          '/onboarding/',
          '/invite/',
          '/agent-invite/',
          '/sign-in/',
          '/sign-up/',
        ],
      },
    ],
    sitemap: `${process.env.NEXT_PUBLIC_APP_URL}/sitemap.xml`,
  };
}
```
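Because robots() is a plain function returning a plain object, its output can be sanity-checked with an ordinary unit test, no running app required. A minimal sketch — the Next.js type is replaced with a structural equivalent and the sitemap URL is a placeholder so the snippet runs standalone (test file name and types are illustrative, not part of this release):

```typescript
// robots.test.ts — hypothetical test file; types are inlined so this
// runs without a Next.js installation.
type RobotsRule = { userAgent: string; allow: string[]; disallow: string[] };
type Robots = { rules: RobotsRule[]; sitemap: string };

// Mirror of src/app/robots.ts; the real file builds the sitemap URL
// from NEXT_PUBLIC_APP_URL.
function robots(): Robots {
  return {
    rules: [
      {
        userAgent: '*',
        allow: ['/', '/pricing', '/privacy', '/terms', '/ropa'],
        disallow: [
          '/api/', '/dashboard/', '/onboarding/',
          '/invite/', '/agent-invite/', '/sign-in/', '/sign-up/',
        ],
      },
    ],
    sitemap: 'https://example.com/sitemap.xml', // placeholder
  };
}

const rule = robots().rules[0];
// The two properties that matter most: token routes blocked,
// public marketing pages crawlable.
console.log(rule.disallow.includes('/invite/'));
console.log(rule.allow.includes('/pricing'));
```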
Crawling Rules Summary
| Route | Crawlable | Reason |
|---|---|---|
| / | ✅ Yes | Public marketing homepage |
| /pricing | ✅ Yes | Public pricing information |
| /privacy | ✅ Yes | Privacy policy |
| /terms | ✅ Yes | Terms of service |
| /ropa | ✅ Yes | Records of processing activities |
| /api/* | ❌ No | Internal API — no public value |
| /dashboard/* | ❌ No | Authenticated routes |
| /onboarding/* | ❌ No | Setup flows, not indexable |
| /invite/* | ❌ No | Contains sensitive token parameters |
| /agent-invite/* | ❌ No | Contains sensitive token parameters |
| /sign-in/* | ❌ No | Auth pages |
| /sign-up/* | ❌ No | Auth pages |
Sitemap Reference
The generated robots.txt includes a Sitemap: directive pointing to:
```
Sitemap: https://<your-domain>/sitemap.xml
```
This URL is constructed from the NEXT_PUBLIC_APP_URL environment variable. Ensure this variable is set correctly in your deployment environment so crawlers can discover and parse the sitemap.
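If the variable is unset, the template literal in robots.ts would emit a literal "undefined" in the sitemap URL. One defensive option is a small helper that fails loudly instead; this is a sketch, not part of the release — the function name and error message are illustrative:

```typescript
// Hypothetical helper: build the sitemap URL from a base that may be
// missing. The real robots.ts would call it with
// process.env.NEXT_PUBLIC_APP_URL.
function sitemapUrl(base: string | undefined): string {
  if (!base) {
    // Better to fail the build than to ship "undefined/sitemap.xml".
    throw new Error('NEXT_PUBLIC_APP_URL must be set');
  }
  // Trim any trailing slash so the joined URL has exactly one separator.
  return `${base.replace(/\/+$/, '')}/sitemap.xml`;
}

console.log(sitemapUrl('https://example.com/'));
```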
Verifying the Output
Once deployed, you can verify the robots file is being served correctly by visiting:
https://<your-domain>/robots.txt
You should see output similar to:
```
User-agent: *
Allow: /
Allow: /pricing
Allow: /privacy
Allow: /terms
Allow: /ropa
Disallow: /api/
Disallow: /dashboard/
Disallow: /onboarding/
Disallow: /invite/
Disallow: /agent-invite/
Disallow: /sign-in/
Disallow: /sign-up/
Sitemap: https://<your-domain>/sitemap.xml
```
You can also use Google Search Console to test the robots file and inspect which URLs are blocked or allowed.
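For a quick local check of individual URLs, the prefix semantics of these rules are simple enough to replicate in a few lines. A rough sketch — it implements longest-prefix matching with ties resolved in favour of Allow (Google's documented tie-break), but ignores wildcards and per-agent groups, so it is an approximation rather than a full robots.txt parser:

```typescript
// Simplified robots.txt matcher for this release's rule set:
// the longest matching rule prefix wins; ties go to Allow.
const allow = ['/', '/pricing', '/privacy', '/terms', '/ropa'];
const disallow = [
  '/api/', '/dashboard/', '/onboarding/',
  '/invite/', '/agent-invite/', '/sign-in/', '/sign-up/',
];

function isCrawlable(path: string): boolean {
  // Length of the longest rule that prefixes the path, or 0 if none.
  const longest = (rules: string[]) =>
    rules
      .filter((r) => path.startsWith(r))
      .reduce((max, r) => Math.max(max, r.length), 0);
  return longest(allow) >= longest(disallow);
}

console.log(isCrawlable('/pricing'));            // allowed public page
console.log(isCrawlable('/invite/sometoken'));   // blocked token route
```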