Features · Making Tax Digital · Updated March 10, 2026

SEO Fix: Robots.txt — Protecting Crawl Budget and Invite Tokens


Release: v1.0.358 | Control: SEO-07 | Category: SEO / Crawlability

Overview

Version 1.0.358 introduces a robots.ts file to the application. This is a Next.js-native way of serving a robots.txt file to search engine crawlers, giving them explicit instructions about which routes they should and should not index.

Prior to this release, no robots file existed. Crawlers were free to follow any link they discovered — including internal API routes, onboarding flows, and time-limited invite URLs containing sensitive tokens.


Why This Matters

Crawl Budget

Search engines allocate a finite crawl budget to each domain. Without a robots.txt, crawlers spend that budget on routes that provide no indexing value — API endpoints, authentication pages, and redirect flows. This crowds out pages that should be indexed, like the homepage, pricing, and policy pages.

Invite Token Exposure

Invite links follow the patterns /invite/[token] and /agent-invite/[token]. If a crawler follows one of these links and indexes it, the token could appear in search engine caches or third-party SEO tools. Disallowing these paths in robots.txt instructs compliant crawlers not to fetch them, which removes the most common route to accidental exposure.
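Note that a Disallow rule stops crawling but not necessarily indexing: a URL discovered via an external link can still appear in search results without ever being fetched. As defence in depth, an invite page can also declare noindex via the App Router metadata export. The sketch below is illustrative only — the file path src/app/invite/[token]/page.tsx is assumed, not confirmed by this release, and the page component itself is omitted:

```typescript
// src/app/invite/[token]/page.tsx (hypothetical path; component omitted)
// Belt-and-braces: even if a crawler reaches an invite URL through an
// external link, this metadata tells it not to index the page.
export const metadata = {
  robots: {
    index: false,  // do not include this page in search results
    follow: false, // do not follow links from this page
  },
};
```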

Auth and Onboarding Routes

Routes such as /sign-in/, /sign-up/, and /onboarding/ serve no purpose in a public index. Including them creates noise in search results and may surface incomplete or transient application states to users who find them via search.


Implementation

The file is implemented using Next.js's built-in MetadataRoute.Robots type, which automatically serialises the configuration into a valid robots.txt response served at /robots.txt.

// src/app/robots.ts
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: '*',
        allow: ['/', '/pricing', '/privacy', '/terms', '/ropa'],
        disallow: [
          '/api/',
          '/dashboard/',
          '/onboarding/',
          '/invite/',
          '/agent-invite/',
          '/sign-in/',
          '/sign-up/'
        ]
      }
    ],
    sitemap: `${process.env.NEXT_PUBLIC_APP_URL}/sitemap.xml`
  };
}
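Staging and preview deployments usually should not be crawled at all. One possible extension — not part of this release — is to branch on an environment flag and disallow everything outside production. The sketch below uses a local RobotsConfig type standing in for Next.js's MetadataRoute.Robots so it is self-contained; in the app you would import the real type from 'next':

```typescript
// Hypothetical extension: block all crawling outside production.
// RobotsConfig mirrors the shape of Next.js's MetadataRoute.Robots.
type RobotsConfig = {
  rules: { userAgent: string; allow?: string[]; disallow?: string[] }[];
  sitemap?: string;
};

export function robotsFor(env: string, appUrl: string): RobotsConfig {
  if (env !== 'production') {
    // Preview/staging: tell every crawler to stay out entirely.
    return { rules: [{ userAgent: '*', disallow: ['/'] }] };
  }
  // Production: same rules as the release's robots.ts.
  return {
    rules: [
      {
        userAgent: '*',
        allow: ['/', '/pricing', '/privacy', '/terms', '/ropa'],
        disallow: [
          '/api/', '/dashboard/', '/onboarding/', '/invite/',
          '/agent-invite/', '/sign-in/', '/sign-up/',
        ],
      },
    ],
    sitemap: `${appUrl}/sitemap.xml`,
  };
}
```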

Crawling Rules Summary

Route            Crawlable   Reason
/                ✅ Yes      Public marketing homepage
/pricing         ✅ Yes      Public pricing information
/privacy         ✅ Yes      Privacy policy
/terms           ✅ Yes      Terms of service
/ropa            ✅ Yes      Records of processing activities
/api/*           ❌ No       Internal API — no public value
/dashboard/*     ❌ No       Authenticated routes
/onboarding/*    ❌ No       Setup flows, not indexable
/invite/*        ❌ No       Contains sensitive token parameters
/agent-invite/*  ❌ No       Contains sensitive token parameters
/sign-in/*       ❌ No       Auth pages
/sign-up/*       ❌ No       Auth pages

Sitemap Reference

The generated robots.txt includes a Sitemap: directive pointing to:

Sitemap: https://<your-domain>/sitemap.xml

This URL is constructed from the NEXT_PUBLIC_APP_URL environment variable. Ensure this variable is set correctly in your deployment environment so crawlers can discover and parse the sitemap.
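The sitemap itself can be generated the same way, from a sitemap.ts file that Next.js serialises to /sitemap.xml. A minimal sketch follows — the route list simply mirrors the Allow rules above, and whether the application actually builds its sitemap this way is an assumption. A local SitemapEntry type stands in for Next.js's MetadataRoute.Sitemap so the snippet is self-contained:

```typescript
// src/app/sitemap.ts (sketch — actual implementation not confirmed)
// Next.js serialises the returned array into /sitemap.xml.
type SitemapEntry = { url: string; lastModified?: Date; priority?: number };

const baseUrl = process.env.NEXT_PUBLIC_APP_URL ?? 'https://example.com';

export default function sitemap(): SitemapEntry[] {
  // Only the publicly crawlable routes from robots.ts belong here.
  return ['/', '/pricing', '/privacy', '/terms', '/ropa'].map((path) => ({
    url: `${baseUrl}${path}`,
    lastModified: new Date(),
  }));
}
```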


Verifying the Output

Once deployed, you can verify the robots file is being served correctly by visiting:

https://<your-domain>/robots.txt

You should see output similar to:

User-agent: *
Allow: /
Allow: /pricing
Allow: /privacy
Allow: /terms
Allow: /ropa
Disallow: /api/
Disallow: /dashboard/
Disallow: /onboarding/
Disallow: /invite/
Disallow: /agent-invite/
Disallow: /sign-in/
Disallow: /sign-up/

Sitemap: https://<your-domain>/sitemap.xml

You can also use Google Search Console to test the robots file and inspect which URLs are blocked or allowed.
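Beyond spot-checking in a browser, the served file can be asserted in an automated test. A sketch of the checking logic is below; the fetch step is elided, and isBlocked implements only simple prefix matching rather than the full robots.txt matching rules (wildcards, longest-match Allow precedence), which is enough for a CI smoke test:

```typescript
// Collect the Disallow prefixes from the wildcard (User-agent: *) group.
function disallowedPrefixes(robotsTxt: string): string[] {
  const prefixes: string[] = [];
  let inWildcardGroup = false;
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.trim();
    if (/^user-agent:/i.test(line)) {
      inWildcardGroup = line.split(':')[1].trim() === '*';
    } else if (inWildcardGroup && /^disallow:/i.test(line)) {
      const path = line.slice(line.indexOf(':') + 1).trim();
      if (path) prefixes.push(path);
    }
  }
  return prefixes;
}

// True if any Disallow prefix matches the path (prefix matching only).
function isBlocked(robotsTxt: string, path: string): boolean {
  return disallowedPrefixes(robotsTxt).some((p) => path.startsWith(p));
}
```

In a CI check you would fetch https://&lt;your-domain&gt;/robots.txt and assert that paths such as /invite/ and /api/ are blocked while / and /pricing are not.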