
Site crawler configuration

6 min read
Last updated: March 2025

What you'll learn

Configure your site crawler for optimal performance. Learn how to set crawl limits, handle JavaScript-heavy sites, respect robots.txt, and get the most accurate SEO data from your website.

Understanding site crawling

Site crawling is the process of systematically browsing and analyzing all pages on your website. Our AI-powered crawler discovers pages, analyzes content, checks for SEO issues, and maps your site's structure.

What the crawler analyzes

Page titles and meta descriptions
Heading structure (H1-H6)
Internal linking structure
Image alt text and optimization
Page speed and performance
Mobile responsiveness
Schema markup validation
Broken links and redirects

Crawler configuration options

1. Set crawl limits

Configure how many pages to crawl and how deep to follow links; depth is measured in link hops from the start page. This prevents excessive resource usage on large sites. The sketch after the recommended settings shows how both limits bound a crawl.

Recommended settings:

  • Small sites (< 100 pages): No limit, crawl all pages
  • Medium sites (100-1000 pages): Max 500 pages, depth 3
  • Large sites (> 1000 pages): Max 1000 pages, depth 4
  • E-commerce sites: Focus on product/category pages first
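
If you want to see how the two limits interact, here is a minimal sketch of a page-capped, depth-limited crawl. It is an illustration only, not our crawler's code: the crawl function, its parameters, and the requests/BeautifulSoup tooling are assumptions chosen for clarity.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(start_url, max_pages=500, max_depth=3):
    """Breadth-first crawl bounded by a page cap and a link depth."""
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth in link hops from the start page)
    results = []

    while queue and len(results) < max_pages:
        url, depth = queue.popleft()
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # skip pages that time out or refuse the connection
        results.append(url)

        if depth >= max_depth:
            continue  # record the page, but don't follow its links any deeper

        soup = BeautifulSoup(response.text, "html.parser")
        for link in soup.find_all("a", href=True):
            next_url = urljoin(url, link["href"])
            # stay on the same host and avoid revisiting pages
            if urlparse(next_url).netloc == urlparse(start_url).netloc and next_url not in seen:
                seen.add(next_url)
                queue.append((next_url, depth + 1))

    return results
```
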
2. JavaScript rendering

Enable JavaScript rendering for sites built with React, Vue, Angular, or other JavaScript frameworks. This ensures the crawler sees the fully rendered content.

When to enable:

  • Single Page Applications (SPAs)
  • Sites using React, Vue, or Angular
  • Content loaded via JavaScript
  • Dynamic content based on user interactions
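
To check what a rendering crawler sees on a JavaScript-heavy page, you can render it yourself with a headless browser. The sketch below uses Playwright purely for illustration; it is not necessarily the engine our crawler uses, and fetch_rendered_html is a hypothetical helper name.

```python
from playwright.sync_api import sync_playwright

def fetch_rendered_html(url):
    """Load a page in headless Chromium and return the HTML after JavaScript runs."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for JS-driven requests to settle
        html = page.content()                     # fully rendered DOM, not the raw source
        browser.close()
    return html

print(fetch_rendered_html("https://example.com"))  # replace with a page on your site
```
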
3. Respect robots.txt

Our crawler always respects robots.txt directives. You can also configure a custom crawl delay to reduce the load on your server.

Robots.txt compliance: we never crawl disallowed paths.
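
To preview how your robots.txt will be interpreted, Python's standard-library parser applies the same Disallow and Crawl-delay rules. This is a generic illustration, not our crawler's internal code, and the example domain and path are placeholders.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "AISEOTurbo/1.0"

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # replace with your own domain
rp.read()

# Would this user agent be allowed to fetch a given path?
print(rp.can_fetch(USER_AGENT, "https://example.com/private/page.html"))

# Any Crawl-delay directive (in seconds) set for this user agent, or None
print(rp.crawl_delay(USER_AGENT))
```
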
4. Custom user agent

Set a custom user agent string for your crawls. This helps with tracking and can be useful for sites that serve different content based on user agent.

AISEOTurbo/1.0 (https://aiseoturbo.com/bot)
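
If you want to confirm that your server or firewall accepts requests with this user agent, you can reproduce one yourself. The snippet below is a sketch using the requests library; the URL is a placeholder, and you should substitute your custom user agent string if you have configured one.

```python
import requests

# The user agent shown above; swap in your custom string if you have set one
USER_AGENT = "AISEOTurbo/1.0 (https://aiseoturbo.com/bot)"

response = requests.get(
    "https://example.com/",  # replace with a page on your site
    headers={"User-Agent": USER_AGENT},
    timeout=10,
)
# A 403 or 503 here often means a firewall rule is blocking the crawler's user agent
print(response.status_code)
```
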
5. Sitemap integration

Automatically discover and crawl URLs from your XML sitemap. This ensures we don't miss important pages and provides a complete site audit.

Pro tip:

Submit your sitemap to Google Search Console and Bing Webmaster Tools for better crawling coverage.
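
To see which URLs the crawler can discover from your sitemap, you can list its <loc> entries yourself. This is plain Python for illustration (sitemap_urls is a hypothetical helper and the sitemap URL is a placeholder), not part of our product's API.

```python
import xml.etree.ElementTree as ET

import requests

# XML namespace used by standard sitemaps
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url):
    """Return every <loc> URL listed in an XML sitemap (or sitemap index)."""
    xml = requests.get(sitemap_url, timeout=10).text
    root = ET.fromstring(xml)
    return [loc.text for loc in root.findall(".//sm:loc", SITEMAP_NS)]

urls = sitemap_urls("https://example.com/sitemap.xml")  # replace with your sitemap
print(f"{len(urls)} URLs found")
```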

Crawler best practices

Optimization Tips

  • Run crawls during off-peak hours
  • Start with a small sample before full crawl
  • Use crawl delays for large sites
  • Monitor server resources during crawl
  • Schedule regular crawls (weekly/monthly)

Common Issues

  • Crawl timeouts on large sites
  • JavaScript rendering failures
  • Blocked by security firewalls
  • Missing sitemap submissions
  • Incorrect robots.txt directives

Troubleshooting crawler issues

Crawl not starting

Check that your site is publicly accessible and not behind a login, and that your robots.txt allows crawling.

Incomplete crawl results

Increase crawl limits or check for JavaScript-heavy content that needs rendering enabled.

Performance monitoring

Monitor crawl progress in real time. Average crawl speed is 50-100 pages per minute, so a 1,000-page crawl typically finishes in about 10-20 minutes.

Ready to configure your crawler?

Start with our recommended settings and adjust based on your site's specific needs. Contact support if you need help with complex configurations.