What you'll learn
Configure your site crawler for optimal performance. Learn how to set crawl limits, handle JavaScript-heavy sites, respect robots.txt, and get the most accurate SEO data from your website.
Understanding site crawling
Site crawling is the process of systematically browsing and analyzing all pages on your website. Our AI-powered crawler discovers pages, analyzes content, checks for SEO issues, and maps your site's structure.
What the crawler analyzes
Each crawl covers page discovery, on-page content, common SEO issues, and your site's overall structure.
Crawler configuration options
Set crawl limits
Configure how many pages to crawl and how deep the crawler should follow links. This prevents excessive resource usage on large sites; a minimal sketch of how these limits work follows the recommended settings below.
Recommended settings:
- Small sites (< 100 pages): No limit, crawl all pages
- Medium sites (100-1000 pages): Max 500 pages, depth 3
- Large sites (> 1000 pages): Max 1000 pages, depth 4
- E-commerce sites: Focus on product/category pages first
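The limits themselves are set in the crawler settings, but the sketch below shows what a page and depth limit means in practice. It is an illustrative, standard-library Python example, not our crawler's implementation, and `MAX_PAGES`, `MAX_DEPTH`, and the seed URL are placeholder values.

```python
# Illustrative sketch of page- and depth-limited crawling (standard library only).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

MAX_PAGES = 500   # e.g. the "medium site" recommendation above
MAX_DEPTH = 3


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(seed):
    seen, queue, crawled = {seed}, deque([(seed, 0)]), 0
    host = urlparse(seed).netloc
    while queue and crawled < MAX_PAGES:
        url, depth = queue.popleft()
        html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        crawled += 1
        # ... run SEO checks on `html` here ...
        if depth >= MAX_DEPTH:
            continue  # deep enough: analyze the page but don't follow its links
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            # stay on the same host and skip pages already queued
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return crawled


# crawl("https://example.com/")  # hypothetical seed URL
```

In this sketch, depth counts link hops from the seed URL, so a depth of 3 reaches pages three clicks away from the starting page.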
JavaScript rendering
Enable JavaScript rendering for sites built with React, Vue, Angular, or other JavaScript frameworks. This ensures the crawler sees the fully rendered content rather than just the initial HTML; a short illustration follows the list below.
When to enable:
- Single Page Applications (SPAs)
- Sites using React, Vue, or Angular
- Content loaded via JavaScript
- Dynamic content based on user interactions
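To see why rendering matters: a plain HTTP fetch of a single-page application often returns little more than an empty HTML shell, while a headless browser returns the fully rendered DOM. The sketch below illustrates the difference using the Playwright library; it is an assumption for illustration only, since the product's rendering is enabled from the dashboard rather than written by hand.

```python
# Illustrative comparison of raw HTML vs. JavaScript-rendered HTML.
# Requires: pip install playwright && playwright install chromium
from urllib.request import urlopen
from playwright.sync_api import sync_playwright

URL = "https://example.com/"  # placeholder SPA URL

# 1. Raw fetch: what a non-rendering crawler sees (often just an empty app shell).
raw_html = urlopen(URL, timeout=10).read().decode("utf-8", "ignore")

# 2. Rendered fetch: load the page in headless Chromium and let scripts run.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL, wait_until="networkidle")
    rendered_html = page.content()
    browser.close()

print(len(raw_html), "bytes raw vs", len(rendered_html), "bytes rendered")
```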
Respect robots.txt
Our crawler always respects robots.txt directives. You can also configure a custom crawl delay to further reduce the load on your server.
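For reference, the underlying checks are simple; Python's standard library includes a robots.txt parser, sketched below with a placeholder site and user agent string.

```python
# Sketch: honoring robots.txt rules and crawl-delay with Python's stdlib parser.
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleSEOBot/1.0"  # placeholder user agent
robots = RobotFileParser("https://example.com/robots.txt")
robots.read()

url = "https://example.com/some-page"
if robots.can_fetch(USER_AGENT, url):
    delay = robots.crawl_delay(USER_AGENT) or 1  # fall back to a polite 1s delay
    time.sleep(delay)
    # ... fetch and analyze the page ...
else:
    print("Disallowed by robots.txt, skipping:", url)
```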
Custom user agent
Set a custom user agent string for your crawls. This makes crawler traffic easy to identify in your server logs and is useful for sites that serve different content depending on the user agent.
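At the HTTP level, a custom user agent is just a request header. The snippet below is an illustration with a placeholder bot name, not our crawler's actual user agent string.

```python
# Sketch: sending a custom User-Agent header so crawl traffic is identifiable in logs.
from urllib.request import Request, urlopen

req = Request(
    "https://example.com/",
    headers={"User-Agent": "ExampleSEOBot/1.0 (+https://example.com/bot-info)"},
)
html = urlopen(req, timeout=10).read().decode("utf-8", "ignore")
```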
Sitemap integration
Automatically discover and crawl URLs from your XML sitemap. This ensures we don't miss important pages and provides a complete site audit.
Pro tip:
Submit your sitemap to Google Search Console and Bing Webmaster Tools for better crawling coverage.
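Conceptually, sitemap integration means reading the `<loc>` entries from your XML sitemap and adding them to the crawl queue. The standard-library sketch below uses a placeholder sitemap URL.

```python
# Sketch: pulling URLs from an XML sitemap so no listed page is missed.
from urllib.request import urlopen
from xml.etree import ElementTree

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder location
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

tree = ElementTree.fromstring(urlopen(SITEMAP_URL, timeout=10).read())
urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", NS)]
print(f"{len(urls)} URLs discovered from the sitemap")
```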
Crawler best practices
Optimization Tips
- Run crawls during off-peak hours
- Start with a small sample before a full crawl
- Use crawl delays for large sites
- Monitor server resources during the crawl
- Schedule regular crawls (weekly or monthly)
Common Issues
- Crawl timeouts on large sites
- JavaScript rendering failures
- Blocked by security firewalls
- Missing sitemap submissions
- Incorrect robots.txt directives
Troubleshooting crawler issues
Crawl not starting
Confirm that your site is publicly accessible and not behind a login, and that your robots.txt allows crawling.
Incomplete crawl results
Increase crawl limits or check for JavaScript-heavy content that needs rendering enabled.
Performance monitoring
Monitor crawl progress in real time. Typical crawl speed is 50-100 pages per minute.
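Using the 50-100 pages-per-minute figure above, you can roughly estimate how long a crawl should take before you start it:

```python
# Rough crawl-time estimate based on the 50-100 pages/minute figure above.
def estimate_minutes(pages, pages_per_minute=(50, 100)):
    low, high = pages_per_minute
    return pages / high, pages / low  # (best case, worst case) in minutes

best, worst = estimate_minutes(1000)
print(f"~{best:.0f}-{worst:.0f} minutes for 1000 pages")  # ~10-20 minutes
```

If a crawl runs far longer than this estimate, check the common issues listed above, such as timeouts, rendering failures, or firewall blocks.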
Ready to configure your crawler?
Start with our recommended settings and adjust based on your site's specific needs. Contact support if you need help with complex configurations.