SiteGuru Crawler
You may find requests from our crawler in your log files. You'll recognize them by the user agent SiteGuruCrawler. If you're wondering why, you've come to the right place.
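If you want to confirm which requests came from our crawler, you can filter your access log for that user agent string. Here's a small Python sketch; the log path is just an example and depends on your server setup.

# Print every access-log line produced by SiteGuruCrawler.
# The path below is only an example; adjust it to your own log location.
with open("/var/log/nginx/access.log") as log:
    for line in log:
        if "SiteGuruCrawler" in line:
            print(line.rstrip())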
What is SiteGuru?
SiteGuru is an SEO tool that helps website owners find and fix issues on their sites. To find these issues, our servers need to crawl your pages.
Why am I seeing SiteGuru Crawler in my log files?
There are two possible reasons. Someone (possibly you, but not necessarily) may have added your site to SiteGuru.co to monitor how it is doing from an SEO perspective. In that case, our server tries to find as many pages on your site as possible and checks them for issues.
Alternatively, your website may be linked to from a website that SiteGuru is checking. One of the issues we check for is broken links, so if a site we're checking links to your site, we'll visit that link to see whether it works.
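To illustrate what a broken-link check boils down to: request the link and look at the HTTP status code. The Python sketch below is a simplified illustration, not our actual crawler code, and the URL is a placeholder.

import urllib.request
import urllib.error

def link_works(url):
    # A link counts as working if the server answers with a non-error status.
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError):
        # Unreachable hosts, timeouts, error responses, and malformed URLs
        # all count as broken links.
        return False

print(link_works("https://example.com/"))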
What are you checking for?
We only check for things that are publicly available. We use the HTML source to check meta descriptions, page titles, and missing alt texts. We also use:
- The W3C Validator to check your HTML for validity
- Google's PageSpeed Insights API to check your page speed
We only use these two when we're checking your site, not when we're just checking for broken links.
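To give an idea of what such an HTML check looks at, here is a simplified Python sketch that extracts the page title, the meta description, and a count of images without alt text. It's illustrative only, not our production code.

from html.parser import HTMLParser

class PageCheck(HTMLParser):
    # Collects the page title, the meta description, and a count of
    # img tags that have no alt attribute.
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta_description = None
        self.images_without_alt = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag == "img" and not attrs.get("alt"):
            self.images_without_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

check = PageCheck()
check.feed("<html><head><title>Home</title></head>"
           "<body><img src='logo.png'></body></html>")
print(check.title, check.meta_description, check.images_without_alt)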
Should I be worried?
Not at all. We only crawl content that is available to everyone, and we never submit any forms. We only visit pages that are in your sitemap or linked to from another page.
To limit the effect on your server and bandwidth, we cap the number of concurrent requests, so a normal server should have no trouble handling this.
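For illustration, here is a simplified Python sketch of such a concurrency cap; the URLs and the limit of three parallel requests are placeholders, not our actual settings.

import concurrent.futures
import urllib.request

# Placeholder URLs; a real crawl would take these from the sitemap or page links.
URLS = ["https://example.com/", "https://example.com/about"]

def fetch(url):
    with urllib.request.urlopen(url, timeout=10) as response:
        return url, response.status

# max_workers caps how many requests run at the same time,
# which keeps the load on the target server low.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    for url, status in pool.map(fetch, URLS):
        print(url, status)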
Allowing SiteGuru to crawl your site
Are you blocking all crawlers from your site and looking for a way to let SiteGuru in? Here's what to add to your robots.txt file:
User-agent: SiteGuruCrawler
Allow: /
How do I block SiteGuru from accessing my site?
If you don't want our crawler to access your site, you can add this to your robots.txt file:
User-agent: SiteGuruCrawler
Disallow: /
Our crawler will respect these instructions and won't access your site.
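If you want to verify the effect of these rules yourself, you can test them with Python's built-in robots.txt parser; example.com stands in for your own domain.

from urllib import robotparser

rules = robotparser.RobotFileParser()
rules.set_url("https://example.com/robots.txt")
rules.read()

# With "Disallow: /" in place for SiteGuruCrawler, this prints False.
print(rules.can_fetch("SiteGuruCrawler", "https://example.com/some-page"))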
I still have questions
Feel free to contact us if anything is not clear.