ICCrawler - iCjobs: iCjobs Web Crawler
ICCrawler - iCjobs is a specialized web-crawling robot. It collects documents from
the web to build a searchable index for the iCjobs search
engine. On this page, you'll find answers to the most commonly asked
questions about how our web crawler works.

1. How often will ICCrawler - iCjobs (user agent) access my web pages?
The default download delay of ICCrawler - iCjobs is 2 (two) seconds. It will never access any documents more frequently than that. Network conditions and/or our server load can only increase this delay.

2. How do I request that iCjobs not crawl parts or all of my site?
robots.txt is a standard document that can tell ICCrawler - iCjobs not to download some or all information from your web server.
The format of the robots.txt file is specified in the
Robot Exclusion Standard.
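
As a rough illustration, the sketch below uses Python's standard urllib.robotparser module to show how a robots.txt file might be evaluated for this crawler. The User-agent token "ICCrawler", the example.com URLs, and the helper name is_allowed are assumptions made for the example. This page does not document the exact token the crawler matches, and the code is not iCjobs's own parsing logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep ICCrawler out of /private/, allow everything else.
# The token "ICCrawler" is an assumption; this page does not state the exact token.
ROBOTS_TXT = """\
User-agent: ICCrawler
Disallow: /private/

User-agent: *
Disallow:
"""

# User agent string as named on this page.
AGENT = "ICCrawler - iCjobs"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Blocked by the "Disallow: /private/" rule.
    print(is_allowed(ROBOTS_TXT, AGENT, "http://www.example.com/private/report.html"))  # False
    # No rule matches this path, so it is allowed.
    print(is_allowed(ROBOTS_TXT, AGENT, "http://www.example.com/jobs/index.html"))  # True
```

Checking a draft file this way before you upload it can catch rules that block more (or less) than you intended.
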
Remember, changes to your server's robots.txt file won't be immediately reflected in iCjobs; they'll be discovered and propagated when ICCrawler - iCjobs next crawls your site.

3. Why is ICCrawler - iCjobs asking for a file called robots.txt that isn't on my server?
robots.txt is a standard document that can tell ICCrawler - iCjobs not to download some or all information from your web server. For information on how to create a robots.txt file, see The Robot Exclusion Standard. If you just want to prevent the "file not found" error messages in your web server log, you can create an empty file named robots.txt.

4. Why is ICCrawler - iCjobs trying to download incorrect links from my server? Or from a server that doesn't exist?
Many links on the web are broken or outdated at any given time. Whenever someone publishes an incorrect link to your site (perhaps due to a typo or spelling error) or fails to update links to reflect changes on your server, ICCrawler - iCjobs will try to download that incorrect link from your site. This also explains why you may get hits on a machine that's not even a web server.

5. Why is ICCrawler - iCjobs downloading information from our "secret" web server?
It's almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your "secret" server to another web server, your "secret" URL may appear in the HTTP Referer header and can be stored and published by the other web server in its referrer log. So, if there's a link to your "secret" web server or page anywhere on the web, it's likely that ICCrawler - iCjobs and other web crawlers will find it.

6. Why are there hits from multiple machines at icjobs.de, all with user-agent ICCrawler - iCjobs?
ICCrawler - iCjobs was designed to be distributed across several machines to improve performance and to scale as the web grows. Also, to cut down on bandwidth usage, we run many crawlers on machines located near the sites they're indexing in the network.

7. Why is ICCrawler - iCjobs downloading the same page on my site multiple times?
In general, ICCrawler - iCjobs should download only one copy of each file from your site during a given crawl. Very occasionally the crawler is stopped and restarted, which may cause it to recrawl pages that it has recently retrieved.

8. Why don't the pages of my site that ICCrawler - iCjobs crawled show up in your index?
Don't be alarmed if you can't immediately find documents that ICCrawler - iCjobs has crawled in the iCjobs search engine. Documents are entered into our index soon after being crawled. Occasionally, documents fetched by ICCrawler - iCjobs won't be included for various reasons (e.g. they appear to be duplicates of other pages on the web).

9. What kinds of links does ICCrawler - iCjobs follow?
ICCrawler - iCjobs follows links to all documents of type text/html, provided that its built-in context-based filtering rules allow it to access the given document. ICCrawler - iCjobs will typically access only a small portion of all available documents on any given site. Additionally, we follow META HTTP-EQUIV=Refresh...

10. My ICCrawler - iCjobs question isn't answered here. Where should I send it?
Please contact us with questions.

Used domains: icjobs.de | stellenangebote.jobfocus.de | karimba.com | jobchecker.com | stellenangeboten.com | icjobs.com
©2005-2008 Intelligence Competence Center AG