iCjobs iCcrawler
 ICjobs Information for Webmasters

ICCrawler - iCjobs: iCjobs Web Crawler

ICCrawler - iCjobs is a specialized web-crawling robot. It collects documents from the web to build a searchable index for the iCjobs search engine. On this page, you'll find answers to the most commonly asked questions about how our web crawler works.



What is iCjobs®?

iCjobs is a job search engine that searches for job postings DIRECTLY on company websites.

What is special about ICjobs, and why do you find more, and also different, job postings in ICjobs than on conventional job boards?

On classic job boards you find job ads that companies usually pay to place. For cost reasons, however, only a limited number of companies advertise, or very often not ALL of a company's open positions are posted. Most companies do, however, list ALL remaining open positions directly on their own website.

This is exactly where ICjobs® comes in. The advantage of the novel ICjobs® search technology:
Companies do not first have to register their website (at significant cost), as is customary with most conventional job boards.
Using its own search technology (ICCrawler), ICjobs® finds these job postings fully automatically and then displays them at www.iCjobs.de. This also allows ICjobs to draw attention to small and medium-sized businesses, and thus to cover all occupational groups! ICjobs offers these companies additional low-cost services that optimize the search for the right applicant for each job posting and help them stand out from the competition.


 Frequently Asked Questions

  1. How often will ICCrawler - ICjobs access my webpages?
  2. How do I request that ICjobs not crawl parts or all of my site?
  3. Why is ICCrawler - ICjobs asking for a file called robots.txt that isn't on my server?
  4. Why is ICCrawler - ICjobs trying to download incorrect links from my server? Or from a server that doesn't exist?
  5. Why is ICCrawler - ICjobs downloading information from our "secret" web server?
  6. Why are there hits from multiple machines at ICjobs.de, all with user-agent ICCrawler - ICjobs?
  7. Why is ICCrawler - ICjobs downloading the same page on my site multiple times?
  8. Why don't the pages of my site that ICCrawler - ICjobs crawled show up in your index?
  9. What kinds of links does ICCrawler - ICjobs follow?
  10. My ICCrawler - ICjobs question isn't answered here. Where should I send it?
 Answers

1. How often will ICCrawler - iCjobs (user agent) access my web pages?

The default download delay of ICCrawler - iCjobs is 2 (two) seconds. It will never access documents more frequently than that. Network conditions and/or our server load can only increase this delay.
ICCrawler - iCjobs also honors the robots.txt directive "crawl-delay", which lets individual sites set their own maximum access rate. However, ICCrawler - iCjobs does not crawl sites that require a crawl-delay of 30 seconds or more.
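
The delay rules above can be summarized in a few lines of Python. This is only an illustrative sketch of the policy as described on this page, not the crawler's actual code; the function and constant names are hypothetical.

```python
from typing import Optional

DEFAULT_DELAY_SECONDS = 2    # default download delay stated above
SKIP_THRESHOLD_SECONDS = 30  # sites requiring this much or more are not crawled


def effective_delay(crawl_delay: Optional[float]) -> Optional[float]:
    """Return the minimum wait between requests, or None if the site is skipped.

    crawl_delay is the value of the site's "crawl-delay" directive,
    or None when robots.txt does not set one.
    """
    if crawl_delay is None:
        return DEFAULT_DELAY_SECONDS
    if crawl_delay >= SKIP_THRESHOLD_SECONDS:
        return None  # the site is not crawled at all
    # Assumption: a smaller crawl-delay never speeds the crawler up,
    # because the 2-second default is described as a lower bound.
    return max(DEFAULT_DELAY_SECONDS, crawl_delay)
```

For example, a site with a crawl-delay of 10 would be visited at most once every 10 seconds, while a site with a crawl-delay of 30 would be skipped.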

2. How do I request that iCjobs not crawl parts or all of my site?

robots.txt is a standard document that can tell ICCrawler - ICjobs not to download some or all information from your web server. The format of the robots.txt file is specified in the Robots Exclusion Standard. Remember, changes to your server's robots.txt file won't be immediately reflected in ICjobs; they'll be discovered and propagated when ICCrawler - ICjobs next crawls your site. To address our crawler in robots.txt, use the following user-agent line:

User-agent: ICCrawler - iCjobs
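
If you want to check your rules before publishing them, one option is Python's standard urllib.robotparser module. The sketch below parses a sample robots.txt that uses the user-agent line above; the Disallow path, the Crawl-delay value and the example.com URLs are made up for illustration. It shows how Python's parser reads the file, which may differ in detail from how ICCrawler - iCjobs itself matches rules.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "ICCrawler - iCjobs"

# Hypothetical robots.txt: keep the crawler out of /private/
# and ask for at least 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: ICCrawler - iCjobs
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pages outside /private/ remain crawlable; pages inside it do not.
allowed_jobs = parser.can_fetch(USER_AGENT, "http://www.example.com/jobs/index.html")
allowed_private = parser.can_fetch(USER_AGENT, "http://www.example.com/private/notes.html")
delay = parser.crawl_delay(USER_AGENT)

print(allowed_jobs, allowed_private, delay)
```

To block the whole site instead of one directory, the Disallow line would be "Disallow: /".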

3. Why is ICCrawler - iCjobs asking for a file called robots.txt that isn't on my server?

robots.txt is a standard document that can tell ICCrawler - iCjobs not to download some or all information from your web server. For information on how to create a robots.txt file, see the Robots Exclusion Standard. If you just want to prevent the "file not found" error messages in your web server log, you can create an empty file named robots.txt.

4. Why is ICCrawler - iCjobs trying to download incorrect links from my server? Or from a server that doesn't exist?

It's a given that many links on the web will be broken or outdated at any particular time. Whenever someone publishes an incorrect link to your site (perhaps due to a typo or spelling error) or fails to update links to reflect changes in your server, ICCrawler - ICjobs will try to download an incorrect link from your site. This also explains why you may get hits on a machine that's not even a web server.

5. Why is ICCrawler - iCjobs downloading information from our "secret" web server?

It's almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your "secret" server to another web server, your "secret" URL may appear in the HTTP Referer header and can be stored and published by the other web server in its referrer log. So, if there's a link to your "secret" web server or page anywhere on the web, it's likely that ICCrawler - ICjobs and other web crawlers will find it.

6. Why are there hits from multiple machines at ICjobs.de, all with user-agent ICCrawler - iCjobs?

ICCrawler - iCjobs was designed to be distributed on several machines to improve performance and scale as the web grows. Also, to cut down on bandwidth usage, we run many crawlers on machines located near the sites they're indexing in the network.

7. Why is ICCrawler - iCjobs downloading the same page on my site multiple times?

In general, ICCrawler - iCjobs should only download one copy of each file from your site during a given crawl. Very occasionally the crawler is stopped and restarted, which may cause it to recrawl pages that it's recently retrieved.

8. Why don't the pages of my site that ICCrawler - ICjobs crawled show up in your index?

Don't be alarmed if you can't immediately find documents that ICCrawler - iCjobs has crawled in the ICjobs search engine. Documents are entered into our index soon after being crawled. Occasionally, documents fetched by ICCrawler - ICjobs won't be included for various reasons (e.g. they appear to be duplicates of other pages on the web).

9. What kinds of links does ICCrawler - iCjobs follow?

ICCrawler - iCjobs follows links to all documents of type text/html, provided that its built-in context-based filtering rules allow it to access the given document. ICCrawler - iCjobs will typically access only a small portion of all available documents on any given site. Additionally, we follow META HTTP-EQUIV=Refresh...
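
To see which URLs a page exposes through ordinary links and META refresh tags, a webmaster could use a small script like the one below. It is a rough sketch built on Python's standard html.parser module, not ICCrawler - iCjobs's own link extraction; the class name and the example.com URLs are hypothetical, and the content-type check and context-based filtering mentioned above are not modeled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect targets of <a href> links and META HTTP-EQUIV=Refresh tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "meta" and (attributes.get("http-equiv") or "").lower() == "refresh":
            # The content attribute looks like "5; url=/jobs.html".
            content = attributes.get("content") or ""
            _, _, target = content.partition(";")
            target = target.strip()
            if target.lower().startswith("url="):
                self.links.append(urljoin(self.base_url, target[4:].strip()))


collector = LinkCollector("http://www.example.com/careers/")
collector.feed(
    '<meta http-equiv="Refresh" content="0; URL=/jobs/">'
    '<a href="developer.html">Developer</a>'
)
print(collector.links)
```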

10. My ICCrawler - iCjobs question isn't answered here. Where should I send it?

Please contact us with questions.


Domains used:
icjobs.de | stellenangebote.jobfocus.de | karimba.com | jobchecker.com | stellenangeboten.com | icjobs.com


©2005-2008 Intelligence Competence Center AG