Proximic Spider

A spider, also known as a crawler or a bot, is the software Proximic uses to visit and access the content of webpages.

Proximic's spiders:

  • are friendly and identify themselves,
  • only download the static, textual content,
  • honor the rules in robots.txt,
  • don't execute JavaScript, so they don't e.g. generate ad impressions,
  • crawl at a slow rate by default.

Spider FAQ

Here are answers to the most common questions.
If you need to know more, please contact us.

Proximic's content analysis enables advertising partners to determine the best matching campaign for a page's content to achieve the highest CPM for you as a publisher. Proximic works with many advertising partners and it is very likely that one of them is serving ads on your site.
When an ad is about to be served, the spider crawls the page. Our system then processes the content on the page and provides the page-level analysis to the requesting advertiser. How often a page is crawled depends on many factors, such as the type of content, how often the content changes and the number of ad elements, so the crawl frequency varies from site to site.

Sites may also be crawled in a linear fashion to provide site-level analysis to advertising partners who are interested in a specific site.

The spider identifies itself with the user-agent:
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)
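As an illustration only, a publisher could recognize the spider in their own request handling by looking for the "proximic" token in the User-Agent header. This is a minimal sketch: the function name is hypothetical, and it assumes a plain case-insensitive substring match is sufficient. Note that any client can send an arbitrary User-Agent, so a match is not proof of origin.

```python
# Token taken from the user-agent string quoted above.
PROXIMIC_TOKEN = "proximic"


def is_proximic_spider(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the proximic token.

    Hypothetical helper: a simple case-insensitive substring check.
    """
    return PROXIMIC_TOKEN in user_agent.lower()


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)"
    print(is_proximic_spider(ua))
```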

Large publishers like PC World explicitly allow our spiders to crawl their content. Publishers benefit from our analysis and gain deep insights into their inventory to optimize direct sales or accurately target campaigns.

To whitelist our spiders, please add a separate paragraph to your robots.txt like this:
User-agent: proximic
Disallow:
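To check locally that such a paragraph has the intended effect, the rules can be evaluated with Python's standard urllib.robotparser module. The sketch below is only an illustration: the robots.txt content, the "otherbot" agent and the page URL are made-up examples, and Python's parser is not necessarily identical to the one the spider uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks all crawlers, then whitelists "proximic"
# with an empty Disallow line, which means "nothing is disallowed".
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: proximic
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt rules for one user-agent token and one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.domain.com/article.html"  # example URL, an assumption
    print("proximic:", is_allowed(ROBOTS_TXT, "proximic", page))
    print("otherbot:", is_allowed(ROBOTS_TXT, "otherbot", page))
```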

We do not extract or store any source code. We only provide data about the page to our advertising partners, such as the content language, the content's rating (G, PG13, R) and relevant IAB categories of the content (e.g. "Real Estate::Buying/Selling Homes").

This analysis helps the advertiser to place topically relevant campaigns in a safe environment. Relevance drives CPM, which is a win for you.

In general, the spider should not request URLs that do not work on your site. If it does, please contact us and we will find out what is causing it.

Some advertisers strip the URL parameters, which means that a working URL like
www.forum.com/showthread.php?t=123
is reduced to something like
www.forum.com/showthread.php?
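A publisher who wants to spot such requests in their logs could look for request targets that still carry the "?" but no parameters after it. The following is a minimal sketch under that assumption; the function name is hypothetical and the check is a heuristic, not a description of how the parameters are stripped.

```python
def has_stripped_query(request_target: str) -> bool:
    """Return True if the target has a '?' with nothing after it.

    Works on the raw string because URL parsers usually treat
    'page.php?' and 'page.php' as having the same (empty) query.
    """
    _path, separator, query = request_target.partition("?")
    return separator == "?" and query == ""


if __name__ == "__main__":
    for target in ("/showthread.php?t=123", "/showthread.php?", "/showthread.php"):
        print(target, has_stripped_query(target))
```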

We work successfully with many large publishers, so please feel free to contact us if you have any concerns or questions. If you want to keep our spiders from crawling specific parts of your site, please add a separate paragraph to your robots.txt and specify the path you'd like to exclude:
User-agent: proximic
Disallow: /path/
Make sure that your robots.txt is in the correct location. It must be in the top-level directory, e.g. www.domain.com/robots.txt.
Placing the file in a subdirectory won't have any effect. Please also note that the IP addresses used by the spiders change from time to time, and that it may take up to a day for changes to robots.txt to propagate to all of our spiders.
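The location rule can be expressed as a small helper that derives the robots.txt address from any page URL on the same host. This is a sketch only: the function name and the example URL are assumptions, and it follows the general convention that robots.txt lives at the root of the host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the top-level robots.txt URL for the host of page_url.

    The path, query and fragment of the page are discarded; only the
    scheme and host are kept.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Example page in a subdirectory; robots.txt is still looked up at the root.
    print(robots_txt_url("http://www.domain.com/forum/showthread.php?t=123"))
```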