Coming back
on Sunday, September 9, 2007
Those who usually visit my page (yes, there are such people) will have noticed that it has behaved erratically, being inaccessible most of the time. There were two reasons:
The new search engine Cuill crawls pages so aggressively that it squeezed my bandwidth with 1,000 requests a day. Most of them were for nonexistent URLs that its robot seems to generate by combining pieces of different URLs. Eventually my hosting server blocked my page for exceeding the daily bandwidth limit.
One user of one of the 75 most visited websites in Spanish (and roughly the 2,000th most visited website in the world, according to Alexa) hotlinked an image hosted on my page as their forum avatar. Every time someone reads a thread on that forum in which this user has posted, the image is downloaded from my page, milking my bandwidth.
To fix this I did the following:
Redirected my web page to the Coral Content Distribution Network cache. When you visit my page, you are redirected to Coral's copy (this is why it is slower now, but at least it works), while the original page hosted at SDF-eu acts as the source for Coral CDN. If you want to do the same with your web page, you must detect when the request comes from Coral CDN (its proxy sends a User-Agent containing CoralWebPrx) and allow it; every other visitor is redirected to the same URL with .nyud.net appended to the domain name. Using PHP:
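<?php
// Coral's proxy (User-Agent contains "CoralWebPrx") gets the original page
$ua = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
if (strpos($ua, "CoralWebPrx") === false) {
    // Everyone else: same URL on the .nyud.net domain (absolute URL, with scheme)
    header("HTTP/1.1 302 Found");
    header("Location: http://" . $_SERVER["HTTP_HOST"] . ".nyud.net" . $_SERVER["REQUEST_URI"]);
    exit;
}
?>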
If you don't use PHP, you can do it with .htaccess:
RewriteEngine on
# Let Coral's proxy through; redirect (302) everyone else to the Coral copy
RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
RewriteRule ^(.*) http://%{HTTP_HOST}.nyud.net/$1 [R=302,L]
Blocked Cuill. Cuill is a search engine founded by former Googlers that is not yet open to the public; for now it only crawls web pages, so it is known only to those of us who have suffered its robot crawling our sites. That is why I am not interested in being DoSed by this crawler just to be indexed by their search engine. To block Twiceler (Cuill's robot) with PHP:
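<?php
// Refuse Twiceler, Cuill's crawler, with 403 Forbidden
$ua = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
if (strpos($ua, "Twiceler") !== false) {
    header("HTTP/1.1 403 Forbidden");
    exit;
}
?>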
With .htaccess:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Twiceler
RewriteRule .* - [F,L]
If this crawler respected the most basic rules for search engine robots, it could be blocked with robots.txt:
User-Agent: Twiceler
Disallow: /
But it seems I must add one more item to Cuill's list of misbehaviours: it does not appear to honor this protocol.
Blocked hotlinking. Hotlinking means embedding an image or any other file hosted on one site in a page on another site without consent, consuming the first site's bandwidth. It is one of the most common, harmful and widely rejected bad practices, which is why I blocked it. From now on, you won't be able to link to any image hosted on my web page, but you can use the ethical hotlinking service ImgRed, as I do. With .htaccess:
RewriteEngine On
# Only image requests
RewriteCond %{REQUEST_FILENAME} \.(jpg|gif|png)$ [NC]
# Allow requests with no Referer (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Allow my own web page
RewriteCond %{HTTP_REFERER} !h0m3r\.sdf-eu\.org [NC]
# Allow my own web page in Coral CDN
RewriteCond %{HTTP_REFERER} !h0m3r\.sdf-eu\.org\.nyud\.net [NC]
# Allow ImgRed.com
RewriteCond %{HTTP_REFERER} !imgred\.com [NC]
# Allow search engines
RewriteCond %{HTTP_REFERER} !google\. [NC]
RewriteCond %{HTTP_REFERER} !yahoo\. [NC]
# Allow Google cache
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
# Don't rewrite the replacement image itself
RewriteCond %{REQUEST_URI} !^/img/leech\.png$
RewriteRule .* /img/leech.png [L]
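A long chain of RewriteCond lines is easy to misread, so here is a rough PHP sketch of the same decision, for illustration only: Apache serves the images directly, so this is not something you would actually run on the site. It relies on the fact that consecutive RewriteCond lines are ANDed by default, so the RewriteRule fires only when the request is for an image, the Referer is not empty, and the Referer matches none of the allowed patterns. The function name should_block_hotlink is hypothetical.

```php
<?php
// Hypothetical helper mirroring the .htaccess hotlink rule: returns true when
// the request would be answered with /img/leech.png instead of the real file.
function should_block_hotlink($path, $referer)
{
    // Only image requests are affected ([NC] = case-insensitive, hence /i).
    if (!preg_match('/\.(jpg|gif|png)$/i', $path)) {
        return false;
    }
    // An empty Referer (direct visit, privacy settings) is allowed.
    if ($referer === '') {
        return false;
    }
    // If any allowed pattern appears anywhere in the Referer, let it through.
    $allowed = array(
        '/h0m3r\.sdf-eu\.org/i',            // my own page (also matches the .nyud.net copy)
        '/h0m3r\.sdf-eu\.org\.nyud\.net/i', // my own page in Coral CDN
        '/imgred\.com/i',                   // ImgRed
        '/google\./i',                      // search engines
        '/yahoo\./i',
        '/search\?q=cache/i',               // Google cache
    );
    foreach ($allowed as $pattern) {
        if (preg_match($pattern, $referer)) {
            return false;
        }
    }
    // Every condition held, so mod_rewrite would apply the RewriteRule.
    return true;
}

var_dump(should_block_hotlink('/img/avatar.png', 'http://forum.example.com/thread?id=1')); // bool(true)
var_dump(should_block_hotlink('/img/avatar.png', ''));                                     // bool(false)
var_dump(should_block_hotlink('/img/avatar.png', 'http://www.google.com/search?q=x'));     // bool(false)
```

Note that the unanchored h0m3r\.sdf-eu\.org pattern already matches the Coral copy's Referer as well, so the separate Coral condition is redundant but harmless.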
That's all. These measures may affect site performance (especially Coral CDN), but I have no choice as long as my bandwidth is so limited and some people reject a minimal code of ethics. If you want me to move to a better server, you already know what advertisements are for ;-)