Message Board

Bugs & Development

 'Search IP' Page Broken?
Author: B.Booey   (28 Nov 06 6:49am)
http://www.projecthoneypot.org/search_ip.php

For roughly the last couple of weeks I haven't been able to do IP searches, although they used to work fine. Every time I enter an IP I now get a blank page, with the URL changing to something like this on the blank page...

http://www.projecthoneypot.org/i_0586c53069cea994a8df22063aaec644

Happens in both IE and Firefox btw.
 
 Re: 'Search IP' Page Broken?
Author: M.Nordhoff   (28 Nov 06 7:44am)
It usually works, but yeah, sometimes it doesn't, like when you search for 64.180.144.204:

http://www.projecthoneypot.org/i_9a21c67517a4048afcfb8bc40a6ad810

Strange. Maybe these pages were internally cached incorrectly?

(BTW, the /i_* URL is normal. It's just not normal that the page is blank.)

- The unaffiliated-with-Unspam-and-PHPot Matt

Post Edited (28 Nov 06 8:09am)
 
 Re: 'Search IP' Page Broken?
Author: B.Booey   (29 Nov 06 4:03am)
It isn't occasional though, Matt; every single IP does this for me. Very odd.

There's simply nothing between the BODY tags when viewing the HTML output.

Even holding down CTRL while refreshing IE gives the same blank page so I don't think it's a caching issue - especially with it occurring in both browsers.

Kind of a shame, I really liked that feature.
 
 Re: 'Search IP' Page Broken?
Author: M.Prince   (29 Nov 06 12:16pm)
It's an issue of our database being overloaded. If you get a blank page it's actually an error on our end, and we've just hidden the error codes so it comes up blank.

There are a lot of reasons for the problem, but one of the big ones is that Google decided our PageRank should be higher and they should index us faster. That's usually great news, but in this case it has become a bit overwhelming. When Google, Yahoo, and MSN all start hammering away at the same time, we experience database slowdowns. That means any page that isn't cached may not show up, and any call that has to query the database may result in a white page.

We're working on fixing it. First, we need to upgrade our infrastructure. Since we're not making any money off the service, we're trying to scrape together funds in other places in the Unspam budget to justify the purchase of some more serious machines and a big RAID array to store what has become a massive amount of data. Second, we have asked Google to slow down with the indexing for now. They were requesting 100K+ unique, uncached pages per day; we've throttled that back to what we hope is a more reasonable number.

Hopefully that will fix the issues you were experiencing with the Lookup IP feature. If not, be patient and it should be back online as soon as we figure out our hardware upgrade.

Finally... there are six or seven major cool new features we've been working on in the background. We hope to release them all sometime in January. Some are ready now, but we'd like to make sure the site is in healthy shape before we start cranking on new services. Watch these pages for more details!
 
 Re: 'Search IP' Page Broken?
Author: B.Booey   (30 Nov 06 3:44am)
Thanks for the thorough reply, that certainly explains the situation then.

I run a dynamic database/PHP-driven site as well, and I've taken great pains to block all manner of spiders/bots (including search engines) just to keep from getting overwhelmed by abuse. Yes, even the non-malicious ones, Google included, can be extremely taxing on the server.

My robots.txt blocks everything by default...

User-agent: *
Disallow: /

...and this is part of my current .htaccess file for the rest that don't behave. Keeps me alive at least :) Lowering the mysql.connect_timeout value (or its equivalent depending on the type of database being used) from 60 to 20 secs is highly recommended btw.



Options -Indexes

DirectoryIndex index.php index.html index.htm default.htm index.php3 index.phtml index.php5 index.shtml mwindex.phtml

php_flag register_globals off

php_value max_input_time "5000"
php_value memory_limit "20M"
php_value post_max_size "41M"
php_value upload_max_filesize "40M"
php_value mysql.connect_timeout "20"

SetEnvIfNoCase X-moz prefetch bad
SetEnvIfNoCase HTTP:x-moz prefetch bad

SetEnvIf user-agent "^$" bad
SetEnvIfNoCase user-agent "192\.comAgent" bad
SetEnvIfNoCase user-agent "1-More Scanner" bad
SetEnvIfNoCase user-agent "AltaVista" bad
SetEnvIfNoCase user-agent "Appie" bad
SetEnvIfNoCase user-agent "Arach" bad
SetEnvIfNoCase user-agent "Archite" bad
SetEnvIfNoCase user-agent "beholder" bad
SetEnvIfNoCase user-agent "bot" bad
SetEnvIfNoCase user-agent "ccubee" bad
SetEnvIfNoCase user-agent "CherryPicker" bad
SetEnvIfNoCase user-agent "cosmos" bad
SetEnvIfNoCase user-agent "curl" bad
SetEnvIfNoCase user-agent "Crawl" bad
SetEnvIfNoCase user-agent "Crescent" bad
SetEnvIfNoCase user-agent "disco" bad
SetEnvIfNoCase user-agent "dragonfly" bad
SetEnvIfNoCase user-agent "Drupal" bad
SetEnvIfNoCase user-agent "GNUTLS" bad
SetEnvIfNoCase user-agent "Go2" bad
SetEnvIfNoCase user-agent "Gulliver" bad
SetEnvIfNoCase user-agent "earthcom" bad
SetEnvIfNoCase user-agent "email" bad
SetEnvIfNoCase user-agent "Excite" bad
SetEnvIfNoCase user-agent "Extractor" bad
SetEnvIfNoCase user-agent "EZResult" bad
SetEnvIfNoCase user-agent "fetch" bad
SetEnvIfNoCase user-agent "findlinks" bad
SetEnvIfNoCase user-agent "forex" bad
SetEnvIfNoCase user-agent "FrontPage" bad
SetEnvIfNoCase user-agent "Hotzonu" bad
SetEnvIfNoCase user-agent "HTTrack" bad
SetEnvIfNoCase user-agent "ia_archiver" bad
SetEnvIfNoCase user-agent "IEAutoDiscovery" bad
SetEnvIfNoCase user-agent "Indy" bad
SetEnvIfNoCase user-agent "Informant" bad
SetEnvIfNoCase user-agent "InfoSeek" bad
SetEnvIfNoCase user-agent "Inktomi" bad
SetEnvIfNoCase user-agent "internetseer" bad
SetEnvIfNoCase user-agent "Java" bad
SetEnvIfNoCase user-agent "jeeves" bad
SetEnvIfNoCase user-agent "kastaneta" bad
SetEnvIfNoCase user-agent "KIT-Fireball" bad
SetEnvIfNoCase user-agent "Kolibri" bad
SetEnvIfNoCase user-agent "ksoap" bad
SetEnvIfNoCase user-agent "larbin" bad
SetEnvIfNoCase user-agent "libwww" bad
SetEnvIfNoCase user-agent "lwp" bad
SetEnvIfNoCase user-agent "Lycos" bad
SetEnvIfNoCase user-agent "Lynx" bad
SetEnvIfNoCase user-agent "Mediapartners-Google" bad
SetEnvIfNoCase user-agent "Mercator" bad
SetEnvIfNoCase user-agent "Meta" bad
SetEnvIfNoCase user-agent "Microsoft URL Control" bad
SetEnvIfNoCase user-agent "miniRank" bad
SetEnvIfNoCase user-agent "Missigua" bad
SetEnvIfNoCase user-agent "Muscat" bad
SetEnvIfNoCase user-agent "Newt" bad
SetEnvIfNoCase user-agent "NG/2\.0" bad
SetEnvIfNoCase user-agent "NICErsPRO" bad
SetEnvIfNoCase user-agent "nikto" bad
SetEnvIfNoCase user-agent "Nutch" bad
SetEnvIfNoCase user-agent "OnetSzukaj" bad
SetEnvIfNoCase user-agent "Pogodak" bad
SetEnvIfNoCase user-agent "Profile/MIDP" bad
SetEnvIfNoCase User-Agent "psycheclone" bad
SetEnvIfNoCase user-agent "ReGet" bad
SetEnvIfNoCase user-agent "robozilla" bad
SetEnvIfNoCase user-agent "sbider" bad
SetEnvIfNoCase user-agent "Scooter" bad
SetEnvIfNoCase user-agent "scrubby" bad
SetEnvIfNoCase user-Agent "search" bad
SetEnvIfNoCase user-agent "Slurp" bad
SetEnvIfNoCase user-agent "Snoopy" bad
SetEnvIfNoCase user-agent "Sphere Scout" bad
SetEnvIfNoCase user-agent "spider" bad
SetEnvIfNoCase user-agent "StackRambler" bad
SetEnvIfNoCase user-agent "Stripper" bad
SetEnvIfNoCase user-agent "teleport" bad
SetEnvIfNoCase user-agent "teoma" bad
SetEnvIfNoCase user-agent "T-H-U-N-D-E-R-S-T-O-N-E" bad
SetEnvIfNoCase user-agent "UltraSeek" bad
SetEnvIfNoCase user-agent "UrlChecker" bad
SetEnvIfNoCase user-agent "Url Control" bad
SetEnvIfNoCase user-agent "urllib" bad
SetEnvIfNoCase user-agent "User-Agent" bad
SetEnvIfNoCase user-agent "Vampire" bad
SetEnvIfNoCase user-agent "voyager/" bad
SetEnvIfNoCase user-agent "wget" bad
SetEnvIfNoCase user-agent "WebBandit " bad
SetEnvIfNoCase user-agent "WebCopier" bad
SetEnvIfNoCase user-agent "Webzip" bad
SetEnvIfNoCase user-agent "Whisker" bad
SetEnvIfNoCase user-agent "Widow" bad
SetEnvIfNoCase user-agent "Windowns CE" bad
SetEnvIfNoCase user-agent "\.([0-9]+); Windows XP\)" bad
SetEnvIfNoCase user-agent "WiseWire" bad
SetEnvIfNoCase user-agent "Yandex" bad
SetEnvIfNoCase user-agent "Yahoo" bad
SetEnvIfNoCase user-agent "zeus" bad
SetEnvIfNoCase user-agent "dts agent" bad

SetEnvIfNoCase Request_URI "cmd\.exe" bad
SetEnvIfNoCase Request_URI "root\.exe" bad
SetEnvIfNoCase Request_URI "/_vti_bin/" bad
SetEnvIfNoCase Request_URI "/_mem_bin/" bad
SetEnvIfNoCase Request_URI "/msadc/" bad
SetEnvIfNoCase Request_URI "/MSADC/" bad
SetEnvIfNoCase Request_URI "/winnt/" bad
SetEnvIfNoCase Request_URI "/x90/" bad
SetEnvIfNoCase Request_URI "siteinfo\.xml" bad
SetEnvIfNoCase Request_URI "default\.ida" bad
SetEnvIfNoCase Request_URI "Admin\.dll" bad
SetEnvIfNoCase Request_URI "_vti_inf\.html" bad
SetEnvIfNoCase Request_URI "nsiislog\.dll" bad
SetEnvIfNoCase Request_URI "cltreq\.asp" bad
SetEnvIfNoCase Request_URI "xmlrpc\.php" bad
SetEnvIfNoCase Request_URI "wget" bad
SetEnvIfNoCase Request_URI "\.rar\.cc" bad

Order Allow,deny
Allow from all
deny from env=bad
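
For anyone unfamiliar with SetEnvIfNoCase: each rule above is a case-insensitive regular expression that is searched for anywhere in the named request header, and a match sets the `bad` variable that the final `deny from env=bad` line acts on. Here is a rough Python sketch of that matching, using only a handful of the patterns from the list. The function name `is_blocked` and the sample user-agent strings are illustrative assumptions, and Python's regex engine is only an approximation of Apache's.

```python
import re

# A small subset of the user-agent patterns from the .htaccess above.
PATTERNS = ["^$", "bot", "Crawl", "curl", "libwww", "spider", "wget"]

# SetEnvIfNoCase treats each pattern as a case-insensitive regex.
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def is_blocked(user_agent: str) -> bool:
    """Return True if any pattern matches anywhere in the header value."""
    return any(rx.search(user_agent) for rx in COMPILED)


if __name__ == "__main__":
    # Hypothetical sample headers, chosen only to exercise the patterns.
    samples = [
        "",
        "Wget/1.10.2",
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0",
    ]
    for ua in samples:
        print(repr(ua), "->", "blocked" if is_blocked(ua) else "allowed")
```

Because the patterns are unanchored substrings, a short one like "bot" is very broad: in this sketch it also matches a Googlebot-style user agent, so a list like this probably needs pruning before any search engine is let back in.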
 
 Re: 'Search IP' Page Broken?
Author: M.Nordhoff   (5 Dec 06 5:59pm)
Thanks for the explanation, M.Prince. :)

If you set up a Google Webmaster Tools account, you can set it to crawl the site more slowly.

https://www.google.com/webmasters/tools/

I don't know of any similar things for other search engines.

Post Edited (5 Dec 06 6:01pm)
 
 Re: 'Search IP' Page Broken?
Author: B.Booey   (23 Dec 06 11:56am)
You can also throttle the crawl rate in robots.txt, and AFAIK Googlebot will obey it. Here's what I'm currently using; I have allowed just one search engine on my site, Google:


User-agent: Fasterfox
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Crawl-delay: 5
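
To put a Crawl-delay of 5 in perspective, here is some back-of-the-envelope arithmetic. It assumes Crawl-delay is read as a minimum number of seconds between requests from one crawler (the usual reading among crawlers that support it, though the directive is not part of the original robots.txt standard), and it uses the 100K+ pages per day figure quoted earlier in this thread. The function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_requests_per_day(crawl_delay_seconds: float) -> float:
    """Upper bound on daily requests from one crawler honoring the delay."""
    return SECONDS_PER_DAY / crawl_delay_seconds


def average_interval_seconds(requests_per_day: float) -> float:
    """Average gap between requests at a given daily volume."""
    return SECONDS_PER_DAY / requests_per_day


if __name__ == "__main__":
    print(max_requests_per_day(5))            # 17280.0
    print(average_interval_seconds(100_000))  # 0.864
```

So a crawler that honors a 5 second delay tops out at roughly 17K requests a day, while 100K requests a day works out to more than one request per second on average.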
 
 Re: 'Search IP' Page Broken?
Author: M.Nordhoff   (26 Dec 06 11:16am)
B.Booey:

It'll obey Crawl-delay when it's in a separate section? IIRC, according to the robots.txt spec, Googlebot should read down to the "User-agent: Googlebot" section, obey that Disallow and ignore the rest.
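
The group-matching behaviour described here can be illustrated with Python's standard `urllib.robotparser`. This is only a sketch: Python's parser is not Googlebot's, the crawler name `SomeOtherBot` is made up, and the file below is a variant of the one posted above with Crawl-delay placed directly inside the `User-agent: *` group (no blank line before it), since a blank line ends a group for this parser.

```python
from urllib.robotparser import RobotFileParser

# Variant of the posted robots.txt; Crawl-delay sits inside the "*" group.
ROBOTS_TXT = """\
User-agent: Fasterfox
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
Crawl-delay: 5
"""


def load(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = load(ROBOTS_TXT)
    for agent in ("Googlebot", "SomeOtherBot"):
        # can_fetch: may this agent request the path?
        # crawl_delay: delay from the group that applies to this agent, or None.
        print(agent, rp.can_fetch(agent, "/page"), rp.crawl_delay(agent))
```

With this parser, Googlebot matches its own group, so it is allowed everywhere and gets no delay at all, while an unnamed crawler falls through to the `*` group, where it is disallowed and sees the delay of 5.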
 
 Re: 'Search IP' Page Broken?
Author: B.Booey   (1 Jan 07 1:14pm)
I've researched the issue and apparently Google is one of the few major search engines to ignore the crawl-delay line, which is rather unfortunate.
 
 Re: 'Search IP' Page Broken?
Author: M.Nordhoff   (2 Jan 07 11:29pm)
It is unfortunate, but as I mentioned above, you can use a Google Webmaster Tools account to tell it to slow down.

Edit: If this post sounds harsh or anything, it really isn't meant to.

Post Edited (2 Jan 07 11:34pm)
 
 Re: 'Search IP' Page Broken?
Author: B.Booey   (4 Jan 07 2:50am)
Yes thanks, I suppose I'll need to create a Google Webmaster Tools account to gain more control over this, which I find a bit silly but there's no other choice.

I've also noticed the default crawling rate is rather erratic, which I find quite odd. Sometimes it will wait for 10-20 secs while other times it hits rapidly once every couple secs.
 
 Re: 'Search IP' Page Broken?
Author: B.Booey   (6 Jan 07 1:32am)
Just unbelievable, the Webmaster Tools only offered me 2 choices for controlling crawl rate...

Normal: Recommended crawl rate.

Slower: A slower crawl will reduce Googlebot's traffic on your server, but we may not be able to crawl your site as often.

But what's really annoying is that the setting expires every 90 days and requires the webmaster to keep logging in and changing it. Grrrrr.
 
 Re: 'Search IP' Page Broken?
Author: M.Nordhoff   (6 Jan 07 10:34pm)
Oh, I didn't know it expired. That sucks. Googlebot should really obey Crawl-Delay.

(A little Googling suggests it might, but nothing on Google's site says it does. Hm.)
 
 Re: 'Search IP' Page Broken?
Author: B.Booey   (7 Jan 07 12:47am)
They explicitly stated on their Webmaster Tools page, after I had tested my robots.txt file there, that crawl-delay is ignored. So anything else would have to be FUD.
 
 Re: 'Search IP' Page Broken?
Author: M.Nordhoff   (7 Jan 07 6:25pm)
Okay.


