Message Board

Installing Honey Pots

 Can I modify the code in the honeypot
Author: A.Doherty   (14 Apr 06 5:08pm)
Specifically, I already use a modified version of the code from www.spider-trap.de to block IPs that don't obey robots.txt (all their further requests get 403 errors), plus a 403 page that lets people de-list their IP via a CAPTCHA test.

I want to append the index.php of the spider trap code to the end of the honeypot PHP page, and was wondering if anyone here would have any issues with this or can see any possible problems.

Also, since one is MPL and the other is GPL, is there a licensing issue with the combined product? (Obviously I can send a copy back if anyone is interested in my messy, messy code.)
 
 Re: Can I modify the code in the honeypot
Author: M.Prince   (1 May 06 6:35pm)
You're welcome to modify the code. If you change it, however, it won't be able to interface back with our servers. We do this to prevent someone from modifying it in such a way that it could corrupt the data or overwhelm us. Our anti-modification systems are minimal, but we would still prefer you respect them.

That said, I don't know why you'd need to modify the script to accomplish what you need to do. On the 403 page, why don't you just include a hidden link to your Project Honey Pot script? Then spiders that get caught by your 403 will follow that link to the script and get caught by the PHPot too. Make sense? Or am I not understanding what you're trying to do?

We'd be interested in the code to block HTTP visitors based on the IP. We've been kicking around something we call HTTP:BL for a while now. We've got the data, we've just been a bit lax in getting the code to implement the blocking end. If you're willing to share, we might be interested in what you're putting together and bundling it with our datafeed.

Thanks!
Matthew.
 
 Re: Can I modify the code in the honeypot
Author: A.Doherty   (3 May 06 8:19pm)
Sorry for the late reply. What I'm trying to do is have a single PHP trap page that both runs your script and adds the visitor's IP to the block list.

The 403 PHP page is just a way for real people to use a CAPTCHA to de-list themselves.

So, to get them to work together, I need to add these lines to your script:

include "settings.php";
include "functions.php";
add_blacklist($_SERVER['REMOTE_ADDR'], $_SERVER['REQUEST_METHOD'], $_SERVER['REQUEST_URI'], $_SERVER['SERVER_PROTOCOL'], $_SERVER['HTTP_REFERER'], $_SERVER['HTTP_USER_AGENT']);

and rename your script to index.php. That isn't strictly necessary, but it makes the trap more effective: it also blocks people who try the hostname/site-scripts/ directory in an attempt to get a directory listing. That directory is the only one a blocked user can still access (otherwise they couldn't get the 403 page after being blocked), it is shared across all the virtual hosts on the physical machine, and it is the only one allowed to execute scripts ;)

I also have a nice mailform.php. It takes the hostname of the URL accessed, looks it up in an external file to find the path to the user's site, then reads sitepath/.htaddress for a list of displayname:email@address entries. When called by itself, it displays a pull-down list of people to e-mail and a form to send (with a CAPTCHA to weed out non-humans).
If called as mailform.php?user=x,
it displays just the form, with no pull-down list, since the user to mail is already selected. That makes it a great replacement for mailto: links.

I'll give you a copy of the lot, or SSH access to the server during an ICQ/MSN/Yahoo/AIM walkthrough if you want an explanation over chat. My contact details are: ICQ# 17003438, MSN alan_ie@hotmail.com, Yahoo gothic_ie, AIM gothic ie.
To verify that you're you, I guess you can reply here with the ID you'll be coming from?
I'm assuming this board is non-spoofable ;)
 
 Re: Can I modify the code in the honeypot
Author: A.Doherty   (3 May 06 8:39pm)
BTW, I notice in your code the line
define('__ROBOT1', '<meta name="robots" content="follow,noarchive">\n<meta name="robots" content="noindex">\n');
surely this should be
define('__ROBOT1', '<meta name="robots" content="follow,noindex">\n<meta name="robots" content="noarchive">\n');

since [no]follow and [no]index are part of the defined standard,
while [no]archive is only supported by a few search engines and may cause the follow to be ignored by anything that only accepts 'defined' meta lines.

A tiny point, but worth changing I think, just to avoid issues.
 
 Re: Can I modify the code in the honeypot
Author: A.Doherty   (4 May 06 4:44am)
Or can I just modify the code, submit it to you, and get it approved, so you can show me how to alter the checksumming? (Or, if that is done server side, you can add the new code's checksum to the allowed list.)
The IP barring will only work with some site designs, not all ;(
It requires a common path, i.e. somepath/site1, somepath/site2,
because it modifies somepath/.htaccess and dynamically adds and removes the deny lines.
The scripts also need to be stored outside somepath/.
BTW, the script directory on mine is a global alias. Can I reference it by more than one name to obfuscate it from spammers?
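
A minimal sketch of the add/remove step described in this post, assuming Apache 2.2-style "deny from" lines in a shared .htaccess. The function names are hypothetical and come from neither spider-trap nor the Project Honey Pot script; they work on the file's text only, so reading and writing the real file is left to the caller.

```php
<?php
// Hypothetical helpers for illustration; not taken from spider-trap or Project Honey Pot.

/** Split .htaccess text into lines, ignoring trailing newlines. */
function htaccess_lines(string $htaccess): array
{
    $trimmed = rtrim($htaccess, "\n");
    return $trimmed === '' ? [] : explode("\n", $trimmed);
}

/** Build the deny line, rejecting anything that is not a valid IP address. */
function deny_line(string $ip): string
{
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        throw new InvalidArgumentException("Not an IP address: $ip");
    }
    return 'deny from ' . $ip;
}

/** Return the text with a deny line for $ip appended, unless it is already there. */
function add_deny_line(string $htaccess, string $ip): string
{
    $line = deny_line($ip);
    $lines = htaccess_lines($htaccess);
    if (!in_array($line, $lines, true)) {
        $lines[] = $line;
    }
    return implode("\n", $lines) . "\n";
}

/** Return the text with the deny line for $ip removed (the de-listing case). */
function remove_deny_line(string $htaccess, string $ip): string
{
    $line = deny_line($ip);
    $kept = array_values(array_filter(
        htaccess_lines($htaccess),
        function ($l) use ($line) { return $l !== $line; }
    ));
    return $kept === [] ? '' : implode("\n", $kept) . "\n";
}
```

In use, the trap page would presumably read somepath/.htaccess with file_get_contents(), pass the text through add_deny_line(), and write it back with file_put_contents() and the LOCK_EX flag, so that two simultaneous hits do not clobber each other's edits.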
 
 Re: Can I modify the code in the honeypot
Author: A.Doherty   (4 May 06 11:58pm)
Oh, another thought: combining the two could also benefit the honeypot, because the page could only be trawled once.
That matters in case a spammer is testing for honeypots and poisoners by loading each page twice and ignoring any where the e-mail addresses change each time.
They would get the trigger/honeypot page only once; from then on, any request gets only the custom 403 error (which offers de-listing via a CAPTCHA/Turing test).
 
 Re: Can I modify the code in the honeypot
Author: A.Doherty   (8 Aug 06 10:25am)
I would still like a response.
 
 Re: Can I modify the code in the honeypot
Author: M.Prince   (8 Aug 06 12:24pm)
Sorry to take some time to reply.

We've been working on something we call HTTP:BL for quite some time. The system works much the same as what you've described, but distributed over a wide net of honey pots rather than just your own. Generally, the system will work like other DNSBLs in that you'll be able to query against our DNS servers to see whether a particular IP address visiting your site belongs to a harvester or other bad robot.
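
To illustrate the kind of lookup meant by "like other DNSBLs", here is a rough sketch of the usual DNSBL convention: reverse the IPv4 octets, append the list's zone, and treat an A record as "listed". The zone bl.example.org and both function names are placeholders; this thread does not specify the actual HTTP:BL query format.

```php
<?php
// Placeholder zone: the real HTTP:BL zone and query format are not given in this thread.
const EXAMPLE_ZONE = 'bl.example.org';

/** Build the conventional DNSBL lookup name for an IPv4 address: reversed octets + zone. */
function dnsbl_query_name(string $ip, string $zone = EXAMPLE_ZONE): string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        throw new InvalidArgumentException("Not an IPv4 address: $ip");
    }
    return implode('.', array_reverse(explode('.', $ip))) . '.' . $zone;
}

/** True if the lookup name has an A record, which by DNSBL convention means "listed". */
function is_listed(string $ip, string $zone = EXAMPLE_ZONE): bool
{
    // The trailing dot makes the name fully qualified so no search domain is appended.
    return checkdnsrr(dnsbl_query_name($ip, $zone) . '.', 'A');
}
```

A visitor at 192.0.2.10 would therefore be checked by resolving 10.2.0.192.bl.example.org. Each check costs one DNS round trip, so a site would likely want to cache the answer per IP.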

Determining known harvesters is easy, and we should be able to provide that functionality in the next couple months. The harder part is other "bad robots." Many of the techniques that you described above are things we're thinking about. For example, adding a "nofollow" meta tag to the honey pot page, including a specially formed link on the page, and seeing whether the robot follows it. If it does, we know there's a problem and they get labeled a "bad robot."

I'm concerned about allowing modification of the honey pot scripts themselves on a case-by-case basis. While I am 100% sure that you have good intentions, it becomes a logistical problem for us to manage a bunch of users with their own ideas on modifying the scripts (some of whom may not have good intentions).

I think what may make the most sense for you is for you to create your own trap gateway page. From that page you could do a lot of the things you describe above in order to generate data for your own system. You could also then link to the Project's honey pots and any other kinds of traps that you want. I think that gets you most of the benefits without having to require you to modify the honey pot scripts themselves.

Finally, if you have more ideas on how we can use the collective data coming off all the honey pots in order to do some of the things you propose, please do let us know. We're very much in the alpha-design stage of HTTP:BL, so we can make a lot of adjustments and accommodations to good ideas if you and other folks send them to us.

Thanks for your input, and sorry again for taking a while to respond.

Matthew.
 
 Re: Can I modify the code in the honeypot
Author: A.Doherty   (8 Aug 06 6:39pm)
Well, you see, my main issues with a DNS-based blacklist are:
1. the time added to each HTTP request
2. the inability for people to easily and quickly de-list an IP they were just assigned after it was used by a harvester
(DNSBL entries are mistakenly cached by many ISPs. That's fine for mail, as you can put up with an SMTP outage more easily than with losing all access to large chunks of the web. Hell, if you're on a dynamic IP you have no real need to run an MTA, as your ISP will provide a smarthost.)

Hence my preference for catching harvesters and bad bots fast (first and last link on each page) on a per-server basis, with de-listing immediately available on the server, for example for some weird new browser that prefetches every link (I've seen shareware web accelerators that do it).

The script I have simply adds the IP of the machine that requested the script as a deny line in a .htaccess file, and then copies this file to a list of locations on the server (two in my case, though it could be just one; mine has two main root folders from which all the vhosts branch).
This is the one I'd like to merge with yours. At the moment it presents a redirect to yours, but after they GET that script only maybe 5% do a subsequent GET for your script,
and I know it all works fine in a browser ;)
Another option might be for my script to output an empty frameset that uses yours for content, but I'd prefer for the one HTTP GET to run both: if they are running a multithreaded spider and notice that every link but the one to your script suddenly returns 403, they might get suspicious.

With a combined script, even retrying the script returns a 403, which looks like the server has failed in some way. That avoids the possibility of harvesters detecting your script by getting each URL twice and discarding the ones that change (a technique also used to detect list-poisoning scripts that generate random e-mail addresses).

And I don't worry about good robots, because the URL of both scripts is disallowed in every robots.txt.

I'd be glad to walk you or one of your other developers through it in detail, on the server or by Skype or IM. (Or you could just drop me an e-mail, in private of course, to say go ahead as long as I don't alter xyz, as that would break the validation ;)

Just for completeness, though unrelated to the script I'm interested in:
the other half, which is completely independent, is a de-listing option that is displayed if a 403 (denied) error happens and the visiting IP is listed (with a CAPTCHA to verify the visitor is not an automaton).
If the IP isn't listed, it sends a fairly normal but informative 403 message to the user, and if the referrer is on the same hosted site it sends a notification to the webmaster, much like my custom 404.
 
 Re: Can I modify the code in the honeypot
Author: M.Prince   (9 Aug 06 9:35pm)
I'm still not sure why you can't basically get everything you're trying to do to work through some sort of gateway page. That page can do all the processing you want to do in terms of who goes on your local blacklist. You can then include on that page a link to the honey pot page. Bad spiders will hit your gateway and then go on to the honey pot. It's one more step, but doesn't seem like a huge deal.

If the concern is that they'll get suspicious if the honey pot is the only page that isn't 403ed: 1) I doubt that any of the harvesters are paying that much attention to their logs, and 2) why not use a 302 instead of a 403? Just automatically redirect all banned traffic to the honey pot. Seems like that'd solve your problem and keep trapping the bad spider. Alternatively, you could redirect the bad traffic to your gateway page and basically do the same thing.
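
As a rough sketch of the 302 idea, assuming the local blacklist is available as a plain array of IPs: the helper name and the trap URL below are made up for illustration and are not part of the Project Honey Pot script.

```php
<?php
// Hypothetical helper; the function name and URL are illustrative only.

/**
 * Decide what to send a visitor: null for normal handling, or a
 * [status, location] pair that redirects a banned IP to the trap page.
 */
function response_for(string $ip, array $bannedIps, string $trapUrl): ?array
{
    if (in_array($ip, $bannedIps, true)) {
        return [302, $trapUrl];
    }
    return null;
}

// Possible use at the top of a page (left commented out so the sketch has no side effects):
// $r = response_for($_SERVER['REMOTE_ADDR'], $banned, '/site-scripts/contact.php');
// if ($r !== null) { header('Location: ' . $r[1], true, $r[0]); exit; }
```

Keeping the decision separate from the header() call makes it easy to point the same check at a gateway page instead of the honey pot itself.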

Still not sure why you need to modify the script in order to accomplish what you're trying to do.
 
 Re: Can I modify the code in the honeypot
Author: R.Monks2   (18 Sep 06 5:02pm)
It's possible to append or prepend code to a page without editing the file at all. See this page:

http://zend.com/zend/spotlight/prepend.php

Robin
 
 Re: Can I modify the code in the honeypot
Author: A.Doherty   (24 Jan 07 10:54am)
Well, after many months of having my site-banning PHP immediately redirect bots to the honeypot PHP URL, I see few bots following it,
so I have a new approach:
allow server-side includes in this one directory and have index.shtml be
<!--#include virtual="/path/ban-bot.php" --> {my code, which uses .htaccess to block the IP from browsing anything but this dir}
<!--#include virtual="/path/contact.php" --> {the honeypot code}
and hope that this ups the number of bots that get fed the honeyed information.

The downside of this method is that the PHP-generated headers don't get passed to the client, BTW.
(I found a server-side fix to give the same headers.)

My bot-banning code is still available for review, use, etc.

Post Edited (24 Jan 07 12:36pm)




Copyright © 2004–25, Unspam Technologies, Inc. All rights reserved.