Message Board

http:BL Use/Development

 httpbl blocking googlebot
Author: C.Walter   (6 May 15 9:59am)
I am using WHM/cPanel and just set up the ModSecurity configuration with my API key. Looking at the new logs, it is blocking Googlebot's IPs, for example 66.249.67.135. Is there a setting to prevent this?

Access denied with redirection to http://xxxxxxxxxxx.com/ using status 302 (phase 2)
RBL lookup of xxxxxxxxxxxx.135.67.249.66.dnsbl.httpbl.org succeeded at TX:real_ip.

Post Edited (6 May 15 10:06am)
 
 Re: httpbl blocking googlebot
Author: H.User1325   (6 May 15 11:59am)
I do not know how your cPanel has implemented the http:BL lookup, but the information returned by a query does include search engine identification: see http://www.projecthoneypot.org/httpbl_api.php and look at the fourth octet.

If all the information returned by http:BL is used, you can avoid blocking search engines such as Googlebot.

Post Edited (6 May 15 12:01pm)
 
 Same problem here
Author: D.B27   (9 May 15 10:46pm)
If it helps…

66.249.64.238 CRITICAL 302
981138: HTTP Blacklist match for client IP.


Request: GET /somepageonmysite
Action Description: Access denied with redirection to http://www.mysite.com/ using status 302 (phase 2).
Justification: RBL lookup of xxxxxxxxxxx.238.64.249.66.dnsbl.httpbl.org succeeded at TX:real_ip.


Original Id: 981138
Rule Text:
#
# Check Client IP against ProjectHoneypot's HTTP Blacklist
# Ref: http://www.projecthoneypot.org/httpbl_api.php
#
# Must register for an HttpBL API Key and configure SecHttpBlKey directive
# in the modsecurity_crs_10_setup.conf file.
# Ref: https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual#wiki-SecHttpBlKey
#
SecRule TX:REAL_IP "@rbl dnsbl.httpbl.org" "msg:'HTTP Blacklist match for client IP.', severity:'CRITICAL', id:'981138', phase:request, block, t:none, tag:'IP_REPUTATON/MALICIOUS_CLIENT', setvar:'tx.msg=%{rule.msg}', setvar:tx.anomaly_score=+%{tx.critical_anomaly_score}, setvar:tx.%{rule.id}-AUTOMATION/MALICIOUS-%{matched_var_name}=%{matched_var}, setvar:ip.block=1, expirevar:ip.block=%{tx.block_duration}, setvar:'ip.block_reason=%{rule.msg}', setvar:ip.previous_rbl_check=1, expirevar:ip.previous_rbl_check=86400, skipAfter:END_RBL_CHECK"


Then again, I got no clue what I'm doing so….

I have no desire to block Google.

Post Edited (11 May 15 8:43am)
 
 Re: httpbl blocking googlebot
Author: H.User1325   (10 May 15 1:02pm)
A couple of points:
1. The line that starts "Justification: RBL..." includes your 12-character API key, which should not be made public. It would be a good idea to edit that out of your post.

2. It looks to me like this implementation of http:BL does not correctly parse the returned results, because if it did, a Google search bot would not be blocked.
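Point 1 can also be handled mechanically before pasting log lines into a public post. A minimal Python sketch, assuming (as in the Justification lines above) that the access key is the run of letters just before the four reversed IP octets and dnsbl.httpbl.org. The function name and the key abcdefghijkl are made-up placeholders:

```python
import re

# Lookup names look like <access key>.<reversed IP>.dnsbl.httpbl.org, so the key is
# the run of letters immediately followed by four numeric labels and the RBL domain.
_KEY_IN_LOOKUP = re.compile(r"\b[A-Za-z]+(?=(?:\.\d{1,3}){4}\.dnsbl\.httpbl\.org\b)")


def redact_httpbl_key(log_line: str) -> str:
    """Replace any http:BL access key in a log line with x's of the same length."""
    return _KEY_IN_LOOKUP.sub(lambda m: "x" * len(m.group(0)), log_line)


# Hypothetical log line with a placeholder key.
line = ("Justification: RBL lookup of abcdefghijkl.42.69.249.66"
        ".dnsbl.httpbl.org succeeded at TX:real_ip.")
print(redact_httpbl_key(line))
```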

Without taking the time to read what may be the documentation for the API (https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual#wiki-SecHttpBlKey), there is no way for me to know whether you can change the action of this implementation.

I would suggest you go to your ISP and explain what is happening, so they can either change the program or explain how you can change the modsecurity_crs_10_setup.conf configuration file to meet your needs.

Post Edited (10 May 15 1:04pm)
 
 Re: httpbl blocking googlebot
Author: D.B27   (11 May 15 9:37am)
1. Edited. Thank you.

2. I believe it's an OWASP (vendor) rule that I can only switch on or off. It seems to me my other option would be not to enter my 12-character API key into OWASP in the first place.

The documentation for the API that you chose not to read is:

SecHttpBlKey

Description: Configures the user's registered Honeypot Project HTTP BL API Key to use with @rbl.

Syntax: SecHttpBlKey [12 char access key]

Example Usage: SecHttpBlKey whdXXXXXXXtnf

Scope: Main

Version: 2.7.0

If the @rbl operator uses the dnsbl.httpbl.org RBL (http://www.projecthoneypot.org/httpbl_api.php) you must provide an API key. This key is registered to individual users and is included within the RBL DNS requests.
 
 Re: httpbl blocking googlebot
Author: D.B27   (11 May 15 10:22am)
I searched for 66.249. in my ModSecurity Hits List and found this example:

2015-05-04 10:44:24 www.mysite.com 66.249.69.42 CRITICAL 302

981138: HTTP Blacklist match for client IP.

Request: GET /somepage.html

Action Description: Access denied with redirection to http://www.mysite.com/ using status 302 (phase 2).

Justification: RBL lookup of xxxxxxxxxxxx.42.69.249.66.dnsbl.httpbl.org succeeded at TX:real_ip.

--------------------
All the numbers match the client IP, just in reverse order, and the order matters. It looks to me like OWASP and Project Honey Pot's RBL don't work well together.
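The reversed order can be checked directly: the lookup name carries the client IP's octets back to front, which is the usual DNSBL query convention and not, by itself, a sign of a bug. A minimal Python sketch that recovers the client IP from a (redacted) lookup name; the function name is hypothetical:

```python
def client_ip_from_lookup(name: str) -> str:
    """Recover the client IP from an http:BL lookup name.

    For a client a.b.c.d the name is <key>.d.c.b.a.dnsbl.httpbl.org,
    i.e. the IPv4 octets appear in reverse order after the access key.
    """
    suffix = ".dnsbl.httpbl.org"
    if not name.endswith(suffix):
        raise ValueError("not an http:BL lookup name")
    labels = name[: -len(suffix)].split(".")
    if len(labels) != 5 or not all(label.isdigit() for label in labels[1:]):
        raise ValueError("expected <key> followed by four numeric octets")
    # Drop the key, then undo the reversal.
    return ".".join(reversed(labels[1:]))


print(client_ip_from_lookup("xxxxxxxxxxxx.42.69.249.66.dnsbl.httpbl.org"))
# prints 66.249.69.42
```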

Then again, I really got no clue, but when Google yells at me, I pay attention.
 
 Re: httpbl blocking googlebot
Author: H.User1325   (11 May 15 2:18pm)
Sorry, I didn't mean to sound short about the documentation. I just didn't want to fall down the rabbit hole of reading, on someone else's behalf, the documentation for a program I don't have.

However, if you scroll up and down that page, you will see that, in addition to a description of the API key, there are also descriptions of variables in the config file that may control how the results of the API are applied.

The application documented does much more than use the RBL, and you are right: OWASP and PHP (Project Honey Pot) may not play well together. The query looks correctly formatted ([Access Key] [Octet-Reversed IP] [List-Specific Domain]), but what the security rules and related settings tell the engine to do with the returned results may not be correct. In fact, as you noted, blocking a Google indexing bot seems neither correct nor desirable.
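A minimal Python sketch of building a query name in that [Access Key] [Octet-Reversed IP] [List-Specific Domain] format, with the parts joined by dots. The 12-lowercase-letter key check reflects my reading of the http:BL API page and is an assumption, as are the function name and the placeholder key:

```python
import ipaddress
import re


def httpbl_query_name(access_key: str, ip: str) -> str:
    """Build [Access Key].[Octet-Reversed IP].dnsbl.httpbl.org for an IPv4 address."""
    # Assumption: http:BL access keys are 12 lowercase letters.
    if not re.fullmatch(r"[a-z]{12}", access_key):
        raise ValueError("access key should be 12 lowercase letters")
    # IPv4Address validates the address; its string form is dotted decimal.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join([access_key, *reversed(octets), "dnsbl.httpbl.org"])


print(httpbl_query_name("abcdefghijkl", "66.249.69.42"))
# prints abcdefghijkl.42.69.249.66.dnsbl.httpbl.org
```

Resolving the resulting name with an ordinary DNS A-record lookup (for example Python's socket.gethostbyname) performs the actual query; as I understand the API page, a name that does not resolve means the IP is not listed.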

When I query the RBL about 66.249.69.42, I get the expected result, 127.0.5.0. The last 0 (fourth octet == 0) indicates that this IP belongs to a search engine, and the 5 (third octet) indicates Google. The second octet (== 0) is reserved for future use. The first octet (== 127) indicates a correct query and response.

If the special case of search engines (fourth octet == 0) is NOT identified, the intended meaning of each octet will be read incorrectly.
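The octet rules above can be written out as a small decoder. A minimal Python sketch, assuming the answer arrives as a dotted string such as 127.0.5.0. The search-engine branch follows this post; the meanings of the second and third octets for non-search-engine visitors (days since last activity, threat score) come from the http:BL API page rather than this thread. All names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HttpBLResult:
    days_since_last_activity: Optional[int]  # second octet; None for search engines
    threat_score: Optional[int]              # third octet; None for search engines
    visitor_type: int                        # fourth octet (0 = search engine)
    search_engine_id: Optional[int]          # third octet when visitor_type == 0

    @property
    def is_search_engine(self) -> bool:
        return self.visitor_type == 0


def decode_httpbl(answer: str) -> HttpBLResult:
    first, second, third, fourth = (int(octet) for octet in answer.split("."))
    if first != 127:
        raise ValueError(f"unexpected first octet {first}: bad query or response")
    if fourth == 0:
        # Special case: the third octet names the engine (5 = Google)
        # and the second octet is reserved, so neither is a score.
        return HttpBLResult(None, None, 0, third)
    return HttpBLResult(second, third, fourth, None)


r = decode_httpbl("127.0.5.0")
print(r.is_search_engine, r.search_engine_id)  # prints True 5
```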
 
 Re: httpbl blocking googlebot
Author: D.B27   (12 May 15 12:12am)
I took no offense. I meant none and I hope we both want the same thing.

I'm new to ModSecurity and WHM. They (cPanel) offered OWASP as a vendor (whose rules I cannot change, only switch on or off), and I was hoping their(?) field for your project would be an enhancement. But it really isn't if Google gets blocked 98% of the time, or however often it takes for Google to complain to me.

OWASP seems to prevent things I wasn't aware of, and it seems to cause no harm that I've noticed. Fine by me. Granted, I've not really looked into it, but if adding PHP's RBL blocks the one major visitor whose traffic I want/need, eh, maybe I'll try it again sometime in the future when I have more time to look.

I just assumed both of you (PHP and OWASP) were on the same page and had some sort of relationship worked out, but all I see is a bug that's bigger than me.

Post Edited (12 May 15 12:14am)
 
 Re: httpbl blocking googlebot
Author: H.User1325   (12 May 15 11:51am)
D.B., just to be sure we are on the same page: I am a user of PHP, just like you. I have no official connection with Project Honey Pot, other than that I have several honey pots on websites I control, and I use the RBL to "filter" visitors to two forums I run for some non-profits, to a newsletter subscription, and to a "leave a comment" webpage. Those implementations are all home-brewed to meet my needs.

To my knowledge, the relationship between PHP and OWASP is similar; OWASP is just another user of the RBL database. As far as I know, there are several totally independent implementations that PHP has no control over. As with all software, the results are no better than the programmer's understanding. And as you said, it looks like the designer/programmer of your implementation may have been under a deadline and did not read far enough through the documentation to get it right. Too bad.

It seems to me that what is intended to help you with site security is really harming your efforts to get your website and/or blog out there. If Google cannot index your content to help others find you, your job is harder than necessary. I would turn off the use of the RBL or have your provider switch you to another security option. JMHO
 
 Re: httpbl blocking googlebot
Author: D.Howe   (8 Apr 17 4:58am)

In your crs-setup.conf file, change the SecAction so that the variable tx.block_search_ip is set to 0 rather than the default 1.

For example:

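# Your own http:BL access key goes here
SecHttpBlKey xxxxxxxxxxxxx
# Per-type blocking flags; block_search_ip=0 stops search engines from being blocked
SecAction "id:900500,\
phase:1,\
nolog,\
pass,\
t:none,\
setvar:tx.block_search_ip=0,\
setvar:tx.block_suspicious_ip=1,\
setvar:tx.block_harvester_ip=1,\
setvar:tx.block_spammer_ip=1"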
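The four tx.block_* flags amount to a per-type block decision on the fourth octet of the http:BL answer. A minimal Python sketch of that decision, assuming each flag set to 1 enables blocking for the matching visitor type (bits 1 = suspicious, 2 = harvester, 4 = comment spammer, and 0 = search engine, per the http:BL API page). It illustrates the idea only and is not ModSecurity's or the CRS's actual code; FLAGS and should_block are hypothetical names:

```python
# Hypothetical mirror of the SecAction flags above; 1 = block that visitor type.
FLAGS = {
    "block_search_ip": 0,
    "block_suspicious_ip": 1,
    "block_harvester_ip": 1,
    "block_spammer_ip": 1,
}

# Visitor-type bits carried in the fourth octet of the answer.
TYPE_FLAGS = ((1, "block_suspicious_ip"), (2, "block_harvester_ip"), (4, "block_spammer_ip"))


def should_block(answer: str, flags: dict) -> bool:
    """Decide whether an http:BL answer such as '127.0.5.0' should be blocked."""
    first, _, _, visitor_type = (int(octet) for octet in answer.split("."))
    if first != 127:
        return False  # assumption: treat a malformed answer as "not listed"
    if visitor_type == 0:
        # Search engine: only blocked if the admin explicitly asks for it.
        return flags.get("block_search_ip", 0) == 1
    # Block if any type bit present in the answer has its flag switched on.
    return any(visitor_type & bit and flags.get(name, 0) == 1 for bit, name in TYPE_FLAGS)


print(should_block("127.0.5.0", FLAGS))   # prints False (Googlebot let through)
print(should_block("127.3.20.5", FLAGS))  # prints True (suspicious + comment spammer)
```

With block_search_ip=0, an answer like the Googlebot result 127.0.5.0 is let through, while listed non-search-engine visitors are still blocked.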





Copyright © 2004–24, Unspam Technologies, Inc. All rights reserved.
