Http:BL API Specification

Overview

For many years email recipients have benefited from the use of various DNSBLs in the fight against spam. Through efficient DNS lookups, mail servers are able to check individual connecting clients against various black lists. This gives mail servers the ability to decide how client requests are handled based on individual black list criteria. Hosts are able to block requests, allow requests, or apply extra spam filtering scrutiny to messages from hosts based on the results of black list lookups.

Http:BL is similar, but is designed for web traffic rather than mail traffic. The data provided through the service allows website administrators to choose what traffic is allowed onto their sites. This document describes how to integrate with and take advantage of the http:BL service.

If you have questions about this document, please contact us and we will do our best to answer them.

Http:BL Usage

Usage of the http:BL service is governed through the Project Honey Pot Terms of Service. By using the http:BL service you agree to abide by these terms.

To use http:BL a host need simply perform a DNS lookup of a web visitor's IP address. Http:BL's DNS system will return a value which indicates the status of the visitor. Visitors may be identified as search engines, suspicious, harvesters, comment spammers, or a combination thereof. The response to the DNS query, as outlined below, indicates what type of visitor is accessing your page.

Each user of http:BL is required to register with Project Honey Pot (www.projecthoneypot.org). Each user of http:BL must also request an Access Key to make use of the service. All Access Keys are 12 characters in length, lower case, and contain only alphabetic characters (no numbers). Generating non-assigned keys, not including a key in DNS queries, and sharing keys with other members or non-members are all violations of the Terms of Service.

Special Consideration

Developers who build the http:BL service into their software are encouraged to enable users of their software to give back to the Project. Http:BL is only valuable if malicious robots continue to run across the Project's honey pots. As such, developers are encouraged to implement systems which would allow the easy creation of and/or linking to honey pots.

For example, if you are developing a plugin for blogging software, we encourage you to prompt users during creation to provide a link to a honey pot they have installed or a QuickLink. Your plugin can then drop invisible links to the honey pot throughout the blog site. Again, http:BL's value depends on getting as much data from the honey pot network as possible, and getting that data depends on getting traffic to honey pots. Please keep this in mind as you develop your software.

DNS Query Format

Queries are performed using standard DNS queries. For example, from the command line of a Unix-based machine, you could run a DIG or NSLOOKUP query for a particular address. All queries to http:BL should be run against your local DNS server which, if it does not have an authoritative answer, will hand the query off to a more authoritative DNS server. You should not attempt to query the most authoritative DNS server directly, but should instead rely on the DNS infrastructure to handle this routing.

The format of queries must be set up precisely in order to receive accurate responses. All queries must include your Access Key, followed by the IP address you are seeking information about (in reverse-octet format), followed by the List-Specific Domain you are querying. For example, if you are querying for information about the IP address 127.9.1.2 and your Access Key is abcdefghijkl, your query should be constructed as follows:

abcdefghijkl.2.1.9.127.dnsbl.httpbl.org
[Access Key] [Octet-Reversed IP] [List-Specific Domain]

There are two important things to note about the IP address in the query. First, the IP address is that of the visitor to your website about which you are seeking information. Second, the IP address must be in reverse-octet format. This means that if the IP address 127.9.1.2 visits your website and you want to ask http:BL for information about it, you must first reverse the IP address so that it is formatted as 2.1.9.127.

Note that you reverse the order of the octets (the numbers separated by the periods); you do not reverse the IP address entirely. For example, if you were querying the IP address 10.98.76.54, the following shows correct and incorrect reverse-octet formats.

Query: 10.98.76.54
Right: 54.76.98.10
Wrong: 45.67.89.01
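
The query construction described above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the function name build_query and the constant LIST_DOMAIN are hypothetical, and the key check simply mirrors the Access Key description given earlier in this document.

```python
import ipaddress

# List-Specific Domain taken from this document.
LIST_DOMAIN = "dnsbl.httpbl.org"


def build_query(access_key, visitor_ip, domain=LIST_DOMAIN):
    """Return the http:BL query name for an IPv4 visitor address.

    build_query is a hypothetical helper name, not part of the service.
    """
    # Access Keys are described as 12 lower-case alphabetic characters.
    if not (len(access_key) == 12 and access_key.isascii()
            and access_key.isalpha() and access_key.islower()):
        raise ValueError("access key must be 12 lower-case letters")
    # ipaddress rejects malformed input and gives a canonical dotted quad.
    octets = str(ipaddress.IPv4Address(visitor_ip)).split(".")
    # Reverse the order of the octets, not the characters inside them.
    reversed_ip = ".".join(reversed(octets))
    return f"{access_key}.{reversed_ip}.{domain}"


print(build_query("abcdefghijkl", "127.9.1.2"))
```

With the example values from this section, the call prints abcdefghijkl.2.1.9.127.dnsbl.httpbl.org.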

Note that, in the future, http:BL will add support for multiple sub-types of lists and support other List-Specific Domains. In this case, the end of the query could be replaced with something other than dnsbl.httpbl.org. These sublists may identify only harvesters, only comment spammers, only search engines, etc. The dnsbl.httpbl.org List-Specific Domain combines all these sub-lists into a single list.

Query Responses

The DNS response provides details about the activity of the IP address being checked. Queries return IPv4 results with three of the four octets containing data to provide you information about the visitor to your site. The intention is for this to allow you flexibility in how you treat the visitor rather than a simple black and white response (e.g., you may want to treat known harvesters differently than known comment spammers: blocking the former from seeing email addresses while blocking the latter from POSTing to forms).

Below is an example of a hypothetical query and hypothetical response which will be referenced throughout the rest of this section:

Query: abcdefghijkl.2.1.9.127.dnsbl.httpbl.org
Response: 127.3.5.1

Each octet, other than the first octet, in the IPv4 response has a meaning. The first octet (127 in the example above) is always 127 and is pre-defined to not have a specified meaning related to the particular visitor. If the first octet in the response is not 127 it means an error condition has occurred and your query may not have been formatted correctly.

The second octet (3 in the example above) represents the number of days since last activity. In the example above, it has been 3 days since the last time the queried IP address saw activity on the Project Honey Pot network. This value ranges from 0 days to 255 days. This value is useful in helping you assess how "stale" the information provided by http:BL is and therefore the extent to which you should rely on it.

The third octet (5 in the example above) represents a threat score for the IP. This score is assigned internally by Project Honey Pot based on a number of factors including the number of honey pots the IP has been seen visiting, the damage done during those visits (email addresses harvested or forms posted to), etc. The range of the score is from 0 to 255, where 255 is extremely threatening and 0 indicates no threat score has been assigned. In the example above, the IP queried has a threat score of "5", which is relatively low. While a rough and imperfect measure, this value may be useful in helping you assess the threat posed by a visitor to your site.

The fourth octet (1 in the example above) represents the type of visitor. Defined types include: "search engine," "suspicious," "harvester," and "comment spammer." Because a visitor may belong to multiple types (e.g., a harvester that is also a comment spammer) this octet is represented as a bitset with an aggregate value from 0 to 255. In the example above, the type is listed as 1, which means the visitor is merely "suspicious." A chart outlining the different types appears below. This value is useful because it allows you to treat different types of robots differently.

Value   Meaning
0       Search Engine
1       Suspicious
2       Harvester
4       Comment Spammer
8       [Reserved for Future Use]
16      [Reserved for Future Use]
32      [Reserved for Future Use]
64      [Reserved for Future Use]
128     [Reserved for Future Use]

Because the fourth octet is a bitset, visitors that have been identified as falling into multiple categories may be represented. See the following table for an explanation of the current possible values.

Value   Meaning
0       Search Engine (0)
1       Suspicious (1)
2       Harvester (2)
3       Suspicious & Harvester (1+2)
4       Comment Spammer (4)
5       Suspicious & Comment Spammer (1+4)
6       Harvester & Comment Spammer (2+4)
7       Suspicious & Harvester & Comment Spammer (1+2+4)
>7      [Reserved for Future Use]
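
As an illustration of how the type bitset can be unpacked, the sketch below tests each defined bit in turn. The names TYPE_BITS and describe_type are hypothetical, and the handling of the reserved bits (reported as a single "Reserved" entry) is an assumption made for this example only.

```python
# Bit values and labels taken from the tables above.
TYPE_BITS = {1: "Suspicious", 2: "Harvester", 4: "Comment Spammer"}


def describe_type(type_octet):
    """Return the list of visitor types encoded in the fourth octet."""
    if not 0 <= type_octet <= 255:
        raise ValueError("type octet must be between 0 and 255")
    # Zero is the special case reserved for known search engines.
    if type_octet == 0:
        return ["Search Engine"]
    # Collect every defined type whose bit is set.
    names = [name for bit, name in TYPE_BITS.items() if type_octet & bit]
    # Bits 8 and above are reserved for future use (assumed handling).
    if type_octet >> 3:
        names.append("Reserved")
    return names


print(describe_type(3))
print(describe_type(7))
```

For the values 3 and 7 this prints the Suspicious & Harvester pair and all three types respectively, matching the rows of the table.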

IPs are labeled as "suspicious" if they engage in behavior that is consistent with a malicious robot, but malicious behavior has not yet been observed. For example, on average it takes a harvester nearly a week between when it finds an email address and when it sends the first spam message to that address. In the meantime, the as-yet-unidentified harvester's IP address is seen hitting a number of honey pots, not obeying rules such as those set forth by robots.txt, and otherwise behaving suspiciously. In this case, the IP may be listed as suspicious.

The table below shows some hypothetical responses from the http:BL DNS system and a brief explanation of their meaning.

Response      Meaning
127.1.9.3     This response means that the IP visiting your site is engaged in both "Suspicious" behavior and "Harvesting" behavior (signified by the "3" in the type octet). It has a threat score of "9". It was last seen by the Project Honey Pot network 1 day ago.
127.82.23.4   This response means that the IP visiting your site is engaged in "Comment Spammer" behavior (signified by the "4" in the type octet). It has a threat score of "23". It was last seen on the Project Honey Pot network 82 days ago.
127.4.92.1    This response means that the IP visiting your site is engaged in "Suspicious" behavior (signified by the "1" in the type octet). It has a threat score of "92". It was last seen on the Project Honey Pot network 4 days ago.
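
The octet layout described in this section can be decoded roughly as follows. This is a sketch only: Listing and parse_response are hypothetical names, and the field meanings apply to responses whose type octet is non-zero (search engine responses use the second and third octets differently).

```python
import ipaddress
from typing import NamedTuple


class Listing(NamedTuple):
    """Decoded http:BL response (hypothetical container for this example)."""
    days_since_last_activity: int
    threat_score: int
    visitor_type: int


def parse_response(response):
    """Split a response such as '127.3.5.1' into its three data octets."""
    # .packed yields the four octets as bytes; unpacking gives four ints.
    first, days, threat, vtype = ipaddress.IPv4Address(response).packed
    # A first octet other than 127 signals an error condition.
    if first != 127:
        raise ValueError("first octet is not 127; the query may be malformed")
    return Listing(days, threat, vtype)


print(parse_response("127.82.23.4"))
```

Applied to the second row of the table above, the result carries 82 days, a threat score of 23, and a type of 4.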

Note that, for some applications, it may be sufficient to determine that the IP address visiting a page has engaged in some suspicious or malicious behavior. In this case, the check can be simplified to look only at the last octet, without regard to the threat level or the number of days since the IP was seen on the Project Honey Pot network. If its value is greater than zero then the IP is at least suspicious.

Non Results

A majority of IP addresses do not appear in http:BL's records. If an IP you query for does not appear, http:BL will return a non-result (NXDOMAIN).

A non-result does not mean an IP address is certified in any way to be non-malicious. Instead, a non-result simply indicates that no malicious behavior has been observed from the IP in the recent past. You should continue to exercise caution when letting any visitor onto your website, even if the visitor is not listed in http:BL's records.
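
One way to handle non-results in application code is sketched below using Python's standard socket module, which goes through the system resolver rather than an authoritative server. The lookup function and its resolve parameter are hypothetical; the parameter exists only so a stand-in resolver can be supplied during testing. Note the stated simplification: any resolution failure is treated as a non-result, which also covers temporary DNS errors.

```python
import socket


def lookup(query_name, resolve=socket.gethostbyname):
    """Return the response address as a string, or None for a non-result.

    `resolve` is a hypothetical hook; by default the system resolver is used.
    """
    try:
        return resolve(query_name)
    except socket.gaierror:
        # Name resolution failed (for example NXDOMAIN). This sketch treats
        # every such failure as "not listed", which is an assumption.
        return None


# Stand-in resolvers so the example runs without network access.
def fake_listed(name):
    return "127.3.5.1"


def fake_missing(name):
    raise socket.gaierror(socket.EAI_NONAME, "name not found")


print(lookup("abcdefghijkl.2.1.9.127.dnsbl.httpbl.org", resolve=fake_listed))
print(lookup("abcdefghijkl.1.0.0.127.dnsbl.httpbl.org", resolve=fake_missing))
```

A return value of None only says that nothing is on record for the address; as noted above, it is not a certification that the visitor is harmless.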

Threat Scores

Threat Scores are a rough guide to the threat a particular IP address may pose to your site and should be treated as an approximate measure. Threat Scores range from 0 to 255; however, they follow a logarithmic scale, which makes it extremely unlikely that a threat score over 200 will ever be returned.

To get an idea of what this metric measures, check out the Threat Rating information section.

Website administrators are encouraged to experiment with different threat score thresholds and set them at a level they determine is appropriate for their own sites. Special care should be taken for "suspicious" IPs with low threat scores. Since these are often harmless robots, we recommend not completely blocking these IPs, but instead restricting content from them (hiding email addresses, turning off POSTing, etc.) or running them through a CAPTCHA or JavaScript redirect.
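
A site policy along these lines might look like the sketch below. Everything in it is an assumption chosen for illustration: the function name choose_action, the action labels, the threshold of 50, and the 30-day staleness cutoff are not values recommended by the service and should be tuned per site.

```python
def choose_action(days, threat_score, type_octet,
                  block_threshold=50, max_age_days=30):
    """Map a decoded response to 'allow', 'restrict', or 'block'.

    Thresholds are hypothetical defaults, not recommendations.
    """
    # A type of zero identifies a known search engine.
    if type_octet == 0:
        return "allow"
    # Block only when the score is high and the record is reasonably fresh.
    if threat_score >= block_threshold and days <= max_age_days:
        return "block"
    # Low-score or stale listings get restricted content instead of a block
    # (for example hidden email addresses, no POSTing, or a CAPTCHA).
    return "restrict"


print(choose_action(days=3, threat_score=5, type_octet=1))
print(choose_action(days=1, threat_score=92, type_octet=4))
```

With these defaults the first call prints restrict and the second prints block.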

Search Engines

Search engines represent a special case. Known search engines will always return a value of zero as the last octet. It is not possible for a search engine to be both a search engine and some kind of malicious bot. Search engines found to be harvesting or comment spamming will cease to be listed as search engines.

In the case of a known search engine the third octet becomes a serial number identifier for the specific search engine. The second octet is reserved for future use. For example, imagine the following returned result which will be referenced throughout the rest of this section:

Response: 127.0.1.0

The first octet (127 in the example above) remains 127 and does not have a particular meaning with regard to this search engine.

The second octet (0 in the example above) is reserved for future use and does not have a meaning at this time.

The third octet (1 in the example above) is a serial number identifier to a particular search engine. The list of serial numbers corresponding with each listed search engine can be found below.

The fourth octet (0 in the example above) is the type identifier. Because it is zero it means that this particular IP belongs to a search engine. Remember that the rules described in this section only apply if the fourth octet is a zero.

Search Engine Serials

The search engine serial is the third octet of a search engine result IP. For example, a result of 127.0.9.0 would indicate a Yahoo IP.

Serial Number   Search Engine
0               Undocumented
1               AltaVista
2               Ask
3               Baidu
4               Excite
5               Google
6               Looksmart
7               Lycos
8               MSN
9               Yahoo
10              Cuil
11              InfoSeek
12              Miscellaneous
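
The serial table can be turned into a simple lookup, as in the following sketch. The names SEARCH_ENGINES and search_engine_name are hypothetical, and the fallback label for serials outside the table is an assumption for this example.

```python
import ipaddress

# Serial numbers and names copied from the table above.
SEARCH_ENGINES = {
    0: "Undocumented", 1: "AltaVista", 2: "Ask", 3: "Baidu", 4: "Excite",
    5: "Google", 6: "Looksmart", 7: "Lycos", 8: "MSN", 9: "Yahoo",
    10: "Cuil", 11: "InfoSeek", 12: "Miscellaneous",
}


def search_engine_name(response):
    """Return the search engine name, or None if not a search engine result."""
    first, _reserved, serial, vtype = ipaddress.IPv4Address(response).packed
    # The serial is only meaningful when the type octet is zero.
    if first != 127 or vtype != 0:
        return None
    # Serials missing from the table get a placeholder (assumed handling).
    return SEARCH_ENGINES.get(serial, "Unknown serial")


print(search_engine_name("127.0.9.0"))
```

For the example result 127.0.9.0 this prints Yahoo, in line with the text above.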

Test Values

To test your application you may query the http:BL DNS system with certain IPs that have predefined responses. The table below outlines the IPs you can query and the response you should expect.

SIMULATE NO RECORD RETURNED
Query         Expected Response
127.0.0.1     NXDOMAIN

SIMULATE DIFFERENT TYPES
Query         Expected Response
127.1.1.0     127.1.1.0
127.1.1.1     127.1.1.1
127.1.1.2     127.1.1.2
127.1.1.3     127.1.1.3
127.1.1.4     127.1.1.4
127.1.1.5     127.1.1.5
127.1.1.6     127.1.1.6
127.1.1.7     127.1.1.7

SIMULATE DIFFERENT THREAT LEVELS
Query         Expected Response
127.1.10.1    127.1.10.1
127.1.20.1    127.1.20.1
127.1.40.1    127.1.40.1
127.1.80.1    127.1.80.1

SIMULATE DIFFERENT NUMBER OF DAYS
Query         Expected Response
127.10.1.1    127.10.1.1
127.20.1.1    127.20.1.1
127.40.1.1    127.40.1.1
127.80.1.1    127.80.1.1
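
For automated testing, the test IPs can be expanded into full query names paired with their expected responses, as in this sketch. TEST_CASES and iter_test_queries are hypothetical names, None is used here to stand for NXDOMAIN, and the Access Key shown is the placeholder from this document rather than a real key.

```python
LIST_DOMAIN = "dnsbl.httpbl.org"

# (test IP to query, expected response); None stands for NXDOMAIN.
TEST_CASES = (
    [("127.0.0.1", None)]
    + [(f"127.1.1.{t}", f"127.1.1.{t}") for t in range(8)]          # types
    + [(f"127.1.{s}.1", f"127.1.{s}.1") for s in (10, 20, 40, 80)]  # threat
    + [(f"127.{d}.1.1", f"127.{d}.1.1") for d in (10, 20, 40, 80)]  # days
)


def iter_test_queries(access_key):
    """Yield (query name, expected response) for each documented test IP."""
    for ip, expected in TEST_CASES:
        # The test IP is octet-reversed like any other visitor address.
        reversed_ip = ".".join(reversed(ip.split(".")))
        yield f"{access_key}.{reversed_ip}.{LIST_DOMAIN}", expected


for name, expected in iter_test_queries("abcdefghijkl"):
    print(name, "->", expected)
```

Each generated name can then be resolved through your local DNS server and the answer compared with the expected value.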

Copyright © 2004–14, Unspam Technologies, Inc. All rights reserved.
