 HTTP proxies
Author: A.Daviel   (24 Feb 05 3:59am)
I was looking at robots traversing my website and found that some of the Java-based ones that ignored robots.txt were coming through a Squid proxy. By default, Squid adds an X-Forwarded-For header giving the address of the original requester. It might be worthwhile logging this information in PHP. No doubt there are others using proxies that don't add headers; I remember trouble a little while back with elementary schools in Korea running a misconfigured Apache proxy.

HTTP_USER_AGENT: Java/1.4.1_02
HTTP_VIA: 1.1 proxy.vianet:3129 (squid/2.5.STABLE8)

HTTP_USER_AGENT: Java/1.4.1_04
HTTP_VIA: 1.1 (squid/2.5.STABLE6)

HTTP_USER_AGENT: Java/1.4.2_06
HTTP_VIA: 1.1 (squid/2.5.STABLE7)
- this seems to be an open proxy (anyone can use it)
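
For anyone who wants to try logging these fields, here is a minimal sketch. It is written in Python rather than PHP, and it assumes the request variables arrive in a CGI-style mapping with the same keys PHP exposes in $_SERVER (REMOTE_ADDR, HTTP_X_FORWARDED_FOR, HTTP_VIA, HTTP_USER_AGENT). The function names and the log line layout are made up for illustration; they are not part of any existing script.

```python
def proxy_log_fields(environ):
    """Collect proxy-related request fields from a CGI-style mapping.

    `environ` is assumed to use the same keys as PHP's $_SERVER.
    """
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    # X-Forwarded-For may carry a comma-separated chain: client, proxy1, ...
    chain = [part.strip() for part in forwarded.split(",") if part.strip()]
    return {
        "remote_addr": environ.get("REMOTE_ADDR", "-"),
        "forwarded_for": chain,
        "via": environ.get("HTTP_VIA", "-"),
        "user_agent": environ.get("HTTP_USER_AGENT", "-"),
    }


def format_log_line(fields):
    """Render the collected fields as a single log line (hypothetical layout)."""
    chain = ",".join(fields["forwarded_for"]) or "-"
    return 'remote={remote_addr} xff={chain} via="{via}" ua="{user_agent}"'.format(
        chain=chain, **fields
    )
```

Note that this only records what the proxy claims. Nothing here checks whether the X-Forwarded-For value is genuine.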
 Re: HTTP proxies
Author: M.Prince   (24 Feb 05 4:36am)
Interesting. Thanks for the tip. The trouble with X-Forwarded-For is that it is forged about as often as it is legitimate. With v.0.2 of the scripts we plan on recording X-Forwarded-For as well as tracking whether it or the remote address appears to be reporting the correct IP for a visitor. This should both allow us to function behind reverse proxies (where we currently have trouble) and solve the problem you point out.

It may be good to also record HTTP_VIA for the same reason.
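
One common way to decide which address to believe is to trust X-Forwarded-For only when the connection itself comes from a proxy you control. The sketch below illustrates that idea in Python; it is not how the v.0.2 scripts work. The TRUSTED_PROXIES range and all function names are assumptions chosen for the example.

```python
import ipaddress

# Hypothetical: networks where our own reverse proxies live.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_valid_ip(text):
    """Return True if `text` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def is_trusted(addr, networks):
    """Return True if `addr` is a valid IP inside one of `networks`."""
    if not is_valid_ip(addr):
        return False
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


def client_ip(environ, networks=TRUSTED_PROXIES):
    """Pick the most believable client address and say where it came from."""
    remote = environ.get("REMOTE_ADDR", "")
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    chain = [p.strip() for p in forwarded.split(",") if p.strip()]
    # Ignore the header unless the direct peer is one of our own proxies.
    if not chain or not is_trusted(remote, networks):
        return remote, "remote_addr"
    # Walk from the hop nearest to us back toward the client,
    # skipping entries that are themselves trusted proxies.
    for hop in reversed(chain):
        if is_trusted(hop, networks):
            continue
        if is_valid_ip(hop):
            return hop, "x_forwarded_for"
        break  # malformed entry: stop trusting the header
    return remote, "remote_addr"
```

With this approach a forged header from an arbitrary visitor is simply ignored, because that visitor's connection does not come from a trusted proxy. It does nothing about third-party proxies such as the Squid instances above, whose headers can still only be logged, not verified.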

Thanks for the info!!


