Author: A.Daviel (24 Feb 05 3:59am)
While looking at robots crawling my website, I found that some of the Java-based
ones that ignore robots.txt were coming through Squid proxies. By default, Squid adds an X-Forwarded-For header giving the address of the original requester, so it might be worthwhile logging this information (HTTP_VIA and HTTP_X_FORWARDED_FOR) in PHP. No doubt other robots use
proxies that don't add these headers; I remember trouble a little while back with elementary schools in Korea running a misconfigured Apache proxy.
e.g.
HTTP_USER_AGENT: Java/1.4.1_02
REMOTE_ADDR: 213.252.239.3
HTTP_VIA: 1.1 proxy.vianet:3129 (squid/2.5.STABLE8)
HTTP_X_FORWARDED_FOR: 10.0.5.22

HTTP_USER_AGENT: Java/1.4.1_04
REMOTE_ADDR: 62.241.130.139
HTTP_VIA: 1.1 linux.egyptnetwork.com:3128 (squid/2.5.STABLE6)
HTTP_X_FORWARDED_FOR: 163.121.176.116

HTTP_USER_AGENT: Java/1.4.2_06
REMOTE_ADDR: 81.181.121.4
HTTP_VIA: 1.1 server.sharknet.ro:3128 (squid/2.5.STABLE7)
HTTP_X_FORWARDED_FOR: 81.180.144.72
- the last of these seems to be an open proxy (anyone can use it)
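
For anyone who wants to log these fields, here is a minimal sketch, assuming PHP 7 or later. The function name proxy_log_line() and the output format are my own inventions for illustration, not anything built into PHP. It reads the four $_SERVER keys shown above, substitutes "-" for any that are missing, and strips control characters, since these headers are supplied by the client or proxy.

```php
<?php
// Hypothetical helper (not part of PHP): builds one log line from a
// $_SERVER-style array, using "-" for any field the request did not send.
function proxy_log_line(array $server): string
{
    $fields = ['REMOTE_ADDR', 'HTTP_VIA', 'HTTP_X_FORWARDED_FOR', 'HTTP_USER_AGENT'];
    $parts = [];
    foreach ($fields as $name) {
        $value = isset($server[$name]) ? (string) $server[$name] : '-';
        // Header values are supplied by the client or proxy, so remove
        // control characters (CR, LF, etc.) to keep each record on one line.
        $value = preg_replace('/[\x00-\x1F\x7F]/', '', $value);
        $parts[] = $name . ': ' . $value;
    }
    return implode(' | ', $parts);
}

// Usage: write the current request's proxy details to the PHP error log.
error_log(proxy_log_line($_SERVER));
```

Keep in mind that X-Forwarded-For is only as trustworthy as whatever sent it: it can be forged, and it may hold a private address (like the 10.0.5.22 above) that means nothing outside the proxy's own network. It's worth recording next to REMOTE_ADDR, but not in place of it.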