by Jennifer Vesperman
10/25/2001
Transparent proxying frees you from the hassle of setting up individual browsers to work with proxies. If you have a hundred, or a thousand, users on your network, it's a pain to set up each browser to use a proxy -- or to try to convince users to go into their preferences and type in symbols they don't understand.
Using transparent proxying, you intercept their web requests and redirect them through the proxy. Nice and simple -- on the surface.
Transparent proxying (also known as TCP hijacking) is like Network Address Translation (NAT) in some respects: It is to be avoided at all costs, and used only if there is absolutely, positively, no other way.
Why? Because transparent proxying does not work well with certain web browsers. With most browsers you're fine, but if even a quarter of your users are using badly behaved browsers, you can expect your help desk costs to exceed any benefits you might gain from transparent proxying. Unfortunately, these browsers are in wide use.
Badly behaved browsers behave differently when they are aware of a proxy. Well-behaved browsers follow the standard: the only change they make when configured for a proxy is to direct requests to a different machine and port. Badly behaved browsers leave some of the HTTP headers out of their requests, and only add them when they know there's a proxy. Without those headers, user commands like "reload" don't work if there's a proxy between the user and the source.
Transparent proxying also introduces a layer of complexity, which can complicate otherwise simple transactions. For instance, a web-based application that requires an active server cannot test for the server by making a connection -- it will connect to the proxy, instead.
So how does transparent proxying work?
A firewall or other redirector catches TCP connections directed at specific ports on remote hosts (usually port 80) and directs them to the local proxy server. The proxy server uses the HTTP headers to determine where it was supposed to connect, and proxies the request.
System administrators are often asked to transparently proxy FTP and SSL as well, but these protocols can't be transparently proxied. FTP is a more complex protocol than HTTP, and provides fewer hints as to the original destination of the request. SSL is encrypted and contains no useful data about destinations. Decoding SSL is precisely what the protocol is designed to prevent: decoding it in order to transparently proxy it would be indistinguishable from a "true" man-in-the-middle attack.
To perform transparent proxying, we need a server between the clients and the destinations. This server must have the necessary facilities to match and redirect traffic, such as netfilter and iptables. Any firewalling system capable of Network Address Translation and traffic redirection is suitable.
You will need to configure a rule to catch traffic destined for port 80 on external hosts, and redirect this traffic to the port of a proxy server on the intercepting machine.
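With netfilter, a rule along these lines does the interception. This is a minimal sketch: the internal interface name (eth1) and the proxy port (3128, Squid's default) are assumptions to adjust for your own network.

    # Catch HTTP traffic arriving from the internal interface and hand it
    # to the proxy listening locally on port 3128.
    iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
             -j REDIRECT --to-ports 3128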
You can have proxies that aren't on the intercepting machine, but these are more awkward. First, the source address of the request is no longer available to the proxy -- it's lost in the process of redirection. You can solve this by using destination NAT (Network Address Translation), but you then have to route the proxy traffic back through the translating server. If you attempt to have the proxy pass the HTTP response back directly, the client will be confused and (quite correctly) refuse to speak to the proxy. The proxy is not the machine the client thinks it's talking to -- the client thinks it's making the request of the destination web server. The proxy must route back through the interceptor, so it can translate the addresses back and let the client continue to believe it's speaking directly to the web server.
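If the proxy does sit on a separate machine, the redirection becomes destination NAT instead. A sketch, again with assumed values (a proxy at 192.168.1.10, port 3128):

    # Rewrite the destination of intercepted HTTP traffic to the proxy box.
    # Reply traffic must route back through this machine so the translation
    # can be reversed before reaching the client.
    iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
             -j DNAT --to-destination 192.168.1.10:3128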
HTTP/1.1 made life easier for transparent proxies by making the Host header mandatory. This header contains the name of the host machine (as given in the URL) and enables name-based virtual web hosting, by allowing the web server to use the Host header to determine which site to respond with.
For transparent proxies, it provides the proxy with the host name. Having received an intercepted port 80 connection, the proxy server needs to understand that it is not receiving a fully qualified absolute URI (Uniform Resource Identifier), but a relative URI. Normally, a proxy server receives http://host/path, but if the client thinks it's talking to the server, not a proxy, it just asks for /path. The proxy server uses the Host header to reassemble the fully qualified URI, then checks its cache and does its usual proxying.
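Concretely, the two request forms look like this (www.example.com is a placeholder host). The first is what a proxy-configured browser sends; the second is what an intercepted client sends, from which the proxy rebuilds the absolute URI using the Host header:

    # Sent to an explicitly configured proxy:
    GET http://www.example.com/docs/page.html HTTP/1.1
    Host: www.example.com

    # Sent by a client that believes it's talking to the origin server:
    GET /docs/page.html HTTP/1.1
    Host: www.example.com

    # From the second form, the proxy reassembles:
    #   http://www.example.com/docs/page.html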
Squid is suitable for transparent proxying because it is also designed as a reverse proxy (also known as an "HTTP accelerator"), and can read these abbreviated request headers. In accelerator mode, it fronts for the actual web servers and receives requests as if it were the web server, so it was designed with the ability to reassemble relative URIs. To use it as a transparent proxy, we enable this web acceleration behavior.
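For the Squid 2 series current when this was written, the relevant squid.conf directives were along these lines; treat this as a sketch and check the documentation for your version:

    http_port 3128                  # where the redirected traffic arrives
    httpd_accel_host virtual        # accelerate any host (name-based virtual hosting)
    httpd_accel_port 80             # the port the intercepted requests were bound for
    httpd_accel_with_proxy on       # keep behaving as a normal proxy as well
    httpd_accel_uses_host_header on # rebuild absolute URIs from the Host header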