Saturday, October 2, 2010

Save on bandwidth with the Squid proxy server

Squid is a caching proxy for the web that supports HTTP, HTTPS, FTP and more. Its distinct advantages are caching frequently-requested pages to speed up web page load times and also reducing bandwidth by not having to re-request the same page over and over again. It can also be used as a reverse proxy to accelerate web servers by serving up cached content rather than permitting continuous hits to the web server for identical content to multiple clients.

To illustrate how quickly Squid can be set up as a caching proxy, we'll use Fedora 13, which currently provides a very recent Squid 3.1.4 that is easy to install:

# yum install squid

Out-of-the-box, Squid will work as a web client proxy for the local host and local network. What you want to do is edit /etc/squid/squid.conf, look for the “localnet” entries, and comment out those networks that are not on your local network. For instance, if you use a 192.168 network at home, comment out the 10.0.0.0 and 172.16.0.0 lines:

#acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
#acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
acl localnet src 192.168.0.0/16  # RFC1918 possible internal network

Next, start the Squid service. If you have a firewall enabled on the system, be sure to allow TCP access to port 3128.
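How you do that depends on your init scripts and firewall tooling; on a stock Fedora 13 system with the default iptables firewall, something along these lines should work (a sketch, adjust to your setup):

# service squid start
# chkconfig squid on
# iptables -I INPUT -p tcp --dport 3128 -j ACCEPT
# service iptables save

The first two commands start Squid now and enable it at boot; the iptables lines open TCP port 3128 for incoming proxy connections and save the rule so it survives a reboot.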

At this point, you can test with a command line browser on the local system:

$ http_proxy="http://localhost:3128" elinks http://foo.com/

Then look at the /var/log/squid/access.log file. If the browser did not complain about not being able to connect, and the log file shows activity, then you have successfully set up Squid. The logs will look something like this:

1281203766.589 2626 ::1 TCP_MISS/200 18137 GET http://foo.com/ - DIRECT/1.1.1.1 text/html
1281203767.186 595 ::1 TCP_MISS/200 4867 GET http://foo.com/skins/common/commonPrint.css? - DIRECT/1.1.1.1 text/css

If you were to execute the same browser command again, you would see the following:

1281204000.528 313 ::1 TCP_MISS/200 18137 GET http://foo.com/ - DIRECT/1.1.1.1 text/html
1281204000.591 60 ::1 TCP_REFRESH_UNMODIFIED/200 4873 GET http://foo.com/skins/common/commonPrint.css? - DIRECT/1.1.1.1 text/css

This shows you the cache at work. The initial page is loaded again, but the CSS file is sent to the requesting browser from the cached copy. The next step is to try the same from another system that would also be using the cache (you can easily use the same command line browser command if available).
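For example, from another machine on the same network (assuming, purely for illustration, that the proxy host is at 192.168.1.10; substitute your proxy's actual address):

$ http_proxy="http://192.168.1.10:3128" elinks http://foo.com/

The request should then show up in access.log on the proxy with that client's address in place of ::1.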

If you want a transparent proxy setup, so that no one on the network knows the proxy is in use and no one can circumvent it, you can do so by adjusting iptables rules. If your firewall system is running Linux, this is easily accomplished. Note that if you do use a transparent proxy, you cannot use authentication on the proxy. If that isn’t important to you, setting up a transparent proxy is a fast and easy way to force everyone on the network to use it.

In /etc/squid/squid.conf you want to uncomment the “cache_dir” directive:

# Uncomment and adjust the following to add a disk cache directory.
cache_dir ufs /var/spool/squid 7000 16 256

and change

http_port 3128

to

http_port 3128 transparent

Once these changes have been made and Squid has been restarted, you also need to change the firewall rules on your network’s firewall or gateway system to redirect all outbound HTTP traffic to the proxy. This can be tricky, depending on whether your Squid install is on the firewall system itself or on a separate system in the local network. It also depends on your firewall’s software. The Squid wiki has a section on Interception (i.e. transparent proxies) and how to set them up with Cisco devices, Linux, FreeBSD, and OpenBSD.
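As a rough sketch of the Linux case where Squid runs directly on the gateway, a single nat-table rule on the LAN-facing interface is usually all it takes (eth1 here is just an example interface name; if Squid lives on a separate box you would DNAT port 80 traffic to that box instead, as the wiki describes):

# iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j REDIRECT --to-port 3128

This intercepts traffic destined for port 80 arriving from the LAN and hands it to Squid listening on port 3128 on the same machine.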

That same wiki page also has other example configurations. Squid can be used for more than just web page caching, and there are examples there of how to use it for Instant Message filtering, how to use it as a reverse proxy to cache requests for a web server, how to set it up with various forms of authentication, and more.

Squid is very versatile and can do quite a lot. For large organizations, Squid offers a surprisingly easy way to save on bandwidth, as well as an easy way to require authentication before allowing outbound web access. For simple web caching, Squid is pretty much ready to run as-is, and the wiki offers a lot of examples and help if you need to set up something a little more complex.
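For instance, requiring basic authentication takes only a few lines in squid.conf. This is a sketch, not a drop-in configuration: it assumes the NCSA helper that ships with Squid 3.1 (the helper path varies by distribution and architecture) and an htpasswd-format password file at /etc/squid/passwd, and the http_access line has to appear before the final “http_access deny all”:

auth_param basic program /usr/lib64/squid/ncsa_auth /etc/squid/passwd
auth_param basic realm Squid proxy
acl authenticated proxy_auth REQUIRED
http_access allow authenticated

To actually force authentication you would also remove or tighten the default “http_access allow localnet” rule, since it lets local clients through without credentials.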
