Maciej Koziński
Although a web cache server should improve access to World Wide Web resources, it is often blamed for slowing down the connection to the world. There are at least these reasons for that:
- some people and organizations dislike the web caching service, because it reduces the number of hits to their web pages - in a commercial environment that seems to lower their prestige ;)
- some users try to turn off the web cache and they get web pages fetched faster; they do not pay attention to the simple fact that the free bandwidth is available due to the other users using the web cache ;)
- some web caching installations are misconfigured and work slowly :(
This means that it is not enough to compile and install squid for web caching to work fine; you have to watch the service and improve it if necessary.
The general rule for tuning squid is to reduce the number of activities. You can simply live without caching some resources and without keeping some information in your logs. You must remember that:
- every little action costs something
- things done locally are (sometimes) less expensive than things spread across the network
You can reduce the workload for squid by properly planning the configuration for the clients. The clients should not go through the web cache while fetching local resources or other resources available via fast links. You will gain nothing from caching local resources, while the number of such requests could be relatively large and could slow down the transfer of other resources via squid. The example below shows how to configure Netscape via a proxy auto configuration file to avoid the behaviour described above:
function FindProxyForURL (url, host)
{
    // local resources are fetched directly, bypassing the cache
    if (shExpMatch(url, "*.your.local.domain/*"))
    {
        return "DIRECT";
    }
    else
    {
        // everything else goes through squid; DIRECT is the
        // fallback used when the proxy cannot be reached
        return "PROXY my.squid.server.com:8080; DIRECT";
    }
}
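For the auto configuration file to work, the web server must deliver it with the proper MIME type. Assuming an Apache server and the conventional .pac extension (both are assumptions of this example, not requirements of squid), one line in the Apache configuration is enough:
AddType application/x-ns-proxy-autoconfig .pac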
This will leave more RAM and hard disk space, more CPU time and more sockets for those connections that really require caching.
You can configure squid in several ways to stop caching objects which are available at low cost. First, you should use the always_direct directive in the squid.conf file. The argument should be as above: your local domain/network. Construct a proper access control list (ACL), e.g.:
acl local dst xxx.xxx.xxx.0/255.255.255.0
The one above is for a pseudo-C class network (an Ethernet segment with 254 effective IP addresses), or:
acl local dstdomain your.local.domain
The second one is less effective, because squid has to involve DNS for each request to check the reverse (IP -> fully qualified domain name) DNS record. In fact, there is one more catch: you have to have all your clients registered in both primary and reverse DNS - if you forget about that, an unregistered client will not be granted permission to use the web cache.
Having the ACL local defined, you should declare the next two rules in squid.conf:
no_cache deny local
always_direct allow local
The first one forces squid never to cache objects from the specified ACL and to immediately remove objects matching this ACL from the cache. The second one forces squid to forward such requests directly to the origin server, bypassing any cache hierarchy when fulfilling local requests.
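Putting the pieces together, the relevant squid.conf fragment could look like this minimal sketch (the network address is, as before, an example to adapt):
acl local dst xxx.xxx.xxx.0/255.255.255.0
no_cache deny local
always_direct allow local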
These simple configuration tricks save a lot of useful resources from being spent unnecessarily. These settings are especially important if you have many old clients which cannot be configured via the proxy auto configuration file shown above.
First, see http://squid.nlanr.net/Squid/FAQ/FAQ-11.html#ss11.17 to avoid slowing down your squid installation; the slowdown is caused by dedicating too much hard disk space in comparison to the RAM available to squid.
The general rule is to dedicate as much RAM as possible to squid. The reason is simple: retrieving objects kept in the RAM cache is much faster than retrieving them from disk. When serving many requests simultaneously, RAM is also used for buffering incoming and outgoing data, so a large installation requires a lot of RAM.
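The amount of RAM squid dedicates to keeping hot objects is controlled by the cache_mem directive in squid.conf; note that this is not squid's total memory usage, only the pool for in-memory objects. The value below is just an example to adjust to your hardware:
cache_mem 64 MB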
Given that the memory hit ratio is rather small (as I have measured, it is below 10%), the performance of squid strongly depends on the disk setup. Having the cache directories distributed among many disks improves the performance, as the NLANR research proved. It is also a good idea to have more than one disk controller, hard disks connected to different controllers, and cache directories spread among these disks. This could be the default for PCs having more than one disk, since they are now equipped with two (E)IDE controllers. On a PC with Linux you can also use the program called hdparm, which can improve the performance of your disks.
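For example, with two disks connected to different controllers, the spool could be split like this in squid.conf (the paths and sizes are assumptions; the numbers are the spool size in MB and the number of first and second level directories, and depending on your squid version the filesystem type field may be absent):
cache_dir ufs /cache1 2048 16 256
cache_dir ufs /cache2 2048 16 256
On Linux, hdparm can then switch (E)IDE disks into faster modes, e.g. enabling 32-bit I/O and DMA - test carefully, as wrong settings can hang the machine:
/sbin/hdparm -c1 -d1 /dev/hda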
Having squid compiled and configured with async I/O threads also seems to be a good idea. You will need the libthread.a library; fortunately, it is now available in almost all modern Uni*es. This can be achieved by configuring squid before compiling this way:
./configure --enable-async-io
You should carefully balance hit ratio against speed. Remember that a cache with a big spool directory usually has a very good hit ratio, but it is also usually slow. I suppose that even though squid does not need much CPU power in general, it requires a lot of power while processing a request and looking for the object in its database; as the spool grows, processing each request consumes precious time, which is clearly visible on slower machines. I have observed this in the Polish POL34 w3cache hierarchy: the fastest cooperating machine was the one after a disk crash, with an almost empty spool.
Due to the fact that squid requires a lot of power in very small periods of time, it is probably a good idea to give it the highest priority at startup. I have done that by altering the RunCache script:
/usr/bin/nice -n -20 squid -Y $conf >> $logdir/squid.out 2>&1
This way of calling nice works with GNU nice (Linux, FreeBSD?) and also on Solaris 2.x. Why the highest priority? The Web is almost interactive, and waiting for the next Web page to appear is probably one of the most annoying things about working with the Internet. To be comparable sometimes even with direct connections, squid should react to requests as soon as possible. Giving it the highest priority for the very short periods when it requires it can make it run faster. Another interesting concept was given to me by Piotr Auksztulewicz: he used plock() to stop the squid process and data from being swapped between RAM and the virtual memory pools.
It can be done by editing src/main.c. Add to the includes (plock() is declared in sys/lock.h):
#include <sys/lock.h>
and add this call at the beginning of the function main():
plock (PROCLOCK);   /* lock text and data segments in core */
This will keep the whole squid process in RAM all the time.
Logging
Logging is a useful thing for gathering statistics and debugging information, but it is sometimes painful, especially in workhorse installations. Look again at the call to the main process:
/usr/bin/nice -n -20 squid -Y $conf >> $logdir/squid.out 2>&1
I have cut out the -s parameter - logging via syslog is an unnecessary waste of time and resources when you don't use a separate log server. Everything you want to know about squid you can find in its own logfiles. It is usually also unnecessary to log ICP queries to access.log - I don't know exactly how much it slows down squid itself, but it slows down logfile processing for sure :)) Turn it off in your squid.conf this way:
log_icp_queries off
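If you don't post-process store.log either, you can disable it completely - a one-line sketch for squid.conf:
cache_store_log none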
You should also turn off or limit ident lookups for your clients if there is no real need for them. Squid has them turned off by default.
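If you do need ident for some clients, newer squid versions can limit the lookups with an ACL instead of doing them for everybody; a sketch with an assumed ACL name and network:
acl ident_hosts src xxx.xxx.xxx.0/255.255.255.0
ident_lookup_access allow ident_hosts
ident_lookup_access deny all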
IDENT server
Many web sites will check your ident before sending a document to you. That means that when you don't have an ident server on your machine, you either wait for a timeout before getting the Web page or spend additional resources on respawning ident copies.
My solution is to get the free pidentd by Peter Eriksson <pen@lysator.liu.se> and compile it with thread support (again, you will need the libthread.a library), then run it from the startup scripts, not from inetd.conf. This makes one copy of identd resident in RAM, processing all the requests without forking and the associated overhead. Get pidentd from ftp://ftp.lysator.liu.se/pub/ident/servers/.
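A minimal sketch of that change, assuming pidentd 2.x (the paths and the exact inetd.conf line vary between systems, so check your own):
# in /etc/inetd.conf - disable the inetd-spawned copy:
# auth stream tcp wait root /usr/sbin/in.identd in.identd
# in a startup script - run pidentd as a standalone daemon
# (-b is pidentd's stand-alone mode; verify it in your version's docs):
/usr/local/sbin/identd -b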
Squid now has its own caching nameserver, but it still needs to get DNS answers quickly during the initial connection.
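One way to get fast answers is to run a caching-only nameserver on the squid machine itself and point the resolver at it; a minimal sketch, assuming a local named is already running:
# /etc/resolv.conf on the squid host
nameserver 127.0.0.1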
If you are focused on speeding up the fulfilling of your web requests, use Cache Digests (CD). Looking up the content in memory is much, much faster than querying a remote host via the network and waiting for remote lookups and all the responses to decide where to fetch from. As I have measured ICP and CD with squeezer, CD responses in similar conditions (link, workload) are several times faster than ICP.
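Note that Cache Digests have to be enabled when squid is compiled; assuming a squid 2.x source tree, the relevant configure flag is:
./configure --enable-cache-digests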
This document is under constant construction and it is a compilation of many people's observations and ideas. If you have any question, flame, suggestion or idea on how to speed up web caching, or you want to discuss the issues described above, contact me.