Tuesday, February 26, 2013

Raspberry Pi as a transparent squid caching proxy

Developing OpenStack Heat means spending a fair amount of time building and customizing bootable cloud images. A lot of this time is spent waiting for RPMs, debs and tarballs to be downloaded by a vanilla guest OS running inside a VM. And given that I work from home on an average broadband connection in a remote country in the South Pacific, the result is some frustrating wait times.

Since the same packages are often being repeatedly downloaded, I would benefit from some local caching. This seemed like a good excuse to use a Raspberry Pi. I went with a Raspberry Pi B running Raspbian. The aim was to set it up as a bridge and run a transparent squid proxy between eth0 (the inbuilt network interface) and eth1 (a USB ethernet dongle).

Once I'd completed the initial installation, I installed the following:
$ apt-get install squid3 bridge-utils

eth0 and eth1 were set up to bridge on my network, and an iptables rule was set to direct any port 80 traffic that passes through the bridge to squid's default port.
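The post doesn't show the exact commands, but a minimal sketch of that setup might look like the following (run as root; the bridge name br0 is my own choice, and the sysctl is needed so that traffic crossing the bridge is seen by iptables at all):

```shell
# Create a bridge joining the two interfaces (bridge-utils provides brctl)
brctl addbr br0
brctl addif br0 eth0
brctl addif br0 eth1
ip link set br0 up

# Make bridged IPv4 traffic traverse the iptables chains
sysctl -w net.bridge.bridge-nf-call-iptables=1

# Redirect any port 80 traffic passing through the bridge to squid's
# default port (3128)
iptables -t nat -A PREROUTING -i br0 -p tcp --dport 80 -j REDIRECT --to-ports 3128
```

To survive a reboot, the bridge would normally go in /etc/network/interfaces and the iptables rule in a persistence mechanism of your choice.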

The following changes were made to the squid configuration file. Since I'm interested in caching larger files the maximum_object_size has been set to 512MB. My Raspberry Pi is running on a 16GB SD card; for now I have configured cache_dir to use 8GB of that.
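Putting those values into squid's terms, the relevant directives would look something like this (the numbers are from above; the cache path is the Raspbian default for squid3, and `intercept` tells squid to accept the redirected traffic):

```
# /etc/squid3/squid.conf excerpt (sketch)
http_port 3128 intercept                       # accept transparently redirected traffic
maximum_object_size 512 MB                     # cache large files such as distro packages
cache_dir ufs /var/spool/squid3 8192 16 256    # 8192 MB = 8 GB of the 16 GB SD card
```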

And did this actually help my image building time? Using diskimage-builder I ran an Ubuntu customization where the source image file was already cached locally. The first run populated the squid cache with apt repository packages and the second run had a hot squid cache. The build time went from (mm:ss) 04:20 to 01:20 which I'm pretty happy with.

Doing the same with heat-jeos (which is based on oz) managed to get some cache hits on the second run, but had little impact on the (mm:ss) 22:30 build time.


GAD said...

It looks to me as though you updated your kernel to include iptables support. I installed the wheezy image and it does not include iptables.

Ben Nichols said...

Is there any particular reason you chose to create a bridge interface for intercepting, rather than simply redirecting all traffic as most people do with two interfaces? What is the benefit of doing it that way, and what am I missing out on?

Here's how I usually do it:

-A PREROUTING -i eth5 -p tcp -m tcp --dport 80 -j DNAT --to-destination
-A PREROUTING -i eth5 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 3128

dreamofccie said...

Interesting post. Have you considered using WCCP on the Cisco and setting up squid as an intercepting proxy? Do you think this is a viable idea? Also, what was your performance like in the end? Any hit from this?


Random Ponderings said...

Years ago in the 56k modem era I used JANA to give my local network a common cache, and it taught me two things:

1) the quality of browser caching has sucked more and more as faster connections and unlimited bandwidth became common .... western - especially US - developers tend to act as if all the world has unlimited T1 access

2) every router sold ought to come with a cache