NDC Logical Firewall - Choosing Hardware

We've been developing and testing the logical firewall on a Dell Dimension 4100 with a 1GHz Pentium-III CPU, a 1.4MB 3.5" floppy drive, 256MB RAM, and a 3COM 3c905C-TX 10/100 Mbit/sec network interface card. See also the LFW with Gigabit Ethernet below.

The following aspects of choosing hardware should be considered:

How fast a CPU is needed?

This depends somewhat on how much traffic you intend to send through the firewall. A 1GHz Pentium 3 can forward about 40,000 packets/sec which should saturate a 100Mbit subnet where all traffic flows through the firewall (unless your applications send an unusually large number of small packets).

Even through unencrypted tunnels, a 1GHz Pentium 3 can forward roughly 32,000 packets/sec which, for large packets, is still more than enough to saturate a 100 megabit/sec full-duplex interface.

In units more familiar to end-users, we've measured 9.2 megabytes/sec file transfer through a 1GHz P3 logical firewall with a single 100 megabit/sec full-duplex interface--the same speed we measured without the firewall.

For a slower network (or a network where only a fraction of the traffic flows through the firewall) proportionally less speed will suffice.

If you're planning to setup encrypted tunnels between firewalls or use the firewall as a VPN server your CPU speed requirements per unit of throughput will be much higher for that fraction of data which needs to be encrypted and decrypted. A 1GHz Pentium 3 tops out at (very roughly) 40-80 megabits/sec when encrypting and/or decrypting traffic.

If you're buying a new system, it probably doesn't make sense to buy anything slower than an 800MHz Pentium 3 these days.

BIOS

To use Gibraltar, the BIOS must be new enough to be able to boot from CDROM and the boot order must be changed so that the CDROM is the first of the bootable devices present.

How much RAM is needed?

Gibraltar will run with 32MB but you probably shouldn't put a system into production with less than 64MB and 128MB or 256MB would probably make more sense.

Iptables uses only a few hundred bytes of RAM per connection for the state tables and except in pathological situations, this will probably not amount to more than a few MB. Extra memory is useful primarily for caching files from CDROM (so the CDROM can spin down and stay down) and for ramdisk.

Floppy Disk and/or USB Flash Memory

Traditionally (and most simply), the firewall's configuration is stored on floppy disk, so if possible, you should try to get one. However, it is now also possible to use a USB Flash Memory stick instead of a floppy.

Does linux have device drivers for the hardware?

Perhaps, the easiest way of shopping for hardware is to borrow an existing system and just try booting Gibraltar and running uw-setup. If it works, try to buy a similar system.

Since hardware manufacturers seldom develop linux device drivers for their hardware and it falls to dedicated volunteers to write them, it is usually good policy not to buy the "latest and greatest" device but instead buy a "tried and true" one which has been out for a while and is already supported by linux.

Don't use an LS-120 or any other non-standard floppy. Use a vanilla, 1.4MB 3.5" floppy drive.

Gibraltar has support for many network interface cards. To name a few:

3COM: 3c905, 3c905b, 3c905c, 3c900, 3c59x
INTEL: ether express pro 100
HP: J2585A, J2585B
StarTech: ST100S (works well and costs only $10 at UW bookstore)

Many others are supported (and/or detected) to some degree. See "/system/etc-static/pcitable" for those which may be automatically detected and see "/lib/modules/*/kernel/drivers/net/" for all the drivers which are available.

Using a Serial Console Instead of a Monitor and Keyboard

You need not have a conventional monitor and keyboard attached to your Gibraltar system. Instead, when necessary, you can connect a terminal emulator to your Gibraltar system's serial port and do everything (except change BIOS settings) via the serial port.
See enabling serial console access in the note and tips section of the LFW homepage.

The LFW with Gigabit Ethernet

Running Gibraltar 0.99.5 on one processor in a borrowed Dell PowerEdge 2650 (a 2.2GHz Pentium 4 Xeon with built-in Broadcom BCM5701 gigabit ethernet on a 64bit wide 133MHz PCIX bus) we were able to saturate a single gigabit ethernet interface bi-directionally at 35% CPU utilization forwarding about 82,000 1500 byte packets/sec (the logical firewall configuration). In the physical firewall configuration, we were able to saturate both interfaces bi-directionally forwarding about 164,000 1500 byte packets/sec at about 67% CPU utilization. Using small packets (128 Bytes) we were CPU limited forwarding about 211,000 packets/sec.

Our research indicates that a fast and wide PCI bus is necessary to achieve good gigabit ethernet performance.

Note: Gibraltar 0.99.5 failed to automatically load a driver for the Broadcom 5701 NICs, however a suitable driver is on the CDROM. To enable it, type:

    echo tg3 >>/etc/modules; modprobe tg3; /etc/init.d/networking restart

The Logical Firewall Running on Virtual Hardware (VMware Player)

The Logical Firewall can run under VMware including the Free VMware Player. This can be most easily accomplished by using one of the two pre-configured virtual machines offered here. See The Logical Firewall under VMWare for details.

With VMware, the host PC can be running either Windows or Linux and one host PC can run any number of distinct Logical Firewalls subject only to CPU, RAM, and disk limitations of the host. The protection offered by a Logical Firewall running under VMware (on an uncompromised host) is exactly the same as that offered by the firewall running on real hardware -- even to the physical host running VMware or to other VMware clients. For example, a Windows host and other Windows guests can all be protected as clients of a Logical Firewall running under VMware on the same physical PC).

NDC Logical Firewall - Filtering Bridge Performance

To try to answer the question: "what would happen if we put a large fraction of our campus behind a filtering bridge firewall?", I did the following experiments intended to answer these questions:

How does the firewall perform if there are a very large number of states in the ip_conntrack state table?
What would happen in a slammer-like attack where most packets do not take a short-cut through the iptables rules by virtue of being part of a "connected" state.
What is the maximum throughput a variation 4 filtering bridge firewall can sustain.

Since network and PCI bus bandwidth offer predictable firewall bottlenecks, measuring CPU utilization is ultimately the challenge. The most accurate way I know of measuring CPU use is to measure idle CPU cycles (by consuming them at low scheduling priority) and measuring how long it takes to get a unit of work done. I did this on the firewall with: "nice -20 idleproc" and noted it agreed with the CPU utilization more conveniently reported by: "vmstat 1".

To get a predictable traffic load to measure, I generated a stream of 100Mb traffic through the filtering bridge using:

    tcpblast -u  -d 0 -p 9999 -s 1024 dest-host 200000

On dest-host, I received it with:

    nc -l -p 9999 -u > /dev/null

The number of packets received on dest-host was obtained by running the following (on dest-host) before and after the test.

    netstat -s | sed -n '/^Udp:/ { ; N; p; }'

Approximately 11,000 packets/sec consumed about 33% of a 1GHz Intel P3 running Gibraltar 0.99.7a with a small set of rules produced by the variation 4 rule generator. Since virtually all packets sent were received, and the throughput was close to full wire-speed (100Mb/sec for this test), and the test was repeatable, the measurement is considered valid.

How does the firewall perform if there are a very large number of states in the ip_conntrack state table?

To answer this question, a large number of SYN_RECEIVED states were generated into the ip_conntrack state table by running this script on the firewall:

    #!/bin/sh
    echo 2097152 > /proc/net/sys/ipv4/ip_conntrack_max
    COUNT=200000
    while [ $COUNT -gt 0 ] ;do
      let HIGH=($COUNT/253)%253+1
      let LOW=$COUNT%253+1
      let PORT1=$RANDOM%64000+1
      let PORT2=$RANDOM%64000+1
      nemesis-tcp -fS -S 10.1.$LOW.$HIGH -D 10.2.$HIGH.$LOW -x $PORT1 -y $PORT2
    # nemesis-udp     -S 10.1.$LOW.$HIGH -D 10.2.$HIGH.$LOW -x $PORT1 -y $PORT2
      let COUNT=$COUNT-1
      case $LOW in 1) echo $COUNT;; esac
    done

The script generated over 190,000 state table entries between about 64,000 unique source and destination IP addresses between about 64,000 different ports.

Since the script needs to run "nemesis-tcp" for each packet generated, it is necessary to increase the timeout for SYN_SENT from the default of 2 minutes to 20 minutes to prevent some of the state from timing out before the script finishes. This can be done (on Gibraltar 0.99.6a) with this command:

    FILE=/proc/sys/net/ipv4/ip_conntrack_tcp_timeouts; awk '{$3 = 1200; print}' < $FILE > $FILE
  # FILE=/proc/sys/net/ipv4/ip_conntrack_udp_timeouts; awk '{$1 = 1200; print}' < $FILE > $FILE

The state created by the script above can be examined in /proc/net/ip_conntrack.

There was no measurable difference in CPU use for the 100Mb UDP test stream with 190,000 TCP SYN_RECEIVED entries in the ip_conntrack state table. The test was repeated (using the 2 lines commented out above) with identical results after creating 190,000 UDP UNREPLIED packets (and modifying the appropriate UDP state timeout).

What would happen in a slammer-like attack where most packets do not take a short-cut through the iptables rules by virtue of being part of a "connected" state.

To answer this question, dummy firewall rules were inserted (into the "mangle table") which would be tested (but not matched) before the test for "state connected". This script was used to insert the dummy rules:

    #!/bin/sh
    COUNT=500
    while [ $COUNT -gt 0 ] ;do
      let HIGH=$COUNT/253+1
      let LOW=$COUNT%253+1
      iptables -t mangle -I PREROUTING 1 -p tcp -s 1.1.1.$HIGH -d 1.1.1.$LOW -j ACCEPT
      let COUNT=$COUNT-1
    done

Testing each packet against these additional 500 iptables rules caused the 100Mb UDP test stream to consume about twice as many CPU cycles as it did without the rules. This shows that a DOS attack (or a portscan) can have a somewhat greater impact on firewall CPU use than normal connected traffic flow and that impact depends on how large the firewall's ruleset is, or more specifically, how early in the ruleset the unwanted traffic can be excluded.

What is the maximum throughput a variation 4 filtering bridge firewall can sustain.

To answer this question, we ran Gibraltar 0.99.7a on a Dell PowerEdge 2650 with a single 3.06GHz P4 XEON processor, two built-in broadcom gigabit ethernet adapters (on a 133MHz 64bit PCI bus). Using a "smart-bits" network load tester, we found that for maximum size packets (1518 bytes) with a minimal ruleset, the firewall was able to keep up at 95% of full bi-directional gigabit network speed at about 80,000 packets/sec (40,000 each way). For small (128 byte) packets, the firewall was CPU limited at about 220,000 packets/sec. Enabling/disabling "hyperthreading" had no measurable effect (which is not surprising since Gibraltar uses a uniprocessor Linux kernel).

Note that bridging performance of Gibraltar appears to be somewhat less than routing performance, or put another way, the 2.2GHz processor we tested before as a routing firewall performed slightly better than the 3.06GHz processor we just tested as a bridging firewall.

idleproc.c

    #include 
    #include 
    #include 
    /*
     * idleproc.c - a tool to measure idle CPU cycles by consuming them.
     * Corey Satten 5/2000
     */

    long ctr;
    volatile long sec;

    catch(int *s) {
	sec = ctr;
	}

    main() {
	int i, j, f, t;
	double use;
	struct timeval tv1,tv2;
	struct timezone tz1,tz2;

	signal(SIGALRM, catch);
	alarm(1);
	gettimeofday(&tv1, &tz1);
	while (sec == 0) ++ctr;
	gettimeofday(&tv2, &tz2);
	t = (tv2.tv_sec-tv1.tv_sec)*1000000 + (tv2.tv_usec - tv1.tv_usec);
	sec = sec * 500000.0 / t;

	for (f=0; ;++f) {
	    tv1 = tv2;
	    for (i=0; i 0) putchar('-');
	    printf("|\n");
	    tv1 = tv2;
	    }
	}

Changes Made to tcpblast.c

16c16
< , verstr[30]="FreeBSD + rzm ";
---
> , verstr[80]="FreeBSD + rzm ";
84c84
<       fprintf(stderr, "nblocks        number of blocks (1..9999)\n");
---
>       fprintf(stderr, "nblocks        number of blocks (1..999999)\n");
163c163
<               case 'p': strncpy(port, optarg, strlen(port)-1);        break;
---
>               case 'p': strncpy(port, optarg, sizeof(port)-1);        break;
197,198c197,198
<         if (nblocks<=0 || nblocks>=10000) {
<               fprintf(stderr, "%s: 1 < nblocks <= 9999 \n", argv[0]);
---
>         if (nblocks<=0 || nblocks>=1000000) {
>               fprintf(stderr, "%s: 1 < nblocks <= 999999 \n", argv[0]);

Corey Satten
Email -- corey @ u.washington.edu
Web -- http://staff.washington.edu/corey/
Date -- Mon Jan 28 12:25:56 PST 2008