Security in the Post-Internet Era:

The needs of the many
vs. The needs of the few

Terry Gray
University of Washington

Written: October 2003
Last update: 01 March 2004

DRAFT DRAFT DRAFT

NB: What lies before you is the unexpurgated, that is to say, verbose version of this document. That's because the verbose version is currently the only version. No doubt a good editor could cut the volume in half, but I'm notoriously bad at trying to edit my own text. Every time I try, it gets longer :) What usually happens is that I give up and write an executive summary. Eventually I hope to have time to do that here too...


Outline

  1. INTRODUCTION

  2. BACKGROUND

  3. ANALYSIS

  4. PRINCIPLES

  5. TRADEOFFS

  6. CREDO

  7. RECOMMENDATIONS

  8. CONCLUSIONS

  9. REFERENCES

 


1. Introduction

1.1 Purpose

This document is intended to:

1.2 Turning Point

2003 was one heck of a year, at least security-wise. The events were so dramatic as to cause some of us to call it a turning point for the Internet. It was arguably the year the concept of the Internet as an open utility came to an end. Hence we speak of security in the post-Internet era. For the same reasons (discussed below), 2003 was also the year that the concept of unmanaged autonomous personal computers became obsolete, or at least an albatross around the neck of enterprise IT support staff. And finally, because of spam and email viruses (and especially, anti-spam and anti-virus counter-measures) it was the year we stopped thinking of Internet email as a reliable communication medium. All in all, 2003 was not a good year for Internet enthusiasts.

Now it's time to take stock and question assumptions, prevailing wisdom, etc, because everything we know may be wrong.

A key theme is that the Internet has changed, so approaches to security must be re-evaluated in the light of current trends and anticipated threats. We'll begin by reviewing the events of 2003 and assess their implications, then revisit some basic design principles and tradeoffs in that context. Finally we'll identify some best practices and look at the security approaches currently being used or planned at the University of Washington.

1.3 Focus on Prevention

Not considered in this document are strategies for:

All of those are important; just not the current focus. Besides, prevention usually costs less than remediation.

1.4 Acknowledgements

This paper grew out of a presentation I gave at the Security At Line Speed workshop, and a follow-up presentation at the Fall 2003 Internet2 Member Meeting. That may explain why this document looks a bit like a PowerPoint outline with extra words added :)

Thanks to the National Science Foundation and Internet2 for sponsoring the workshop, and to workshop and I2 meeting participants, and colleagues at the UW for helping me refine these ideas. In particular, David Richardson (Washington), Steve Wallace (Indiana) and Matt Davy (Indiana) have led the work on a "flexible networking" architecture as an alternative to traditional "one-size-fits-all" approaches.

 


2. Background

2.1 Recent Events

Unquestionably, 2003 was a security annus horribilis.
Most notably, we saw the:

Major 2003 security events include:

2.2 Recent Trends

Significant 2003 security-related trends include:

2.3 Metamorphosis

Let's review the metamorphosis of the Internet paradigm:

In the beginning (1969, the year the ARPANET became operational), the goal was to replace separate access links to geographically-distributed time-sharing computers with a single resource-sharing network. Then in January 1983, when TCP/IP was deployed, the goal was to glue lots of disparate networks together with minimum impedance to traffic flow at the junctions. By the mid-nineties, we began to see some contrary objectives in certain areas: selective isolation, or "high-impedance" connections among certain networks, became an objective, in order to gain some protection from Internet-based threats. In 2003, this trend escalated dramatically as the Slammer and RPC-DCOM exploits (e.g. "Blaster") wreaked havoc around the world. What will the future bring? Is it likely that the trend toward isolation and "closed" networks will be reversed? If not, the open Internet utility paradigm is history. That paradigm allowed users, system admins, and network operators to assume that if connectivity between any pair of Internet-connected hosts wasn't working, it was unintentional and something was broken. That paradigm is in fact over, never to return. We can no longer make any assumptions about connectivity between Internet hosts. The best we can hope for now is conditional openness in certain subsets of the Internet. This is a big change, with dramatic implications for innovation and network operations.

2.4 Other Voices

At the Nov 2003 IETF meeting, the Internet Architecture Board proposed the following topic for plenary discussion:

And the theme of the upcoming INET 2004 conference is:

Indeed, the world has changed, and the challenges are large...

In a recent interview, Bill Gates was quoted as follows:

And in another recent interview, security expert Bruce Schneier was quoted as follows:

Next, in an article explaining why he was abandoning his peer-to-peer Internet telephony project (SpeakFreely), John Walker wrote:

He continues, in response to the idea that IPv6 will make NAT go away:

Finally, in a Microsoft document describing changes to their Internet Connection Firewall (ICF) implemented in Windows XP Service Pack 2, we read:

No kidding.

 


3. Analysis

3.1 Impact

Not that we'll need to shed a tear for Microsoft's revenue potential anytime soon, but this article on the impact of recent security disasters on Microsoft's sales includes some noteworthy comments, such as:

It would appear that security, or lack thereof, is costing everybody a lot of money... except, of course, the security product vendors and consultants.

We are at the end of an era... Say farewell to:

Instead, say hello to:

The needs of the many trump the needs of the few... but at what cost? On this point we can expect:

3.2 Observations

Viewpoints vary:

Truisms:

Missed opportunities:

Infrastructure vs. end systems:

Will large-perimeter blocking turn out to be necessary, not to protect end-systems, but to protect the network infrastructure? A similar thing has happened in the area of spam control. The idea of centrally tagging all spam, but letting users filter whatever they wish at delivery time, has been replaced (or at least supplemented) by "early blocking" strategies. This approach is now needed to contain the growing cost of the systems that forward and filter the spam tidal wave.

Moreover, there is a correlation between functionality, complexity, and risk. As general-purpose computing systems are used by ever-increasing populations, some of their potential will inevitably be abused. Will it be necessary to reduce functionality to reduce the risk of abuse? That is what is already happening with email: the risk of virus-infected attachments has led many institutions to block potentially dangerous .zip and .exe attachment types (among others). The impact is lower risk of infection, at the cost of lower email functionality. Similarly, peer-to-peer applications may fall victim to the same functionality-vs.-risk considerations when network firewalls are applied. There is even advice in a well-respected book on firewalls and network security that H.323 video conferencing applications require major holes in a firewall, so it might be better just to prohibit them and encourage people to use the phone!
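
Just to make the functionality-vs.-risk tradeoff concrete, here is a minimal sketch (in Python) of the kind of attachment-type blocking described above; the particular extension list is illustrative only, not a recommendation:

    # Sketch of extension-based attachment blocking.  The blocked-extension
    # list is illustrative; real deployments maintain much longer lists.
    BLOCKED_EXTENSIONS = {".exe", ".zip", ".scr", ".pif", ".bat"}

    def attachment_allowed(filename):
        """Return False if the attachment's extension is on the blocked list."""
        name = filename.lower()
        return not any(name.endswith(ext) for ext in BLOCKED_EXTENSIONS)

    # attachment_allowed("report.pdf") -> True
    # attachment_allowed("invoice.zip") -> False (functionality lost, risk reduced)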

3.3 Drivers

Key drivers behind these changing times include the following factors.

3.4 What We Lost

Lost: the network utility model. The network utility model is dead --long live the NUM. Once upon a time, all Ethernet ports behaved the same. This state of the world had two important operational properties: (1) simplicity, and (2) ease of debugging. Now you cannot assume that any two network ports/jacks will behave the same, because of bandwidth management policies and security policies. Gone are the days when the quality of a network was solely based on performance and reliability metrics. Transparency, the ability of the network to forward packets from source to destination with no intentional interference or impedance, was taken for granted.

Lost: predictable connectivity, network simplicity, for network operators. The principal consequence of losing the simplicity of the network utility model (and its assumption of predictable, intentional, pervasive connectivity) is the loss of operational integrity, because it becomes harder to maintain and manage a network with a variety of firewalls in the forwarding path, especially when those troubleshooting a problem had nothing to do with either defining or implementing the restricted connectivity policy. And that loss leads to:

These losses are in comparison to not having the policy-based connectivity barriers. Of course, the question is whether or not the security benefits of those barriers offset these losses, which in turn depends on a) how effective they really are, b) the criticality of the resource being protected, and c) what security alternatives exist.

The higher costs come in two forms: first, network perimeter defenses implementing intentional or policy-based connectivity restrictions generally look to users like the network is broken, leading to extraneous network support calls; second, when the network really is broken, it is harder to debug when various security appliances are in the packet forwarding path.

Lost: network transparency (open connectivity) for application developers and users. Selective and/or unpredictable connectivity policies lead to:

3.5 How We Lost It

The present situation is attributable to a series of disconnects between different stake-holders in the security equation, regarding the failure of "computer security". In particular, there have been major disconnects between:

The loss of the open Internet is also a consequence of the following conflicts and tensions:

3.6 Inevitable Trainwreck?

One might conclude that the present situation was inevitable, a consequence of the following fundamental contradiction:

Now combine that fact with these conflicting roles:

Finally, consider that, in this age, insecurity = liability.

3.7 Consequences

The current mindset is: computer security has failed, so network security must be the answer. Said differently: the network brought all the trouble, so the network people should be the ones to make the pain go away.

As a result, there is considerable pressure to "close the net", or --since there is no unanimity as to what that means-- to implement separate/independent network environments for different classes of service or user, or to partition the network into zones that match organizational or policy boundaries rather than natural geographic topologies.

The "network of networks" concept has evolved:

The result? Heisenberg/Einstein --or "Heisen/stein"-- networking: that is, uncertain and relativistic connectivity.

In short, we are seeing firewalls and NAT (Network Address Translation) everywhere, and thus

Even though few people are feeling really secure these days, in spite of all the firewalls, NATs, and port blocks, it would be hard to argue that they have provided no value. No doubt the security situation would be worse without them... or would it? Is it possible that perimeter defense strategies have materially slowed the deployment of host-based security strategies? And if so, how does that affect the overall cost/benefit equation?

Alas, there is some evidence that the presence of border firewalls slows down efforts to secure end-hosts. On the other hand, absence of border firewalls does not effectively speed up those efforts.

In any case, it seems clear that security and liability concerns will continue to trump innovation/philosophy/operations costs, and that NAT will survive unless/until a better "unlisted number" mechanism takes hold.

3.8 Encouraging Signs?

Are there any? A few... For example:

3.9 Imagine the Future

Imagine a future world when most applications tunnel over port 80 or 443 (web), or use end-to-end IPSEC-ESP. It won't happen overnight, but --for better or worse-- it is already happening.

What happens when IP port numbers can no longer be used for multiplexing applications running on a host because firewalls are blocking most of the ports? Some possibilities are:

Another implication of the future "closed" world is increasing use of trusted overlay nets (aka VPNs):

We can further expect the following:

3.10 Firewall Forecast

Some of the recent trends will have considerable impact on the security solution space. In particular, the traditional cornerstone of network security, perimeter defense firewalls, will need to be re-evaluated:

On the other hand, even though they will be useless against many next-generation attacks, they will --rightly or wrongly-- continue to offer some peace-of-mind regarding "legacy" attacks. Indeed, the Microsoft RPC exploit has guaranteed that the firewall industry has a bright future, even though conventional firewall approaches can do as much harm as good. (See Firewalls: Friend or Foe? for more on this point.)

Still, firewalls are an essential part of any "defense in depth" strategy, so the real question is where they should be placed. The Perimeter Protection Paradox described in the paper just mentioned explains that the value or efficiency of a firewall is proportional to the number of systems behind it, but its effectiveness is inversely proportional to the same number. In order to minimize the size of the vulnerability zone inside a perimeter defense, the perimeter must be small. This argues for centrally-managed host-based firewalls as optimal for protecting hosts, but recent attacks have had adverse effects on the network infrastructure itself, and therefore tend to reinforce the trend toward building closed networks. Aggressive attacks such as Blaster also illustrate the value of a network perimeter defense in configuring and patching a fresh system. At the height of Blaster, it was not unusual for a new system to be infected within a couple of minutes, much less time than was necessary to download the security patches needed to block Blaster.
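
To see how the paradox scales, here is a toy calculation in Python; the quadratic "insider exposure" measure is an illustrative assumption, not a measurement:

    # Toy model of the Perimeter Protection Paradox: one firewall protecting N
    # hosts looks more "efficient" as N grows, but any one compromised host
    # inside the perimeter can reach its N-1 neighbors unimpeded, so insider
    # exposure grows roughly as N*(N-1).
    def perimeter_tradeoff(n_hosts):
        return {
            "hosts_per_firewall": n_hosts,                    # perceived value/efficiency
            "insider_attack_paths": n_hosts * (n_hosts - 1),  # unprotected internal pairs
        }

    for n in (10, 100, 1000):
        print(n, perimeter_tradeoff(n))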

"The best is the enemy of the good"... hence security fixes that address some of the most pressing problems will be chosen if "better" solutions take too much time or effort to implement.

 


4. Principles

4.1 Security Ecosystem

In a recent document, Microsoft identifies the security ecosystem as including:

They further relate these as a circle, indicating that security is dependent on all five elements. If we generalize the "Trust" element to be less Microsoft-specific, this is a good overall framework for thinking about security principles.

Focusing on the Network and Host elements, we identify the following basic "lines of defense" upon which our design principles follow:

  1. Host integrity. Making sure the OS is network-safe.
  2. Host firewalling. Adding host or "end-point" firewalling --just in case the OS is not network-safe at some point in time.
  3. Cluster perimeter defense. Surrounding groups of hosts, e.g. labs or server clusters, with firewalls for defense-in-depth.
  4. Network perimeter defense. For additional defense-in-depth where the collateral damage from network firewalls is justified by risk mitigation requirements.
  5. Real-time attack detection and containment.
  6. Network Architecture. Including redundancy and isolation strategies.
Much debate occurs over how much emphasis should be placed on each of these areas. Opinions vary because local practices and requirements vary, as do the perceptions that follow from the differing roles people have.

4.2 Context

Is Higher Ed really different? Security strategies that seem necessary and sufficient in industry don't always appear that way in research universities, so let's start by identifying some of the differences in the higher-ed (HE) environment.

On the other hand, it is sometimes said that security is not an issue in HE... after all, it's just about educational course materials that are usually on public web pages anyway... right?

Wrong. Taking the University of Washington as an example, we have all of the same I.T. and security concerns as any $1-2 billion/yr corporation --plus we have 40,000 students and two hospitals to worry about. Oh, and we have some classified government research going on, as well as research subject to commercial/proprietary disclosure constraints. Security? Yes, it is a very big deal for us. And government regulations relating to student and medical center records make it a bigger deal for us than for many corporations.

4.3 Problem Space

4.4 Solution Space

We will revisit and reconsider three main prevention strategies, and one remediation strategy:

4.5 Seven Security Axioms

  1. Network security is maximized when we assume there is no such thing.
  2. Large security perimeters mean large vulnerability zones.
  3. Firewalls are such a good idea, every computer should have one. Seriously.
  4. Remote access is fraught with peril, just like local access.
  5. One person's security perimeter is another's broken network.
  6. Private networks won't help. (Isolation strategies are limited by how many PCs you want on your desk).
  7. Network security is about psychology as much as technology.
Bonus: never forget that computer ownership is not for the faint-hearted.

For elaboration on these points, please see this document.

4.6 Goals

Nirvana "back then"

Nirvana now?

4.7 Anti-Goals

4.8 Perspectives

Speaking of perspectives, many times security strategies are evaluated only from the perspective of security professionals, or those who are primarily focused on improving security. There are other important perspectives to consider. What do these constituencies want from a security policy (besides not being inconvenienced by the needs of other constituencies)?

End Users want:

Department Computing Staff want:

Network Operations Staff want:

Application Developers want:

CIOs and CSOs want:

4.9 Policy Enforcement Points

Perimeter defense can be implemented at many different points in a network topology, i.e. at various network perimeter policy enforcement points (PEPs). For example:

Perimeter definition. It's important to remember that the policy enforcement point does not necessarily define the effective (or logical) defensive perimeter. For example, a host-based firewall can emulate a border firewall by applying rules based on address ranges allocated to the enterprise. However, the host-based rules may need to be more complex than if an equivalent policy is implemented at the enterprise border, if, for example, the goal is to permit unrestricted flow of an insecure protocol within the enterprise, but not outside.
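
As an illustration, here is a sketch (in Python) of a host-based rule that emulates such a border policy by keying on enterprise address ranges; the address blocks and the "insecure protocol" port are hypothetical:

    # Sketch: a host-based firewall rule that emulates a border perimeter by
    # treating the enterprise address blocks as "inside".  The CIDR blocks and
    # the internal-only port are hypothetical examples.
    from ipaddress import ip_address, ip_network

    ENTERPRISE_BLOCKS = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]
    INTERNAL_ONLY_PORTS = {515}   # e.g. an insecure protocol allowed only on-campus

    def accept_inbound(src_ip, dst_port):
        """Permit internal-only ports from enterprise addresses; pass everything else."""
        inside = any(ip_address(src_ip) in block for block in ENTERPRISE_BLOCKS)
        if dst_port in INTERNAL_ONLY_PORTS:
            return inside
        return True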

Similarly, the topological or physical policy enforcement point does not necessarily coincide with where the relevant policy is determined or specified. For example, the PEP might be at the enterprise border, but the determination of what policy applies may be a function of an end-host characteristic, such as MAC or IP address, or a VLAN tag of the Ethernet switch by which the host is connected.

Here is one example to illustrate the difference between policy enforcement and policy definition points. Consider a modern border router capable of policy routing (i.e. the ability to route based on the source IP address of an incoming packet, as opposed to the packet's destination address). Now configure that router to pass traffic originating from a specific IP address range to a specific interface with a policy enforcement device attached to it. That could be a firewall, a NAT box, or a proxy server. The decision on whether a particular host will be affected by that policy enforcement device is determined by the host configuration (or by a central registration mechanism that determines the host's IP address). Thus, in this case, the policy enforcement point is at the border, but the policy selection or decision point is at the edge of the network.

4.10 Host-based Protection

Here are some of the host-centric approaches to making hosts, and the information on them, network-safe:

4.11 Defense in Depth

Let's consider the implications of Defense In Depth on protection, operations, and application innovation. There are at least two kinds of Defense In Depth:

  1. Horizontal or Topological/Perimeter DID for protecting devices:
  2. Vertical or Logical DID for protecting data/information via encryption:

In both cases, the defensive strategies can complement each other, but each has both cost and benefit, and the cost and benefit both multiply with the number of layers... affecting not only "Mean Time To Penetration" but also "Mean Time To Repair" and "Mean Time To Innovation". We want MTTP to be large... that's the time it takes an attacker to successfully penetrate the N layers of defense. Conversely, we want MTTR when a problem occurs to be as small as possible, and each layer of defense adds complexity and thus slows down the troubleshooting process. Similarly, MTTI --the time it takes an application developer to innovate, or successfully "penetrate" the defensive perimeters in the path of their applications-- should be small, but may not be.

Gray's DID conjecture. Given N layers of device perimeter defense, the Mean Time To Penetration, Repair, and Innovation are all proportional to N-squared. That is, they grow quadratically, rather than linearly, with the number of defensive perimeters between attacker and target:
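
(Stated compactly --this merely restates the conjecture, with unspecified proportionality constants:)

    MTTP ∝ N^2        MTTR ∝ N^2        MTTI ∝ N^2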

Why N-squared, and not cubed? Maybe it's cubed. This is, after all, just a conjecture, and the point is that it feels like the impact is a lot stronger than just a linear increase.

When considering the impact of "vertical" DID across network layers to protect data streams, the relationship is probably somewhat different, perhaps even linear --but there can be no doubt that troubleshooting a problem when multiple layers of encryption are involved is more difficult, as a function of the number of layers.

Note that horizontal and vertical DID strategies interact... vertical DID (encryption) approaches may defeat horizontal perimeter defense (firewall) approaches based on packet inspection.

4.12 Isolation vs. Convergence

This is about selective isolation of network segments in order to reduce the number of people, or the range of services, affected by a security problem or any other network-propagatable fault (e.g. broadcast storms, multicast storms, slammer storms, spam storms, etc.) To an extent, such segmentation occurs naturally in any hierarchical network structure, but broadcast storms are no longer the only problem driving network segmentation strategies. Careful consideration must be given to modern phenomena that can, as an unintended side-effect, threaten network connectivity. This is not just a matter of firewalling; it is about designing network services for robustness in the face of collateral damage from both benign and hostile phenomena.

Terminology. The term isolation might be used in any of these contexts:

Topological isolation, segmentation, and blocking all relate to intentionally limiting connectivity among network segments and with it, vulnerability. Topological isolation usually implies little or no connectivity, whereas blocking or filtering typically embraces a range of connectivity degrees.

In the context of isolation vs. convergence, isolation relates to building separate infrastructure for different services or partitioning the network into policy equivalence classes vs. using a single unified (network) infrastructure for all services. Digital convergence (the fact that all media have become ones and zeros) is the enabler, and the temptation is to forego species diversity and put all eggs in one (network) basket, since that is the most economical thing to do. At least until the converged network fails...

Segmentation. Partitioning a network into zones, possibly linked by some form of firewall, may be motivated by fault containment goals, or in cases where portions of the network topology map cleanly to organizational boundaries, by the desire to easily apply a uniform per-organization connectivity policy to an entire zone.

Convergence. Isolation strategies are based on recognition that not all applications are well-served by full Internet connectivity. Even unrepentant open Internet enthusiasts (no names :) agree that separate/isolated networks are appropriate for certain services, e.g. real-time control systems or clinical applications where interruption of connectivity or compromise of attached systems might be catastrophic. Some computer-controlled machines simply should not be attached to the Internet --at least not directly. On the other hand, as noted above in the Seven Security Axioms, isolation strategies are constrained by how many separate devices one is willing to have on the desk (or in one's pocket/purse). Having one host connected to two completely separate networks means that host becomes the attack gateway between the two. It is also clear that critical servers should not share the same subnets as less-well-protected client systems, since subnets define a natural topological barrier for some classes of problems. Full convergence of services onto a single network infrastructure maximizes ROI: both Return On Investment and also the Risk Of Intrusion/Interruption.

Design rules. Given the necessity of selective topological isolation and perimeter blocking defenses, we need some guiding principles for network architecture and design that will illuminate the best and worst places to introduce network traffic barriers. The previously-described Perimeter Protection Paradox warns us against having large vulnerability zones ripe for propagating insider attacks, but operational and troubleshooting considerations argue for having "traffic disrupting" perimeters be few and at well-understood boundaries. Policy-based traffic disruption within the network (as compared to on the hosts themselves) can lead to perplexing behavior for end-users as well as support staff, because there is no way for the perimeter firewall to communicate what it is doing to the affected user. (To the user, it just looks like the network is broken.) Moreover, blocking by application port number is typically easier to manage (e.g. more predictable when diagnosing a problem) than blocking by arbitrary ranges of IP addresses. This is especially true when the policy is defined and implemented by a group that is different from the one diagnosing the problem.

There are two well-respected design principles in conflict here: the principle of least privilege, extended to perimeter protection policies, argues for the minimum connectivity necessary for a given circumstance. In contrast, the principle of least surprise, a cornerstone of good user-interface design, suggests that the network should be as transparent as possible. Finding the right balance requires assessment of three factors, integrated over the entire application space:

4.13 Isolation Options

Here is a taxonomy of options for isolating traffic according to network layer technology:

And here is a taxonomy of options for isolating traffic according to security or reliability policy:

Note that VPNs (not counting end-to-end IPSEC-ESP) are usually only used for protecting remote access, but remembering security axiom #4 (Remote access is fraught with peril, just like local access) one might reasonably ask: if VPNs are good for remote access, why are "insiders" usually left "unprotected"? In other words, why ignore insider attack risks within the security perimeter and only provide extra protection for external access? Unless, of course, the only real motivation for VPNs is to get around local or ISP network filtering policies! The idea of using a VPN server to protect clients within an enterprise has led to the terminology "upside-down" or "inside-out" VPN service.

Another twist on the isolation theme is the so-called DMZ (derived from "DeMilitarized Zone"). A DMZ network provides gateway connectivity between two different administrative domains. It may be a network segment that has both local and global network attachments to it, or it may refer to a server providing access to different domains.

In the next section we will consider the range of options between total isolation and totally unrestricted connectivity.

4.14 Degrees of Transparency

The term network transparency is used in multiple ways (e.g. network-transparent operating systems). Here we simply mean the polar opposite of isolation, specifically that there is an open, unobstructed path between communicating parties. However, it's not a binary choice; there can obviously be many degrees of openness or transparency, as a result of conditional or selective blocking (filtering). Of course, from a networking purist's perspective, filtering is all about degrees of intentional dysfunction :)

Here are some choices:

  1. Open: no impediments to full Internet connectivity
  2. Outgoing connections only (e.g. via NAT): the "unlisted number" model
  3. Minimum protocol set: (see below)
  4. Closed: Internet access only via application proxies

For situations where a tight perimeter is desired, a "minimum protocol set" policy is in order. Here is one possible list:

We pity the poor researcher trying to get the latest peer-to-peer scientific collaboration tool working in this environment.

Of course, for an even tighter perimeter, the "minimum protocol set" policy could be combined with the "outgoing connections only" policy. Remember however that all perimeter defense caveats apply. In particular, you must assume that perimeter defense strategies are there to buy some time while applying the latest host patches and/or to reduce the noise in the IDS logs, and/or reduce bandwidth contention within the enterprise backbone during attacks. They should not be viewed as providing security, since there are too many ways for perimeter defenses to be bypassed, including VPNs, mobile computers, email, and all of the new apps that are migrating to port 80, just so the firewall doesn't get in their way.
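
For concreteness, here is a sketch (in Python) of such a combined policy: new inbound connections are allowed only to a short list of ports (and, optionally, designated servers), while replies to internally-initiated flows always pass. The ports, addresses, and flow-state mechanism are illustrative, not a recommended configuration:

    # Sketch of a tight perimeter: "minimum protocol set" for new inbound
    # connections, combined with "outgoing connections only" for everything
    # else.  Ports, addresses, and the flow-state mechanism are illustrative.
    ALLOWED_NEW_INBOUND = {
        25:  {"198.51.100.10"},   # SMTP, only to the central mail relay
        53:  {"198.51.100.20"},   # DNS, only to the published name server
        443: None,                # HTTPS to any host (None = no server restriction)
    }
    outbound_flows = set()        # flows initiated from inside, recorded on the way out

    def permit_outbound(src, sport, dst, dport):
        outbound_flows.add((src, sport, dst, dport))
        return True

    def permit_inbound(src, sport, dst, dport):
        if (dst, dport, src, sport) in outbound_flows:    # reply to an inside-initiated flow
            return True
        servers = ALLOWED_NEW_INBOUND.get(dport)
        return dport in ALLOWED_NEW_INBOUND and (servers is None or dst in servers)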

4.15 Critical Questions

We close the Design Principles chapter with a set of questions that need to be considered when evaluating the tradeoffs of various security strategies.

 


5. Tradeoffs

5.1 Isolation Strategies

Here is a brief overview of tradeoffs for different isolation strategies. Some of these will be discussed in excruciating detail in subsequent sections.

A note about VLANs. Ethernet VLAN tagging can be used in many different ways, from simply multiplexing or separating traffic streams across a single point-to-point link, to constructing arbitrarily complex layer 2 overlay networks. Overlay networks in general are notoriously difficult to manage, since the typical topology-based tools are often inadequate for diagnosing problems in overlay networks. When VLAN overlay networks are extended to edge switches, the problem is even greater. One of the most powerful tools in problem analysis and diagnosis is systematic reduction of uncertainty and ambiguity. What happens when debugging a problem where any Ethernet jack can be associated with any of multiple connectivity policies, by virtue of VLAN tagging? Technicians lose the simplicity of the network utility model wherein all ports are the same, and must worry about whether port labelling adequately describes the expected behavior of any given port. On the other hand, using VLANs for trunking different traffic classes or subnets within a network core does not introduce the same level of diagnostic uncertainty, although tools must still be made VLAN-aware in order to support such topologies.

VLANs can also be used to finesse some of the operational problems associated with subnet firewalls. When a conventional inline firewall is inserted between a router port and the switches used to distribute that subnet to individual wall jacks, the Network Operations Center is dependent on the firewall configuration to manage the downstream switches. This can be mitigated by using VLANs to provide a management path to the switches that does not pass through the firewall. This allows the NOC to manage the switches regardless of the firewall state. However, regardless of whether the firewall is in the path physically or via VLAN tagging, the troubleshooting demarc for connectivity problems becomes the firewall, unless the firewalls are managed by the networking folks.

5.2 Traffic Filtering Strategies

Criteria by which traffic can be blocked or filtered include:

We have already beaten to death some of the tradeoffs between host-based defense and perimeter defense. To review, here are some tradeoffs that apply to all network perimeter firewalls:

We have not previously focused on the tradeoffs between different topological choices; i.e. of various network perimeter policy enforcement points (PEPs) in comparison to host-based PEPs. For example:

As noted in the earlier section on Policy Enforcement Point definition, the effective (or logical) perimeter may not be determined by the topological location of the PEP, nor does the PEP necessarily coincide with where the effective policy is selected or determined.

Non-stateful blocking. One of the lessons of Slammer is that simple symmetric port-blocking is problematic for high-numbered ports. Slammer attacked Microsoft's SQL engine, a service listening on port 1434. An obvious defense was to block that port, but there was/is a "gotcha". Because 1434 is within the so-called "ephemeral" port range used by IP stacks to keep track of which host process should receive incoming packets, a simple block of all traffic sourced from or destined for port 1434 can block not only Slammer traffic, but legitimate traffic for many other services as well. The preferred approach is to block port 1434 traffic that is for flows not initiated by the hosts to be protected. This requires a stateful firewall that can keep track of whether the first packet in the flow was originated from inside or outside the defensive perimeter. Stateful blocking is also needed for providing an "unlisted number" semantic without using NAT.
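
A minimal sketch (in Python) of the stateful approach just described; the flow tracking is simplified and the data structures are illustrative:

    # Stateful blocking for the Slammer case: unsolicited traffic touching port
    # 1434 is dropped, but traffic belonging to a flow that an inside host
    # initiated is allowed, so legitimate use of 1434 as an ephemeral port is
    # not broken.  (Timeouts and UDP/TCP distinctions are omitted for brevity.)
    BLOCKED_PORT = 1434
    outbound_flows = set()    # (inside_ip, inside_port, outside_ip, outside_port)

    def note_outbound(inside_ip, inside_port, outside_ip, outside_port):
        outbound_flows.add((inside_ip, inside_port, outside_ip, outside_port))

    def permit_inbound(outside_ip, outside_port, inside_ip, inside_port):
        # Replies to flows the inside host started are always allowed.
        if (inside_ip, inside_port, outside_ip, outside_port) in outbound_flows:
            return True
        # Otherwise drop anything involving the blocked port.
        return BLOCKED_PORT not in (inside_port, outside_port)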

5.3 Host-based (End-Point) Firewalls

Security axiom #3 is:

  • Firewalls are such a good idea, every computer should have one.

    But there are no panaceas in the security business, and the devil really is in the details.

    Our first line of defense is an up-to-date operating system, maintained by a centrally-managed configuration control system. Our second line of defense is a host-based firewall, again centrally managed --where "centrally" may be departmental or whatever span of control is appropriate for the organization. This may be implemented via operating system facilities for network access control, via integral firewall code, or via third-party firewall applications.

    All major operating systems today come with integral firewall capability, and soon they will all be turned on by default. The burning question is: when Windows XP Service Pack 2 is deployed (for the first time enabling XP's integral firewall by default), which applications will stop working, and how much load will be added to Client Services/Help Desk staff around the world?

    We can expect that there will be some problems, and that it will take several iterations for the OS vendors to refine this integral-firewalling concept. For example, in order to reduce the load on help desk staff, it is essential for the OS to inform the user in a simple way when an action they initiate conflicts with a security policy, and perhaps give them a way to temporarily suspend the policy (in an auditable way) if necessary to get a task accomplished. An outcome that must be avoided is when an application fails due to intentional security policy, but the user has no way of knowing that. One of the key advantages of host-based firewalls (done right), as compared to network/perimeter firewalls, is that the OS has the opportunity to detect such user-vs.-policy conflicts and communicate them to the user. This is much better than just leaving the user to think the network or application is broken, which is the standard consequence of perimeter firewalls.
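
    A sketch of the notification behavior argued for above may make it concrete; notify_user() and audit_log() stand in for whatever facilities the OS actually provides, and the blocked-port list is hypothetical:

        # Sketch: when a host-based firewall blocks an outbound action because
        # of policy, it tells the user (and records it for audit) rather than
        # letting the failure look like a broken network.  notify_user() and
        # audit_log() are placeholders for OS-provided mechanisms.
        POLICY_BLOCKED_OUTBOUND = {135, 139, 445}

        def check_outbound(dst_ip, dst_port, notify_user, audit_log):
            if dst_port in POLICY_BLOCKED_OUTBOUND:
                notify_user("Connection to %s:%d was blocked by security policy, "
                            "not by a network failure." % (dst_ip, dst_port))
                audit_log("outbound-blocked", dst_ip, dst_port)
                return False
            return True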

    5.4 Firewall Matrix

    Comparison of Firewall Approaches

    
    EPFW = End-Point Firewall
    LFW  = Logical Firewall w/masquerading NAT  (See References)
    SFW  = Subnet Firewall
    BZFW = Border or Zone Firewall
    P172 = Project 172-phase III (Private addresses with NAT)
    
    
    		             IDEAL  EPFW   LFW    P172   SFW     BZFW
    
    Policy Enforcement Point?    Host   Host   Subnet Zone   Subnet  Zone
    
    Requires host reconfigure?   No     Yes    Yes     Yes    No     No
    
    Destroys E2E transparency?   No     No     No      No     Yes    Yes
    
    Can NOC manage net devices?  Yes    Yes    Yes     Yes    No*    No*
    
    User sees why app failed?    Yes    Yes    No      No     No     No
    
    NOC-Predictable semantics?   Yes    No     No      Yes    No     No
    
    Inherent "unlisted number"?  ?      No     Yes     Yes    No     No
    
    "unlisted number" possible?  Yes    Yes    Yes     Yes    Yes    Yes
    
    Constrains innovation?       No     Yes    Yes     Yes    Yes    Yes
    
    Adverse impact on internal
    network troubleshooting:     Low    Low    Med     Med    High   Low
    
    Adverse impact on external
    network troubleshooting:     Low    Low    Med     Med    High   High
    
    Size of vulnerability zone:  Small  Small  Med     Large  Med    Large
    
    
    * Can be mitigated by proper access lists and/or OOB connectivity
    
    

    5.5 Local Choice vs. OSFA

    A selective-connectivity-via-local-choice objective is in sharp contrast with either a One-Size-Fits-All (OSFA) "open network" strategy or an OSFA "closed network" policy, wherein a single security policy is applied to all constituencies. While a single policy is desirable, it is evidently unattainable in the R&E community without unacceptable collateral damage. If the single policy is for the network to be open, the burden for system security falls entirely on the system administrators and/or local firewall administrators... and in a highly decentralized organization, this can be problematic. In contrast, if the single policy restricts connectivity, some constituencies discover that they must invest in costly mechanisms to work around the policy.

    Consequences of a "OSFA" perimeter blocking approach include:

    On the other hand, it is usually easier to manage a network with any single OSFA policy than one with a multitude of locally defined and/or administered traffic filtering policies enforced at random places in the network topology. Thus from a network management perspective, the local choice idea is a mixed bag. It adds complexity to the network, but preserves some, though not pervasive, open network connectivity as one of the options, especially useful for debugging. As noted in the previous section, the choice of "Policy Enforcement Point" can have significant operational impact --especially for implementing a locally-selected policy.

    5.6 Achieving Local Policy Choice

    For networks that are subdivided into zones for different communities requiring different policies (e.g. residence halls, hospitals), rather than segmented/partitioned for fault management, it is possible to establish conventional perimeter defenses for each zone --with the obvious caveat that local autonomy for opting out of that zone's policy goes out the window (which may be considered a Good Thing, depending on one's perspective).

    If it is OK to apply a "One-Size-Fits-All" policy to a network, or a network zone, then a border NAT box and/or a border firewall will do. However, our thesis is that providing local autonomy on security policy is a Better Thing (if it can be done without undermining network operations), and the more local the autonomy, the harder it is to implement at a large-perimeter enforcement point. A general framework for accomplishing the goal has two parts:

    This approach allows for each host to be independently associated with any one of several security policies. An easier problem to solve is to provide selective policies enforced at the border by subnet address range, rather than for each individual host. This is easier because it requires fewer rules in the border device. On the other hand, it makes it difficult for network support staff to debug the forwarding path (using a suitably configured device) independent of whatever traffic filters might be in the path.

    Thus, the question of how best to implement a flexible large-perimeter enforcement point, to complement host-based strategies, depends on the capabilities of the border routers and/or firewalls. Modern high-end equipment is designed to support tens of thousands of policy rules, thus raising the possibility of policy choice at the individual IP address level. Lesser equipment would fall over trying to cope with a fraction of that number of distinct rules, limiting the policy granularity to, at best, per-subnet rules.

    The strategy that is least demanding of advanced router capabilities is to allocate entirely distinct address ranges to a very small number of policies. For example, the filter could be based on a parallel IPv4 address range or, if extra IPv4 address space is not available, on a NAT strategy leveraging private address space.

    At the chosen policy enforcement point in the network topology, a choice must be made: should the policy be implemented by the router or switch at that point in the topology, or should it be done by a separate device? If separate, the router/switch only needs to be good at routing or switching --although, in contrast to a conventional in-line firewall design, a flexible network architecture requires that selected (but not all) traffic be routed to the separate firewall (or NAT box).
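
    The following sketch (in Python) illustrates the address-range approach: the range a host registers into selects the path its traffic takes at the border. The ranges and policy names are hypothetical:

        # Sketch of local policy choice enforced at the border: a host opts into
        # a policy by its (registered) address range, and the border router
        # policy-routes only the matching traffic through the firewall or NAT
        # device.  Ranges and policy names are hypothetical.
        from ipaddress import ip_address, ip_network

        POLICY_RANGES = [
            (ip_network("10.0.0.0/8"),   "nat-unlisted"),  # private space: outbound-only via NAT
            (ip_network("192.0.2.0/24"), "firewalled"),    # parallel range: through the zone firewall
        ]

        def select_path(src_ip):
            """Return the forwarding path for a packet, based on its source range."""
            addr = ip_address(src_ip)
            for network, policy in POLICY_RANGES:
                if addr in network:
                    return policy
            return "open"   # default: ordinary routing, no policy device in the path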

    5.7 Authorization Strategies

    Should we be controlling access to the network, as well as controlling access to servers and the information on them? It depends on whether usage by unauthorized individuals is likely, and what the consequences of such use would be. The consequences could be "theft of service" or initiation of DDOS attacks via the institution's network, or launch of an attack against the institution's own servers from inside any potential perimeter defense. Indeed, reliance on perimeter defenses increases the need for network authorization, as do wireless access points and usage or subscription-based accounting for network access.

    On the other hand, network access control adds complexity for both users and system operators, especially when the network administrative domain differs from the user's computer support domain. The question for this discussion is whether or not network access control (i.e. requiring user authorization before connectivity is permitted) can enhance security. As noted, if an enterprise relies on perimeter defense as a primary host protection strategy, then controlling access to the internal (inside the firewall) network is essential. However, it's clear that authorized users can operate dangerous (infected) computers, and user authorization (by itself) does nothing to help that problem.

    Motivations for network access control:

    1. Accounting for use of resource
    2. Preventing outbound attacks
    3. Preventing inbound attacks

    The outbound attack concern is arguably less serious for fixed office computers, but is significant for uncontrolled labs and especially for wireless-enabled networks. The severity of the inbound attack scenario is a function of how important perimeter defense is in protecting the end hosts. Yet another reason why perimeter defense should be a secondary defense-in-depth strategy, and not primary.

    Users of mobile/wireless devices are somewhat tolerant of logging into the network infrastructure before they can use the network, even if they must login again to their workgroup servers. Users of fixed office computers are less so. This problem can be mitigated in homogeneous environments where there is a single authentication domain and operating systems with network-level login support. For heterogeneous and decentralized environments, as most universities are, the technical barriers and user acceptance barriers are much higher.

    Captive portal strategies, wherein the first use of one's web browser redirects to an authentication service, are commonly used for wireless hotspots and hotels, and have been used successfully for controlling access to university networks as well. The advantage of captive portals is that they work with any client; the disadvantage is that if the user doesn't use their web browser first, e.g. starts a mail program first, it looks like the network is broken. Another disadvantage is that the wireless access policies must permit, without prior restraint or authentication, whatever traffic is necessary to allow successful login to the computer and initiation of a web browser.
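
    Here is a sketch (in Python) of the captive portal logic just described, which also shows why non-web traffic from an unauthenticated client simply looks like a broken network; the portal URL is hypothetical:

        # Sketch of captive-portal access control: unauthenticated clients get
        # their web traffic redirected to a login page and everything else
        # dropped, which is why starting a mail client first just looks like a
        # network failure.  The portal URL is hypothetical.
        AUTH_PORTAL = "https://netauth.example.edu/login"
        authenticated = set()          # client IPs (or MACs) that have logged in

        def handle_packet(client_ip, dst_port):
            if client_ip in authenticated:
                return "forward"
            if dst_port in (80, 443):
                return "redirect:" + AUTH_PORTAL   # intercept web traffic, present login
            return "drop"                          # mail, ssh, etc. silently fail

        def login_succeeded(client_ip):
            authenticated.add(client_ip)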

    The 802.1x authentication framework promises to provide a standard method for controlling access to network ports, wired and wireless; however, it is not yet supported by all current OS versions, much less the legacy versions commonly found in university networks. There are also non-standard extensions required to control access on a per-user basis for wired networks, and agreement is needed on which of many different authentication methods will be used within the 802.1x framework. Once the dust settles on these issues, edge switches will need to be upgraded or replaced to take advantage (as well as back-end services deployed, such as RADIUS bridges to Kerberos infrastructure). Conclusion: 802.1x has promise for homogeneous/centralized enterprises now, and for the rest of us in the future, but is not yet a no-brainer.

    5.8 Accountability Strategies

    An important ingredient in any security strategy is encouraging users to exercise safe computing practices --especially making sure their (self-managed) systems get patched promptly. One can imagine institutions providing incentives for good behavior, or --more likely-- sanctions for bad behavior. Some possible examples include:

    One version of this approach has been discussed recently in higher-ed circles: when a system is disconnected from the network because it has been infected, perhaps there should be a stiff fee for reconnection.

    At a recent lecture, Scott Charney (a senior security guy at Microsoft) spoke of personal accountability in the context of computer security and drew a parallel to tobacco use. When it became widely understood that second-hand smoke could injure innocent bystanders, individuals' personal freedom began to be legally constrained. The idea was that you can kill yourself, but you can't kill other people. Scott wonders whether the same thing will happen in computing: a poorly managed host is a threat to others, so will those who fail to manage/control their computers become liable for harming others if their machine is infected?

    How important is individual accountability in improving security? Evidence suggests that it can make an enormous difference if the incentives are significant, and in this country that usually involves money. Consider the Cornell experience: early in 2003 they moved from flat-rate network pricing for their campus to usage pricing for external bandwidth. An unanticipated side-effect of that change was that hosts started getting patched much more aggressively. Users, especially students, were concerned about vulnerabilities that might lead to exploits which would generate more network traffic, and thus increase their monthly bill for network services. IT staff at Cornell report that there has indeed been an astonishing reduction in the number of vulnerable systems at Cornell as a result of their change in funding strategy.

    5.9 NAT: Network Address (and Port) Translation

    NAT is controversial. Originally invented to lessen the need for global IPv4 addresses, NAT flourishes today as much for its security characteristics as for address conservation. It is controversial because it breaks certain applications, especially when combined with PAT (Port Address Translation) to permit a large number of hosts to share a single global/public IPv4 address.

    Opponents of NAT consider it a self-imposed Denial Of Service attack, but without the users knowing what they are giving up. They fervently hope that IPv6, with its giant address space, will eliminate any hint of address scarcity, and with it, any reason to use NAT.

    The counter opinion is that IPv6 will not kill off NAT (even if IPv6 is deployed in a way that lets individuals and organizations easily obtain ample provider-independent addresses). The reason: a side-effect of NAT is that machines "inside" the NAT perimeter are (mostly) hidden from external view, so that the usual address/port scans by attackers will fail to discover a computer behind a NAT box. This is more than "security through obscurity". If an attacker can't send packets to your host, the attack will fail. This benefit of being hidden to outsiders offsets the disadvantage of not being able to use certain applications, especially peer-to-peer applications that require global visibility of the end-hosts.

    It is probably fair to say that most users do not understand that NAT causes problems for some apps, but it is also fair to say that millions of users are happily living behind NAT boxes, either in corporations or at home via residential gateways. It is not clear that, given an informed choice, those users would prefer the open network option at the expense of NAT's hidden hosts and "no incoming TCP connections" security benefits.

    As with firewalls, NAT boxes vary in quality. A key parameter is how long the internal state for a connection is maintained before an outgoing packet must be seen in order to refresh the state. If that interval is too short (as it is on some commercial NAT boxes) then it will be impossible to maintain long-lived network connections without the applications having keep-alive messages flowing at very short intervals.
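
    A toy model (in Python) of a NAT address-and-port binding table with an idle timeout shows why short timeouts force applications to send keep-alives; the addresses and timeout value are illustrative:

        # Toy model of a NAPT binding table with an idle timeout.  Long-lived
        # connections survive only if an outgoing packet refreshes the binding
        # before IDLE_TIMEOUT expires.  Addresses and timeout are illustrative.
        import time

        PUBLIC_IP = "203.0.113.1"
        IDLE_TIMEOUT = 120          # seconds; some consumer NAT boxes are this aggressive
        bindings = {}               # (inside_ip, inside_port) -> [public_port, last_seen]
        next_port = [20000]

        def outbound(inside_ip, inside_port):
            """Create or refresh a binding when an inside host sends a packet."""
            key = (inside_ip, inside_port)
            if key not in bindings:
                bindings[key] = [next_port[0], time.time()]
                next_port[0] += 1
            bindings[key][1] = time.time()
            return PUBLIC_IP, bindings[key][0]

        def inbound_target(public_port):
            """Map an incoming packet back to the inside host, if a live binding exists."""
            for (inside_ip, inside_port), (port, last_seen) in bindings.items():
                if port == public_port and time.time() - last_seen < IDLE_TIMEOUT:
                    return inside_ip, inside_port
            return None             # unsolicited or expired: dropped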

    5.10 Achieving Asymmetric Connectivity

    An alternative way to provide protection similar to NAT is via stateful firewalls that can be configured to only permit outbound connections. One tradeoff between the two strategies concerns the ease of allowing an unobstructed connection when needed. Firewall-based strategies may not permit an individual host to "choose" an open configuration, unless a specific firewall exception is created --a strategy that does not scale well. A NAT approach may permit a host to be configured locally for either open or asymmetric access, by choice of either a globally-routed or an internal/private address for that host.

    What, then, is the best way to achieve the "unlisted number" asymmetric connectivity semantic? The current choices are:

    As discussed above, option 1 (NAT) has the side-effect of breaking certain applications. Option 2 reduces the load on routers by allowing policy-routing decisions based on a totally separate address space, rather than on the individual /32 IPv4 addresses of option 3. Modern high-powered routers, combined with high-end firewalls and an infrastructure to permit a self-service way of opting into a particular security policy for each host, now make this an achievable goal, though not an inexpensive one.

    5.11 Device Validation

    Independent of the question of authenticating user access to the network, but often linked together, is the question of validating the state of the device about to connect to the network. In particular, is it a device that is a) not likely to be a threat to others, and b) likely to be safe from network exploits? The idea is to limit a host's connectivity until its status can be successfully verified. For example, a newly-connected device might be isolated to a "quarantine" network, or its user might not be able to get past an enterprise authentication service until device integrity is validated.

    This is a conceptually attractive idea, but not without problems. For example:

    Commercial products exist today for implementing this idea, at least partially. For example, Microsoft Windows Server 2003, acting as a VPN server, can verify that connecting systems have all necessary patches before a VPN connection is permitted.

    Testing random devices for vulnerabilities before connecting them to an enterprise network is "harder than it looks", but some known signatures can be detected. Another challenging issue concerns the desktop systems that are connected for a long period of time. There must be a time interval after which validation of the device is repeated. Captive portal strategies can work reasonably well for initial connection scenarios (after training everyone to use the web before any other apps). For persistent connection scenarios, the system must periodically re-validate. If the result of that step is to quarantine the connection, how is the user notified? These concerns have led Ron Johnson (VP of Computing & Communications at UW) to propose an approach where device validation occurs whenever a user connects to the enterprise web authentication service. Feasibility of this strategy is still being evaluated, but it offers the promise of communicating directly with the user during the web authentication dialog, without requiring special OS software (e.g. 802.1X support.)
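
    A sketch (in Python) of the quarantine / validate / re-validate cycle being discussed; scan_device() and notify_user() are placeholders for whatever checks and notification channels are actually feasible, and the re-validation interval is illustrative:

        # Sketch of device validation: quarantine on failure, and periodically
        # re-validate long-lived connections.  scan_device() and notify_user()
        # are placeholders; the interval is illustrative.
        import time

        REVALIDATE_AFTER = 24 * 3600      # re-check long-lived connections daily
        device_state = {}                 # device_id -> (state, last_checked)

        def connect(device_id, scan_device):
            state = "allowed" if scan_device(device_id) else "quarantine"
            device_state[device_id] = (state, time.time())
            return state

        def revalidate_if_due(device_id, scan_device, notify_user):
            state, last = device_state.get(device_id, ("quarantine", 0))
            if time.time() - last >= REVALIDATE_AFTER:
                state = connect(device_id, scan_device)
                if state == "quarantine":
                    notify_user(device_id, "Your machine failed validation and has been quarantined.")
            return state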

    5.12 Pre-auth vs. Post-containment

    The challenges of pre-validation (before connection is permitted) lead to comparison of that strategy against one of post-connection validation. In the post-audit scenario, there is clearly a greater risk that an infected machine can attack others until it is detected and isolated, but we need to ask how big a deal that really is. In a network computing environment where perimeter defense is the primary protection strategy, this is a Very Big Deal. On the other hand, in a quasi-open network environment where the focus is on keeping end-systems network-safe, the probability of infection from a device newly-connected to the enterprise network must be compared against the probability of infection from a random host anywhere in the Internet. In that case, it may be sufficient to rapidly identify and quarantine an infected host after connection. However, the credibility of this approach depends on the credibility of efforts to ensure that enterprise hosts are properly managed and patched.

    5.13 Encryption: End-to-End or Not?

    Half of the IT managers I meet from industry believe that end-to-end encryption is an important tool for protecting the integrity of information. The other half consider it the work of the devil, since it means they lose visibility into, and therefore control over, the nature of the traffic flowing over their networks.

    Within higher-ed, opinion is also divided (though perhaps less equally split, since university IT managers generally seem less desirous of being network traffic cops than their counterparts in industry). My own view is that the debate is largely moot, because end-to-end encryption is already being used by "unapproved" applications. The only issue is whether or not "approved" apps will get the benefits of end-to-end protection.

    Also, to the extent that perimeter defense strategies are deployed, use of VPN tunneling will be necessary --if not end-to-end, then over a growing fraction of the network topology. From an IT oversight perspective, secure application protocols offer one advantage and one disadvantage compared to end-to-end encryption at the network layer, e.g. IPSEC-ESP. App-level encryption protocols do not mask the application type, so perimeter policy controls based on IP port number (app type) can still function (not the case with IPSEC encapsulated traffic). On the other hand, IPSEC is transparent to user apps and provides information protection even for apps/protocols that are themselves insecure. My conclusion: end-to-end encryption is inevitable for unauthorized apps, so network content control is an illusion; thus we should celebrate its benefits, and use it to protect the Good Stuff. IPSEC is a good thing if you have the PKI, Active Directory, or Kerberos infrastructure in place to establish the needed trust fabric.

    5.14 Thin Clients

    The idea of thin clients is to reduce operational cost by replacing complex, insecure, hard-to-manage desktop computers with simple graphical terminal devices requiring minimal individual support, but dependent on multi-user servers to run their apps. This is essentially the modern equivalent of dumb terminals connected to time-sharing systems. The value proposition is that it's much cheaper to manage a multi-user system for dozens of people, plus dozens of nearly-maintenance-free terminal appliances, than to support dozens of conventional desktop PCs. The security relevance is that the simple appliances without hard disks are expected to be immune to the threats that afflict normal PCs.

    However, there are thin clients, and there are thin clients. At a minimum they all seem to do away with a local hard drive, but beyond that, some of them can be pretty "thick". For example, here are several variants:

    In addition to remote screen display, a thin client may be able to do some things locally, e.g. web browsing, clock display, etc. Each of these options has different tradeoffs. The Windows-based solutions are presumably most compatible with Microsoft's implementation of RDP, although the open-source Linux version of RDP is being used successfully by some. The Linux-based solutions, however, can't run Internet Explorer locally, so if you have web-based apps built with gobs of ActiveX controls, you are either locked into the MS Windows choices, or must forgo the efficiencies of running a local web browser. In that case, the picture looks like this:

        web services  <--->  multi-user web-client system <---> thin client
    

    Within the MS category, the security differences may be huge between a Windows CE based device and an embedded XP based device. Indeed, if the majority of desktop support costs revolve around Windows (in)security these days, then it is not likely that the "thin" clients based on embedded XP are going to achieve their low TCO objective. (Recent empirical data confirms this conclusion: lots of support problems resulted when a large number of embedded-XP "thin" clients got infected, and the process for upgrading/patching XP-in-Flash proved to be problematic.)

    5.15 Intrusion Detection/Containment

    Intrusion Detection Systems (IDS) are notorious for having a poor signal-to-noise ratio. That is, there have historically been a lot of false positives reported, which trigger wasteful fire drills.

    Intrusion Prevention Systems (IPS) use an IDS to identify an incoming threat and then shut it down. This can be done directly if the IPS is in-line, or by sending control info to a network device that is in-line with the attack, e.g. dynamically adding a firewall rule or router access list. The Bro system developed by Vern Paxson at LBL is a prime example of an IDS that can drive this kind of automated response. One caveat with IPSs in general is that the unintended consequences of a false positive are more than just extra noise in a log file; they become self-imposed Denial of Service attacks. The more specific the threat signature being watched for, the lower the false positive rate --but that also means the system is only as good as the known-threat database. Statistical methods that compare baseline data against current data are more likely to find previously unknown attacks, at the expense of more false positives.
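
    To make the signature-vs-statistical tradeoff concrete, here is a toy baseline comparison that flags hosts whose current traffic rate departs sharply from their own history. The thresholds and data structures are invented for illustration; real systems are far more sophisticated.

        # Toy illustration of the statistical approach: flag hosts whose current
        # flow rate departs sharply from their own baseline. Thresholds and data
        # structures are invented for illustration only.

        from statistics import mean, pstdev

        def anomalous_hosts(baseline, current, sigma=4.0):
            """baseline: host -> list of historical flows/minute samples
               current:  host -> most recent flows/minute
               Returns hosts whose current rate exceeds mean + sigma * stddev."""
            flagged = []
            for host, samples in baseline.items():
                mu = mean(samples)
                sd = pstdev(samples) or 1.0     # avoid divide-by-zero on flat baselines
                if current.get(host, 0) > mu + sigma * sd:
                    flagged.append(host)
            return flagged

        # Example: a host that normally makes ~20 flows/minute suddenly makes 900.
        baseline = {"10.0.1.5": [18, 22, 19, 21, 20]}
        print(anomalous_hosts(baseline, {"10.0.1.5": 900}))   # -> ['10.0.1.5']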

    A variation on the IDS/IPS theme is to focus on attacks originating from within the institution. Identifying and then isolating hosts that are emitting attack traffic can be both a Good Neighbor policy and a prudent backup for local security policy enforcement. While prevention is preferable to remediation, rapid containment of infected hosts minimizes the collateral damage they cause.

    In order to achieve the "minimum collateral damage" goal, detection and isolation/containment must be automated. One approach: tools that examine router NetFlow data for known attack traffic, and then inject null routes for the offending host's IP address into the routing infrastructure.
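
    The shape of that automation might look roughly like the following sketch. The flow-record format, the signature list, and the route-injection step are all simplified assumptions, not a description of any production tool.

        # Simplified sketch of the detect-and-null-route idea. Real NetFlow
        # export formats, and the actual mechanism for injecting a null route
        # (e.g. a route server announcing a /32 with a discard next-hop), are
        # more involved; this only shows the shape of the automation.

        # Known-attack signatures: (protocol, destination port). UDP/1434 is the
        # Slammer worm's target port; TCP/135 was used by Blaster-style RPC exploits.
        ATTACK_SIGNATURES = {("udp", 1434), ("tcp", 135)}

        def scan_flows(flow_records):
            """flow_records: iterable of dicts with src, proto, dport keys.
               Yields internal source addresses emitting known attack traffic."""
            seen = set()
            for rec in flow_records:
                if (rec["proto"], rec["dport"]) in ATTACK_SIGNATURES:
                    if rec["src"] not in seen:
                        seen.add(rec["src"])
                        yield rec["src"]

        def null_route(ip):
            """Placeholder for injecting a discard route for the offending host;
               in practice this step talks to the routing infrastructure."""
            print(f"would null-route {ip}")

        records = [{"src": "10.1.2.3", "proto": "udp", "dport": 1434},
                   {"src": "10.4.5.6", "proto": "tcp", "dport": 80}]
        for host in scan_flows(records):
            null_route(host)        # -> would null-route 10.1.2.3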

    Other IDS strategies include honeypots, honeynets, and sinkholes. All of these are intended to look like perfectly good targets for attackers, but are actually ways to validate whether other security defenses are working, and in the case of sinkholes, to attempt to divert attack traffic away from production systems. By careful monitoring, they also serve as an early warning system for previously unknown attacks.

    5.16 Redundancy

    Redundancy is a fundamental tool for improving system availability. It relates to security in a couple of ways. First, security incidents represent an important cause of system failures and one would like to have alternative paths/services that work around the attack-induced failures. Second, security mechanisms such as firewalls may contain transient state information. In order to avoid security devices becoming a single-point-of-failure (SPOF), it is necessary to consider the complexity of preserving state across redundant elements, vs. the implications of temporary loss of the transient state info. Redundancy can also make troubleshooting outages more difficult, since the true cause of a problem may be masked by the alternate path or system. Thus there is increased demand for sophisticated monitoring tools that can discern fundamental failures being masked by redundancy.
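
    One way to keep redundancy from hiding failures is to probe each redundant element directly, rather than only the shared service address. A minimal sketch (with hypothetical host names) follows:

        # Minimal sketch: probe each redundant element directly, not just the
        # shared service address, so a failure masked by failover still raises
        # an alert. Host names are hypothetical.

        import socket

        SERVICE = ("www.example.edu", 443)                 # what users see
        ELEMENTS = [("web1.example.edu", 443),             # the redundant pieces behind it
                    ("web2.example.edu", 443)]

        def reachable(host, port, timeout=3):
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    return True
            except OSError:
                return False

        def check():
            service_ok = reachable(*SERVICE)
            dead = [h for h, p in ELEMENTS if not reachable(h, p)]
            if service_ok and dead:
                print("WARNING: service is up but masked failure on: " + ", ".join(dead))
            elif not service_ok:
                print("CRITICAL: service down")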

    5.17 Support Impact

    Mechanisms to improve security often have unintended side-effects. For example, automated detection and containment mechanisms can, when a widespread vulnerability such as MS RPC is exploited, result in enormous backlogs for Network Ops and Security Ops staff. After all, each system that has been isolated (either via null-routing or shutting down the Ethernet port) must be fixed and verified before full connectivity is restored.

    Self-service. One way to ease the load on staff is to provide a self-service method of restoring connectivity. A web page offering self-scan options and then a re-enable button can work very well if the system has been null-routed or quarantined, but still has some local network connectivity. Shutting down the Ethernet port --necessary for some exploits, such as Slammer, because of their collateral damage potential-- is not compatible with this network-based self-service strategy.
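
    The essence of the self-service flow is small; the sketch below (function names are placeholders, not a real UW system) assumes the quarantined host can still reach the internal self-service page:

        # Sketch of the self-service idea: a quarantined (null-routed) host can
        # still reach this internal page, run a self-scan, and ask to be
        # re-enabled. All names are placeholders for illustration.

        def self_service_reenable(host_ip, run_scan, remove_null_route):
            """run_scan(ip) -> True if the host now looks clean;
               remove_null_route(ip) restores normal connectivity."""
            if not run_scan(host_ip):
                return "Scan failed -- please patch/clean and try again."
            remove_null_route(host_ip)
            return "Connectivity restored."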

    Intentional network failures. Automated containment/isolation of infected systems looks to the user like the network has failed. Communicating with users in these cases is important, and much easier if they have partial connectivity to internal support servers and/or email. Captive portal strategies for controlling access to the network have the same undesirable property of looking like a broken network, unless the user remembers to use a web browser before any other application. Needless to say, these "intentional" connectivity failures as a consequence of security policies can represent a significant problem for support staff.

    Where the type of infection requires disabling the port, making decontamination tools and patches readily available off-line (e.g. on a CD-ROM) is a good idea, but it doesn't necessarily reduce the load on the NOC or Security Operations: users don't know why their computer doesn't work, so they (as usual) assume the network is broken and call for help.

     


    6. Credo

    6.1 Past

    In mid-2000, we developed a Network Security Credo that described our basic principles and approaches to the problem of securing computer systems. The gist of the message was:

    Looking at the security problem from the perspective of computer, information, and network infrastructure protection, the approach was as follows:

    Some key points were that:

    Thus, the first line of defense was to manage the host; the second was to add local perimeter defense as needed, preferably in a way that did not destroy the "network utility model". Toward that end, a Logical FireWall strategy was presented (see References). LFWs preserve some level of "network utility" transparency; in particular, with LFWs in use for local perimeter defense, a technician can still debug the forwarding infrastructure without the local security policies affecting the test results.

    The year 2000 UW Network Security Credo was a stunning success except for one thing: Hardly anyone embraced it. Indeed, the percentage of UW's 70,000 hosts that are under positive central configuration management today still appears to be alarmingly low.

    According to an article on the BBC News website, Microsoft's David Aucsmith has stated that few, if any, security exploits have appeared before a corresponding OS patch was available --meaning that a pervasive and aggressive host management and patching regime would have made 2003 a non-event, security-wise. And as previously noted, a Microsoft paper on XP SP2 admits that having had the XP firewall turned on by default would have made a big difference in containing attacks such as Blaster.

    Largely because of the failure to aggressively manage hosts at UW, there is a much stronger need for perimeter defense than would otherwise be the case. While many departments are using logical firewalls successfully, units with large numbers of statically addressed computers have found the prospect of reconfiguring all hosts to work with LFWs to be daunting, resulting in pressure for transparency-busting inline firewalls.

    In addition, 2003 security incidents revealed a surprising number of insecure devices that cannot be fixed, and thus require some form of external perimeter protection. This is especially true in medical centers where applying security patches is said to invalidate FDA certification, but also applies to certain printers or other devices that have faulty code in ROM. While the Credo anticipated this possibility, and asserted that such devices needed to be protected behind something that was network-safe, we expected that a handful of individual firewalls, combined with lab or server sanctuary firewalls, would take care of the problem. Instead, we saw quantities and geographic diversity such that medium or large-scale perimeter defense would be the only realistic way to protect them.

    Finally, we saw during the Blaster attacks that in an open network environment with a Mean-Time-To-Infection of just a few minutes, it was pretty hard to bring a new system up and get it patched without finding it was already compromised. Protection for systems before they were ready to confront life in the hostile Internet became a significant issue!

    6.2 Post Mortem

    A strong case can be made that failure to embrace the principles of the original Network Security Credo (especially failure to embrace group/central management of hosts) is the reason that the attacks of 2003 were so troublesome for so many departments at the UW... especially since those who did embrace them had a much better time of it. Lack of a positive computer management regime is also the principal reason that an open network policy is under extreme duress, and why those of us who believe passionately in the simple, manageable, open network utility model are in despair. However, there are other reasons why even open-network purists are reconsidering enterprise perimeter defense strategies:

    Now, some three years after it was written, it's fair to ask how recent events should affect that Credo. We remain unrepentant in arguing that every effort should be made to ensure that hosts are network-safe, because many attacks are transmitted via machines within the enterprise perimeter, or connected from "outside" via VPN servers --which effectively extend the enterprise border to include machines that can become "attack gateways" for threats from "outside". We are encouraged by the fact that those who embraced the Credo and put their hosts under positive central configuration management (with good security policies) have fared very well with respect to security problems.

    Unfortunately, since systems under positive group configuration management are still the exception (not the rule) in our environment, and in deference to some of the other issues listed above, the role of perimeter defense must be reexamined. The status quo of a completely open enterprise network combined with largely-unmanaged hosts is not acceptable in the increasingly hostile Internet world. On the other hand, causing hosts to become managed is not so much a technical problem as an organizational one, and it has proven to be a very difficult one to solve.

    Thus, the principal failure of the UW Network Security Credo was that it underestimated the institutional barriers to getting computers under positive configuration management, preferably with integral firewalling enabled. That problem remains today, in spades. Absent evidence of progress on this front, we conclude that a single open-network policy is problematic for UW at this point in time.

    We also know that a One-Size-Fits-All closed border policy would cause hardship for those who are already "taking care of business" and tightly managing their hosts. So, even if we could provide a choice of "open" or "closed" networking at the border (to avoid the OSFA problem), is such an effort worthwhile, or is it already "overtaken by events"? In particular:

    6.3 Lessons

    We know that:

    Our approach going forward is based on the following observations:

    Because offering only an "open network" environment for all units doesn't seem to be wise at this point, and because some ISPs have abandoned the open network philosophy, we are forced to accept that the traditional Internet utility model (i.e. the assumption of an open, transparent Internet connection) is indeed dead. However, we have not abandoned the other basic principles of the Credo. While lamenting the loss of network transparency and the impact of that loss on network operations and Mean-Time-To-Repair, we can still try to protect the "needs of the few" in a world where the "needs of the many" may be well-served by a (semi-) closed network environment.

    One way to allow open networking for those who need it is to provide a parallel network infrastructure. However, that can be expensive. The key change needed to protect the "needs of the few" within a single universal network is to embrace "flexible networking" alternatives to support both units that desire an open network and those who --for whatever reason-- have insecure hosts they must accommodate. We also want to offer additional options for defense-in-depth, especially for protecting critical services such as those involved with patient care, and to revisit the question of perimeter protection in the context of network infrastructure protection. The trick is to do this without making it even harder to support the network.

    6.4 Redux

    A revised Network Security Credo should recognize the following:

    6.5 New options

    A key question is whether the local choice goal can be achieved without some of the constraints of the Logical FireWall and P172 solutions, and without adding undue complexity in the network core --such complexity further undermining the goal of operating a high-availability network service.

    The original Credo focused on how to have security while preserving the virtues of an open Internet utility. Having failed in that mission, and recognizing that more extensive perimeter defense usage is both unavoidable and appropriate (necessary though definitely not sufficient), the new focus must be on minimizing and mitigating the collateral damage caused by network perimeter defense --both in terms of creativity/innovation and operations/troubleshooting. This is especially true while OS vendors are still shipping unsafe systems, and successful attacks occur without the attack vector being readily apparent.

    In other words, a new credo must either make the case that the original was essentially correct and focus on new incentives for implementation, or it must identify some new options for embracing widespread perimeter firewall deployment that are a) more easily implemented than our existing "open-net preserving" perimeter defense approaches, and b) less detrimental to our supportability and innovation objectives than conventional approaches.

    Of course, it's also possible that in another 3-5 years we'll look back and conclude that we had it right the first time --that conventional perimeter defenses really were useless against next-gen attacks, which will rely on clever email and web-based social engineering to get malicious code inside the perimeter, then use encryption on pervasively-used ports to hide their nefarious activities. Not to mention the false-sense-of-security those perimeter defenses will give --notwithstanding the fact that the 2003 Blaster attacks should have put to rest forever the claim that a border firewall makes you safe.

    Previously we have disparaged large-perimeter firewalls in part because they represented an inflexible OSFA policy solution (and also because of their inherent large vulnerability zone). Now that we have lost the battle for the open net utility, and with some operational experience with subnet firewalls under departmental control, the difficulty of diagnosing network problems in such an environment makes the virtues of a single large-perimeter policy enforcement point more evident. But in the past that has usually meant sliding into the dreaded OSFA policy pit. Can we combine the virtues of local choice with the virtues of a (small number of) large perimeter policy enforcement points? That is the goal of the "flex net" strategy outlined later, and represents one of the key departures from the earlier tome.

    Technologies change along with threats and circumstances. Routers completely adequate for forwarding packets at an enterprise border may not be capable of providing policy enforcement --much less selective enforcement of multiple policies. State-of-the-art routers can do this, and therefore enable the flexible networking concepts not previously considered.

     


    7. Recommendations

    7.1 Strategic Best Practices

    1. Properly managed hosts! Including:
      • Controlled host configuration (centrally/group managed), and/or
      • Host-based firewalls (centrally/group managed)
      • Regular vulnerability scanning
    2. Defense in depth to avoid "single point of (security) failure"
      • Device protection, at local/cluster perimeters
      • If needed, additional device protection (via network topological perimeters)
      • Information protection (via encryption at multiple network layers)
    3. Flexible network architecture, featuring multiple security policy enforcement options, e.g.
      • Local firewall or NAT options
      • Global "flex-net" policy options
      • Constrained connectivity (isolation) for specific network segments
    4. High-availability network architecture, featuring redundant elements and selective network isolation/segmentation by service or zone, e.g.
      • Residence halls
      • Patient monitoring systems
      • Real-time process control systems
      • VoIP backbone
      • WiFi elements
      • R&D/test backbone
    5. Real-time automated response to attacks
      • Inverted IDS to identify errant hosts within the enterprise
      • Auto-blocking/isolation of errant hosts
      • Tools for quickly identifying host owner/operator
    6. Network infrastructure protection

    What follows are specific approaches that the UW's Computing and Communications organization has embraced to improve the security and availability of network-based systems.

    7.2 Host-based Host Protection

    This is the cornerstone. There is still no adequate substitute for making the hosts network-safe. This effort is arguably not a sufficient strategy, but it is a necessary one. We use and recommend two approaches:

    The very successful Nebula desktop management system is proof that host security can be achieved in an open network environment. Alas, managing hosts is not cost-free, so adoption of this paradigm has been slow.

    Some units that are not able to do full desktop configuration management have had good success with centrally-managed host-based firewalls. They particularly like the IDS features.

    7.3 Perimeter-based Host Protection

    These options are to provide defense-in-depth, and to protect devices that have no hope of being network-safe themselves. In a few cases we have arranged for conventional inline firewalls to be deployed by departments at the subnet level, but this is not considered a good general solution, as it shifts the network troubleshooting demarc from the wall jack to the firewall. It also requires modern infrastructure if done in a way that permits the NOC to manage downstream switches.

    Efforts are currently underway to implement the flexible-networking architecture described previously, wherein a choice of several policies could be offered to individual hosts, or at worst, to individual subnets. This is an extension of the Project 172 concept of permitting a host to be in either an open or closed network (for some value of closed). In the flex-net scheme, the proposed Policy Enforcement Point would be the campus border, using upgraded border routers to direct filtered traffic to associated (but separate) policy-enforcement devices. This should permit a low-impedance open network option to exist for applications needing extremely high bandwidth (higher than the associated filter-rule boxes could accommodate).
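
    A crude sketch of the local-choice idea follows: classify traffic by campus prefix and either forward it directly (open) or steer it through the matching policy-enforcement device. The prefixes, policy names, and device names are invented for illustration only.

        # Crude illustration of per-subnet policy choice at a single border
        # enforcement point. Prefixes, policy names, and next-hops are invented.

        import ipaddress

        # Each campus prefix opts into a policy; "open" bypasses the filter boxes.
        POLICY_TABLE = {
            ipaddress.ip_network("192.0.2.0/24"):    "open",       # high-bandwidth research
            ipaddress.ip_network("198.51.100.0/24"): "default-deny",
            ipaddress.ip_network("203.0.113.0/24"):  "block-known-exploits",
        }

        POLICY_NEXT_HOP = {
            "default-deny":         "firewall-a.example.edu",
            "block-known-exploits": "filter-b.example.edu",
        }

        def steer(dst_ip):
            """Decide how inbound traffic to dst_ip should be handled at the border."""
            addr = ipaddress.ip_address(dst_ip)
            for prefix, policy in POLICY_TABLE.items():
                if addr in prefix:
                    if policy == "open":
                        return "forward directly"
                    return "redirect to " + POLICY_NEXT_HOP[policy]
            return "forward directly"      # unlisted space defaults to open here

        print(steer("198.51.100.7"))       # -> redirect to firewall-a.example.edu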

    7.4 Information Protection

    This is about protecting information as it is transmitted over the network. Our approaches include:

    7.5 Network Protection

    This is about protecting the network infrastructure itself: the routers and switches, plus the various support servers for DNS, DHCP, NTP, network management, etc. Under investigation:

    7.6 Network Segmentation (fault zone minimization)

    In some cases an isolation strategy corresponds to a user population --though this is increasingly difficult to do-- and in other cases it corresponds to a type of network service. For some of these segments, a "closed" --or at least, "less open"-- network policy is appropriate. Examples of segmentation/isolation strategies in the University of Washington network include:

    Application design plays a key role as well. Apps that are (wisely) designed with the assumption that portions of the network will occasionally fail, and will tolerate such failures, are highly desired.

    7.7 Redundancy Strategies

    This is about reducing user-visible outages (i.e. improving availability) via redundancy.

    7.8 Rapid Response

    This is about reacting to attacks to quickly identify and isolate exploited hosts.

    An Intrusion Detection System (IDS) seeks to alert network operators to an attack in progress. An Intrusion Prevention System (IPS) goes one better, and attempts to block the attack in progress. The Achilles heel of most IDS products is a low accuracy rate: they either miss attacks or, equally troubling, identify non-existent attacks. For an IPS the latter is particularly annoying, because a false positive in an IPS results in a self-imposed denial-of-service against a legitimate user or system.

    The signal-to-noise ratio of an IDS can be improved if the tool is focused on recognizing current threat signatures from within the enterprise. For example, a Slammer- or Blaster-infected host within an organization can be identified with pretty high confidence, so automatically shutting down network connectivity to the offending device is a viable strategy. However, there are some gotchas. For example, if there is an aggregation device (VPN concentrator, web proxy server, wireless access point, logical firewall, local switch, etc.) between the infected host and the network instrumentation, the system may shut off whatever edge port is associated with the attack flow --even though lots of other hosts will be cut off at the same time.

    When infected hosts are detected, but the threat to the network infrastructure and other hosts is not deemed extreme, the problems cited above can be mitigated by null routing traffic to and/or from the infected host. This protects the rest of the Internet and hopefully gets the attention of the user, while allowing access to local patching servers to repair the problem --which would be impossible if the local Ethernet port were disabled.
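
    Putting the last two paragraphs together, the containment decision might be sketched as follows (thresholds and lookups are invented; this is not the actual UW tooling):

        # Sketch of the containment decision described above: prefer a null route
        # when the threat is not extreme, or when the offending flow arrives via a
        # port that aggregates many hosts. Lookups and thresholds are invented.

        def contain(host_ip, edge_port, hosts_behind_port, threat_extreme,
                    null_route, disable_port):
            """hosts_behind_port(port) -> number of MAC addresses seen on that port;
               null_route(ip) and disable_port(port) perform the actual isolation."""
            if not threat_extreme:
                null_route(host_ip)                 # keeps local patching servers reachable
                return "null-routed"
            if hosts_behind_port(edge_port) > 1:
                # Aggregation device (VPN concentrator, AP, proxy, ...): shutting
                # the port would cut off innocent hosts, so fall back to a null route.
                null_route(host_ip)
                return "null-routed (aggregated port)"
            disable_port(edge_port)                 # worst case, e.g. Slammer-class traffic
            return "port disabled"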

    In order to reduce the load on security and network operations staff, UW has implemented a "self re-enable" capability for hosts that have been null-routed. The user is directed to a web page where they can download patching and cleanup tools, and then re-enable their Internet access. (Up to three times --after that, they have to call for help.)

    Traceability. Rapid reaction strategies depend on both network port and user identification. The latter is important if the user is connected via wireless or dialup and there is no physical port dedicated to that user's machine. Some advocate device registration in order to have a name associated with every machine, but our current preference is to correlate authentication service logs with network address mapping databases, since this provides up-to-date information on who is using the computer. The system can also distinguish serial reuse situations, as in a lab.
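
    A toy version of that correlation: given an IP address and a time, find which MAC address held the address then, and which authenticated user was most recently active from it. Record formats are invented for illustration.

        # Toy correlation of network address mappings with authentication logs.
        # Record formats are invented; real DHCP/ARP and auth logs differ.

        def who_was_it(ip, when, dhcp_leases, auth_events):
            """dhcp_leases: list of (ip, mac, start, end) tuples;
               auth_events: list of (username, ip, timestamp) tuples.
               Returns (mac, username) best matching the given ip at time 'when'."""
            mac = next((m for i, m, start, end in dhcp_leases
                        if i == ip and start <= when <= end), None)
            # Most recent authentication from that address at or before 'when'
            # (handles serial reuse, e.g. lab machines, by picking the latest login).
            users = [(t, u) for u, i, t in auth_events if i == ip and t <= when]
            username = max(users)[1] if users else None
            return mac, username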

    7.9 Server Sanctuaries

    The purpose of "server sanctuaries" is to provide the best possible environment for protecting critical or sensitive servers. The idea is to locate such servers where there is:

    7.10 Vendor Engagement: Security-related Router Requirements

    This is about conversations with router vendors on what features we need to move forward.

    7.11 Other Activities

    Under investigation:

    Many of these ideas have obvious merit; the question is whether their advantages outweigh their disadvantages and/or costs.

     


    8. Conclusions

    8.1 Nostalgia isn't what it used to be

    Times change; not always for the better. While still infinitely useful, the Internet has become a very dangerous place. In response, the technical character of the Internet and network computing has also changed and will continue to do so. Given that human nature doesn't seem to evolve on the same timescale as technology, should we be surprised that the Interconnected Cyber Commons are a victim of their own success?

    Clearly, the open Internet (or Internet utility model) of yesteryear really is dead and buried. Whether you are an end-user, a system manager, or a network operator, you can no longer make any assumptions about the transparency of the network path between a pair of arbitrary end-points. The best we can now hope for is a set of local practices that will permit efficient network troubleshooting and the ability to provide users with a transparent network path when needed. Or at least an easy way for users to determine whether a given application has a snowball's-chance-in-you-know-where of working. (Ken Klingenstein calls my proposed protocol for accomplishing this goal "Terry's Transparency Tester".)

    Belatedly, the era of unmanaged/autonomous computers in enterprises is also coming to an end. The fact that MS RPC exploits quickly penetrated many firewalled sites (e.g. via VPN connections or mobile laptops) was an object lesson that underscored the importance of trying to make computers network safe, even when rigorous network perimeter defenses were in place.

    It is because of the failure of OS vendors to ship network-safe systems "out of the box" that individual unmanaged PCs have become such a liability, that IT organizations have had to spend enormous resources on security, and that --most depressingly-- the days of the open Internet utility have come to an end. In their (slight) defense, it can be said that Microsoft sold what customers asked for, and that was convenience, not security. Nevertheless, the world would be a very different place today if Windows XP had shipped in 2001 with its integral firewall turned on instead of off by default.

    8.2 Key Lessons

    Some of the big lessons here are that:

    8.3 Key Predictions

    8.4 Never Say Die

    For Internet veterans and purists, the current picture is bleak, but the essence of the Internet, as many have observed, is to route around blockages, barriers, obstructions, and setbacks. Accordingly, we choose to end on a note of optimism. There are important choices still to be made, and if we make the right ones, we may be able to forestall some of the more grievous outcomes.

    Final thoughts:

    These principles of "local choice, local control, local complexity, local containment" will not reverse the trend toward closed networks, but by providing mechanisms to support multiple policy options, we can avoid the one-size-fits-all policy syndrome, and may help avert unfair cost shifting within institutions. The alternative is to have only closed nets, and endless disputes over policy exceptions. These strategies may represent our best chance to protect the needs of the few while quite properly tending to the needs of the many.

     


    9. References

    SALS Workshop and I2 Meeting

    Presentations

    Related Papers

    News articles

     

