This article is a working draft.
Questions and comments are eagerly
solicited by dors@cac.washington.edu.
Making Web Servers Versatile
What makes a food processor so versatile? Is it the accessories and blades? Or perhaps the motor? It is arguably neither. The versatility comes from a canny little interface between the motor and the accessories. As long as an accessory clicks into the interface, the motor will drive it, allowing you to knead dough, slice carrots, and whip cream all with the same kitchen appliance.
Web servers and food processors have very little in common except a design concept, extensibility, which allows both to add new features easily. For many Web servers, this extensibility comes from a simple interface known as the Common Gateway Interface (CGI).
Processing form submissions, querying databases, and generating Web pages on the fly are all tasks that can be added to a Web server through CGI. It is a lot like the food processor and its ability to use different accessories. A separate computer program is written to perform some task not already supported by the Web server. As long as the program follows the rules of CGI, the server can invoke it, thus adding to the Web server's capabilities. Such a program is known as a CGI program.
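As a rough illustration of what following the rules of CGI can look like, here is a minimal sketch in Python. It assumes the usual CGI convention that the server passes the query string in the QUERY_STRING environment variable and reads the response (a header, a blank line, then the body) from the program's standard output. The function name build_response and the form field "name" are hypothetical, chosen only for this example.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def build_response(query_string):
    """Return a complete CGI response: header, blank line, then the body."""
    params = parse_qs(query_string)
    # "name" is a hypothetical form field used only for this illustration.
    name = params.get("name", ["world"])[0]
    # Escape the value so it is displayed as text rather than read as HTML.
    body = "<p>Hello, {}!</p>\n".format(html.escape(name))
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The server supplies the query string through the environment.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

Even this small program already handles input it did not create, which is where the risks discussed next come from.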
Adding Versatility Increases Risk
Web servers are designed primarily to respond to requests for files. They do this well: quickly and securely. Most Web servers undergo rigorous reviews to ensure their efficiency and security. By supporting CGI, Web servers gain versatility but also open themselves up to possible security vulnerabilities. While CGI itself is secure, each CGI program may or may not be secure and, therefore, must be scrutinized in the same manner as the server itself.
Sanitizing Input Reduces Risk
A typical CGI program takes input and acts on it. The first rule of writing secure CGI programs is not to trust this input. It is not always known where the input originates or what it contains. Assumptions can lead to security vulnerabilities. A CGI program is said to have a "hole" in it if it can be exploited to run arbitrary commands on the computer running the Web server. Most holes are exploited by hiding commands within the CGI program's input.
A carefully written CGI program will scan its input to make sure each character in it is safe. This is known as sanitizing the input. It can be done in a couple of ways. First, dangerous characters can be removed from the input. This requires an exhaustive list of all dangerous characters, and any character missing from the list slips through. A better approach, one that errs on the safe side, is to define which characters are safe, such as alphanumeric characters, and discard or reject everything else. It is not trivial either way. The programmer must know which characters present problems within the context of each CGI program.
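The "define what is safe" approach can be sketched as follows. This is a minimal illustration in Python, not a complete solution: the safe set here is assumed to be ASCII letters and digits, and the names SAFE_CHARS, sanitize, and is_safe are hypothetical. The right safe set depends on what each CGI program does with its input.

```python
import string

# Assumed safe set for this sketch: ASCII letters and digits only.
# str.isalnum() is avoided because it also accepts non-ASCII characters.
SAFE_CHARS = frozenset(string.ascii_letters + string.digits)


def sanitize(value, safe=SAFE_CHARS):
    """Return value with every character outside the safe set dropped."""
    return "".join(ch for ch in value if ch in safe)


def is_safe(value, safe=SAFE_CHARS):
    """Return True only if value is non-empty and entirely safe."""
    return value != "" and all(ch in safe for ch in value)
```

Dropping characters silently changes what the user sent, so rejecting the request when is_safe returns False is often the more predictable choice.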
Where, specifically, is unsanitized input dangerous? Where it can gain control of the step-by-step instructions of a CGI program. For example, the biggest "gotcha" in CGI programming occurs when unsanitized input is passed to a shell interpreter. If the input contains special shell metacharacters, additional commands can be embedded in the input and the unsuspecting shell will execute them as a sequence of instructions. This kind of hole can be plugged by locating where a CGI program might invoke a shell and sanitizing any input passed to it. Again, not a trivial matter.
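One complementary way to plug this kind of hole, where the language allows it, is to run the external command without a shell at all, so that metacharacters are never interpreted. The sketch below assumes Python 3.7 or later and its standard subprocess module. The function echo_argument is hypothetical, and the child Python process merely stands in for whatever external command a real CGI program would run.

```python
import subprocess
import sys


def echo_argument(user_input):
    """Run a child process that prints its first argument, with no shell."""
    # Passing a list (and leaving shell=False, the default) hands each item
    # to the child as one literal argument; no shell ever parses user_input.
    command = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")
```

Given the input "report; echo injected", the child receives the whole string as a single argument and prints it unchanged. Had the same string been pasted into a command line and handed to a shell, the text after the semicolon would have run as a second command. Avoiding the shell does not replace sanitizing, since the command itself may still misuse the argument.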
Tip for Perl Programmers: The system and exec functions, piped calls to open, and backquotes can all invoke a shell, and eval runs its argument as Perl code. Sanitize any input originating outside your program (i.e., input that comes from the Web) before passing it to any of these.
Does It Really Matter?
Why does any of this matter? Why would anyone exploit a hole in a CGI program in the first place? For several reasons: to steal, modify or destroy information on a Web site, to disrupt or deny service to a Web site, or simply to gain a foothold on the Web server in order to access other resources. Keep this in mind the next time you incorporate a CGI program into your Web site.