Input Validation - Crunchy on the Outside
By Alex T
It seems that every category of security flaw has "Input Validation" listed among its solutions: check the OWASP Top Ten or the CWE/SANS Top 25. There is even a free software library from OWASP to do input validation, and the functionality is built into frameworks such as Apache Tapestry.
So, what is it, and why is it such a hot topic, even though there are libraries that solve it?
Consider these well-known exploit techniques: buffer overruns are exploited by sending an oversized chunk of binary data to a program when it's expecting something small; SQL injection is done by submitting an SQL code fragment where a simple string is expected.
In both cases the system fails to recognize invalid input, and is compromised when it tries to process it.
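To make the SQL injection case concrete, here is a minimal sketch using Python's standard sqlite3 module. The users table and the find_user_unsafe function are hypothetical, invented for illustration; the point is that the query is built by pasting user input into the SQL text, so a code fragment is processed as code.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE (hypothetical example): the user-supplied string is pasted
    # straight into the SQL text, so quotes in `name` can change the query itself.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

# A throwaway in-memory database with two hypothetical users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

print(find_user_unsafe(conn, "alice"))         # a simple string: one matching row
print(find_user_unsafe(conn, "x' OR '1'='1"))  # an SQL fragment: every row
```

With the second input the query becomes `... WHERE name = 'x' OR '1'='1'`, a condition that is true for every row.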
The solution?
Definition
Input Validation: Examining all inputs to the system, and refusing to process them unless they conform to expectations.
This definition may work for a manager or architect, but to a coder it is just a reminder of good coding practice. So let's get concrete: input validation means checking your input against rules such as these (naive) ones:
- A phone number may only contain numerals, spaces, '(', ')' and '+'
- An email address must match the grammar in RFC 822 (since superseded by RFC 5322)
- A name can't be 5k characters long or contain null bytes.
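The phone and name rules above can be sketched as simple Python checks. This is a minimal illustration, not a production validator: the function names and the 100-character name limit are hypothetical, and the email rule is left out because the RFC address grammar is far too large for a short sketch.

```python
import re

# Naive phone rule: only numerals, spaces, '(', ')' and '+'.
# [0-9] is used instead of \d, which would also accept non-ASCII digits.
PHONE_RE = re.compile(r"[0-9 ()+]+")
MAX_NAME_LEN = 100  # hypothetical limit; choose one that suits your data

def is_valid_phone(value: str) -> bool:
    # fullmatch requires the WHOLE string to match, not just a prefix.
    return PHONE_RE.fullmatch(value) is not None

def is_valid_name(value: str) -> bool:
    # Reject empty or oversized names, and any name containing a null byte.
    return 0 < len(value) <= MAX_NAME_LEN and "\x00" not in value
```

Note how naive the phone rule is: `is_valid_phone("555-1234")` returns False because '-' is not on the allowed list, which may or may not be what you want.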
Great, now where do these rules go in the code? Clearly they should be applied as early as possible, because the longer data goes unvalidated, the more potential there is for pwnage in the system internals. Input validation is the crunchy exterior protecting the squishy interior.
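One way to picture the crunchy exterior is an entry-point function that validates every parameter before anything else sees it. In this Python sketch, handle_request, store_contact and ValidationError are all hypothetical names standing in for a real request handler and real internal code.

```python
import re

PHONE_RE = re.compile(r"[0-9 ()+]{1,30}")  # naive phone rule plus a length cap

class ValidationError(ValueError):
    """Raised at the perimeter when input does not conform to expectations."""

def store_contact(name: str, phone: str) -> dict:
    # Squishy interior: hypothetical internal code that trusts its arguments.
    return {"name": name, "phone": phone}

def handle_request(params: dict) -> dict:
    # Crunchy exterior: validate every input the moment it arrives,
    # before any of it is passed further into the system.
    name = params.get("name")
    phone = params.get("phone")
    if not isinstance(name, str) or not 0 < len(name) <= 100 or "\x00" in name:
        raise ValidationError("name must be 1-100 characters with no null bytes")
    if not isinstance(phone, str) or PHONE_RE.fullmatch(phone) is None:
        raise ValidationError("phone may only contain digits, spaces, '(', ')' and '+'")
    return store_contact(name, phone)
```

The design choice here is that store_contact never repeats these checks; it relies on the perimeter having done its job.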
Squishy Interior. That's interesting.
Going back to our first examples: you fix buffer overruns by doing bounds checking immediately before copying data; you prevent SQL injection by using parameterized queries or, failing that, by properly escaping query parameters. You don't deal with either by checking at the perimeter for oversized input or unexpected quotes. So input validation doesn't actually fix the underlying flaws.
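For the SQL case, the fix at the point of use can be sketched with Python's standard sqlite3 module. The `?` placeholder makes the driver pass the value separately from the SQL text, so no amount of quoting in the input can change the query's structure. The users table and find_user function are hypothetical.

```python
import sqlite3

def find_user(conn: sqlite3.Connection, name: str) -> list:
    # The '?' placeholder binds `name` as a value, never as SQL text,
    # so quotes inside it cannot alter the structure of the query.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

# A throwaway in-memory database with two hypothetical users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))         # [('alice',)]
print(find_user(conn, "x' OR '1'='1"))  # [] : just an unusual name, matching nobody
```

Notice that this works whether or not any perimeter check ran first, which is exactly why it, and not input validation, is the actual fix.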
So, why bother with it then?
There are several good reasons:
- An extra safeguard. All software has flaws (except perhaps TeX), and input validation can keep them from becoming security breaches.
- Preventing DoS (Denial of Service). Processing bad input can be expensive, even when it's handled correctly.
- Usability. Immediate, clear reports about what's wrong with your input are more useful than obscure error messages with stack traces.
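The DoS and usability points can both be illustrated in a short Python sketch: a cheap size check runs before the comparatively expensive JSON parse, and field problems are collected into plain messages rather than surfacing as a stack trace. The order format, the 64 KiB limit and the parse_order function are all hypothetical.

```python
import json
import re
from typing import List, Optional, Tuple

MAX_BODY_BYTES = 64 * 1024                  # hypothetical size limit
SKU_RE = re.compile(r"[A-Za-z0-9-]{1,32}")  # hypothetical product-code rule

def parse_order(body: bytes) -> Tuple[Optional[dict], List[str]]:
    """Return (order, errors); order is None whenever errors is non-empty."""
    # DoS: a length check is cheap, so do it before any expensive parsing.
    if len(body) > MAX_BODY_BYTES:
        return None, [f"request body is {len(body)} bytes; the limit is {MAX_BODY_BYTES}"]
    try:
        data = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and UnicodeDecodeError
        return None, ["request body is not valid JSON"]
    if not isinstance(data, dict):
        return None, ["request body must be a JSON object"]

    # Usability: collect every problem so the user can fix them all at once.
    errors = []
    sku = data.get("sku")
    if not isinstance(sku, str) or SKU_RE.fullmatch(sku) is None:
        errors.append("sku must be 1-32 letters, digits or hyphens")
    quantity = data.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or not 1 <= quantity <= 100:
        errors.append("quantity must be a whole number from 1 to 100")
    return (None, errors) if errors else (data, [])
```

For example, `parse_order(b'{"sku": "", "quantity": 0}')` reports both problems in one response instead of failing on the first.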
Excellent, so we need input validation, but the story's just beginning. Next time we'll explore techniques for input validation: whitelisting, blacklisting, and sanitizing.



