5 injection vulnerabilities hackers don't want developers to know about (and how to prevent them)
There are specific topics in security (e.g., cryptography) that only security-focused engineers typically need to worry about. However, there are a few topics that every developer should understand because it’s so easy to introduce vulnerabilities into web applications otherwise. Injection vulnerabilities fall into this category.
The most widely-known forms of injection vulnerabilities are SQL injection (SQLi) and cross-site scripting (XSS), but the same pattern shows up in a variety of places you might not think of. I'll cover the general practice first, and then go over a few specific types of injection vulnerabilities and how to avoid them.
What is an injection vulnerability?
An injection vulnerability arises whenever untrusted input can alter the semantics of an operation. Untrusted input means any input an attacker might be able to control. This includes anything in an incoming HTTP request (URL, headers, body), responses from third-party APIs, and previously stored untrusted inputs (e.g., user data read from a DB), among others. If you're not sure whether to trust some data, it's best to assume it's untrusted.
Parsers and untrusted inputs should never meet.
Example parsers are:
browser rendering raw HTML
SQL database parsing a query
REST API handler routing a request based on path, query params, or body
Shell (e.g., Bash) interpreting a command string
OS resolving a file path to a specific file on disk (e.g., interpreting slashes and `..` to have specific meanings within a path string)
The best protection against injection vulnerabilities is for the semantics of an operation to be determined before untrusted input ever enters the picture. For SQL injection, this means using prepared statements or an ORM to define queries. For XSS, this means inserting untrusted input into the client-side DOM only as specific element text properties (or using frameworks like React that do this for you unless it's explicitly disabled via dangerouslySetInnerHTML).
In cases where it's not possible to statically define an operation without the untrusted input (e.g. when returning rendered HTML from the server), the untrusted input must first be sanitized. The rules for sanitizing an input (i.e., which characters need to be escaped and how they should be escaped) depend on where that input will be used. It's notoriously difficult to write effective sanitizers, and it's best to assume that a skilled attacker is better at circumventing your sanitizer than you are at thinking of every possible edge case.
OWASP maintains a lengthy XSS Filter Evasion Cheat Sheet just to give you an idea of the difficulty involved in writing a sanitizer.
Input Sanitization
Try very hard to avoid the need for a sanitizer by keeping untrusted input away from parsers. It's worth investing resources to do this whenever possible.
Use a mature, widely distributed, and thoroughly tested sanitizer when (1) isn't possible. Writing your own sanitizer is asking for trouble.
Structure your code such that sanitization happens in as few places as possible, and write tests to make sure problematic inputs are sanitized as expected. If every line in your codebase that does a SQL query or other sensitive operation has to call a sanitization function, someone will eventually forget to do that!
Injection Methods
SQL Injection (SQLi)
SQL injection vulnerabilities happen whenever untrusted input is inserted into a query without proper sanitization. A popular example is xkcd's Little Bobby Tables. Consider the query:
db.query(`SELECT * from students WHERE first_name = '${request.first_name}'`)
When the code above interpolates Little Bobby Tables's first name Robert'; DROP TABLE Students;--
into the query, the query becomes:
SELECT * from students WHERE first_name = 'Robert'; DROP TABLE Students;--'
SQL treats --
and anything after it as a comment, so this is a valid SQL query. However, the query will now drop the entire table!
To avoid SQLi, use prepared statements such as:
SELECT * from students WHERE first_name = ?
With a prepared statement, the SQL DB engine parses the query before user input is ever passed to it. When the prepared query is executed with specific parameters, those parameters have no way to alter the semantics of the query, avoiding an injection vulnerability.
A typical ORM similarly separates input values from query semantics and usually avoids injection issues.
Cross-site Scripting (XSS)
XSS is a complex topic I'll only touch on briefly here. Like other injection vulnerabilities, XSS arises when untrusted input can be injected (into HTML, in this case) prior to parsing/rendering. The untrusted input can come from server request parameters such as the query string (reflected XSS), previously-stored values (stored/persistent XSS), or manifest entirely within the client (DOM-based XSS).
XSS is effectively a client-side remote code execution (RCE) vulnerability in which an attacker can insert their own JavaScript into a page. That JavaScript can access anything within the page's origin, including cookies and local storage that may include sensitive secrets such as session tokens. The attacker's code can also make HTTP API requests from the page, which will include even HttpOnly cookies. Essentially, the attacker can do anything the page can do.
The most robust protection against XSS is to use a client-side framework like React that lets you intuitively render untrusted inputs on a page without risking XSS. Be sure to avoid using dangerouslySetInnerHTML, which bypasses XSS protections.
Sanitization is particularly difficult for protecting against XSS since the sanitization rules are context-dependent (i.e., the rules for escaping untrusted inputs in a <div>
body, input.value
property, or <style>
body are all different). If you need to insert untrusted input into raw HTML, use a well-tested sanitizer such as DOMPurify.
Setting a strong Content Security Policy without unsafe-inline
or unsafe-eval
in the script-src
or default-src
directive is an effective defense-in-depth measure to prevent modern browsers from executing attacker code even if the attacker is able to insert <script>
elements into the page.
HTTP API injection
RESTful APIs, GraphQL, and other HTTP-based APIs are ubiquitous. When a web application makes an API call to another service, injection vulnerabilities are possible when that request includes untrusted input.
Consider a contrived example in which a web app integrates with a payments service that has a REST API endpoint for creating a subscription: POST /subscriptions/{product_id}?price_usd=<price>
where price_usd
is optional, and a pre-configured price is used if omitted. If an attacker controls the value of product_id
and passes a value of desired_product_id?price_id=0
, the web app would end up making a request to POST /subscriptions/desired_product_id?price_id=0
, which would allow the attacker to sign up for a free subscription.
In JavaScript, the standard way to sanitize untrusted inputs in URL paths is encodeURIComponent, which replaces problematic characters such as ?
and /
with safe percent-encoded sequences. When inserting untrusted input into URL query parameters, new URLSearchParams(queryParams)
provides a convenient, safe interface for building a query string from a JavaScript object of key-value pairs.
Shell injection
Backend APIs sometimes need to execute external commands on the machine where they run. Consider an API that performs WHOIS lookups for a requested domain by executing the whois
command locally.
Consider the following vulnerable Node.js code:
const whois = child_process.execSync(`whois ${whoisRequest.domain}`);
If an attacker can pass the domain reddit.com && rm -rf /
, the backend will execute the command whois reddit.com && rm -rf /
. The child_process.execSync
function passes the command string to the shell (/bin/sh
by default on Linux), which parses && rm -rf /
as a subsequent command to wipe the filesystem.
To avoid this issue, never pass untrusted input to a shell. Instead, use an interface such as child_process.execFileSync
that executes a specific binary (which shouldn't be a shell!) and passes arguments as an array:
const whois = child_process.execFileSync("whois", [whoisRequest.domain]);
Now, even if the user passes a domain reddit.com && rm -rf /
, that entire string will be passed as the command-line argument to whois
, which will exit with an error but will not cause any harmful side-effects. Perhaps an even better solution would be to use a library to perform WHOIS queries without needing to execute a separate command.
Astute readers may point out that validating the domain against a regex would also likely prevent shell injection in this case. However, avoiding the possibility of shell injection by using a safe interface that keeps untrusted input away from a shell's command parser is a more robust solution that avoids shell injection in all cases.
Path traversal
Finally, a path traversal vulnerability arises when an untrusted input is inserted into a filesystem path, which can cause the wrong file to be read or even written. Consider a backend API that reads a file at the path /teams/${team_id}/${report_name}.csv
. If an attacker controls the value of report_name
but not team_id
, they could pass a report_name
of ../other_team_id/private
. This would cause the file /teams/team_id/../other_team_id/private.csv
(resolved to /teams/other_team_id/private.csv
) to be read, leaking data from a different team.
To avoid path traversal vulnerabilities, never use untrusted input in file or directory names. It's safest always to control the names of files and directories, including IDs that you generate and control (e.g., UUIDs, KSUIDs, etc.). If the name of a file or directory absolutely must be derived from untrusted input, consider hashing it (e.g., using SHA-256) or at least encoding it into a format that doesn't include dots or slashes (e.g., URL-safe base64).