How to Find Vulnerabilities in Code: Bad Words

2020-09-19

featured , offsec , code-review

Overview

This is the first of several posts about how to find vulnerabilities in code. At a high level, the process looks like this:

find dangerous functionality
find a path from input you control to that dangerous functionality
craft input to the program that makes it misbehave

We’ll start with the first part: how to find dangerous functionality. In my experience, 80% of the bugs are in about 20% of the code. If anything, it’s more like 90 / 10. Since you usually need to understand the code thoroughly to find novel vulnerabilities, deciding which 20% to focus on is critical. One way I do this is by focusing on clusters of “bad words”.

I’ll show you what I mean with a quick story. Recently, during a red team operation, I was looking through a massive Terraform repo when I came across this seemingly innocent line:

driver.raw_exec.enable = 1

I wasn’t even trying to find vulnerabilities at this point, but that line leapt off the screen. I had to stop and figure out what it was doing. It turned out to be configuring Nomad, a job scheduler from HashiCorp. Like any diligent red teamer, I immediately reached for the Nomad docs. There, I found an ominous warning:

“this allows you to run jobs with no isolation; disabled by default for security reasons.”

My heart raced as I quickly assembled the simplest Nomad job I could muster. A few minutes later, I had root access to the cluster and a gold mine of credentials. An early end to the operation was in sight.

Before this, I had never even heard of Nomad, let alone this configuration option, so what was it about that line that made me want to dig deeper? It was the combination of two security bad words hovering around the same section of code, namely “raw” and “exec.” These bad words help you home in on the most security-critical sections of code, so you can spend your attention where it counts.
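
One rough way to put this into practice is to scan a file for bad words and flag the lines where two or more land close together. The sketch below is illustrative only: the word list is a small sample from this post, the sample config is made up around the line from the story, and findBadWordClusters and its windowSize parameter are names invented for this example rather than part of any tool.

```javascript
// A small sample of bad words from this post; extend it for your own reviews.
const BAD_WORDS = ['raw', 'exec', 'eval', 'system', 'popen', 'spawn',
                   'privileged', 'unsafe', 'insecure', 'pickle', 'secret'];

// Hypothetical helper: returns the lines whose surrounding window (the line
// itself plus `windowSize` lines before and after) contains at least two
// distinct bad words.
function findBadWordClusters(source, windowSize = 1) {
  const lines = source.split('\n');
  // Record which bad words appear on each line (case-insensitive substring match).
  const hitsPerLine = lines.map((line) => {
    const lower = line.toLowerCase();
    return BAD_WORDS.filter((word) => lower.includes(word));
  });

  const clusters = [];
  for (let i = 0; i < lines.length; i++) {
    if (hitsPerLine[i].length === 0) continue;
    const start = Math.max(0, i - windowSize);
    const end = Math.min(lines.length - 1, i + windowSize);
    const words = new Set();
    for (let j = start; j <= end; j++) {
      hitsPerLine[j].forEach((word) => words.add(word));
    }
    if (words.size >= 2) {
      clusters.push({ line: i + 1, text: lines[i].trim(), words: [...words].sort() });
    }
  }
  return clusters;
}

// Made-up config snippet wrapped around the line from the story.
const sample = [
  'client {',
  '  options {',
  '    driver.raw_exec.enable = 1',
  '  }',
  '}',
].join('\n');

console.log(findBadWordClusters(sample));
```

Substring matching is deliberately crude here. It will produce false positives, which is fine for a first pass whose only job is to tell you where to start reading.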

Common Bad Words

raw

Raw implies you’re accessing a lower level abstraction. This becomes a problem when your security controls are enforced at a higher level, allowing users of this “raw” interface to bypass them.

Examples:

CAP_NET_RAW is a Linux capability that allows you to create raw sockets and use them to bypass typical process isolation restrictions.
The raw_exec driver in Nomad allows you to create jobs that run outside containers with the permissions of the Nomad agent.
Many ORMs have a rawQuery or rawSQL method that allows you to execute a query directly. The queries generated by the ORM are generally not injectable, but it’s up to the user to prevent SQLi when using the “raw” interface.
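
To make the ORM example concrete, here is a minimal sketch of the difference between concatenating input into a raw query and passing it as a parameter. No database is involved. buildRawQuery and buildParameterizedQuery are hypothetical helpers, and the $1 placeholder with a separate values array follows the style used by drivers such as node-postgres.

```javascript
// Hypothetical helper: splices user input straight into the SQL text.
function buildRawQuery(username) {
  return { text: `SELECT * FROM users WHERE name = '${username}'`, values: [] };
}

// Hypothetical helper: keeps the SQL text constant and ships the input
// separately, so the database treats it as data rather than SQL.
function buildParameterizedQuery(username) {
  return { text: 'SELECT * FROM users WHERE name = $1', values: [username] };
}

const payload = "x' OR '1'='1";

console.log(buildRawQuery(payload).text);
// SELECT * FROM users WHERE name = 'x' OR '1'='1'

console.log(buildParameterizedQuery(payload).text);
// SELECT * FROM users WHERE name = $1
```

When you see a “raw” query method in a review, the question to ask is which of these two shapes the call site uses.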

eval | exec | run

Combining user input with code written in a dynamic language (JavaScript, SQL, Bash, etc.) is usually a recipe for injection attacks. Attackers can submit code as input, causing the interpreter to misbehave. Running this code is often called “executing”, “evaluating”, or “running”.

Examples:

raw_exec runs a Nomad job without isolation
conn.cursor().execute(sql) runs a SQL query in many Python database drivers
exec(code) is a Python built-in function that runs the code passed to it
eval(code) is a function provided by many dynamic languages, like JavaScript, that runs the code you pass it. Python has an eval function too, but it only evaluates expressions.

*This one will return a lot of false positives because, as Steve Yegge predicted, it seems like every verb is being turned into a noun with a run(), execute(), or justDoIt() method.
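
As a small illustration of why eval-style functions deserve a closer look, the sketch below parses the same input two ways. parseSettingsUnsafe and parseSettingsSafe are hypothetical names. The point is only that eval runs whatever it is given, while JSON.parse accepts data and nothing else.

```javascript
function parseSettingsUnsafe(input) {
  // eval treats the input as JavaScript, so any expression inside it runs.
  return eval('(' + input + ')');
}

function parseSettingsSafe(input) {
  // JSON.parse only accepts JSON data and throws a SyntaxError otherwise.
  return JSON.parse(input);
}

const benign = '{"theme": "dark"}';
// Looks like settings, but also sets a global as a side effect.
const hostile = '(globalThis.injected = true, {"theme": "dark"})';

console.log(parseSettingsSafe(benign).theme);    // dark
console.log(parseSettingsUnsafe(hostile).theme); // dark
console.log(globalThis.injected);                // true

try {
  parseSettingsSafe(hostile);
} catch (err) {
  console.log(err.name); // SyntaxError
}
```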

process | system | popen | exec | spawn

These words can indicate the creation of a child process. If the child process spawns a shell, you might be able to inject shell commands. Even if it calls the execve syscall directly, you may still be able to add or modify arguments to the program.

Examples:

the subprocess module in Python
the child_process module in Node.js
the os/exec package in Go
the os.system function in Python
the IO.popen method and the Open3 module in Ruby
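
The difference between spawning a shell and calling a program directly can be sketched with Node’s child_process module. This assumes a POSIX system with an echo binary on the PATH; greetViaShell and greetViaArgv are hypothetical helpers.

```javascript
const { execSync, execFileSync } = require('child_process');

// Hypothetical helper: builds a command string, which execSync hands to a shell.
function greetViaShell(name) {
  return execSync(`echo ${name}`, { encoding: 'utf8' });
}

// Hypothetical helper: passes arguments as an array, so no shell parses them.
function greetViaArgv(name) {
  return execFileSync('echo', [name], { encoding: 'utf8' });
}

const input = 'hi; echo INJECTED';

console.log(greetViaShell(input)); // the shell runs two commands
console.log(greetViaArgv(input));  // echo receives one literal argument
```

The second form removes the shell, but it does not remove argument injection: input that begins with a dash can still be read as an option by the program being called.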

privilege | permission | capability | role | rbac | policy | authorization | claims

These words will help you find the code responsible for granting privileges to users, containers, processes, files, EC2 instances, etc. Look for highly privileged entities you can coerce into doing your bidding, or for ways to bypass authz entirely.

Examples:

The Docker --privileged flag gives the container functional root privileges on the host.
The Linux kernel splits root user privileges into “capabilities” that you can assign to a program, allowing it to do things like create raw sockets, debug processes you don’t own, or bypass file ACLs.
Kubernetes uses an authorization mode called RBAC (Role-Based Access Control) to authorize access to k8s resources
Many cloud providers use the term “role binding” for granting a principal a set of permissions
JWTs have “claims” that tell the consumer about the privileges of the user, and consumers verify them with functions like jwt.ParseWithClaims

reflect | klass | constantize | forName

Many programming languages let you look up functions, classes, methods, variables, etc. by their names (and even instantiate / invoke them). This is commonly known as “reflection”. If a user can control the name of a method that gets invoked, or a variable that gets returned, they can potentially cause the program to misbehave.

Examples:

the Reflect object in JavaScript
the String#constantize method in Ruby (provided by Rails’ Active Support)
the Class.forName method in Java
klass is a common variable name for classes looked up via reflection (because “class” tends to be a reserved word)
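
In JavaScript, the same problem shows up without any explicit reflection API, because looking up a property by a user-supplied name also reaches inherited properties. A minimal sketch, with a hypothetical actions table and dispatch helpers:

```javascript
const actions = {
  greet: (name) => `hello ${name}`,
  shout: (name) => `HELLO ${String(name).toUpperCase()}`,
};

// Looks up the handler by a user-supplied name with no checks. Inherited
// properties such as "constructor" and "toString" are reachable too.
function dispatchUnsafe(actionName, arg) {
  return actions[actionName](arg);
}

// Only dispatches to handlers defined directly on the actions object.
function dispatchSafe(actionName, arg) {
  if (!Object.prototype.hasOwnProperty.call(actions, actionName)) {
    throw new Error(`unknown action: ${actionName}`);
  }
  return actions[actionName](arg);
}

console.log(dispatchUnsafe('greet', 'will'));    // hello will
console.log(dispatchUnsafe('toString', 'will')); // [object Object]
```

Calling toString here is harmless, but it shows the user choosing a method the developer never listed, which is the property to look for when you see name-based lookups.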

pickle | yaml | serialize | marshal | objectinput

These words indicate that a program might be deserializing data using a format that supports complex objects. This can allow an attacker to read files, send HTTP requests, and even execute arbitrary code, depending on the serialization format and which objects are available to the runtime (classes on the JVM classpath, packages on sys.path in Python, etc.).

Examples:

Python’s pickle format
the node-serialize package
YAML parsers that can instantiate arbitrary objects, such as PyYAML’s yaml.load with an unsafe loader
Java’s ObjectInputStream
PHP’s unserialize function

parse | open | request

These words can be interesting for the same reasons as eval() and friends: attackers can input metacharacters recognized by the parser in question to alter its behavior. The main difference is that rather than running code in a dynamic language, you’re leveraging parsers to get access to resources like files or URLs.

Examples:

controlling input to URL parsers can result in SSRF, bypassing proxy restrictions, off-by-slash vulns, etc.
controlling input to file path parsers can result in LFI, RFI, and local file reads / writes.
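
For the file path case, the metacharacters are things like ../ and a leading slash. The sketch below shows one common containment check using Node’s path module. resolveInside and the /srv/uploads directory are hypothetical, and the check does not account for symlinks.

```javascript
const path = require('path');

// Hypothetical helper: resolves a user-supplied path against a base directory
// and refuses anything that lands outside of it.
function resolveInside(baseDir, userPath) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, userPath);
  // Compare against base + separator so "/srv/uploads-evil" does not pass
  // as being inside "/srv/uploads".
  if (target !== base && !target.startsWith(base + path.sep)) {
    throw new Error('path escapes base directory');
  }
  return target;
}

console.log(resolveInside('/srv/uploads', 'avatars/me.png'));

for (const attempt of ['../../etc/passwd', '/etc/passwd', '../uploads-evil/x']) {
  try {
    resolveInside('/srv/uploads', attempt);
  } catch (err) {
    console.log(`${attempt}: ${err.message}`);
  }
}
```

When reviewing code that opens files, look for whether a check like this exists at all, and whether it runs on the resolved path or on the raw input.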

unsafe | insecure | dangerous

Occasionally, API developers like to call attention to dangerous APIs by including “insecure” or “unsafe” right in the name.

Examples:

unsafe {} blocks in Rust
InsecureSkipVerify in Go’s TLS package
the dangerouslySetInnerHTML prop in React
the unsafe package in Go

todo | fixme | xxx

As code evolves, developers add comments to remind themselves to implement features, fix bugs, or clean up some code they don’t like. Sometimes these comments can lead you to important bugs, missing features, etc. that you can exploit.

Examples:

One time, I found a todos.txt file in the web root of an Apache server. It contained a lengthy list of unpatched security vulnerabilities.
Another time, I found a FIXME comment that mentioned a performance problem. It turned out to be a hard-to-find but trivial-to-exploit ReDoS vulnerability.

merge | clone

These words usually indicate that an object, dict, map, etc. is being merged with another or cloned into a new object. This can result in interesting security issues like JavaScript prototype pollution vulnerabilities, mass assignment vulnerabilities, etc.

Examples:

_.merge in Lodash
_.clone in Lodash
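
Prototype pollution is easiest to understand with a hand-rolled recursive merge. The sketch below is not Lodash’s implementation; naiveMerge and saferMerge are hypothetical functions written for this example.

```javascript
// Recursively copies every key from source into target, including "__proto__".
function naiveMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value && typeof value === 'object') {
      if (!target[key] || typeof target[key] !== 'object') target[key] = {};
      naiveMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// Same idea, but skips the keys that lead to an object's prototype.
const BLOCKED_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

function saferMerge(target, source) {
  for (const key of Object.keys(source)) {
    if (BLOCKED_KEYS.has(key)) continue;
    const value = source[key];
    if (value && typeof value === 'object') {
      if (!target[key] || typeof target[key] !== 'object') target[key] = {};
      saferMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// JSON.parse creates "__proto__" as an ordinary own property.
const attackerInput = JSON.parse('{"__proto__": {"isAdmin": true}}');

naiveMerge({}, attackerInput);
console.log(({}).isAdmin);           // true: every object now inherits isAdmin
delete Object.prototype.isAdmin;     // undo the pollution for the rest of the demo

saferMerge({}, attackerInput);
console.log(({}).isAdmin);           // undefined
```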

alloc | free

This is a good clue that manual memory management is occurring. This is notoriously difficult to get right and can result in vulnerabilities like buffer overflows, use-after-frees, double frees, etc.

Examples:

malloc()
free()
The [object alloc] message in Objective-C

AES | RSA | DSA | DES | CBC | ECB | HMAC | GCM

These are cryptographic primitives and modes of operation, and they can indicate that the authors are rolling their own cryptosystem instead of using a higher-level abstraction. There are many subtle ways to use these insecurely, so read carefully and consult a cryptographer.

Examples:

aes.NewCipher(key)
new RSAPrivateKey(keyBytes)
HMAC.new(secret, digestmod=SHA256)
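
As one example of a subtle misuse, ECB mode encrypts identical plaintext blocks to identical ciphertext blocks, which leaks structure. The sketch below uses Node’s crypto module to show this next to GCM. The helper names are hypothetical, and this is a demonstration rather than a recommended encryption routine.

```javascript
const crypto = require('crypto');

const key = crypto.randomBytes(16);
// Two identical 16-byte plaintext blocks.
const plaintext = Buffer.alloc(32, 'A');

function encryptEcb(data) {
  // ECB takes no IV, so the same block always encrypts to the same output.
  const cipher = crypto.createCipheriv('aes-128-ecb', key, null);
  return Buffer.concat([cipher.update(data), cipher.final()]);
}

function encryptGcm(data) {
  // The IV must be unique for every message encrypted under the same key.
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-128-gcm', key, iv);
  const body = Buffer.concat([cipher.update(data), cipher.final()]);
  return { iv, body, tag: cipher.getAuthTag() };
}

function firstTwoBlocksMatch(ciphertext) {
  return ciphertext.subarray(0, 16).equals(ciphertext.subarray(16, 32));
}

console.log(firstTwoBlocksMatch(encryptEcb(plaintext)));      // true: repetition leaks
console.log(firstTwoBlocksMatch(encryptGcm(plaintext).body)); // false
```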

JWT | JKS | JWK | JKU …

JSON Web Tokens are a standard for transferring data securely and are very commonly used in modern application stacks. There are many ways to use them insecurely, so it’s worth paying attention to code dealing with JWTs.

Common JWT issues:

the none algorithm
manipulating the alg header
not verifying the aud or iss claims
not verifying the validity period (exp and nbf claims)
signing but not encrypting sensitive data

Examples:

JWTVerifier
jwt.ParseWithClaims
jwt.verify
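
To show what a few of these checks look like in code, here is a minimal HS256-only sketch built on Node’s crypto module. It is not a replacement for a maintained JWT library. signHs256 and verifyHs256 are hypothetical helpers, and aud and iss checks are left out for brevity.

```javascript
const crypto = require('crypto');

function b64url(input) {
  return Buffer.from(input).toString('base64url');
}

// Hypothetical helper for the demo: signs a payload with HS256.
function signHs256(payload, secret) {
  const head = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  const sig = crypto.createHmac('sha256', secret)
    .update(`${head}.${body}`).digest('base64url');
  return `${head}.${body}.${sig}`;
}

// Hypothetical helper: verifies an HS256 token and its validity period.
function verifyHs256(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('malformed token');
  const [head, body, sig] = parts;

  // Pin the algorithm instead of trusting the attacker-controlled header.
  const header = JSON.parse(Buffer.from(head, 'base64url').toString('utf8'));
  if (header.alg !== 'HS256') throw new Error('unexpected alg');

  // Recompute the signature and compare in constant time.
  const expected = crypto.createHmac('sha256', secret)
    .update(`${head}.${body}`).digest();
  const actual = Buffer.from(sig, 'base64url');
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    throw new Error('bad signature');
  }

  // Enforce the validity period (exp and nbf claims).
  const claims = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  if (typeof claims.exp !== 'number' || claims.exp <= nowSeconds) {
    throw new Error('token expired');
  }
  if (typeof claims.nbf === 'number' && claims.nbf > nowSeconds) {
    throw new Error('token not yet valid');
  }
  return claims;
}

const token = signHs256({ sub: 'will', exp: 2000 }, 's3cret');
console.log(verifyHs256(token, 's3cret', 1000).sub); // will
```

When reading real verification code, check for the same three things: where the algorithm comes from, how the signature is compared, and which claims are actually enforced.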

password | private | token | secret | key | Authorization

These words are good indicators that you might have some secrets hard coded into the repository, like API keys, database passwords, encryption keys, etc.

Examples:

BEGIN RSA PRIVATE KEY
AWS’s “secret access key”
Django’s SECRET_KEY setting

validate | verify

These words usually indicate business / security rules are being enforced. Examine these closely for input that passes the validation but could also result in a vulnerability. The types of input they are trying to ban can also give you clues about potential vulnerabilities.

Example:

const express = require('express');

const app = express();
app.use(express.json()); // populates req.body for JSON requests

// register() is assumed to be defined elsewhere in the application.
app.post('/signup', (req, res) => {
    // verify! this probably means that only users with certain
    // emails are allowed to sign up. I wonder what it's
    // verifying?
    if (!verifyEmail(req.body.email)) {
        res.status(401).send('unauthorized');
        return;
    }

    register(req.body.email);

    res.redirect('/dashboard');
});

// looks like it's verifying the email belongs to a user
// on company.com. can you think of a way to make this return
// true without having a @company.com email address?
//
// what about will@company.com.btlr.dev?
function verifyEmail(email) {
    return email.includes('@company.com');
}
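
For comparison, a stricter check might compare the whole domain instead of searching for a substring. verifyEmailStrict below is a hypothetical sketch. It only tightens the string comparison and does not prove that the user controls the mailbox.

```javascript
// Hypothetical stricter variant: requires exactly one "@" and an exact
// domain match, rather than a substring match anywhere in the address.
function verifyEmailStrict(email) {
  if (typeof email !== 'string') return false;
  const at = email.lastIndexOf('@');
  // Reject a missing local part and addresses with more than one "@".
  if (at <= 0 || at !== email.indexOf('@')) return false;
  return email.slice(at + 1).toLowerCase() === 'company.com';
}

console.log(verifyEmailStrict('will@company.com'));          // true
console.log(verifyEmailStrict('will@company.com.btlr.dev')); // false
console.log(verifyEmailStrict('will@company.com@btlr.dev')); // false
```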

XML | xerces | SAX | etree | xpath | DocumentBuilder

Parsing attacker-controlled XML can lead to a number of security problems ranging from local file reads to denial of service attacks.

Examples:

DocumentBuilderFactory.newInstance();
SAXParserFactory.newInstance();
xml.etree.ElementTree