How to Find Vulnerabilities in Code: Bad Words
Overview
This is the first of several posts about how to find vulnerabilities in code. At a high level, the process looks like this
- find dangerous functionality
- find a path from input you control to that dangerous functionality
- craft input to the program that makes it misbehave
We’ll start with the first part: how to find dangerous functionality. In my experience, 80% of the bugs are in about 20% of the code. If anything, it’s more like 90 / 10. Since you usually need to understand the code thoroughly to find novel vulnerabilities, deciding which 20% to focus on is critical. One way I do this is by focusing on clusters of “bad words”.
I’ll show you what I mean with a quick story. Recently during a red team operation, I was looking through a massive terraform repo, when I came across this seemingly innocent line:
driver.raw_exec.enable = 1
I wasn’t even trying to find vulnerabilities at this point, but that line leapt off the screen. I had to stop and figure out what it was doing. Turns out, it was configuring a job scheduler called Nomad from Hashicorp. Like any diligent red teamer, I immediately reached for the Nomad docs. There, I found an ominous warning:
“this allows you to run jobs with no isolation; disabled by default for security reasons.”
My heart raced, as I quickly assembled the simplest Nomad job I could muster. A few minutes later, I had root access to the cluster and a gold mine of credentials. An early end to the operation was in sight.
Before this, I had never even heard of Nomad, let alone this configuration option, so what was it about that line that made me want to dig deeper? It was the combination of 2 security bad words hovering around the same section of code, namely “raw” and “exec.” These bad words help you hone in on the most security-critical sections of code, so you can leverage your attention effectively.
Common Bad Words
raw
Raw implies you’re accessing a lower level abstraction. This becomes a problem when your security controls are enforced at a higher level, allowing users of this “raw” interface to bypass them.
Examples:
-
CAP_NET_RAW is a linux capability that allows you to create raw sockets and use them to bypass typical process isolation restrictions.
-
The raw_exec driver in Nomad allows you to create jobs that run outside containers with the permissions of the nomad agent.
-
Many ORMs have a rawQuery or rawSQL method that allows you to execute a query directly. The queries generated by the ORM are generally not injectable, but it’s up to the user to prevent SQLi when using the “raw” interface.
eval | exec | run
Combining user input with code written in a dynamic language (Javascript, SQL, bash, etc) is usually a recipe for injection attacks. Attackers can submit code as input, causing the interpreter to misbehave. Running this code is often called “executing”, “evaluating”, or “running”.
Examples:
-
raw_exec
runs a Nomad job without isolation -
conn.cursor().execute(sql)
runs a SQL query in many python database drivers -
exec(code)
is a python method that runs code passed to it -
eval(code)
is a function provided by many dynamic languages, like Javascript that runs the code you pass it. Python has an eval function too but it’s only for expressions.
*This one will return a lot of false positives because as Steve Yegge predicted, it seems like every verb is being turned into a noun with a run(), execute(), or justDoIt() method.
process | system | popen | exec | spawn
These words can indicate the creation of a child process. If the child process spawns a shell, you might be able to inject shell commands. Even if it calls the execve syscall directly, you can still add / modify arguments to the program.
Examples:
- the
subprocess
module in python - the
child_process
module in node - the
os/exec
package in golang - the
os.system
method in python - the
popen
module in ruby
privilege | permission | capability | role | rbac | policy | authorization | claims
These words will help you find the code responsible for granting privileges to users, containers, processes, files, EC2 instances, etc. Use any highly privileged entities to do your bidding or even bypass authz entirely.
Examples:
-
The docker –privileged flag gives the container functional root privileges on the host.
-
the linux kernel split root user privileges into “capabilities” that you can assign to a program, allowing it to do things like create raw sockets, debug processes you don’t own, or bypass file ACLs.
-
Kubernetes uses an api extension called RBAC (Role Based Access Control) to authorize access to k8s resources
-
Many cloud providers use the term “role binding” for granting a principal a set of permissions
-
JWTs have “claims” that tell the consumer about the privileges of the user, and consumers verify them with functions like jwt.ParseWithClaims
reflect | klass | constantize | forName
Many programming languages let you look up functions, classes, methods, variables, etc. by their names (and even instantiate / invoke them). This is commonly known as “reflection”. If a user can control the name of a method that gets invoked, or a variable that gets returned, they can potentially cause the program to misbehave.
Examples:
-
the
Reflect
object in Javascript -
the ruby
String#constantize
method -
the java
Class.forName
method -
klass
is a common variable name for classes looked up via reflection (because “class” tends to be a reserved word)
pickle | yaml | serialize | marshal | objectinput
These words indicate that a program might be deserializing data using a format that supports complex objects. This can allow an attacker to read files, send HTTP requests, and even execute arbitrary code, depending on the serialization format and which objects are available to the runtime (classes on the JVM classpath, packages on sys.path in python, etc.).
Examples:
-
python’s
pickle
format -
the
node-serialize
package -
most YAML parsers
-
Java’s
ObjectInputStream
-
php’s
unserialize
function
parse | open | request
These words can be interesting for the same reasons as eval() and friends: attackers can input metacharacters recognized by the parser in question to alter its behavior. The main difference is rather than running code in a dynamic language, you’re leveraging parsers to get access to resources like files, or URLs.
Examples:
-
controlling input to URL parsers can result in SSRF, bypassing proxy restrictions, off-by-slash vulns, etc.
-
controlling input to file path parsers can result in LFI, RFI, and local file reads / writes.
unsafe | insecure | dangerous
Occasionally, API developers like to call attention to dangerous APIs by including “insecure” or “unsafe” right in the name.
Examples:
-
unsafe {}
blocks in Rust -
InsecureSkipVerify
in Go’s TLS package -
dangerouslySetInnerHtml()
in React -
the
unsafe
package in Go
todo | fixme | xxx
As code evolves, developers add comments to remind themselves to implement features, fix bugs, or clean up some code they don’t like. Sometimes these comments can lead you to important bugs, missing features, etc. that you can exploit.
Examples:
-
One time, I found a todos.txt file in the web root of an Apache server. It contained a lengthy list of unpatched security vulnerabilities.
-
Another time, I found a FIXME comment that mentioned a performance problem. It turns out this was a very difficult to find but trivial to exploit ReDoS vulnerability.
merge | clone
These words usually indicate that an object, dict, map, etc. is being merged with another or cloned into a new object. This can result in interesting security issues like Javascript prototype pollution vulnerabilities, mass assignment vulnerabilities, etc.
Examples:
-
_.merge
in LoDash -
_.clone
in LoDash
alloc | free
This is a good clue that manual memory management is occurring. This is notoriously difficult to get right and can result in vulnerabilities like buffer overflows, use-after-frees, double frees, etc.
Examples:
malloc()
free()
- The
[object alloc]
message in Objective C
AES | RSA | DSA | DES | CBC | ECB | HMAC | GCM
These are cryptographic primitives and can indicate that the authors are rolling their own crypto system instead of using a higher level abstraction. There are many subtle ways to use these insecurely, so read carefully and consult a cryptographer.
Examples:
-
aes.NewCipher(key)
-
new RSAPrivateKey(keyBytes)
-
HMAC.new(secret, digestmod=SHA256)
JWT | JKS | JWK | JKU …
JSON Web Tokens are a standard for transferring data securely and are very commonly used in modern application stacks. There are many ways to use them insecurely, so it’s worth paying attention to code dealing with JWTs.
Common JWT issues:
- the none algorithm
- manipulating the alg header
- not verifying the aud or iss claims
- not verifying the validity period (exp and nbf claims)
- signing but not encrypting sensitive data
Examples:
JWTVerifier
jwt.ParseWithClaims
jwt.verify
password | private | token | secret | key | Authorization
These words are good indicators that you might have some secrets hard coded into the repository, like API keys, database passwords, encryption keys, etc.
Examples:
- BEGIN RSA PRIVATE KEY
- AWS’s “secret access key”
- Django’s
SECRET_KEY
setting
validate | verify
These words usually indicate business / security rules are being enforced. Examine these closely for input that passes the validation but could also result in a vulnerability. The types of input they are trying to ban can also give you clues about potential vulnerabilities.
Example:
app.get('/signup', (req, res) => {
// verify! this probably means that only users with certain
// emails are allowed to sign up. I wonder what it's
// verifying?
if (!verifyEmail(req.body.email)) {
res.send('unauthorized');
return;
}
register(req.body.email);
res.redirect('/dashboard');
});
// looks like it's verifying the email belongs to a user
// on company.com. can you think of a way to make this return
// true without having a @company.com email address?
//
// what about will@company.com.btlr.dev?
function verifyEmail(email) {
return email.includes('@company.com');
}
XML | xerces | SAX | etree | xpath | DocumentBuilder
Parsing attacker-controlled XML can lead to a number of security problems ranging from local file reads to denial of service attacks.
Examples:
DocumentBuilderFactory.newInstance();
SAXParserFactory.newInstance();
xml.etree.elementtree