Exploitable Path Analysis

Exploitable Path – Advanced Topics

Alex Livshiz — Wed, 24 Mar 2021 07:58:14 +0000

This is the third and final blog on Exploitable Path – a unique feature that allows our customers to prioritize vulnerabilities in open-source libraries. In the first blog, we introduced the concept of Exploitable Path and its importance. The conclusion was that a vulnerability in a library is considered exploitable when:

The vulnerable method in the library needs to be called directly or indirectly from a user’s code.
An attacker needs a carefully crafted input to reach this method and trigger the vulnerability.

In the second blog, we discussed some of the challenges in developing such a feature, and our unique approach. Mainly:

Using a query language over the CxSAST engine for the abstraction of queries over source code. This allows a more language-agnostic approach, so that Exploitable Path works for every programming language supported by CxSAST.
We walked through the various CxSAST queries that are required to build a full call graph of a user’s source code and its libraries’ source code. By crossing it with vulnerability data, we can know if a vulnerability is exploitable or not.

In this last blog in the series, we will cover more advanced topics we faced during the development of Exploitable Path.

Challenge no. 1 – Supporting Multiple Library Versions

The public data on a CVE usually contains affected versions, but how can we use this information to support Exploitable Path across versions? Meaning, if the source code of a library changes between various versions, how can we have the required data for Exploitable Path for each of those versions?
Let’s assume we have a user’s source code that uses a single open-source library. This library contains a vulnerability, and using Mitre, we can figure out the affected versions.
To be able to assess if the vulnerability is exploitable, we need the following for each version on the library:

A call graph of the library’s code. This can be done automatically using CxSAST.
Is the current version vulnerable?
- If it is, the inner method in which the exploitation occurs is required.

Now the question is, “how can we find this inner method for each vulnerable version”? Going over each version manually is not practical, especially since a library can have hundreds of versions.
The first part of the solution is to find the inner method that’s vulnerable. Usually, a vulnerability goes together with a specific method (or methods) that are responsible for a certain logic. Pull requests and commits for the relevant CVE, help our Analysts uncover the relevant method.
Next, we generate a fingerprint of the fix – if a version contains the fix, we can mark it as not vulnerable to this CVE. This is where our powerful static code analysis tool comes into play again, making it easy to re-assess hundreds of library versions for the vulnerability.
Re-assessing the affected versions of a vulnerability is crucial. As it turns out, this data on public websites like Mitre is often not precise. Versions that are marked as vulnerable can be safe and vice versa. It can be the result of human error, or even a slight difference in the version tags between the public registry and the git repository on which the library is developed. By searching for the fingerprint of the fix, we can ensure the quality and accuracy of our vulnerabilities data.
Using the in-depth analysis process, the vulnerable method is marked for every affected version, eventually resulting in a very accurate Exploitable Path scan.

Challenge no. 2 – Data Flow

Just because your code calls a vulnerable method, that doesn’t mean you are automatically at risk. To assess the risk properly (and avoid false positives), it’s crucial to have both a call graph and a DFG (Data Flow Graph) of a code to assess its exploitability
Let’s start with an example, and assume that a method called parse(content) has a DoS (Denial of Service) vulnerability given the right input. If parse() is only called with a constant value, meaning parse(CONSTANT_VALUE), there is no attack surface for an attacker to exploit it and cause a DoS. On the other hand, if a user of the application controls the input parameter of parse(), it’s a different story. For example, this input can be a comment or other data provided by the user. In such a case, the attacker can easily exploit the vulnerability and craft the required input.
The reality is more complex, as there are various ways data can be transferred in code:

Input parameters
Global or class members
The return value of another method invocation

Also, not all data options are necessary for exploitation. For example, a method parseRequest(HttpRequest request, Config config) can be vulnerable for exploitation using only the HttpRequest.Content member in the request parameter.
Now we understand the importance, but how do you incorporate DFG in the process of assessing a vulnerability? To be more specific, how can we know that a vulnerability is exploitable from a data flow point of view?
First, we use CxSAST to build a DFG. We start at the vulnerable method and trace back the origins of data point. Eventually we’ll reach one of the following cases:

A constant value. This is not exploitable, of course.
An input parameter of a method that is not called by other methods. This is a potential data flow compromise, as in the context of the static code scan, we don’t know how the method is invoked.
An internal method of the language is called, such as fopen() in Python.
A method of a different library is called, and its source code is not available.

The last two cases are the most interesting ones, and have two complementary approaches:

As a rule of thumb, mark those methods as a potential for data flow compromise since the inner implementation is unknown.
Mark specific methods as definite data flow compromises. For example, reading contents from a database pipe file. The same goes for parsing HTTP packets, pulling a message from a message queue, etc.

These two approaches are the basis for DFG support in assessing a vulnerability for exploitability.

Summary

In this blog we covered two additional advanced topics in Exploitable Path. We started with the problem of supporting various library versions, and how this is solved using the in-depth analysis process. Then, we discussed the integration of DFG in the vulnerability evaluation process, and how to backtrack the flow of data in the code.
With CxSCA, Checkmarx enables your organizations to address open source vulnerabilities earlier in the SDLC and cut down on manual processes by reducing false positives and background noise, so you can deliver secure software faster and at scale. For a free demonstration of CxSCA, please contact us here.

Exploitable Path – How to Solve a Static Analysis Nightmare

Alex Livshiz — Wed, 03 Feb 2021 15:10:36 +0000

In my previous blog, I walked you through the reasoning and importance of the Exploitable Path feature in the Checkmarx CxSCA solution. We discussed the challenges of prioritizing vulnerabilities in open source dependencies and defined what it means for a vulnerability to be exploitable:

The vulnerable method in the library needs to be called directly or indirectly from a user’s code.
An attacker needs a carefully crafted input to reach the method to trigger the vulnerability.

Now that we know the scope of the problem, let’s dive into how uncovering an exploitable path is done.

Prerequisites

1. A SAST Engine

Every programming language has its set of quirks and features. Some use brackets; some don’t. Some are loosely typed; others are strict. To be able to develop an Exploitable Path, we needed a certain level of abstraction for example, a “common language.” This is particularly hard when high level concepts like “imports” behave differently across languages.
To solve this issue, Checkmarx uses its powerful CxSAST engine. CxSAST breaks down the code of every major language into an Abstract Syntax Tree (AST), which provides much of the needed abstraction. Imports, call graphs, method definitions, and invocations all become a tree.

2. An AST Query Language

Having an AST, the next step is having a query language capable of even further abstractions. Checkmarx uses CxQuery that can run queries to answer various questions, for example:

What are all the import statements in a codebase?
Which methods have no definition but only usage?
What’s the namespace of every file?

With a tool like CxQuery, you can get results in a unified format regardless of the programming language, such as, C#, Java, Python, etc.

Assumptions

1. Vulnerable Methods Are Known

Usually, the public data on a CVE provides a CVSS score, affected products, and versions, etc. However, the inner method in which the vulnerability is triggered is usually unknown. To help with this dilemma, the CxSCA Research Team has application security analysts on board who are responsible for analyzing CVEs and finding the method in which the vulnerability occurs. So, for the rest of the post we can assume that for every CVE, we know the method that triggers it.

2. A SAST Scan Is Limited to One Project

You can think of a project as a folder containing all source code without the third-party package’s code. This makes life easier since there’s a clear distinction between a user’s code and the dependency’s code.
For example, in case there’s a user code that requires a single third-party package, two scans can be made:

A scan on the user code.
A scan on the third-party package.

Static Analysis Steps

Now that we’ve covered the prerequisites and assumptions, let’s understand the challenge itself by looking at the following example, written in Python.
Here’s a simple code, importing an open source library and calling a method in it. This method in turn calls a vulnerable method.

The code of OSLib will be:

Here are the steps:

1. Find Unresolved Methods in User’s Code

The user code is parsed with CxSAST and a query is run to detect all methods that are called and are missing a definition – hence unresolved and belong to a third-party package. In our example, there are two calls:

foo() – is defined in the user code and hence resolved.
lib_foo() – is defined in OSLib and hence an unresolved method must be imported.

In our case, there’s a single import to OSLib, so it’s obvious where the method was imported from.
Usually, there will be multiple imports, in which case a signature of the method is collected and searched across imported libraries. Assuming the code is functional and works, there will always be a single match.

2. Find Exported Methods in Package Code

The code of package OSLib is also parsed with CxSAST, and a query is run to find all exported methods. In languages like C# and Java, an exported method is a public method in a public class that can be used by the user’s code. In Python, all methods are public so the exported methods in our example will be lib_foo() and inner_vuln_method().
This data is essential since it’s used to match unresolved methods in the step above.

3. Call Graph

A query for a call graph is run on both user’s code and package code.
For the user’s code, the graph is:

For the package code, the result is similar:

4. Find Exploitable Path

Using all the data collected so far, a full call graph is built:

All methods in the graph are checked for exploitability. In our example, inner_vuln_method() is the exploitable method, and so an Exploitable Path is found.

Further Topics

The example above provided a simple demonstration of how Exploitable Path is analyzed, but in reality, this problem is much harder. Some other research questions we faced, which are not discussed in this blog post, are:

Detecting Exploitable Path in a dependency of a dependency
Matching challenges between user’s code and package code
Integration of DFG (Data Flow Graph)

Summary

By using CxSAST with queries written in CxQuery, we created an abstraction layer to statically detect vulnerabilities that are exploitable. A single algorithm can detect Exploitable Path across multiple programming languages, and unlike other solutions on the market, CxSCA can easily extend support for more languages. Currently, Java and Python are already supported, with many more languages to follow.
With CxSCA, Checkmarx enables your organizations to address open source vulnerabilities earlier in the SDLC and cut down on manual processes by reducing false positives and background noise, so you can deliver secure software faster and at scale. For a free demonstration of CxSCA, please contact us here.

In the next post in this series, we’ll look at some of the challenges we faced as we developed the Exploitable Path feature.

Software Composition Analysis: Why Exploitable Path Is Imperative

Alex Livshiz — Wed, 20 Jan 2021 07:20:38 +0000

If you look at the way code is written today vs. a few years back, one of the major changes is the transition to open source. What was once considered an unsafe methodology has grown and matured, and now almost every software project uses open source libraries. Today, software engineers prefer to use existing open source code instead of writing everything themselves.

Open source code’s benefits are significant

Code development can be faster:
It’s now more about welding existing pieces together, rather than building them yourself. Open source libraries solve fundamental engineering problems, allowing engineers to focus their time on more complex tasks.
Tools like package managers make it easy to manage and add third-party dependencies:
Every programming language or IDE comes with an integrated package manager support.
Over time, the way APIs are exported and used becomes clearer and simpler:
Open source maintainers offer clear APIs, simple documentation, and code samples.

Every new technology has its risks, though, and attackers can exploit weak points in software that uses open source. An attacker can gain information about open source libraries used by an application, and in other cases, can simply maintain an arsenal of exploits for popular open source packages and attempt to use these until one succeeds. In the case of open source packages, attackers have full access to:

Its code, which they can scan for zero-day vulnerabilities.
Issues and security tickets that are managed on GitHub, GitLab, etc., which can help find vulnerable areas for exploitation.
Current and past vulnerabilities, which can be very helpful when the library in use is not up to date. These vulnerabilities have detailed descriptions and advisories, and even the patches themselves are open source. An attacker can utilize those vulnerabilities and attempt to attack the application, and if the library uses an old version, the attack will succeed.

To manage such risks, a software composition analysis (SCA) tool such as Checkmarx CxSCA detects your third-party libraries and versions in use and informs you of existing vulnerabilities. It’s important to recognize that not all libraries in a project may apply since some may not be in use.

Prioritizing Vulnerabilities

Tracking existing vulnerabilities is important, but it’s not enough. The average project has dependencies that in turn have their own dependencies. Overall, there can be hundreds or thousands of libraries with hundreds of vulnerabilities in your project.

Nowadays, solving those vulnerabilities can take lots of time, while developers need to put efforts into developing new features as well. Managing security vulnerabilities of third-party packages is often not a one-time thing, but rather an on-going process, so it’s important for an SCA tool to prioritize the risks. This way, developers know what the most crucial risks to solve are.

But how do you prioritize a vulnerability?

The popular method is to prioritize vulnerabilities by the CVSS—a score given to a vulnerability based on the impact, how easy it is to exploit, etc. Every vulnerability that is made public has this score. However, this methodology is too simplistic, since exploitability is the most crucial aspect.

Exploitability of a Vulnerability

Let’s assume that a vulnerability is triggered by a foo() method in a library you’re using. If your code doesn’t call foo() in any flow, either directly or indirectly, the vulnerability is in fact not exploitable. If so, the priority of fixing it is low and efforts should be redirected to exploitable vulnerabilities instead.

Looking at it from an attacker’s perspective, for a vulnerability to be exploitable:

The method foo() needs to be called. This can require a carefully crafted input, the processing of which will trigger a call for foo().
The attacker needs to control the data flow for foo(). Usually, calling a method with “regular input” won’t trigger any unwanted behavior. Unwanted behavior is triggered when a carefully crafted input from the attacker reaches foo(), meaning the vulnerable method needs to be callable and its input controlled.

Developers today can use an entire library for a single API method out of dozens of APIs. Also, libraries they use have their own third-party libraries, with only a partial use of available APIs. This means that given a vulnerability in one of your dependencies, the probability of exploiting it can be below 5%. This has serious implications:

Current prioritizations of vulnerabilities are defocusing. Instead of fixing exploitable vulnerabilities first, efforts are put into risks that may be irrelevant.
They may be considered false positives. You would assume that a critical risk is a top priority, but if the relevant code flow can’t be reached, there’s nothing critical here.
The true number of vulnerabilities that need to be addressed is actually much lower than assumed, and that’s good news for developers. Fewer vulnerabilities means far less effort to remediate them.

By using our SAST scan (CxSAST) to statically analyze the project’s source code and the source code of all its used packages, when examining the call graphs and data flows, the exploitability and risk can be evaluated.

With CxSCA, Checkmarx enables your organization to address open source vulnerabilities earlier in the SDLC, and cut down on manual processes by reducing false positives and background noise, so you can deliver secure software faster and at scale. For a free demonstration of CxSCA, please contact us here.

In the next blog post, we’ll dig deeper into the research behind Exploitable Paths, sharing challenges and insights we collected along the way.