Denial of Service

Your Smart Scale is Leaking More than Your Weight: Privacy Issues in IoT

David Sopas — Mon, 04 Feb 2019 10:57:56 +0000

These days, IoT devices are an easy entry point for malicious users to invade users' privacy. With that in mind, we tested the AEG Smart Scale PW 5653 BT, specifically its Bluetooth Low Energy (BLE) security. We also tested the mobile applications Smart Scale for Android and Smart Scale for iOS. To complete our tests, we used our commercial static application security testing tool CxSAST, some tools we made ourselves, and open source software. The Checkmarx Security Research Team found several security issues that affect customers using the smart scale and its associated apps, as well as the company itself.

Security Issues by Severity

We ranked the security issues we found from high to low based on their Common Vulnerability Scoring System Version 3 (CVSS 3) score. Here's an overview of the issues we found and what the possible attack scenarios might be.
Denial of Service (High – CVSS 7.1)
Possible attack scenario: An attacker could trigger a special request via BLE that crashes the smart scale. The victim needs to remove the batteries or wait until the batteries run out. Keep in mind that the device loses most of its information during this crash.
Changing privacy settings (Medium – CVSS 5.3)
Possible attack scenario: A malicious user – within BLE range – could track the victim because the device keeps the MAC address fixed due to a configuration in the Generic Attribute Profile (GATT). GATT establishes in detail how to exchange all profile and user data over a BLE connection.
Changing device name (Medium – CVSS 5.3)
Possible attack scenario: An attacker within BLE range could change the name of the device to something offensive, or to something that tricks innocent users. A changed name also makes the specific device easier to identify, which helps when combining this attack with other attacks.
Mobile application (Smart Scale) Man-in-the-Middle (Medium – CVSS 4.8)
Possible attack scenario: Some requests made by the mobile application don’t use HTTPS, which might allow malicious users to intercept the information sent between the mobile application and the host.

Hardware Reconnaissance

Next, we investigate the hardware — in this case, the scale itself. Here’s an overview of what we learned.
BD ADDR
98:84:e3:36:a7:56
Characteristics/services with write permission:

*Handle*	*Characteristic*	*Service*	*Properties*
0003	78667579-a914-49a4-8333-aa3c0cd8fedc	?	WRITE
000c	29f11080-75b9-11e2-8bf6-0002a5d5c51b	?	WRITE
0013	00002a02-0000-1000-8000-00805f9b34fb	Peripheral Privacy Flag	READ WRITE
0015	00002a03-0000-1000-8000-00805f9b34fb	Reconnection Address	READ WRITE
001e	78667579-db57-4c4a-8330-183d7d952170	?	READ WRITE
0020	78667579-5605-4f75-8e54-fceb7ea465a9	?	READ WRITE
0022	78667579-d0fd-4b77-9515-d03224220c29	?	READ WRITE
0024	78667579-e255-4c76-8a12-7be9b176e551	?	READ WRITE
0029	78667579-ae48-4e5b-ae14-b8eb728398ec	?	READ WRITE
002b	78667579-5773-439a-bbcd-7672550a181b	?	READ WRITE
0042	00002a06-0000-1000-8000-00805f9b34fb	Link Loss -> Alert Level	READ WRITE
0045	00002a06-0000-1000-8000-00805f9b34fb	Immediate Alert -> Alert Level	WRITE NO RESPONSE

Permissions required by the mobile application:
android.permission.BLUETOOTH_ADMIN
android.permission.BLUETOOTH
android.permission.WRITE_EXTERNAL_STORAGE
android.permission.MOUNT_UNMOUNT_FILESYSTEMS
android.permission.VIBRATE
android.permission.ACCESS_NETWORK_STATE
android.permission.ACCESS_WIFI_STATE
android.permission.READ_PHONE_STATE
android.permission.INTERNET
android.permission.ACCESS_FINE_LOCATION
android.permission.ACCESS_COARSE_LOCATION

Vulnerabilities

Now that we've gone over the security issues by severity and the hardware settings, let's take a look at the vulnerabilities. In every assessment, we fuzz a few bytes to see whether that triggers behavior that should not happen. In this case, we noticed that the Immediate Alert -> Alert Level characteristic allowed writes, so we tried writing a single byte with the value 1. A single request didn't cause any issues, but after a couple of requests the smart scale crashed. The crash was difficult to replicate at first, but we found a pattern: when the smart scale is in standby and receives the request below, it crashes.
char-write-req 0x0045 01
We wrote a small proof-of-concept exploit (using the pygatt Python library) to replicate it. It works as follows:

It connects to the device.
Sleeps for 5 seconds (meanwhile the device enters standby mode).
Sends the request.
Crashes the smart scale.

Now the only way to get the smart scale working again is to remove one of the batteries or wait until they run out, because the screen is frozen with the light on. We left it for 30 minutes and the smart scale never turned off. It's also important to mention that resetting the smart scale removes information, such as configuration steps the user took in the past. So the integrity of those settings is at stake as well.

Changing Privacy Settings

The privacy flag defines whether the device sends its original BD address (the unique 48-bit identifier assigned to each Bluetooth device by the manufacturer) in BLE advertisement packets. If privacy is enabled, the device changes the BD address randomly, making it very hard to track the user. By default, this smart scale has privacy disabled, so it's possible to track a user.
This characteristic should not be writable, and privacy should be enabled by default. Anyone with experience connecting to Bluetooth devices can use gatttool to change it with the following request:
char-write-req 0x0013 01

Changing Device Name

An attacker could change the device name to something else. Originally it has the value:
VScale
You can change the name using gatttool:
char-write-req 0x001e 064f574e000000
This request changes the device name to OWN. The leading 06 byte appears to be an operation code: the name changes only when the request includes it.
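The bytes in that request can be read as one opcode byte followed by the new name in ASCII and zero padding. The following is a minimal Python sketch of that reading. It assumes a 6-byte, zero-padded name field, which is inferred only from the single request shown above and not from vendor documentation, and `build_name_payload` is a hypothetical helper name.

```python
def build_name_payload(name: str, field_len: int = 6) -> bytes:
    """Build the value written to handle 0x001e to rename the scale.

    Assumptions (inferred from the one request in the text, not from
    vendor documentation): a leading 0x06 opcode byte, then the ASCII
    name zero-padded to `field_len` bytes.
    """
    raw = name.encode("ascii")
    if len(raw) > field_len:
        raise ValueError("name does not fit in the assumed field length")
    return bytes([0x06]) + raw.ljust(field_len, b"\x00")


if __name__ == "__main__":
    # Reproduces the hex string used with char-write-req above.
    print(build_name_payload("OWN").hex())
```

Whether longer names are accepted, or a different padding length is expected, would have to be confirmed against the device itself.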

Smart Scale Man-in-The-Middle

While monitoring the Smart Scale mobile application, we noticed that requests to the following hosts are sent without HTTPS:
log.umsns.com
alog.umeng.com
ex.mobmore.com
ex.puata.info
open.yixin.im
vt.lotuseed.com
t2.qpic.cn
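
This kind of check is easy to automate when reviewing captured traffic. Here is a minimal sketch, assuming the captured request URLs are available as plain strings. The example URLs simply combine two of the hosts listed above with a root path, and `cleartext_hosts` is a hypothetical helper, not part of any tool we used.

```python
from urllib.parse import urlsplit


def cleartext_hosts(urls):
    """Return the sorted, de-duplicated hosts contacted over plain HTTP."""
    hosts = set()
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme.lower() == "http" and parts.hostname:
            hosts.add(parts.hostname)
    return sorted(hosts)


if __name__ == "__main__":
    captured = [
        "http://log.umsns.com/",   # host from the list above, path assumed
        "http://alog.umeng.com/",  # host from the list above, path assumed
        "https://example.com/",    # placeholder HTTPS entry for contrast
    ]
    print(cleartext_hosts(captured))
```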

Mobile Application

The mobile applications (Android and iOS) were developed by a Chinese company named VTrump, which makes IoT applications for companies like AEG, Texas Instruments and Realtek. One of the things we noticed about the mobile application, which is referred to in the AEG Smart Scale package, is that many users are reporting that some mobile antivirus tools are blocking the app.

Smart Scale for Android App

So let’s take a look at the permissions needed on Android:
android.permission.BLUETOOTH_ADMIN
android.permission.BLUETOOTH
android.permission.WRITE_EXTERNAL_STORAGE
android.permission.MOUNT_UNMOUNT_FILESYSTEMS
android.permission.VIBRATE
android.permission.ACCESS_NETWORK_STATE
android.permission.ACCESS_WIFI_STATE
android.permission.READ_PHONE_STATE
android.permission.INTERNET
android.permission.ACCESS_FINE_LOCATION
android.permission.ACCESS_COARSE_LOCATION
Some of these permissions immediately raised alerts for our research team, such as ACCESS_FINE_LOCATION, WRITE_EXTERNAL_STORAGE, and ACCESS_COARSE_LOCATION. Why does this smart scale app need to know the user's location or have permission to write to external storage? The app also requests MOUNT_UNMOUNT_FILESYSTEMS, which allows it to mount and unmount file systems for removable storage. Why does a smart scale app need that?
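The same review can be scripted for other apps. The sketch below assumes the requested permissions have already been extracted from the manifest into a list of strings. The `QUESTIONED` set contains only the permissions called out in this section, and `questionable` is a hypothetical helper name.

```python
# Permissions questioned in the text above for a smart scale app.
QUESTIONED = {
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.ACCESS_COARSE_LOCATION",
    "android.permission.WRITE_EXTERNAL_STORAGE",
    "android.permission.MOUNT_UNMOUNT_FILESYSTEMS",
}


def questionable(requested):
    """Return the requested permissions that appear in QUESTIONED, sorted."""
    return sorted(set(requested) & QUESTIONED)


if __name__ == "__main__":
    requested = [
        "android.permission.BLUETOOTH",
        "android.permission.INTERNET",
        "android.permission.ACCESS_FINE_LOCATION",
        "android.permission.MOUNT_UNMOUNT_FILESYSTEMS",
    ]
    for permission in questionable(requested):
        print(permission)
```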
One of the packages loaded by the apps is umeng.com. This package has a service that allows it to download an Android Package Kit (APK) from the Internet: com.umeng.common.net.DownloadingService.

com.umeng.common.a.c(DownloadingService.t, String.format("saveAPK: url = %1$15s\t|\tfilename = %2$15s", new Object[]{this.k.c, this.d.getAbsolutePath()}));
HttpURLConnection a = a(new URL(this.k.c), this.d);
Besides com.umeng, another package also requests the International Mobile Equipment Identity (IMEI) from the user: android.telephony.TelephonyManager.getDeviceId. The IMEI is the unique numerical identifier for each mobile device. Again, why does the smart scale app need that information?

umeng
tencent

Smart Scale for iOS App

Switching to the iOS app, we noticed the following POST request to gather.lotuseed.com (over plain HTTP):

Here’s what we learn from decoding the information that is sent:

mid2sid@H2yEAABwWVrVfAQAAAE8yNjI1ZGIzYTU5MjFiNTVlYzRjMGJhYTcxMDRlZWEyMGJiMDgwNjU4et 0ei Eem1510575869000+0mid1sid@H2yEAABCqVrVfAQAAAE8yNjI1ZGIzYTU5MjFiNTVlYzRjMGJhYTcxMDRlZWEyMGJiMDgwNjU4cm 0st 2sv 1.2.0ac Unknownav 1.3.7.1ak com.vtrump.vscalecc PTcl enlt 1510575876630+0ca
[26806]MEOct wifiMAC 02:00:00:00:00:00MAC2 11:22:33:44:55:66ssid REDACTEDmr
   280870912mt 25842688tr 1351158784tt    240294912cr 1bl  0.550000mid 4sid`H2yEAABCqVrVfAQAAAE8yNjI1ZGIzYTU5MjFiNTVlYzRjMGJhYTcxMDRlZWEyMGJiMDgwNjU4

What we'd like to focus on is that private information is sent to a server in China (without HTTPS):
Application version: 1.2.0
Navigator: 1.3.7.1
Application: com.vtrump.vscale
Country: PT
Language: EN
Phone Operator: MEO
Router MAC: 11:22:33:44:55:66
Wifi SSID: REDACTED
On Android, the information sent is much more complete.

Decoding that information, we get the following:

mid  3sid@@NDMDAO9Q8blfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=ac portalav 1.2.4MAC 02:00:00:00:00:00IMEI REDACTED IMEI REDACTED AID 8c58689033753714BTA 02:00:00:00:00:00db
   SELECLINEdm S6S4IN3Gmn AUCHANhn sp7731c_fs280_32v4kn 1cis armv7lcfeHswp
half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32cm 41cbm 2419.91cc 2cfr 1200.0dfÏ Filesystem               Size
   Used     Free   Blksize
/dev                   229.1M    72.0K   229.0M   4096
/sys/fs/cgroup         229.1M    12.0K   229.1M   4096
/mnt                   229.1M     0.0K   229.1M   4096
/system                  2.1G     1.5G   680.6M   4096
/data                    4.6G   262.5M     4.3G   4096
/cache                 143.6M   240.0K   143.4M   4096
/productinfo           928.0K    84.0K   844.0K   4096
/sale                  928.0K   172.0K   756.0K   4096
/storage               229.1M     0.0K   229.1M   4096
/storage/emulated        4.6G   262.5M     4.3G   4096
/storage/67A7-1A18       1.8G     1.1M     1.8G   32768
/storage/self          229.1M     0.0K   229.1M   4096
mr 469136sd 1dsts@ 1vr 800hr 480sz 4,3rr  60sst@ 1n MC3XXX 3-axis
Accelerometerm 19.6133v  MC3XXXsst@ 5n CTP Light sensor(Noexist)m 1.0v
 CTPsst@ 8n CTP Proximity sensorm 1.0v  CTPfw
fv@ 6.0bv6TM_BASE_W16.43.2|sc7731C_CP0_modem|10-19-2016
14:00:16kvG3.10.65
release2@ww-linuxf4 #1
SMP PREEMPT Tue Nov 15 11:09:41 CST
2016bn 876803_V1.0_20161115blv unknownbfDSELECLINE/S6S4IN3G/S6S4IN3G:6.0/MRA58K/W16.44.4-14:user/release-keysal 23rf
 0mid  2sid@@NDMDABy3cVRfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1508950293201+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0mid  2sid@@NDMDABy3cVRfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Pem 1508950302268+1pi"com.vtrump.smartscale.MainActivityso
 0mid  2sid@@NDMDABy3cVRfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1508950324717+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0mid  2sid@@NDMDABy3cVRfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Pem 1508950326254+1pi"com.vtrump.smartscale.MainActivityso
 0mid  1sid@@NDMDAM8YiH1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=cm 30000st 1sv
   1.4.0.3.0ac portalav 1.2.4an Smart
Scaleak com.vtrump.smartscaleof 1sf 1cc PTcl ptlt 1509639592179+1ct UnknownMAC 02:00:00:00:00:00ur -60ut -1886MAC2 00:00:00:00:00:00ssid em 57714um 57714mr 0mt 0tr 22368tt 22752bl  0,530mid
 2sid@@NDMDAM8YiH1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1509639592732+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0mid  2sid@@NDMDAM8YiH1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Pem 1509639601525+1pi"com.vtrump.smartscale.MainActivityso
 0mid  1sid@@NDMDAFHLiH1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=cm 30000st 1sv
   1.4.0.3.0ac portalav 1.2.4an Smart
Scaleak com.vtrump.smartscaleof 1cc PTcl ptlt 1509639637847+1ct UnknownMAC 02:00:00:00:00:00ur -60ut -1886MAC2 00:00:00:00:00:00ssid em 103321um 103321mr 0mt 0tr 37216tt 37600bl  0,520mid
 2sid@@NDMDAFHLiH1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1509639637841+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0mid  2sid@@NDMDAFHLiH1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Pem 1509639669965+1pi"com.vtrump.smartscale.MainActivityso
 0mid  2sid@@NDMDAFHLiH1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1509639688223+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0mid  2sid@@NDMDAFHLiH1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Pem 1509639730975+1pi"com.vtrump.smartscale.MainActivityso
 0mid  1sid@@NDMDAF7Din1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=cm 30000st 1sv
   1.4.0.3.0ac portalav 1.2.4an Smart
Scaleak com.vtrump.smartscaleof 1cc PTcl ptlt 1509639766888+1ct UnknownMAC 02:00:00:00:00:00ur -60ut -1886MAC2 00:00:00:00:00:00ssid em 232347um 232347mr 0mt 0tr 37216tt 37600bl  0,510mid
 2sid@@NDMDAF7Din1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1509639766878+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0mid  2sid@@NDMDAF7Din1fAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Pem 1509639830621+1pi"com.vtrump.smartscale.MainActivityso
 0mid  1sid@@NDMDAOHyMpFfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=cm 30000st 1sv
   1.4.0.3.0ac portalav 1.2.4an Smart
Scaleak com.vtrump.smartscaleof 1sf 1cc PTcl ptlt 1509969556322+1ct UnknownMAC 02:00:00:00:00:00ur -60ut -1886MAC2 00:00:00:00:00:00ssid em 95844um 95844mr 0mt 0tr 37216tt 37600bl  0,500mid
 2sid@@NDMDAOHyMpFfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1509969558374+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0mid  2sid@@NDMDAOHyMpFfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Pem 1509969590073+1pi"com.vtrump.smartscale.MainActivityso
 0mid  2sid@@NDMDAOHyMpFfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1509969599371+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0mid  2sid@@NDMDAOHyMpFfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Pem 1509969609660+1pi"com.vtrump.smartscale.MainActivityso
 0mid  1sid@@NDMDAM7ERpFfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=cm 30000st 1sv
   1.4.0.3.0ac portalav 1.2.4an Smart
Scaleak com.vtrump.smartscaleof 1cc PTcl ptlt 1509970855130+1ct UnknownMAC 02:00:00:00:00:00ur -60ut -1886MAC2 00:00:00:00:00:00ssid em 1393567um 331637mr 0mt 0tr 37216tt 37600bl  0,490mid
 2sid@@NDMDAM7ERpFfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1509970855118+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0mid  2sid@@NDMDAM7ERpFfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Pem 1509970894521+1pi"com.vtrump.smartscale.MainActivityso
 0mid  1sid@@NDMDAKE+hJFfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=cm 30000st 1sv
   1.4.0.3.0ac portalav 1.2.4an Smart
Scaleak com.vtrump.smartscaleof 1cc PTcl ptlt 1509974884006+1ct UnknownMAC 02:00:00:00:00:00ur -60ut -1886MAC2 00:00:00:00:00:00ssid em 5422422um 401017mr 0mt 0tr 37216tt 37600bl  0,480mid
 1sid@@NDMDAO9Q8blfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=cm 30000st 1sv
   1.4.0.3.0ac portalav 1.2.4an Smart
Scaleak com.vtrump.smartscalesf 1cc PTcl ptlt 1510653120785+1apn
"REDACTED-2g"ct wifiMAC 02:00:00:00:00:00ur 1330ut -6MAC2 48:7b:6b:b5:12:94ssid
"REDACTED-2g"em 212844um 212845mr 0mt 0tr 14025847tt 432395bl
 0,980mid  2sid@@NDMDAO9Q8blfAQAAAU0wMjowMDowMDowMDowMDowMEkzNTM4OTYwODQ0MTM0NDY=et 0ei Rem 1510653121547+1pi"com.vtrump.smartscale.MainActivityso 0sl
 0

Here’s a more readable version of the data sent to the Chinese server:
MAC: 11:22:33:44:55:66
IMEI: REDACTED
IMEI2: REDACTED
Phone brand: SELECLINE
Filesystem: /dev, /sys/fs/cgroup, /mnt, /system, /data, /cache, /productinfo, /sale, /storage, /storage/emulated, /storage/self
Phone functions: Light sensor, Accelerometer, Proximity sensor
Kernel: release2@ww-linuxf4 #1 SMP PREEMPT Tue Nov 15 11:09:41 CST
Wifi SSID: REDACTED-2g
Lotuseed is a mobile data analysis software platform based in China. This data is sent without HTTPS, meaning that the communications between the app and the Chinese server are not encrypted. After we notified VTrump about our findings, they declined to make the changes we suggested. Later, however, we tested again and found that they had "fixed" the app by adding encryption, but they were still sending the same private information. We don't believe that this type of information is necessary for a smart scale to collect, much less send to a third party for data analysis.

Smart Scale Fails at Security & Privacy

The Checkmarx Security Research Team advised AEG to launch a patch that fixes clients’ smart scales to prevent malicious users from damaging the hardware. We tried to contact AEG to determine if they sold the brand rights for this type of equipment to any Chinese company, but we didn’t get any response to that question. We also noticed during our research that a lot of clients have issues with the mobile applications, especially because of the bad reputation of the URL used by the app. Based on our research findings, we recommend that you do not use either the Android or iOS app. The permissions these apps require are beyond what is necessary for a smart scale, and the apps share private data insecurely with third-party clients.


Diving Deep into Regular Expression Denial of Service (ReDoS) in Go

Erez Yalon — Mon, 07 May 2018 14:14:42 +0000

The Go programming language (also known as Golang) is an open source programming language created by Google. Go is compiled and statically typed in the tradition of C, with garbage collection, limited structural typing, memory safety features, and CSP-style concurrency. In this blog post, we recap Go's security posture against Regular Expression Denial of Service (ReDoS) attacks. But first, let's explain the concept of ReDoS and how such attacks can be carried out and mitigated. This blog post includes a set of practical examples in different programming languages, aiming to show how the Go implementation avoids ReDoS.

This report was motivated by ongoing research into Go security, in which we aim to discover vulnerabilities lurking in Go packages.
func sqli() {
	// r (*http.Request) and db (*sql.DB) are assumed to be in scope.
	// User input is concatenated straight into the query (vulnerable on purpose).
	username := r.Form.Get("username")
	sql := "SELECT * FROM user WHERE username='" + username + "'"
	row_fullname := db.QueryRow(sql)
	fmt.Printf("Welcome, %s\n", row_fullname)
}
Listing 1: Golang SQLi example

ReDoS

Regular Expression Denial of Service (ReDoS) is an algorithmic complexity attack that provokes a Denial of Service (DoS). ReDoS attacks are caused by a regular expression that takes a very long time to evaluate, with the time growing exponentially with the input size. This exceptionally long evaluation time is due to the implementation of the regular expression engine in use, for example, one based on recursive backtracking.
A regular expression, better known as a ‘regex’, is a sequence of characters that defines a search pattern, used to search for one or more characters within a string. One of the handy usages of a regex is information validation, i.e., ensuring that only properly formed data is being submitted.
For example, let’s pretend that we want to apply a regular expression over the username input of listing 1. Thus, a simple regex could be:
/^[a-zA-Z0-9_-]{3,10}$/
Listing 2: Regex example 1
In Listing 2, the regular expression is contained between the slash characters. We start by telling the parser to find the beginning of the string (^), followed by any lowercase letter (a-z), uppercase letter (A-Z), number (0-9), an underscore, or a hyphen. The {3,10} section makes sure that the entered string has a length between three and ten characters. Finally, the $ represents the end of the string. For this regex, if we used the input "checkmarx" it would match the pattern:

Figure 1: Example – Regex 1

On the other hand, if we used a string like "checkmarx' OR SLEEP(10)--" it would not match the pattern.

Figure 2: Example – Regex 1 Fail
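
The two cases above can be reproduced with a few lines of Python. This is a minimal sketch using the pattern from Listing 2, where `is_valid_username` is a hypothetical helper name. It uses `re.fullmatch` so that the whole input has to match.

```python
import re

# Pattern from Listing 2: 3 to 10 letters, digits, underscores or hyphens.
USERNAME_RE = re.compile(r"^[a-zA-Z0-9_-]{3,10}$")


def is_valid_username(value: str) -> bool:
    """Return True only if the whole string matches the username pattern."""
    # fullmatch also rejects a trailing newline, which `$` alone would allow.
    return USERNAME_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("checkmarx", "checkmarx' OR SLEEP(10)--"):
        print(repr(candidate), is_valid_username(candidate))
```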

Evil Regular Expressions

Even though the benefits of using regexes for input validation are great, depending on the way they are written and the engine used, a malicious user can leverage them to make the application or service unavailable. Evil regexes are thus the root cause of the ReDoS issue. A regex is considered evil or malicious if it can get stuck on crafted input. To understand this better, let's consider the following regular expression:
/A(B|C+)+D/
Listing 3: Evil Regex example 1
In this scenario, the regex pattern starts by searching for the character 'A'. Then, the following string must either be the character 'B' or one or more 'C's: (B|C+). The next + indicates that the preceding group can occur one or more times. Finally, the 'D' ensures that the match is terminated by the character 'D'. Any input of the following type matches this regular expression:
ABCD
ABCBD
ACD
ACBD
Listing 4: Valid input for Evil regex 1
To show the differences between the implementations used by different languages, we created simple programs in four different languages: Python, JavaScript, PHP and Go. All programs use the regex from Listing 3. The benchmark was run with inputs of increasing length, allowing us to see how each program behaves as the input varies.
In the example code from listing 5, we show a simple Python implementation to evaluate the evil regex. We start by testing the valid inputs from listing 4. Then we send malicious inputs to try to get the program stuck.
regex = r"A(B|C+)+D"
test_str = input("Enter the string: ")
matches = re.finditer(regex, test_str) (…)
print("It took: %s seconds" % elapsedTime)
Listing 5: Python Regex compiler
To craft malicious inputs, we started by incrementally extending valid and invalid inputs and tweaking them according to the time differences between them. At some point, we found some relevant discrepancies. In figure 4, we show the attempted malicious payloads and the corresponding elapsed evaluation time. We see that when a malicious payload of the form AC+E (where C+ stands for one or more occurrences of the character C) is sent with more than 20 Cs, the elapsed time starts to double with each additional C.
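The doubling can be observed with a short, self-contained script. This is a minimal sketch rather than the benchmark program behind the figures. It assumes CPython's backtracking `re` module, the helper name `time_search` is hypothetical, and the input sizes are kept small so that the script finishes quickly. The absolute timings will differ from machine to machine.

```python
import re
import time

EVIL = re.compile(r"A(B|C+)+D")  # regex from Listing 3


def time_search(n_cs: int) -> float:
    """Return the seconds spent failing to match 'A' + n_cs * 'C' + 'E'."""
    payload = "A" + "C" * n_cs + "E"
    start = time.perf_counter()
    result = EVIL.search(payload)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'E' rules out any match
    return elapsed


if __name__ == "__main__":
    # Each extra 'C' is expected to roughly double the elapsed time.
    for n in range(14, 21):
        print(f"{n} Cs: {time_search(n):.4f} s")
```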

The same principle was applied to the other languages and we maintained the input cases in order to compare the results. The next example is for JavaScript. Listing 6 shows the JS code snippet:
const regex = /A(B|C+)+D/g; (…)
let m;
while ((m = regex.exec(str)) !== null) { (…)
} (…)
console.log("It took: " + seconds + " seconds");
Listing 6: JS Regex compiler
The obtained results for the JavaScript implementation are as follows:

The next example is PHP:
$re = '/A(B|C+)+D/';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0); (…)
echo $timediff." seconds";
Listing 7: PHP Regex compiler
The PHP implementation produced the following results (figures 7 and 8):

Finally, an example for Go. We created the following program:
func main() {
	re := regexp.MustCompile(`A(B|C+)+D`) (…)
	for i, match := range re.FindAllString(str, -1) {
		fmt.Println(match, "found at index", i)
	} (…)
	fmt.Println("It took:", elapsed.Seconds(), "seconds")
}
Listing 8: Go Regex compiler
And produced the following results:

We can see that the results from the PHP (figure 8) and Go (figure 10) implementations are very different from the Python (figure 4) and JavaScript (figure 6) results. The malicious inputs used in Python and JavaScript generated exponential time increments, versus the linear time responses for PHP and Go.
In the PHP regex implementation, we used the Perl Compatible Regular Expressions (PCRE) library, which supports backreferences. The official regexp package [2] in Go uses the RE2 syntax and matching approach, which does not support backreferences and guarantees linear-time execution, thereby avoiding regex denial of service.
In table 1, we summarize the results obtained for each programming language, showing the elapsed time in seconds for each input tested. For the purpose of this post, we only tested ten inputs, starting with 20 'C' characters and adding one at a time. This shows that for the Python and JavaScript implementations the time doubles whenever a new C is added.
Finally, the chart in figure 11 makes the discrepancies between the results visually clear and places the performance of PHP and Go side by side.

Behind the Curtains

The most commonly used algorithms to implement regular expression matching are:

Perl-based
NFA-based

So far, we have seen that the PHP implementation uses PCRE, which is Perl-based, and that Go uses RE2. In either case, to perform regular expression matching, the engine builds a Nondeterministic Finite Automaton (NFA), a finite state machine where, for each pair of state and input symbol, there may be several possible next states. Hence, for each input symbol, the NFA transitions to a new state until all the input symbols have been consumed. A backtracking engine tries the paths of the NFA one by one until it reaches an accepting state (a match) or until all paths have been attempted without a match.
Consider the regex ^(a+)+$ and its corresponding NFA:

We can use the same methodology of feeding inputs of different sizes to understand the NFA behavior. For example, if we choose the input aaaaX, 16 possible paths exist in the graph from figure 12.
If we modify the input to aaaaaaaaaaX, there are 1024 possible paths. And if we change it to aaaaaaaaaaaaaaaaX, 65536 possible paths are generated. Each additional "a" doubles this number. This behavior is an extreme case and happens because the algorithm goes through all the possible paths before failing.
What happens behind the curtains is that any time the engine tests a symbol and fails to match the next one, it backtracks and looks for another way to match the previous symbols. If the input gets too long, the number of backtracking steps eventually becomes very large, resulting in catastrophic backtracking and possibly a denial of service.
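To make the growth concrete, here is a toy Python model of a backtracking engine working on ^(a+)+$ against n 'a' characters followed by 'X'. It is only a sketch under stated assumptions, not the code of any real engine. It counts every way of splitting a prefix of the 'a's into consecutive (a+) groups, since each such split is one attempt that ends in a failed end-of-string check. The function name `count_failed_attempts` is hypothetical. Under this way of counting, the result is 2^n - 1, so it follows the same doubling as the figures quoted above. Exactly what is counted as a "path" is a convention.

```python
def count_failed_attempts(n: int) -> int:
    """Count the group splits a naive backtracker tries for 'a' * n + 'X'."""
    attempts = 0

    def try_groups(pos: int) -> None:
        nonlocal attempts
        # One iteration of (a+) consumes input[pos:end], longest first.
        for end in range(n, pos, -1):
            attempts += 1      # '$' fails here: 'X' or more 'a's follow
            try_groups(end)    # so the engine tries yet another (a+) group

    try_groups(0)
    return attempts


if __name__ == "__main__":
    for n in (4, 10, 16):
        print(n, count_failed_attempts(n))
```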
If we take the regular expression from Listing 3, with the help of the regex101 website we can summarize this behavior in a table that shows the number of steps taken for each target input string.

Table 2: PCRE (PHP) – Benchmark

From table 2, it is clear that each time we incremented the number of C’s by one unit, the engine took twice the number of steps. Using this engine from the regex101 website, if more than 998 C’s are used, it will respond with a catastrophic backtracking message:

This is the turning point between the PHP and Go implementations. In the benchmark above, we saw that the elapsed times for the inputs used were very similar, but we did not test extremely large inputs, which, as seen in figure 13, crash the PHP implementation. This is avoided in Go.
As a matter of fact, all programming language engines available on this website show disastrous behavior with this input, except for Go. JavaScript responds with a timeout and Python with a catastrophic backtracking error. Go, in contrast, resolves the input string in approximately 218ms. These results can be consulted here.

Easter Egg

It is also important that websites testing regular expressions can properly detect catastrophic cases:

Figure 14: Catastrophic Backtracking on Pythex

A malicious user could take advantage of the lack of validation in this website to provoke a denial of service. Another example happens in the https://www.debuggex.com/ website. When a vulnerable regular expression is used with a malicious input, it will hang the page.

Conclusion

In this blog post, we recapped what a regular expression is and how it can be leveraged to provoke a denial of service. We went through a set of examples showing the behavior of different engines, with an emphasis on the behavior of Go.
Despite possible recommendations and workarounds to avoid ReDoS, which revolve around the usual input sanitization, the best measure is to target the root cause, and so, focus on the implemented algorithm.
The implementation provided by the Go package (regexp) is guaranteed to run in time linear in the size of the input, a property that is not guaranteed by most open source implementations of regular expressions [3].
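The idea behind that guarantee, described in [3], is to track the set of all NFA states the matcher could be in after each input symbol, instead of exploring one path at a time. The Python sketch below illustrates this for the regex from Listing 3. It is not Go's regexp code. The automaton is hand-built for this one pattern (an assumption made for illustration), it checks a full match rather than searching, and `nfa_fullmatch` is a hypothetical name.

```python
# Hand-built, epsilon-free NFA for full matches of A(B|C+)+D.
# States: 0 start, 1 after 'A', 2 inside a C+ run,
#         3 a group just finished, 4 accepted (after 'D').
NFA = {
    (0, "A"): {1},
    (1, "B"): {3},
    (1, "C"): {2, 3},
    (2, "C"): {2, 3},
    (3, "B"): {3},
    (3, "C"): {2, 3},
    (3, "D"): {4},
}
ACCEPT = 4


def nfa_fullmatch(text: str) -> bool:
    """Simulate all NFA paths at once: one pass over the input, no backtracking."""
    states = {0}
    for ch in text:
        states = {nxt for s in states for nxt in NFA.get((s, ch), ())}
        if not states:  # every path is dead, so stop early
            return False
    return ACCEPT in states


if __name__ == "__main__":
    for sample in ("ABCD", "ABCBD", "ACD", "ACBD"):
        print(sample, nfa_fullmatch(sample))
    # A payload that would stall a backtracking engine is rejected in one pass.
    print(nfa_fullmatch("A" + "C" * 100000 + "E"))
```

Because the state set never holds more than five states, the work per input character is bounded, which is where the linear running time comes from.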


References

[1] ReDoS, available at https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS.
[2] Package regexp, available at https://golang.org/pkg/regexp/.
[3] Regular Expression Matching Can Be Simple And Fast, available at https://swtch.com/~rsc/regexp/regexp1.html.