Third-party Software Dependencies: You have to look beyond the download metrics
Package download metrics are easy to manipulate
Disclaimer: I reported this issue to the Microsoft Security Response Center (MSRC), but according to their assessment (which I agree with), the report does not meet the definition of a security vulnerability.
I’ve been very keen to understand what security scrutiny package managers (such as the NPM Registry, Maven Central and NuGet, among others) apply at the time a library is published.
Targeting third-party libraries is a common technique that threat actors use to steal credentials, run arbitrary code or deploy cryptocurrency mining tools. This is covered by the MITRE ATT&CK technique Supply Chain Compromise (T1195 in the Enterprise matrix, T1474 in the Mobile matrix). Security researchers have discovered multiple techniques that adversaries use to deceive software engineers into installing modified libraries. Typosquatting attacks are still effective, and even the trained eye has difficulty spotting those spelling mistakes.
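To illustrate why typosquats are hard to spot by eye but easy to flag mechanically, here is a minimal sketch that compares a package name against a short list of trusted names using edit distance. The `TRUSTED` list, the function names and the distance threshold are assumptions made up for this example; they do not come from any NuGet tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (classic dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def possible_typosquats(candidate, trusted_names, max_distance=2):
    """Return trusted names the candidate closely resembles without matching exactly."""
    candidate_lower = candidate.lower()
    matches = []
    for name in trusted_names:
        distance = edit_distance(candidate_lower, name.lower())
        if 0 < distance <= max_distance:
            matches.append(name)
    return matches


# Hypothetical allow-list; a real one would come from your own dependency policy.
TRUSTED = ["Newtonsoft.Json", "Serilog", "AutoMapper"]

print(possible_typosquats("Newtonsofy.Json", TRUSTED))  # one letter off a trusted name
print(possible_typosquats("Newtonsoft.Json", TRUSTED))  # exact match, nothing to flag
```

A check like this only catches near-miss spellings; it says nothing about whether a correctly named package is trustworthy.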
Metrics are a good decision-making tool until they are manipulated
Package management services such as NuGet, Maven Central and NPM, among others, provide metrics about the most downloaded packages. If you are a .NET software engineer, you won't be surprised that Newtonsoft.Json is the most downloaded package on NuGet.
With JSON heavily adopted as the de facto media type for data exchange, it is not remarkable that .NET developers rely heavily on this library. In fact, those statistics show that this dependency is actively downloaded during build stages (a usage pattern that has grown sharply with the advent of CI/CD pipelines).
Metrics are an eye-catching indicator. We can argue that the fact that a library is actively being downloaded gives some confidence. This metric is also surfaced within the Visual Studio development environment.
What happens when metrics can be manipulated?
To address this concern, I’m focusing just on the NuGet package manager. Other package managers probably implement a similar model, but I’ve verified this only on NuGet.
As a Security Engineer, I hypothesized that a threat actor could create a malicious NuGet package and then manipulate the download metrics to push the library toward the top of the listings. Packages are shown in descending order of download count, and the same counts feed other spots such as the list of most downloaded packages.
To validate my point, I created a NuGet package that has no documentation of any kind. The library is named AbuseIPDB.APIv2, and you can see its statistics from here.
This library could easily contain malicious code. For instance, I could have exfiltrated the API key whenever someone created an instance of the library.
As you can see, a library that has no documentation and no GitHub repository has roughly 3 million downloads. With sustained growth in the number of downloads, it could at some point overtake Newtonsoft.Json.
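The package described above would fail most checks that ignore download counts. As a rough sketch of that idea, the snippet below lists reasons to review a package manually using only signals other than popularity. The `PackageInfo` record and its fields are hypothetical and filled in by hand; they are not a NuGet API response, and the thresholds are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PackageInfo:
    """Hypothetical, hand-filled record; not a NuGet API response."""
    name: str
    total_downloads: int
    repository_url: Optional[str]
    has_readme: bool
    is_signed_by_known_author: bool
    release_count: int


def review_flags(package: PackageInfo) -> List[str]:
    """List reasons to review a package manually, ignoring download counts entirely."""
    flags = []
    if not package.repository_url:
        flags.append("no public source repository")
    if not package.has_readme:
        flags.append("no documentation")
    if not package.is_signed_by_known_author:
        flags.append("author signature not recognized")
    if package.release_count < 2:
        flags.append("very short release history")
    return flags


suspicious = PackageInfo(
    name="Example.Popular.Package",  # placeholder name
    total_downloads=3_000_000,       # high count, deliberately ignored below
    repository_url=None,
    has_readme=False,
    is_signed_by_known_author=False,
    release_count=1,
)

for flag in review_flags(suspicious):
    print(f"{suspicious.name}: {flag}")
```

The point of the design is that `total_downloads` never enters the decision, so inflating it cannot clear any of the flags.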
To manipulate these metrics, I created a simple cURL script that I ran at random intervals from my workstation (always from the same IP address).
This script simply downloads the NuGet package and writes the file to the /tmp folder. Initially I could download the file, but I noticed that the client name was recorded as either (unknown) or Scripted Downloads. That is not good: if the goal is to deceive a developer, the last thing you want is to hint that bots are being used to alter the metrics.
I’m a big fan of the quote “Curiosity leads us to become better engineers”. As a Security Engineer, I’m aware that HTTP request headers carry a rich set of valuable information. I also know that the User-Agent request header is very easy to spoof.
However, the right question to ask is which value to use. To answer it, I cloned the NuGet client application from GitHub and did a quick search for the user agent string. It turns out the user agent is an important piece of this puzzle: it has its own class with some hardcoded values, and the default user agent string is set to “NuGet Command Line”.
Still, a question remains: if I install a package from Visual Studio, which user agent does it send so that the NuGet backend service counts the download as not scripted? To answer this, I ran Fiddler2 while installing the package.
The NuGet client uses a cache, so make sure you either have a fresh install or clear the NuGet cache first. Once a package has been downloaded it is cached, so no new downloads are performed (unless a new version is published, which results in a new download).
As I ran this test a few times, I used this command to clear the NuGet cache:
dotnet nuget locals all --clear
Now let’s see what Fiddler2 has to show us:
Great, we now know which HTTP headers Visual Studio sends when downloading a package (you can see them in the HTTP header section below), so we have:
- user-agent: The name appears in lower case, separated by a dash. This is not significant in itself: per RFC 2616, header field names are case-insensitive, so user-agent and User-Agent are equivalent, although “User-Agent” is the conventional spelling.
- X-NuGet-Session-Id: This seems to be a GUID added by Visual Studio (I could not confirm whether this value has anything to do with the authenticated user in Visual Studio; nevertheless, I did change the GUID a bit).
- NuGet-Client-Version: This represents the client version which for Visual Studio is 5.10.0
Once I had learned this configuration, I crafted the cURL command and then just waited for the metrics to be collected. They seem to be collected once per day, so I had to wait a few days until those values were aggregated by NuGet. I then confirmed that the requests did change the statistics on NuGet, and that the downloads were classified as regular client downloads rather than scripted ones.
With this simple test, I also realized that a malicious actor could abuse the way metrics are generated to ramp up or popularize a malicious package. If metrics are the one factor a development team relies on, then we have a serious security issue here.
I know what you are thinking: there is no issue here, because the file really is being downloaded and the metrics simply reflect that action.
The real problem is the abusive activity and the lack of rate limiting, not the number of downloads itself. There are certainly legitimate cases, such as Newtonsoft.Json, in which a single day can see millions of downloads as a product of nightly builds and development pipelines.
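As a sketch of what rate limiting could mean for download statistics, the snippet below caps how many downloads a single IP address can contribute to a package's count per day. This is an assumption made for illustration only: it is not how NuGet computes its statistics, and the event shape, the cap value and the package names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (client_ip, package_id, day) for each download request; a hypothetical log shape.
DownloadEvent = Tuple[str, str, str]


def capped_download_counts(events: Iterable[DownloadEvent],
                           daily_cap_per_ip: int = 50) -> Dict[str, int]:
    """Count downloads per package, crediting at most `daily_cap_per_ip` per IP per day."""
    per_source = Counter(events)  # occurrences of each (ip, package, day) triple
    totals: Counter = Counter()
    for (_ip, package_id, _day), hits in per_source.items():
        totals[package_id] += min(hits, daily_cap_per_ip)
    return dict(totals)


events = (
    # One workstation requesting the same package in a loop.
    [("203.0.113.7", "Suspicious.Package", "2021-06-01")] * 10_000
    # Many distinct clients each downloading once.
    + [(f"198.51.100.{i}", "Popular.Package", "2021-06-01") for i in range(200)]
)

print(capped_download_counts(events))
```

A per-IP cap is a trade-off rather than a fix: build farms behind a single NAT address would be undercounted, and an attacker with many addresses would still get through, so it would have to be combined with other signals.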
Security Recommendation
The truth of the matter is that metrics are useful but should not be the only criterion on which a decision relies. Of course, we want to use a library that is downloaded by many parties and has good metrics.
Here are some security recommendations for improving how software is developed and how its dependencies are managed:
- Implement a Software Bill of Materials (SBOM): Put simply, an SBOM is an inventory (analogous to a recipe) of the components that make up a piece of software. As described in US Executive Order 14028 of May 12, 2021, Improving the Nation’s Cybersecurity: “SBOM means a formal record containing the details and supply chain relationships of various components used in building software. Software developers and vendors often create products by assembling existing open source and commercial software components. The SBOM enumerates these components in a product. It is analogous to a list of ingredients on food packaging. An SBOM is useful to those who develop or manufacture software, those who select or purchase software, and those who operate software. Developers often use available open source and third-party software components to create a product; an SBOM allows the builder to make sure those components are up to date and to respond quickly to new vulnerabilities. Buyers can use an SBOM to perform vulnerability or license analysis, both of which can be used to evaluate risk in a product. Those who operate software can use SBOMs to quickly and easily determine whether they are at potential risk of a newly discovered vulnerability. A widely used, machine-readable SBOM format allows for greater benefits through automation and tool integration. The SBOMs gain greater value when collectively stored in a repository that can be easily queried by other applications and systems. Understanding the supply chain of software, obtaining an SBOM, and using it to analyze known vulnerabilities are crucial in managing risk.”
- The Open Web Application Security Project (OWASP) provides further recommendations in A9:2017-Using Components with Known Vulnerabilities, such as:
  - Remove unused dependencies, unnecessary features, components, files, and documentation.
  - Continuously inventory the versions of both client-side and server-side components (e.g. frameworks, libraries) and their dependencies using tools like versions, DependencyCheck, retire.js, etc. Continuously monitor sources like CVE and NVD for vulnerabilities in the components. Use software composition analysis tools to automate the process. Subscribe to email alerts for security vulnerabilities related to components you use.
  - Only obtain components from official sources over secure links. Prefer signed packages to reduce the chance of including a modified, malicious component.
  - Monitor for libraries and components that are unmaintained or do not create security patches for older versions. If patching is not possible, consider deploying a virtual patch to monitor, detect, or protect against the discovered issue.
- When choosing a library, check how quickly reported bugs get fixed so you can understand how responsive the maintainer is and what their backlog looks like.
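The inventory step behind both the SBOM and the OWASP recommendations can start very small. The sketch below lists the direct package references declared in an SDK-style .csproj file. The project content is inlined and hypothetical (including the placeholder package name and the version numbers), and the JSON output is an ad hoc shape, not a standard SBOM format such as CycloneDX or SPDX.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical project file content, inlined so the example is self-contained.
SAMPLE_CSPROJ = """
<Project Sdk="Microsoft.NET.Sdk">
  <ItemGroup>
    <PackageReference Include="Newtonsoft.Json" Version="13.0.1" />
    <PackageReference Include="Example.Unvetted.Package" Version="1.0.0" />
  </ItemGroup>
</Project>
"""


def list_direct_dependencies(csproj_xml: str) -> List[Dict[str, str]]:
    """Return name/version pairs for PackageReference items, sorted by name.

    Only handles SDK-style projects where Version is an attribute; older
    project files with an XML namespace would need extra handling.
    """
    root = ET.fromstring(csproj_xml.strip())
    components = []
    for reference in root.iter("PackageReference"):
        components.append({
            "name": reference.get("Include", ""),
            "version": reference.get("Version", "unspecified"),
        })
    return sorted(components, key=lambda component: component["name"].lower())


inventory = list_direct_dependencies(SAMPLE_CSPROJ)
print(json.dumps(inventory, indent=2))
```

This covers direct dependencies only. A real SBOM also needs transitive dependencies and a machine-readable standard format, which is what dedicated software composition analysis tools provide.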
References
https://www.federalregister.gov/documents/2021/05/17/2021-10460/improving-the-nations-cybersecurity
https://owasp.org/www-project-top-ten/2017/A9_2017-Using_Components_with_Known_Vulnerabilities