Threat modeling the TRAIL of Bits way

Contents

What TRAIL is
Why a TRAIL threat model provides value
How TRAIL works
Model building
Threat scenarios
What you get from a lightweight threat model
Findings and follow-on work
What you get from a comprehensive threat model
Applying the results
Informing further security reviews
Remediation
Updating your threat model
I like how a TRAIL threat model sounds, how do I get one?

Our threat modeling process is a little bit different from the methodologies you may have seen elsewhere. Over time, multiple application security experts have refined this process to provide maximal value for our clients and to minimize the effort required to update the threat model as the system changes.

We call our process TRAIL, which stands for Threat and Risk Analysis Informed Lifecycle. TRAIL enables us to trace and document the impact of flawed trust assumptions and insecure design decisions through our clients’ architectures and the systems and processes that support them. Mitigating system-level findings like these squashes whole classes of vulnerabilities, which means fewer one-off bug reports and fixes to worry about.

What TRAIL is

We’ve all used a variety of threat modeling methodologies over the years; each has its strong suit, but none perfectly fit our clients’ needs, so we combined the best parts of what we knew and iterated to build our own process. TRAIL initially extended Mozilla’s single-component Rapid Risk Assessment (RRA) process to whole systems (large and small), incorporating parts of the NIST SP 800-154 Guide to Data-Centric Threat Modeling and the NIST SP 800-53 security and privacy controls dictionary.

While RRA’s data dictionary inspired our approach, TRAIL enables us to model all in-scope parts of the system and their relationships with more rigor. When following TRAIL, we systematically cover each connection between components. We don’t just uncover direct threats to the data that each component handles, but also emergent weaknesses that arise from improper interaction between components, and other architectural and design-level risks.

Security patching can easily become a cycle of receiving a security report, making a one-off fix, and then getting yet another ticket that documents yet another instance of exactly the same problem. Structured threat modeling breaks this cycle of treating the symptoms over and over. A proper threat model exposes design-level weaknesses (of which individual vulnerabilities are symptoms) so they can be remediated.

Why a TRAIL threat model provides value

TRAIL has three goals:

Document the current system’s architecture-level and operational risks;
For each risk, provide our client with both practical, short-term mitigation options and long-term strategic recommendations;
Enable our client to update the threat model themselves as they mitigate risks and as the system otherwise changes over time.

Throughout the software/systems development life cycle (SDLC), application security review results in a better product. The design phase of the SDLC is an ideal time for collaborative threat modeling exercises involving both security engineers and the people building the system: there aren’t yet users relying on particular system features, but requirements are mostly set in stone, so it’s easier to make design improvements. But the second-best time to plant a tree is, naturally, now. Threat modeling work provides value in every SDLC phase since it improves developers’ understanding of the consequences of design choices.

How TRAIL works

Model building

TRAIL’s foundation is building as accurate a model of the system as possible. We work with our client to identify all in-scope system components. Then, we place a trust boundary anywhere that security controls gate connections between components (or should, per the security requirements and design), and we group components that share trust boundaries into trust zones.

We talk extensively with our client and read their system documentation to build knowledge of the system and its SDLC, uncovering and documenting previously unwritten assumptions. Then, we establish relevant combinations of connections and threat actors, especially for those connections that cross trust boundaries. We call these connection-actor combinations threat actor paths.

While our discussion of potential threats with the client throughout this process is relatively free-form, building threat actor paths ensures we stay rigorous and don’t miss a way that an attacker could maliciously escalate their privilege or cause data to move between components or out of the system.
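
The modeling steps above can be sketched in code. This is a minimal illustration rather than Trail of Bits tooling: the component names, zone names, and actors are hypothetical, and the sketch assumes that a connection crosses a trust boundary whenever its two endpoints sit in different trust zones.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Component:
    name: str
    zone: str  # trust zone the component belongs to


@dataclass(frozen=True)
class Connection:
    source: Component
    destination: Component

    def crosses_trust_boundary(self) -> bool:
        # Assumption: endpoints in different zones means a boundary is crossed.
        return self.source.zone != self.destination.zone


@dataclass(frozen=True)
class ThreatActorPath:
    actor: str
    connection: Connection


def threat_actor_paths(connections, actors, boundary_only=True):
    """Pair every actor with every selected connection."""
    selected = [
        c for c in connections
        if c.crosses_trust_boundary() or not boundary_only
    ]
    return [ThreatActorPath(actor, conn) for conn, actor in product(selected, actors)]


# Hypothetical three-component system.
browser = Component("browser", zone="external")
api = Component("api", zone="application")
db = Component("database", zone="application")

connections = [Connection(browser, api), Connection(api, db)]
actors = ["external attacker", "internal attacker"]

paths = threat_actor_paths(connections, actors)
for path in paths:
    conn = path.connection
    print(f"{path.actor}: {conn.source.name} -> {conn.destination.name}")
```

The exhaustive pairing is only a starting point. In practice an analyst would prune combinations that are not relevant (for example, an actor who has no way to reach a given connection), which is a judgment call this sketch does not attempt to automate.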

Threat scenarios

Our core model-building work allows us to identify design-level and operational risks that our client could have otherwise missed. We’ll document these risks in the form of threat scenarios. Each threat scenario describes a potential way that an adversary could exploit a single connection crossing a trust boundary between two components in the system. Putting threat scenarios together and doing further confirmation research enables us to write findings, but we’ll discuss findings later. For some threat modeling exercises, we will stop refining our system context at this point and will wrap up our work with summary-level remediation recommendations—we call this type of review a lightweight threat model.

What you get from a lightweight threat model

A lightweight threat modeling engagement results in an end-to-end, high-level overview of the risks inherent to a system’s design, illustrated with a handful of threat scenarios plus recommendations. Our clients typically use the results of lightweight threat models to guide further security review and remediation. Here are a few threat scenarios from the 2023 Trail of Bits assessment of the Arch Linux package manager, Pacman:


Scenario	Actor(s)	Component(s)
An environment variable affects the Pacman package manager’s libcurl dependency. For instance, Pacman redirects its HTTP connections through the proxy defined in the `http_proxy` environment variable. If an attacker injects an environment variable into Pacman’s runtime environment (a difficult prospect, given that it runs as root during installs), they could cause Pacman to exhibit exploitable or undesirable behavior.	Local root	Pacman package manager
An attacker attempts a substitution attack, bumping versions on a popular package through a compromised local network repository or remote repository. Pacman will always install the latest version of a package across all repositories it has access to. As such, if a user has both local and remote repositories enabled, an attacker who can introduce an identically named, higher-versioned package into one of the remote repositories can easily induce the user to install this version of the package. Similar attacks may also be possible via DNS confusion (e.g. if an attacker registers a domain that shadows a local network domain name). See this GitHub blog post on substitution attacks against npm.	Repository administrator, External attacker	Pacman package manager, Local network repository, Remote network repository
An attacker compromises a packaging key and produces different but valid signatures for a package to introduce malicious changes. In this case, Pacman would install the new package version normally, and the user would be entirely unaware. Currently, there is no way to enable a warning when a package’s signature changes.	Packager, Internal attacker	Pacman package manager, Packaging keys

Table 1: Example threat scenarios from our 2023 assessment of Arch Linux Pacman
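
The substitution scenario in Table 1 can be illustrated with a toy resolver. This is a sketch of the behavior as the scenario describes it (the highest version across all enabled repositories wins), not Pacman's actual resolution code. The repository names, the package name, and the simplified dotted-integer version comparison are all assumptions made for illustration.

```python
def parse_version(version: str) -> tuple:
    # Simplified: dotted integers only. Real package versions are more complex.
    return tuple(int(part) for part in version.split("."))


def pick_candidate(package: str, repositories: dict) -> tuple:
    """Return (repo_name, version) for the highest version of `package` in any repo."""
    candidates = [
        (repo_name, packages[package])
        for repo_name, packages in repositories.items()
        if package in packages
    ]
    if not candidates:
        raise LookupError(f"{package} not found in any repository")
    return max(candidates, key=lambda item: parse_version(item[1]))


# Hypothetical repositories: a trusted local one and a remote one.
repositories = {
    "local-internal": {"corp-tools": "1.4.0"},
    "remote-public": {},
}
before = pick_candidate("corp-tools", repositories)

# The attacker publishes an identically named, higher-versioned package remotely.
repositories["remote-public"]["corp-tools"] = "99.0.0"
after = pick_candidate("corp-tools", repositories)

print(before)
print(after)
```

Because the comparison spans repositories in different trust zones, the less trusted repository wins as soon as it offers a higher version number. That cross-zone comparison is the design-level weakness the scenario points at, independent of any single bug.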

Figure 1: The modeled data flow of packages and their signing data from Arch Linux’s root of trust to the host machine on which Pacman runs

More lightweight threat models can be found in audit reports in our Publications repo, including in the reports from our assessments of CoreDNS, Eclipse Jetty, Kubernetes Event-Driven Autoscaling (KEDA), and others.

Findings and follow-on work

When a client wants a more granular security review but isn’t sure how best to target it, we can do a lightweight threat model and use its results to scope a follow-on secure code review, infrastructure review, or fuzzing work to just a few threat scenarios or system components.

Alternatively, instead of stopping at the high-level overview that a lightweight threat model provides, we can do a comprehensive threat model to produce system-level findings. A threat model finding makes threat scenarios concrete through deeper, targeted investigation, evaluates the severity and difficulty of exploitation by different possible threat actors, and concludes with tailored recommendations on how to remediate those threats.
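
One way to picture a finding is as a record that groups threat scenarios and attaches severity, difficulty, and recommendations. The sketch below is hypothetical: the field names, the rating scales, and the example findings are assumptions for illustration, not the schema or ratings used in any Trail of Bits report.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    INFORMATIONAL = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Difficulty(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class ThreatScenario:
    description: str
    actors: list
    components: list


@dataclass
class Finding:
    title: str
    severity: Severity
    difficulty: Difficulty
    scenarios: list = field(default_factory=list)
    short_term: str = ""  # immediate stopgap recommendation
    long_term: str = ""   # strategic recommendation


def triage_order(findings):
    """Most severe first; among equals, easiest to exploit first."""
    return sorted(findings, key=lambda f: (-f.severity, f.difficulty))


# Hypothetical findings with made-up ratings.
findings = [
    Finding("Hypothetical finding A", Severity.MEDIUM, Difficulty.MEDIUM),
    Finding("Hypothetical finding B", Severity.HIGH, Difficulty.HIGH),
    Finding("Hypothetical finding C", Severity.MEDIUM, Difficulty.LOW),
]

for finding in triage_order(findings):
    print(finding.title, finding.severity.name, finding.difficulty.name)
```

Sorting by severity and then by difficulty is just one possible triage heuristic; a real prioritization would also weigh which threat actors are plausible for the system in question.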

What you get from a comprehensive threat model

In a comprehensive TRAIL threat model, we’ll continue past the endpoint of a lightweight threat model, putting our identified threat scenarios together and doing more research to ultimately present findings and finding-specific recommendations. Here are summaries of a few findings from our Linkerd engagement:

At the time of the Linkerd engagement (in 2022), the destination service, which served routing information to sidecar proxies within a Linkerd-integrated Kubernetes cluster, lacked built-in rate limiting. This could have allowed an attacker with sidecar proxy access within one of the cluster user application namespaces to easily cause a denial of service by repeatedly requesting routing information, or to change the destination service’s availability status to force updates in the Linkerd controller component.
We also discovered that nothing prevented infrastructure operators from using the Linkerd CLI tool to fetch YAML definitions, including sensitive information, over unencrypted HTTP. This cleartext data flow would weaken the overall security posture of an infrastructure operator’s system.
Also at that time, the linkerd-viz web dashboard lacked access controls. This meant that any attacker who learned the Linkerd dashboard’s network address by simply running a scanning tool on a Linkerd-configured Kubernetes cluster could then gain detailed knowledge about the namespaces, services, pods, containers, and other resources in the cluster by accessing this dashboard, and could use this information as a basis for targeting the software running on top of the cluster.
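
To make the first finding more concrete, here is a generic sketch of the kind of control whose absence it describes: a token-bucket rate limiter. It is not Linkerd code and not the fix that project adopted; the class name, parameters, and the injectable clock are assumptions chosen so the example is deterministic.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill based on elapsed time, never exceeding capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Deterministic demo with a fake clock (hypothetical per-client limiter).
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(5)]       # requests beyond the burst are rejected
fake_now[0] += 2.0                               # two seconds pass, two tokens refill
after_wait = [bucket.allow() for _ in range(3)]
print(burst, after_wait)
```

A service would typically keep one such bucket per client identity, so that a single caller repeatedly requesting information exhausts only its own budget.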

Figure 2: The modeled data flow of a representative Linkerd deployment

The table below includes some of the threat scenarios that we used to build the findings summarized above:


Originating Zone	Destination Zone	Actor(s)	Description
External	User Application Namespaces	Infrastructure Operator	User applications share a pod with their sidecar proxies and respective init containers. Therefore, operators of user application infrastructure should be aware that if a user application is compromised, lateral components such as the sidecar proxy could also be compromised. This may expose routing information and certificates within the namespace.
External	Linkerd Namespace	Internal Attacker	An internal attacker with access to an external service that hosts an infrastructure operator’s YAML files may be able to manipulate the underlying infrastructure.
User Application Namespaces	`linkerd-viz` Namespace	Internal Attacker	Internal attackers with access restricted to the application namespace could reach Prometheus endpoints to obtain metrics data that could give them insight into other cluster components that they would not otherwise have visibility into.

Table 2: Example threat scenarios from our 2022 Linkerd comprehensive threat model

Other comprehensive threat model reports in our Publications repo include even more threat actor paths and the findings we built with them; our reports for Curl and Kubernetes are great examples.

Applying the results

Once we’ve mapped your whole system, identified security control gaps in its design, explored potential threat scenarios together, and provided our findings and recommendations, what’s next?

Informing further security reviews

We internally use our threat models’ outcomes to provide context and direction for further Trail of Bits reviews of the same system, improving efficiency and outcomes on subsequent audits. If you are interested in both the results of a threat model and another type of security engagement we offer, why not book both engagements back-to-back with the threat model first? This retrospective blog post from 2024 on our work with OSTIF gives several excellent examples of this pairing!

Remediation

Our practice is to include short-term (immediate stopgap) and longer-term (to achieve the ideal state) mitigation suggestions for each finding in a comprehensive threat model. Where possible, we recommend several overlapping mitigations per finding, since a single mitigation could fail or be subverted by a resourceful attacker. We also include a high-level summary of our recommendations in both comprehensive and lightweight threat models.

Updating your threat model

A threat model must change as the system evolves. We provide an appendix with every report that includes directions to help you periodically modify your threat model so it remains relevant as your system’s design and requirements change over time. We’ll also discuss how and when to update your threat model in our next post!
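
As a rough illustration of what keeping a model current can involve, the sketch below compares two versions of a system model and lists the trust-boundary-crossing connections that appeared or disappeared. It is a hypothetical approach, not the procedure from our report appendix; the dictionary layout and the component and zone names are assumptions.

```python
def boundary_crossings(model: dict) -> set:
    """Return the (source, destination) connections whose endpoints are in different zones."""
    zones = model["zones"]
    return {
        (src, dst)
        for src, dst in model["connections"]
        if zones[src] != zones[dst]
    }


def connections_needing_review(old_model: dict, new_model: dict) -> dict:
    """Diff the boundary-crossing connections of two model versions."""
    old = boundary_crossings(old_model)
    new = boundary_crossings(new_model)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Hypothetical model before and after a third-party integration is added.
old_model = {
    "zones": {"browser": "external", "api": "application", "database": "application"},
    "connections": [("browser", "api"), ("api", "database")],
}
new_model = {
    "zones": {**old_model["zones"], "billing-provider": "third-party"},
    "connections": old_model["connections"] + [("api", "billing-provider")],
}

result = connections_needing_review(old_model, new_model)
print(result)
```

Each added crossing is a candidate for new threat actor paths, and each removed one may retire existing threat scenarios. Moving a component to a different zone also shows up in the diff, because its connections start or stop crossing a boundary.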

I like how a TRAIL threat model sounds, how do I get one?

Please use our contact form to get in touch. We’d be delighted to learn about your system and your needs!

Special thanks to Stefan Edwards, Brian Glas, Alex Useche, David Pokora, Spencer Michaels, Paweł Płatek, Artem Dinaburg, Ben Samuels, and everyone else who has worked on threat modeling engagements at Trail of Bits for your awesomeness and contributions to the evolution of TRAIL.
