Detection Engineering Maturity Matrix

Kyle Bailey
Apr 26, 2021

Update: I gave a talk on this maturity matrix at the SANS Blue Team Summit 2021. If you want more context on each of the sections laid out below, I recommend watching the video. The recording can be found here (link).

Detection engineering has long been a function of the incident response team; however, over the last several years it has gained momentum, becoming a dedicated and better-defined function within many security operations teams. Many great articles and presentations exist (see below) on the purpose of detection engineering and where it fits into a broader security operations team. The goal of this article is to help the community better measure the capabilities and maturity of their detection function, and to provide a high-level roadmap for organizations looking either to build a new team or to expand an existing one.

I will cover two high-level topics. The first is a set of three detection pillars: the areas I have found most important for focusing a detection team's efforts in order to make the function as productive as possible within the lens of security operations. The second is a maturity matrix that lays out a phased approach to building and maturing a detection engineering team over time.

Pillars

The core pillars my team has found useful are Detection-as-code, the Incident Response Experience, and the detection logic and infrastructure itself. Let’s dig into each of these a bit further.

Detection-as-code

This principle is the idea that we should treat our detection logic (think SIEM queries, EDR rules, Zeek scripts, Suricata and YARA rules, etc.) as code and do our best to incorporate software engineering principles into our detection logic and detection creation workflow. For example, version control is a capability that is missing from most SIEMs and EDR products. Too often we found ourselves asking questions like: When was this alert changed? What was the change? Was the change reviewed, approved, and tested? Building workflows around detection logic that insert version control and a CI/CD pipeline is core to this principle. Doing so enables seamless review, static and dynamic testing, and approval before detection logic hits production, helping ensure your team delivers a consistent product to incident responders.
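As a concrete illustration, here is a minimal sketch of a CI check that could gate merges to a detection repository. The detections/*.yml layout and the required fields are assumptions made up for this example, not a prescription:

```python
# A minimal sketch of a CI gate for detection rules, assuming a hypothetical
# repo layout where each rule lives in detections/*.yml with fields like
# "name", "query", "mitre_tids", and "owner". The pipeline fails the build
# if any rule is missing required metadata.
import sys
from pathlib import Path

import yaml  # pip install pyyaml

REQUIRED_FIELDS = {"name", "query", "mitre_tids", "owner"}  # assumed schema

def validate(rule_path: Path) -> list[str]:
    rule = yaml.safe_load(rule_path.read_text())
    missing = REQUIRED_FIELDS - set(rule or {})
    return [f"{rule_path}: missing field '{field}'" for field in sorted(missing)]

def main() -> int:
    errors = [e for p in Path("detections").glob("*.yml") for e in validate(p)]
    for error in errors:
        print(error, file=sys.stderr)
    return 1 if errors else 0

if __name__ == "__main__":
    sys.exit(main())
```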

A useful byproduct of having your detection live as code is the ability to store alert metadata (think the MITRE ATT&CK technique IDs the logic detects, test cases, or even historical fidelity) alongside your detection in an organized way. This, in turn, enables you to continuously test use-cases, calculate metrics, build ATT&CK heatmaps, etc., all programmatically. We learned the value of this the hard way and are currently working through how to migrate our unstructured alert metadata into version control in a centralized, structured form.
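To show the kind of reporting that structured metadata unlocks, here is a small sketch (reusing the same hypothetical detections/*.yml layout) that counts how many rules cover each ATT&CK technique, the raw material for a coverage heatmap:

```python
# A minimal sketch of metadata-driven coverage reporting, assuming each
# hypothetical detections/*.yml rule carries a "mitre_tids" list. It tallies
# how many rules cover each ATT&CK technique ID.
from collections import Counter
from pathlib import Path

import yaml  # pip install pyyaml

coverage: Counter[str] = Counter()
for rule_path in Path("detections").glob("*.yml"):
    rule = yaml.safe_load(rule_path.read_text()) or {}
    coverage.update(rule.get("mitre_tids", []))

for tid, rule_count in coverage.most_common():
    print(f"{tid}: {rule_count} rule(s)")
```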

It is important to note that building out these workflows and processes generally requires custom code, so it helps to have people on your detection team (or elsewhere in your organization that you can lean on) who can write the glue between systems like your version control repositories and your SIEM.
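What that glue might look like is sketched below. The REST endpoint, payload shape, and token handling are all hypothetical; every SIEM has its own API, so treat this as the shape of the script rather than a working integration:

```python
# A hypothetical sketch of the "glue" between version control and a SIEM:
# on merge, CI runs this script to upsert each rule via an invented REST
# endpoint. Real products (Splunk, Elastic, Chronicle, etc.) each have their
# own APIs and payload formats.
import os
from pathlib import Path

import requests  # pip install requests
import yaml      # pip install pyyaml

SIEM_URL = "https://siem.example.com/api/rules"  # hypothetical endpoint
API_TOKEN = os.environ["SIEM_API_TOKEN"]         # injected by the CI system

for rule_path in Path("detections").glob("*.yml"):
    rule = yaml.safe_load(rule_path.read_text())
    response = requests.post(
        SIEM_URL,
        json={"name": rule["name"], "query": rule["query"]},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    print(f"Deployed {rule['name']} from {rule_path}")
```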

The IR Experience

Is this alert worth someone spending time reviewing? Asking this simple question frames your detection logic and new use-cases in a different light. Defining worth will look different for every organization and will evolve naturally over time. My team often reviews detections we created in the past only to realize they no longer have worth, which can be brought on by any number of factors: a changing environment, threat landscape, or team.

The most basic idea behind this principle is to "begin with the end in mind", or rather, "begin with the IR team in mind". Before work ever begins on a use-case, the detection engineer needs to think about how it will be responded to: What action will the response team take if this alert fires? Does the alert provide the necessary context? Is there a way we can automate the resolution of the alert, or send it to the end-user for a decision? Close and constant contact between the detection and response teams throughout the detection lifecycle is critical to making this work. Projects in this pillar typically revolve around low-fidelity alert review, developing frameworks for the continuous monitoring of low-fidelity and time-consuming alerts, or documentation and process uplifts.
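As one hypothetical illustration of the "send it to the end-user for a decision" idea, the sketch below prompts the affected user to confirm expected activity and only escalates to the IR queue when they don't. The Alert shape and the notify() helper are invented for the example:

```python
# A hypothetical sketch of user-confirmation triage. In production, notify()
# would be a Slack or email prompt with a timeout and a safe default of
# escalation; here it just asks on stdin for demonstration.
from dataclasses import dataclass

@dataclass
class Alert:
    id: str
    user: str
    summary: str

def notify(user: str, question: str) -> bool:
    # Stand-in for a real prompt (Slack, email, etc.).
    return input(f"[{user}] {question} (y/n) ").strip().lower() == "y"

def triage(alert: Alert) -> str:
    confirmed = notify(alert.user, f"We observed: {alert.summary}. Was this you?")
    if confirmed:
        return f"{alert.id}: auto-resolved, user confirmed expected activity"
    return f"{alert.id}: escalated to the IR queue"

print(triage(Alert(id="A-1024", user="jdoe", summary="new MFA device enrolled")))
```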

Coupling this with detection-as-code, giving IR leaders and senior responders the authority to approve or deny detection logic before it merges into production can also be a good way to transfer control to the responders.
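If your detections live in a Git repository on a platform like GitHub, one lightweight way to enforce this is a CODEOWNERS file; the directory and team name below are hypothetical:

```
# Hypothetical CODEOWNERS entry: any change under detections/ requires
# approval from the IR leads team before it can be merged.
/detections/ @example-org/ir-leads
```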

Detection Logic & Infrastructure

Do I have the data (visibility) I need to alert on activity X? Is the data timely? Do we have the skills and platform to build and execute the logic? Do we know what detections we should be building? SIEM ingest limitations are something every organization faces (even those with "unlimited" licenses). Defining the data sources critical for detection and response can help prioritize what should be kept and what should be trimmed or dropped.
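One way to ground those trimming decisions, sketched here with invented source and rule names, is to map each data source to the detections that depend on it, so a source with no dependents becomes an obvious review candidate:

```python
# A minimal sketch of mapping data sources to dependent detections, so
# ingest-trimming decisions reflect detection impact rather than volume
# alone. All names here are invented for illustration.
DATA_SOURCE_DEPENDENCIES = {
    "edr_process_events": ["suspicious-lolbin-exec", "ransomware-behavior"],
    "dns_logs": ["dga-domain-lookup"],
    "web_proxy_logs": [],  # no detections depend on this yet: a trim candidate
}

for source, rules in sorted(DATA_SOURCE_DEPENDENCIES.items()):
    status = "KEEP" if rules else "REVIEW/TRIM"
    print(f"{source}: {status} ({len(rules)} dependent detection(s))")
```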

Detection prioritization (i.e., what should we be working on right now?) can be a particularly difficult formula to get right, but tightly integrating these decisions with partner organizations like threat intelligence, incident response, and security engineering is ideal, as each team brings a different perspective and understanding of your environment to the table.
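If you do want to make that formula explicit, a simple scoring heuristic is one place to start. The inputs and weights below are arbitrary placeholders that the partner teams above would help populate; the value is in making the trade-offs visible and reviewable:

```python
# A minimal sketch of a backlog-scoring heuristic with invented 1-5 inputs
# (threat relevance from intel, data visibility, engineering effort). The
# weights are placeholders, not a recommendation.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    threat_relevance: int  # from threat intelligence (1-5)
    visibility: int        # do we have the data? (1-5)
    effort: int            # engineering cost (1-5, higher = more work)

    @property
    def priority(self) -> float:
        return (2 * self.threat_relevance + self.visibility) / self.effort

backlog = [
    UseCase("oauth-consent-phishing", threat_relevance=5, visibility=3, effort=2),
    UseCase("kerberoasting", threat_relevance=4, visibility=5, effort=1),
]
for uc in sorted(backlog, key=lambda u: u.priority, reverse=True):
    print(f"{uc.priority:.1f}  {uc.name}")
```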

Regularly purple teaming specific ATT&CK TTPs is also important. This generates real data on how detection is performing and (usually) uncovers cases where some piece of the detection pipeline is not working as expected. Purple teaming can be a great driver for answering the "what should we be working on?" question without needing to build a backlog prioritization framework. There are many great articles, talks, and whole conferences on purple teaming and the different ways to execute a purple team program.
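The feedback loop itself can be lightly automated. In the sketch below, run_technique() and search_alerts() are hypothetical stand-ins for your attack-simulation tooling and your SIEM's alert API; the point is the execute-then-verify pattern:

```python
# A hypothetical sketch of the purple-team feedback loop: execute a test for
# a technique, then poll the alert queue to confirm the expected detection
# fired within a time window.
import time

def run_technique(tid: str) -> None:
    # Stand-in: trigger the test (attack-simulation tool, manual runbook, ...).
    print(f"Executing test for {tid}...")

def search_alerts(rule_name: str, window_seconds: int) -> bool:
    # Stand-in: query the SIEM for recent alerts from rule_name.
    return False  # replace with a real API call

def validate(tid: str, rule_name: str, timeout: int = 300) -> bool:
    run_technique(tid)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if search_alerts(rule_name, window_seconds=timeout):
            return True
        time.sleep(15)
    print(f"GAP: {rule_name} did not fire for {tid}")
    return False

validate("T1003.001", "lsass-memory-access", timeout=60)
```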
