Detection-as-Code — Testing

Kyle Bailey
6 min read · Jul 23, 2021


Let’s talk about testing: the often overlooked and less loved sibling of development. Most organizations test new detection logic as they build or receive it (from a vendor or elsewhere). Testing has traditionally focused on false positives, out of concern that detecting benign activity in your environment will overwhelm the IR team. That is completely necessary, but let’s take it a bit further and try to answer two additional questions: does the detection logic actually detect the threat now? And does the logic that was deployed (yesterday, last week, last month…) still detect the threat?

Detection-as-code is the concept within threat detection engineering that you should treat detection logic (SIEM queries, IDS, EDR & YARA rules, etc.) as you would code. I found the term popping up in articles as early as 2019 and over the last year it has begun to surface in talks and articles from SIEM vendors, with some even building these capabilities into the core of their platform.

Let’s break detection-as-code down into the following domains:

  • Version Control & Collaboration: The source of truth for detection logic is in your version control repository, giving the team the ability to quickly identify changes, enforce peer reviews, enable CI/CD, and revert if something goes wrong.
  • Agile Processes: The detection engineering team follows a structured, agile workflow (kanban, scrum, etc.).
  • Static & Continuous Testing: Ensuring logic is functional and error free as it is pushed through the CI/CD pipeline as well as periodically once the logic is live.
  • Code Reuse: Re-using pieces of detection logic across use-cases that are similar in function. Though, depending on your SIEM, it can be difficult to achieve “true” code reuse as we define it in the software engineering world.
  • CI/CD Pipeline: Automated deployment of detection logic into your SIEM or other monitoring tools.
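To make the pipeline domain concrete, here is a sketch of how the stages above might be wired together in a CI configuration. This is loosely modeled on GitLab CI syntax; the stage names, script names, and targets are all hypothetical, not a specific vendor’s setup:

```yaml
# Hypothetical CI pipeline for a detection rule repository (illustrative only)
stages:
  - lint        # static checks: required metadata present, no runaway wildcards
  - test        # dynamic checks: execute the logic against a staging SIEM
  - deploy      # push approved rules to production

lint:
  stage: lint
  script: python lint_rules.py rules/

test:
  stage: test
  script: python run_searches.py --target staging-siem rules/

deploy:
  stage: deploy
  script: python deploy_rules.py --target prod-siem rules/
  only:
    - main      # deploy only after a peer-reviewed merge to main
```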

Testing Philosophy

This post will cover testing specifically. Testing has many facets, so let’s dig a little deeper into each.

Static Testing

Static testing is the process of analyzing the literal text of your detection logic. This is traditionally performed by engineers, who (as we all are) are prone to human error, especially as your team grows or turnover occurs. Typical DevOps CI/CD pipelines run code through a variety of linters, which statically analyze it for anything from stylistic formatting to security issues.

By running detection logic through a linter, we can achieve the same goal: programmatically check for inconsistencies or errors in order to prevent them from being deployed into production.

As an example, your team might include metadata in the logic of an alert that helps pre-populate certain known fields in a ticketing platform, including MITRE ATT&CK data. We follow a similar process at Box, and what we noticed over time was that these fields could easily be overlooked, resulting in inconsistencies in ticket metrics among other issues. By using static analysis to validate the metadata exists as an alert is promoted through the CI/CD pipeline, we can maintain consistency in our logic and in the downstream processes that rely on our alert data.

Other examples include checking for searches that overuse wildcards or could otherwise degrade SIEM performance, and searching the configuration metadata for misconfigurations specific to your environment (e.g., was the correct alert action configured? Is it running on the correct time interval?). The most powerful aspect of static linting is that it gives a detection team the ability to write these checks as corrective and preventative actions whenever new mistakes are uncovered.
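The checks above can be sketched as a small linter. This is a minimal illustration, not a real SIEM schema: the rule fields (`mitre_attack_id`, `alert_action`, etc.) and the wildcard heuristic are assumptions standing in for whatever your platform and ticketing integration actually require.

```python
# Hypothetical detection-rule linter: field names and checks are illustrative.
REQUIRED_METADATA = {"title", "severity", "mitre_attack_id", "alert_action"}

def lint_rule(rule: dict) -> list:
    """Return a list of lint findings for one detection rule."""
    findings = []
    # Metadata check: ticketing and metrics downstream depend on these fields.
    missing = REQUIRED_METADATA - rule.keys()
    if missing:
        findings.append(f"missing metadata fields: {sorted(missing)}")
    # Performance check: a search that leads with a bare wildcard can
    # force a full scan and degrade SIEM performance.
    if rule.get("search", "").lstrip().startswith("*"):
        findings.append("search begins with a wildcard")
    return findings

rule = {"title": "Encoded PowerShell", "search": "* EncodedCommand", "severity": "high"}
for finding in lint_rule(rule):
    print(finding)
```

In a pipeline, a non-empty findings list would fail the lint stage and block promotion until the rule is fixed.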

Dynamic Testing

Dynamic testing is the act of executing code to identify run-time errors that static analysis alone cannot find. Depending on your SIEM and setup, there are several ways to achieve dynamic testing coverage.

Testing in the CI/CD pipeline: In the pipeline, as a search is being deployed, having the CI/CD pipeline execute the detection logic (in the SIEM) and check for errors returned is one way to perform dynamic testing at the time of promotion. Again, this will confirm the detection logic and external functions it uses are syntactically correct from the SIEM’s perspective. Other data points that could be collected during this phase are search duration and even the number of results returned (a value too large for either could be a bad sign and a reason to block alert promotion).
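A promotion-time gate along those lines might look like the following sketch. `run_search` is a hypothetical stub standing in for your SIEM’s search API client, and the duration and result-count thresholds are made-up values you would tune for your environment:

```python
import time

# Illustrative thresholds; tune these for your SIEM and data volume.
MAX_DURATION_SECONDS = 300
MAX_RESULTS = 10_000

def promote_check(run_search, query: str) -> bool:
    """Execute the detection logic once and gate promotion on the outcome."""
    start = time.monotonic()
    # run_search is assumed to return {"error": <str or None>, "count": <int>}.
    result = run_search(query)
    duration = time.monotonic() - start
    if result.get("error"):
        print(f"blocked: search error: {result['error']}")
        return False
    if duration > MAX_DURATION_SECONDS:
        print(f"blocked: search took {duration:.0f}s")
        return False
    if result.get("count", 0) > MAX_RESULTS:
        print(f"blocked: {result['count']} results, likely too noisy")
        return False
    return True

# Example with a stubbed SIEM client:
print(promote_check(lambda q: {"error": None, "count": 12}, "index=win EventCode=4688"))  # prints: True
```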

This still isn’t quite true “dynamic testing” since we aren’t providing the logic a known input and confirming the expected output is generated (more to come on this below).

Monitoring SIEM errors: Most SIEMs have an internal log which will include alert errors. Monitoring this log source is an easy way to watch for run-time errors that are introduced after deployment.
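As a sketch of that monitoring, the snippet below scans scheduler-style log lines for failed alert runs. The key=value log format here is invented for illustration; real SIEM internal logs (for example, Splunk’s `_internal` index) each have their own schema.

```python
# Hypothetical internal-log scan for alert execution errors.
def find_alert_errors(log_lines):
    """Yield (alert_name, message) for lines that report a failed alert run."""
    for line in log_lines:
        # Parse simple key=value pairs; format is illustrative only.
        fields = dict(kv.split("=", 1) for kv in line.split() if "=" in kv)
        if fields.get("status") == "error":
            yield fields.get("alert", "unknown"), fields.get("message", "")

log = [
    "time=2021-07-23T12:00:00 alert=encoded_powershell status=success",
    "time=2021-07-23T12:05:00 alert=ntds_access status=error message=bad_field",
]
for name, msg in find_alert_errors(log):
    print(name, msg)  # prints: ntds_access bad_field
```

Feeding these findings into your ticketing platform turns silent post-deployment breakage into an actionable alert for the detection team.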

Dynamic Attack Testing

This is where things get fun, and we answer the all-important questions: does my alert detect the threat, and is it continuing to detect the threat? A key component to making this work is the idea of “attack tests”: small test cases that simulate the malicious activity the detection logic is designed to catch. Red Canary’s Atomic Red Team framework is probably the most well-known (and comprehensive) collection of attack scripts. As is true in software engineering, the accuracy of your attack test results is only as good as the tests themselves. For this reason it is important that these tests receive a peer review similar to the detection logic (or maybe they are even built in part by your Red Team) to ensure the tests truly exercise your detection.

Architecture: One way to achieve automated, continuous attack testing is to build a system that executes attack scripts on a set of test systems mirroring your environment, including the security stack. The tests run on a known, scheduled interval; the scheduler then confirms whether each test was detected (by monitoring detected events from the SIEM) and correlates the test with the result, recording a success or failure for tracking.

The concept is simple enough on the surface and can give your detection team the ability to monitor detection coverage and health in an automated way. By running these tests on a scheduled interval we can ensure that our detection logic is functional end-to-end days, weeks or months after the logic was initially deployed.
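The correlation step of the scheduler can be sketched as follows. Everything here is an assumption for illustration: the test and alert field names, the 15-minute match window, and the idea that `alerts` comes from some hypothetical SIEM alert-query API.

```python
import datetime as dt

# Match window: how long after a test runs we still accept a detection.
WINDOW = dt.timedelta(minutes=15)

def correlate(test: dict, alerts: list) -> str:
    """Return 'pass' if the expected alert fired for this test, else 'fail'."""
    for alert in alerts:
        if (alert["rule"] == test["expected_rule"]
                and alert["host"] == test["host"]
                and abs(alert["time"] - test["ran_at"]) <= WINDOW):
            return "pass"
    return "fail"

now = dt.datetime(2021, 7, 23, 12, 0)
test = {"name": "T1059.001 encoded PowerShell", "host": "test-win-01",
        "expected_rule": "Encoded PowerShell", "ran_at": now}
alerts = [{"rule": "Encoded PowerShell", "host": "test-win-01",
           "time": now + dt.timedelta(minutes=3)}]
print(correlate(test, alerts))  # prints: pass
```

Recording each pass/fail over time is what turns this from a one-off validation into continuous monitoring of detection health.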

A few other random thoughts I’ll just brain dump:

  • As your threat intel/red team/detection team/IR team discovers new methods threat actors use to execute a TTP, they can create test scripts (which are automatically executed) to determine whether existing logic detects them or new logic needs to be created.
  • Achieving 100% test coverage for your entire detection catalogue is likely not feasible for a variety of reasons, the primary one being the number of environments and system types most detection catalogues cover. For example, an encoded PowerShell command is easy to reproduce and run harmlessly on a system in your attack test framework; dumping your domain controller’s NTDS.dit or running a password spray is probably not.
  • There are vendor tools that can act as the “scheduler” component for you. Most have a lot of cool features, including threat intel integrations; however, depending on your SIEM and ticketing platform setup, there can be quite a bit of additional work to properly correlate tests with their results.

We’re currently in the process of building this out in our environment so you can expect updates along the way as we encounter road bumps and challenges. If you have done or are doing something similar please reach out. I’d be interested in hearing more about what worked well and what did not.