{"id":1754,"date":"2023-11-21T15:05:37","date_gmt":"2023-11-21T15:05:37","guid":{"rendered":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/"},"modified":"2023-11-21T15:05:37","modified_gmt":"2023-11-21T15:05:37","slug":"detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm","status":"publish","type":"post","link":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/","title":{"rendered":"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 pm"},"content":{"rendered":"


In the security industry, there is a constant, undeniable fact that practitioners must contend with: criminals are working overtime to constantly change the threat landscape to their advantage. Their techniques are many, and they go out of their way to avoid detection and obfuscate their actions. In fact, one element of obfuscation \u2013 command-line obfuscation \u2013 is the process of intentionally disguising command-lines, which hinders automated detection and seeks to hide the true intention of the adversary\u2019s scripts.<\/p>\n

Types of Obfuscation<\/strong><\/h2>\n

There are a few tools publicly available on GitHub that give us a glimpse of the techniques adversaries use. One such tool is Invoke-Obfuscation, a PowerShell script that aims to help defenders simulate obfuscated payloads. After analyzing some of the examples in Invoke-Obfuscation, we identified different levels of the technique:<\/p>\n\n

Each of the colors in the image represents a different technique, and while there are various types of obfuscation, they do not change the overall functionality of the command. In the simplest form, Light<\/strong> obfuscation changes the case of the letters on the command line, and Medium <\/strong>generates a sequence of concatenated strings with added characters \u201c`\u201d and \u201c^\u201d, which are generally ignored by the command line. In addition to the previous techniques, it is possible to reorder the arguments on the command line, as seen in the Heavy <\/strong>example, by using syntax that specifies the order of execution. Lastly, the Ultra <\/strong>level of obfuscation uses Base64-encoded commands, and by using Base8*8 can avoid a large number of EDR detections.<\/p>\n

In the wild, this is what an un-obfuscated command-line would look like:<\/p>\n

One of the simplest and least noticeable techniques an adversary could use is changing the case of the letters on the command-line, which is what the previously mentioned \u2018Light\u2019 technique demonstrated:<\/p>\n

The insertion of characters that are ignored by the command-line, such as the ` (tick symbol) or ^ (caret symbol), which was previously mentioned in the \u2018Medium\u2019 technique, would look like this in the wild:<\/p>\n\n

In our examples, the command silently installs software from the website evil.com. The technique used in this case is especially stealthy, since it uses software that is benign by itself and pre-installed on any computer running the Windows operating system.<\/p>\n
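
To make the Light and Medium levels concrete, here is a minimal Python sketch. It is not the code from Invoke-Obfuscation and not the detector described in this post. It assumes a harmless placeholder command, and the helper names light_obfuscate, medium_obfuscate, and normalize are hypothetical. It only covers case flipping and caret insertion; the string concatenation that Medium also uses is left out.<\/p>\n

```python
import random

# The two characters the text names as generally ignored by the command line.
IGNORED_CHARS = '^`'


def light_obfuscate(command, seed=0):
    '''Randomly flip the case of each character (the Light level).'''
    rng = random.Random(seed)
    return ''.join(rng.choice((str.upper, str.lower))(ch) for ch in command)


def medium_obfuscate(command, seed=0):
    '''Insert carets between adjacent letters (one part of the Medium level).'''
    rng = random.Random(seed)
    pieces = []
    for i, ch in enumerate(command):
        pieces.append(ch)
        nxt = command[i + 1:i + 2]  # empty string at the end of the command
        # Only pad between two letters so separators and switches stay intact.
        if ch.isalpha() and nxt.isalpha() and rng.choice((True, False)):
            pieces.append('^')
    return ''.join(pieces)


def normalize(command):
    '''Undo both tricks: drop the ignored characters and lowercase the rest.'''
    return ''.join(ch for ch in command if ch not in IGNORED_CHARS).lower()


# Hypothetical, harmless command used only for illustration.
plain = 'cmd /c echo hello'
light = light_obfuscate(plain)
medium = medium_obfuscate(light)

# All three variants collapse to the same canonical string.
assert normalize(light) == normalize(medium) == plain
```

Because both variants normalize back to the same string, the functionality of the command is unchanged, which is the point made about the levels above. A fixed rule like normalize only undoes tricks that are already known, which may be one reason to prefer a learned detector.<\/p>\n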

Don\u2019t Ignore the Warning Signs, Inspect Obfuscated Elements Quickly<\/strong><\/h2>\n

The presence of obfuscation techniques on the command-line often serves as a strong indication of suspicious (almost always malicious) activity. While in some scenarios obfuscation may have a valid use case, such as using credentials on the command-line (although this is a very bad idea), threat actors use these techniques to hide their malicious intent. The Gamarue and Raspberry Robin malware campaigns commonly used this technique to avoid detection by traditional EDR products. This is why it\u2019s essential to detect obfuscation techniques as quickly as possible and act on them.<\/p>\n

Using Large Language Models (LLMs) to Detect Obfuscation<\/strong><\/h2>\n

We created an obfuscation detector using large language models as the solution to the constantly evolving state of obfuscation techniques. These models consist of two distinct parts: the tokenizer and the language model.<\/p>\n

The tokenizer augments the command lines and transforms them into a low-dimensional representation without losing information about the underlying obfuscation technique. In other words, the goal of the tokenizer is to separate the sentence or command-line into smaller, normalized pieces that the LLM can understand.<\/p>\n

The tokens into which the command-line is separated are essentially a statistical representation of common combinations of characters. Therefore, the common combinations of letters get a \u201clonger\u201d token and the less common ones are represented as separate characters.<\/p>\n

It is also important to keep the context of which tokens are commonly seen together; in the English language, these are words and the syllables they are constructed from. This concept is represented by \u201c##\u201d in the world of natural language processing (NLP), which means that if a syllable or token is a continuation of a word, we prepend \u201c##\u201d. The best way to demonstrate this is to have a look at two examples: one is an English sentence that a common tokenizer won\u2019t have a problem with, and the second is a malicious command line.<\/p>\n\n
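
The continuation marker can be illustrated with a short Python sketch of greedy longest-match splitting, in the style of the WordPiece tokenizers used by BERT-like models. This is an assumption about the general scheme, not the custom tokenizer described in this post, and the vocabulary below is invented for illustration.<\/p>\n

```python
def wordpiece(word, vocab, unk='[UNK]'):
    '''Greedy longest-match-first split of one word into vocabulary pieces.'''
    tokens = []
    start = 0
    while start != len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end != start:
            candidate = word[start:end]
            if start != 0:
                candidate = '##' + candidate  # mark a continuation of the word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece fits, so give up on the whole word
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary, invented for illustration only.
VOCAB = {'token', '##izer', '##s', 'p', '##o', '##w', '##e', '##r', '##h', '##l'}

print(wordpiece('tokenizers', VOCAB))  # familiar word: a few long pieces
print(wordpiece('powershell', VOCAB))  # unfamiliar word: many short pieces
```

With this toy vocabulary the familiar word splits into three pieces, while the unfamiliar one falls apart into single characters, which is the kind of contrast the two examples are meant to show.<\/p>\n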

Since the command-line has a different structure than natural language, it is necessary to train a custom tokenizer model for our use case. Additionally, this custom tokenizer is a significantly better statistical representation of the command-line and splits the input into much longer (more common) tokens.<\/p>\n

For the second part of the detection model \u2013 the language model \u2013 the Electra<\/a> model was chosen. This model is tiny when compared to other commonly used language models (~87% fewer trainable parameters compared to BERT<\/a>), but is still able to learn the command line structure and detect previously unseen obfuscation techniques. The pre-training of the Electra model is performed on several benign command-line samples that are taken from telemetry and then tokenized. During this phase, the model learns the relationships between tokens, their \u201cnormal\u201d combinations, and their occurrences.<\/p>\n

The next step for this model is to learn to differentiate between obfuscated and un-obfuscated samples, which is called the fine-tuning phase. During this phase we give the model true positive samples that were collected internally. However, there weren\u2019t enough samples observed in the wild, so we also created a synthetic obfuscated dataset from benign command-line samples. During the fine-tuning phase, we give the Electra model both malicious and benign samples. By showing different samples, the model learns the underlying technique and notes that certain binaries have a higher probability of being obfuscated than others.<\/p>\n

The resulting model achieves impressive results, with 99% precision and recall.<\/p>\n
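
For reference, precision and recall can be computed from detection counts as sketched below in Python. The counts are made up purely so that the arithmetic is easy to follow; they are not results from the model described here.<\/p>\n

```python
def precision_recall(true_pos, false_pos, false_neg):
    '''Return (precision, recall) from raw detection counts.'''
    # Precision: share of flagged command lines that really were obfuscated.
    precision = true_pos / (true_pos + false_pos)
    # Recall: share of obfuscated command lines that were flagged.
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts: 99 correct detections, 1 false alarm, 1 miss.
precision, recall = precision_recall(true_pos=99, false_pos=1, false_neg=1)
print(precision, recall)
```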

As we looked through the results of our LLM-based obfuscation detector, we found a few new tricks used by known malware such as Raspberry Robin and Gamarue. Raspberry Robin leveraged a heavily obfuscated command-line using wt.exe, which can only be found on the Windows 11 operating system. Gamarue, on the other hand, leveraged a new method of encoding using unprintable characters. This was a rare technique, not commonly seen in reports or raw telemetry.<\/p>\n

Raspberry Robin:<\/p>\n\n

Gamarue:<\/p>\n\n

The Electra model has helped us detect expected forms of obfuscation, as well as these new tricks used by the Gamarue, Raspberry Robin, and other malware families. In combination with the existing security events from the Cisco XDR portfolio, the script increases its detection fidelity.<\/p>\n

Conclusion<\/strong><\/h2>\n

Adversaries use many techniques to hide their intent, and it is just a matter of time before we stumble upon something new. LLMs provide new possibilities to detect obfuscation techniques that generalize well and improve the accuracy of our detections in the XDR portfolio. Let\u2019s stay vigilant and keep our networks safe using the Cisco XDR portfolio.<\/p>\n


\u00a0\u00a0Obfuscation is often used by adversaries to avoid detection. This article describes a new approach to detect obfuscation using Large Language Models.\u00a0\u00a0Read More<\/a>\u00a0Cisco Blogs\u00a0<\/p>","protected":false},"excerpt":{"rendered":"


<\/p>\n","protected":false},"author":0,"featured_media":1755,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[],"class_list":["post-1754","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cisco-learning"],"yoast_head":"\nDetecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 pm - JHC<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 pm\" \/>\n<meta property=\"og:description\" content=\"In the security industry, there is a constant, undeniable fact that practitioners must contend with: criminals are working overtime to constantly change the threat landscape to their advantage. Their\u2026 Read more on Cisco Blogs \u200b In the security industry, there is a constant, undeniable fact that practitioners must contend with: criminals are working overtime to constantly change the threat landscape to their advantage. Their techniques are many, and they go out of their way to avoid detection and obfuscate their actions. In fact, one element of obfuscation \u2013 command-line obfuscation \u2013 is the process of intentionally disguising command-lines, which hinders automated detection and seeks to hide the true intention of the adversary\u2019s scripts. Types of Obfuscation There are a few tools publicly available on GitHub that give us a glimpse of what techniques are used by adversaries. 
One of such tools is Invoke-Obfuscation, a PowerShell script that aims to help defenders simulate obfuscated payloads. After analyzing some of the examples in Invoke-Obfuscation, we identified different levels of the technique: Each of the colors in the image represents a different technique, and while there are various types of obfuscation, they\u2019re not changing the overall functionality of the command. In the simplest form, Light obfuscation changes the case of the letters on the command line; and Medium generates a sequence of concatenated strings with added characters \u201c`\u201d and \u201c^\u201d which are generally ignored by the command line. In addition to the previous techniques, it is possible to reorder the arguments on the command-line as seen on the Heavy example, by using the syntax specify the order of execution. Lastly, the Ultra level of obfuscation uses Base64 encoded commands, and by using Base8*8 can avoid a large number EDR detections. In the wild, this is what an un-obfuscated command-line would look like: One of the simplest, and least noticeable techniques an adversary could use, is changing the case of the letters on the command-line, which is what the previously mentioned \u2018Light\u2019 technique demonstrated: The insertion of characters that are ignored by the command-line such as the ` (tick symbol) or ^ (caret symbol), which was previously mentioned in the \u2018Medium\u2019 technique, would look like this in the wild: In our examples, the command silently installs software from the website evil.com. The technique used in this case is especially stealthy, since it is using software that is benign by itself and already pre-installed on any computer running the Windows operating system. Don\u2019t Ignore the Warning Signs, Inspect Obfuscated Elements Quickly The presence of obfuscation techniques on the command-line often serves as a strong indication of suspicious (almost always malicious) activity. 
While in some scenario\u2019s obfuscation may have a valid use-case, such as using credentials on the command-line (although this is a very bad idea), threat actors use these techniques to hide their malicious intent. \u00a0The Gamarue and Raspberry Robin malware campaigns commonly used this technique to avoid detection by traditional EDR products. This is why it\u2019s essential to detect obfuscation techniques as quickly as possible and act on them. Using Large Language Models (LLMs) to detect obfuscation We created an obfuscation detector using large language models as the solution to the constantly evolving state of obfuscation techniques. These models consist of two distinct parts: the tokenizer and the language model. The tokenizer augments the command lines and transforms them into a low-dimensional representation without losing information about the underlying obfuscation technique. In other words, the goal of the tokenizer is to separate the sentence or command-line into smaller pieces that are normalized, and the LLM can understand. The tokens into which the command-line is separated are essentially a statistical representation of common combinations of characters. Therefore, the common combinations of letters get a \u201clonger\u201d token and the less common ones are represented as separate characters. It is also important to keep the context of what tokens are commonly seen together, in the English language these are words and the syllables they are constructed from. This concept is represented by \u201c##\u201d in the world of natural language processing (NLP), which means if a syllable or token is a continuation of a word we prepend \u201c##\u201d. The best way to demonstrate this is to have a look at two examples; One of an English sentence that the common tokenizer won\u2019t have a problem with, and the second with a malicious command line. 
Since the command-line has a different structure than natural language it is necessary to train a custom tokenizer model for our use-case. Additionally, this custom tokenizer is going to be significantly better statistical representation of the command-line and is going to be splitting the input into much longer (more common) tokens. For the second part of the detection model \u2013 the language model \u2013 the Electra model was chosen. This model is tiny when compared to other commonly used language models (~87% less trainable parameters compared to BERT), \u00a0but is still able to learn the command line structure and detect previously unseen obfuscation techniques. The pre-training of the Electra model is performed on several benign command-line samples taken from telemetry, and then tokenized. During this phase, the model learns the relationships between the tokens and their \u201cnormal\u201d combinations of tokens and their occurrences. The next step for this model is to learn to differentiate between obfuscated and un-obfuscated samples, which is called the fine-tuning phase. During this phase we give the model true positive samples that were collected internally. However, there weren\u2019t enough samples observed in the wild, so we also created a synthetic obfuscated dataset from benign command-line samples. During the fine-tuning phase, we give the Electra model both malicious and benign samples. By showing different samples, the model learns the underlying technique and notes that certain binaries have a higher probability of being obfuscated than others. The resulting model achieves impressive results having 99% precision and recall. As we looked through the results of our LLM-based obfuscation detector, we found a few new tricks known malware such as Raspberry Robin or Gamarue used. Raspberry Robin leveraged a heavily obfuscated command-line using wt.exe, that can only be found on the Windows 11 operating system. 
On the other hand, Gamarue leveraged a new method of encoding using unprintable characters. This was a rare technique, not commonly seen in reports or raw telemetries. Raspberry Robin: Gamarue: The Electra model has helped us detect expected forms of obfuscation, as well as these new tricks used by the Gamarue, Raspberry Robin, and other malware families. In combination with the existing security events from the Cisco XDR portfolio, the script increases its detection fidelity. Conclusion There are many techniques out there that are used by adversaries to hide their intent and it is just a matter of time before we stumble upon something new. LLMs provide new possibilities to detect obfuscation techniques that generalize well and improve the accuracy of our detections in the XDR portfolio. Let\u2019s stay vigilant and keep our networks safe using the Cisco XDR portfolio. We\u2019d love to hear what you think. Ask a Question, Comment Below, and Stay Connected with Cisco Security on social! Cisco Security Social Channels InstagramFacebookTwitterLinkedIn Share Share: \u00a0\u00a0Obfuscation is often used by adversaries to avoid detection. 
This article describes a new approach to detect obfuscation using Large Language Models.\u00a0\u00a0Read More\u00a0Cisco Blogs\u00a0\" \/>\n<meta property=\"og:url\" content=\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/\" \/>\n<meta property=\"og:site_name\" content=\"JHC\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-21T15:05:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif\" \/>\n\t<meta property=\"og:image:width\" content=\"1\" \/>\n\t<meta property=\"og:image:height\" content=\"1\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/gif\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/\"},\"author\":{\"name\":\"\",\"@id\":\"\"},\"headline\":\"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 
pm\",\"datePublished\":\"2023-11-21T15:05:37+00:00\",\"dateModified\":\"2023-11-21T15:05:37+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/\"},\"wordCount\":1228,\"publisher\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif\",\"articleSection\":[\"Cisco: Learning\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/\",\"url\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/\",\"name\":\"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 pm - 
JHC\",\"isPartOf\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif\",\"datePublished\":\"2023-11-21T15:05:37+00:00\",\"dateModified\":\"2023-11-21T15:05:37+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#primaryimage\",\"url\":\"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif\",\"contentUrl\":\"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif\",\"width\":1,\"height\":1},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/jacksonholdingcompany.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak 
on November 21, 2023 at 2:05 pm\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/jacksonholdingcompany.com\/#website\",\"url\":\"https:\/\/jacksonholdingcompany.com\/\",\"name\":\"JHC\",\"description\":\"Your Business Is Our Business\",\"publisher\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/jacksonholdingcompany.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/jacksonholdingcompany.com\/#organization\",\"name\":\"JHC\",\"url\":\"https:\/\/jacksonholdingcompany.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jacksonholdingcompany.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/07\/cropped-cropped-jHC-white-500-\u00d7-200-px-1-1.png\",\"contentUrl\":\"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/07\/cropped-cropped-jHC-white-500-\u00d7-200-px-1-1.png\",\"width\":452,\"height\":149,\"caption\":\"JHC\"},\"image\":{\"@id\":\"https:\/\/jacksonholdingcompany.com\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 pm - JHC","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/","og_locale":"en_US","og_type":"article","og_title":"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 pm","og_description":"In the security industry, there is a constant, undeniable fact that practitioners must contend with: criminals are working overtime to constantly change the threat landscape to their advantage. Their\u2026 Read more on Cisco Blogs \u200b In the security industry, there is a constant, undeniable fact that practitioners must contend with: criminals are working overtime to constantly change the threat landscape to their advantage. Their techniques are many, and they go out of their way to avoid detection and obfuscate their actions. In fact, one element of obfuscation \u2013 command-line obfuscation \u2013 is the process of intentionally disguising command-lines, which hinders automated detection and seeks to hide the true intention of the adversary\u2019s scripts. Types of Obfuscation There are a few tools publicly available on GitHub that give us a glimpse of what techniques are used by adversaries. One of such tools is Invoke-Obfuscation, a PowerShell script that aims to help defenders simulate obfuscated payloads. After analyzing some of the examples in Invoke-Obfuscation, we identified different levels of the technique: Each of the colors in the image represents a different technique, and while there are various types of obfuscation, they\u2019re not changing the overall functionality of the command. 
In the simplest form, Light obfuscation changes the case of the letters on the command line, and Medium generates a sequence of concatenated strings with added characters \u201c`\u201d and \u201c^\u201d, which are generally ignored by the command line. In addition to the previous techniques, it is possible to reorder the arguments on the command-line, as seen in the Heavy example, by using syntax that specifies the order of execution. Lastly, the Ultra level of obfuscation uses Base64 encoded commands, and by using Base8*8 can avoid a large number of EDR detections. In the wild, this is what an un-obfuscated command-line would look like: One of the simplest and least noticeable techniques an adversary could use is changing the case of the letters on the command-line, which is what the previously mentioned \u2018Light\u2019 technique demonstrated: The insertion of characters that are ignored by the command-line, such as the ` (backtick symbol) or ^ (caret symbol), which was previously mentioned in the \u2018Medium\u2019 technique, would look like this in the wild: In our examples, the command silently installs software from the website evil.com. The technique used in this case is especially stealthy, since it is using software that is benign by itself and already pre-installed on any computer running the Windows operating system. Don\u2019t Ignore the Warning Signs, Inspect Obfuscated Elements Quickly The presence of obfuscation techniques on the command-line often serves as a strong indication of suspicious (almost always malicious) activity. While in some scenarios obfuscation may have a valid use-case, such as using credentials on the command-line (although this is a very bad idea), threat actors use these techniques to hide their malicious intent. The Gamarue and Raspberry Robin malware campaigns commonly used this technique to avoid detection by traditional EDR products. 
This is why it\u2019s essential to detect obfuscation techniques as quickly as possible and act on them. Using Large Language Models (LLMs) to detect obfuscation We created an obfuscation detector using large language models as the solution to the constantly evolving state of obfuscation techniques. These models consist of two distinct parts: the tokenizer and the language model. The tokenizer augments the command lines and transforms them into a low-dimensional representation without losing information about the underlying obfuscation technique. In other words, the goal of the tokenizer is to separate the sentence or command-line into smaller pieces that are normalized and that the LLM can understand. The tokens into which the command-line is separated are essentially a statistical representation of common combinations of characters. Therefore, the common combinations of letters get a \u201clonger\u201d token and the less common ones are represented as separate characters. It is also important to keep the context of what tokens are commonly seen together; in the English language these are words and the syllables they are constructed from. This concept is represented by \u201c##\u201d in the world of natural language processing (NLP), which means if a syllable or token is a continuation of a word we prepend \u201c##\u201d. The best way to demonstrate this is to have a look at two examples: one of an English sentence that the common tokenizer won\u2019t have a problem with, and the second of a malicious command line. Since the command-line has a different structure than natural language, it is necessary to train a custom tokenizer model for our use-case. Additionally, this custom tokenizer is going to be a significantly better statistical representation of the command-line and will split the input into much longer (more common) tokens. For the second part of the detection model \u2013 the language model \u2013 the Electra model was chosen. 
This model is tiny when compared to other commonly used language models (~87% fewer trainable parameters than BERT) but is still able to learn the command line structure and detect previously unseen obfuscation techniques. The pre-training of the Electra model is performed on several benign command-line samples that are taken from telemetry and then tokenized. During this phase, the model learns the relationships between the tokens, their \u201cnormal\u201d combinations, and their occurrences. The next step for this model is to learn to differentiate between obfuscated and un-obfuscated samples, which is called the fine-tuning phase. During this phase we give the model true positive samples that were collected internally. However, there weren\u2019t enough samples observed in the wild, so we also created a synthetic obfuscated dataset from benign command-line samples. During the fine-tuning phase, we give the Electra model both malicious and benign samples. By seeing different samples, the model learns the underlying technique and notes that certain binaries have a higher probability of being obfuscated than others. The resulting model achieves 99% precision and recall. As we looked through the results of our LLM-based obfuscation detector, we found a few new tricks used by known malware such as Raspberry Robin and Gamarue. Raspberry Robin leveraged a heavily obfuscated command-line using wt.exe, which can only be found on the Windows 11 operating system. On the other hand, Gamarue leveraged a new method of encoding using unprintable characters. This was a rare technique, not commonly seen in reports or raw telemetry. Raspberry Robin: Gamarue: The Electra model has helped us detect expected forms of obfuscation, as well as these new tricks used by the Gamarue, Raspberry Robin, and other malware families. 
In combination with the existing security events from the Cisco XDR portfolio, the script increases its detection fidelity. Conclusion There are many techniques out there that are used by adversaries to hide their intent and it is just a matter of time before we stumble upon something new. LLMs provide new possibilities to detect obfuscation techniques that generalize well and improve the accuracy of our detections in the XDR portfolio. Let\u2019s stay vigilant and keep our networks safe using the Cisco XDR portfolio. We\u2019d love to hear what you think. Ask a Question, Comment Below, and Stay Connected with Cisco Security on social! Obfuscation is often used by adversaries to avoid detection. This article describes a new approach to detect obfuscation using Large Language Models.","og_url":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/","og_site_name":"JHC","article_published_time":"2023-11-21T15:05:37+00:00","og_image":[{"width":1,"height":1,"url":"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif","type":"image\/gif"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#article","isPartOf":{"@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/"},"author":{"name":"","@id":""},"headline":"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 pm","datePublished":"2023-11-21T15:05:37+00:00","dateModified":"2023-11-21T15:05:37+00:00","mainEntityOfPage":{"@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/"},"wordCount":1228,"publisher":{"@id":"https:\/\/jacksonholdingcompany.com\/#organization"},"image":{"@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#primaryimage"},"thumbnailUrl":"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif","articleSection":["Cisco: Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/","url":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/","name":"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 pm - 
JHC","isPartOf":{"@id":"https:\/\/jacksonholdingcompany.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#primaryimage"},"image":{"@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#primaryimage"},"thumbnailUrl":"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif","datePublished":"2023-11-21T15:05:37+00:00","dateModified":"2023-11-21T15:05:37+00:00","breadcrumb":{"@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#primaryimage","url":"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif","contentUrl":"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/11\/16463100-97B2ua.gif","width":1,"height":1},{"@type":"BreadcrumbList","@id":"https:\/\/jacksonholdingcompany.com\/detecting-obfuscated-command-lines-with-a-large-language-model-michael-polak-on-november-21-2023-at-205-pm\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/jacksonholdingcompany.com\/"},{"@type":"ListItem","position":2,"name":"Detecting Obfuscated Command-lines with a Large Language Model Michael Polak on November 21, 2023 at 2:05 
pm"}]},{"@type":"WebSite","@id":"https:\/\/jacksonholdingcompany.com\/#website","url":"https:\/\/jacksonholdingcompany.com\/","name":"JHC","description":"Your Business Is Our Business","publisher":{"@id":"https:\/\/jacksonholdingcompany.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/jacksonholdingcompany.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/jacksonholdingcompany.com\/#organization","name":"JHC","url":"https:\/\/jacksonholdingcompany.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jacksonholdingcompany.com\/#\/schema\/logo\/image\/","url":"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/07\/cropped-cropped-jHC-white-500-\u00d7-200-px-1-1.png","contentUrl":"https:\/\/jacksonholdingcompany.com\/wp-content\/uploads\/2023\/07\/cropped-cropped-jHC-white-500-\u00d7-200-px-1-1.png","width":452,"height":149,"caption":"JHC"},"image":{"@id":"https:\/\/jacksonholdingcompany.com\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/jacksonholdingcompany.com\/wp-json\/wp\/v2\/posts\/1754","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jacksonholdingcompany.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jacksonholdingcompany.com\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/jacksonholdingcompany.com\/wp-json\/wp\/v2\/comments?post=1754"}],"version-history":[{"count":0,"href":"https:\/\/jacksonholdingcompany.com\/wp-json\/wp\/v2\/posts\/1754\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jacksonholdingcompany.com\/wp-json\/wp\/v2\/media\/1755"}],"wp:attachment":[{"href":"https:\/\/jacksonholdingcompany.com\/wp-json\/wp\/v2\/media?parent=1754"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jacksonholdingcompany.com\/wp-json\/wp\/v2\/c
ategories?post=1754"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jacksonholdingcompany.com\/wp-json\/wp\/v2\/tags?post=1754"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}