\section{Design}
\label{section:method}
As shown in prior research, purely static analysis of dynamic code, especially that of obfuscated malware, is challenging~\cite{hills:2015v,Canali:2012ws}.
To analyze dynamic web server malware, we propose a novel dynamic analysis approach called {\it Multi-Aspect Execution (MaX)} that reveals the masked malicious behaviors of highly evasive samples (Section~\ref{design:counterfactual_execution}).
Intuitively, it explores multiple aspects of malware by exercising multiple execution paths in {\it cooperative} isolated execution environments (Section~\ref{design:cooperativeisolations}).
% \sysname combines dynamic and static analysis.
%For the purpose of malware analysis, we use emulization, a novel approach that enable analysis of dynamic code via emulation.
%Specifically, \sysname runs a few dynamic paths of the application and then performs static analysis using the obtained control flow graph and other dynamic artifacts (e.g., database connections and Internet resources) to explore undiscovered execution paths. 
Each execution is isolated so that it does not affect analyses of other executions, 
\begin{newtext}
  while the global scope artifacts (e.g., database connections stored in global variables) are shared between isolated executions to facilitate the analysis.
\end{newtext} 
% of the target program.

%deobfuscation of dynamic malware
\begin{comment} % YK: Removed this paragraph. It was just repetition. I will put this in a box. 
The technique enables (1) discovery of malicious behaviors of highly obfuscated dynamic malware in a sandboxed environment that prevents infection of the host system, and (2) exposure of malicious program execution paths hidden by evasive techniques such as dynamic evaluation (e.g., dynamic function calls).
\end{comment}
%Thus our analysis method can be divided into two phases.
%The first phase handles decoding and deobfuscation of code so that its functional behavior can be analyzed. The second phase deals with distinguishing benign and malicious behavior.

%\subsection{Decoding Malware}



\subsection{Multi-Aspect Execution (MaX)}
\label{design:counterfactual_execution}
\sysname systematically explores multiple aspects of a target program in order to expose potentially malicious execution paths.
\begin{newtext}
	Specifically, \sysname employs counterfactual execution, a multi-path exploration approach coupled with cooperative state isolation that shares important artifacts among isolated executions, so that the analysis uncovers more code %and execution contexts 
	and, in turn, more malicious behavior.
\end{newtext}

\vspace{-1em}
\inlinetitle{Counterfactual Execution. }
\sysname discovers code that a vanilla dynamic analysis~\cite{schafer2013dynamic} cannot reach through {\it counterfactual execution}, which forces execution into branches even if the branch conditions are not satisfied, past exit nodes, and into pieces of code that are not normally executed.
Such counterfactual execution relies on state isolation to track changes made to the execution state when exploring counterfactual paths, and supports fine-grained control over state changes (e.g., reversing and backporting).
%Counterfactual emulation is powered by \emph{emulation state isolation}, which tracks changes made to the emulation state when emulating counterfactual code and supports fine-grained management of these changes during analysis (e.g., reversing).
This feature enables our approach to {\it unwrap, decode, and expose} the original code of obfuscated and encoded files while maintaining a valid state throughout execution, minimizing false positives and negatives.

%Note that counterfactual execution differs from multi-path exploration in that 
Counterfactual execution handles dynamic constructs such as \code{eval()}, \code{include()}, and dynamic function calls, each of which might lead to discovery of new code snippets and generation of new paths along the program execution.
It analyzes dynamically generated code recursively until it can no longer discover any new unique code (i.e., it reaches a fixed point). %actively used in server side malware, and handling them is critical.
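
The fixed-point iteration can be illustrated with a minimal Python sketch. It is not \sysname's implementation: it statically unwraps nested {\tt eval(base64\_decode(...))} layers with a regular expression, whereas \sysname observes the code that actually reaches \code{eval()} at run time. The pattern {\tt LAYER} and the helpers {\tt unwrap\_once}, {\tt explore}, and {\tt wrap} are hypothetical names introduced only for this illustration.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily]
import base64
import re

# Hypothetical pattern for one obfuscation layer:
#   eval(base64_decode("..."));
LAYER = re.compile(r'eval\(base64_decode\("([A-Za-z0-9+/=]+)"\)\);')

def unwrap_once(code):
    """Return the code snippets hidden one layer below `code`."""
    return [base64.b64decode(m).decode() for m in LAYER.findall(code)]

def explore(code):
    """Collect unique snippets until no new one appears (fixed point)."""
    seen = []
    worklist = [code]
    while worklist:
        snippet = worklist.pop()
        if snippet in seen:
            continue              # already analyzed: nothing new to learn
        seen.append(snippet)
        worklist.extend(unwrap_once(snippet))
    return seen

def wrap(code):
    """Build one obfuscation layer (only used to create the demo input)."""
    payload = base64.b64encode(code.encode()).decode()
    return 'eval(base64_decode("%s"));' % payload

sample = wrap(wrap('mail($to, $subject, $spam);'))
layers = explore(sample)   # outer layer, inner layer, decoded payload
\end{lstlisting}

The worklist empties once the innermost snippet yields no further layer, which is the fixed point referred to above.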

%enables discovering parts of code that would not be accessible in a normal emulation. Counterfactual emulation forces emulation into branches that are not satisfied, past exit nodes, and into pieces of code that are not normally executed. Counterfactual emulation is powered by \emph{emulation state isolation}, in part supported by Emulization but also complemented in our approach.
%Emulation state isolation tracks changes made to the emulation state when emulating counterfactual code, and allows fine-grained management (e.g., reversing) of these changes during analysis. This feature enables our approach to unwrap, decode and see obfuscated and encoded code with ease, while maintaining a valid state throughout emulation that minimizes false positives and false negatives, retaining accuracy.

%The addition of these features enables our approach to see the actual behavior of the code under analysis. Thus our malware detection does not rely on syntactic signatures or detecting obfuscation methods, rather it focuses on understanding behavior of the code under analysis.

%This distinction is the main contrast between our work, and the majority of open and proprietary malware detection tools for web malware. Even the smaller set of tools that attempt to decode obfuscated code before performing the signature matching, would fail on all malware that use a dynamic component (such as a variable computed at runtime) to decode themselves.

We use an example malware program to show how counterfactual execution systematically explores multiple execution paths. In Fig.~\ref{fig:evasivemalware}-(a), line 1 checks if an input is provided to the script. When no input is available, line 2 exits the script with a message (\code{die()} is the exit expression in PHP).
Line 3 checks whether the provided input is the expected password. If not, the script simply copies the malware on line 10 and terminates.
If the correct password is provided, a loop delays the malicious behavior to hinder recognition of the script's maliciousness. Only after 200 iterations, and only when the loop counter is a multiple of 11, does the script perform its malicious activity (e.g., sending spam email).
A naive dynamic analysis cannot drive execution past lines 2 and 3, and therefore misses the entire malicious logic.

\begin{figure}[ht]
\vspace{-1em}
\caption{Evasive Malware Example}
\label{fig:evasivemalware}
\centering
\vspace{-0.5em}
\includegraphics[width=1.0\columnwidth]{fig/counterfactual.pdf}
\vspace{-2em}
\end{figure}

\begin{comment}
\begin{lstlisting}[caption={A sandbox evading malware sample.},label=listing:counterfactual,language=PHP,basicstyle=\ttfamily,xleftmargin=.08\columnwidth,numbers=left]
<?php
  if (!isset($_GET[1]))
    die("Nothing to see here.");
  if ($_GET[1]==$password) {
    for ($i=0;$i<1000;++$i)
      if ($i>200 and $i%11==0)
        do_malicious();
    else
      do_benign();
  }
  copy_the_malware();
?>
\end{lstlisting}
\end{comment}

Counterfactual execution handles this issue by creating a new isolated execution on a predicate or on terminal events (i.e., exit expressions).
Fig.~\ref{fig:evasivemalware}-(b) shows traces (i.e., executed source code line numbers) from a naive dynamic analysis and the proposed counterfactual execution.
Naive dynamic analysis covers only 2 lines due to the missing input at line 1.

Counterfactual execution creates a new isolated state on line 2 (a terminal event) and continues past the termination. On line 3, it enters the branch even if the condition is not satisfied, creating another (nested) isolated state.
Then, the loop on line 4 is executed. The predicate on line 5, inside the loop, triggers yet another isolated state, in which the malicious execution path on line 6 is explored.
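
The exploration described above can be approximated with the following Python sketch. It is a simplification under stated assumptions, not \sysname's mechanism: instead of forking nested isolated states, it re-executes the program from scratch with forced outcomes for individual predicates and exit expressions, so each run is trivially isolated. The class {\tt Context}, the site labels, the request key {\tt pw}, and the password value are hypothetical.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily]
class Halt(Exception):
    """Raised when the script terminates at an exit expression."""

class Context:
    def __init__(self, forced):
        self.forced = forced   # site -> outcome imposed by the analysis
        self.pending = {}      # site -> alternative outcome not yet tried
        self.trace = []        # behaviors observed in this execution

    def branch(self, site, natural):
        """Return the forced outcome if any, else the natural one."""
        if site in self.forced:
            return self.forced[site]
        self.pending.setdefault(site, not natural)
        return natural

    def exit(self, site):
        """Terminal event: stop unless execution is forced past it."""
        if self.branch(site, True):
            raise Halt()

def malware(ctx, request):
    """Python rendering of the evasive sample discussed in the text."""
    if ctx.branch("line1", "pw" not in request):
        ctx.exit("line2")
    if ctx.branch("line3", request.get("pw") == "s3cret"):
        for i in range(1000):
            if ctx.branch("line5", i > 200 and i % 11 == 0):
                ctx.trace.append("do_malicious")
            else:
                ctx.trace.append("do_benign")
    ctx.trace.append("copy_the_malware")

def explore(program, request):
    """Re-run `program`, flipping one more decision site per run."""
    behaviors, tried = set(), set()
    worklist = [{}]
    while worklist:
        forced = worklist.pop()
        key = frozenset(forced.items())
        if key in tried:
            continue
        tried.add(key)
        ctx = Context(forced)
        try:
            program(ctx, request)
        except Halt:
            pass
        behaviors.update(ctx.trace)
        for site, outcome in ctx.pending.items():
            worklist.append({**forced, site: outcome})
    return behaviors

naive = Context({})            # naive dynamic analysis: nothing is forced
try:
    malware(naive, {})
except Halt:
    pass
found = explore(malware, {})   # counterfactual exploration
\end{lstlisting}

With an empty request, the naive run stops at the exit expression and records no behavior, while the exploration records all three behaviors of the sample.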

While we successfully force execution into paths that include the malicious function, we observe that at least one real-world sample does not properly expose its malicious behavior.
We analyzed this case manually and found that the malicious code depends on variables evaluated on other paths (e.g., line 8 in our example).
For instance, the sample increments a variable within {\code{do\_benign()}}, and the variable is later used to decode malicious code in {\code{do\_malicious()}}. Hence, the execution fails if we do not execute the other path in the for-loop on line 4.
To detect such dependencies between a new isolated execution and other executions, we track data dependencies across executions.
Specifically, if a variable used in an isolated execution is referenced by other executions later, we consider the executions dependent.
For such cases, a straightforward approach is to execute the loop without any intervention.

\rtbox{
  \emph{Counterfactual execution} forces execution into branches even if the branch conditions are not satisfied, past exit nodes, and into pieces of code that are not normally executed, recursively discovering dynamically generated code until no new code is found. As a result, \sysname effectively exposes hidden malicious code in sophisticated malware.
}
%\subsection{Optimizations}
%The branch on line 7, although generally not satisfied by emulation, is counterfactually executed and thus the malicious behavior is observed.

\vspace{0.5em}
\inlinetitle{Control Flow Trimming. }
While counterfactual execution effectively exposes malicious behaviors of malware, an execution that depends on variables defined on an unexplored path will fail, and the malicious behaviors will not be revealed.
An easy way to handle such cases is to execute the loop without any intervention. However, we find that in many cases malware writers intentionally insert long-running loops to prevent the malware from being analyzed dynamically.
From our experience, delaying the malicious actions for a few minutes effectively prevents them from being recognized by most dynamic analysis techniques.
%we observed malware samples that run for more than a few days before they expose malicious behaviors.
To handle such cases and facilitate malicious behavior discovery, we propose {\it Control Flow Trimming} as a systematic approach.

Specifically, when a counterfactual execution fails to reveal malicious behaviors because it depends on an execution that it left unexplored, \sysname first executes the malware without any intervention.
As it may take an extended period of time to expose the malicious behavior, we count at runtime how many times each execution path is taken.
When we observe that a particular path dominates the execution, preventing exploration of other execution paths, we create a new isolated execution state and force the new execution to explore the uncovered paths.
If the new execution is successful, we conclude the analysis of the time-consuming loop.
We consider a new execution successful if it discovers any new executed statements or execution states compared to those in the first counterfactual execution that covers the same path.

\begin{figure}[ht]
%\vspace{-1em}
\caption{Control Flow Trimming Example on Fig.~\ref{fig:evasivemalware}}
\label{fig:cfgtrimming}
\centering
\vspace{-0.5em}
\includegraphics[width=1.0\columnwidth]{fig/controlflow_trimming.pdf}
\vspace{-2em}
\end{figure}

Fig.~\ref{fig:cfgtrimming}-(a) shows a control flow graph of the partial code in Fig.~\ref{fig:evasivemalware}-(a) (lines 3--10).
The label of each node represents its line number, and edges represent control flows. Note that line 6, shown in red, is the call to the malicious code.
Fig.~\ref{fig:cfgtrimming}-(b) shows a weighted control flow graph (CFG) after 100 iterations of lines 4--8. Note that the edges between nodes 3, 4, 5, and 8 are thick, indicating that the path is executed frequently.
Specifically, we increment a counter for each edge every time the edge is taken.
For instance, after the 100 iterations, each thick edge (e.g., the edge between 3 and 4) has a counter value of 100.
When a counter value reaches a predetermined threshold (100 in this paper), we apply the control flow trimming method.
In particular, for each node that has an edge whose counter value has reached the threshold, we check whether there is an alternative path (i.e., another outgoing edge). If there is one and its counter value is less than the threshold, we drive the execution to the alternative path.
Essentially, we trim the control flow that reached the threshold, driving the execution to unexplored paths.
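
The edge-counting rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than \sysname's implementation: the CFG is a hypothetical three-node graph (a loop head, a benign loop body, and a payload after the loop), not the graph of Fig.~\ref{fig:cfgtrimming}, and the names {\tt run\_with\_trimming} and {\tt natural\_choice} are introduced only for this example.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily]
from collections import defaultdict

def run_with_trimming(successors, natural_choice, entry, exits,
                      threshold=100, max_steps=10_000):
    """Walk a CFG and divert away from edges that hit the threshold."""
    counts = defaultdict(int)   # (src, dst) -> times the edge was taken
    visited = [entry]
    node = entry
    for _ in range(max_steps):
        if node in exits:
            break
        nxt = natural_choice(node, counts)
        if counts[(node, nxt)] >= threshold:
            # The natural edge is saturated: try a less-used alternative.
            for alt in successors[node]:
                if alt != nxt and counts[(node, alt)] < threshold:
                    nxt = alt   # trim: force the under-explored edge
                    break
        counts[(node, nxt)] += 1
        node = nxt
        visited.append(node)
    return visited, counts

# Hypothetical CFG: the loop head either runs the benign body again
# or leaves the loop towards the payload.
successors = {"head": ["benign", "payload"], "benign": ["head"]}

def natural_choice(node, counts):
    if node == "head":
        # Without intervention the loop spins 1,000 times.
        if counts[("head", "benign")] < 1000:
            return "benign"
        return "payload"
    return "head"

visited, counts = run_with_trimming(
    successors, natural_choice, entry="head", exits={"payload"})
\end{lstlisting}

With the default threshold, the walk takes the edge from the loop head to the benign body 100 times and is then driven to the payload. Passing a very large threshold reproduces the untrimmed run with all 1,000 iterations.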

\smallskip
\noindent
{\it -- Runtime Threshold Adjustment:}
We observe that some malware samples require a larger threshold to successfully execute their malicious behaviors.
To handle such cases, \sysname repeatedly increases the threshold by a factor of 2.
Fig.~\ref{fig:cft_threshold_adjust}-(a) shows an example. The program has a loop (Lines 1-5), and within the loop it first executes {\tt do\_benign()}, which takes more than 10 seconds (to deliberately hinder dynamic analysis), and then updates the decryption key (Line 4). The key is then used to decrypt the malicious code, which is executed via {\tt eval()} (Line 6).

% 100 -> 200 -> 400 -> 800 : so we must stop around 800
%when analyzed with a threshold of 100, while a handful of samples require threshold to be set to 1,000. Increasing trimming threshold above 1,000 does not seem to change analysis results for benign or malicious samples.

\begin{figure}[ht]
\vspace{-1em}
\caption{Adjusting Threshold in Control-Flow Trimming}
\label{fig:cft_threshold_adjust}
\centering
\vspace{-0.5em}
\includegraphics[width=0.8\columnwidth]{fig/cft_threshold_adjust.pdf}
\vspace{-1em}
\end{figure}


Fig.~\ref{fig:cft_threshold_adjust}-(b) presents a trace from a naive dynamic analysis. It iterates the loop 1,000 times, executing the time-consuming code {\tt do\_benign()} (Line 2) 1,000 times as well. It requires around 2 hours 46 minutes to reach the malicious code.

Fig.~\ref{fig:cft_threshold_adjust}-(c) shows a trace from the counterfactual execution. It quickly reaches the malicious code (Line 6). However, as it skipped the loop iterations, the decryption key ({\tt \$key}) is not correct, resulting in a failed execution at {\tt eval()} (we consider invalid code passed to {\tt eval()} a failure).

Fig.~\ref{fig:cft_threshold_adjust}-(d) represents the first attempt of control-flow trimming with the threshold 100. It iterates the loop 100 times and then tries to execute the malicious code. However, due to the insufficient decryption key update, the execution fails.

\sysname then increases the threshold by a factor of 2. Fig.~\ref{fig:cft_threshold_adjust}-(e) is a trace from the second attempt with the updated threshold 200. After the 200 iterations, it executes the malicious code successfully. Note that the key is only updated during the first 200 iterations. %(Line 3).
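
The adjustment loop can be summarized with a short Python sketch. It is a toy model under stated assumptions, not \sysname's code: the key update is reduced to a counter, and {\tt eval()} is assumed to succeed once the key has received 200 updates, mirroring the example above. The names {\tt attempt} and {\tt analyze} are hypothetical, and the cost figures use 10 seconds per iteration, which the text gives only as a lower bound.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily]
def attempt(threshold, needed_updates=200, loop_bound=1000):
    """One trimmed run: iterate up to the threshold, then eval()."""
    key = 0
    for _ in range(min(threshold, loop_bound)):
        key += 1              # stands in for the key update on Line 4
    # Assumption: eval() succeeds only after enough key updates.
    return key >= needed_updates

def analyze(initial=100, factor=2, loop_bound=1000):
    """Grow the threshold until a run succeeds or the loop is exhausted."""
    threshold = initial
    tried = []
    while True:
        tried.append(threshold)
        if attempt(threshold, loop_bound=loop_bound):
            return threshold, tried
        if threshold >= loop_bound:
            return None, tried   # the full loop ran and still failed
        threshold *= factor

threshold, tried = analyze()     # fails at 100, succeeds at 200

SECONDS_PER_ITERATION = 10       # lower bound taken from the text
naive_seconds = 1000 * SECONDS_PER_ITERATION
trimmed_seconds = sum(tried) * SECONDS_PER_ITERATION
\end{lstlisting}

Under these assumptions, the naive run costs 10,000 seconds (about 2 hours 46 minutes), whereas the two trimmed attempts execute 300 iterations in total, i.e., 3,000 seconds.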

Our evaluation shows that the vast majority of malware samples expose their malicious behavior with the default threshold of 100.
A handful of samples trigger the runtime threshold adjustment algorithm, which increases the trimming threshold up to 800.
We manually verified whether the runtime adjustment is sufficient by observing the analysis results with different default thresholds. Specifically, we ran the experiments with 7 different thresholds: 100, 200, 400, 800, 1,600, 3,200, and {\it unlimited}. The results show that thresholds above 800 do not discover any new dynamic code, indicating that {\it the runtime threshold adjustment is effective in discovering dynamic code without any manual intervention}.

\rtbox{
  \emph{Control flow trimming} (CFT) ensures that the analysis finishes in a reasonable time by first limiting loops to a threshold of 100 iterations and then increasing the threshold by a factor of 2 until the execution no longer observes any failed statements (e.g., {\tt eval()} with a string that contains invalid code). Due to this dynamic adjustment of the threshold, \sysname can effectively and efficiently discover malicious code.
	%If no malicious behavior is exposed, this threshold is dynamically increase at runtime by a factor of 10 until it covers all iterations.
}

%\vspace{1em}
\subsection{Cooperative Isolated Execution}
%\inlinetitle{Cooperative Isolated Execution.}
\label{design:cooperativeisolations}

\sysname provides a {\it cooperatively isolated execution} environment to (1) isolate each execution path of the program and (2) cooperatively share resources resolved in each isolated execution in order to help discover dynamically loaded code snippets (e.g., through {\tt include}).
%to identify malware relying on execution of dynamic code. % by leveraging a novel technique called \emph{cooperative isolated execution}.
The isolated executions are nested, and a new isolation is created for each dynamically generated part of the program. Each execution is isolated so that state changes and errors in one execution do not inadvertently affect the other executions. However, the executions are also cooperative, which helps discover more execution contexts (e.g., database connections, configuration variables, and function/class definitions) that can lead to exposing malicious behavior (e.g., malicious code residing in an external module loaded dynamically).
This cooperation enables us to discover more of the application code. Specifically, \sysname covers 36,034 statements of WordPress without the cooperative isolation scheme and 58,786 statements with it (details in Appendix~\ref{appendix:counterfactual}).
%Specifically, after analysis under one isolation, analysis results including execution contexts are shared with others.
%This collective and cooperative analysis helps \sysname discover malicious code much more effectively than state-of-the-art techniques~\cite{xxx}.


%It provides realistic environments that malware might expect in order to effectively expose potentially malicious code. It also prevents malware from affecting the host system during analysis.

%Together, these features form an isolated counterfactual execution environment that observes and tests the behavior of potentially malicious code.
% and \emph{counterfactual execution} techniques.
%The emulization approach used to identify malware relies on robust emulation of dynamic code, \emph{sandboxing} and \emph{counterfactual emulation}.
%Together, these features form an isolated counterfactual execution environment that observes and tests the behavior of potentially malicious code.
%We use emulation with two additional features to help with malware detection, \emph{sandboxing} and \emph{counterfactual emulation}.
\begin{newtext}
\inlinetitle{Cooperative Isolations.} As each isolation explores a single execution path, there are artifacts (e.g., variables, resources, constants, etc.) that are unresolved in one particular isolation while they are resolved in other isolations.
If such artifacts are used in creation of dynamic behavior (e.g., used in {\tt include} or {\tt eval}), the analysis will not be able to resolve them and its results might be limited.
%When those unresolved variables are used to create or include new code snippets, the analysis results might be limited. 
{\it Cooperative isolated execution} shares artifacts discovered in one isolation (e.g., dynamically included files, environment variables, and database connections) with other isolations, so that artifacts left unresolved in one isolation can be resolved from another.

\smallskip
\noindent
{\it -- Global Scope Artifacts:}
{\it Artifacts belonging to the global scope are shared}, such as function definitions, class definitions, constants, global variables, environment variables, etc.
Note that dynamic languages such as PHP allow redefinition of functions and classes. %, as well as checking whether such an artifact is defined (e.g., \code{function\_exists()}).
%, so that they can be accessed by any functions and modules. We call them global execution contexts.

The insight behind such sharing is that PHP applications commonly leverage global scope artifacts to implement dynamically loaded plugin modules. %architecture
%the common pluggable architecture of PHP application,
%where plugins and modules can be added and replaced with minimal effort.
For example, Joomla uses configuration files to decide which subset of its core modules to load, and WordPress uses database values to determine which plugins are active in an installation and thus need to be loaded and executed.
These global scope artifacts can further be modified throughout program execution, resulting in additional modules being loaded and executed.
Specifically, a loaded WordPress plugin can then use its own configuration parameters, load another plugin, or redefine a core function/class (more details in Appendix~\ref{appendix:metrics}).

% it is common to share resources between function calls and modules (e.g., configuration settings, database connections, and class/function definitions). 
%These programs define what modules will be included in each setup of the program
%To share those resources, they are often stored in a place that can be accessed by other functions and modules (e.g., global variable). Specifically, function/class definitions and global variables can be accessed in any context under the same namespace. They are often used to store configurations/options and application-wide database connections.

Note that cooperative isolations do not share local scope artifacts such as local variables. 
%Furthermore, because local scope artifacts tend to have widely different values across execution paths, sharing them with other executions is more likely to yield incorrect states.
Intuitively, local artifacts are not meant to be shared between functions and modules while global artifacts are often meant to be shared. 
%YK: Any supporting data would be great.
%Moreover, the number of shared resources will be significantly more. If we share local contexts, the state-explosion and path-explosion problems will occur. %From our experiments, our technique discovers significantly more code with sharing global scope resources, without imposing substantial overhead.
	
\end{newtext}






\begin{figure}[ht]
  \vspace{-1em}
  \caption{Cooperative Isolated Execution}
  \vspace{-0.5em}
  \label{fig:cooperativeisolations}
  \centering
  \includegraphics[width=1.0\columnwidth]{fig/cooperativeisolations_new.pdf}
  \vspace{-2em}
\end{figure}


\noindent
%{\it -- Example: }
%{\color{red} FIXME: example was modified slightly. Make sure to review it. Also the figure needs to be updated, "global execution context" was replaced with "global artifacts".}
%Fig.~\ref{fig:cooperativeisolations} shows an example of how cooperative isolated execution works. 
Fig.~\ref{fig:cooperativeisolations} shows how \sysname works on a program that establishes a database connection, then populates a global configuration variable (\code{\$config}) from the database. Using the populated configuration variable, the program then loads a plugin which contains function/class definitions that include malicious code.
%one of the plugins contains malicious code. 

In Fig.~\ref{fig:cooperativeisolations}, there are three isolated executions.
Isolated Execution 1 resolves a database connection while Isolated Executions 2 and 3 fail to do so because they take different execution paths, depicted as different curves in Fig.~\ref{fig:cooperativeisolations}.
%, they do not have a valid database connection. 
However, Isolated Executions 2 and 3 cover code that populates the configuration and loads plugins, respectively.
Without the database connection, even though Isolated Executions 2 and 3 cover critical parts of the program that might expose malicious code, they would not be able to load the malicious plugin.

With \sysname, Isolated Execution 2 can retrieve the database connection resolved by Isolated Execution 1. Furthermore, Isolated Execution 3 is able to load the malicious plugin leveraging the populated global variable \code{\$config} from Isolated Execution 2. %, which is not the one actively used by the website (containing the malicious code).
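
The three-execution scenario can be mimicked with the following Python sketch. It is a toy model under stated assumptions, not \sysname's implementation: global scope artifacts are entries of a dictionary, an unresolved artifact surfaces as a {\tt KeyError}, and the step functions, the plugin name, and {\tt run\_isolated} are hypothetical.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily]
def run_isolated(step, shared, cooperative=True):
    """Run one isolated execution; publish its globals on success."""
    view = dict(shared) if cooperative else {}  # globals visible here
    local = {}                                  # never shared
    try:
        step(view, local)
    except KeyError:
        return False      # unresolved artifact: the error stays inside
    if cooperative:
        shared.update(view)
    return True

def connect(g, local):
    # Isolated Execution 1: stands in for a database connection.
    g["db"] = {"active_plugin": "evil_plugin.php"}

def populate_config(g, local):
    # Isolated Execution 2: needs the connection resolved elsewhere.
    local["row"] = g["db"]["active_plugin"]
    g["config"] = {"plugin": local["row"]}

def load_plugin(g, local):
    # Isolated Execution 3: needs the populated configuration.
    g["loaded"] = g["config"]["plugin"]

def analyze(steps, cooperative):
    shared = {}
    outcome = [run_isolated(s, shared, cooperative) for s in steps]
    return outcome, shared

steps = [connect, populate_config, load_plugin]
isolated_outcome, isolated_shared = analyze(steps, cooperative=False)
coop_outcome, coop_shared = analyze(steps, cooperative=True)
\end{lstlisting}

Without cooperation, only the first execution succeeds and the plugin is never loaded. With cooperation, all three succeed, while the local variable {\tt row} never enters the shared artifacts.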
 

\inlinetitle{Sandboxing. } \sysname allows malware to access system resources (e.g., files or database) while preventing persistent modifications to the external system state. As a result, malware will be executed as if it runs on the system natively without harming the underlying host system. % even if malicious code is executed.
\sysname achieves this protection via virtualizing access to external resources such as files, networks, databases, etc. and redirecting them to emulated resources, while using containers (i.e., Docker) to ensure it cannot damage the host.
%that even if malware evades this virtualization, it cannot damage the host.

To implement this, we override PHP functions that can alter the system objects (e.g., files and database) to redirect the accesses to the objects to virtualized system objects. We allow malware to modify the virtualized objects as they do not harm the host system and provide more insights into the intent of malware.
%and do not allow them to change the system state directly.
%Instead, we duplicate system objects (e.g., files and database) and redirect the accesses to those duplicated objects. 
%This overriding allows us to maintain fine-grained control over malware resource manipulations for our analysis.
%Our system overrides several PHP system functions and classes with ones that do not alter the external system behavior.

In our prototype, 31 functions and classes are explicitly virtualized. For example, \code{fopen()} is \textit{proxied} (i.e., forwarded to the original function) if it is called in {\it read} mode.
If it is called in {\it write} mode, the file is duplicated and the file accesses are redirected to the duplicate (i.e., the access is sandboxed).
Similarly, \code{unlink()} does not remove the actual file on the host. Instead, the file is duplicated once and the duplicate is unlinked, successfully simulating \code{unlink()}. If \code{unlink()} is called on the same file again, \sysname remembers that the file has already been duplicated, does not duplicate it again, and the call fails.
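
The copy-on-write behavior of the two functions can be sketched in Python as follows. This is an illustration under stated assumptions, not the prototype's PHP overrides: the class {\tt FileSandbox} and its bookkeeping are hypothetical, and only the read/write distinction and the duplicate-once rule come from the description above.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily]
import os
import shutil
import tempfile

class FileSandbox:
    """Copy-on-write view of the file system (hypothetical helper)."""
    def __init__(self):
        self.root = tempfile.mkdtemp(prefix="sandbox-")
        self.copies = {}      # real path -> path of its duplicate
        self.deleted = set()  # real paths unlinked inside the sandbox

    def _duplicate(self, path):
        real = os.path.abspath(path)
        if real not in self.copies:          # duplicate only once
            copy = os.path.join(self.root, str(len(self.copies)))
            if os.path.exists(real):
                shutil.copyfile(real, copy)
            self.copies[real] = copy
        return real, self.copies[real]

    def fopen(self, path, mode="r"):
        real = os.path.abspath(path)
        reading = mode in ("r", "rb")
        if reading and real in self.deleted:
            raise FileNotFoundError(path)
        if reading and real not in self.copies:
            return open(real, mode)          # proxied to the real file
        _, copy = self._duplicate(path)
        self.deleted.discard(real)
        return open(copy, mode)              # redirected to the duplicate

    def unlink(self, path):
        real, copy = self._duplicate(path)
        if real in self.deleted or not os.path.exists(copy):
            return False                     # already unlinked: fail
        os.remove(copy)
        self.deleted.add(real)
        return True

# Demo on a throwaway "host" file.
host_dir = tempfile.mkdtemp(prefix="host-")
victim = os.path.join(host_dir, "index.php")
with open(victim, "w") as f:
    f.write("<?php echo 'hello'; ?>")

box = FileSandbox()
with box.fopen(victim, "w") as f:            # sandboxed write
    f.write("<?php /* defaced */ ?>")
with box.fopen(victim) as f:                 # sees the sandboxed content
    sandbox_view = f.read()
with open(victim) as f:                      # the host file is untouched
    host_view = f.read()
first, second = box.unlink(victim), box.unlink(victim)
\end{lstlisting}

In this sketch, a read that follows a sandboxed write is served from the duplicate, so the malware observes its own modifications while the host file keeps its original content.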
%explicitly sandboxed. Any attempt to call unlink on a file (and thus remove it) will simply return false.

%Explicitly and implicitly sandboxed functions are dubbed \emph{potentially malicious functions} (PMF henceforth). \emph{Safe functions} refers to whitelisted functions.


\rtbox{
280
  {\it Cooperative isolated execution} allows \sysname to analyze behavior of malware within a cooperative sandbox, sharing artifacts obtained from each isolated execution with other isolations, facilitating path discovery process. With the help of cooperative isolated execution, we discover paths containing 22,752 additional (38\% of the total code) statements in Wordpress.
%YK: Just to point to the relevant eval.
%ABX: it is from another [unpublished] paper, it's not part of eval in this paper!
% (Details in Section~\ref{relevant_eval_section}).
}
\vspace{-1em}
%cft_threshold_adjust.pdf
% for $i = 0; ...
%   $key += $keycode[$i];
%   if ($i > 400)
%      if($key % 5 == 0)
%         $src = str_rot13( base64_decode($src) );
%      else if($key % 7 == 0)
%         $src = base64_decode( str_rot13($src) );
%      else if($key % 13 == 0)
%         $src = str_rot13( base64_decode( str_rot13($src) ) );
% gzinflate(str_rot13(base64_decode
%openssl_decrypt ( string $data , string $method , string $key [, int $options = 0 [, string $iv = "" [, string $tag = "" [, string $aad = "" ]]]] )

%
%To handle such cases, if the execution with the control flow trimming fails, meaning that it does not discover any new statements and states, we double the threshold and run again until we observe a successful execution after the control flow trimming.

%The loop on line 4 is executed several times, but not a thousand times. First, a fixed number (e.g., 100) of iterations are executed, then a new isolation is created and several other possible indexes are tested (e.g., boundaries, random steps).
%Even if the branch condition on line 7 is not satisfied, counterfactual emulation forces execution into it and observes the malicious behavior on line 8.
%\YK{We can split the counterfactual execution into two subsection. For instance, (1) Counterfactual execution is only about creating isolated execution environment upon a branch. (2) Changing environment (e.g., loop count) will be the control flow trimming part to fight against evasive techniques.}
%The limitation of this approach is when the malicious behavior is dependent on the \emph{branch invariant}. For example, in the listing above, \code{do\_malicious()}'s activity can be made semantically dependent on the value of \code{\$i}, resulting in semantically different benign behavior in the counterfactual mode.
% counterfactual invariant, which does not satisfy the branch invariant but instead forces emulation into the branch, will fail to find it.
%However, our evaluations show that this is very rare (only one instance in the benchmarks collected).
%Although future malware may evade detection by utilizing this limitation, Emulware could mitigate the limitation via constraint solvers.
%Incorporating a constraint solver is future work.
%whereas employing a constraint solver can help reduce the surface of this limitation.
%the current approach, while employing a constraint solver in the future can help our approach correctly emulate many of these invariants.


%jwd: Abbas: Anh and I could not figure out exactly what you were trying to say, so we are deleting it.  It did not seem
%that important.

%It is noteworthy that several PHP functions have features that are not well specified. For example, the family of \code{fopen} functions in PHP can work on URLs and several other forms of addresses, as well as the traditional file system addresses. Sandboxing these functions requires particular attention to such behavior to prevent any potentially malicious behavior.
%A whitelist approach enables us to check PHP documentations for each function added to the whitelist to make sure there is no unexpected behavior.

\subsection{Proof of Concept (PoC) Automated Malware Detector: \toolname}
In this section, we present our proof of concept malware detection tool, \toolname, to demonstrate the effectiveness of malware analysis primitives provided by \sysname in comparison with state-of-the-art malware detection tools. 
PHP was chosen as the target language for the prototype as it is used by 79\% of all websites, and is also responsible for 71\% of all server-side malware~\cite{sucuri_2017_report,phpusage}.

It is important to note that the purpose of this tool is to demonstrate the effectiveness and practicality of concepts discussed in this section, rather than proposing a malware detector as a core contribution of the paper.
\toolname is built on top of \sysname, leveraging advanced malware analysis capabilities such as cooperative isolated execution and counterfactual execution. However, \toolname differs from \sysname as it needs to make a decision on whether a given program is malicious or not. \toolname employs several straightforward heuristics for this decision. 
%It is important to note that limitations caused by those heuristics are independent of the limitations of \sysname. 


%\subsubsection{Automated Malware Detection}
\label{section:malware-detection-tool}
\label{section:detecting-malware}
%To demonstrate the effectiveness of \sysname in detecting malware in practice, we build a %tool for malware detection similar to AV tools.
%jwd-I think this was again left over from the old organization.
%In this section, we present the design and evaluation of \sysname.

%Our sandbox mocks several PHP system functions and classes with ones that do not alter the external system behavior. In our sandbox, 31 functions and classes are explicitly mocked for this purpose. For example, \code{fopen()} will be \textit{proxied} (i.e., forwarded to the original function) if it is in read mode, while it will be mocked and sandboxed if it is used in write mode.

%The sandbox also includes a whitelist. The whitelist contains more than 300 manually added functions and classes, all of which are considered safe according to specifications.
%Functions that are potentially dangerous have been explicitly sandboxed, and all other functions are assumed unsafe and implicitly sandboxed.
%The majority of whitelisted functions are state introspection functions, as well as data (e.g., string) manipulation functions, such as regular expression operations and type casts.

%The remaining functions (i.e., those not explicitly sandboxed and not whitelisted) are considered unsafe and are implicitly sandboxed, by returning generic results. For example, the function \code{unlink()} is not whitelisted and is not explicitly sandboxed. Any attempt to call unlink on a file (and thus remove it) will simply return false.


\noindent
{\bf Measuring Maliciousness. }
\toolname categorizes PHP functions into two different types: \emph{Potentially Malicious Functions} (PMF) and \emph{Safe Functions} (SF).
Functions that can change system state (e.g., \code{system()}, \code{fwrite()}, and \code{unlink()}) are classified as PMF. Functions that do not affect system state, such as program state introspection functions, data (e.g., string) manipulation functions (e.g., regular expression operations and type casts), and arithmetic functions, are categorized as SF.
%Explicitly and implicitly sandboxed functions are dubbed \emph{potentially malicious functions} (PMF henceforth). \emph{Safe functions} refers to whitelisted functions.
%}

%Aside from explicitly sandboxed functions, everything not listed in the \emph{whitelist} will be implicitly sandboxed, by returning generic results.
%The majority of potentially malicious functions and classes are implicitly sandboxed and return generic results. For example, the function \code{unlink()} is not whitelisted and is not explicitly sandboxed. Any attempt to call unlink on a file (and thus remove it) will simply return false.

We define two metrics for determining whether code is malicious or benign: {\it PMFR (Potentially Malicious Functions Ratio)} and {\it MS (Maliciousness Score)}. \emph{PMFR} is the number of potentially malicious functions invoked in the code, divided by the total number of invoked functions.
The threshold for this metric should be low but cannot be close to 0 as benign applications can also call system state changing functions (i.e., PMF).
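The PMFR definition can be written compactly as follows. The notation is introduced here purely for illustration, and the formula assumes that invocations are counted with multiplicity:
\[
  \mathit{PMFR} = \frac{|C_{\mathit{PMF}}|}{|C|},
\]
where $C$ denotes the function invocations observed during analysis and $C_{\mathit{PMF}} \subseteq C$ denotes those that target a PMF. For instance, under this reading a script that invokes 40 functions, 2 of which are PMF, has a PMFR of $2/40 = 5\%$.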
%The threshold for this metric has been set at 5\%, to allow for minor potentially malicious functionality that might exist in benign code.
\emph{MS (Maliciousness Score)} is a value computed based on the 
%\begin{newtext}
%prevalence and impact 
%\end{newtext}
amount and intensity of potentially malicious activity.
Each function has a maliciousness score between 0 and \begin{newtext}2\end{newtext}, depending on its parameters and behavior, inspired by that function's prevalence among popular malware. For example, \code{file\_get\_contents()} can fetch a URL, a file, or standard input, corresponding to scores of 2, 0 (if the file is within the program directory; otherwise 2), and 1, respectively.

Another important aspect of web-server malware is that it relies on dynamic constructs to decode and execute malicious code, sometimes nesting several layers of encoding and dynamic evaluation to evade detectors.
MS takes such nested execution layers into account. Specifically, every time the code uses \code{eval()}-like constructs, the \emph{dynamic evaluation nesting depth} increases by 1, and the MS of each function invoked inside dynamically evaluated code is multiplied by this depth times 10.
%
%Although each implicitly sandboxed function has a default maliciousness score of 2, it is multiplied by the depth of \emph{dynamic evaluation depth} times 10. Every time the code uses \emph{eval()}-like constructs to execute data as code, the dynamic evaluation depth is increased by one.
For example, a single use of the function \code{system()} in a normal piece of code yields a maliciousness score of 1, while using it inside an \code{eval} yields a maliciousness score of 10.
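The scoring rule above can be sketched as follows. The notation, the summation over invocations, and the multiplier of 1 for code outside any \code{eval()}-like construct (as the \code{system()} example suggests) are an illustrative reading of the description, not a definition taken from the prototype:
\[
  \mathit{MS} = \sum_{c \in C} s(c) \cdot m\bigl(d(c)\bigr),
  \qquad
  m(d) = \left\{
    \begin{array}{ll}
      1 & \mbox{if } d = 0,\\
      10\,d & \mbox{if } d \geq 1,
    \end{array}
  \right.
\]
where $C$ denotes the observed function invocations, $0 \leq s(c) \leq 2$ is the per-function score of invocation $c$, and $d(c)$ is its dynamic evaluation nesting depth. Under this sketch, a \code{system()} call with $s = 1$ contributes $1$ at depth 0, $10$ at depth 1, and $20$ at depth 2.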



%The threshold for considering a maliciousness score as malicious was set at 10, that is, at least 5 uses of implicitly malicious functions, or at least one use of dynamically evaluated malicious functions.


Intuitively, a higher PMFR indicates that the program contains a significant amount of malicious behavior relative to benign behavior, and hence that the program is likely malicious.
%The intuition behind the PMFR is that any code containing potentially malicious behavior above a certain ratio is likely to be malware.
%The 5\% threshold loosely translates to one in every 20 statements, as we expected malware to do several benign functions before performing its malicious behavior.
%However, MFUR failed to identify a particular malware sample (number 3 in our benchmark) that used bloated object oriented code to deobfuscate itself and perform a single high-impact malicious activity.
MS, on the other hand, is useful in detecting surgical malware, i.e., malware that either injects itself into benign code or does a significant amount of benign work (e.g., system inspection) before performing a surgical attack (e.g., a single shell command).
%.MS was devised to help identify such malware. Code that employs many benign functions to unwrap itself and set the environment for performing its malicious activity, would get past the MFUR threshold, but because the malicious activity is performed inside nested dynamic evaluation, MS is more likely to identify it.
%
%Both threshold numbers were chosen arbitrarily, but performed unexpectedly well in evaluation (Section~\ref{section:evaluation}).
%We also computed the total number of executed statements and expressions, which can be used to reduce false positives with respect to the maliciousness score.
%If either of these metrics reach the defined threshold, the code is reported as malicious.
%Finer-grained metrics are also reported by the tool to further evaluate false positives and false negatives.
%Fig.~\ref{fig:ms_pmfr_threshold} demonstrates how the majority of benign files have a MS and PMFR of zero, and virtually all of them fall under the 20/5\% threshold.
%
Note that \toolname metrics are defined in a simple, straightforward way, as the point of this prototype is simply to show the effectiveness of the analysis techniques in exposing malicious behavior, which can be detected with more fine-grained metrics as part of future research. 
%Note that the definitions of MS values can be defined in finer-grained ways. As our focus is to demonstrate the effectiveness of \sysname, even with those simple and less rigorous definitions, we did not try to improve the definitions. 


\noindent
{\it -- Defining MS and PMFR Thresholds:}
\toolname detects a sample as malware if either MS or PMFR reaches predefined thresholds: 5\% and 20 for PMFR and MS respectively.
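Stated as a decision rule (a sketch that reads ``reaches'' as greater than or equal to, which is an assumption), a sample is reported as malware when
\[
  \mathit{MS} \geq 20 \quad \mbox{or} \quad \mathit{PMFR} \geq 5\%.
\]
For example, a sample with $\mathit{MS} = 10$ and $\mathit{PMFR} = 2\%$ is not reported, whereas a sample with $\mathit{MS} = 4$ and $\mathit{PMFR} = 8\%$ is.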

The thresholds are obtained by analyzing MS and PMFR from a set of benign and malware samples. Specifically, we select 509 malware samples from known malware  repositories~\cite{malware_repo1, malware_repo2} and benign samples from benign web applications~\cite{phpmyadmin, cakephp}.
The two malware repositories~\cite{malware_repo1, malware_repo2} are independent collections of malicious PHP scripts found in the wild. Together they contain 619 scripts (collected from April 2016 to April 2019), of which we retain the 509 samples reported as malicious by VirusTotal.

\begin{figure}[htb]
  \vspace{-1.5em}  
	\caption{MS and PMFR Scores of Malware and Benign Samples (Red: Malware, Blue: Benign).}
\vspace{-0.8em}
  \label{fig:ms_pmfr_distribution}
	\centering
	\includegraphics[width=0.9\columnwidth]{fig/ms-pmfr.png}
  \vspace{-1.5em}
\end{figure}

We then iteratively select incrementally larger random subsamples,
obtaining PMFR and MS values until reaching a fixpoint, where increasing the random subsample size no longer changes the thresholds.
The fixpoint is reached at 400 samples, as depicted in Fig.~\ref{fig:ms_pmfr_distribution}.
The X-axis and Y-axis represent the MS and PMFR of the samples, respectively. Note that the distributions of both MS and PMFR are diverse. Some malware samples leave a large MS footprint because they do significant malicious work, while having a low PMFR because they are injected in the middle of benign programs. Fig.~\ref{fig:ms_pmfr_distribution} also shows that other malware samples, in contrast to the previous group, have a high PMFR and a low MS: they are relatively small files that perform focused malicious activity (e.g., copying files) and include no other code, so their MS remains low.



\begin{comment}
\begin{figure}[htb]
	\caption{Enlarged from Fig.~\ref{fig:ms_pmfr_distribution} (Red: Malware, Blue: Benign).}
  \label{fig:ms_pmfr_threshold}
	\centering
	\includegraphics[width=0.8\columnwidth]{fig/threshold-export.pdf}
	%\vspace{-3em}
\end{figure}
\end{comment}

Observe that most benign samples have both an MS and a PMFR of 0. Two benign samples have an MS of 10 while their PMFR is 0.
Fig.~\ref{fig:ms_pmfr_distribution} also includes an enlarged view of the region near 0 MS and PMFR to depict the thresholds more clearly.
Observe that all malicious samples have either {\it an MS larger than 20 or a PMFR larger than 5\%}. %As a result, there is no malware in the blue box in Fig.~\ref{fig:ms_pmfr_threshold}.
%To this end, we choose the largest MS and PMFR within the blue box as thresholds.
%The thresholds are determined by authors' domain expertise and insight and also are configurable.

%To aid with false positive/negative results, our tool generates a sorted list of PMFR and MS per scanned file in an application, enabling administrators to manually inspect results (and adjust the thresholds if necessary).




\begin{newtext}
	We also performed a sensitivity analysis on the \emph{dynamic evaluation nesting depth} coefficient (i.e., 10) and observed that reducing it to 1 results in up to 3\% false negatives in our datasets, while setting it to 9 results in 1.5\% false negatives in our evaluations. Setting the coefficient to 10 or above resulted in no false negatives.
	False positives, however, were consistently zero in the sensitivity analysis, most likely because our dataset does not include any obfuscated code blocks that utilize malicious functions (details in Appendix~\ref{appendix:metrics}).

\end{newtext}


\rtbox{
	To evaluate the effectiveness of malware analysis primitives provided by \sysname in comparison with state-of-the-art malware detection tools, we build \toolname, a prototype PHP malware detector based on \sysname. 
	\toolname uses two metrics, Maliciousness Score (MS) and Potentially Malicious Functions Ratio (PMFR). We systematically search for and define the thresholds of 20 (for MS) and 5\% (for PMFR) by iteratively analyzing increasingly larger subsamples of the ground-truth dataset until reaching a fixpoint.
	The thresholds are reconfigurable and \sysname's capabilities {\it do not depend on the thresholds}.
}