Commit e25d5b71 authored by AbiusX
parents a36c4be4 e6f1c00e
......@@ -14,9 +14,8 @@
%In \toolname, we assign maliciousness scores for PHP functions.
\noindent
{\bf Observations.}
We observe that malware (including the notorious c99~\cite{c99} and other Webshells) densely utilize functions (e.g., \code{getcwd()}) used to inspect and modify the operating system and execution environment.
However, there are benign applications (e.g., phpMyAdmin) that also use these functions. Moreover, we notice that in practice, malware are injected in the middle of benign applications to make it harder for detection.
We observe that malware (including the notorious c99~\cite{c99} and other Webshells) densely utilize functions that inspect and modify the operating system and execution environment.
However, some of these functions are also used in benign applications (e.g., phpMyAdmin). Moreover, we notice that, in practice, malware are injected in the middle of benign applications to make detection harder.
%We devise MS to handle such surgical malicious activity injected in the middle of a benign application.
......@@ -24,10 +23,9 @@ To minimize false positives caused by those functions used by both malware and b
%MS needs to only apply to functions that are rarely used in benign applications, e.g., uncompressing data that includes code.
We also consider functions that are executed as part of dynamically generated code more malicious than those executed without dynamic code generation.
This is because malware want to hide their payload via obfuscations which are often implemented via dynamic code generation techniques.
This is because malware often hide their payload via obfuscation, which is commonly implemented via dynamic code generation techniques.
%The insight is that signatures for potentially malicious functions used in a malicious code are quickly discovered and updated in signature-based scanners. Thus malicious actors need to hide the fact that they use these functions in their payloads by obfuscating them.
%
It is possible for a benign application to obfuscate code blocks with high MS scores for legitimate reasons. However, in our experience this is very rare.
%, and thus we choose to raise an alarm and cause a false positive that can be dismissed with manual inspection, rather than fail to find the malware and result in a false negative.
......@@ -47,28 +45,41 @@ It is possible for a benign application to obfuscate code blocks with high MS sc
\noindent
{\bf Determining Potentially Malicious Functions and Their Maliciousness Scores.}
%
%
As mentioned in Section~\ref{section:detecting-malware},
we categorized PHP functions into two categories: \emph{Potentially Malicious Functions} (PMF) and \emph{Safe Functions} (SF).
As listed in Table~\ref{table:function-whitelist}, 294 functions were categorized as SF manually.
SFs are executed normally and have a maliciousness score of zero.
As listed in Table~\ref{table:function-whitelist}, 294 functions were categorized as SF manually.
SFs do not affect system state; hence, they are executed normally and have a maliciousness score of zero.
The remainder of PHP functions are categorized as PMF.
%TODO: maybe remove the next sentence.
In PHP 7.2 (with default extensions), there are 1,438 PMFs.
As we don't want PMFs to execute and perform their malicious activity,
In PHP 7.2 (with default extensions on macOS), there are 1,438 PMFs.
To prevent PMFs from affecting the host system,
%As we do not want PMFs to execute and perform their malicious activity,
the analysis replaces them with a function that immediately returns null.
\sysname neutralizes all functionalities that affect the system state.
PMFs have a maliciousness score of 1.
In addition, we have identified 31 functions that are frequently used in both malicious and benign applications.
To better capture the execution context of those functions (i.e., whether the functions are used in malware or not), we assign a fine-grained score to each of these functions, as shown in Table~\ref{table:sandboxed-functions}, which also gives the reasons for the assigned scores.
Specifically, functions for encoding/decoding, encryption/decryption, and compression/uncompression have different scores depending on their arguments.
For instance, I/O functions have higher maliciousness scores when they access the network.
We assign a score of 0 to functions that initialize or create objects (e.g., {\tt curl\_init}, {\tt fopen}, and {\tt mysqli\_init}), as these functions alone do not yet exhibit malicious behavior, but subsequent operations on the created objects do. We assign maliciousness scores of 1 or 2 to the subsequent operations.
Several other functions also have a score of 0, including {\tt unlink}, {\tt getcwd}, {\tt mkdir}, and the MySQL functions. As they are pervasively used in both malware and benign applications, we assign them a score of 0 to avoid false positives. However, as they can affect the host system state, we sandbox them.
Note that Section~\ref{section:discussion} includes a sensitivity analysis of fine-grained scoring on these 31 functions.
31 important functions -- including SFs and PMFs -- that are frequent in both malicious and benign applications are sandboxed by overriding them to virtualize access to system resources (Section~\ref{section:detecting-malware}).
As these functions are manually overridden, their respective maliciousness score is defined with finer granularity based on their input parameters.
Table~\ref{table:sandboxed-functions} provides insights on how these 31 sandboxed functions are scored between 0 and 2.
Section~\ref{section:discussion} includes a sensitivity analysis of fine-grained scoring on these 31 functions.
% are sandboxed by overriding them to virtualize access to system resources (Section~\ref{section:detecting-malware}).
%31 important functions -- including SFs and PMFs -- that are frequent in both malicious and benign applications are sandboxed by overriding them to virtualize access to system resources (Section~\ref{section:detecting-malware}).
%As these functions are manually overriden, their respective maliciousness score is defined with finer granularity based on their input parameters.
%Table~\ref{table:sandboxed-functions} provides insights on how these 31 sandboxed functions are scored between 0 and 2.
%That this fine-grained scoring could be applied to other functions to further increase detection accuracy. However, as mentioned before, the point of this research is to show that a simple detection technique coupled with counterfactual execution as a program analysis method can be very effective in malware detection, leaving room for future research in more accurate detection techniques.
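The scoring rules described above can be summarized in a minimal sketch. It is written in Python for brevity and is not the implementation used by \toolname. The set contents are small excerpts, and the argument-dependent scorer for {\tt file\_get\_contents} (including its membership among the 31 sandboxed functions and the concrete values 1 and 2) is a hypothetical illustration of the rule that I/O functions score higher when they access the network.

```python
from typing import Callable, Dict, Sequence

# Small excerpt of the 294 Safe Functions (score 0, executed normally).
SAFE_FUNCTIONS = frozenset({"abs", "strlen", "array_map", "json_encode"})

# Sandboxed functions that the text names as having a score of 0.
ZERO_SCORE_SANDBOXED = frozenset(
    {"curl_init", "fopen", "mysqli_init", "unlink", "getcwd", "mkdir"}
)


def score_io(args: Sequence[str]) -> int:
    """Hypothetical scorer: an I/O call scores higher when it targets the network."""
    target = args[0] if args else ""
    return 2 if target.startswith(("http://", "https://", "ftp://")) else 1


# Assumption for illustration only: which functions get an argument-dependent
# scorer, and the exact values returned, are not taken from the paper.
ARGUMENT_SCORERS: Dict[str, Callable[[Sequence[str]], int]] = {
    "file_get_contents": score_io,
}


def maliciousness_score(name: str, args: Sequence[str] = ()) -> int:
    """Return the maliciousness score of one function call."""
    if name in SAFE_FUNCTIONS or name in ZERO_SCORE_SANDBOXED:
        return 0
    if name in ARGUMENT_SCORERS:
        return ARGUMENT_SCORERS[name](args)
    return 1  # every remaining function is a PMF with score 1


def neutralized_pmf(*_args, **_kwargs):
    """Stand-in for a neutralized PMF: it does nothing and returns null (None)."""
    return None
```

For example, under these assumptions a call to {\tt system} scores 1, while {\tt getcwd} and {\tt strlen} both score 0.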
......@@ -83,59 +94,84 @@ Section~\ref{section:discussion} includes a sensitivity analysis of fine-grained
\begin{table*}[]
\centering
\scriptsize
\footnotesize
%\renewcommand{\arraystretch}{.95}% narrower
\begin{tabular}{|l|l|l|l|l|l|}
\begin{tabular}{l}
\hline
abs & addcslashes & addslashes & apache\_getenv & array\_change\_key\_case & array\_combine \\ \hline
array\_diff & array\_diff\_assoc & array\_fill & array\_fill\_keys & array\_filter & array\_flip \\ \hline
array\_intersect & array\_intersect\_key & array\_key\_exists & array\_keys & array\_map & array\_merge \\ \hline
array\_pop & array\_push & array\_replace & array\_replace\_recursive & array\_reverse & array\_search \\ \hline
array\_shift & array\_slice & array\_splice & array\_unique & array\_unshift & array\_values \\ \hline
array\_walk & array\_walk\_recursive & asort & assert & basename & bin2hex \\ \hline
call\_user\_func & call\_user\_func\_array & ceil & checkdate & chr & class\_alias \\ \hline
class\_exists & class\_implements & closedir & compact & constant & count \\ \hline
create\_function & crypt & curl\_close & curl\_error & curl\_getinfo & curl\_setopt \\ \hline
curl\_setopt\_array & curl\_version & current & date & date\_create & date\_default\_timezone\_get \\ \hline
date\_default\_timezone\_set & date\_format & debug\_backtrace & dechex & define & defined \\ \hline
dirname & dirname & dirname & each & end & error\_log \\ \hline
error\_reporting & explode & extension\_loaded & extract & fclose & file\_exists \\ \hline
filegroup & filemtime & fileowner & fileperms & filesize & filter\_var \\ \hline
floor & flush & flush & func\_get\_arg & func\_get\_args & func\_get\_args \\ \hline
func\_num\_args & function\_exists & gd\_info & get\_class & get\_class\_methods & get\_defined\_vars \\ \hline
get\_html\_translation\_table & get\_loaded\_extensions & get\_magic\_quotes\_gpc & get\_object\_vars & get\_parent\_class & getenv \\ \hline
gethostbyname & glob & gmdate & hash\_equals & hash\_hmac & header \\ \hline
header\_remove & headers\_list & headers\_sent & hex2bin & hexdec & html\_entity\_decode \\ \hline
htmlentities & htmlspecialchars & http\_build\_query & iconv\_set\_encoding & implode & in\_array \\ \hline
ini\_get & interface\_exists & intval & is\_a & is\_array & is\_bool \\ \hline
is\_callable & is\_dir & is\_file & is\_float & is\_int & is\_null \\ \hline
is\_numeric & is\_object & is\_readable & is\_resource & is\_scalar & is\_string \\ \hline
is\_writable & join & json\_decode & json\_encode & key & krsort \\ \hline
ksort & ltrim & max & mb\_check\_encoding & mb\_convert\_encoding & mb\_detect\_encoding \\ \hline
mb\_internal\_encoding & mb\_strlen & mb\_strpos & mb\_strpos & mb\_strrpos & mb\_strstr \\ \hline
mb\_strtolower & mb\_substr & md5 & memory\_get\_usage & method\_exists & microtime \\ \hline
min & mktime & move\_uploaded\_file & mt\_rand & mysqli\_errno & mysqli\_error \\ \hline
mysqli\_fetch\_array & mysqli\_fetch\_assoc & mysqli\_fetch\_object & mysqli\_fetch\_row & mysqli\_free\_result & mysqli\_get\_client\_info \\ \hline
mysqli\_get\_server\_info & mysqli\_insert\_id & mysqli\_more\_results & mysqli\_num\_fields & mysqli\_num\_rows & mysqli\_ping \\ \hline
mysqli\_real\_escape\_string & mysqli\_set\_charset & next & nl2br & number\_format & ob\_end\_clean \\ \hline
ob\_end\_flush & ob\_flush & ob\_get\_clean & ob\_get\_contents & ob\_get\_flush & ob\_get\_level \\ \hline
ob\_implicit\_flush & ob\_start & openssl\_decrypt & openssl\_random\_pseudo\_bytes & ord & parse\_ini\_string \\ \hline
parse\_str & parse\_url & pathinfo & php\_sapi\_name & phpversion & pow \\ \hline
preg\_grep & preg\_match & preg\_match\_all & preg\_quote & preg\_replace\_callback & preg\_split \\ \hline
prev & print\_r & printf & property\_exists & rand & random\_bytes \\ \hline
range & rawurldecode & rawurlencode & readdir & readfile & realpath \\ \hline
register\_shutdown\_function & reset & round & rtrim & scandir & serialize \\ \hline
session\_cache\_limiter & session\_destroy & session\_get\_cookie\_params & session\_id & session\_name & session\_save\_path \\ \hline
session\_set\_cookie\_params & session\_set\_save\_handler & session\_start & session\_status & session\_unset & session\_write\_close \\ \hline
set\_error\_handler & set\_exception\_handler & setcookie & setlocale & settype & sha1 \\ \hline
simplexml\_load\_file & sizeof & sort & spl\_autoload\_register & spl\_autoload\_unregister & spl\_object\_hash \\ \hline
sprintf & str\_ireplace & str\_pad & str\_repeat & str\_replace & str\_split \\ \hline
strcasecmp & strip\_tags & stripos & stripslashes & stristr & strlen \\ \hline
strpbrk & strpos & strrev & strrpos & strstr & strtolower \\ \hline
strtotime & strtoupper & strtr & strval & substr & substr\_count \\ \hline
substr\_replace & sys\_get\_temp\_dir & time & timezone\_identifiers\_list & timezone\_open & trigger\_error \\ \hline
trim & ucfirst & uksort & uniqid & unserialize & urldecode \\ \hline
urlencode & usort & utf8\_encode & var\_dump & version\_compare & vsprintf \\ \hline
{\bf Safe Functions} \\ \hline \hline
abs, addcslashes, addslashes, apache\_getenv, array\_change\_key\_case, array\_combine,
array\_diff, array\_diff\_assoc, array\_fill, array\_fill\_keys, array\_filter, array\_flip, \\
array\_intersect,
array\_intersect\_key,
array\_key\_exists, array\_keys, array\_map, array\_merge,
array\_pop, array\_push, array\_replace, array\_replace\_recursive, array\_reverse, \\
array\_search,
array\_shift,
array\_slice, array\_splice, array\_unique, array\_unshift, array\_values,
array\_walk, array\_walk\_recursive, asort, assert, basename, bin2hex, \\
call\_user\_func, call\_user\_func\_array, ceil, checkdate, chr, class\_alias,
class\_exists, class\_implements, closedir, compact, constant, count,
create\_function, crypt,
curl\_close, \\
curl\_error, curl\_getinfo, curl\_setopt,
curl\_setopt\_array, curl\_version, current, date, date\_create, date\_default\_timezone\_get,
date\_default\_timezone\_set, date\_format, \\
debug\_backtrace, dechex, define, defined,
dirname, dirname, dirname, each, end, error\_log,
error\_reporting, explode, extension\_loaded, extract, fclose, file\_exists,
filegroup, \\
filemtime, fileowner, fileperms, filesize, filter\_var,
floor, flush, flush, func\_get\_arg, func\_get\_args, func\_get\_args,
func\_num\_args, function\_exists, gd\_info, get\_class, \\
get\_class\_methods, get\_defined\_vars,
get\_html\_translation\_table, get\_loaded\_extensions, get\_magic\_quotes\_gpc, get\_object\_vars, get\_parent\_class, getenv,
gethostbyname, \\
glob, gmdate, hash\_equals, hash\_hmac, header,
header\_remove, headers\_list, headers\_sent, hex2bin, hexdec, html\_entity\_decode,
htmlentities, htmlspecialchars, \\
http\_build\_query, iconv\_set\_encoding, implode, in\_array,
ini\_get, interface\_exists, intval, is\_a, is\_array, is\_bool,
is\_callable, is\_dir, is\_file, is\_float, is\_int, is\_null,
is\_numeric, \\
is\_object, is\_readable, is\_resource, is\_scalar, is\_string,
is\_writable, join, json\_decode, json\_encode, key, krsort,
ksort, ltrim, max, mb\_check\_encoding, mb\_convert\_encoding, \\
mb\_detect\_encoding,
mb\_internal\_encoding, mb\_strlen, mb\_strpos, mb\_strpos, mb\_strrpos, mb\_strstr,
mb\_strtolower, mb\_substr, md5, memory\_get\_usage, method\_exists, \\
microtime,
min, mktime, move\_uploaded\_file, mt\_rand, mysqli\_errno, mysqli\_error,
mysqli\_fetch\_array, mysqli\_fetch\_assoc, mysqli\_fetch\_object, mysqli\_fetch\_row, \\
mysqli\_free\_result, mysqli\_get\_client\_info,
mysqli\_get\_server\_info, mysqli\_insert\_id, mysqli\_more\_results, mysqli\_num\_fields, mysqli\_num\_rows, mysqli\_ping,\\
mysqli\_real\_escape\_string, mysqli\_set\_charset, next, nl2br, number\_format, ob\_end\_clean,
ob\_end\_flush, ob\_flush, ob\_get\_clean, ob\_get\_contents, ob\_get\_flush, ob\_get\_level, \\
ob\_implicit\_flush, ob\_start, openssl\_decrypt, openssl\_random\_pseudo\_bytes, ord, parse\_ini\_string,
parse\_str, parse\_url, pathinfo, php\_sapi\_name, phpversion, pow,
preg\_grep, \\
preg\_match, preg\_match\_all, preg\_quote, preg\_replace\_callback, preg\_split,
prev, print\_r, printf, property\_exists, rand, random\_bytes,
range, rawurldecode, rawurlencode, \\
readdir, readfile, realpath,
register\_shutdown\_function, reset, round, rtrim, scandir, serialize,
session\_cache\_limiter, session\_destroy, session\_get\_cookie\_params, session\_id, \\
session\_name, session\_save\_path,
session\_set\_cookie\_params, session\_set\_save\_handler, session\_start, session\_status, session\_unset, session\_write\_close,
set\_error\_handler, \\
set\_exception\_handler, setcookie, setlocale, settype, sha1,
simplexml\_load\_file, sizeof, sort, spl\_autoload\_register, spl\_autoload\_unregister, spl\_object\_hash,
sprintf, str\_ireplace,\\
str\_pad, str\_repeat, str\_replace, str\_split,
strcasecmp, strip\_tags, stripos, stripslashes, stristr, strlen,
strpbrk, strpos, strrev, strrpos, strstr,
strtolower,
strtotime, strtoupper, strtr, \\
strval, substr, substr\_count,
substr\_replace, sys\_get\_temp\_dir, time, timezone\_identifiers\_list, timezone\_open, trigger\_error,
trim, ucfirst, uksort, uniqid, unserialize, urldecode, \\
urlencode, usort, utf8\_encode, var\_dump, version\_compare, vsprintf
\\ \hline
\end{tabular}
\centering
......@@ -187,7 +223,7 @@ Section~\ref{section:discussion} includes a sensitivity analysis of fine-grained
\centering
\caption{
Functions explicitly sandboxed in \toolname to preserve correctness and increase detection accuracy.
Functions sandboxed in \toolname to preserve correctness and increase detection accuracy.
The insight for each function as well as the respective maliciousness score is included.
}
\label{table:sandboxed-functions}
......