Saturday, November 16, 2019
Assemblers And Disassembler Softwares Computer Science Essay
A disassembler is a computer program that translates machine language into assembly language, the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly language. The output of a disassembler is often formatted for human readability rather than for suitability as input to an assembler, making it principally a reverse-engineering tool.

Assembly language source code generally permits the use of symbolic constants and programmer comments. These are usually removed from the assembled machine code by the assembler. A disassembler operating on the machine code therefore produces disassembly lacking these constants and comments, and the disassembled output is more difficult for a human to interpret than the original annotated source code. Some disassemblers make use of the symbolic debugging information present in object files such as ELF. The Interactive Disassembler allows the human user to make up mnemonic symbols for values or regions of code in an interactive session: human insight applied to the disassembly process often parallels human creativity in the code-writing process.

Disassembly is not an exact science. On CISC platforms with variable-width instructions, or in the presence of self-modifying code, it is possible for a single program to have two or more reasonable disassemblies. In general, determining which instructions would actually be executed during a run of the program is as hard as the halting problem, which is proven to be undecidable.

Examples of disassemblers

Any interactive debugger includes some way of viewing the disassembly of the program being debugged. Often, the same disassembly tool is packaged as a standalone disassembler distributed along with the debugger. For example, objdump, part of GNU Binutils, is related to the interactive debugger gdb.
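The loss of symbolic constants and comments described at the start of this essay can be illustrated with a minimal sketch of a toy assembler/disassembler pair. The four-instruction ISA, its two-byte encoding, the `EQU` constant syntax and the function names `assemble` and `disassemble` are all hypothetical and do not correspond to any real assembler; the only point is that the round trip keeps the numbers and loses the names.

```python
# Hypothetical toy ISA: every instruction is one opcode byte followed by one operand byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(source: str) -> bytes:
    """Translate toy assembly to machine code; comments and EQU names are consumed here."""
    constants: dict[str, int] = {}
    out = bytearray()
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()  # programmer comments are dropped
        if not line:
            continue
        parts = line.split()
        if len(parts) == 3 and parts[1].upper() == "EQU":
            constants[parts[0]] = int(parts[2], 0)  # symbolic constant, never emitted
            continue
        operand = parts[1] if len(parts) > 1 else "0"
        value = constants[operand] if operand in constants else int(operand, 0)
        out += bytes([OPCODES[parts[0].upper()], value & 0xFF])
    return bytes(out)


def disassemble(code: bytes) -> list[str]:
    """Inverse operation: only numbers survive, because names and comments were never encoded."""
    lines = []
    for offset in range(0, len(code) - 1, 2):  # a trailing odd byte is ignored
        op, arg = code[offset], code[offset + 1]
        name = MNEMONICS.get(op)
        text = f"{name} 0x{arg:02x}" if name else f"DB 0x{op:02x}, 0x{arg:02x}"
        lines.append(f"{offset:04x}: {text}")
    return lines


SOURCE = """
PORT    EQU 0x40   ; output port
        LOAD 5     ; initial counter
        ADD 1
        STORE PORT ; write the result
        HALT
"""
print("\n".join(disassemble(assemble(SOURCE))))
```

The disassembly shows `STORE 0x40` where the source said `STORE PORT`, and every comment is gone; an interactive tool such as IDA is what lets a human put a name like `PORT` back.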
Some examples of disassemblers are:

- IDA, the Interactive Disassembler.
- ILDASM, a tool contained in the .NET Framework SDK. It can be used to disassemble PE files containing Common Intermediate Language code.
- OllyDbg, a 32-bit assembler-level analysing debugger.
- PVDasm, a free, interactive, multi-CPU disassembler.
- SIMON, a test/debugger/animator with an integrated disassembler for Assembler, COBOL and PL/1.
- Texe, a free 32-bit disassembler and Windows PE file analyzer.
- unPIC, a disassembler for PIC microcontrollers.

Interactive Disassembler

The Interactive Disassembler, more commonly known simply as IDA, is a disassembler used for reverse engineering. It supports a variety of executable formats for different processors and operating systems. It can also be used as a debugger for Windows PE, Mac OS X Mach-O, and Linux ELF executables. A decompiler plugin for programs compiled with a C/C++ compiler is available at extra cost. The latest full version of IDA Pro is commercial.

IDA performs much automatic code analysis, using cross-references between code sections, knowledge of the parameters of API calls, and other information. However, the nature of disassembly precludes total accuracy, and a great deal of human intervention is necessarily required. IDA has interactive functionality to aid in improving the disassembly. A typical IDA user begins with an automatically generated disassembly listing and then converts sections from code to data and vice versa.

Scripting

IDC scripts make it possible to extend the operation of the disassembler. Some helpful scripts are provided, which can serve as the basis for user-written scripts. Most frequently, scripts are used for extra modification of the generated code. For example, external symbol tables can be loaded so that the function names of the original source code are used. There are websites devoted to IDA scripts that offer assistance with frequently arising problems.
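In IDA itself, loading an external symbol table would be written as an IDC (or IDAPython) script. The sketch below deliberately avoids the IDA API and shows only the underlying idea in plain Python. It assumes a hypothetical symbol file with one `<hex address> <name>` pair per line and a text listing that uses IDA-style default names of the form `sub_<address>`; the helpers `load_symbol_table` and `apply_symbols` are made up for illustration.

```python
import re


def load_symbol_table(text: str) -> dict[int, str]:
    """Parse '<hex address> <name>' lines (an assumed format); '#' starts a comment."""
    symbols: dict[int, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        address, name = parts
        symbols[int(address, 16)] = name.strip()
    return symbols


def apply_symbols(listing: str, symbols: dict[int, str]) -> str:
    """Replace auto-generated sub_XXXXXX names with names from the symbol table."""
    def rename(match: re.Match) -> str:
        # Keep the automatic name when the table has no entry for that address.
        return symbols.get(int(match.group(1), 16), match.group(0))

    return re.sub(r"\bsub_([0-9A-Fa-f]+)\b", rename, listing)


SYMBOLS = """
401000 parse_config
401200 main   # entry point
"""
LISTING = "call sub_401000\ncall sub_401200\ncall sub_401300"
print(apply_symbols(LISTING, load_symbol_table(SYMBOLS)))
```

Addresses missing from the table keep their automatic names, so a partial symbol file still improves the listing without breaking it.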
Users have created plugins that allow other common scripting languages to be used instead of, or in addition to, IDC. IdaRUB supports Ruby, and IDAPython adds support for Python.

Supported systems/processors/compilers

Operating systems:
- x86 Windows GUI
- x86 Windows console
- x86 Linux console
- x86 Mac OS X
- ARM Windows CE

Executable file formats:
- PE (Windows)
- ELF (Linux, most *BSD)
- Mach-O (Mac OS X)
- Netware .exe
- OS/2 .exe
- Geos .exe
- DOS/Watcom LE executable (without embedded DOS extender)
- raw binary, such as a ROM image

Processors:
- Intel 8086 family
- ARM, including Thumb code
- Motorola 68xxx/h8
- Zilog Z80
- MOS Technology 6502
- Intel i860
- DEC Alpha
- Analog Devices ADSP218x
- Angstrem KR1878
- Atmel AVR series
- DEC PDP-11 series
- Fujitsu F2MC16L/F2MC16LX
- Fujitsu FR 32-bit family
- Hitachi SH3/SH3B/SH4/SH4B
- Hitachi H8: h8300/h8300a/h8s300/h8500
- Intel 196 series: 80196/80196NP
- Intel 51 series: 8051/80251b/80251s/80930b/80930s
- Intel i960 series
- Intel Itanium (ia64) series
- Java virtual machine
- MIPS: mipsb/mipsl/mipsr/mipsrl/r5900b/r5900l
- Microchip PIC: PIC12Cxx/PIC16Cxx/PIC18Cxx
- MSIL
- Mitsubishi 7700 family: m7700/m7750
- Mitsubishi m32/m32rx
- Mitsubishi m740
- Mitsubishi m7900
- Motorola DSP 5600x family: dsp561xx/dsp5663xx/dsp566xx/dsp56k
- Motorola ColdFire
- Motorola HCS12
- NEC 78K0/78K0S
- PA-RISC
- PowerPC
- SGS-Thomson ST20/ST20c4/ST7
- SPARC family
- Samsung SAM8
- Siemens C166 series
- TMS320Cxxx series

Compilers/libraries (for automatic library function recognition)[3]:
- Borland C++ 5.x for DOS/Windows
- Borland C++ 3.1
- Borland C Builder v4 for DOS/Windows
- GNU C++ for Cygwin
- Microsoft C
- Microsoft QuickC
- Microsoft Visual C++
- Watcom C++ (16/32 bit) for DOS/OS2
- ARM C v1.2
- GNU C++ for Unix/common

SIMON (Batch Interactive test/debug)

SIMON (Batch Interactive test/debug) was a proprietary test/debugging toolkit for interactively testing batch programs designed to run on IBM's System/360/370/390 architecture.
It operated in two modes. One was a full instruction set simulator mode, which provided instruction step, conditional program breakpoint (pause) and storage alteration features for Assembler, COBOL and PL/1 programs. High-level language (HLL) users were also able to see and modify variables directly at a breakpoint by their symbolic names, and to set conditional breakpoints on data content.

Many of the features were also available in partial monitor mode, which relied on deliberately interrupting the program at predefined points or when a program check occurred. In this mode, processing was not significantly slower than normal processing without monitoring.

It additionally provided features to prevent application program errors such as program checks, wild branches and program loops. It was possible to correct many errors and interactively alter the control flow of the executing application program. This permitted more errors to be detected per compilation. At the time, compilations were often scheduled batch jobs with printed output, often requiring several hours of turnaround before the next test run.

Operating Systems

SIMON could be executed on the IBM MVS, MVS/XA, ESA or DOS/VSE operating systems and required IBM 3270 terminals for interaction with the application program.

LIDA

lida is basically a disassembler and code analysis tool. It uses libdisasm from the bastard for single-opcode decoding. It allows interactive control over the generated dead listing via commands and built-in tools.

Features:
- It traces the execution flow of the binary.
- It works with symbolic names: interactive naming of functions and labels, and commenting of code.
- It scans for known anti-debugging and anti-disassembling techniques.
- It scans for user-defined code sequences.
- It has an integrated patcher.
- It also has an integrated cryptoanalyzer.

Many disassemblers out there simply use the output of objdump; lida tries a more serious approach.
Several limitations of objdump are overcome by using libdisasm and by tracing the execution flow of the program. Furthermore, having control over the disassembly allows more features to be included. Anyone who has already worked on a dead listing will immediately feel the need to work interactively with the code and to be able to change it.

Therefore lida has an integrated patcher, resolves symbolic names, provides the ability to comment the code, and offers efficient browsing methods. The more exotic features of lida are on the analysis side. The code can be scanned for custom sequences, known anti-debugging techniques and known encryption algorithms. You will also be able to work directly with the program's data and, for example, pass it to several customizable en-/decryption routines. This of course makes only limited sense, as lida is not a debugger. Still, I have often missed this functionality.

Limitations of objdump-based disassemblers

The programs one would usually like to disassemble are either coded directly in assembly or use some tricks to avoid being disassembled. Here I give a short overview of the most important objdump limitations.

objdump relies on section headers

objdump expects an ELF executable that contains correct section headers. However, the OS loader does not need section headers at all to run an ELF binary; what matters for getting a process loaded into memory are the program headers.
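A minimal sketch of why the two kinds of headers are independent: the ELF file header stores the program header count (`e_phnum`) and the section header count (`e_shnum`) in separate fields, so a binary can carry usable program headers while its section headers are missing or malformed. The Python below parses a little-endian ELF32 header with `struct`; the synthetic header and the helper name `header_summary` are illustrative assumptions, and the check is far less thorough than what `file` or objdump actually do.

```python
import struct

# Little-endian ELF32 file header: e_ident[16] followed by 13 fixed-size fields (52 bytes).
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")
ELF32_SHDR_SIZE = 40  # sizeof(Elf32_Shdr)


def header_summary(data: bytes) -> dict:
    """Report which header tables a little-endian, 32-bit ELF file declares."""
    (ident, _type, _machine, _version, entry, phoff, shoff, _flags,
     _ehsize, _phentsize, phnum, shentsize, shnum, _shstrndx) = ELF32_HEADER.unpack_from(data)
    if ident[:4] != b"\x7fELF" or ident[4] != 1 or ident[5] != 1:
        raise ValueError("not a little-endian ELF32 file")
    return {
        "entry": entry,
        # The loader maps the segments described by the program headers.
        "program_headers": phnum if phoff else 0,
        # Section-based tools such as objdump need these to be present and well-formed.
        "section_headers": shnum if shoff else 0,
        "section_headers_usable": shoff != 0 and shnum > 0 and shentsize == ELF32_SHDR_SIZE,
    }


# Synthetic header for a "tiny" i386 executable: one program header, no section headers.
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)  # ELFCLASS32, ELFDATA2LSB, EV_CURRENT, padding
tiny = ELF32_HEADER.pack(ident, 2, 3, 1, 0x08048054, 52, 0, 0, 52, 32, 1, 0, 0, 0)
print(header_summary(tiny))
```

The synthetic file reports one program header and no usable section headers, which is enough for a loader but not for a section-driven disassembler.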
The first common anti-disassembling trick is therefore to either drop or manipulate the ELF section headers. By doing so, objdump refuses to perform the disassembly:

    $ file tiny-crackme
    tiny-crackme: ELF 32-bit LSB executable, Intel 80386, version 1, statically linked, corrupted section header size
    $ objdump -D tiny-crackme
    objdump: tiny-crackme: File format not recognized

The binary I used as an example to verify this is yanisto's tiny-crackme.

objdump does not trace the execution flow (I)

Because it does not trace the execution flow, objdump can easily be fooled into disassembling just a few lines and stopping there. It does not recognize any functions and does not see code that is stored in data sections.

objdump does not trace the execution flow (II)

Another common trick is to insert garbage opcodes and jump over them to misalign the disassembly with the execution flow.

Example: when an instruction jumps into the middle of the next instruction, objdump does not disassemble from that exact location. It continues with the next instruction as it decoded it and consequently disassembles garbage from there on. As a result, you will mainly see totally useless instructions in the whole disassembly.

Implementation Details

lida uses libdisasm from the bastard for single-opcode decoding. It does not use the rest of the bastard environment, such as the typhoon database.

The main program is coded in Perl/Tk, which uses a C backend for the most time-consuming parts (disassembly, analysis, scanning for strings). Generally, lida is designed to make disassembly as fast as possible while trying not to waste all your RAM.

lida is also designed to be efficient in usability. All important functions are therefore accessible via single keystrokes or short commands. No clicking around is necessary; you can enter your tasks directly into the command line.
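The garbage-opcode trick from the objdump limitations above, and the flow-following approach lida takes instead, can be sketched with a decoder for a handful of 32-bit x86 opcodes (`EB` short jump, `E8` call, `31 C0` xor eax, eax, `40` inc eax, `C3` ret). This is an illustrative toy, not libdisasm or objdump: `linear_sweep` decodes instruction after instruction the way objdump does, while `recursive_traversal` follows jumps from the entry point in the spirit of lida's first pass. The byte string is an assumed example in which a jump skips a single garbage byte `E8`.

```python
import struct


def decode(code: bytes, addr: int) -> tuple[str, int, list[int]]:
    """Decode one instruction of a tiny 32-bit x86 subset.

    Returns (text, length, successor addresses that control flow can reach)."""
    op = code[addr]
    if op == 0xEB and addr + 2 <= len(code):  # jmp rel8
        target = addr + 2 + struct.unpack_from("<b", code, addr + 1)[0]
        return f"jmp 0x{target:x}", 2, [target]
    if op == 0xE8 and addr + 5 <= len(code):  # call rel32
        target = (addr + 5 + struct.unpack_from("<i", code, addr + 1)[0]) & 0xFFFFFFFF
        return f"call 0x{target:x}", 5, [addr + 5, target]
    if op == 0x31 and addr + 2 <= len(code) and code[addr + 1] == 0xC0:
        return "xor eax, eax", 2, [addr + 2]
    if op == 0x40:  # inc eax (32-bit mode only)
        return "inc eax", 1, [addr + 1]
    if op == 0xC3:  # ret: no statically known successor
        return "ret", 1, []
    return f"db 0x{op:02x}", 1, [addr + 1]  # unknown byte: emit as data


def linear_sweep(code: bytes) -> list[tuple[int, str]]:
    """objdump-style: decode back to back from the start, ignoring control flow."""
    out, addr = [], 0
    while addr < len(code):
        text, length, _ = decode(code, addr)
        out.append((addr, text))
        addr += length
    return out


def recursive_traversal(code: bytes, entry: int = 0) -> list[tuple[int, str]]:
    """Follow branches from the entry point; only reachable addresses are decoded."""
    seen, work, out = set(), [entry], {}
    while work:
        addr = work.pop()
        if addr in seen or not 0 <= addr < len(code):
            continue  # already decoded, or outside this buffer
        seen.add(addr)
        text, _, successors = decode(code, addr)
        out[addr] = text
        work.extend(successors)
    return sorted(out.items())


# jmp +1 skips the garbage byte E8, landing on: xor eax, eax; inc eax; ret.
CODE = bytes([0xEB, 0x01, 0xE8, 0x31, 0xC0, 0x40, 0xC3])
print(linear_sweep(CODE))
print(recursive_traversal(CODE))
```

The linear sweep reads the garbage `E8` as the start of a five-byte `call` and swallows the real `xor`, `inc` and `ret`, whereas the traversal follows the jump to offset 3 and recovers all three.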
The disassembling engine

Disassembly is currently done in 4 (or 6) passes; the default is all 6:

1st pass: main control flow disassembly
Here the disassembly starts at the executable's entry point and recursively disassembles the binary by following each branch and stepping into each subroutine. This also disassembles code blocks in data sections, if any exist :), so the disassembly is not limited to a .text section. Also, if indirect jumps/calls are used, the final destination is looked up in the binary's data.

2nd pass: for glibc binaries
A heuristic scan looks for the main() function and starts pass 1 there (so this is also recursive disassembly).

3rd pass: all other code sections
This pass repeats pass 1 for all found executable sections, starting at the beginning of each section. If the binary does not contain section headers, the disassembly starts at the first loaded executable address.

4th pass: functions
This pass scans for typical function prologues and starts pass 1 at each found address. This discovers code regions that are not explicitly called and whose entry points are evaluated at runtime.

5th pass: disassembling caves
All passes build up a map of the binary. Code regions that have not been disassembled so far can now be disassembled.

6th pass: remainders
If pass 5 was executed and there are still caves, they are displayed as DB xx.

There are definitely enhancements to come for passes 4 and 5, as well as for the recursive disassembly function itself. Also worth mentioning: whenever a jump into the middle of a previous instruction is found, those addresses are currently marked. A representation of instructions within instructions is to follow (compare 3.1), since with intelligent placement of opcodes both instructions can be valid and used during the execution flow.

Signature Scanning

Basically, it is done by "signature scanning".
I use the term loosely because it is not simple pattern matching. To understand why, one needs a little understanding of typical hash and encryption algorithms.

Let's take MD5 as an example. How can we find the code that computes an MD5 hash?

At a very high level, generating a hash is usually done in 3 steps: the init function, the update function and the finalize function. The init function usually sets up an array of numeric values, which are then modified in a loop using the input (plain) data until the hash is calculated. The finalize function produces the result in a common format (roughly speaking, it pads the message and appends its length before producing the final digest).

However, you do not actually need to know how the algorithm works in order to find it. The initialization functions use fixed numeric initialization values, which are the same in every implementation because they are part of the algorithm, and these are the values we are searching for. For MD5 they are:

0x67452301
0xefcdab89
0x98badcfe
0x10325476

So to find an MD5 implementation, it is necessary to scan for those dword values. Of course, they can appear in any order (strangely enough, they are nearly always used in the order listed above). Since those dwords can also occur in any binary by accident (although rarely), some smarter scanning is done: the values must all appear within a code block of limited size. The values can be in any order, and some fuzziness has also been added to detect slightly altered init values.

Heuristic Scanning

Heuristic scanning is not yet implemented. It is intended to find custom crypto code. The idea is to look for sequences of suspicious opcodes that resemble an encryption routine.

OllyDbg

OllyDbg is an x86 debugger that emphasizes binary code analysis, which is useful when source code is not available.
It traces registers, recognizes procedures, API calls, switches, tables, constants and strings, and locates routines from object files and libraries. According to the program's help file, version 1.10 is the final 1.x release. Version 2.0 is in development and is being written from the ground up. The software is free of cost, but the shareware license requires users to register with the author. The current version of OllyDbg cannot disassemble binaries compiled for 64-bit processors.
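Whether a Windows executable targets a 32-bit or a 64-bit processor is recorded in the `Machine` field of the PE/COFF file header, which immediately follows the `PE\0\0` signature whose offset is stored at 0x3C in the DOS header. The sketch below reads that field so that a 64-bit binary can be recognized before it is handed to a 32-bit-only tool such as OllyDbg 1.x; the function names and the synthetic test header are assumptions made for illustration.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664


def pe_machine(data: bytes) -> int:
    """Return the Machine field of a PE file's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) header")
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return struct.unpack_from("<H", data, pe_offset + 4)[0]  # first COFF header field


def is_32bit_x86(data: bytes) -> bool:
    return pe_machine(data) == IMAGE_FILE_MACHINE_I386


def synthetic_pe(machine: int) -> bytes:
    """Hypothetical minimal header: just enough bytes for pe_machine() to read."""
    buf = bytearray(0x46)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x40)
    buf[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<H", buf, 0x44, machine)
    return bytes(buf)


print(is_32bit_x86(synthetic_pe(IMAGE_FILE_MACHINE_I386)))   # True
print(is_32bit_x86(synthetic_pe(IMAGE_FILE_MACHINE_AMD64)))  # False
```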