A Versatile Decryption Loader

This page describes how to write a versatile loader for encrypted COM files. Make sure you have read part 1 of this guide first.

I got tired of including the source code for a loader in every ASM source file. So I wrote a program that takes an existing (already assembled) COM file, encrypts it, and attaches it to a loader.

There are two ways to do this:

Have the loader at the start of the file (my method)
Have the loader at the end of the file (much like a virus, so heuristic virus scanners might flag it).
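The packing step for the first layout (loader at the start) can be sketched on the host side in C. This is a minimal sketch, not the author's program. The function name build_packed_com, the fixed 0FFh XOR key (borrowed from the relocator example in this guide) and the in-memory buffers are all assumptions. A real tool would also read and write the files, patch the payload length into the loader, and leave room for the stack and for the relocator copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed key: the relocator example XORs each byte with 0FFh. */
#define XOR_KEY 0xFFu

/* A COM program is loaded at offset 100h of a single 64 KB segment,
   so the whole file must fit in the space above the PSP. */
#define MAX_COM_SIZE (0x10000u - 0x100u)

/* Writes the loader followed by an encrypted copy of com[] into out[] and
   returns the total size, or 0 if the result is too big for a COM file or
   for out[]. */
size_t build_packed_com(const uint8_t *loader, size_t loader_len,
                        const uint8_t *com, size_t com_len,
                        uint8_t *out, size_t out_cap)
{
    assert(loader != NULL && com != NULL && out != NULL);
    size_t total = loader_len + com_len;
    if (total > MAX_COM_SIZE || total > out_cap)
        return 0;
    memcpy(out, loader, loader_len);            /* loader at the start */
    for (size_t i = 0; i < com_len; i++)        /* encrypted code after it */
        out[loader_len + i] = (uint8_t)(com[i] ^ XOR_KEY);
    return total;
}
```

Because XOR with a fixed key is its own inverse, the loader can undo the encryption with the same operation.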

How the Loader Works

Initially the encrypted COM file will look as follows:

       +-----------+
       |           |
       |  loader   |
       |           |
       +-----------+
       |           |
       | encrypted |
       |   code    |
       |           |
       +-----------+

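The loader has to decrypt the code and then move it to offset 100h. This step is essential. DOS loads a COM file at offset 100h of its segment (directly after the 256-byte Program Segment Prefix), and the original program was assembled to run from that address. Every absolute memory reference in the decrypted code therefore assumes the code starts at 100h. Left where it is, behind the loader, the code would find its data at the wrong offsets. Refer to the diagram below: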

       +-----------+       +-----------+       +-----------+
       |           |       |           |       |           |
       |  loader   |       |  loader   |       | decrypted |
       |           |       |           |       |   code    |
       +-----------+  -->  +-----------+  -->  |           |
       |           |       |           |       +-----------+
       | encrypted |       | decrypted |       |           |
       |   code    |       |   code    |       |           |
       |           |       |           |       |           |
       +-----------+       +-----------+       +-----------+
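A short C model makes the problem concrete. The function name byte_at_embedded_offset and the one-array segment are assumptions for illustration. The model places a program image at a given offset and then follows the absolute offset stored inside its first instruction, MOV DX, imm16 (opcode BAh).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t seg[0x10000];   /* models the program's single 64 KB segment */

/* Places `image` at offset `base`, then follows the absolute offset stored in
   its first instruction, MOV DX, imm16 (opcode BAh), and returns the byte there. */
uint8_t byte_at_embedded_offset(const uint8_t *image, size_t len, uint16_t base)
{
    assert(len >= 3 && image[0] == 0xBA && base + len <= sizeof seg);
    memset(seg, 0, sizeof seg);
    memcpy(&seg[base], image, len);
    uint16_t target = (uint16_t)(image[1] | (image[2] << 8));  /* little-endian */
    return seg[target];
}
```

For example, take a made-up fragment assembled with ORG 100h: BA 05 01 CD 21 followed by the string "Hi$" (MOV DX, 0105h; INT 21h; then the string at 0105h). Placed at 100h, the stored offset finds 'H'. Placed at 120h, behind a 20h-byte loader, the same bytes look at 0105h and find whatever happens to be there (zero in this model).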

This plan still will not work, because the loader itself occupies offset 100h: moving the decrypted code there overwrites the loader while it is still running. The solution is to have the loader copy a small relocator routine somewhere out of the way (in the diagram below, just past the end of the code) and then jump to it. The relocator moves the decrypted code down to offset 100h and jumps there.

       +-----------+       +-----------+       +-----------+       +-----------+
       |  loader   |       |  loader   |       |  loader   |       |           |
       +-----------+       +-----------+       +-----------+       | decrypted |
       | relocator |       | relocator |       | relocator |       |   code    |
       +-----------+  -->  +-----------+  -->  +-----------+  -->  |           |
       |           |       |           |       |           |       +-----------+
       | encrypted |       | decrypted |       | decrypted |       |           |
       |   code    |       |   code    |       |   code    |       |           |
       |           |       |           |       |           |       |           |
       +-----------+       +-----------+       +-----------+       +-----------+
                                               | relocator |       | relocator |
                                               +-----------+       +-----------+
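The four panels can be simulated in C on a 64 KB array. This is a sketch under stated assumptions: the function name simulate_load is hypothetical, the key is the 0FFh used elsewhere in this guide, the loader's own bytes are not modelled (only its length matters), and the relocator is copied to the first byte past the code, as in the diagram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t mem[0x10000];   /* the COM program's 64 KB segment */

/* Simulates the diagram for a file laid out as loader | relocator | encrypted
   code, loaded at 100h. Returns the offset the relocator was copied to. */
size_t simulate_load(size_t loader_len, const uint8_t *reloc, size_t reloc_len,
                     const uint8_t *enc, size_t code_len)
{
    size_t code_at  = 0x100 + loader_len + reloc_len;
    size_t reloc_at = code_at + code_len;            /* just past the code */
    assert(reloc_at + reloc_len <= sizeof mem);

    memcpy(&mem[0x100 + loader_len], reloc, reloc_len);  /* the file as loaded */
    memcpy(&mem[code_at], enc, code_len);

    for (size_t i = 0; i < code_len; i++)        /* 1. loader decrypts in place */
        mem[code_at + i] ^= 0xFF;
    memcpy(&mem[reloc_at], reloc, reloc_len);    /* 2. loader copies the relocator */
    for (size_t i = 0; i < code_len; i++)        /* 3. relocator moves code down */
        mem[0x100 + i] = mem[code_at + i];
    return reloc_at;                             /* 4. relocator jumps to 100h */
}
```

The property worth checking is that the relocator's new home lies at or above 100h plus the code length. Then step 3 cannot overwrite the routine that is performing it.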

It does not matter whether the loader or the relocator decrypts the code.
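This holds because XOR with a fixed key treats every byte the same way, whatever its address, so decrypting and moving commute. A minimal C sketch of the two orders (both function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Loader decrypts: XOR the code where it is, then move it to dst. */
void decrypt_then_move(uint8_t *dst, uint8_t *src, size_t n, uint8_t key)
{
    for (size_t i = 0; i < n; i++)
        src[i] ^= key;
    memmove(dst, src, n);        /* the regions may overlap */
}

/* Relocator decrypts: move the code to dst first, then XOR it there. */
void move_then_decrypt(uint8_t *dst, const uint8_t *src, size_t n, uint8_t key)
{
    memmove(dst, src, n);
    for (size_t i = 0; i < n; i++)
        dst[i] ^= key;
}
```

As an example, suppose the encrypted bytes 4B B3 32 DE sit three bytes above the destination and the key is 0FFh. Both functions leave B4 4C CD 21 at the destination. If the key were derived from each byte's memory address, the order would matter.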

The Relocator

The basic structure of the relocator is as follows:

; assume that the loader has set
; DS:SI -> encrypted code
; ES:DI -> where to move the code (100h)
; CX     = number of bytes to move
; (in a COM program CS = DS = ES = SS)
relocator	equ $
		cld				; make rep movsb copy upward
		push cx				; save the byte count
		rep movsb			; move the code to 100h
		pop cx				; restore the byte count

		; decrypt the code at 100h
		; (this could be done in the loader instead)
		mov si, 0100h

next_byte:
		xor byte ptr [si], 0ffh		; decrypt one byte
		inc si
		loop next_byte

		; jump to the decrypted code
		mov si, 0100h
		jmp si				; execute the decrypted code
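The move relies on how REP MOVSB behaves. The C model below is a hypothetical helper, not DOS code. It assumes the direction flag is clear and that DS = ES, as in a COM program, so one 64 KB array stands in for both segments.

```c
#include <assert.h>
#include <stdint.h>

/* Models REP MOVSB with the direction flag clear (after CLD): while CX != 0,
   copy the byte at DS:SI to ES:DI, then increment SI and DI and decrement CX. */
void rep_movsb(uint8_t seg[0x10000], uint16_t *si, uint16_t *di, uint16_t *cx)
{
    while (*cx != 0) {
        seg[*di] = seg[*si];
        *si = (uint16_t)(*si + 1);   /* offsets wrap at 64 KB */
        *di = (uint16_t)(*di + 1);
        *cx = (uint16_t)(*cx - 1);
    }
}
```

The copy runs upward one byte at a time. When DI starts below SI, each source byte is therefore read before the destination pointer reaches it. That is always the case here, because the destination is 100h and the code sits above the loader, so the move is safe even when the two ranges overlap. With the direction flag set, MOVSB would step downward instead, which is why clearing it with CLD before the copy is good practice.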

Moving Code Around

When moving code around, make sure that no memory references become invalid. Have a look at the following implementation of a relocator:

relocate_code	equ $
		; move the code (decrypted beforehand by the loader) to offset 100h
		cld				; copy upward
		mov	si, offset data		; si -> start of the code to move
		mov	cx, word ptr [num_bytes]
		mov	di, 0100h
		rep	movsb			; move the bytes
		mov	si, 0100h
		jmp	si			; execute
relocate_code_size	equ $ - relocate_code

The two memory references data and num_bytes are absolute references. This means that regardless of where in memory the above piece of code resides, data and num_bytes refer to the same offset in the current segment. This is because all labels have a fixed value, set by the assembler when your program is assembled.

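However, the operand of a jump instruction is usually a relative reference. It holds the distance from the end of the jump instruction (that is, the start of the next instruction) to the target, not the target address itself. Consider the following jump instructions, and note the machine code bytes: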

CS:0100 EB4E          JMP     0150          ; JMP +4E   bytes  (relative)
CS:0102 E9FB0E        JMP     1000          ; JMP +0EFB bytes  (relative)
CS:0105 7449          JZ      0150          ; JZ  +49   bytes  (relative)
CS:0107 EA99993412    JMP     1234:9999     ; JMP to 1234:9999 (absolute)
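To check the listing, add each displacement to the offset of the instruction that follows the jump. For example, EB 4E at 0100 gives 0102 + 4E = 0150, and E9 FB 0E at 0102 gives 0105 + 0EFB = 1000. A small C decoder reproduces this. The name jump_target is hypothetical, and it handles only the relative forms shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the target offset of the relative jump whose bytes start at `code`
   and which executes at offset `ip`, or -1 for opcodes not handled here
   (such as EA, the far jump, whose operand is an absolute segment:offset).
   The displacement counts from the next instruction and wraps at 64 KB. */
long jump_target(const uint8_t *code, uint16_t ip)
{
    uint8_t op = code[0];
    if (op == 0xEB || (op >= 0x70 && op <= 0x7F)) {      /* JMP/Jcc short, rel8 */
        /* sign-extend the 8-bit displacement to 16 bits */
        uint16_t rel = code[1] < 0x80 ? code[1] : (uint16_t)(code[1] | 0xFF00u);
        return (uint16_t)(ip + 2u + rel);
    }
    if (op == 0xE9) {                                    /* JMP near, rel16 */
        uint16_t rel = (uint16_t)(code[1] | (code[2] << 8));
        return (uint16_t)(ip + 3u + rel);
    }
    return -1;
}
```

Feeding it the same EB 4E bytes at offset 0200 yields 0250. The bytes are unchanged but the target has moved with them, which is exactly what the relocator has to avoid.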

That is why the relocator loads the target address into a register and jumps through it, which makes the jump absolute:

		mov	si, 0100h
		jmp	si                  ; always jumps to offset 100h

instead of

		jmp	100h                ; will not jump to 100h if code 
		                            ; has been moved from original
		                            ; location
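The difference can be modelled in a few lines of C. The helpers are hypothetical. They assume the assembler encodes jmp 100h as a near jump (E9 rel16, 3 bytes), and compare it with mov si, 0100h / jmp si, which assembles to BE 00 01 FF E6.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement the assembler stores for a near JMP (E9 rel16) placed at
   `assembled_at` that targets `target`. */
uint16_t near_jmp_disp(uint16_t assembled_at, uint16_t target)
{
    return (uint16_t)(target - (assembled_at + 3u));
}

/* Where that same JMP actually lands once its bytes run from `running_at`. */
uint16_t near_jmp_lands(uint16_t running_at, uint16_t disp)
{
    return (uint16_t)(running_at + 3u + disp);
}

/* MOV SI, imm16 (BE lo hi) followed by JMP SI (FF E6): the target is the
   immediate itself, so the address the bytes run from does not matter. */
uint16_t mov_si_jmp_si_lands(const uint8_t code[5])
{
    assert(code[0] == 0xBE && code[3] == 0xFF && code[4] == 0xE6);
    return (uint16_t)(code[1] | (code[2] << 8));
}
```

For example, a near jmp 100h assembled at offset 0120h stores the displacement FFDDh. Run from 0120h it reaches 0100h. Once the relocator has been copied to 0500h, however, the same bytes send execution to 04E0h. The mov si / jmp si pair still reaches 0100h.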