I got tired of always including the source code for a loader in the ASM source file, and decided to make a program which takes an existing COM file (already compiled), encrypts it, and attaches it to a loader.
There are two ways to do this:
    +-----------+
    |           |
    |  loader   |
    |           |
    +-----------+
    |           |
    | encrypted |
    |   code    |
    |           |
    +-----------+
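The packing step itself is simple. Below is a minimal Python sketch of such a packer. It is not the author's actual program: it assumes the single-byte XOR with 0FFh that the relocator later in this text uses, and the function names `xor_bytes` and `pack_com` are made up for illustration. How the loader learns the length of the encrypted code (for example, a word patched into the loader) is left out.

```python
def xor_bytes(data: bytes, key: int = 0xFF) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def pack_com(loader: bytes, program: bytes, key: int = 0xFF) -> bytes:
    """Return loader + encrypted program, matching the layout in the diagram.

    Hypothetical helper: assumes the loader expects the encrypted code to
    start immediately after its own last byte.
    """
    # A COM image has to fit in one 64K segment above the 100h-byte PSP.
    if len(loader) + len(program) > 0xFF00:
        raise ValueError("packed image too large for a COM file")
    return loader + xor_bytes(program, key)


# Hypothetical two-byte loader stub and a tiny program (mov ah,4Ch / int 21h).
packed = pack_com(b"\x90\x90", b"\xB4\x4C\xCD\x21")
print(packed.hex())   # 90904bb332de
```

In practice `loader` would be the assembled loader stub and `program` the contents of the original COM file, read in binary mode.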
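What we need is for the loader to decrypt the code and then move it to offset 100h. This is very important: a COM program is assembled to run at offset 100h (ORG 100h), so every absolute memory reference in the decrypted code assumes the code starts there, and all of them will be wrong if it runs anywhere else. Refer to the diagram below: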
    +-----------+     +-----------+     +-----------+
    |           |     |           |     |           |
    |  loader   |     |  loader   |     | decrypted |
    |           |     |           |     |   code    |
    +-----------+ --> +-----------+ --> |           |
    |           |     |           |     +-----------+
    | encrypted |     | decrypted |     |           |
    |   code    |     |   code    |     |           |
    |           |     |           |     |           |
    +-----------+     +-----------+     +-----------+
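Here is a small Python model of a 64K segment showing why the destination has to be 100h. The hypothetical program starts with `mov dx, 0103h` (bytes BA 03 01), which is what an assembler would emit under ORG 100h to point DX at a data byte stored 3 bytes into the file. Because the offset is stored in the instruction itself, it finds the data only when the image starts at 100h. The names `load` and `mov_dx_target` are made up for this sketch.

```python
SEGMENT_SIZE = 0x10000


def load(image: bytes, offset: int) -> bytearray:
    """Return a zeroed 64K segment with the image copied in at the given offset."""
    mem = bytearray(SEGMENT_SIZE)
    mem[offset:offset + len(image)] = image
    return mem


def mov_dx_target(mem: bytearray, ip: int) -> int:
    """Decode 'mov dx, imm16' (opcode BAh) at ip and return its immediate operand."""
    assert mem[ip] == 0xBA, "not a mov dx, imm16 instruction"
    return mem[ip + 1] | (mem[ip + 2] << 8)


# Hypothetical ORG 100h program: mov dx, 0103h followed by the data byte 'A'.
program = bytes([0xBA, 0x03, 0x01]) + b"A"

at_100h = load(program, 0x100)
print(chr(at_100h[mov_dx_target(at_100h, 0x100)]))   # A

# The same bytes left behind a 20h-byte loader instead of being moved to 100h.
shifted = load(program, 0x120)
target = mov_dx_target(shifted, 0x120)
print(hex(target), shifted[target])   # 0x103 0 (the 'A' is really at 123h)
```

In a real run the byte at 103h would be part of the loader rather than zero, but either way it is not the data the program meant to use.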
The scheme above still will not work, because moving the decrypted code down to offset 100h overwrites the loader while the loader is still running. The solution is to have the loader copy a small relocator routine somewhere out of the way (past the end of the encrypted code in the diagram below) and then jump to it. The relocator then moves the decrypted code to offset 100h and jumps there.
    +-----------+     +-----------+     +-----------+     +-----------+
    |  loader   |     |  loader   |     |  loader   |     |           |
    +-----------+     +-----------+     +-----------+     | decrypted |
    | relocator |     | relocator |     | relocator |     |   code    |
    +-----------+ --> +-----------+ --> +-----------+ --> |           |
    |           |     |           |     |           |     +-----------+
    | encrypted |     | decrypted |     | decrypted |     |           |
    |   code    |     |   code    |     |   code    |     |           |
    |           |     |           |     |           |     |           |
    +-----------+     +-----------+     +-----------+     +-----------+
                                        | relocator |     | relocator |
                                        +-----------+     +-----------+
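Whether the relocator survives the move is just interval arithmetic. This Python sketch assumes the layout in the diagram (loader, then relocator, then encrypted code, all loaded at 100h) and places the relocator copy right after the end of the image, as the last two columns do. The function names and the sizes in the example are hypothetical.

```python
COM_BASE = 0x100


def overlaps(a_start: int, a_len: int, b_start: int, b_len: int) -> bool:
    """True if [a_start, a_start + a_len) and [b_start, b_start + b_len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def plan(loader_len: int, reloc_len: int, code_len: int) -> dict:
    """Work out where each piece lives, assuming loader | relocator | code at 100h."""
    reloc_orig = COM_BASE + loader_len
    code_start = reloc_orig + reloc_len
    image_end = code_start + code_len
    return {
        "code_start": code_start,
        # Relocator left where it is: does the move to 100h run over it?
        "in_place_clobbered": overlaps(COM_BASE, code_len, reloc_orig, reloc_len),
        # Relocator copied just past the end of the image.
        "reloc_copy": image_end,
        "copy_clobbered": overlaps(COM_BASE, code_len, image_end, reloc_len),
    }


p = plan(loader_len=0x40, reloc_len=0x20, code_len=0x400)
print({k: hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
       for k, v in p.items()})
```

For any program longer than the loader, the in-place check comes out True, which is the failure described above. The copy past the end of the image can never be hit, because the move only covers code_len bytes from 100h while the image itself is longer than the code alone. (The sketch does not check the stack, which in a COM file starts near the top of the segment.)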
It does not matter whether the loader or the relocator decrypts the code.
    ; assume that the loader has set
    ;   DS:SI -> encrypted code
    ;   ES:DI -> where to move code (100h)
    ;   CX     = number of bytes to move
    ; and that DS = ES, as when DOS starts a COM file
    relocator   equ $
                push cx
                cld                         ; make movsb copy forward
                rep movsb                   ; move the code to 100h
                pop cx                      ; restore the byte count
                mov si, 0100h               ; decryption time...
    next_byte:  xor byte ptr [si], 0ffh     ; decrypt one byte (this could
                inc si                      ; also be done in the loader)
                loop next_byte              ; repeat until CX = 0
                mov si, 0100h
                jmp si                      ; execute the decrypted code
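The relocator's logic can be checked without a DOS machine using a rough Python model. This is a sketch, not an emulator: `rep movsb` becomes a forward byte-by-byte copy, the decryption loop becomes XOR with 0FFh over CX bytes starting at 100h, and the function name `run_relocator` and the memory layout in the example are hypothetical.

```python
def run_relocator(mem: bytearray, si: int, di: int, cx: int) -> int:
    """Model of the relocator above; returns the offset it finally jumps to.

    Assumes DS = ES (one 64K segment, as in a COM file), the direction flag
    clear, and CX > 0 (a real LOOP entered with CX = 0 would run 65536 times).
    """
    saved_cx = cx                  # push cx
    while cx:                      # rep movsb: copy forward one byte at a time
        mem[di] = mem[si]
        si += 1
        di += 1
        cx -= 1
    cx = saved_cx                  # pop cx
    si = 0x100                     # mov si, 0100h
    while cx:                      # next_byte: ... loop next_byte
        mem[si] ^= 0xFF            # xor byte ptr [si], 0ffh
        si += 1                    # inc si
        cx -= 1
    return 0x100                   # mov si, 0100h / jmp si


# Hypothetical payload: mov ah,4Ch / int 21h (B4 4C CD 21), stored encrypted at 180h.
plain = bytes([0xB4, 0x4C, 0xCD, 0x21])
mem = bytearray(0x10000)
mem[0x180:0x184] = bytes(b ^ 0xFF for b in plain)

ip = run_relocator(mem, si=0x180, di=0x100, cx=len(plain))
print(hex(ip), mem[0x100:0x104].hex())   # 0x100 b44ccd21
```

Because the source (the encrypted code) lies above the destination (100h), a forward copy never reads a byte it has already overwritten, so the move is safe even when the two regions overlap.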
    relocate_code equ $
                ; move the code (already decrypted in place
                ; by the loader) to offset 100h
                cld                             ; make movsb copy forward
                mov si, offset data             ; si -> start of the code
                mov cx, word ptr [num_bytes]    ; cx = number of bytes
                mov di, 0100h
                rep movsb                       ; move the bytes
                mov si, 0100h
                jmp si                          ; execute
    relocate_code_size equ $ - relocate_code
The two memory references, data and num_bytes, are absolute references. This means that regardless of where in memory the above piece of code resides, data and num_bytes refer to the same offset in the current segment. This is because the assembler gives every label a fixed offset when your program is assembled and stores that offset directly in the instruction.
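The encoding makes this visible. In 16-bit code, `mov si, offset data` assembles to BE followed by the offset as a little-endian word, and `mov cx, word ptr [num_bytes]` to 8B 0E followed by the address of num_bytes. The Python below uses made-up offsets for data, num_bytes and the two code locations; it copies the bytes to a new address and decodes the same operands from both copies.

```python
import struct

DATA = 0x0180        # hypothetical offset of the label 'data'
NUM_BYTES = 0x0140   # hypothetical offset of the label 'num_bytes'

# mov si, offset data          -> BE lo hi
# mov cx, word ptr [num_bytes] -> 8B 0E lo hi
code = b"\xBE" + struct.pack("<H", DATA) + b"\x8B\x0E" + struct.pack("<H", NUM_BYTES)


def operands(block: bytes) -> tuple:
    """Decode the two absolute operands from the 7-byte sequence above."""
    assert block[0] == 0xBE and block[3:5] == b"\x8B\x0E"
    (si_imm,) = struct.unpack_from("<H", block, 1)
    (cx_addr,) = struct.unpack_from("<H", block, 5)
    return si_imm, cx_addr


mem = bytearray(0x10000)
mem[0x0150:0x0157] = code      # where the assembler put it (hypothetical)
mem[0xF000:0xF007] = code      # where the loader copied it (hypothetical)
print([hex(v) for v in operands(mem[0x0150:0x0157])])   # ['0x180', '0x140']
print([hex(v) for v in operands(mem[0xF000:0xF007])])   # ['0x180', '0x140']
```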
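However, the operand of a direct short or near jump (including conditional jumps) is a relative reference: the instruction stores a displacement from the address of the next instruction, not the destination itself. Only a far jump stores an absolute segment:offset. Take the following jump instructions (note the machine code bytes):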
    CS:0100 EB4E        JMP 0150        ; JMP +4E bytes (relative)
    CS:0102 E9FB0E      JMP 1000        ; JMP +0EFB bytes (relative)
    CS:0105 7449        JZ  0150        ; JZ +49 bytes (relative)
    CS:0107 EA99993412  JMP 1234:9999   ; JMP to 1234:9999 (absolute)
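The arithmetic behind those comments can be checked mechanically. This hypothetical Python helper decodes only the four encodings in the listing: a short or conditional jump adds a signed 8-bit displacement to the address of the next instruction, a near jump adds a signed 16-bit one, and a far jump simply carries the segment and offset.

```python
def jump_target(ip: int, code: bytes) -> str:
    """Return the destination of the jump whose bytes are 'code', located at CS:ip.

    Handles only EB (short JMP), 74 (JZ), E9 (near JMP) and EA (far JMP).
    """
    op = code[0]
    if op in (0xEB, 0x74):                        # rel8, from the next instruction
        rel = code[1] - 0x100 if code[1] >= 0x80 else code[1]
        return f"{(ip + 2 + rel) & 0xFFFF:04X}"
    if op == 0xE9:                                # rel16, from the next instruction
        rel = int.from_bytes(code[1:3], "little", signed=True)
        return f"{(ip + 3 + rel) & 0xFFFF:04X}"
    if op == 0xEA:                                # absolute offset, then segment
        off = int.from_bytes(code[1:3], "little")
        seg = int.from_bytes(code[3:5], "little")
        return f"{seg:04X}:{off:04X}"
    raise ValueError(f"unsupported opcode {op:02X}")


for ip, hexbytes in [(0x100, "EB4E"), (0x102, "E9FB0E"),
                     (0x105, "7449"), (0x107, "EA99993412")]:
    print(f"CS:{ip:04X} {hexbytes:<10} -> {jump_target(ip, bytes.fromhex(hexbytes))}")
```

Feeding it the same bytes at another address, for example `jump_target(0xF000, bytes.fromhex("EB4E"))`, gives `"F050"` rather than `"0150"`, which is exactly the problem with moving code that contains relative jumps.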
That is why the relocator has
    mov si, 0100h
    jmp si                  ; always jumps to offset 100h
instead of
    jmp 100h                ; will not jump to 100h if the code
                            ; has been moved from its original
                            ; location
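The difference between the two forms can be made concrete. Assembled at some offset, `jmp 100h` becomes E9 followed by a 16-bit displacement computed from that offset, while `mov si, 0100h` / `jmp si` is BE 00 01 FF E6 and carries the absolute value 0100h. The Python sketch below assembles the near jump at a hypothetical original offset and shows where each sequence goes once its bytes have been copied to another hypothetical offset; all names are made up for illustration.

```python
def asm_jmp_near(at: int, target: int) -> bytes:
    """Encode 'jmp target' (E9 rel16) the way an assembler would at offset 'at'."""
    rel = (target - (at + 3)) & 0xFFFF            # displacement from next instruction
    return b"\xE9" + rel.to_bytes(2, "little")


def near_jmp_dest(at: int, code: bytes) -> int:
    """Where an E9 rel16 instruction sitting at offset 'at' transfers control."""
    rel = int.from_bytes(code[1:3], "little", signed=True)
    return (at + 3 + rel) & 0xFFFF


MOV_SI_JMP_SI = bytes([0xBE, 0x00, 0x01, 0xFF, 0xE6])   # mov si,0100h / jmp si


def mov_jmp_dest(code: bytes) -> int:
    """'jmp si' goes to whatever immediate the preceding 'mov si' loaded."""
    return int.from_bytes(code[1:3], "little")


ORIGINAL = 0x0150      # hypothetical offset where the relocator was assembled
COPY = 0xF000          # hypothetical offset the loader copied it to

jmp_bytes = asm_jmp_near(ORIGINAL, 0x100)
print(hex(near_jmp_dest(ORIGINAL, jmp_bytes)))   # 0x100  (correct in place)
print(hex(near_jmp_dest(COPY, jmp_bytes)))       # 0xefb0 (wrong after the copy)
print(hex(mov_jmp_dest(MOV_SI_JMP_SI)))          # 0x100  (same wherever it runs)
```

The copied near jump lands 0EEB0h bytes past 100h, exactly the distance the relocator was moved, while the register form is unaffected by where it runs.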