The Clueless Newbie's Guide to Hello World in Nasm
Q. Assembly language? Is this like assembly in High School?
A. No, not really. In High School, you had to sit quietly with your hands
folded. Writing assembly language, you're in charge of the processor's
every action. Sort of like being the speaker at assembly in High School,
except that
Q. So how do I do this?
A. You write assembly language source code - sort of English-like text - and the assembler (Nasm) converts it into machine code.
Q. What's this machine code look like?
A. BA 0B 01 B4 09 CD 21 B4 4C CD 21 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0D 0A 24
Q. I can't make anything of that!
A. That's why we use assembly language!
Q. So what's this assembly language look like?
A. Glad you asked... This is about the simplest program Nasm will assemble.
org 100h
mov dx,msg
mov ah,9
int 21h
mov ah,4Ch
int 21h
msg db 'Hello, World!',0Dh,0Ah,'$'
Q. What's that make the processor do?
A. Let's take it line-by-line...
org 100h - this doesn't tell the processor to do anything. It's a "pseudo-instruction" which tells Nasm that your code will be loaded at address 100h, so that Nasm can calculate the addresses of labels correctly.
Q. Wait a minute. How do I know it should be 100h?
A. ".com" files always load at offset 100h (within whatever segment DOS picks for them). The preceding 100h (256 decimal) bytes are the "Program Segment Prefix" - the PSP - some information that DOS fills in when your program loads. This line - "org 100h" - is the only clue Nasm needs to generate a ".com" file.
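To make the effect of "org 100h" concrete, here is the same program with each line's offset and bytes written in as comments. These were worked out by hand from the instruction sizes, so treat them as a sketch; asking Nasm for a listing file (for example "nasm hi.asm -o hi.com -l hi.lst") is a good way to check them.

```nasm
org 100h        ; emits no bytes: just tells Nasm where the code will sit

mov dx,msg      ; 0100h: BA 0B 01   (dx = 010Bh, low byte stored first)
mov ah,9        ; 0103h: B4 09
int 21h         ; 0105h: CD 21
mov ah,4Ch      ; 0107h: B4 4C
int 21h         ; 0109h: CD 21
msg db 'Hello, World!',0Dh,0Ah,'$'   ; 010Bh: 48 65 6C ... 0D 0A 24

; Leave out "org 100h" and Nasm assumes the code starts at 0, so msg
; becomes 000Bh and the first line assembles to BA 0B 00 instead.
; dx would then point 100h bytes too low, into the PSP, and int 21h/ah 9
; would print whatever it found there, up to the next '$'.
```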
Q. Okay, so the next line moves dx to msg, right?
A. No. mov (like most other instructions) works the other way around in "Intel syntax", which Nasm uses: the destination comes first, then the source. ("AT&T syntax" works in the direction you'd expect, but has other complications.)
mov dx,msg - moves the address of our message into the dx register.
mov ah,9 - moves the number 9 into the ah register.
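Here is a tiny fragment showing the destination-first rule, along with the same instructions in AT&T syntax for comparison. The AT&T lines are only comments: Nasm will not accept them. They show how the GNU assembler would spell these instructions, with "%" before registers, "$" before immediate values, and a "b" (byte) size suffix.

```nasm
; Intel syntax, as Nasm uses it: destination first, then source.
mov ah,9            ; ah <- 9
mov dl,ah           ; dl <- ah (copies ah into dl; ah is unchanged)

; For comparison only (Nasm rejects these), the same two
; instructions in AT&T syntax, source first:
;   movb $9, %ah
;   movb %ah, %dl
```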
Q. Well, that's easy enough to figure out once you figure it out. What's "int"? Integer?
A. No, the "int" instruction calls an "interrupt service routine" - an ISR. This calls some code - already loaded by the operating system or some other program - whose address is stored in a table starting at memory address 0. In real mode (the mode DOS runs in), each entry in that table takes up 4 bytes: a 2-byte offset followed by a 2-byte segment. So when the processor encounters an "int" instruction, it multiplies the operand (21h, in this case) by 4 to find the entry (84h, here), fetches the address of the ISR from the table (the Interrupt Vector Table, or IVT), executes the code at *that* address, and returns to where it was when the ISR finishes with an "iret" instruction.
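To see the IVT lookup done by hand, here is a sketch that reads the int 21h entry from address 84h and then calls the handler roughly the way "int 21h" would. The pushf / call far pair is a common way to simulate an interrupt call, but it is not an exact equivalent: a real "int" also clears the interrupt and trap flags, and this sketch does not. The values loaded into bx and cx are there only to show where the two halves of the entry live.

```nasm
org 100h

    xor ax,ax
    mov es,ax               ; es = 0: the IVT starts at 0000:0000
    mov bx,[es:21h*4]       ; offset of the int 21h handler (bytes 84h-85h)
    mov cx,[es:21h*4+2]     ; segment of the int 21h handler (bytes 86h-87h)

    mov dx,msg              ; the usual setup for "print a string"
    mov ah,9
    pushf                   ; int pushes the flags...
    call far [es:21h*4]     ; ...then cs:ip, and jumps to the handler;
                            ; the handler's iret pops all three back off

    mov ah,4Ch              ; exit to DOS the normal way
    int 21h

msg db 'Hello via the IVT!',0Dh,0Ah,'$'
```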
Q. So what does all that do for me?
A. Well, "int 21h" is the "DOS services interrupt", and it does a lot of different things, depending on the value it finds in the ah register. Since we just put 9 in ah, it will print the dollar-sign-terminated string whose address it finds in the dx register (strictly, in ds:dx, but in a ".com" file DOS has already pointed ds at our segment). Since we put the address of our message in dx, it'll print our message.
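As an illustration of "different things depending on ah", here is a sketch that uses two int 21h functions. Function 2 prints the single character in dl, and function 9 is the string printer from our program. The message text is just an example.

```nasm
org 100h

    mov dl,'!'          ; the character to print
    mov ah,2            ; function 2: print the character in dl
    int 21h

    mov dx,msg          ; function 9: print the '$'-terminated string at dx
    mov ah,9
    int 21h

    mov ah,4Ch          ; function 4Ch: end the program
    int 21h

msg db ' <- that was function 2',0Dh,0Ah,'$'
```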
Q. If the message is printed, what's the rest of it for? We move 4Ch into ah, and then do that funny "int 21h" again. What's it do this time?
A. Ends the program, and returns control to DOS (usually command.com). If you write code in a higher-level language, the compiler/interpreter generates the code to do this when you just stop coding. In assembly language, you have to tell the processor explicitly where to stop, or it will try to execute whatever comes next (here, our message) as instructions. This almost never does anything useful. It is very unlikely that the processor will encounter any garbage that it will interpret as instructions to format your hard drive, but do remember to terminate your program!
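Function 4Ch also takes an exit code in al, which a batch file can test with "IF ERRORLEVEL". Our program never sets al, so its exit code is whatever the previous int 21h happened to leave there. A common idiom is to load ah and al in one instruction, as in this sketch:

```nasm
org 100h

    mov dx,msg
    mov ah,9
    int 21h

    mov ax,4C00h        ; ah = 4Ch (terminate), al = 00h (exit code)
    int 21h             ; control returns to DOS; nothing below is executed

msg db 'Hello, World!',0Dh,0Ah,'$'
```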
Q. Okay. Now I can see our message in the last line, but what's all that garbage around it?
A. "msg" is a "variable name" - just a label - it could say "rumpelstiltskin" just as well, as long as both uses of it match (the "mov dx,msg" and "msg db blablablah", that is). Nasm calculates the address of the label and plugs it into "mov dx,msg". The address of "msg" is 10Bh, in this case, but it would change if we altered the code, and we'd go nuts keeping track of it if Nasm didn't do it for us.
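Here is one example of the label moving when the code changes. A ".com" program can also end with a plain "ret". DOS pushes a zero word onto the stack before starting a ".com" file, and the PSP begins with an "int 20h" (terminate) instruction, so "ret" jumps to offset 0 and ends the program. This only works for ".com" files, and only if your code has put the stack back the way it found it. The exit shrinks from four bytes to one, so msg moves from 10Bh to 108h. The source line "mov dx,msg" stays the same, and Nasm picks up the new address automatically. The offsets in the comments were worked out by hand.

```nasm
org 100h

    mov dx,msg          ; now assembles to BA 08 01: msg is at 0108h
    mov ah,9
    int 21h
    ret                 ; one byte (C3) instead of four (B4 4C CD 21)

msg db 'Hello, World!',0Dh,0Ah,'$'
```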
The "db" is another "pseudo-instruction" - it doesn't generate any code, but tells Nasm to output the following bytes into our program.
The next part is our message - Hello, World! - notice it's enclosed in single quotes in the source code. Either single quotes or double quotes can be used. If you want to use a single quote/apostrophe in the message, enclose it in double quotes - "here's an example".
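A short sketch of the quoting rule. Each kind of quote can hold the other kind. Consecutive "db" lines follow each other in memory, so the label "msg" marks the start of all of them, and function 9 prints everything up to the single '$' at the end.

```nasm
org 100h

    mov dx,msg
    mov ah,9
    int 21h
    mov ah,4Ch
    int 21h

msg db "here's an example",0Dh,0Ah     ; apostrophe inside double quotes
    db 'Nasm says "hi"',0Dh,0Ah        ; double quotes inside single quotes
    db '$'                             ; the terminator function 9 looks for
```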
The "0Dh,0Ah" are a "carriage return/line feed" pair - just to move
the cursor to the next line for a nicer display. Without it, the output
would look like "Hello, World!C:\>" when we exit. Note that the
(hexadecimal) numbers start with a leading zero. Nasm can't distinguish
between the hex number "ABCDh" and the label "ABCDh", so the rule is
that a number must start with a digit: write "0ABCDh".
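Here are a few ways to write the same number that Nasm accepts, plus the one it misreads. The misread line is commented out, because Nasm would stop with an "undefined symbol" error. The decimal value 43981 is 0ABCDh converted by hand. By the same rule, "0Dh,0Ah" in our message could also be written "13,10".

```nasm
org 100h

    mov ax,0ABCDh       ; hex, "h" suffix: the leading 0 makes it a number
    mov ax,0xABCD       ; the same number with a C-style "0x" prefix
    mov ax,43981        ; the same number in decimal
;   mov ax,ABCDh        ; NOT a number: Nasm reads ABCDh as a label name

    mov ah,4Ch          ; exit to DOS
    int 21h
```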
The "$" terminates the string to output - because int 21h with 9 in ah wants it that way.
Q. Why a dollar sign?
A. Got me. DOS wants it that way. You can easily write a routine to print a zero-terminated string, and not use int 21h/ah 9 for this chore.
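Here is one way such a routine might look. It is a sketch, not the only way to do it. It walks the string with "lodsb" and prints each character with int 21h function 2, stopping at the zero byte. The routine name "print_asciiz" is made up for this example. Because it never uses function 9, the message can contain a '$'.

```nasm
org 100h

    mov si,msg          ; si -> first character of the string
    call print_asciiz
    mov ah,4Ch          ; exit to DOS
    int 21h

; print_asciiz (hypothetical name): print the zero-terminated string
; at ds:si one character at a time with int 21h function 2.
; Changes ax, dx and si.
print_asciiz:
    cld                 ; make lodsb step forwards through memory
.next:
    lodsb               ; al = byte at ds:si, then si = si + 1
    test al,al          ; is it the zero terminator?
    jz .done
    mov dl,al           ; function 2 wants the character in dl
    mov ah,2
    int 21h
    jmp .next
.done:
    ret

msg db 'Hello, World! (a $ is fine here)',0Dh,0Ah,0
```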
Q. Well, okay, how do I get it to do something?
A. Save the text (your "source code") to a file named, let's say, "hi.asm", and run Nasm on it. You've got a choice here. You can just do "nasm hi.asm", and rename the resulting "hi" file to "hi.com" before you can run it. Or you can take your first step towards becoming a Nasm Guru, and do "nasm hi.asm -o hi.com". The "-o" switch names the output file. The space after the "o" is optional - you can write "-ohi.com" - I think it's more readable with the space, but Nasm isn't fussy. We really should use the "-f bin" (or "-fbin") switch to tell Nasm we want a binary format output file, but that's Nasm's default, so we can be lazy.
Q. Now what?
A. Say "hi".
I really would like some feedback on this. If you're a newbie, what more do you need to know? If you're experienced, what have I done wrong?