Toby Opferman :: Articles

Toby Opferman
http://www.opferman.net
programming@opferman.net

Assembly Tutorial

I have finally decided to write an Assembly tutorial, by popular request and for lack
of a better one on the web (unless you count Art of Assembly). This is by no means
complete and it may not even help you at all. I provide this tutorial in the hope that
you will actually learn something, but it may make you even more confused.
YOU HAVE BEEN WARNED!

PART II. Computer Basics

This part of the assembly tutorial tells you how the computer works from
the computer's perspective. This section is really common sense, but it is overlooked
and forgotten since the computer is a very good abstraction. By abstraction I mean
making something seem to be one thing, even act like that thing, when it is not. It
is an idea in your head. Some people can skip this section.

PLEASE NOTE: A lot of the OS references refer to Windows & DOS for simplicity's
sake. It is more likely you are programming for one of these two operating systems
anyway.

Section A. Devices.

Every device hooked up to your computer is just a series of switches that send data
to the computer, and some also take data back from the computer. Basically, each is
assigned a 'PORT' (there is another method called 'Memory Mapped I/O' where fixed
memory addresses are used instead of ports; some small processors made by companies
like Motorola use this technique). A port is just an input/output line, an interface
to read/write data to the device. There is also something called an "IRQ", or Interrupt
Request. Each device is assigned its own IRQ. There is something else called a "PIC",
the Programmable Interrupt Controller. Basically, you turn the IRQ on in the PIC. Then,
when a device has data, the PIC asserts the interrupt request pin on the CPU, and the
CPU is now notified that, say, your mouse has moved. When this occurs, the CPU takes
an INTERRUPT. Every IRQ has code associated with it through a corresponding interrupt
number, the same kind of number you use with the INT instruction. The IRQ number and
the interrupt number are not always the same. For example, in DOS, the IRQs are set up
starting at INT 8: IRQ 0 is interrupt 8 (INT 8h), IRQ 1 is interrupt 9 (INT 9h), and
so on. This is set up at boot time, when the PIC is programmed, and the settings normally
stay for the duration of the computer's uptime unless an operating system reprograms the PIC.
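
To make the IRQ-to-interrupt numbering concrete, here is a rough sketch in Python
(not assembly, just so the arithmetic is easy to run). It assumes a PC/AT-style machine
with two PICs and the conventional BIOS/DOS base values; the function names are made up
for this example, and an OS is free to program the PICs differently.

```python
# Sketch of the default real-mode (DOS/BIOS) mapping from IRQ line to interrupt number.
# Assumption: PC/AT-style machine with two PICs using the conventional BIOS defaults.
MASTER_PIC_BASE = 0x08  # IRQ 0-7  -> INT 08h-0Fh
SLAVE_PIC_BASE = 0x70   # IRQ 8-15 -> INT 70h-77h


def irq_to_int(irq):
    """Return the interrupt number the CPU receives for a given IRQ line."""
    if not 0 <= irq <= 15:
        raise ValueError("IRQ must be in the range 0-15")
    if irq < 8:
        return MASTER_PIC_BASE + irq
    return SLAVE_PIC_BASE + (irq - 8)


def vector_address(int_number):
    """Offset of the handler's address in the real-mode interrupt vector table.

    The table starts at 0000h:0000h and uses 4 bytes (offset, segment) per entry.
    """
    return int_number * 4


for irq in (0, 1, 12):
    n = irq_to_int(irq)
    print(f"IRQ {irq:2d} -> INT {n:02X}h, vector stored at 0000h:{vector_address(n):04X}h")
```

The 4-byte entry that vector_address points at is where the CPU finds the address of
the code to run for that interrupt.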

Now, once the IRQ is raised, the CPU finishes the current instruction, saves its place
(the flags and the address of the next instruction), and runs the code that corresponds
to that device. That code now checks the device's ports and updates the device's state.
If the device is the mouse, for example: you move the mouse, the IRQ is raised, the CPU
saves its state and jumps to the interrupt handler, which then gets the mouse's new
position and puts the pointer on the screen.

As you should know, the mouse and the monitor do not really talk to each other.
The mouse has no real idea what it is doing. You move it, it sends a signal. You
press a button, it sends a signal. That is it. Your mouse driver, the code that
someone wrote to read the mouse's signals, is where it interacts with the screen.
The mouse driver keeps track of where the pointer is on the screen. The pointer
is not "really" controlled by the mouse; the software puts it up there. If you wanted,
you could intercept the interrupt for the mouse and make the pointer behave in a
totally messed up way.

The same goes for the keyboard, and even for expansion cards like your sound card.
Take for example the old, simple 8-bit Sound Blaster Pros. To play a sound,
the sound card had a port that would take 1 byte. Your software would decode the sound
into samples with whatever algorithm was needed and then send a lot of bytes
out the byte port of the Sound Blaster at a certain speed to make a tone. You could
also use "DMA", which is Direct Memory Access. That way, the processor does not have to
be involved in 100% of the processing of the sound, so it can do other things. With
DMA, you basically decoded your sound, gave the DMA controller the buffer's address and
length, and told the card what speed to play it at. Once the transfer was done, your
sound card would generate an IRQ, telling you to send the next load! If you had a GUS
(Gravis UltraSound), life was a bit easier: you could tell the sound card where your
samples were and it would play the sounds in hardware instead of software, increasing
efficiency by taking a load off of the CPU.

Not to get too advanced for you beginners at assembly, but ports are extremely easy to use.

MOV DX, 3DAh ; 3DAh is the VGA Input Status #1 port (bit 3 is set during vertical retrace)
IN AL, DX    ; IN reads one byte from the port in DX into AL

MOV DX, 3C9h ; 3C9h is the VGA palette data register
             ; (normally you first select the color index by writing it to port 3C8h)
MOV AL, 0    ; the data byte to send
OUT DX, AL   ; OUT writes the byte in AL to the port in DX

You see, IN and OUT are assembly instructions (do not worry if you do not understand the
code, it is irrelevant). In this example, we are trying to show you how easy it is to access
external devices: you just use IN to get your data and OUT to send it! Sometimes it
gets a little more complex to get what you want, depending on the device, but
you don't need to worry about that for now.

Also, you can make devices for your computer if you know what you're doing. Maybe
making an expansion card is a bit extreme, but you could easily put wires through your
mouse port or joystick port and hook up a simple keyboard or some LEDs, then read the
keys with IN and light the LEDs with OUT quite simply.

And as you should know, your monitor really doesn't know what's going on either. A CRT
is just an electron beam shooting at phosphors on the screen, which absorb the energy
and radiate it as light. The color you see depends on which phosphors (red, green or
blue) are hit and how hard.

And your CPU has no clue what's going on either! Your CPU has a thing called an "IP",
the Instruction Pointer. What it does is read the instruction at the memory location
referenced by the IP into the CPU and execute it. That instruction may alter where it reads
next, such as a jump here or there in memory, and IRQs may be generated to alter its course,
but all it knows is that 1 instruction (this is a simple explanation!!). Every time it executes
an instruction it's like starting over; it knows not what's next or what it's doing or anything.
It sees an ADD, it ADDs. It sees a MOV, it moves the data. That is all!
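
If it helps, the fetch-and-execute idea can be sketched as a toy loop in Python. The
three-instruction "machine" below is made up for illustration (it is not x86, and the
names run and program are my own); only the role of the instruction pointer is the same.

```python
# Toy fetch-execute loop. The instruction set here is invented for this example.
def run(program, max_steps=100):
    regs = {"AX": 0, "BX": 0}
    ip = 0  # instruction pointer: index of the next instruction to fetch
    steps = 0
    while ip < len(program) and steps < max_steps:
        op, a, b = program[ip]   # fetch the instruction the IP points at
        ip += 1                  # by default the next one follows in memory
        if op == "MOV":          # MOV reg, constant
            regs[a] = b
        elif op == "ADD":        # ADD reg, constant
            regs[a] += b
        elif op == "JNZ":        # jump to index b if register a is not zero
            if regs[a] != 0:
                ip = b
        else:
            raise ValueError(f"unknown instruction {op!r}")
        steps += 1
    return regs


program = [
    ("MOV", "AX", 0),
    ("MOV", "BX", 3),
    ("ADD", "AX", 5),    # index 2: start of the loop body
    ("ADD", "BX", -1),
    ("JNZ", "BX", 2),    # go around again until BX reaches zero
]
print(run(program))      # {'AX': 15, 'BX': 0}
```

Notice the loop never looks ahead: each pass only knows the one instruction it just
fetched, and a jump is nothing more than that instruction overwriting the IP.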

Everything is an abstraction that appears to be working together to form what you see
and use. Each part is just doing its job and processing, sending data here and there. It's
all just a transfer of current between all its working parts.

Section B. Data Storage

What I am going to talk about here is disk drives. Basically, you do a DIR and you
see files; other times you try to access a disk and you get an error. Some disks
are "boot" disks, some disks aren't. I am going to explain disk storage here.

When it comes to abstraction, the file system has to be one of the biggest abstractions
of the computer :-) First, a disk is made up of sectors. That is about it. Your
read/write head reads sectors of 512 bytes back to the computer. Your computer represents
them back to you abstractly. A 1.44 meg disk is really 2.0 meg unformatted. Formatting
puts code on the first sector. If it is a boot disk, it puts boot code there; if it is not
a boot disk, it puts code to print a message: "Non-System disk or disk error. Replace and
press any key when ready"!
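
As a quick check of the "it is all just sectors" idea, here is the sector arithmetic for
a formatted 1.44 meg floppy, written in Python so it is easy to run. The geometry figures
are the standard ones for a 3.5" high-density disk; the variable names are my own.

```python
# Geometry of a standard 3.5" high-density floppy (the "1.44 meg" disk).
TRACKS_PER_SIDE = 80
SIDES = 2
SECTORS_PER_TRACK = 18
BYTES_PER_SECTOR = 512

total_sectors = TRACKS_PER_SIDE * SIDES * SECTORS_PER_TRACK
total_bytes = total_sectors * BYTES_PER_SECTOR

print(total_sectors)       # 2880
print(total_bytes)         # 1474560
print(total_bytes / 1024)  # 1440.0
```

So the "1.44" is really 1440 blocks of 1024 bytes, and every one of those bytes lives in
one of 2880 numbered sectors.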

When your computer boots up, it starts in real mode. It starts executing code at
(I am pretty sure) 0FFFFh:0000h in ROM. It performs hardware checks and whatnot according
to the CMOS configuration. When it is ready to load the operating system, it reads
the first sector (512 bytes) of the boot drive (configured in the CMOS) into memory
starting at 0000h:7C00h and looks for the "boot signature" (AA55h) in the last two bytes
of that sector. If the boot signature is there, control is passed over to the boot code.
It needs to do whatever the OS needs to configure, and it also must load the rest of the
boot code/OS code. If the boot code is bigger than 512 bytes, the first sector must load
the rest itself. The boot code is now in control of the CPU!
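
Here is a small Python sketch of the two pieces of arithmetic in that boot sequence: how
a real-mode segment:offset pair turns into a physical address, and how the signature
check on a sector might look. The function names and the fake boot sector are made up
for this example; a real BIOS does this in its own ROM code.

```python
def physical_address(segment, offset):
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits as on an 8086."""
    return ((segment << 4) + offset) & 0xFFFFF


def is_bootable(sector):
    """True if a 512-byte sector ends with the boot signature.

    The word AA55h is stored little-endian, so the bytes on disk are 55h then AAh.
    """
    return len(sector) == 512 and sector[510] == 0x55 and sector[511] == 0xAA


# A fake boot sector for illustration: an endless loop (bytes EB FE, a jump to itself),
# zero padding, and the signature in the last two bytes.
boot_sector = bytes([0xEB, 0xFE]) + bytes(508) + bytes([0x55, 0xAA])

print(hex(physical_address(0x0000, 0x7C00)))  # 0x7c00
print(hex(physical_address(0xFFFF, 0x0000)))  # 0xffff0
print(is_bootable(boot_sector))               # True
print(is_bootable(bytes(512)))                # False
```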

In the old days, there used to be boot games, games that basically had
boot code to load themselves and you would play them. They would have a boot sector
that would just load itself into memory and configure only what the game needed and that's it.
You had to turn your PC off to get out of the game.

When you do a DIR, or however you find the list of files on your PC, your OS
reads a certain place on the disk holding the directory and the "FAT" (File Allocation
Table), which is just a mess of bytes like any other part of the disk. The OS interprets
it according to a format, and as long as the disk is intact, it can now tell you what is
on the disk and where each file is located on the disk according to the FAT format.
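
To show how a "mess of bytes" becomes a file name, here is a hedged Python sketch that
decodes one 32-byte short (8.3) directory entry as laid out on a FAT12/FAT16 disk. The
helper name parse_dir_entry and the README.TXT entry are invented for the example; a real
OS also walks the FAT itself to follow the file from cluster to cluster, which is not
shown here.

```python
import struct

# Layout of a 32-byte short (8.3) directory entry on a FAT12/FAT16 disk:
# name(8) ext(3) attributes(1) reserved(10) time(2) date(2) first cluster(2) size(4)
DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")


def parse_dir_entry(raw):
    """Turn 32 raw bytes into the fields a DIR listing needs."""
    name, ext, attr, _reserved, _time, _date, cluster, size = DIR_ENTRY.unpack(raw)
    filename = name.decode("ascii").rstrip()
    extension = ext.decode("ascii").rstrip()
    if extension:
        filename += "." + extension
    return {"name": filename, "attr": attr, "first_cluster": cluster, "size": size}


# A hand-built entry for a hypothetical file README.TXT: 1234 bytes, starting at cluster 2.
raw_entry = DIR_ENTRY.pack(b"README  ", b"TXT", 0x20, bytes(10), 0, 0, 2, 1234)
print(parse_dir_entry(raw_entry))
```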

Your disk drive and computer have no clue what is on the disk. You can really put
anything you want, where you want it, and it would be fine. You would just have to
remember where you put it! But, in order to keep order, file systems were invented.
They keep a table and arrange files in certain locations so it is more organized.

Section C. File Formats

All files are exactly the same. I do not know why people always try to say
that something is a "binary" file. They're all binary files! All files are stored in
a certain format, unless they are "raw", which means just straight data. Text
files are "raw": they have no format, they are just viewed. You can view any type
of file and I often do. I open up EXEs or OBJs or DLLs from time to time
in a regular file viewer, or even type or cat them. Yes, the range of values of the
data in these files is not limited to text & certain special characters; they
cover the entire range of byte values. You can look for text strings, though. This is
sometimes helpful in certain situations when looking for something. Now, you cannot
safely manipulate these files with a text editor. The editor may drop bytes it does
not support when it reads the file in, and it may insert new line breaks, since I doubt
the file is in nice short lines if it's not just 1 big line! But you can use a hex editor.
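
Looking for text strings inside a non-text file is easy to sketch in Python. The function
name find_strings and the sample bytes are made up for this example; the idea is simply
to keep runs of printable ASCII and ignore everything else.

```python
import re


def find_strings(data, min_length=4):
    """Return runs of printable ASCII that are at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [m.decode("ascii") for m in pattern.findall(data)]


# Made-up sample: text surrounded by bytes a text editor would mangle.
blob = b"MZ\x90\x00\x03\x00This program cannot be run\x00\xff\xfe\x01Hello\x00\x1a"
print(find_strings(blob))  # ['This program cannot be run', 'Hello']
```

To try it on a real file, read it in binary mode (open(path, "rb")) so no bytes get
translated or dropped on the way in.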

Any file can be executed. In DOS, change a text file to .exe or .com and it will execute;
it will most likely crash, but it will execute. All files are just data stored in a certain
format. All you need to do to use any file is find out its format, and there you have it.
If it's compressed in certain parts or all, you have to find out how to decompress it.

In DOS, as far as the name goes, the only difference between a "COM" file and an "EXE"
file is that the command shell will look for the .COM extension first. Meaning if you have
2 files, blah.exe and blah.com, and you type blah, blah.com will execute (provided there is
not an internal command of that name, i.e. dir is an internal command, so neither dir.com
nor dir.exe will ever execute!).

Now, you can rename a .com to .exe; that doesn't matter. What determines if a file is
executed as a .COM or .EXE is its format. The first 2 characters of an EXE are "MZ". If
they are there, the OS reads in the header and executes the code accordingly. If they
are not there, it reads the whole file in (it cannot be > 64k) and executes it at
seg:0100h. .COMs are "raw" executables; they have no format. EXEs have a header.
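
The loader's decision can be sketched in a few lines of Python. This is only a model of
the rule described above, not DOS's actual code, and the function name and the two sample
images are invented for the example.

```python
def dos_program_type(data):
    """Guess how DOS would load a program image, based on its first two bytes."""
    if data[:2] == b"MZ":
        return "EXE"   # header present: the loader reads it to learn the layout
    return "COM"       # no header: the image is copied to seg:0100h and run as-is


# Two made-up images for illustration.
exe_image = b"MZ" + bytes(26)                # start of an EXE header (fields zeroed here)
com_image = bytes([0xB4, 0x4C, 0xCD, 0x21])  # MOV AH,4Ch / INT 21h: exit to DOS

print(dos_program_type(exe_image))  # EXE
print(dos_program_type(com_image))  # COM
```

Note that the file name never enters into it, which is why renaming a .com to .exe
changes nothing.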

GIF file, JPEG file, etc. All just data, in a certain format. You may say "How do
viewers know the difference?" It's the format. Just like you saw the boot signature
and the EXE signature of MZ, these files usually have signatures in certain places
so the viewer knows what type it is, reads in the format and displays it. That said,
some viewers may just read the extensions of the files to tell.
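
A viewer's signature check might look something like this Python sketch. The table and
the function name sniff are my own; the signatures themselves are the well-known ones for
GIF ("GIF87a" or "GIF89a"), JPEG (bytes FF D8 FF) and the MZ executable header.

```python
# Signature table: (leading bytes, description). Only a handful of formats are listed.
SIGNATURES = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"MZ", "DOS/Windows executable"),
]


def sniff(data):
    """Identify a file by the bytes at its start, ignoring its name entirely."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown (possibly raw data or text)"


print(sniff(b"GIF89a\x01\x00\x01\x00"))    # GIF image
print(sniff(b"\xff\xd8\xff\xe0\x00\x10"))  # JPEG image
print(sniff(b"Hello, world\r\n"))          # unknown (possibly raw data or text)
```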

FINAL NOTE: You do not have to know the above to program in assembly. Most of you
may have already known the above, some have picked up things here and there, and others
have been oblivious to it. I am just trying to get your mind on track to think at a lower
level: going to the low level with the low level in mind, getting your mind away from
abstraction and high level languages so you can think and know better what is going on.
Do not worry if you didn't understand everything above.
