Friday, April 19, 2024 | Toby Opferman
 

Assembly Tutorial Part 3

Toby Opferman
http://www.opferman.net
programming@opferman.net

                                        Assembly Tutorial
                                        
        Finally, I have decided to write an Assembly tutorial due to popular request
and for lack of a better one on the web (unless you count Art of Assembly).  This is
by no means complete and it may not even help you at all.  I provide this tutorial in
the hope that you may actually learn something, but it may make you even more confused.
YOU HAVE BEEN WARNED!

                                      PART III. Logic

	In this tutorial we will talk generically about assembly and in the next
tutorial, we will actually start some x86 specific assembly.

	We need to establish some ground rules on how to code assembly.  Assembly
language programming is based on logic.  Every line performs one operation.  
For those of you who have used high-level languages, 1 line there can
perform 1 or more operations.  Let us give an example:

x = 5*z+y;

That performs about 3 operations.  Here is how you would look at it:

Multiply z by 5 and store the result somewhere.
Add y to the temporarily stored number.
Store the output in x.
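
The three steps above can be sketched in C, one operation per line.  This is only an illustration: the function name five_z_plus_y and the variable temp are made up here and stand in for the "somewhere" the product is stored.

```c
/* Hypothetical helper: evaluates x = 5*z + y one operation at a time. */
int five_z_plus_y(int z, int y)
{
    int temp;  /* stands in for the "somewhere" the product is stored */
    int x;

    temp = z * 5;     /* step 1: multiply z by 5 and store it */
    temp = temp + y;  /* step 2: add y to the temporarily stored number */
    x = temp;         /* step 3: store the output in x */

    return x;
}
```

Calling five_z_plus_y(2, 3) walks through exactly those three steps with z = 2 and y = 3.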


Now, we will start to talk about the ground rules of assembly first, for
those of you who have never programmed.  A register is an internal storage
space inside the CPU that you can use to operate on data and temporarily
store data.  This is much the same concept as a variable.  A variable
is a space in memory where you store and operate on data.  Why use
registers then?  Speed.  Keeping data in registers reduces how often the
CPU has to go over the bus to and from memory.  Your programs will run
faster by utilizing registers correctly.  Also, a lot of processors have
operations that REQUIRE you to have your data in a certain specified register
or registers, and some CPUs also require that all operations
involve a register.  Some simple processors will not let you add 2
variables; you would have to move the data of 1 into a register, then
add the register to the second variable.  On the other hand, some more powerful
processors will allow you to add 2 variables and store the result in another
variable, all in 1 operation.

Let us start to come up with how to do things in assembly language.  We will
model our code after the x86 though.  This code is also not going to be from
the most efficient or optimized viewpoint; we start out simple.


The problem:  x = y + 1

MOVE Y INTO REGISTER 2
MOVE 1 INTO REGISTER 1
ADD REGISTER 1 TO REGISTER 2
MOVE REGISTER 2 INTO X


Let us examine how we did this.  We took the contents of Y and put it into
a register.  We then put the number 1 into another register.  Next, we
added those 2 registers together, which left the answer in register 2.
Finally, we took the answer we stored and put it into the variable X.
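
If it helps to see the same four moves in a high-level language, here is a rough C sketch.  The variables reg1 and reg2 are ordinary C variables that merely play the part of REGISTER 1 and REGISTER 2, and the function name is made up for this example.

```c
/* Hypothetical model of the x = y + 1 listing, one line per instruction. */
int x_equals_y_plus_1(int y)
{
    int reg1;  /* plays the part of REGISTER 1 */
    int reg2;  /* plays the part of REGISTER 2 */
    int x;

    reg2 = y;            /* MOVE Y INTO REGISTER 2 */
    reg1 = 1;            /* MOVE 1 INTO REGISTER 1 */
    reg2 = reg2 + reg1;  /* ADD REGISTER 1 TO REGISTER 2 */
    x = reg2;            /* MOVE REGISTER 2 INTO X */

    return x;
}
```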


x = y

MOVE Y INTO REGISTER 1
MOVE REGISTER 1 INTO X

Simple operation, we are taking a variable Y and putting it into a register.
We then take that register and put it into variable X.  

Now, conditions.  Let's say we want to do something if and only if one thing
happens.  Let us examine a high-level version:

if x = 1 then
  y = 2
end if

If the variable x is 1, then we want to make y = 2.  There are a few ways to
approach this.  We will talk about "Branching" and "Jumps".  The name depends
on what processor you are working with: x86 calls them jumps, while mainframes
and a lot of RISC processors call them branches.  They are gotos, some
conditional and some not conditional.  Conditional means that they ONLY
jump if a certain event happens; otherwise, they don't jump.  Non-conditional
(also called unconditional) means they jump no matter what.

Ok, have you seen the movie "What About Bob?", where he talks about "Baby Steps"?
That is assembly.  Assembly is not hard; it is very simple.  Examine
any 1 line of code and it has a very basic and simple concept.

COMPARE X AND 1
JUMP IF EQUAL TO THEN
JUMP TO END IF
THEN:
  MOVE 2 INTO REGISTER 1
  MOVE REGISTER 1 INTO Y
END IF:

First, we compare the value in X to 1.  The next line means "if the previous
comparison found the two equal, then goto THEN".  THEN: is a label.  You can
reference labels in your program so you can go to different sections.  They
do not execute; they "fall through", which means when the code hits the label,
your program just keeps executing what is after it.  That is why we have
a nonconditional jump that forces the code to the END IF: label.  We do
this because if the first jump was not taken, we want to skip
the code under THEN:.
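
C happens to have goto and labels, so the listing above can be imitated almost line for line.  This is only a sketch: the function name and the labels then_label and end_if are invented for the example, and reg1 is a plain variable standing in for REGISTER 1.

```c
/* Hypothetical model of the conditional listing using goto and labels. */
int set_y_if_x_is_1(int x, int y)
{
    int reg1;  /* plays the part of REGISTER 1 */

    if (x == 1)         /* COMPARE X AND 1, then JUMP IF EQUAL TO THEN */
        goto then_label;
    goto end_if;        /* JUMP TO END IF (taken no matter what) */

then_label:
    reg1 = 2;           /* MOVE 2 INTO REGISTER 1 */
    y = reg1;           /* MOVE REGISTER 1 INTO Y */
end_if:
    return y;           /* the label itself does nothing; code falls through */
}
```

When x is 1 the function returns 2; for any other x it returns y unchanged, because the nonconditional goto skips over the two moves.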


Here is another way to look at it.  It's reverse logic:

COMPARE X AND 1
JUMP IF NOT EQUAL TO END IF
THEN:
  MOVE 2 INTO REGISTER 1
  MOVE REGISTER 1 INTO Y
END IF:

We jump if they are NOT equal, so if they are equal it just falls through
to the THEN.  In fact, as I said before, THEN is just a label.  If we
aren't going to reference it in the program, we don't need it:


COMPARE X AND 1
JUMP IF NOT EQUAL TO END IF
MOVE 2 INTO REGISTER 1
MOVE REGISTER 1 INTO Y
END IF:
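
The reverse-logic version can be imitated in C with a single goto.  Again, this is just a sketch with made-up names; notice that only one jump and one label are needed.

```c
/* Hypothetical model of the reverse-logic listing: jump past the body
   when the condition is NOT met, otherwise fall through into it. */
int set_y_reverse_logic(int x, int y)
{
    int reg1;  /* plays the part of REGISTER 1 */

    if (x != 1)      /* COMPARE X AND 1, then JUMP IF NOT EQUAL TO END IF */
        goto end_if;
    reg1 = 2;        /* MOVE 2 INTO REGISTER 1 */
    y = reg1;        /* MOVE REGISTER 1 INTO Y */
end_if:
    return y;
}
```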


Of course in real programming labels can't have spaces in them.  I just
did that for clarity.  You must also realize that the labels can be named
anything.  They are for YOU only and the people who will read your code.

COMPARE X AND 1
JUMP IF NOT EQUAL TO HAMBURGER
MOVE 2 INTO REGISTER 1
MOVE REGISTER 1 INTO Y
HAMBURGER:

This is perfectly acceptable, except that HAMBURGER doesn't really describe
the operation.  There are also rules when making labels: you can't give 2 labels
the same name, or the same name as a variable, a CPU instruction, or
even an assembler directive.  We will get into more detail when we
start assembly programming itself.


The final basic element we want to go over is looping.  Some processors
have loop-type instructions and repeat-type instructions.  We will
NOT be covering these here in a generic fashion; you should
consult the documentation for the processor you will be coding on.

We will go over a simple idea.  Say you have an ARRAY of 10 elements
and you want to set each element to zero.  (There are optimal ways
to do this, but here we are just giving you a basic understanding of
how to loop and not trying to optimize for any specific processor.)

FOR Y = 1 TO 10
  ARRAY[Y] = 0
END FOR

Simple loop.  Y loops 10 times and the array elements are set to 0.
There are a few ways to handle this.  

MOVE 1 INTO REGISTER 1
MOVE REGISTER 1 INTO Y
LOOP:
MOVE Y INTO REGISTER 1
MOVE MEMORY ADDRESS OF ARRAY INTO REGISTER 2
ADD REGISTER 1 TO REGISTER 2
MOVE 0 INTO THE ADDRESS POINTED TO BY REGISTER 2
MOVE Y INTO REGISTER 1
ADD 1 TO REGISTER 1
MOVE REGISTER 1 TO Y
COMPARE REGISTER 1 TO 10
JUMP IF LESS THAN OR EQUAL TO LOOP



A bit more complicated.  We can optimize since we know that Y is in
register 1 already:

MOVE 1 INTO REGISTER 1
MOVE REGISTER 1 INTO Y

LOOP:
  MOVE Y INTO REGISTER 1
  MOVE MEMORY ADDRESS OF ARRAY INTO REGISTER 2

  ADD REGISTER 1 TO REGISTER 2
  MOVE 0 INTO THE ADDRESS POINTED TO BY REGISTER 2

  ADD 1 TO REGISTER 1
  MOVE REGISTER 1 TO Y

  COMPARE REGISTER 1 TO 10
  JUMP IF LESS THAN OR EQUAL TO LOOP


And we space it out a bit, so you can see how much easier it is to
read, and all we did was space and indent.  This is important in assembly.
Now, a lot of languages make element 1 the first element of an array.
This is not true at the assembly level.  The first element of an array
is element 0, since the element number is the number you add to the address
of the variable to get the element.  Let us examine:

Address    Data
1000        3  
1001        1
1002        4
1003        2
1004        1
1005        3

Everything in assembly is an address, labels and variables alike.  Each
instruction is at a particular place in memory.  So, we need to
get the address of the array, which is the start of the array and
its first element.  We get "1000", add 2, and we get "1002".  We can
now use a register as a "pointer" and put data into the memory location
indirectly.  A pointer is an object that holds an address in memory and
can INDIRECTLY put data into that spot in memory.  Not to be more
confusing, but the number you add should really be
element number * size of element.  So if each element was 2 bytes large, the
first element would be at offset 0 (0*2) of course, the second element at 2 (1*2),
the third at 4 (2*2), etc.
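
The address arithmetic described above fits in one line of C.  The helper below is hypothetical (the name element_address and its parameters are made up here); it just computes start address + element number * size of element.

```c
#include <stddef.h>

/* Hypothetical helper: address of an element, given the start address of
   the array, the zero-based element number, and the element size in bytes. */
size_t element_address(size_t base, size_t index, size_t element_size)
{
    return base + index * element_size;
}
```

With the table above (1-byte elements starting at 1000), element_address(1000, 2, 1) gives 1002.  With 2-byte elements, elements 0, 1 and 2 sit at offsets 0, 2 and 4 from the start.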

An analogy of a pointer:
A pointer would be like a MAC machine (an ATM).  You can do your transactions at
the bank, withdrawal and deposit.  You could also use a "pointer", which
would be the MAC machine.  You just need the "address", which is the account
number/MAC card.  You can remotely deposit and withdraw money and it
indirectly affects your account at the bank.

A pointer basically holds the address of another variable.  And you can
change the contents of that address from the pointer.
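
To tie this to the bank analogy, here is a small C sketch.  The function deposit and its parameter names are invented for the example; the point is only that the function never touches the balance variable by name, it changes it through the address it was handed.

```c
/* Hypothetical example: "account" holds the ADDRESS of a balance that
   lives somewhere else, and the deposit happens through that address. */
void deposit(int *account, int amount)
{
    *account = *account + amount;  /* indirectly change what the pointer points to */
}
```

If a variable balance holds 100, then deposit(&balance, 50) leaves 150 in balance, even though deposit only ever saw its address.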


If you don't get this, don't feel too bad.  Pointers are one of the hardest things
to grasp.  It is not just assembly that has pointers, though; C, C++, and other
languages have them as well, so this is not just an assembly language concept.

MOVE 0 INTO REGISTER 1
MOVE REGISTER 1 INTO Y

LOOP:
  MOVE Y INTO REGISTER 1
  MOVE MEMORY ADDRESS OF ARRAY INTO REGISTER 2

  ADD REGISTER 1 TO REGISTER 2
  MOVE 0 INTO THE ADDRESS POINTED TO BY REGISTER 2

  ADD 1 TO REGISTER 1
  MOVE REGISTER 1 TO Y

  COMPARE REGISTER 1 TO 10
  JUMP IF LESS THAN 10 TO LOOP

So, you can see that Y now starts at 0.  At the end, we increment the variable
and check whether we have reached 10 (0-9 are the valid indexes).  It's that simple.
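
Here is the same loop imitated in C with goto, assuming Y starts at 0 so that the indexes run 0-9 as just described.  It is only a sketch: the names zero_array and ARRAY_LEN are made up, the elements are assumed to be 1 byte each, and reg1 and reg2 are ordinary variables playing the part of the two registers.

```c
#include <stddef.h>

#define ARRAY_LEN 10  /* assumed array size, matching the 10-element example */

/* Hypothetical model of the loop listing: zero ARRAY_LEN one-byte elements. */
void zero_array(unsigned char *array)
{
    size_t y;             /* the loop variable Y */
    size_t reg1;          /* plays the part of REGISTER 1 */
    unsigned char *reg2;  /* plays the part of REGISTER 2, used as a pointer */

    reg1 = 0;             /* MOVE 0 INTO REGISTER 1 */
    y = reg1;             /* MOVE REGISTER 1 INTO Y */

loop:
    reg1 = y;             /* MOVE Y INTO REGISTER 1 */
    reg2 = array;         /* MOVE MEMORY ADDRESS OF ARRAY INTO REGISTER 2 */

    reg2 = reg2 + reg1;   /* ADD REGISTER 1 TO REGISTER 2 */
    *reg2 = 0;            /* MOVE 0 INTO THE ADDRESS POINTED TO BY REGISTER 2 */

    reg1 = reg1 + 1;      /* ADD 1 TO REGISTER 1 */
    y = reg1;             /* MOVE REGISTER 1 TO Y */

    if (reg1 < ARRAY_LEN) /* COMPARE REGISTER 1 TO 10 */
        goto loop;        /* JUMP IF LESS THAN 10 TO LOOP */
}
```

A C programmer would normally write this as a for loop; the goto form is used here only because it mirrors the compare-and-jump structure of the listing.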

Now, a lot of operations are hardware and operating system dependent.  Now,
I know that assembly IS CPU dependent, but CPUs do all share common
concepts, and so do hardware and operating systems.  The operations
I am talking about are interfacing to devices.  We can talk all day about
adding numbers, loading and storing data from one place to another, but
we want to be able to do input and output!

Hardware operations can be performed by operating system calls or
by direct access to the hardware, according to the architecture of the
computer.  Some computers use "memory-mapped I/O", which means each
device is read and written through a specific memory location.  The PC
mainly uses port I/O, in which there are special instructions that send
data to and from the hardware through ports.  Some more sophisticated CPUs
can enable hardware protection, where the CPU monitors
applications and can grant access rights; at the OS's discretion, it
can let an application access hardware directly or force it to use
an interface provided by the OS and the driver for the device.  You
have to find these things out about your operating system and your
CPU when you begin assembly programming.
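
To give a feel for memory-mapped I/O, here is a rough C sketch.  Nothing in it talks to real hardware: the functions write_device and read_device are made up, and the "device register" is whatever address the caller passes in, which in a test can simply be an ordinary variable.  On a real machine the address would come from the hardware documentation, and the OS may not let an application touch it at all.

```c
#include <stdint.h>

/* Hypothetical sketch: with memory-mapped I/O, talking to a device looks
   like an ordinary store or load through a pointer.  "volatile" tells the
   compiler that every access must really happen and not be optimized away. */
void write_device(volatile uint8_t *device_reg, uint8_t value)
{
    *device_reg = value;  /* the store itself is the "output" operation */
}

uint8_t read_device(const volatile uint8_t *device_reg)
{
    return *device_reg;   /* the load itself is the "input" operation */
}
```

Port I/O on the PC is different: it uses dedicated CPU instructions instead of plain memory stores and loads, so it has no direct equivalent in portable C.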




 