SUNY Geneseo, Department of Computer Science

Sparc Assembly Language Programming with GCC

Doug Baldwin
Last updated January 16, 2002

Introduction

This document is a brief guide to using GCC as a Sparc assembler. Documentation on this subject seems skimpy, at best. Much of what I write is therefore based on my own experiences and inferences drawn from what documentation is available. This document describes the situation in SUNY Geneseo's Sun environment, but I make no guarantees about how well it will apply to other environments.

This document is not a guide to the Sparc architecture or instruction set, to GCC in general, to UNIX in general, etc. Other sources can provide that information, in particular...

Your Program's Runtime Environment

Most of the time, code you write in assembly language will be just a few subprograms of a larger program. Thus you don't need to worry about things like where space for the runtime stack comes from or how the overall program starts or ends. The rest of the program and its runtime environment take care of all of these things for you.

Define labels for the entry points of your assembly language subprograms, and declare those labels global via the ".global" pseudo-op so that linkers can see them and make them available to the rest of your program. Other parts of your program can then call your assembly language subprograms by using the entry point label as a function name. GCC passes parameters via register windows as described in The Sparc Architecture Manual: the caller places the first six parameters in registers %o0 through %o5, and any others on the stack. The subprogram should begin with a SAVE instruction to allocate itself a stack frame and register window, after which those parameters appear in registers %i0 through %i5, and it should end with a RET and RESTORE. For example, here is an outline of a one-parameter void function, foo, defined in assembly language (MINFRAME and SA come from header file "sys/stack.h", described below):

.global foo
foo: save %sp, -(SA(MINFRAME)), %sp

     /* ... parameter value is now available in register %i0 ... */
     /* ... do whatever it is that foo does ... */

     ret
     restore %g0, %g0, %g0

Here is a fragment of C code that calls foo. This would appear in a separately compiled C source file. The most important point here is that the call looks exactly like a call to any externally-defined C function:

extern void foo( int i );
...
foo( 17 );
...

You can write whole programs in assembly language if you wish. This is really no harder than writing a subprogram as described above, because when you run a program under UNIX, it begins execution via a subprogram call to label "main". (Technically, it appears that UNIX starts a program somewhere in code contained in the Gnu runtime library; this code then calls "main". But for all intents and purposes, think of a program as starting execution at "main".) By the time "main" gets called, a stack has already been created for the program, and the stack and frame pointers have been initialized.

The code you write at "main" should be a perfectly standard Sparc subprogram. As with any subprogram, label "main" should be declared global. The first instruction at "main" should be a SAVE. Your main subprogram should exit by executing a RET to transfer control back to UNIX, with a RESTORE in the RET's delay slot to release the register window and return a status code (presumably you are really returning to code in the Gnu runtime library, but as with program start-up, you can act as if you are returning directly to the UNIX shell). Little, if anything, will actually care what status code you return, although by convention a value of 0 indicates that your program exited normally, and a non-zero value indicates that it exited abnormally.

The C Runtime Library

One of the big advantages to using GCC as your assembler is that it works well with Gnu's C runtime library. In particular, code you assemble with GCC is automatically linked with this library. This means that you can call familiar C runtime library functions such as "printf" from assembly programs. These functions have exactly the same names in the assembler as they do in C. For example, the function you would invoke from C by writing

printf(... );

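is invoked from an assembly language program by placing the arguments in registers %o0 through %o5 and writing

call printf
nop

The "nop" fills the CALL instruction's delay slot.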

General Assembly Language Syntax

GCC generally recognizes the operation names, statement formats, register names, and similar syntactic features used in The Sparc Architecture Manual. However, a few fine points are worth knowing:

Instruction Set

By default, GCC assembles the Sparc Version 8 instruction set. The assembler recognizes the instruction mnemonics presented in The Sparc Architecture Manual.

Every one of our Suns that I have tested seems able to execute the Version 8 instruction set. GCC can be made to assemble Version 9 instructions, but the resulting executables may not run on all our Suns. So stick to Version 8 instructions.

Defining Symbols

Because GCC pre-processes assembler source files, there are two kinds of symbolic constant that you can use in assembly language: Pre-processor symbols, defined using "#define", and true assembler symbols, defined as described below. The symbols differ in subtle ways, and it can actually matter how you define a symbol in some cases.

Pre-processor symbols are defined via "#define", just as you would do in a C program.

Because these symbols are replaced by their values before the assembler proper ever sees your program, they cannot be used as symbols that the assembler must know are symbols (for example, you can't declare a pre-processor symbol to be global by using it as an argument to a ".global" pseudo-op; you can't use the pre-processor to give a symbolic name to a memory location).

On the other hand, you must use pre-processor symbols in cases where you want the pre-processor to understand that something is a symbol (e.g., if the symbol is to be tested by a "#ifdef" or similar directive).

True assembler symbols (i.e., symbols that are defined via the assembler proper, and which the assembler will know are symbols) are defined either as labels, or via syntax of the form

<Name> = <Value>

where <Name> is the name of the symbol being defined, and <Value> is the value it is to represent.

Define labels using the syntax

<Name>: ...

where <Name> is the name of the symbol being defined, and "..." should be replaced by an instruction, a data definition, or anything else that fills a piece of memory. The value of <Name> will be the address of the first byte defined by "...". For example, the following assembles an "add" instruction, and defines "foo" to represent the address into which that instruction gets loaded:

foo: add %l0, %l1, %l2

Helpful Pseudo-Ops

The assembler recognizes a number of pseudo-ops. Here are the ones I find most useful. This is by no means a complete list of every pseudo-op the assembler recognizes.

Segmentation

Assembly language programs are divided into "segments". This provides hints to UNIX on how to map different parts of the assembled program to different memory pages at load time. For example, instructions may be placed in read-only pages, uninitialized data buffers need to be in writable pages but UNIX needn't write any particular initial values into those pages, etc.

The ".seg" pseudo-op indicates what segment a part of your program describes. The syntax is

.seg <Segment_Name>

where <Segment_Name> stands for a string constant naming a kind of segment. For example

.seg "text"
.seg "data"

The commonly used segments, and their names, are:

text
Executable instructions
data
Initialized data, e.g., string constants, numeric constants not easily generated within an instruction, etc.
bss
Uninitialized data, e.g., buffers, uninitialized arrays, etc.

Global Symbols

Use the ".global" pseudo-op to make a symbol accessible from separately assembled files. The syntax is

.global <Symbol>

where <Symbol> is a symbol name. For example,

.global main

Memory Alignment

Sparc addresses are often subject to alignment restrictions. For example, 32-bit words can only be loaded from or stored into memory at addresses that are multiples of 4, half-words must be at addresses that are multiples of 2, etc. The ".align" pseudo-op forces the next address generated by the assembler to have a specified alignment in memory. The syntax is

.align <Alignment>

where <Alignment> is a number of bytes; the next address generated by the assembler will be a multiple of <Alignment>. For example, the following aligns the next address to a multiple of 4:

.align 4

Reserving/Initializing Memory

Several pseudo-ops help you set aside regions of memory in your program. Memory thus reserved can be either initialized with specific values, or left uninitialized. Particularly useful pseudo-ops include

.word

This initializes one or more 32-bit words. The syntax is

.word <Value1>, <Value2>, ...

This initializes the current word in memory with <Value1>, the next word with <Value2>, and so forth. You must provide at least one value. Addresses must be aligned to a multiple of 4 before using ".word" (use ".align" to do this). For example, the following initializes the word at address "MyData" to 17:

.align 4
MyData: .word 17

.asciz

This initializes a sequence of bytes with one or more null-terminated strings. The syntax is

.asciz <String1>, <String2>, ...

where <String1>, <String2> etc. are string constants. You must provide at least one string. For example

.asciz "Hello"
.asciz "world"

.skip

This increments the location counter by a given number of bytes (in effect, reserving that many bytes in memory without initializing them). The syntax is

.skip <NBytes>

where <NBytes> is the number of bytes of memory to reserve. For example, the following reserves space for a 32-bit word:

.skip 4

Helpful Header Files

Certain Sun header files are invaluable when programming in assembly language. These can be included in your program via an "#include" directive, with the name of the header file enclosed in angle brackets. For example

#include <sys/stack.h>

You can get lots of useful information from reading header files. Header files are just plain text files, so you can read them with any editor, "less", "more", etc. But to read a header file, you will need to know its full pathname. Unless otherwise indicated, all the header files discussed here are contained in or below directory "/usr/include". Thus, for example, the file I describe as "sys/stack.h" has full pathname "/usr/include/sys/stack.h".

The header files that I have found most useful for assembly language programming are

sys/stack.h
Describes the format of a stack frame, and provides constants corresponding to key sizes of, and offsets within, stack frames. Of particular note, constant MINFRAME is the minimum possible size of a stack frame. The contents of the stack pointer should be a multiple of constant STACK_ALIGN to ensure proper address alignment; macro SA rounds its argument up to a multiple of STACK_ALIGN (useful when you increment or decrement the stack pointer).

Invoking the Assembler

When GCC is run on an assembly language source file, it will by default pre-process and assemble the file, link it with the Gnu runtime library, and place the executable result in file "a.out". Many of the same command-line options that you use to control compilation and linking of a C program also apply to assembly language. In particular, use "-c" to pre-process and assemble, but not link, a file, and "-o" followed by a file name to specify a name other than "a.out" for the executable.

GCC uses a source file's extension to decide how to process it. Assembly language source files should have an extension of ".S" (note the capital "S"). (If for some reason you don't want GCC to run the pre-processor on your file, give it an extension of ".s" -- note that this "s" is lowercase.)

Sun's header files seem to have been written under the assumption that the pre-processor symbol "_ASM" is defined whenever a file is being processed by an assembler. GCC doesn't do this. So to avoid masses of syntax errors when Sun header files try to include C source code in the middle of your assembly language program, you have to manually define "_ASM". Do this by including the command-line option "-D_ASM" when you invoke GCC.

As an example GCC command line, here is the command you would use to assemble a program named "demo". Note that the source for the program is in file "demo.S", and the executable is placed in file "demo":

gcc -D_ASM -o demo demo.S

Example

As an example of most of the features discussed above, here is an assembly language version of the ever-popular "Hello, world" program. This program calls "printf", with "Hello, world" as its argument. Note the new-line character at the end of the string, so that subsequent output (i.e., the shell's next prompt) will appear on a fresh line.

/* This is a standard "Hello World" program written
   in Sparc assembly language.				*/


#include <sys/stack.h>




	.seg "text"


/* main: The driver for the rest of the program. This
   saves itself a frame, calls "printf" with a pointer
   to the string "Hello, world" as its only argument,
   and then returns.					*/

	.global main

main:	save %sp, -(SA(MINFRAME)), %sp		/* Create program's stack frame.	*/

	set hello, %o0				/* Pass a pointer to the message...	*/
	call printf				/* ... to printf			*/
	nop					/* Delay slot: fill it but do nothing.	*/

	ret					/* Return to system.			*/
	restore %g0, 0, %o0			/* Delay slot: Return "OK" status.	*/




	.seg "data"

hello:	.asciz "Hello, world\n"