THE RAINBOW June 1991 PAGE 40

A guide for the assembly-language programmer on the move

OS-9 Assembly Language

by Jeff Mikel

After several years of programming with Radio Shack's EDTASM+ assembler, I finally got OS-9 Level II and the OS-9 Level II Development System. This gave me access to all those things I dreamed of doing: editing files larger than 32K, editing and assembling large programs in memory, and easy disk, printer and keyboard I/O. But moving from EDTASM+ to OS-9 was not automatic. I had a great deal to learn about OS-9 assembly-language programming.

The OS-9 environment is an assembly-language programmer's dream. But with all the extra capability comes a set of design rules that must be followed if your programs are to perform properly. This is a small price to pay to reap the benefits of conditional assembly, macros, very large source files, and even local and global variables. If there is a high-level assembly language, the OS-9 Level II Development System has it.

Learning the new software packages was challenging since a lot of the necessary information is hidden in the manuals. It took a lot of manual searching and program writing before I found the information I needed. This article covers the basics you need to know, but can't seem to find. First, I cover the assembler and linker of the Development System. Then, I discuss how to write new OS-9 commands. Finally, we'll take a brief look at the techniques for writing subroutines for BASIC09.

Throughout this article, I assume you are familiar with the CoCo's 6809 instruction set and that you have written enough assembly-language code to be comfortable with it. I do not attempt to teach the hows and whys of assembly-language programming. Rather, I explain how to write source code for the Relocating Macro Assembler, or RMA. I also assume you understand enough about OS-9 to know how and when to change directories, specify pathlists and perform the other necessary functions without being told.
If you do not have much OS-9 experience, you should first get some background with BASIC09, or some other high-level language, to familiarize yourself with OS-9 operation. However, if you have used EDTASM+, or other packages under Disk BASIC, and are familiar with OS-9, you are primed and ready to forge ahead to OS-9 and the RMA package. And, if you are already familiar with the asm assembler, you can probably skim this article and study only the parts that relate exclusively to the RMA.

Building a Working Disk

Before assembling any programs, first set up a working disk with the necessary commands and enough space to hold your source files. Do not use the original Development System disks; put write-protect tabs on them and keep them in a safe place. Your working disk should have, at the very least, the files rma and rlink in the CMDS directory. If you plan to use scred, an excellent text editor for assembly-language programming, make sure it is also in the CMDS directory. scred needs the file termset in /dd/SYS, so make sure that file is also available.

I recommend the sys.l file be on your working disk. This file can be used by the linker for resolving the system-wide symbolic names and is faster than the Level I assembler's solution of including the statement use os9defs in each of your source files. sys.l is found in the LIB directory on your master disk. I keep my copy of sys.l in the root directory to save keystrokes.

If you have 512K, you definitely want to use the RAM disk included with the Development System during the editing and assembly process. There are three sizes from which to choose. The 96K version is adequate for my assembly-language needs. The necessary modules for using the RAM disk are in the MODULES directory on your Development System disk. A quick way to start using the RAM disk is to load it from your Development System disk each time you boot your system. The RAM disk comes in two parts: a device driver and a device descriptor.
Both must be in memory for the RAM disk to work. To save memory, you should first merge the driver module with the device descriptor you want. For example, to create a 128K RAM disk, load attr if it is not already in memory. Then, put your (backup) Development System disk into /d0 and enter

    chd /d0/modules
    merge ram.dr r0_128k.dd >/d0/ramdisk
    attr /d0/ramdisk e pe

From now on, you can insert the working disk into drive /d0 and enter

    load /d0/ramdisk
    iniz /r0

Now device /r0 functions like one of your disk drives, except it is much faster. Although this method is easy, it removes 8K from the OS-9 system memory, which could severely limit the number of processes or devices that can be active at one time.

The best way to add the RAM disk to your system is to create a new boot disk with the device driver and device descriptor included in the os9boot file. You can also configure the RAM disk as device /dd instead of /r0 in this way. I like this method very much. My system boots with /dd as a 96K RAM disk. After booting, I use a procedure file to set up my RAM disk with the required files. The following is a sample procedure file for a single-drive system:

    load makdir
    chx /d0/cmds
    load scred
    load rma
    load rlink
    -x
    makdir /dd/CMDS
    makdir /dd/SYS
    unlink makdir
    copy /d0/sys/termset /dd/sys/termset
    copy /d0/lib/sys.l /dd/sys.l

With all the required files either in memory or on the RAM disk, the assembler and linker really fly. Now that you have a working disk, it's time to explore the realm of OS-9 assembly language.

A Quick Review

Take a quick look at Listing 1, a short program that prints "Have a nice day." on the screen. Those of you who have attempted to write similar programs with EDTASM+ will appreciate the brevity of this program. I will quickly cover its operation. The first line, which starts with psect, gives the assembler some information about the program. (I'll discuss that in more detail later.)
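As a rough picture of what a program of this kind looks like, here is a minimal sketch in RMA syntax. It is not the original Listing 1. The label Msg, the byte count and the exact position of the clrb instruction are assumptions. The psect values are the ones this article quotes for Listing 1, and the sketch closes the psect with endsect.

```asm
* Minimal sketch of a "Have a nice day." program (not the original listing).
* Msg and the placement of clrb are illustrative assumptions.
         psect first,$11,$81,1,100,Start

Msg      fcc   "Have a nice day."
         fcb   $0D          carriage return ends the line

Start    leax  Msg,pcr      X points to the string, position independent
         ldy   #17          16 characters plus the carriage return
         lda   #1           path 1 is the standard output
         clrb               B = 0 means "no error" at exit
         os9   I$WritLn     write the line
         os9   F$Exit       return to the Shell
         endsect
```

The names I$WritLn and F$Exit are left unresolved by the assembler and are filled in by the linker from sys.l.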
The program begins at Start, where Register X is set to point to the output string. Next, Register Y is loaded with the length of the string in bytes. Register A is then loaded with 1, which is the path number for the standard output device (usually the screen). The os9 I$WritLn system call writes the string to the screen and F$Exit returns control to the Shell. Since Register B was cleared earlier, the Shell assumes no errors occurred and, voila, a successful program.

The Assembler Directives

The assembler has a set of statements called program section directives that direct it (hence the name) in its production of executable code. These directives are psect, csect and vsect. Of the three, the only required directive in any program is psect. Every program must have a psect.

The psect, or program section directive, marks the beginning of a code section. The assembler and the linker use the information given in the psect directive in building the final OS-9 module. Proper syntax for psect is

    psect name,typelang,attrev,edition,stacksize,entry

For Level I asm users, the psect directive takes the place of the mod directive. Each source file should have only one psect directive, but multiple psects (from multiple files) can be connected with the linker. This enables you to build a library of common routines, debug them once and use them forever. The following is a description of each psect parameter:

name: the name for the psect. The psect name is arbitrary, but it is good practice to use a name that describes the purpose or function of the routines in the psect. Names can be as long as 20 characters. The assembler does not use this name, but the linker does when it reports errors. It is advisable to give each psect a unique name.

typelang: this value becomes the type/language byte for the final memory module. The four high-order bits define the module type and the four low-order bits specify the language type.
Type and language codes are specified on Page 3-4 of the Technical Reference section in your OS-9 Level II manual.

attrev: this value becomes the attributes/revision byte for the final module. The four high-order bits specify the module's attributes. Currently, only Bit 7 is defined. When set, Bit 7 indicates the module is reentrant and can therefore be shared. The four low-order bits specify the module's revision number.

edition: this becomes the edition byte in the module header.

stacksize: this is an estimate of how much memory the routines in this psect need for the stack. Be generous; remember that OS-9 uses the stack even if your program does not.

entry: the entry address of your program, that is, the address to which OS-9 transfers control when your program is executed.

With reference to Listing 1, the program section name is first, the module's type/language byte is $11 (a 6809 object-code program module), and the attributes/revision byte is $81 (a reentrant program with Revision 1). The program is a first edition with 100 bytes reserved for stack space. The program begins at Start.

Direct your attention to the sample program header in Listing 2. This listing demonstrates the use of symbolic names in the psect directive. The only differences between this psect and the one declared in Listing 1 are the name and stack size requirements. However, the real purpose of Listing 2 is to demonstrate the other two program section directives: vsect and csect.

The vsect, or variable section directive, tells the assembler to begin a variable storage section. There are two types of vsects: direct-page and non direct-page. If you specify a direct-page vsect, the linker assigns the variables that are defined within the vsect to the 6809's direct page (the page referenced in conjunction with the 6809's DP register). The first vsect in Listing 2 is a direct-page vsect, since vsect dp is used. Thus, the count and path variables are located in the direct page.
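Since Listing 2 is not reproduced here, the following sketch shows the general shape of a header with one direct-page vsect and one non direct-page vsect. It is an illustration only, not the original listing. The psect name demo, the variable sizes and the stack size are assumptions, and literal values are used in the psect line where Listing 2 uses symbolic names.

```asm
* Sketch of a module header with two variable sections.
* The name demo, all sizes and the stack size are illustrative assumptions.
         psect demo,$11,$81,1,200,Start

* Direct-page variables: small and frequently used
         vsect dp
count    rmb   2
path     rmb   1
         endsect

* Non direct-page variables: large buffers
         vsect
IObuffer rmb   256
filename rmb   32
         endsect

Start    clrb               no error
         os9   F$Exit       return to the Shell
         endsect
```

The linker, not the assembler, decides where count, path, IObuffer and filename finally live, which is why the program itself never mentions an absolute address.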
Variables stored in the direct page are more quickly accessed and require less program code than those not stored in the direct page. So, store your often-used variables there. Keep in mind there are only 256 bytes in the direct page, so you may not be able to employ the direct page for all the variables you are using.

The IObuffer and filename variables are not located in the direct page. These variables should not be accessed through the same procedure to which you are accustomed with EDTASM+, since neither you nor the assembler knows the exact location of these variables at run time. Basically, don't use extended addressing. Use an index register and reference these variables as offsets from the base address. Notice that the variables in the second vsect are large, large enough to use all the direct-page space had they been put there. It is advantageous to keep these large variables off the direct page since it is prime real estate.

You are not limited to just two vsects per psect; you can have as many vsects as necessary. It is advisable, but not absolutely required, that you put all similar variables in a single vsect at the beginning of the program.

csect is the next program directive. It is similar to vsect, except it does not allocate any space. A csect, or constant section directive, is merely a convenient way of assigning successive values to a list of labels and is similar to the EQU directive. In Listing 2, fn.name has the value zero and fn.ext equals eight. You can also have more than one csect in a psect. If you specify an expression after the csect, this value becomes the base address for the section. In the second csect, first equals 10, second equals 11 and last equals 12.

Don't put your variables in a csect, even though one of the sample programs in the RMA manual does. Always use a vsect so the linker can properly assign values to the symbolic names. Remember that csect is used to define constants only,
just like the EQU and SET directives. For example,

    False    EQU   0
    True     EQU   1

is the same as

             csect
    False    RMB   1
    True     RMB   1
             ends

With this in mind, the constants defined within a csect should normally be used with immediate addressing. For example, you can use the following source code with constants:

             LDA   #True
             CMPA  #True

The Four Golden Rules

When writing assembly-language programs for OS-9, keep in mind that you are writing for a multiuser, multitasking environment. This means your program does not have the degree of control you are used to with EDTASM+. The operating system determines where in memory your program runs and where its data storage area is. Also, the operating system must be free to interrupt your program and pause it so that the other active tasks receive their time, too. Unless your program has timing-critical code, leave the interrupts unmasked. For all of this to work smoothly you must follow a few simple rules.

First, you must write position-independent code. Your program must be able to execute from any position in the microprocessor's 64K address space. The only way to ensure this is to avoid using absolute addressing. This means that when you need to access strings, variables or constants in your program, you must specify them as an offset from the program counter (PC) register if they are in a psect, or as an offset from Register U if they are in a vsect, and not as an absolute address. That is, instead of using either

             ldy   #name
             jmp   label

use

             leay  name,pcr
             bra   label

This is called position-independent code and is a requirement in OS-9. The assembler takes care of the calculations for you, so the process is painless and soon becomes second nature.

Second, your program should never modify any memory outside its variable space. Since OS-9 allocates memory for each process at run time, your program might overwrite itself, or some other important data, if it is allowed contact with memory that does not belong to it.
Remember, your programs are no longer running alone. Multiple copies of your program can be running at the same time. Therefore, your program must never modify itself.

Third, when your program begins, OS-9 gives it some important data located in the 6809's registers. With this data, a program knows where its variable storage is and whether or not any parameters have been passed to it. Some programs don't need to know this information, but you should make sure you save it, or use it, before loading the registers with other data.

Fourth, never directly access the CoCo's hardware. You should always use a system call to handle the desired function. This ensures your program runs on OS-9 systems other than your own. OS-9 has built-in routines for keyboard, screen, printer, and disk I/O, so it is a waste of time to write your own.

Interfacing with the Shell

Now, the $20,000 question: If you don't know in advance where your variable storage is, how do you access it? Fortunately, OS-9 provides enough data, via the registers, to provide a simple solution. When OS-9 forks a process (which is what your completed program is when it executes), it puts the following data into the registers:

B: contains the number of characters in the parameter list, including the carriage return at the end.

X: points to the first byte of the parameter list.

Y: points to the top of the process' memory area, which is the end of the parameter list.

U: points to the bottom of the memory area, which is the first byte of the direct page.

DP: contains the most significant byte of the direct-page address. Never change the direct-page register.

Terminate your programs by using the F$Exit call. The value in Register B at the time of the F$Exit call is returned to the Shell as an error number. A value of zero indicates no errors. Make sure to load Register B with the proper value before calling F$Exit.
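To make the register conventions concrete, here is a small sketch of a command that saves its entry registers, echoes its own parameter line, and exits with a proper error code. It is an illustration under stated assumptions, not one of the article's listings. The psect name echo, the variable names parmptr and parmlen, and the 80-byte limit are all hypothetical.

```asm
* Sketch: save the entry registers, echo the parameter line, exit cleanly.
* echo, parmptr, parmlen and the 80-byte limit are illustrative assumptions.
         psect echo,$11,$81,1,200,Start

         vsect dp
parmptr  rmb   2
parmlen  rmb   1
         endsect

Start    stx   parmptr      save the parameter pointer before X is reused
         stb   parmlen      save the parameter length passed in B
         lda   #1           path 1 is the standard output
         ldy   #80          write at most 80 bytes
         os9   I$WritLn     writing stops at the carriage return
         bcs   Exit         on error, B already holds the error code
         clrb               otherwise report "no error"
Exit     os9   F$Exit       B is returned to the Shell
         endsect
```

Because the parameter list ends with a carriage return, I$WritLn stops there on its own, so the sketch does not need the saved length for the write. The saved values are kept only to show the habit of storing entry data before the registers are reused.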
Since the direct-page register is set for you, any variables in the direct page can be accessed with direct addressing or as offsets from Register U, as long as it is not changed. Using the variables from Listing 2, you could use

             lda   path
             stx   count

Variables that are off the direct page should always be referenced as offsets from Register U to ensure your programs are using the memory space OS-9 has assigned to them. To do so, use

             leax  IObuff,u
             leay  filename,u

instead of

             ldx   #IObuff
             ldy   #filename

Following the proper rules becomes automatic after a short while. Pay particular attention to the sample programs, since they demonstrate important principles.

Building An Executable Program

Program development with the Development System is a two-step process. First, the assembler gathers your source code into a Relocatable Object File (ROF). Then, the linker processes one or more ROFs, adds the proper module header and CRC bytes, and creates the final product.

Although the two-step process for generating an executable module may seem tedious, it is this very process that makes RMA so powerful. The power of this method comes from the fact that any symbolic name not defined in a psect is passed to the linker, which attempts to resolve it based on information from other psects. Thus, program code in one psect can use symbolic names defined in another psect.

To make this process easier to implement, the assembler gives you the ability to declare your symbolic names as either local or global names. Local names are defined only in their psect and are unknown to other psects. In fact, other psects may use the same names for different values without any "multiply defined symbol" conflicts. Global symbols are made available to all psects, but they are referred to only if the psect has not also defined the same name.
If one psect defines the symbol buffer as a global symbol and a second defines buffer as a local symbol, the symbol in the second psect is local and different from the global symbol of the same name. To make a symbol global, include a colon (:) after the symbol name when it is defined. All other symbols are local symbols by default.

When using RMA, it is important to note that the assembler is case-sensitive. Thus, the labels Buffer, buffer and BUFFER are all different, even though they are spelled the same. Avoid the temptation of using the same name with different upper- and lowercase combinations in the same psect. Doing so invites confusion when debugging time comes around.

When multiple psects are linked, there must be one, and only one, mainline code segment. The mainline code segment is that section of code OS-9 transfers control to when the program begins executing. The psects in listings 1 and 2 are mainline psects. Any psects used with the mainline psect should be written as subroutines. They can have their own vsects, but they should not attempt to define the typelang, attrev, edition or entry bytes of the module. Make these entries in the psect statement zero or the linker refuses to link the code sections together.

Using Multiple psects

Direct your attention to the source code in listings 3, 4 and 5. These three psects, when linked together, form a program that waits for a keypress from the keyboard, dumps the values of the 6809 registers to the screen, and prints the ASCII value, in decimal, of the key pressed. Each listing is a separate file and is assembled separately from the others.
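The mechanics of a global symbol can be seen in a much smaller case than the three listings. The sketch below is an illustration only: the file names, the psect names main and sub, the routine NewLine and the stack sizes are all hypothetical. Each part would be a separate source file, assembled on its own and then joined by the linker.

```asm
* ---- first file (hypothetical name main.a): the mainline psect ----
         psect main,$11,$81,1,200,Start

Start    lbsr  NewLine      not defined here, so the linker resolves it
         clrb               no error
         os9   F$Exit
         endsect

* ---- second file (hypothetical name sub.a): a subroutine psect ----
* typelang, attrev, edition and entry are all zero
         psect sub,0,0,0,20,0

crchar   fcb   $0D          local symbol: a carriage return

* The colon makes NewLine a global symbol
NewLine: leax  crchar,pcr
         ldy   #1           one byte to write
         lda   #1           path 1 is the standard output
         os9   I$WritLn
         rts
         endsect
```

Only NewLine is visible outside the second file; crchar stays local, so another psect could define its own crchar without conflict. Listings 3, 4 and 5 follow the same pattern on a larger scale.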
Notice that each psect has a symbol named buffer that labels a non direct-page variable. These three symbols are all local to their psects and, thus, have different values. There are three global symbols in this group of psects: Dec, RegDump and Divide. There is also only one mainline psect, main. Notice that the other psect directives specify the typelang, attrev, edition and entry bytes as zero.

    Offset  Size     Description
    0,s     2 bytes  return address (don't change it)
    2,s     2 bytes  number of parameters passed
    4,s     2 bytes  pointer to first parameter
    6,s     2 bytes  size, in bytes, of first parameter
    8,s     2 bytes  pointer to second parameter
    10,s    2 bytes  size, in bytes, of second parameter

    Figure 1: A Sample Data Block

    Type     Size     Comments
    BOOLEAN  1 byte   $FF = true, 0 = false
    BYTE     1 byte
    INTEGER  2 bytes  if left-most bit set, number is negative
    REAL     5 bytes  8-bit exponent, 31-bit mantissa, 1 sign bit
    STRING   varies   terminated with $FF if less than max size

    Figure 2: The Five Basic Data Types

Variable names that have been declared as global symbols can be accessed from other psects simply by using that symbol name in your source code. Keep in mind that the assembler passes only unresolved references to the linker, so if the code in a psect defines a symbol with the same name as a global symbol, the global symbol is not used.

To access a global variable from another psect, use one of the following methods. If the variable is a non direct-page variable, access it in the normal manner:

             leay  IObuf,u
             ldx   counter,u

If a global direct-page variable is to be accessed, precede the variable name with the less-than symbol (<): lda