|
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200 |
- # z80 assembler
-
- This is probably the most critical part of the Collapse OS project because it
- ensures its self-reproduction.
-
- ## Invocation
-
- `zasm` is invoked with 2 mandatory arguments and an optional one. The mandatory
- arguments are input blockdev id and output blockdev id. For example, `zasm 0 1`
- reads source code from blockdev 0, assembles it and spit the result in blockdev
- 1.
-
- Input blockdev needs to be seek-able, output blockdev doesn't need to (zasm
- writes in one pass, sequentially.
-
- The 3rd argument, optional, is the initial `.org` value. It's the high byte of
- the value. For example, `zasm 0 1 4f` assembles source in blockdev 0 as if it
- started with the line `.org 0x4f00`. This also means that the initial value of
- the `@` symbol is `0x4f00`.
-
- ## Running on a "modern" machine
-
- To be able to develop zasm efficiently, [libz80][libz80] is used to run zasm
- on a modern machine. The code lives in `emul` and ran be built with `make`,
- provided that you have a copy libz80 living in `emul/libz80`.
-
- The resulting `zasm` binary takes asm code in stdin and spits binary in stdout.
-
- ## Literals
-
- See "Number literals" in `apps/README.md`.
-
- On top of common literal logic, zasm also has string literals. It's a chain of
- characters surrounded by double quotes. Example: `"foo"`. This literal can only
- be used in the `.db` directive and is equivalent to each character being
- single-quoted and separated by commas (`'f', 'o', 'o'`). No null char is
- inserted in the resulting value (unlike what C does).
-
- ## Labels
-
- Lines starting with a name followed `:` are labeled. When that happens, the
- name of that label is associated with the binary offset of the following
- instruction.
-
- For example, a label placed at the beginning of the file is associated with
- offset 0. If placed right after a first instruction that is 2 bytes wide, then
- the label is going to be bound to 2.
-
- Those labels can then be referenced wherever a constant is expected. They can
- also be referenced where a relative reference is expected (`jr` and `djnz`).
-
- Labels can be forward-referenced, that is, you can reference a label that is
- defined later in the source file or in an included source file.
-
- Labels starting with a dot (`.`) are local labels: they belong only to the
- namespace of the current "global label" (any label that isn't local). Local
- namespace is wiped whenever a global label is encountered.
-
- Local labels allows reuse of common mnemonics and make the assembler use less
- memory.
-
- Global labels are all evaluated during the first pass, which makes possible to
- forward-reference them. Local labels are evaluated during the second pass, but
- we can still forward-reference them through a "first-pass-redux" hack.
-
- Labels can be alone on their line, but can also be "inlined", that is, directly
- followed by an instruction.
-
- ## Constants
-
- The `.equ` directive declares a constant. That constant's argument is an
- expression that is evaluated right at parse-time.
-
- Constants are evaluated during the second pass, which means that they can
- forward-reference labels.
-
- However, they *cannot* forward-reference other constants.
-
- When defining a constant, if the symbol specified has already been defined, no
- error occur and the first value defined stays intact. This allows for "user
- override" of programs.
-
- It's also important to note that constants always override labels, regardless
- of declaration order.
-
- ## Expressions
-
- See "Expressions" in `apps/README.md`.
-
- ## The Program Counter
-
- The `$` is a special symbol that can be placed in any expression and evaluated
- as the current output offset. That is, it's the value that a label would have if
- it was placed there.
-
- ## The Last Value
-
- Whenever a `.equ` directive is evaluated, its resulting value is saved in a
- special "last value" register that can then be used in any expression. This
- last value is referenced with the `@` special symbol. This is very useful for
- variable definitions and for jump tables.
-
- Note that `.org` also affect the last value.
-
- ## Includes
-
- The `.inc` directive is special. It takes a string literal as an argument and
- opens, in the currently active filesystem, the file with the specified name.
-
- It then proceeds to parse that file as if its content had been copy/pasted in
- the includer file, that is: global labels are kept and can be referenced
- elsewhere. Constants too. An exception is local labels: a local namespace always
- ends at the end of an included file.
-
- There an important limitation with includes: only one level of includes is
- allowed. An included file cannot have an `.inc` directive.
-
- ## Directives
-
- **.db**: Write bytes specified by the directive directly in the resulting
- binary. Each byte is separated by a comma. Example: `.db 0x42, foo`
-
- **.dw**: Same as `.db`, but outputs words. Example: `.dw label1, label2`
-
- **.equ**: Binds a symbol named after the first parameter to the value of the
- expression written as the second parameter. Example:
- `.equ foo 0x42+'A'`. See "Constants" above.
-
- **.fill**: Outputs the number of null bytes specified by its argument, an
- expression. Often used with `$` to fill our binary up to a certain
- offset. For example, if we want to place an instruction exactly at
- byte 0x38, we would precede it with `.fill 0x38-$`.
-
- The maximum value possible for `.fill` is `0xd000`. We do this to
- avoid "overshoot" errors, that is, error where `$` is greater than
- the offset you're trying to reach in an expression like `.fill X-$`
- (such an expression overflows to `0xffff`).
-
- **.org**: Sets the Program Counter to the value of the argument, an expression.
- For example, a label being defined right after a `.org 0x400`, would
- have a value of `0x400`. Does not do any filling. You have to do that
- explicitly with `.fill`, if needed. Often used to assemble binaries
- designed to run at offsets other than zero (userland).
-
- **.out**: Outputs the value of the expression supplied as an argument to
- `ZASM_DEBUG_PORT`. The value is always interpreted as a word, so
- there's always two `out` instruction executed per directive. High byte
- is sent before low byte. Useful or debugging, quickly figuring our
- RAM constants, etc. The value is only outputted during the second
- pass.
-
- **.inc**: Takes a string literal as an argument. Open the file name specified
- in the argument in the currently active filesystem, parse that file
- and output its binary content as is the code has been in the includer
- file.
-
- **.bin**: Takes a string literal as an argument. Open the file name specified
- in the argument in the currently active filesystem and outputs its
- contents directly.
-
- ## Undocumented instructions
-
- `zasm` doesn't support undocumented instructions such as the ones that involve
- using `IX` and `IY` as 8-bit registers. We used to support them, but because
- this makes our code incompatible with Z80-compatible CPUs such as the Z180, we
- prefer to avoid these in our code.
-
- ## AVR assembler
-
- `zasm` can be configured, at compile time, to be a AVR assembler instead of a
- z80 assembler. Directives, literals, symbols, they're all the same, it's just
- instructions and their arguments that change.
-
- Instructions and their arguments have a ayntax that is similar to other AVR
- assemblers: registers are referred to as `rXX`, mnemonics are the same,
- arguments are separated by commas.
-
- To assemble an AVR assembler, use the `gluea.asm` file instead of the regular
- one.
-
- Note about AVR and PC: In most assemblers, arithmetics for instructions
- addresses have words (two bytes) as their basic unit because AVR instructions
- are either 16bit in length or 32bit in length. All addresses constants in
- upcodes are in words. However, in zasm's core logic, PC is in bytes (because z80
- upcodes can be 1 byte).
-
- The AVR assembler, of course, correctly translates byte PCs to words when
- writing upcodes, however, when you write your expressions, you need to remember
- to treat with bytes. For example, in a traditional AVR assembler, jumping to
- the instruction after the "foo" label would be "rjmp foo+1". In zasm, it's
- "rjmp foo+2". If your expression results in an odd number, the low bit of your
- number will be ignored.
-
- Limitations:
-
- * `CALL` and `JMP` only support 16-bit numbers, not 22-bit ones.
- * `BRLO` and `BRSH` are not there. Use `BRCS` and `BRCC` instead.
- * No `high()` and `low()`. Use `&0xff` and `}8`.
-
- [libz80]: https://github.com/ggambetta/libz80
|