Mirror of CollapseOS
# z80 assembler

This is probably the most critical part of the Collapse OS project because it
ensures its self-reproduction.

## Invocation

`zasm` is invoked with 2 mandatory arguments and an optional one. The mandatory
arguments are input blockdev id and output blockdev id. For example, `zasm 0 1`
reads source code from blockdev 0, assembles it and spits the result in blockdev
1.

Input blockdev needs to be seek-able, output blockdev doesn't need to (zasm
writes in one pass, sequentially).

The 3rd argument, optional, is the initial `.org` value. It's the high byte of
the value. For example, `zasm 0 1 4f` assembles source in blockdev 0 as if it
started with the line `.org 0x4f00`. This also means that the initial value of
the `@` symbol is `0x4f00`.
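
As a rough illustration of the third argument, the sketch below shows what a
source assembled with `zasm 0 1 4f` is equivalent to. The label name `start` is
made up for the example.

```asm
; Assembling with `zasm 0 1 4f` behaves as if this line came first:
.org 0x4f00
start:                  ; bound to 0x4f00 instead of 0
    jp start            ; encoded with the absolute address 0x4f00
```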

## Running on a "modern" machine

To be able to develop zasm efficiently, [libz80][libz80] is used to run zasm
on a modern machine. The code lives in `emul` and can be built with `make`,
provided that you have a copy of libz80 living in `emul/libz80`.

The resulting `zasm` binary takes asm code in stdin and spits binary in stdout.

## Literals

See "Number literals" in `apps/README.md`.

On top of common literal logic, zasm also has string literals. It's a chain of
characters surrounded by double quotes. Example: `"foo"`. This literal can only
be used in the `.db` directive and is equivalent to each character being
single-quoted and separated by commas (`'f', 'o', 'o'`). No null char is
inserted in the resulting value (unlike what C does).
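
A small sketch of the equivalence described above. The last line assumes that
a string literal can be mixed with other values in the same `.db`, which is how
a terminator would be added by hand when one is needed.

```asm
.db "foo"               ; emits 3 bytes: 'f', 'o', 'o'
.db 'f', 'o', 'o'       ; the same 3 bytes
.db "foo", 0            ; terminator added explicitly (assumed to be accepted)
```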

## Labels

Lines starting with a name followed by `:` are labeled. When that happens, the
name of that label is associated with the binary offset of the following
instruction.

For example, a label placed at the beginning of the file is associated with
offset 0. If placed right after a first instruction that is 2 bytes wide, then
the label is going to be bound to 2.
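
The offsets above can be pictured with this minimal sketch (label names are
hypothetical, and no `.org` is in effect):

```asm
start:                  ; bound to offset 0
    ld a, 0x42          ; 2 bytes wide (offsets 0 and 1)
second:                 ; bound to offset 2
    ret
```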

Those labels can then be referenced wherever a constant is expected. They can
also be referenced where a relative reference is expected (`jr` and `djnz`).

Labels can be forward-referenced, that is, you can reference a label that is
defined later in the source file or in an included source file.

Labels starting with a dot (`.`) are local labels: they belong only to the
namespace of the current "global label" (any label that isn't local). Local
namespace is wiped whenever a global label is encountered.

Local labels allow reuse of common mnemonics and make the assembler use less
memory.
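
As a sketch of local namespaces, the two routines below (hypothetical names)
each have their own `.loop`. The second one does not clash with the first
because `copy` is a global label and wipes the local namespace.

```asm
clear:                  ; global label, opens a local namespace
    xor a
    ld b, 8
.loop:                  ; local to clear
    ld (hl), a
    inc hl
    djnz .loop
    ret
copy:                   ; new global label: the previous .loop is forgotten
    ld b, 8
.loop:                  ; a different .loop, local to copy
    ld a, (de)
    ld (hl), a
    inc de
    inc hl
    djnz .loop
    ret
```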

Global labels are all evaluated during the first pass, which makes it possible
to forward-reference them. Local labels are evaluated during the second pass,
but we can still forward-reference them through a "first-pass-redux" hack.

Labels can be alone on their line, but can also be "inlined", that is, directly
followed by an instruction.
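
A short sketch combining a forward reference with an inlined label (the name
`done` is made up for the example):

```asm
    jr done             ; forward reference: done is defined further down
    nop
done: ret               ; inlined label: label and instruction share a line
```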

## Constants

The `.equ` directive declares a constant. That constant's argument is an
expression that is evaluated right at parse-time.

Constants are evaluated during the second pass, which means that they can
forward-reference labels.

However, they *cannot* forward-reference other constants.

When defining a constant, if the symbol specified has already been defined, no
error occurs and the first value defined stays intact. This allows for "user
override" of programs.

It's also important to note that constants always override labels, regardless
of declaration order.
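
The "user override" rule can be sketched as follows. `BUFSIZE`, `ENTRY` and
`main` are hypothetical names; the first two lines stand for a user's
configuration followed by a program's default.

```asm
.equ BUFSIZE 0x40       ; user override, defined first: this value wins
.equ BUFSIZE 0x20       ; program default: no error, BUFSIZE stays 0x40
.equ ENTRY main         ; fine: a constant can forward-reference a label
    ld b, BUFSIZE       ; assembles as ld b, 0x40
main:
    ret
```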

## Expressions

See "Expressions" in `apps/README.md`.

## The Program Counter

The `$` is a special symbol that can be placed in any expression and evaluated
as the current output offset. That is, it's the value that a label would have if
it was placed there.

## The Last Value

Whenever a `.equ` directive is evaluated, its resulting value is saved in a
special "last value" register that can then be used in any expression. This
last value is referenced with the `@` special symbol. This is very useful for
variable definitions and for jump tables.

Note that `.org` also affects the last value.
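
For variable definitions, `@` lets each address be written as an offset from
the previous one instead of from a fixed base. A minimal sketch, with
hypothetical names and a hypothetical RAM start of `0x8000`:

```asm
.equ FOO_RAMSTART 0x8000    ; start of the RAM area; @ is now 0x8000
.equ FOO_COUNT FOO_RAMSTART ; 1-byte variable at 0x8000; @ is still 0x8000
.equ FOO_PTR @+1            ; 2-byte variable at 0x8001; @ is now 0x8001
.equ FOO_RAMEND @+2         ; first free address, 0x8003
```

Inserting a new variable in the middle only requires adjusting the offset on
the line that follows it.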

## Includes

The `.inc` directive is special. It takes a string literal as an argument and
opens, in the currently active filesystem, the file with the specified name.

It then proceeds to parse that file as if its content had been copy/pasted in
the includer file, that is: global labels are kept and can be referenced
elsewhere. Constants too. An exception is local labels: a local namespace always
ends at the end of an included file.

There is an important limitation with includes: only one level of includes is
allowed. An included file cannot have an `.inc` directive.
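
A minimal sketch of an includer file. The file name `defs.asm` and the label
`helper` are hypothetical; `helper` is assumed to be a global label defined in
the included file.

```asm
.inc "defs.asm"         ; parsed in place; defs.asm may not contain .inc itself
    call helper         ; global labels from defs.asm are visible here
    ret
```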

## Directives

**.db**: Write bytes specified by the directive directly in the resulting
binary. Each byte is separated by a comma. Example: `.db 0x42, foo`

**.dw**: Same as `.db`, but outputs words. Example: `.dw label1, label2`

**.equ**: Binds a symbol named after the first parameter to the value of the
expression written as the second parameter. Example:
`.equ foo 0x42+'A'`. See "Constants" above.

**.fill**: Outputs the number of null bytes specified by its argument, an
expression. Often used with `$` to fill our binary up to a certain
offset. For example, if we want to place an instruction exactly at
byte 0x38, we would precede it with `.fill 0x38-$`.

The maximum value possible for `.fill` is `0xd000`. We do this to
avoid "overshoot" errors, that is, errors where `$` is greater than
the offset you're trying to reach in an expression like `.fill X-$`
(such an expression overflows to `0xffff`).

**.org**: Sets the Program Counter to the value of the argument, an expression.
For example, a label being defined right after a `.org 0x400` would
have a value of `0x400`. Does not do any filling. You have to do that
explicitly with `.fill`, if needed. Often used to assemble binaries
designed to run at offsets other than zero (userland).

**.out**: Outputs the value of the expression supplied as an argument to
`ZASM_DEBUG_PORT`. The value is always interpreted as a word, so
there are always two `out` instructions executed per directive. High byte
is sent before low byte. Useful for debugging, quickly figuring out
RAM constants, etc. The value is only outputted during the second
pass.

**.inc**: Takes a string literal as an argument. Opens the file specified
in the argument in the currently active filesystem, parses that file
and outputs its binary content as if the code had been in the includer
file.

**.bin**: Takes a string literal as an argument. Opens the file specified
in the argument in the currently active filesystem and outputs its
contents directly.
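
The sketch below puts `.fill`, `.dw` and `.db` together, following the `0x38`
example given for `.fill`. Label names are hypothetical and no `.org` is in
effect, so offsets start at 0.

```asm
    jp main             ; 3 bytes, offsets 0x00 to 0x02
.fill 0x38-$            ; $ is 3 here: emits 0x35 null bytes
    jp handler          ; lands exactly at offset 0x38
main:
    halt
handler:
    ret
table:
.dw main, handler       ; two words: the addresses of both labels
.db 0x42, "ok"          ; three bytes: 0x42, 'o', 'k'
```

If the code before the `.fill` ever grew past `0x38`, the expression `0x38-$`
would wrap around to a value above `0xd000` and be rejected rather than
silently emitting tens of kilobytes of padding.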

## Undocumented instructions

`zasm` doesn't support undocumented instructions such as the ones that involve
using `IX` and `IY` as 8-bit registers. We used to support them, but because
this makes our code incompatible with Z80-compatible CPUs such as the Z180, we
prefer to avoid these in our code.

## AVR assembler

`zasm` can be configured, at compile time, to be an AVR assembler instead of a
z80 assembler. Directives, literals, symbols, they're all the same, it's just
instructions and their arguments that change.

Instructions and their arguments have a syntax that is similar to other AVR
assemblers: registers are referred to as `rXX`, mnemonics are the same,
arguments are separated by commas.

To assemble an AVR assembler, use the `gluea.asm` file instead of the regular
one.

Note about AVR and PC: In most assemblers, arithmetic on instruction
addresses has words (two bytes) as its basic unit because AVR instructions
are either 16-bit or 32-bit in length. All address constants in
opcodes are in words. However, in zasm's core logic, PC is in bytes (because z80
opcodes can be 1 byte).

The AVR assembler, of course, correctly translates byte PCs to words when
writing opcodes. However, when you write your expressions, you need to remember
to work in bytes. For example, in a traditional AVR assembler, jumping to
the instruction after the "foo" label would be "rjmp foo+1". In zasm, it's
"rjmp foo+2". If your expression results in an odd number, the low bit of your
number will be ignored.
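
The byte-based arithmetic can be sketched like this, in the AVR flavor of zasm
(the label `foo` comes from the example above):

```asm
foo:
    nop                 ; 16-bit instruction: 2 bytes in zasm's byte-based PC
    nop
    rjmp foo+2          ; targets the second nop; other assemblers write foo+1
```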

Limitations:

* `CALL` and `JMP` only support 16-bit numbers, not 22-bit ones.
* `BRLO` and `BRSH` are not there. Use `BRCS` and `BRCC` instead.
* No `high()` and `low()`. Use `&0xff` and `}8`.
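
A sketch of the workarounds listed above. `TARGET`, `less` and `atleast` are
hypothetical names, and the expressions are written without spaces on the
assumption that this matches the expression syntax described in
`apps/README.md`.

```asm
.equ TARGET 0x1234      ; hypothetical 16-bit constant
    ldi r16, TARGET&0xff ; low byte, in place of low(TARGET)
    ldi r17, TARGET}8   ; high byte, in place of high(TARGET)
    brcs less           ; in place of brlo
    brcc atleast        ; in place of brsh
less:
atleast:
    ret
```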

[libz80]: https://github.com/ggambetta/libz80