# Implementation notes

# Execution model

After having read a line through readln, we want to interpret
it. As a general rule, we go like this:

1. read single word from line
2. Can we find the word in dict?
3. If yes, execute that word, goto 1
4. Is it a number?
5. If yes, push that number to PS, goto 1
6. Error: undefined word.
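
The loop above can be sketched in a few lines of Python. This is
only an illustration, not the actual implementation: the
dictionary is a plain Python dict, PS is a list, number parsing
is decimal-only, and the names interpret_line and words are
made up for this example.

```python
def interpret_line(line, dictionary, ps):
    """Interpret one line: execute known words, push numbers, else fail."""
    for word in line.split():           # step 1: read a single word
        if word in dictionary:          # steps 2-3: found in dict, execute
            dictionary[word](ps)
            continue
        try:
            ps.append(int(word))        # steps 4-5: it's a number, push it
        except ValueError:
            # step 6: neither a word nor a number
            raise LookupError(f"undefined word: {word}")


# Hypothetical two-word dictionary, for illustration only.
words = {
    "+": lambda ps: ps.append(ps.pop() + ps.pop()),
    "DUP": lambda ps: ps.append(ps[-1]),
}
stack = []
interpret_line("2 DUP + 40 +", words, stack)
print(stack)  # [44]
```

Note that the dictionary lookup comes before the number check,
as in the list above.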

# What is a word?

A word is a place in memory having a particular structure. Its
first byte is a "word type" byte (see below), followed by a
structure that depends on the word type. This structure is
generally referred to as the Parameter Field (PF).

# Stack management

In all supported arches, the Parameter Stack and Return Stack
tops are each tracked by a register assigned to this purpose.
For example, in z80, it's SP and IX that do that. The values in
those registers are referred to as PS Pointer (PSP) and RS
Pointer (RSP).

Those stacks are contiguous and grow in opposite directions. PS
grows "down", RS grows "up".

Stack underflow and overflow: In each native word involving
PS popping, we check whether the stack is big enough. If it's
not, we go into the "uflw" (underflow) error condition, then
abort.

This means that if you implement a native word that involves
popping from PS, you are expected to call chkPS, for underflow
situations.

We don't check RS for underflow because the cost of the check
is significant and its usefulness is dubious: if RS isn't
tightly in control, we're screwed anyways, and that, well
before we reach underflow.

An overflow condition happens when RSP and PSP meet somewhere
in the middle. That check is made at each "next" call.
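
Here is a toy Python model of the two stacks and their checks.
It is a sketch under simplifying assumptions: addresses count
cells rather than bytes, memory is a dict, and the class and
method names (Stacks, chk_ps, chk_overflow) are made up for the
example. Only "uflw" and "oflw" come from the text.

```python
class Stacks:
    """Toy model of PS and RS sharing one memory region (cell-granular)."""

    def __init__(self, rs_addr, ps_addr):
        self.ps_base = ps_addr
        self.rsp, self.psp = rs_addr, ps_addr   # both stacks start empty
        self.mem = {}

    def chk_ps(self, needed):
        # Underflow check, as a native word would do before popping.
        if self.ps_base - self.psp < needed:
            raise RuntimeError("uflw")

    def chk_overflow(self):
        # Overflow check, as done at each "next": the pointers have met.
        if self.rsp >= self.psp:
            raise RuntimeError("oflw")

    def ps_push(self, value):
        self.psp -= 1                 # PS grows "down"
        self.mem[self.psp] = value

    def ps_pop(self):
        self.chk_ps(1)
        value = self.mem[self.psp]
        self.psp += 1
        return value

    def rs_push(self, value):
        self.mem[self.rsp] = value
        self.rsp += 1                 # RS grows "up"

    def rs_pop(self):
        # No underflow check on RS, mirroring the text.
        self.rsp -= 1
        return self.mem[self.rsp]
```

With Stacks(rs_addr=0, ps_addr=4), two PS pushes followed by two
RS pushes make the pointers meet, and chk_overflow() then raises
"oflw". Popping an empty PS raises "uflw".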

# Dictionary entry

A dictionary entry has this structure:

- Xb name. Arbitrarily long number of characters (but can't be
  bigger than input buffer, of course). Not null-terminated.
- 2b prev offset
- 1b name size + IMMEDIATE flag (7th bit)
- 1b entry type
- Parameter field (PF)

The prev offset is the number of bytes between the prev field
and the previous word's entry type.

The size + flag byte indicates the size of the name field, with
the 7th bit being the IMMEDIATE flag.

The entry type is simply a number corresponding to a type which
will determine how the word will be executed. See "Word types"
below.

The vast majority of the time, a dictionary entry refers to a
word. However, sometimes, it refers to something else. A "hook
word" (see bootstrap.txt) is such an example.
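
To make the layout concrete, here is a small Python sketch that
builds an entry and reads its header back from a wordref (the
address of the entry type byte). Two things are assumptions of
this sketch and not stated above: 2-byte fields are little-endian
(as on z80), and "7th bit" means mask 0x80. The function names
are made up for the example.

```python
import struct

IMMEDIATE = 0x80  # assumption: "7th bit" means bit 7, i.e. mask 0x80


def build_entry(name, prev_offset, word_type, pf, immediate=False):
    """Lay out one entry: name, 2b prev offset, 1b size+flag, 1b type, PF."""
    raw = name.encode("ascii")
    flags = len(raw) | (IMMEDIATE if immediate else 0)
    # "<H" is a 2-byte little-endian value (endianness is an assumption).
    return raw + struct.pack("<HBB", prev_offset, flags, word_type) + pf


def parse_header(mem, wordref):
    """Read the header fields that sit just before the entry type byte."""
    flags = mem[wordref - 1]
    size = flags & 0x7F
    (prev_offset,) = struct.unpack_from("<H", mem, wordref - 3)
    name = bytes(mem[wordref - 3 - size:wordref - 3]).decode("ascii")
    return name, prev_offset, bool(flags & IMMEDIATE), mem[wordref]


def prev_wordref(wordref, prev_offset):
    """Previous entry type address: prev field address minus prev offset."""
    return (wordref - 3) - prev_offset


# Two entries back to back: "A" (native), then "BC" (compiled, immediate).
mem = build_entry("A", 0, 0, b"") + build_entry("BC", 3, 1, b"\x00\x00", True)
print(parse_header(mem, 10))  # ('BC', 3, True, 1)
print(prev_wordref(10, 3))    # 4
```

In this example, the prev field of "BC" sits at offset 7 and the
entry type of "A" at offset 4, hence a prev offset of 3.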

# Word types

There are 6 word types in Collapse OS. Whenever you have a
wordref, it's pointing to a byte holding a number from 0 to 5.
This number is the word type and the word's behavior depends on
it.

0: native. This word's PFA contains native binary code and is
jumped to directly.

1: compiled. This word's PFA contains a list of wordrefs and its
execution is described in "Executing a compiled word" below.

2: cell. This word is usually followed by a 2-byte value in its
PFA. Upon execution, the address of the PFA is pushed to PS.

3: DOES>. This word is created by "DOES>" and is followed
by a 2-byte value as well as the address where "DOES>" was
compiled. At that address is a wordref list exactly like in a
compiled word. Upon execution, after having pushed its cell
addr to PS, it executes its reference exactly like a
compiled word.

4: alias. See usage.txt. PFA is like a cell, but instead of
pushing it to PS, we execute it.

5: ialias. Same as alias, but with an added indirection.
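
As a rough illustration, dispatching on the word type could look
like the Python sketch below. Several things here are
assumptions: 2-byte values are little-endian, native code is
replaced by Python callables in a side table (natives), and the
ialias "added indirection" is read as "the PFA holds the address
of a cell, and that cell holds the wordref". Compiled and DOES>
words are left out.

```python
from enum import IntEnum


class WordType(IntEnum):
    NATIVE = 0
    COMPILED = 1
    CELL = 2
    DOES = 3
    ALIAS = 4
    IALIAS = 5


def read16(mem, addr):
    # Little-endian 2-byte read; endianness is an assumption (z80-style).
    return mem[addr] | (mem[addr + 1] << 8)


def execute(mem, wordref, ps, natives):
    """Dispatch on the word type byte that the wordref points to."""
    wtype, pfa = WordType(mem[wordref]), wordref + 1
    if wtype is WordType.NATIVE:
        natives[pfa](ps)              # stand-in for jumping to machine code
    elif wtype is WordType.CELL:
        ps.append(pfa)                # push the address of the PFA
    elif wtype is WordType.ALIAS:
        execute(mem, read16(mem, pfa), ps, natives)
    elif wtype is WordType.IALIAS:
        # Assumed meaning of "added indirection": PFA holds the address
        # of a cell, and that cell holds the wordref to execute.
        execute(mem, read16(mem, read16(mem, pfa)), ps, natives)
    else:
        raise NotImplementedError(wtype.name)  # compiled and DOES> omitted
```

Because the ialias target lives in a RAM cell, it can be changed
at runtime without touching the word itself.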

# Executing a compiled word

At its core, executing a word is pushing the wordref on PS and
calling EXECUTE. Then, we let the word do its things. Some
words are special, but most of them are of the "compiled"
type, and it's their execution that we describe here.

First of all, at all times during execution, the Interpreter
Pointer (IP) points to the wordref we're executing next.

When we execute a compiled word, the first thing we do is push
IP to the Return Stack (RS). Therefore, RS' top of stack will
contain a wordref to execute next, after we EXIT.

At the end of every compiled word is an EXIT. This pops RS, sets
IP to it, and continues.

A compiled word is simply a list of wordrefs, but not all those
wordrefs are 2 bytes in length. Some wordrefs are special. For
example, a reference to (n) will be followed by an extra 2-byte
number. It's the responsibility of the (n) word to advance IP
by 2 extra bytes.

To be clear: It's not (n)'s word type that is special, it's a
regular "native" word. It's the compilation of the (n) type,
done in LITN, that is special. We manually compile a number
constant at compilation time, which is what is expected in (n)'s
implementation. Similar special things happen in (s), (br),
(?br) and (loop).

For example, the word defined by ": FOO 42 EMIT ;" would have
an 8-byte PF: a 2b ref to (n), 2b with 0x002a, a 2b ref to EMIT
and then a 2b ref to EXIT.

When executing this word, we first set IP to PF+2, then exec
PF+0, that is, the (n) reference. (n), when executing, reads the
value at IP, pushes that value to PS, then advances IP by 2.
This means that when we return to the "next" routine, IP points
to PF+4, which next will execute. Before executing, IP is
increased by 2, but it's the "not-increased" value (PF+4) that
is executed, that is, EMIT. EMIT does its thing, doesn't touch
IP, then returns to "next". IP is still at PF+6, which then
points to EXIT. EXIT pops RS into IP, which is the value that IP
had before FOO was called. The "next" dance continues...
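
The FOO walkthrough can be replayed with a toy Python model. It
is a sketch only: the PF is a dict keyed by byte offset, wordrefs
are replaced by word names, and caller_ip is a made-up value
standing in for whatever IP was before FOO was called.

```python
def run_foo():
    """Trace ': FOO 42 EMIT ;' through a toy "next" loop."""
    # FOO's 8-byte PF, keyed by byte offset from PF+0.
    pf = {0: "(n)", 2: 0x002A, 4: "EMIT", 6: "EXIT"}
    ps, rs, out, trace = [], [], [], []
    caller_ip = 0x1234          # hypothetical IP value before FOO is called
    ip = caller_ip

    # Entering the compiled word: push IP to RS, set IP to PF+2, exec PF+0.
    rs.append(ip)
    ip, current = 2, pf[0]

    while True:
        trace.append((current, ip))   # word being executed, IP at that time
        if current == "(n)":
            ps.append(pf[ip])   # read the literal that IP points to
            ip += 2             # skip over it
        elif current == "EMIT":
            out.append(chr(ps.pop()))
        elif current == "EXIT":
            ip = rs.pop()       # back to the caller's IP
            break
        # "next": execute the wordref at IP, after advancing IP by 2.
        current, ip = pf[ip], ip + 2
    return trace, "".join(out), ip


trace, output, final_ip = run_foo()
print(trace)          # [('(n)', 2), ('EMIT', 6), ('EXIT', 8)]
print(output)         # *
print(hex(final_ip))  # 0x1234
```

The trace shows IP at PF+6 while EMIT runs, and the final IP is
the value that was pushed to RS on entry.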

# System variables

There are some core variables in the core system that are
referred to directly by their address in memory throughout the
code. The place where they live is configurable by the SYSVARS
constant in xcomp unit, but their relative offset is not. In
fact, they're mostly referred to directly as their numerical
offset along with a comment indicating what this offset refers
to.

SYSVARS occupy 0xa0 bytes in memory in addition to driver
memory, which typically follows SYSVARS.

This system is a bit fragile because every time we change those
offsets, we have to be careful to adjust all system variables
offsets, but thankfully, there aren't many system variables.
Here's a list of them:

SYSVARS   FUTURE USES          +3c       BLK(*
+02       CURRENT              +3e       ~C!*
+04       HERE                 +41       ~C!ERR
+06       C<?                  +42       FUTURE USES
+08       FUTURE USES          +50       NL> character
+0a       FUTURE USES          +51       CURRENTPTR
+0c       C<*                  +53       EMIT ialias
+0e       WORDBUF              +55       KEY? ialias
+2e       BOOT C< PTR          +57       FUTURE USES
+30       IN>                  +60       INPUT BUFFER
+32       FUTURE USES          +a0       DRIVERS
+34       BLK@*
+36       BLK!*
+38       BLK>
+3a       BLKDTY

CURRENT points to the last dict entry.

HERE points to current write offset.

C<* holds routine address called on C<. If the C<* override
at 0x08 is nonzero, this routine is called instead.

IN> is the current position in IN(, which is the input buffer.

IN(* is a pointer to the input buffer, allocated at runtime.

CURRENTPTR points to current CURRENT. The Forth CURRENT word
doesn't return RAM+2 directly, but rather the value at this
address. Most of the time, it points to RAM+2, but sometimes,
when maintaining alternative dicts (during cross compilation
for example), it can point elsewhere.

BLK* see "Disk blocks" in usage.txt.

~C!* if nonzero, contains a jump to assembly code that overrides
the routine that writes a byte to memory and then returns.
Register usage is arch-dependent, see boot code for details.

~C!ERR: When an error happens during ~C! write overrides, sets
this byte to a nonzero value. Otherwise, stays at zero. Has to
be reset to zero manually after an error.

NL> is a single byte. If zero (default), NL> spits CR/LF.
Otherwise, it spits the value of RAM+50.

DRIVERS section is reserved for recipe-specific drivers.

FUTURE USES section is unused for now.
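
Since everything is addressed as "SYSVARS base plus fixed
offset", a lookup table is enough to model it. The Python sketch
below copies the offsets from the table above (FUTURE USES slots
left out). The base address 0x8000, the 16-bit address mask and
the function names are assumptions made for the example.

```python
# Offsets copied from the table above; FUTURE USES slots are left out.
SYSVAR_OFFSETS = {
    "CURRENT": 0x02, "HERE": 0x04, "C<?": 0x06, "C<*": 0x0C,
    "WORDBUF": 0x0E, "BOOT C< PTR": 0x2E, "IN>": 0x30,
    "BLK@*": 0x34, "BLK!*": 0x36, "BLK>": 0x38, "BLKDTY": 0x3A,
    "BLK(*": 0x3C, "~C!*": 0x3E, "~C!ERR": 0x41, "NL>": 0x50,
    "CURRENTPTR": 0x51, "EMIT ialias": 0x53, "KEY? ialias": 0x55,
    "INPUT BUFFER": 0x60, "DRIVERS": 0xA0,
}


def sysvar_addr(sysvars_base, name):
    """Absolute address of a system variable (16-bit address space assumed)."""
    return (sysvars_base + SYSVAR_OFFSETS[name]) & 0xFFFF


def nl_output(mem, sysvars_base):
    """Bytes emitted for a newline: CR/LF if the NL> byte is zero."""
    value = mem[sysvar_addr(sysvars_base, "NL>")]
    return b"\r\n" if value == 0 else bytes([value])


BASE = 0x8000                      # hypothetical SYSVARS location
ram = bytearray(0x10000)
print(hex(sysvar_addr(BASE, "HERE")))  # 0x8004
print(nl_output(ram, BASE))            # b'\r\n'
ram[sysvar_addr(BASE, "NL>")] = 0x0A
print(nl_output(ram, BASE))            # b'\n'
```

Moving SYSVARS only changes BASE. The offsets stay put, which is
the property the text relies on.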

# Initialization sequence

(this describes the z80 boot sequence, but other arches have
a very similar sequence, and, of course, once we enter Forth
territory, identical)

On boot, we jump to the "main" routine in B289 which does
very few things.

1. Set SP to PS_ADDR and IX to RS_ADDR.
2. Set CURRENT to value of LATEST field in stable ABI.
3. Set HERE to HERESTART const if defined, to CURRENT
   otherwise.
4. Initialize ~C! and ~C!ERR to 0.
5. Execute the word referred to by 0x04 (BOOT) in stable ABI.

In a normal system, BOOT is in core words at B396 and does a
few things:

1. Initialize a few core variables:
   CURRENT* -> CURRENT (RAM+02)
   BOOT C< PTR -> LATEST
   C<* override -> 0
2. Initialize ialiases in this way:
   EMIT -> (emit)
   KEY -> (key)
   NL -> CRLF
3. Set "C<*", the word that C< calls, to (boot<).
4. Call INTERPRET which interprets boot source code until
   ASCII EOT (4) is met. This usually initializes drivers.
5. Initialize rdln buffer, _sys entry (for EMPTY), print
   "CollapseOS" and then call (main).
6. (main) interprets from rdln input (usually from KEY) until
   EOT is met, then calls BYE.

If, for some reason, you need to override an ialias at some
point, you de-override it by re-setting it to the address of
the word specified at step 2.

# Stable ABI

The Stable ABI lives at the beginning of the binary and
provides a way for Collapse OS code to access values that would
otherwise be difficult to access. Here's the complete list of
these references:

04 BOOT addr         06 (uflw) addr       08 LATEST
13 (oflw) addr       1a next addr

BOOT, (uflw) and (oflw) exist because they are referred to
before those words are defined (in core words). LATEST is a
critical part of the initialization sequence.

All Collapse OS binaries, regardless of architecture, have
those values at those offsets. Some binaries are built to run
at an offset other than zero. In that case, the stable ABI
lives at that offset, not 0.
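
As an illustration, here is how a tool could pull those values
out of a memory image in Python. This is a sketch under
assumptions the text doesn't spell out: every entry is treated
as a 2-byte little-endian value, and the names read_abi and
origin are made up. The offsets come from the list above.

```python
import struct

# Offsets from the list above, relative to the start of the binary.
ABI_OFFSETS = {"BOOT": 0x04, "(uflw)": 0x06, "LATEST": 0x08,
               "(oflw)": 0x13, "next": 0x1A}


def read_abi(image, origin=0):
    """Return {name: 16-bit value} read from a memory image.

    `origin` is the offset the binary is built to run at; `image` is a
    bytes-like object in which the binary sits at that offset.
    Assumption: each entry is a 2-byte little-endian value.
    """
    return {name: struct.unpack_from("<H", image, origin + off)[0]
            for name, off in ABI_OFFSETS.items()}


# Fake image with a binary placed at 0x100, for illustration only.
image = bytearray(0x200)
struct.pack_into("<H", image, 0x100 + 0x04, 0x0456)   # made-up BOOT addr
struct.pack_into("<H", image, 0x100 + 0x08, 0x1F00)   # made-up LATEST
abi = read_abi(image, origin=0x100)
print(hex(abi["BOOT"]), hex(abi["LATEST"]))  # 0x456 0x1f00
```

The origin parameter covers binaries built to run at a nonzero
offset: the ABI offsets are added to it rather than to 0.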