doc: take implementation notes out of blkfs

2020-08-16 08:08:23 -04:00 · 2020-08-16 08:08:23 -04:00 · d03d93668f
commit d03d93668f
parent 5c4bbaabf4
21 changed files with 225 additions and 250 deletions
--- a/blk/001
+++ b/blk/001
@ -1,7 +1,6 @@
 MASTER INDEX

- 30 Dictionary
- 70 Implementation notes      100 Block editor
+ 30 Dictionary                100 Block editor
 120 Visual Editor             150 Extra words
 200 Z80 assembler             260 Cross compilation
 280 Z80 boot code             350 Core words
--- a/blk/070
+++ b/blk/070
@ -1,6 +0,0 @@
-Implementation notes
-
-71 Execution model             73 Executing a word
-75 Stack management            77 Dictionary
-80 System variables            85 Word types
-89 Initialization sequence     91 Stable ABI
--- a/blk/071
+++ b/blk/071
@ -1,11 +0,0 @@
-EXECUTION MODEL
-
-After having read a line through readln, we want to interpret
-it. As a general rule, we go like this:
-
-1. read single word from line
-2. Can we find the word in dict?
-3. If yes, execute that word, goto 1
-4. Is it a number?
-5. If yes, push that number to PS, goto 1
-6. Error: undefined word.
--- a/blk/073
+++ b/blk/073
@ -1,16 +0,0 @@
-EXECUTING A WORD
-
-At it's core, executing a word is pushing the wordref on PS and
-calling EXECUTE. Then, we let the word do its things. Some
-words are special, but most of them are of the compiledWord
-type, and that's their execution that we describe here.
-
-First of all, at all time during execution, the Interpreter
-Pointer (IP) points to the wordref we're executing next.
-
-When we execute a compiledWord, the first thing we do is push
-IP to the Return Stack (RS). Therefore, RS' top of stack will
-contain a wordref to execute next, after we EXIT.
-
-At the end of every compiledWord is an EXIT. This pops RS, sets
-IP to it, and continues.
--- a/blk/075
+++ b/blk/075
@ -1,16 +0,0 @@
-Stack management
-
-The Parameter stack (PS) is maintained by SP and the Return
-stack (RS) is maintained by IX. This allows us to generally use
-push and pop freely because PS is the most frequently used.
-However, this causes a problem with routine calls: because in
-Forth, the stack isn't balanced within each call, our return
-offset, when placed by a CALL, messes everything up. This is
-one of the reasons why we need stack management routines below.
-IX always points to RS' Top Of Stack (TOS)
-
-This return stack contain "Interpreter pointers", that is a
-pointer to the address of a word, as seen in a compiled list of
-words.
-
-                                                        (cont.)
--- a/blk/076
+++ b/blk/076
@ -1,11 +0,0 @@
-Stack underflow and overflow: In each native word involving
-PSP popping, we check whether the stack is big enough. If it's
-not we go in "uflw" (underflow) error condition, then abort.
-
-We don't check RSP for underflow because the cost of the check
-is significant and its usefulness is dubious: if RSP isn't
-tightly in control, we're screwed anyways, and that, well
-before we reach underflow.
-
-Overflow condition happen when RSP and PSP meet somewhere in
-the middle. That check is made at each "next" call.
--- a/blk/077
+++ b/blk/077
@ -1,16 +0,0 @@
-Dictionary
-
-A dictionary entry has this structure:
-
- Xb name. Arbitrary long number of character (but can't be
-  bigger than input buffer, of course). not null-terminated
- 2b prev offset
- 1b size + IMMEDIATE flag
- 1b code pointer (always jumps in the <0x100 range)
- Parameter field (PF)
-
-The prev offset is the number of bytes between the prev field
-and the previous word's code pointer.
-
-The size + flag indicate the size of the name field, with the
-7th bit being the IMMEDIATE flag.                       (cont.)
--- a/blk/078
+++ b/blk/078
@ -1,12 +0,0 @@
-(cont.) The code pointer point to "word routines". These
-routines expect to be called with IY pointing to the PF. They
-themselves are expected to end by jumping to the address at
-(IP). They will usually do so with "jp next". They are 1b
-because all those routines live in the first 0x100 bytes of
-the boot binary. The 0 MSB is assumed.
-
-That's for "regular" words (words that are part of the dict
-chain). There are also "special words", for example NUMBER,
-LIT, FBR, that have a slightly different structure. They're
-also a pointer to an executable, but as for the other fields,
-the only one they have is the "flags" field.
--- a/blk/080
+++ b/blk/080
@ -1,16 +0,0 @@
-System variables
-
-There are some core variables in the core system that are
-referred to directly by their address in memory throughout the
-code. The place where they live is configurable by the SYSVARS
-constant in xcomp unit, but their relative offset is not. In
-fact, they're mostly referred to directly as their numerical
-offset along with a comment indicating what this offset refers
-to.
-
-This system is a bit fragile because every time we change those
-offsets, we have to be careful to adjust all system variables
-offsets, but thankfully, there aren't many system variables.
-Here's a list of them:
-
-                                                        (cont.)
--- a/blk/081
+++ b/blk/081
@ -1,16 +0,0 @@
-SYSVARS   FUTURE USES          +3c       BLK(*
-+02       CURRENT              +3e       A@*
-+04       HERE                 +40       A!*
-+06       C<?                  +42       FUTURE USES
-+08       C<* override         +51       CURRENTPTR
-+0a       NLPTR                +53       (emit) override
-+0c       C<*                  +55       (key) override
-+0e       WORDBUF              +57       FUTURE USES
-+2e       BOOT C< PTR
-+30       IN>
-+32       IN(*                 +70       DRIVERS
-+34       BLK@*                +80       RAMEND
-+36       BLK!*
-+38       BLK>
-+3a       BLKDTY
-                                                        (cont.)
--- a/blk/082
+++ b/blk/082
@ -1,16 +0,0 @@
-CURRENT points to the last dict entry.
-
-HERE points to current write offset.
-
-IP is the Interpreter Pointer
-
-PARSEPTR holds routine address called on (parse)
-
-C<* holds routine address called on C<. If the C<* override
-at 0x08 is nonzero, this routine is called instead.
-
-IN> is the current position in IN(, which is the input buffer.
-
-IN(* is a pointer to the input buffer, allocated at runtime.
-
-                                                        (cont.)
--- a/blk/083
+++ b/blk/083
@ -1,16 +0,0 @@
-C<? is a flag indicating whether a character is waiting in the
-input stream. 1 means yes, 0 means no. It is the responsibility
-of C<* to update that flag.
-
-WORDBUF is the buffer used by WORD
-
-BOOT C< PTR is used when Forth boots from in-memory
-source. See "Initialization sequence" below.
-
-
-
-
-
-
-
-                                                        (cont.)
--- a/blk/084
+++ b/blk/084
@ -1,14 +0,0 @@
-CURRENTPTR points to current CURRENT. The Forth CURRENT word
-doesn't return RAM+2 directly, but rather the value at this
-address. Most of the time, it points to RAM+2, but sometimes,
-when maintaining alternative dicts (during cross compilation
-for example), it can point elsewhere.
-
-NLPTR points to an alternative routine for NL (by default,
-CRLF).
-
-BLK* see B416.
-
-FUTURE USES section is unused for now.          
-
-DRIVERS section is reserved for recipe-specific drivers.
--- a/blk/085
+++ b/blk/085
@ -1,15 +0,0 @@
-Word types
-
-There are 4 word types in Collapse OS. Whenever you have a
-wordref, it's pointing to a byte with numbers 0 to 3. This
-number is the word type and the word's behavior depends on it.
-
-0: native. This words PFA contains native binary code and is
-jumped to directly.
-
-1: compiled. This word's PFA contains an atom list and its
-execution is described in "EXECUTION MODEL" above.
-
-2: cell. This word is usually followed by a 2-byte value in its
-PFA. Upon execution, the address of the PFA is pushed to PS.
-                                                        (cont.)
--- a/blk/086
+++ b/blk/086
@ -1,6 +0,0 @@
-3: DOES>. This word is created by "DOES>" and is followed
-by a 2-byte value as well as the address where "DOES>" was
-compiled. At that address is an atom list exactly like in a
-compiled word. Upon execution, after having pushed its cell
-addr to PSP, it executes its reference exactly like a
-compiled word.
--- a/blk/089
+++ b/blk/089
@ -1,16 +0,0 @@
-Initialization sequence
-
-On boot, we jump to the "main" routine in B289 which does
-very few things.
-
-1. Set SP to PS_ADDR and IX to RS_ADDR
-2. Sets HERE to SYSVARS+0x80.
-3. Sets CURRENT to value of LATEST field in stable ABI.
-4. Execute the word referred to by 0x04 (BOOT) in stable ABI.
-
-In a normal system, BOOT is in core words at B396 and does a
-few things:
-
-1. Initialize all overrides to 0.
-2. Write LATEST in BOOT C< PTR ( see below )
-3. Set "C<*", the word that C< calls to (boot<).        (cont.)
--- a/blk/090
+++ b/blk/090
@ -1,10 +0,0 @@
-4. Call INTERPRET which interprets boot source code until
-   ASCII EOT (4) is met. This usually init drivers.
-5. Initialize rdln buffer, _sys entry (for EMPTY), prints
-   "CollapseOS" and then calls (main).
-6. (main) interprets from rdln input (usually from KEY) until
-   EOT is met, then calls BYE.
-
-In RAM-only environment, we will typically have a
-"CURRENT @ HERE !" line during init to have HERE begin at the
-end of the binary instead of RAMEND.
--- a/blk/091
+++ b/blk/091
@ -1,16 +0,0 @@
-Stable ABI
-
-Across all architectures, some offset are referred to by off-
-sets that don't change (well, not without some binary manipu-
-lation). Here's the complete list of these references:
-
-04 BOOT addr         06 (uflw) addr      08 LATEST
-13 (oflw) addr       2b (s) wordref      33 2>R wordref
-42 EXIT wordref      53 (br) wordref     67 (?br) wordref
-80 (loop) wordref    bf (n) wordref
-
-BOOT, (uflw) and (oflw) exist because they are referred to
-before those words are defined (in core words). LATEST is a
-critical part of the initialization sequence.
-
-                                                        (cont.)
--- a/blk/092
+++ b/blk/092
@ -1,16 +0,0 @@
-Stable wordrefs are there for more complicated reasons. When
-cross-compiling Collapse OS, we use immediate words from the
-host and some of them compile wordrefs (IF compiles (?br),
-LOOP compiles (loop), etc.). These compiled wordref need to
-be stable across binaries, so they're part of the stable ABI.
-
-Another layer of complexity is the fact that some binaries
-don't begin at offset 0. In that case, the stable ABI doesn't
-begin at 0 either. The EXECUTE word has a special handling of
-those case where any wordref < 0x100 has the binary offset
-applied to it.
-
-But that's not the end of our problems. If an offsetted binary
-cross compiles a binary with a different offset, stable ABI
-references will be > 0x100 and be broken.
-                                                        (cont.)
--- a/blk/093
+++ b/blk/093
@ -1,3 +0,0 @@
-For this reason, any stable wordref compiled in the "hot zone"
-(B397-B400) has to be compiled by direct offset reference to
-avoid having any binary offset applied to it.
--- a/doc/impl.txt
+++ b/doc/impl.txt
@ -0,0 +1,224 @@
+# Implementation notes
+
+# Execution model
+
+After having read a line through readln, we want to interpret
+it. As a general rule, we go like this:
+
+1. read single word from line
+2. Can we find the word in dict?
+3. If yes, execute that word, goto 1
+4. Is it a number?
+5. If yes, push that number to PS, goto 1
+6. Error: undefined word.
+
+# Executing a word
+
+At it's core, executing a word is pushing the wordref on PS and
+calling EXECUTE. Then, we let the word do its things. Some
+words are special, but most of them are of the "compiled"
+type (regular nonnative word), and that's their execution that
+we describe here.
+
+First of all, at all time during execution, the Interpreter
+Pointer (IP) points to the wordref we're executing next.
+
+When we execute a compiled word, the first thing we do is push
+IP to the Return Stack (RS). Therefore, RS' top of stack will
+contain a wordref to execute next, after we EXIT.
+
+At the end of every compiled word is an EXIT. This pops RS, sets
+IP to it, and continues.
+
+# Stack management
+
+In all supported arches, The Parameter Stack and Return Stack
+tops are trackes by a registered assigned to this purpose. For
+example, in z80, it's SP and IX that do that. The value in those
+registers are referred to as PS Pointer (PSP) and RS Pointer
+(RSP).
+
+Those stacks are contiguous and grow in opposite directions. PS
+grows "down", RS grows "up".
+
+Stack underflow and overflow: In each native word involving
+PS popping, we check whether the stack is big enough. If it's
+not we go in "uflw" (underflow) error condition, then abort.
+
+We don't check RS for underflow because the cost of the check
+is significant and its usefulness is dubious: if RS isn't
+tightly in control, we're screwed anyways, and that, well
+before we reach underflow.
+
+Overflow condition happen when RSP and PSP meet somewhere in
+the middle. That check is made at each "next" call.
+
+# Dictionary entry
+
+A dictionary entry has this structure:
+
+- Xb name. Arbitrary long number of character (but can't be
+  bigger than input buffer, of course). not null-terminated
+- 2b prev offset
+- 1b name size + IMMEDIATE flag (7th bit)
+- 1b entry type
+- Parameter field (PF)
+
+The prev offset is the number of bytes between the prev field
+and the previous word's code pointer.
+
+The size + flag indicate the size of the name field, with the
+7th bit being the IMMEDIATE flag.
+
+The entry type is simply a number corresponding to a type which
+will determine how the word will be executed. See "Word types"
+below.
+
+# Word types
+
+There are 4 word types in Collapse OS. Whenever you have a
+wordref, it's pointing to a byte with numbers 0 to 3. This
+number is the word type and the word's behavior depends on it.
+
+0: native. This words PFA contains native binary code and is
+jumped to directly.
+
+1: compiled. This word's PFA contains an atom list and its
+execution is described in "Execution model" above.
+
+2: cell. This word is usually followed by a 2-byte value in its
+PFA. Upon execution, the address of the PFA is pushed to PS.
+
+3: DOES>. This word is created by "DOES>" and is followed
+by a 2-byte value as well as the address where "DOES>" was
+compiled. At that address is an atom list exactly like in a
+compiled word. Upon execution, after having pushed its cell
+addr to PSP, it executes its reference exactly like a
+compiled word.
+
+# System variables
+
+There are some core variables in the core system that are
+referred to directly by their address in memory throughout the
+code. The place where they live is configurable by the SYSVARS
+constant in xcomp unit, but their relative offset is not. In
+fact, they're mostly referred to directly as their numerical
+offset along with a comment indicating what this offset refers
+to.
+
+This system is a bit fragile because every time we change those
+offsets, we have to be careful to adjust all system variables
+offsets, but thankfully, there aren't many system variables.
+Here's a list of them:
+
+SYSVARS   FUTURE USES          +3c       BLK(*
+02       CURRENT              +3e       A@*
+04       HERE                 +40       A!*
+06       C<?                  +42       FUTURE USES
+08       C<* override         +51       CURRENTPTR
+0a       NLPTR                +53       (emit) override
+0c       C<*                  +55       (key) override
+0e       WORDBUF              +57       FUTURE USES
+2e       BOOT C< PTR
+30       IN>
+32       IN(*                 +70       DRIVERS
+34       BLK@*                +80       RAMEND
+36       BLK!*
+38       BLK>
+3a       BLKDTY
+
+CURRENT points to the last dict entry.
+
+HERE points to current write offset.
+
+IP is the Interpreter Pointer
+
+PARSEPTR holds routine address called on (parse)
+
+C<* holds routine address called on C<. If the C<* override
+at 0x08 is nonzero, this routine is called instead.
+
+IN> is the current position in IN(, which is the input buffer.
+
+IN(* is a pointer to the input buffer, allocated at runtime.
+
+CURRENTPTR points to current CURRENT. The Forth CURRENT word
+doesn't return RAM+2 directly, but rather the value at this
+address. Most of the time, it points to RAM+2, but sometimes,
+when maintaining alternative dicts (during cross compilation
+for example), it can point elsewhere.
+
+NLPTR points to an alternative routine for NL (by default,
+CRLF).
+
+BLK* see B416.
+
+FUTURE USES section is unused for now.          
+
+DRIVERS section is reserved for recipe-specific drivers.
+
+# Initialization sequence
+
+(this describes the z80 boot sequence, but other arches have
+a very similar sequence, and, of course, once we enter Forth
+territory, identical)
+
+On boot, we jump to the "main" routine in B289 which does
+very few things.
+
+1. Set SP to PS_ADDR and IX to RS_ADDR
+2. Sets HERE to SYSVARS+0x80.
+3. Sets CURRENT to value of LATEST field in stable ABI.
+4. Execute the word referred to by 0x04 (BOOT) in stable ABI.
+
+In a normal system, BOOT is in core words at B396 and does a
+few things:
+
+1. Initialize all overrides to 0.
+2. Write LATEST in BOOT C< PTR ( see below )
+3. Set "C<*", the word that C< calls to (boot<).
+4. Call INTERPRET which interprets boot source code until
+   ASCII EOT (4) is met. This usually init drivers.
+5. Initialize rdln buffer, _sys entry (for EMPTY), prints
+   "CollapseOS" and then calls (main).
+6. (main) interprets from rdln input (usually from KEY) until
+   EOT is met, then calls BYE.
+
+In RAM-only environment, we will typically have a
+"CURRENT @ HERE !" line during init to have HERE begin at the
+end of the binary instead of RAMEND.
+
+# Stable ABI
+
+Across all architectures, some offset are referred to by off-
+sets that don't change (well, not without some binary manipu-
+lation). Here's the complete list of these references:
+
+04 BOOT addr         06 (uflw) addr      08 LATEST
+13 (oflw) addr       2b (s) wordref      33 2>R wordref
+42 EXIT wordref      53 (br) wordref     67 (?br) wordref
+80 (loop) wordref    bf (n) wordref
+
+BOOT, (uflw) and (oflw) exist because they are referred to
+before those words are defined (in core words). LATEST is a
+critical part of the initialization sequence.
+
+Stable wordrefs are there for more complicated reasons. When
+cross-compiling Collapse OS, we use immediate words from the
+host and some of them compile wordrefs (IF compiles (?br),
+LOOP compiles (loop), etc.). These compiled wordref need to
+be stable across binaries, so they're part of the stable ABI.
+
+Another layer of complexity is the fact that some binaries
+don't begin at offset 0. In that case, the stable ABI doesn't
+begin at 0 either. The EXECUTE word has a special handling of
+those case where any wordref < 0x100 has the binary offset
+applied to it.
+
+But that's not the end of our problems. If an offsetted binary
+cross compiles a binary with a different offset, stable ABI
+references will be > 0x100 and be broken.
+
+For this reason, any stable wordref compiled in the "hot zone"
+(B397-B400) has to be compiled by direct offset reference to
+avoid having any binary offset applied to it.