diff --git a/blk/001 b/blk/001 index cb6f6c4..b516af9 100644 --- a/blk/001 +++ b/blk/001 @@ -1,7 +1,6 @@ MASTER INDEX - 30 Dictionary - 70 Implementation notes 100 Block editor + 30 Dictionary 100 Block editor 120 Visual Editor 150 Extra words 200 Z80 assembler 260 Cross compilation 280 Z80 boot code 350 Core words diff --git a/blk/070 b/blk/070 deleted file mode 100644 index 2fb7318..0000000 --- a/blk/070 +++ /dev/null @@ -1,6 +0,0 @@ -Implementation notes - -71 Execution model 73 Executing a word -75 Stack management 77 Dictionary -80 System variables 85 Word types -89 Initialization sequence 91 Stable ABI diff --git a/blk/071 b/blk/071 deleted file mode 100644 index 6bd9150..0000000 --- a/blk/071 +++ /dev/null @@ -1,11 +0,0 @@ -EXECUTION MODEL - -After having read a line through readln, we want to interpret -it. As a general rule, we go like this: - -1. read single word from line -2. Can we find the word in dict? -3. If yes, execute that word, goto 1 -4. Is it a number? -5. If yes, push that number to PS, goto 1 -6. Error: undefined word. diff --git a/blk/073 b/blk/073 deleted file mode 100644 index b10e355..0000000 --- a/blk/073 +++ /dev/null @@ -1,16 +0,0 @@ -EXECUTING A WORD - -At it's core, executing a word is pushing the wordref on PS and -calling EXECUTE. Then, we let the word do its things. Some -words are special, but most of them are of the compiledWord -type, and that's their execution that we describe here. - -First of all, at all time during execution, the Interpreter -Pointer (IP) points to the wordref we're executing next. - -When we execute a compiledWord, the first thing we do is push -IP to the Return Stack (RS). Therefore, RS' top of stack will -contain a wordref to execute next, after we EXIT. - -At the end of every compiledWord is an EXIT. This pops RS, sets -IP to it, and continues. diff --git a/blk/075 b/blk/075 deleted file mode 100644 index 1ab2441..0000000 --- a/blk/075 +++ /dev/null @@ -1,16 +0,0 @@ -Stack management - -The Parameter stack (PS) is maintained by SP and the Return -stack (RS) is maintained by IX. This allows us to generally use -push and pop freely because PS is the most frequently used. -However, this causes a problem with routine calls: because in -Forth, the stack isn't balanced within each call, our return -offset, when placed by a CALL, messes everything up. This is -one of the reasons why we need stack management routines below. -IX always points to RS' Top Of Stack (TOS) - -This return stack contain "Interpreter pointers", that is a -pointer to the address of a word, as seen in a compiled list of -words. - - (cont.) diff --git a/blk/076 b/blk/076 deleted file mode 100644 index f6d0098..0000000 --- a/blk/076 +++ /dev/null @@ -1,11 +0,0 @@ -Stack underflow and overflow: In each native word involving -PSP popping, we check whether the stack is big enough. If it's -not we go in "uflw" (underflow) error condition, then abort. - -We don't check RSP for underflow because the cost of the check -is significant and its usefulness is dubious: if RSP isn't -tightly in control, we're screwed anyways, and that, well -before we reach underflow. - -Overflow condition happen when RSP and PSP meet somewhere in -the middle. That check is made at each "next" call. diff --git a/blk/077 b/blk/077 deleted file mode 100644 index 17e7a12..0000000 --- a/blk/077 +++ /dev/null @@ -1,16 +0,0 @@ -Dictionary - -A dictionary entry has this structure: - -- Xb name. Arbitrary long number of character (but can't be - bigger than input buffer, of course). not null-terminated -- 2b prev offset -- 1b size + IMMEDIATE flag -- 1b code pointer (always jumps in the <0x100 range) -- Parameter field (PF) - -The prev offset is the number of bytes between the prev field -and the previous word's code pointer. - -The size + flag indicate the size of the name field, with the -7th bit being the IMMEDIATE flag. (cont.) diff --git a/blk/078 b/blk/078 deleted file mode 100644 index 5b0f02c..0000000 --- a/blk/078 +++ /dev/null @@ -1,12 +0,0 @@ -(cont.) The code pointer point to "word routines". These -routines expect to be called with IY pointing to the PF. They -themselves are expected to end by jumping to the address at -(IP). They will usually do so with "jp next". They are 1b -because all those routines live in the first 0x100 bytes of -the boot binary. The 0 MSB is assumed. - -That's for "regular" words (words that are part of the dict -chain). There are also "special words", for example NUMBER, -LIT, FBR, that have a slightly different structure. They're -also a pointer to an executable, but as for the other fields, -the only one they have is the "flags" field. diff --git a/blk/080 b/blk/080 deleted file mode 100644 index d582ca7..0000000 --- a/blk/080 +++ /dev/null @@ -1,16 +0,0 @@ -System variables - -There are some core variables in the core system that are -referred to directly by their address in memory throughout the -code. The place where they live is configurable by the SYSVARS -constant in xcomp unit, but their relative offset is not. In -fact, they're mostly referred to directly as their numerical -offset along with a comment indicating what this offset refers -to. - -This system is a bit fragile because every time we change those -offsets, we have to be careful to adjust all system variables -offsets, but thankfully, there aren't many system variables. -Here's a list of them: - - (cont.) diff --git a/blk/081 b/blk/081 deleted file mode 100644 index 9daccf9..0000000 --- a/blk/081 +++ /dev/null @@ -1,16 +0,0 @@ -SYSVARS FUTURE USES +3c BLK(* -+02 CURRENT +3e A@* -+04 HERE +40 A!* -+06 C -+32 IN(* +70 DRIVERS -+34 BLK@* +80 RAMEND -+36 BLK!* -+38 BLK> -+3a BLKDTY - (cont.) diff --git a/blk/082 b/blk/082 deleted file mode 100644 index e02d3f3..0000000 --- a/blk/082 +++ /dev/null @@ -1,16 +0,0 @@ -CURRENT points to the last dict entry. - -HERE points to current write offset. - -IP is the Interpreter Pointer - -PARSEPTR holds routine address called on (parse) - -C<* holds routine address called on C<. If the C<* override -at 0x08 is nonzero, this routine is called instead. - -IN> is the current position in IN(, which is the input buffer. - -IN(* is a pointer to the input buffer, allocated at runtime. - - (cont.) diff --git a/blk/083 b/blk/083 deleted file mode 100644 index a45d747..0000000 --- a/blk/083 +++ /dev/null @@ -1,16 +0,0 @@ -C. This word is created by "DOES>" and is followed -by a 2-byte value as well as the address where "DOES>" was -compiled. At that address is an atom list exactly like in a -compiled word. Upon execution, after having pushed its cell -addr to PSP, it executes its reference exactly like a -compiled word. diff --git a/blk/089 b/blk/089 deleted file mode 100644 index 16670c5..0000000 --- a/blk/089 +++ /dev/null @@ -1,16 +0,0 @@ -Initialization sequence - -On boot, we jump to the "main" routine in B289 which does -very few things. - -1. Set SP to PS_ADDR and IX to RS_ADDR -2. Sets HERE to SYSVARS+0x80. -3. Sets CURRENT to value of LATEST field in stable ABI. -4. Execute the word referred to by 0x04 (BOOT) in stable ABI. - -In a normal system, BOOT is in core words at B396 and does a -few things: - -1. Initialize all overrides to 0. -2. Write LATEST in BOOT C< PTR ( see below ) -3. Set "C<*", the word that C< calls to (boot<). (cont.) diff --git a/blk/090 b/blk/090 deleted file mode 100644 index 156d5a5..0000000 --- a/blk/090 +++ /dev/null @@ -1,10 +0,0 @@ -4. Call INTERPRET which interprets boot source code until - ASCII EOT (4) is met. This usually init drivers. -5. Initialize rdln buffer, _sys entry (for EMPTY), prints - "CollapseOS" and then calls (main). -6. (main) interprets from rdln input (usually from KEY) until - EOT is met, then calls BYE. - -In RAM-only environment, we will typically have a -"CURRENT @ HERE !" line during init to have HERE begin at the -end of the binary instead of RAMEND. diff --git a/blk/091 b/blk/091 deleted file mode 100644 index dc6eb36..0000000 --- a/blk/091 +++ /dev/null @@ -1,16 +0,0 @@ -Stable ABI - -Across all architectures, some offset are referred to by off- -sets that don't change (well, not without some binary manipu- -lation). Here's the complete list of these references: - -04 BOOT addr 06 (uflw) addr 08 LATEST -13 (oflw) addr 2b (s) wordref 33 2>R wordref -42 EXIT wordref 53 (br) wordref 67 (?br) wordref -80 (loop) wordref bf (n) wordref - -BOOT, (uflw) and (oflw) exist because they are referred to -before those words are defined (in core words). LATEST is a -critical part of the initialization sequence. - - (cont.) diff --git a/blk/092 b/blk/092 deleted file mode 100644 index 045008c..0000000 --- a/blk/092 +++ /dev/null @@ -1,16 +0,0 @@ -Stable wordrefs are there for more complicated reasons. When -cross-compiling Collapse OS, we use immediate words from the -host and some of them compile wordrefs (IF compiles (?br), -LOOP compiles (loop), etc.). These compiled wordref need to -be stable across binaries, so they're part of the stable ABI. - -Another layer of complexity is the fact that some binaries -don't begin at offset 0. In that case, the stable ABI doesn't -begin at 0 either. The EXECUTE word has a special handling of -those case where any wordref < 0x100 has the binary offset -applied to it. - -But that's not the end of our problems. If an offsetted binary -cross compiles a binary with a different offset, stable ABI -references will be > 0x100 and be broken. - (cont.) diff --git a/blk/093 b/blk/093 deleted file mode 100644 index 1ff427d..0000000 --- a/blk/093 +++ /dev/null @@ -1,3 +0,0 @@ -For this reason, any stable wordref compiled in the "hot zone" -(B397-B400) has to be compiled by direct offset reference to -avoid having any binary offset applied to it. diff --git a/doc/impl.txt b/doc/impl.txt new file mode 100644 index 0000000..4826d4d --- /dev/null +++ b/doc/impl.txt @@ -0,0 +1,224 @@ +# Implementation notes + +# Execution model + +After having read a line through readln, we want to interpret +it. As a general rule, we go like this: + +1. read single word from line +2. Can we find the word in dict? +3. If yes, execute that word, goto 1 +4. Is it a number? +5. If yes, push that number to PS, goto 1 +6. Error: undefined word. + +# Executing a word + +At it's core, executing a word is pushing the wordref on PS and +calling EXECUTE. Then, we let the word do its things. Some +words are special, but most of them are of the "compiled" +type (regular nonnative word), and that's their execution that +we describe here. + +First of all, at all time during execution, the Interpreter +Pointer (IP) points to the wordref we're executing next. + +When we execute a compiled word, the first thing we do is push +IP to the Return Stack (RS). Therefore, RS' top of stack will +contain a wordref to execute next, after we EXIT. + +At the end of every compiled word is an EXIT. This pops RS, sets +IP to it, and continues. + +# Stack management + +In all supported arches, The Parameter Stack and Return Stack +tops are trackes by a registered assigned to this purpose. For +example, in z80, it's SP and IX that do that. The value in those +registers are referred to as PS Pointer (PSP) and RS Pointer +(RSP). + +Those stacks are contiguous and grow in opposite directions. PS +grows "down", RS grows "up". + +Stack underflow and overflow: In each native word involving +PS popping, we check whether the stack is big enough. If it's +not we go in "uflw" (underflow) error condition, then abort. + +We don't check RS for underflow because the cost of the check +is significant and its usefulness is dubious: if RS isn't +tightly in control, we're screwed anyways, and that, well +before we reach underflow. + +Overflow condition happen when RSP and PSP meet somewhere in +the middle. That check is made at each "next" call. + +# Dictionary entry + +A dictionary entry has this structure: + +- Xb name. Arbitrary long number of character (but can't be + bigger than input buffer, of course). not null-terminated +- 2b prev offset +- 1b name size + IMMEDIATE flag (7th bit) +- 1b entry type +- Parameter field (PF) + +The prev offset is the number of bytes between the prev field +and the previous word's code pointer. + +The size + flag indicate the size of the name field, with the +7th bit being the IMMEDIATE flag. + +The entry type is simply a number corresponding to a type which +will determine how the word will be executed. See "Word types" +below. + +# Word types + +There are 4 word types in Collapse OS. Whenever you have a +wordref, it's pointing to a byte with numbers 0 to 3. This +number is the word type and the word's behavior depends on it. + +0: native. This words PFA contains native binary code and is +jumped to directly. + +1: compiled. This word's PFA contains an atom list and its +execution is described in "Execution model" above. + +2: cell. This word is usually followed by a 2-byte value in its +PFA. Upon execution, the address of the PFA is pushed to PS. + +3: DOES>. This word is created by "DOES>" and is followed +by a 2-byte value as well as the address where "DOES>" was +compiled. At that address is an atom list exactly like in a +compiled word. Upon execution, after having pushed its cell +addr to PSP, it executes its reference exactly like a +compiled word. + +# System variables + +There are some core variables in the core system that are +referred to directly by their address in memory throughout the +code. The place where they live is configurable by the SYSVARS +constant in xcomp unit, but their relative offset is not. In +fact, they're mostly referred to directly as their numerical +offset along with a comment indicating what this offset refers +to. + +This system is a bit fragile because every time we change those +offsets, we have to be careful to adjust all system variables +offsets, but thankfully, there aren't many system variables. +Here's a list of them: + +SYSVARS FUTURE USES +3c BLK(* ++02 CURRENT +3e A@* ++04 HERE +40 A!* ++06 C ++32 IN(* +70 DRIVERS ++34 BLK@* +80 RAMEND ++36 BLK!* ++38 BLK> ++3a BLKDTY + +CURRENT points to the last dict entry. + +HERE points to current write offset. + +IP is the Interpreter Pointer + +PARSEPTR holds routine address called on (parse) + +C<* holds routine address called on C<. If the C<* override +at 0x08 is nonzero, this routine is called instead. + +IN> is the current position in IN(, which is the input buffer. + +IN(* is a pointer to the input buffer, allocated at runtime. + +CURRENTPTR points to current CURRENT. The Forth CURRENT word +doesn't return RAM+2 directly, but rather the value at this +address. Most of the time, it points to RAM+2, but sometimes, +when maintaining alternative dicts (during cross compilation +for example), it can point elsewhere. + +NLPTR points to an alternative routine for NL (by default, +CRLF). + +BLK* see B416. + +FUTURE USES section is unused for now. + +DRIVERS section is reserved for recipe-specific drivers. + +# Initialization sequence + +(this describes the z80 boot sequence, but other arches have +a very similar sequence, and, of course, once we enter Forth +territory, identical) + +On boot, we jump to the "main" routine in B289 which does +very few things. + +1. Set SP to PS_ADDR and IX to RS_ADDR +2. Sets HERE to SYSVARS+0x80. +3. Sets CURRENT to value of LATEST field in stable ABI. +4. Execute the word referred to by 0x04 (BOOT) in stable ABI. + +In a normal system, BOOT is in core words at B396 and does a +few things: + +1. Initialize all overrides to 0. +2. Write LATEST in BOOT C< PTR ( see below ) +3. Set "C<*", the word that C< calls to (boot<). +4. Call INTERPRET which interprets boot source code until + ASCII EOT (4) is met. This usually init drivers. +5. Initialize rdln buffer, _sys entry (for EMPTY), prints + "CollapseOS" and then calls (main). +6. (main) interprets from rdln input (usually from KEY) until + EOT is met, then calls BYE. + +In RAM-only environment, we will typically have a +"CURRENT @ HERE !" line during init to have HERE begin at the +end of the binary instead of RAMEND. + +# Stable ABI + +Across all architectures, some offset are referred to by off- +sets that don't change (well, not without some binary manipu- +lation). Here's the complete list of these references: + +04 BOOT addr 06 (uflw) addr 08 LATEST +13 (oflw) addr 2b (s) wordref 33 2>R wordref +42 EXIT wordref 53 (br) wordref 67 (?br) wordref +80 (loop) wordref bf (n) wordref + +BOOT, (uflw) and (oflw) exist because they are referred to +before those words are defined (in core words). LATEST is a +critical part of the initialization sequence. + +Stable wordrefs are there for more complicated reasons. When +cross-compiling Collapse OS, we use immediate words from the +host and some of them compile wordrefs (IF compiles (?br), +LOOP compiles (loop), etc.). These compiled wordref need to +be stable across binaries, so they're part of the stable ABI. + +Another layer of complexity is the fact that some binaries +don't begin at offset 0. In that case, the stable ABI doesn't +begin at 0 either. The EXECUTE word has a special handling of +those case where any wordref < 0x100 has the binary offset +applied to it. + +But that's not the end of our problems. If an offsetted binary +cross compiles a binary with a different offset, stable ABI +references will be > 0x100 and be broken. + +For this reason, any stable wordref compiled in the "hot zone" +(B397-B400) has to be compiled by direct offset reference to +avoid having any binary offset applied to it.