252 lines
28 KiB
C
252 lines
28 KiB
C
/*
|
|
Copyright (c) 2023 : Ognjen 'xolatile' Milan Robovic
|
|
|
|
Xhartae is free software! You will redistribute it or modify it under the terms of the GNU General Public License by Free Software Foundation.
|
|
And when you do redistribute it or modify it, it will use either version 3 of the License, or (at yours truly opinion) any later version.
|
|
It is distributed in the hope that it will be useful or harmful, it really depends... But no warranty what so ever, seriously. See GNU/GPLv3.
|
|
*/
|
|
|
|
#ifndef CHAPTER_1_HEADER
|
|
#define CHAPTER_1_HEADER
|
|
|
|
#include "chapter_0.h" // We need this header file for some core enumeration definitions and function declarations.
|
|
|
|
/*
|
|
Now that you've read the chapter zero, we should get into some technical details about C programming language. This will be the most imporant chapter for people who know some
|
|
other lower level programming language, or who intuitively understand "building blocks" of some system. Below is just a simple matrix of them, 8 x 4, so you can see that there are
|
|
really 32 keywords (ANSI C standard, I dislike newer standards), and even more below we'll categorize them. Also, keep in mind that I'll briefly talk about other C standards such
|
|
as K&R, C99, etc., and use parts of them in some places, so you don't get confused when you see them in other peoples' source code.
|
|
|
|
- Keywords:
|
|
|
|
static signed if typedef
|
|
extern unsigned else enum
|
|
const float for union
|
|
auto double do struct
|
|
register char while return
|
|
volatile int switch goto
|
|
sizeof short case break
|
|
void long default continue
|
|
|
|
Keywords that you should never use, under any circumstances (except for writing your own C compiler or working in embedded) are:
|
|
|
|
- register: Hint to the compiler that some variable will be used a lot, so it can be stored in CPU register, modern compilers ignore this keyword.
|
|
- volatile: Hint to the compiler that some variable may be changed by some external source. Such a cool keyword, but never use it.
|
|
- auto: It specifies automatic storage duration for local variables inside a block. Simply, variable won't exist outside {} block it's defined in.
|
|
- signed: We'll talk about types in C more, but this is always assumed, so use only 'unsigned', when you really need to.
|
|
- union: You'll have very little excuse to use unions, because they often don't go well with type checking and introduce complexity for no reason.
|
|
- goto: I'm kidding, feel free to use 'goto' statement, it's not harmful, some people just abused it to the point of it being bullied.
|
|
|
|
Keywords that you should really consider not to use in general, but only for some very specific cases (compiler warnings or using some libraries):
|
|
|
|
- const: Constant qualifier sometimes tells to the compiler that a value won't be changed. Use it only to silence the compiler warnings...
|
|
- unsigned: Again, you shouldn't care if your variable is signed or unsigned, because C really likes integers, and hates naturals. We'll talk more about it.
|
|
- float: Unless you're making a game engine with GPU acceleration or neural network, I wouldn't use this type at all due to it's limitations.
|
|
- double: Well, same as 'float', but twice bigger and more precise floating point values. You'll see it being used very rarely.
|
|
- short: Use it only to silence warnings, especially those from XCB and Xlib libraries, for whatever reason (read: X11) they use million different types.
|
|
- long: Also use this only to silence warnings, because some standard library functions use this type, pure C-hating cancer...
|
|
- typedef: I don't like it at all, but this one is my personal preference, because it just introduces unneeded mental overhead when programming.
|
|
- do: It can only be used before '{', then you do the loop thing, then '}' and then 'while' statement. I prefer just to use 'for' everywhere.
|
|
|
|
Okay, now we're left with following actually useful and C-loving keywords, 18 of them, that I'll cover more:
|
|
|
|
- char: Type for storing ASCII characters, 8 bits wide, implicitly casted to 'int' in sometimes, arrays that use them are string literals.
|
|
- int: Type for storing signed numerical values, 32 bits wide, most number literals are casted to this type, compiler will warn you for mixing types.
|
|
- void: This is a black hole, nothing can escape it.
|
|
- sizeof: Some people say this behaves like a function, but they are wrong, it's a statement. It literally does what is says, more on that later...
|
|
- static: This qualifier has several meanings, that function or variable will only be used in that file or when inside a function, that variables persists.
|
|
- extern: This qualifier is more simple, and implicit in older compilers, newer ones warn when it's not used. Something will be used outside that file.
|
|
- if: If some condition(s) are met (equal to true or non-zero value), do the thing specified in the block.
|
|
- else: With 'else' you can "bind" multiple 'if' statements, but know that you can completely avoid it, and get the same results.
|
|
- for: Very complex kind of loop statement, that leads to segmentation faults and many other memory and safety related bugs. I love to use it.
|
|
- while: Very simple king of loop statement, that leads to segmentation faults and many other memory and safety related bugs. I don't love to use it.
|
|
- switch: This statement is the most powerful in the whole C language, it's not just if-else made pretty, we'll see examples later.
|
|
- case: Used only inside 'switch' statement, it sets some number literal as an implicit label, so when the expression in 'switch' is equals it, it jumps there.
|
|
- default: Used only because C didn't force curly brackets everywhere back when it was made, but every 'switch' should have just one 'default'.
|
|
- enum: Sometimes treated as type (with 'typedef'), and always very good way to define a lot of constants, we'll talk about them more.
|
|
- struct: I really advice against using structures, but most libraries use them, so you need to know how they work. In C, they are very weak part of the language.
|
|
- return: Used in functions to return from them with or without a result (return value). It's best when function only uses it once.
|
|
- break: When inside a loop, this statement will exit the loop. In newer standards it can take simple arguments, but we'll see why that's bad.
|
|
- continue: Even more specific case, when inside a loop, this statement will skip to the end of the loop, and then continue again.
|
|
|
|
Now, of those 18 actually useful C keywords, I like to avoid 'struct', 'switch', 'case', 'default', 'while', and use (functional) 'static' and 'extern' only in order to silence
|
|
compiler warnings, 'static' inside a function is more useful. That leaves us (me) with 12 C keywords that I love, out of complete 32 keywords in ANSI C standard. However, we'll
|
|
see that sometimes is preferable to use switch statement somewhere, or while loop when for loop feels like overkill. In most real-world cases, you'll need to use some API or
|
|
library that internally used structures everywhere, so you'll need to adapt to it, we'll see examples later...
|
|
|
|
So, real men need these keywords { char, int, void, sizeof, static, if, else, for, enum, return }, and use the rest of them in order to silence compiler warnings, use some
|
|
standard library functions, clean the source code or access an API / library / header file. Lets see some more examples and keep in mind code formatting...
|
|
|
|
@C
|
|
extern int this_is_kind (kind_type this);
|
|
// extern - we declare this function as external one, because we'll create an object file (.o), and link it with other programs (or object files).
|
|
// int - we set 'int', integer as return type of this function, and use 0 as false, and 1 as true, instead of 'bool' type from <stdbool.h>.
|
|
// this_is_kind - we named our function like this, you'll see more of verbose naming when I write programs, because I think it's better...
|
|
// kind_type - we choose the type of arguments in our function, since our function deals with 'kind', we choose proper 'kind_type'.
|
|
// this - we named our argument about what's it supposed to be, use similar approach in return type, function, argument type and argument names.
|
|
@
|
|
|
|
Before continuing, lets describe some imporant types in C very simply:
|
|
|
|
Word: Bytes: Bits: Kind: Minimum: Maximum:
|
|
void / / Black hole / /
|
|
void * 8 64 Memory address / /
|
|
char 1 8 Integer -128 127
|
|
short 2 16 Integer -32768 32767
|
|
int 4 32 Integer -2147483648 2147483647
|
|
long 8 64 Integer -9223372036854775808 9223372036854775807
|
|
unsigned char 1 8 Natural 0 256
|
|
unsigned short 2 16 Natural 0 65535
|
|
unsigned int 4 32 Natural 0 4294967295
|
|
unsigned long 8 64 Natural 0 18446744073709551615
|
|
float 4 32 Real (IEEE754) / /
|
|
double 8 64 Real (IEEE754) / /
|
|
|
|
Note that you shouldn't care for now about 'void' and 'void *', because they're special cases, nor about their minimum and maximum. Also, the less types you use, the more
|
|
type-safe your code is. That's the reason why I use 'int', 'char *' and 'char', and sometimes 'void *' and 'int *'. You get less compiler and linter warnings about conversions,
|
|
but you need to know exactly what you're doing and what your code will do in order to have no bugs. Again, think twice, write once.
|
|
*/
|
|
|
|
enum { // We won't even cover all of those, this is just an example of how to do similar task without hardcoding file extensions.
|
|
FILE_TYPE_TEXT, FILE_TYPE_COMMON_ASSEMBLY, FILE_TYPE_FLAT_ASSEMBLY, FILE_TYPE_GNU_ASSEMBLY,
|
|
FILE_TYPE_NETWIDE_ASSEMBLY, FILE_TYPE_YET_ANOTHER_ASSEMBLY, FILE_TYPE_C_SOURCE, FILE_TYPE_C_HEADER,
|
|
FILE_TYPE_ADA_BODY, FILE_TYPE_ADA_SPECIFICATION, FILE_TYPE_CPP_SOURCE, FILE_TYPE_CPP_HEADER,
|
|
FILE_TYPE_COUNT
|
|
};
|
|
|
|
/*
|
|
Here are some "utility" functions that we'll maybe use, reimplementation of those from standard library header file <ctype.h>. I prefer names like this, and this is for learning
|
|
purposes, so it's nice that you can see it here instead of searching through folders such as "/usr/include/". Lets talk about ASCII table now!
|
|
|
|
In functions 'character_is_uppercase', 'character_is_lowercase' and 'character_is_digit', we use characters that are in certain range on ASCII table, which we'll show just below.
|
|
So, it's safe to use '>=' and '<=' operators, but in other cases, we want to compare them selectively, and for simplicity we use function 'character_compare_array'... Here's how
|
|
ASCII table looks like, I don't like encodings like UTF-8 and others, so neither should you. We'll also write a subprogram that prints this to terminal or graphical window.
|
|
|
|
ASCII table:
|
|
_______________________________________________________________________________________________________________________________________________________________
|
|
|_0B______|_0O__|_0D__|_0X_|_SYM_|_Full_name____________________________________|_0B______|_0O__|_0D__|_0X_|_SYM_|_Full_name____________________________________|
|
|
| | |
|
|
| 0000000 | 000 | 0 | 00 | NUL | Null | 0000001 | 001 | 1 | 01 | SOH | Start of heading |
|
|
| 0000010 | 002 | 2 | 02 | STX | Start of text | 0000011 | 003 | 3 | 03 | ETX | End of text |
|
|
| 0000100 | 004 | 4 | 04 | EOT | End of transmission | 0000101 | 005 | 5 | 05 | ENQ | Enquiry |
|
|
| 0000110 | 006 | 6 | 06 | ACK | Acknowledge | 0000111 | 007 | 7 | 07 | BEL | Bell |
|
|
| 0001000 | 010 | 8 | 08 | BS | Backspace | 0001001 | 011 | 9 | 09 | HT | Horizontal tab |
|
|
| 0001010 | 012 | 10 | 0A | LF | Line feed | 0001011 | 013 | 11 | 0B | VT | Vertical tab |
|
|
| 0001100 | 014 | 12 | 0C | FF | Form feed | 0001101 | 015 | 13 | 0D | CR | Carriage return |
|
|
| 0001110 | 016 | 14 | 0E | SO | Shift out | 0001111 | 017 | 15 | 0F | SI | Shift in |
|
|
| 0010000 | 020 | 16 | 10 | DLE | Data link escape | 0010001 | 021 | 17 | 11 | DC1 | Device control 1 |
|
|
| 0010010 | 022 | 18 | 12 | DC2 | Device control 2 | 0010011 | 023 | 19 | 13 | DC3 | Device control 3 |
|
|
| 0010100 | 024 | 20 | 14 | DC4 | Device control 4 | 0010101 | 025 | 21 | 15 | NAK | Negative acknowledge |
|
|
| 0010110 | 026 | 22 | 16 | SYN | Synchronous idle | 0010111 | 027 | 23 | 17 | ETB | End transmission block |
|
|
| 0011000 | 030 | 24 | 18 | CAN | Cancel | 0011001 | 031 | 25 | 19 | EM | End of medium |
|
|
| 0011010 | 032 | 26 | 1A | SUB | Substitute | 0011011 | 033 | 27 | 1B | ESC | Escape |
|
|
| 0011100 | 034 | 28 | 1C | FS | File separator | 0011101 | 035 | 29 | 1D | GS | Group separator |
|
|
| 0011110 | 036 | 30 | 1E | RS | Record separator | 0011111 | 037 | 31 | 1F | US | Unit separator |
|
|
| 0100000 | 040 | 32 | 20 | | Space | 0100001 | 041 | 33 | 21 | ! | Exclamation mark |
|
|
| 0100010 | 042 | 34 | 22 | " | Speech mark | 0100011 | 043 | 35 | 23 | # | Number sign |
|
|
| 0100100 | 044 | 36 | 24 | $ | Dollar sign | 0100101 | 045 | 37 | 25 | % | Percent |
|
|
| 0100110 | 046 | 38 | 26 | & | Ampersand | 0100111 | 047 | 39 | 27 | ' | Quote |
|
|
| 0101000 | 050 | 40 | 28 | ( | Open parenthesis | 0101001 | 051 | 41 | 29 | ) | Close parenthesis |
|
|
| 0101010 | 052 | 42 | 2A | * | Asterisk | 0101011 | 053 | 43 | 2B | + | Plus |
|
|
| 0101100 | 054 | 44 | 2C | , | Comma | 0101101 | 055 | 45 | 2D | - | Minus |
|
|
| 0101110 | 056 | 46 | 2E | . | Period | 0101111 | 057 | 47 | 2F | / | Slash |
|
|
| 0110000 | 060 | 48 | 30 | 0 | Zero | 0110001 | 061 | 49 | 31 | 1 | One |
|
|
| 0110010 | 062 | 50 | 32 | 2 | Two | 0110011 | 063 | 51 | 33 | 3 | Three |
|
|
| 0110100 | 064 | 52 | 34 | 4 | Four | 0110101 | 065 | 53 | 35 | 5 | Five |
|
|
| 0110110 | 066 | 54 | 36 | 6 | Six | 0110111 | 067 | 55 | 37 | 7 | Seven |
|
|
| 0111000 | 070 | 56 | 38 | 8 | Eight | 0111001 | 071 | 57 | 39 | 9 | Nine |
|
|
| 0111010 | 072 | 58 | 3A | : | Colon | 0111011 | 073 | 59 | 3B | ; | Semicolon |
|
|
| 0111100 | 074 | 60 | 3C | < | Open angled bracket | 0111101 | 075 | 61 | 3D | = | Equal |
|
|
| 0111110 | 076 | 62 | 3E | > | Close angled bracket | 0111111 | 077 | 63 | 3F | ? | Question mark |
|
|
| 1000000 | 100 | 64 | 40 | @ | At sign | 1000001 | 101 | 65 | 41 | A | Uppercase A |
|
|
| 1000010 | 102 | 66 | 42 | B | Uppercase B | 1000011 | 103 | 67 | 43 | C | Uppercase C |
|
|
| 1000100 | 104 | 68 | 44 | D | Uppercase D | 1000101 | 105 | 69 | 45 | E | Uppercase E |
|
|
| 1000110 | 106 | 70 | 46 | F | Uppercase F | 1000111 | 107 | 71 | 47 | G | Uppercase G |
|
|
| 1001000 | 110 | 72 | 48 | H | Uppercase H | 1001001 | 111 | 73 | 49 | I | Uppercase I |
|
|
| 1001010 | 112 | 74 | 4A | J | Uppercase J | 1001011 | 113 | 75 | 4B | K | Uppercase K |
|
|
| 1001100 | 114 | 76 | 4C | L | Uppercase L | 1001101 | 115 | 77 | 4D | M | Uppercase M |
|
|
| 1001110 | 116 | 78 | 4E | N | Uppercase N | 1001111 | 117 | 79 | 4F | O | Uppercase O |
|
|
| 1010000 | 120 | 80 | 50 | P | Uppercase P | 1010001 | 121 | 81 | 51 | Q | Uppercase Q |
|
|
| 1010010 | 122 | 82 | 52 | R | Uppercase R | 1010011 | 123 | 83 | 53 | S | Uppercase S |
|
|
| 1010100 | 124 | 84 | 54 | T | Uppercase T | 1010101 | 125 | 85 | 55 | U | Uppercase U |
|
|
| 1010110 | 126 | 86 | 56 | V | Uppercase V | 1010111 | 127 | 87 | 57 | W | Uppercase W |
|
|
| 1011000 | 130 | 88 | 58 | X | Uppercase X | 1011001 | 131 | 89 | 59 | Y | Uppercase Y |
|
|
| 1011010 | 132 | 90 | 5A | Z | Uppercase Z | 1011011 | 133 | 91 | 5B | [ | Opening bracket |
|
|
| 1011100 | 134 | 92 | 5C | \ | Backslash | 1011101 | 135 | 93 | 5D | ] | Closing bracket |
|
|
| 1011110 | 136 | 94 | 5E | ^ | Caret | 1011111 | 137 | 95 | 5F | _ | Underscore |
|
|
| 1100000 | 140 | 96 | 60 | ` | Grave | 1100001 | 141 | 97 | 61 | a | Lowercase a |
|
|
| 1100010 | 142 | 98 | 62 | b | Lowercase b | 1100011 | 143 | 99 | 63 | c | Lowercase c |
|
|
| 1100100 | 144 | 100 | 64 | d | Lowercase d | 1100101 | 145 | 101 | 65 | e | Lowercase e |
|
|
| 1100110 | 146 | 102 | 66 | f | Lowercase f | 1100111 | 147 | 103 | 67 | g | Lowercase g |
|
|
| 1101000 | 150 | 104 | 68 | h | Lowercase h | 1101001 | 151 | 105 | 69 | i | Lowercase i |
|
|
| 1101010 | 152 | 106 | 6A | j | Lowercase j | 1101011 | 153 | 107 | 6B | k | Lowercase k |
|
|
| 1101100 | 154 | 108 | 6C | l | Lowercase l | 1101101 | 155 | 109 | 6D | m | Lowercase m |
|
|
| 1101110 | 156 | 110 | 6E | n | Lowercase n | 1101111 | 157 | 111 | 6F | o | Lowercase o |
|
|
| 1110000 | 160 | 112 | 70 | p | Lowercase p | 1110001 | 161 | 113 | 71 | q | Lowercase q |
|
|
| 1110010 | 162 | 114 | 72 | r | Lowercase r | 1110011 | 163 | 115 | 73 | s | Lowercase s |
|
|
| 1110100 | 164 | 116 | 74 | t | Lowercase t | 1110101 | 165 | 117 | 75 | u | Lowercase u |
|
|
| 1110110 | 166 | 118 | 76 | v | Lowercase v | 1110111 | 167 | 119 | 77 | w | Lowercase w |
|
|
| 1111000 | 170 | 120 | 78 | x | Lowercase x | 1111001 | 171 | 121 | 79 | y | Lowercase y |
|
|
| 1111010 | 172 | 122 | 7A | z | Lowercase z | 1111011 | 173 | 123 | 7B | { | Opening brace |
|
|
| 1111100 | 174 | 124 | 7C | | | Vertical bar | 1111101 | 175 | 125 | 7D | } | Closing brace |
|
|
| 1111110 | 176 | 126 | 7E | ~ | Tilde | 1111111 | 177 | 127 | 7F | DEL | Delete |
|
|
|_______________________________________________________________________________|_______________________________________________________________________________|
|
|
|
|
You can see that values of 'A' ... 'Z', 'a' ... 'z' and '0' ... '9' are sequential, but symbols and "system" characters are mixed up.
|
|
|
|
In C language, we have C source files with the extension '.c', and C header files with the extension '.h'. Both of those are just plain text files, and please use 7-bit ASCII
|
|
encoding, since it's common sense, UTF is cancer, and 8-bit ASCII is for enlightened people like Terrence Andrew Davis. C language is completely separate (on some C compilers)
|
|
from its' preprocessor, whose directives start with '#' character, continue on '\' character and break on '\n' (read: LINE FEED) character.
|
|
|
|
@C
|
|
#include <path/to/file/file_name.h> // Copy the entire file from '/usr/include/' directory into this file, on the place where it was specified.
|
|
#include "path/to/file/file_name.h" // Copy the entire file from current directory into this file, again on the place where it was specified.
|
|
|
|
#define SOMETHING // This will add additional information to the preprocessor about this file, it's mostly used for flags and header-guards.
|
|
#undef SOMETHING // This will remove that additional information you've provided...
|
|
#if SOMETHING // If SOMETHING (condition obviously) is true, then code until '#elif', '#else' or '#endif' will be included.
|
|
#ifdef SOMETHING // If SOMETHING was previously '#define'-d, then code until '#elif', '#else' or '#endif' will be included.
|
|
#ifndef SOMETHING // If SOMETHING wasn't (NOT!) previously '#define'-d, then code until '#elif', '#else' or '#endif' will be included.
|
|
#elif // Essentially "else if" for preprocessor, it's very ugly, and nesting them looks bad and is a bad practice.
|
|
#else // Essentially "else" for preprocessor, I don't think I ever used it in my entire life, but I saw other people use it.
|
|
#endif // End if... Self-explanatory, and a sad thing that we need to ruin the beautiful C code with it.
|
|
@
|
|
|
|
Okay, that's all you really need to know about C preprocessor, since we won't use it much. You can write a completely pure C project, using only C language, but you'll end up with
|
|
copying and pasting a lot of code, especially external function and variable declarations. Because of that we need '#include' directive, and because of it, we need header guards,
|
|
so it's all C-hating in the end. However, we need to cover some simple macros, so you can deal with other peoples' code bases. Remember, the less "building blocks" you have, if
|
|
you learn them well, you can make anything, and you should be proud of "reinventing the wheel". If wheels weren't reinvented over and over again, then some expensive BMW would've
|
|
wooden wheels attached to it.
|
|
*/
|
|
|
|
extern int character_is_uppercase (char character); // Notice how we align those functions, I believe this improves the readability of any program, in any programming language.
|
|
extern int character_is_lowercase (char character); // Some people would just use 'ischrupp' or something, but I hate reading code written like that...
|
|
extern int character_is_digit (char character); // Important note is also that a programming language is not, and it should be like natural language, why?
|
|
extern int character_is_blank (char character); // Because we need strict rules in programming language, same like in mathematical languages, now now, don't be scared.
|
|
extern int character_is_alpha (char character);
|
|
extern int character_is_symbol (char character);
|
|
extern int character_is_visible (char character);
|
|
extern int character_is_invisible (char character);
|
|
extern int character_is_escape (char character);
|
|
extern int character_is_underscore (char character);
|
|
extern int character_is_hexadecimal (char character);
|
|
|
|
extern int character_compare_array (char character, char * character_array); // This function is singled out, because it's different from those above, and we use it internally.
|
|
|
|
/*
|
|
And here are also utility functions that handle files, most of them are reimplemented using "system calls" from <fcntl.h> and <unistd.h>, but you also have access to <stdio.h>,
|
|
which is probably the most used header file in C language. It handles the 'FILE *' type, not a file descriptors which are 'int', and has functions that are prefixed with character
|
|
'f', for example, 'fopen / fclose / fread / fwrite / fseek' and many more.
|
|
*/
|
|
|
|
extern int file_open (char * name, int mode); // We open a file descriptor 'name' with 'mode', obviously...
|
|
extern int file_close (int file); // We. Every opened file descriptor should be closed when program finishes.
|
|
extern void file_read (int file, void * data, int size); // We read from 'file' into 'data', by 'size' amount, similar to 'in'.
|
|
extern void file_write (int file, void * data, int size); // We write from 'data' into 'file', by 'size' amount, similar to 'out'.
|
|
extern int file_seek (int file, int whence); // We retrieve data about offsets in 'file'.
|
|
extern int file_size (char * name); // We get the size of the file by its' 'name'.
|
|
extern int file_type (char * name); // We get the type of the file by its' 'name' (by file name extension).
|
|
extern void * file_record (char * name); // We store an entire file into some memory address.
|
|
|
|
#endif
|