You can get this blog via IPFS: ipfs get /ipns/itjac.me/blog.pdf
tags: C code breakout blog
02 June 2018
I have managed to get my breakout clone to draw images into a window, and so I would like to post pictures alongside my blog posts from now on. This meant adding features to my blog software.
There were two major features that I had been excited about implementing. One is images, and the other is tags. I implemented both.
When I was thinking about how to implement image support, I had the idea that I wanted my entire blog to be just one file, and therefore one connection, instead of several connections: one for each image plus one for the blog itself. I'm not an expert in networking, but I do know about the caravan analogy, and I've read a bit about how connections take time to set up. I also chose this approach because I've unfortunately had some web development experience, and knew that it is possible to embed image data directly into an img tag using base64 encoding.
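To make that concrete, here is a minimal sketch of how such an embed could be produced in C. This is not my actual blog_composer code; the function name embed_image and the hard-coded PNG MIME type are just for illustration.

    #include <stdio.h>

    /* Hypothetical sketch: base64-encode raw PNG bytes straight
     * into an <img> tag, so the image needs no extra connection. */

    static const char b64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static void embed_image(FILE *out, const unsigned char *data, size_t len)
    {
        fputs("<img src=\"data:image/png;base64,", out);
        size_t i;
        for (i = 0; i + 2 < len; i += 3) {
            /* pack three bytes into 24 bits, emit four base64 chars */
            unsigned v = (data[i] << 16) | (data[i+1] << 8) | data[i+2];
            fputc(b64[(v >> 18) & 63], out);
            fputc(b64[(v >> 12) & 63], out);
            fputc(b64[(v >> 6) & 63], out);
            fputc(b64[v & 63], out);
        }
        /* handle the 1- or 2-byte tail with '=' padding */
        if (len - i == 1) {
            unsigned v = data[i] << 16;
            fputc(b64[(v >> 18) & 63], out);
            fputc(b64[(v >> 12) & 63], out);
            fputs("==", out);
        } else if (len - i == 2) {
            unsigned v = (data[i] << 16) | (data[i+1] << 8);
            fputc(b64[(v >> 18) & 63], out);
            fputc(b64[(v >> 12) & 63], out);
            fputc(b64[(v >> 6) & 63], out);
            fputc('=', out);
        }
        fputs("\">", out);
    }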
Once I finished implementing the base64 embed, I decided to test it against just linking to a PNG file, and it turns out that there was no speed difference. I'm not used to analysing network speeds, and I just used my browser's dev tools, but it looks like the browser just waits a bunch of milliseconds before drawing the embedded base64 image. The way I read the "waterfall" is that downloading the HTML takes about the same time in both versions, but then one version spends a decent amount of time connecting to and downloading the PNG, while the embedded version just waits for about as long as that connection would have taken.
This is disappointing, because with the embedded version the source of my website becomes much harder to read, and people can't download my images unless they copy the data out and paste it into some software like the GIMP.
I have to conclude that I had a bad idea, but I'm going to keep the embedded version, because then the code gets some use, and I don't want to make this project a chore. So far it has been a wonderful project that brings results with almost no effort.
It became clear to me that instead of adding any complexity, I could just create a file for every tag page. So now I have a tags subdirectory with a file for each tag, and each file contains all the HTML necessary to work correctly. This means that everything gets copied however many times is necessary, but I consider that a price worth paying for the savings in complexity.
It also gave me an exercise in operating on data. I decided that when each tag is parsed, a linked list of tags gets searched; on a hit, a pointer to the blog entry is added to that node, otherwise a new node is added. Then, when I generate the tag files, I go through the list, create a file based on the tag's name, build a new AST from each entry on that tag node, and run the exact same AST-writing code that creates the full blog against that file.
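Here is roughly what that structure could look like. The names (TagNode, find_or_add_tag, tag_add_entry) are invented for this sketch; the real code may differ.

    #include <stdlib.h>
    #include <string.h>

    struct Entry;                      /* a parsed blog post */

    typedef struct TagNode {
        char           *name;          /* e.g. "breakout" */
        struct Entry  **entries;       /* posts carrying this tag */
        size_t          count, cap;
        struct TagNode *next;
    } TagNode;

    /* Search the list for a tag; add a new node on a miss. */
    static TagNode *find_or_add_tag(TagNode **head, const char *name)
    {
        TagNode *n;
        for (n = *head; n; n = n->next)
            if (strcmp(n->name, name) == 0)
                return n;
        n = calloc(1, sizeof *n);
        n->name = strdup(name);
        n->next = *head;
        *head = n;
        return n;
    }

    /* Append a blog entry pointer to a tag node's growable array. */
    static void tag_add_entry(TagNode *n, struct Entry *e)
    {
        if (n->count == n->cap) {
            n->cap = n->cap ? n->cap * 2 : 4;
            n->entries = realloc(n->entries, n->cap * sizeof *n->entries);
        }
        n->entries[n->count++] = e;
    }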
One other major change I made is to make my parser two-step. I do not know if I will keep it this way, but I've really wanted to write a two-step parser for some time, and this project seemed like a good excuse.
tags: C code design breakout blog
08 May 2018
When I started writing this blog, I immediately wrote a program in C that I call "blog_composer", which in the beginning just iterated through a directory, non-recursively, and combined every file into one HTML file. I copied a quicksort algorithm from Wikipedia so that the files get sorted before I read them, meaning the file whose name has the smallest value becomes the first blog post. Then I just named my files blog001, blog002, etc.
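A minimal sketch of that first version might look like this, assuming a POSIX dirent.h and using the standard library's qsort in place of the quicksort I copied; the directory name and limits are made up.

    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmp_names(const void *a, const void *b)
    {
        return strcmp(*(const char * const *)a, *(const char * const *)b);
    }

    int main(void)
    {
        char *names[256];
        size_t count = 0;
        DIR *dir = opendir("posts");
        struct dirent *ent;

        if (!dir)
            return 1;
        while ((ent = readdir(dir)) && count < 256) {
            if (ent->d_name[0] == '.')
                continue;              /* skip ".", ".." and hidden files */
            names[count++] = strdup(ent->d_name);
        }
        closedir(dir);

        /* blog001 sorts before blog002, so the lowest name comes first */
        qsort(names, count, sizeof names[0], cmp_names);

        for (size_t i = 0; i < count; i++)
            printf("appending %s\n", names[i]);
        return 0;
    }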
It turns out this was not sufficient, because HTML does not treat the newline character (\n) as a line break, and so I had to do some parsing. I decided to do the easiest thing and just replace each \n with a br tag. But the br tag takes more characters than \n, and my code just read the file into a single buffer, so an in-place replacement wouldn't fit. I then decided to just put the work in, and write a tokenizer and an AST.
Fortunately I did it in a short day, because I've done this kind of work before. My code now sets a start position, then reads until EOF or it finds \n. It then sets that as the end position, creates a "token", which is just a range in the buffer, adds the token to an array, plus one token for the \n if it wasn't EOF. It then sets the start position again, and repeats.
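As a rough sketch, assuming a token is stored as a pair of offsets into the file buffer, the loop could look like this (the names are illustrative, not my actual code):

    #include <stddef.h>

    typedef enum { TOK_TEXT, TOK_NEWLINE } TokKind;

    typedef struct {
        TokKind kind;
        size_t  start, end;   /* byte range into the buffer */
    } Token;

    static size_t tokenize_lines(const char *buf, size_t len,
                                 Token *out, size_t max)
    {
        size_t n = 0, pos = 0;
        while (pos < len && n < max) {
            size_t start = pos;
            while (pos < len && buf[pos] != '\n')
                pos++;                /* read until \n or EOF */
            out[n++] = (Token){ TOK_TEXT, start, pos };
            if (pos < len && n < max) {
                /* found '\n' rather than EOF: add an extra token for it */
                out[n++] = (Token){ TOK_NEWLINE, pos, pos + 1 };
                pos++;
            }
        }
        return n;
    }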
Once the entire file is tokenized, I read the tokens and generate nodes for the AST from that.
This worked, and I felt I could write for some time at least without it bothering me. But I had URLs in my first post, and it was disappointing that those weren't clickable, so immediately afterwards I couldn't stop thinking about improving it.
A day or two later I'm again at the code, and now I've decided to properly tokenize it. Every word is a token and gets added to an array. Tokens are separated by whitespace, and then I have a special test for '\n' that adds an extra newline token after adding the word. I was a bit worried that the code would get very complicated, because I don't want every token to be an AST node; instead I want one node for a sequence of text, then one node for a newline, then a text node again, and so on. Surprisingly, the code for this turned out much more elegant than I had anticipated. I don't know how to describe it without going into detail about the code, which I do not want to do, but what I realized was that a node only has to be added when the token is a known token, e.g. a newline, or when it is not a known token and we are not currently reading text.
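Here is a sketch of that rule, reusing the Token and TOK_NEWLINE types from the earlier sketch but assuming the tokens are now words; the node names are mine and only a newline counts as a "known token" here.

    #include <stdlib.h>

    typedef enum { NODE_TEXT, NODE_NEWLINE } NodeKind;

    typedef struct Node {
        NodeKind     kind;
        size_t       first_tok, last_tok;  /* token range of a text run */
        struct Node *next;
    } Node;

    static void append(Node **tail, NodeKind kind, size_t tok)
    {
        Node *n = calloc(1, sizeof *n);
        n->kind = kind;
        n->first_tok = n->last_tok = tok;
        (*tail)->next = n;
        *tail = n;
    }

    static Node *build_ast(const Token *toks, size_t ntoks)
    {
        Node head = {0}, *tail = &head;
        int in_text = 0;

        for (size_t i = 0; i < ntoks; i++) {
            if (toks[i].kind == TOK_NEWLINE) {
                append(&tail, NODE_NEWLINE, i); /* known token: always a node */
                in_text = 0;
            } else if (!in_text) {
                append(&tail, NODE_TEXT, i);    /* first word opens a text run */
                in_text = 1;
            } else {
                tail->last_tok = i;             /* extend the current run */
            }
        }
        return head.next;
    }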
With this code working, I had an easy time adding URL support. I did not want to have to write a URL tag every time I wrote a URL; instead I wanted the parser to just match "http" and then recognise that the rest of the token is a URL. This works for