You can get this blog via IPFS: ipfs get /ipns/itjac.me/blog.pdf

[Link] tags:     PDF  IPFS  
05 July 2019

IPFS is one of those curiosities that just won't leave me, and so from time to time I get possessed to tinker with it.
Usually I stop myself before actually getting my hands dirty, but this time things aligned too well. I have written my own blog software, and ever since I wrote it, I have had the idea that it could be used to render to more than HTML; in fact PDF was one of the targets I thought possible. While IPFS can be used as a webserver, I'm not happy with web technology, and part of my interest in IPFS is the possibility of creating replacements for the web. IPFS is perfect for sending a PDF, which is generated from my custom blog syntax, and doing so proves that we can do things outside the status quo.
I was very happy to find that my blog code was prepared enough for a different rendering target, which made it easy to get 80% of the work done once I had figured out the PDF library I'm using, http://libharu.org .
I'm quite passionate about the idea of decentralisation, and IPFS gets me excited for this reason, but I do struggle to find serious uses for it. I guess being able to publish a blog can be quite enabling, even if it is just a pet project for me. This is of course still far from convenient enough though.
IPFS creates a hash for each file, and an updated file is considered a new file. There are two workarounds. One is IPNS, which is too slow to be usable. The second, which is the one I use, is called DNSLink, and it just involves adding a record to your domain. It is however not good enough, because I'll have to manually update that record every time I update my blog, or figure out some API from my domain provider. Also propagation takes forever.
Another potential option, currently only an idea at the back of my head, is using the IPFS API to create a service that listens on a hash and sends the updated blog, or something like that. This is a slight annoyance though, because it would be the first break from my "vision", which is an entirely static blog. Even though the blog itself is just a PDF, included in that vision is the idea that I don't write, and therefore don't have to maintain, any non-static code.

PDF also has limitations. It doesn't support videos, for example. But the alternatives I know of, web tech and LaTeX, are way too massive. I have been thinking about creating some simple blog reader software, and maybe this is inching me further towards actually doing it!

A note about writing PDFs with libharu: it doesn't use an XML tree or anything similar; instead you place text, graphics and so on exactly where you want them. I'm excited about how easy it was to work with, because I've been thinking about this as an alternative to the HTML way, and doing this made me a bit more confident in my assertion that HTML, and web technology in general, is completely bad.

I've decided to put the IPFS link at the top of this blog, so people can see it when they read the blog. The location is - /ipns/itjac.me/blog.pdf

[Link] tags:     C  code  
02 July 2019

Games have the wonderful property of communicating well visually, and so when I develop games I want to be able to share videos of my work anywhere.
This is the problem statement. The most obvious solution is to use some recording software, record your game and publish that, but this laptop of mine can't handle that. It turns out it's probably some bug in older Samsung SSDs, and so my disk write speed is abysmal.

Conveniently I program my own games, so a simple glReadPixels call gives me the final render result, and that's where I started. My first solution was simply to store these render results in memory, and then, when done recording, a thread would write them to disk, each one as a PNG. I would then use FFmpeg to create the video.
I used this system to put out two development updates, but the memory requirement was high, so I couldn't record many frames. This is when I got the idea to compress the video while it's in heap memory; writing it would then be faster because the data is smaller, and maybe the encoding could be done fast enough. I first found a single-header library called jo_mpeg, which creates MPEG2 files. Quality-wise the results were too bad for me, and the speed wasn't good either, at 20 FPS if I remember correctly. I then decided to try Theora from Xiph.org, because I hoped it had enough options for me to make it fast enough. I stuck with this problem of getting my frames through the Theora codec and into an Ogg container for about 10 hours straight. I had no experience with this at the start, and now I feel like I understand the fundamentals of creating video files.
The conversion into the YCbCr colorspace was what spun me the most, because I did not know if I had done that transformation correctly, which meant I couldn't verify whether my changes were a step in the correct direction until I got the entire pipeline correct. I tried to use ImageMagick to create a YCbCr image, so that I could at least trust that I had the correct input, but it turned out I had done that incorrectly too. So while I was going through my code, trying to figure things out and getting nowhere, somewhere along the way I got the conversion right, but now my input was incorrect, so I couldn't know. Eventually this was discovered, and then I could learn that I couldn't make Theora fast enough anyway.
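For anyone stuck where I was, the full-range BT.601 RGB to YCbCr transform can be sketched in C like this. This is just the math, not my actual code, and note the assumption: real encoders, Theora included, usually expect the narrower "video range", so the exact scale and offset have to be checked against the codec's documentation. Clamping is also omitted for brevity.

```c
#include <stdint.h>

// Full-range BT.601 RGB -> YCbCr, one pixel at a time.
// Assumption: full-range coefficients; a real Theora path would use the
// video-range variant. No clamping is done here.
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr) {
    *y  = (uint8_t)( 0.299    * r + 0.587    * g + 0.114    * b);
    *cb = (uint8_t)(-0.168736 * r - 0.331264 * g + 0.5      * b + 128);
    *cr = (uint8_t)( 0.5      * r - 0.418688 * g - 0.081312 * b + 128);
}
```

Having a known-good reference like this, even a slow scalar one, is exactly what I was missing when I couldn't tell whether my input or my pipeline was wrong.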

I went to bed, and the morning after I went straight to work again, because I had gotten an idea just as I was heading to bed: my laptop can't handle video data, but it can store all user input and recreate a play session. I wrote the code to store all user input immediately, which was something I intended to do anyway, because replays are very useful. After that I added a "replay" mode, which shows the last play you made. Then I wrote a YCbCr transformation in a fragment shader, which the game uses if it is in recording mode. I then wrote, again, some Theora + Ogg code, which I had previously deleted, and BAM, it all came together. Now when I pass "record-replay" to my game, it plays the last play, but uses the YCbCr fragment shader, and then uses glReadPixels to read each frame, which it encodes and writes to disk. This process is way too slow to keep an acceptable framerate, but I don't need that when I'm playing a replay!
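In sketch form, the record/replay idea is just an array of frame-stamped events. The struct and function names below are mine, not my actual code; the point is only that storing input is tiny compared to storing video, and replaying is a lookup per frame.

```c
#include <stddef.h>

// Every input event is stored with the frame it happened on; in replay
// mode the same events are fed back to the game logic on the same frames.
// Names and capacities here are illustrative assumptions.
struct recorded_input {
    unsigned frame;
    int key;     // which key
    int pressed; // 1 = down, 0 = up
};

#define MAX_EVENTS 4096
static struct recorded_input events[MAX_EVENTS];
static size_t event_count;

static void record_event(unsigned frame, int key, int pressed) {
    if (event_count < MAX_EVENTS)
        events[event_count++] = (struct recorded_input){frame, key, pressed};
}

// Copies the events belonging to this frame into out; returns how many.
static size_t replay_events(unsigned frame, struct recorded_input *out,
                            size_t max) {
    size_t n = 0;
    for (size_t i = 0; i < event_count && n < max; i++)
        if (events[i].frame == frame)
            out[n++] = events[i];
    return n;
}
```

For this to recreate a play session exactly, the game logic has to be deterministic given the same inputs, which a fixed time step helps guarantee.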

In conclusion, this was a trip for me, and I'm noticing how I fail to put it down in words in a way I feel satisfied with. But I can only get better at that, and here's a video showing the result:

[Link] tags:     C  code  
Date forever lost

I feel like writing, so I'm just going to write about some C enum antics.
Here's a segfault bug I recently encountered,
enum e_types {
    TYPE_ONE,
    TYPE_TWO,
    TYPE_O_NEGATIVE,
    TYPE_END
};
char *e_types_names[TYPE_END] = {
    "TYPE_ONE",
    "TYPE_TWO"
    "TYPE_O_NEGATIVE",
};
The code is valid C and compiles, but notice that in "e_types_names", "TYPE_TWO" doesn't end with a comma. C concatenates adjacent string literals, so it considers this an array of two elements, "TYPE_ONE" and "TYPE_TWOTYPE_O_NEGATIVE", which leaves the third slot a null pointer; hence the segfault. My compiler didn't warn me that my explicit array size wasn't matched by the initializer, but it probably would with the correct warning flags. I just can't deal with "const correctness" warnings, and I don't want to figure out the correct combination of flags that suits me.
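Not from the original code, but for what it's worth: designated initializers sidestep this whole class of bug, because each string is tied to its enum value, and a forgotten comma becomes a syntax error instead of silent concatenation.

```c
enum e_types { TYPE_ONE, TYPE_TWO, TYPE_O_NEGATIVE, TYPE_END };

// With designated initializers, "TYPE_TWO" [TYPE_O_NEGATIVE] = ... would
// fail to compile, and reordering the enum can't desynchronise the names.
static char *e_types_names[TYPE_END] = {
    [TYPE_ONE]        = "TYPE_ONE",
    [TYPE_TWO]        = "TYPE_TWO",
    [TYPE_O_NEGATIVE] = "TYPE_O_NEGATIVE",
};
```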

To add some ethically produced meat onto this post, I want to write about a relevant idea of mine.
enum e_types {
    TYPE_ONE = (int)"TYPE_ONE",
    TYPE_TWO = (int)"TYPE_TWO",
    TYPE_O_NEGATIVE = (int)"TYPE_O_NEGATIVE"
};
#define NAME_OF_ENUM(e) (char *)(e)
In practice the compiler complains that those are not integer constants. There might be a way to make it work, but I'm currently at a place without internet, so I'll just assume it doesn't, but I will argue that the idea behind it is sound. Each of the strings will be stored in the data section of the binary, where it will have a constant and unique address, which means it qualifies for the function of an enum.
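A variant of the idea that does compile, though it gives up the enum keyword (and with it, use in switch statements), is to use the addresses directly. This is my sketch, under the assumption that each named array gets its own unique address in the data section:

```c
// Each "value" is the address of its own name string; getting the name
// back is just a cast. Names here are illustrative.
typedef const char *e_type;

static const char TYPE_ONE_NAME[] = "TYPE_ONE";
static const char TYPE_TWO_NAME[] = "TYPE_TWO";

#define TYPE_ONE ((e_type)TYPE_ONE_NAME)
#define TYPE_TWO ((e_type)TYPE_TWO_NAME)
#define NAME_OF_ENUM(e) ((const char *)(e))
```

The trade-off is that these are pointers, not integer constant expressions, so they can't be switch cases or array sizes.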

EDIT (02 July 2019): Removed "TYPE_END" from very last enum example, because it was confusing.

[Link] tags:     C  response  gamejam  
08 June 2019

Introduction and the entity system

This post was prompted by https://samsai.online/post/linux-game-jam-2019-experiences/ where Samsai, from the https://gamingonlinux.com community and Twitch, https://twitch.tv/thesamsai writes about their experiences developing "Cursed Pl@tformer", https://samsai.itch.io/cursed-platformer for the Linux Game Jam 2019 event. I do love decentralised content, and I feel compelled to engage in decentralised engagement - by writing a response.
In said post, Samsai details the entity system they developed for this game, and the result is a very object-oriented design, where function pointers are explicit, rather than implicit in the vtable, because of their use of C, which does not come with vtable support. I do not want to discourage that method, and I do not claim that what I write about next is a better way, I just want to talk about a different way of approaching the same problem.
In this case, the function pointer points to the logic update function of the entity. This way, all entities of a game level can be iterated through, and for each, their respective update function can be called. In their code, they call this function the "processFunc" and I think now is a good time to actually show said entity code,
struct Entity {
    struct Level* level;
    int width;
    char hidden;
    char disabled;
    char sprite;
    char state;
    double x, y, x_vel, y_vel;
    void* internal_data;
    void (*processFunc)(struct Entity* entity, double delta);
};
It is quite the marvel.
Let's talk about the different method. Replace the "processFunc" member with an enum member, whose enumerations are the possible logic processes the game knows, for example LOGIC_NONE, LOGIC_PLATFORM, LOGIC_BAT, LOGIC_INFO, and so on. Then on level initialisation, an array of pointers to entities would be allocated for each possible enumeration, and each entity would be added to the array matching the logic enumeration it has assigned to itself. Now on each game loop iteration, the "bat" logic function processes every bat in the level, the "platform" logic function processes every platform, and so on. In such a system, an entity can belong to multiple logic arrays; for example a bat could kill you but also give you a message.
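A minimal sketch of what I mean, with names of my own invention rather than anything from Cursed Pl@tformer:

```c
#include <stddef.h>

// Each entity declares which logic it wants; the level keeps one array of
// entity pointers per logic kind, filled once at level initialisation.
// Fixed capacities are an assumption for the sketch.
enum logic_type { LOGIC_NONE, LOGIC_PLATFORM, LOGIC_BAT, LOGIC_INFO, LOGIC_COUNT };

struct entity {
    double x, y;
    enum logic_type logic;
};

#define MAX_PER_LOGIC 256

struct level {
    struct entity *by_logic[LOGIC_COUNT][MAX_PER_LOGIC];
    size_t counts[LOGIC_COUNT];
};

static void level_register(struct level *lvl, struct entity *e) {
    size_t *n = &lvl->counts[e->logic];
    if (*n < MAX_PER_LOGIC)
        lvl->by_logic[e->logic][(*n)++] = e;
}

// One logic function then runs over its whole array per frame:
static void bat_logic(struct entity *e, double delta) { e->x += 10.0 * delta; }

static void run_bats(struct level *lvl, double delta) {
    for (size_t i = 0; i < lvl->counts[LOGIC_BAT]; i++)
        bat_logic(lvl->by_logic[LOGIC_BAT][i], delta);
}
```

An entity that wants to belong to two logic kinds would simply be registered into both arrays.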
Just for some C education, let's talk about the "internal_data" member. While it is a perfectly valid way to solve the problem, I want to mention "unions" (no, not just because I'm a leftist). Unions in C allow you to define memory for several different types, and the actual memory allocated will be equal to the largest member of the union. For non-pointer members of the union, you get the necessary memory allocated already when the "entity" struct is allocated, skipping one pointer indirection. This is not just a performance consideration, but pointers also complicate the mental models programmers have to keep when writing code. Also unions provide type checking. In this code, the "internal_data" member, as a union, could look something like,
union {
    char *message;
    struct Entity *block; // Block to remove when the button is pressed
    struct {
        // p0 is the left-most position it travels to, and p1 the right-most
        int p0, p1;
        // This would be for controlling the speed of the platform
        float last_move_time, delta_move_time;
    } platform_info;
    // and so on...
} internal_data;
Either method is fine, and up to those who intend to finish the product. I will note here that in the actual code from Gitlab, dated 07/June/2019, the "internal_data" member is gone.

So why is this even interesting enough to write about? I will ignore the possible argument from performance, because I do not have the energy to do an actual analysis, and most importantly, it is benign at this level of game development. The prevailing narrative is that object-oriented code design is good; I spent a lot of years in that mindset, and when I became convinced that it wasn't good for me, I found myself completely lost on how not to code that way. This alternative method I'm explaining is intuitive to me now, but it was not at all back then. And I would like to point out that the code we are discussing is the result of someone moving from object-oriented programming languages to C, and the result is a very object-oriented approach. All frames of mind are difficult to escape.

Memory management

Samsai writes about their issues with a memory bug that was difficult to debug. This is actually an instance of an infamous kind of bug that is debated amongst C programmers: whether data should always be initialised to 0 or not. I'm ambivalent on the subject. I encountered these issues often enough at the start, but I have learned since and I don't get them anymore, and if the language assumes it should set the memory to 0, then extra instructions have to be spent doing so. A language could assume it should always initialise to 0, but let programmers tell the compiler not to do it in a given scope. These changes are never going to happen for C, but it is a worthwhile thought for new languages.
Either way, this wouldn't have been an issue with the memory management code I'm about to explain. This is not to say that there would be no issues.
The idea is that when the game is executed, we allocate, once, enough memory from the OS for the entire runtime of the game. The way this usually works in practice is that we allocate more than enough, and on PC grow it if we notice we need more. Then when we release the game, we analyse how much memory our game actually uses, and set the allocation to fit that. On consoles you would know how much memory is available, so you would allocate that amount, or a bit less, to have some to spare in case of emergency. Luckily, it turns out that most games, at least at the scale I've been experimenting with, only require a simple and very fast linear memory manager. You start by defining what we will call a memory arena,
#include <stdint.h>

struct memory_arena {
    uint8_t *base;
    uintmax_t used, max, watermark;
};
Then we create a "memory_arena" for all the memory of the game,
#include <stdlib.h>
#include <string.h>

#define KB(x) ((x) * 1024)
#define MB(x) (KB(x) * 1024)

struct memory_arena all_memory;

int main(int argc, char *argv[]) {
    all_memory.max = MB(10);
    all_memory.base = (uint8_t *)malloc(all_memory.max);
    all_memory.used = all_memory.watermark = 0;
    memset(all_memory.base, 0, all_memory.max); // All the memory is now zeroed
    // ... the rest of the game
}

Now using this "memory_arena" as linear memory is easy,
uint8_t *memory_arena_use(struct memory_arena *mem, uintmax_t amount) {
    uint8_t *rv = mem->base + mem->used;
    if (mem->max < mem->used + amount) {
        // Do error handling
        return NULL;
    }
    mem->used += amount;
    if (mem->watermark < mem->used)
        mem->watermark = mem->used; // track the peak, so resets don't lose it
    return rv;
}
Due to how I code my games, I usually make two child memory arenas from the "all memory" one: one for the "engine" and one for the "game". But the details of why are too long a tangent, so I'll instead explain another cool option that this method makes possible. Make two child memory arenas from the "all memory" one, one called "persistent" memory and one "transient". The difference is that the persistent one is used for data that should persist across frames, like the entities of a level, while the transient/temporary one is scrapped at the start of each frame, which with a linear memory manager just means setting the "used" member to zero. This is also where the "watermark" member comes in: it records the maximum amount of memory that has been used, which lets you make decisions about how much to allocate. You can then use the transient arena for things like string manipulation before printing, without worrying about how much memory use is accumulating. Or for parsing the map files, because the textual data is only needed temporarily; it gets translated into other data.
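A sketch of what the child arenas and the per-frame reset could look like. The helper names are mine, and the arena code from above is restated so the snippet stands alone:

```c
#include <stdint.h>
#include <stdlib.h>

struct memory_arena {
    uint8_t *base;
    uintmax_t used, max, watermark;
};

uint8_t *memory_arena_use(struct memory_arena *mem, uintmax_t amount) {
    uint8_t *rv = mem->base + mem->used;
    if (mem->max < mem->used + amount)
        return NULL; // error handling goes here
    mem->used += amount;
    if (mem->watermark < mem->used)
        mem->watermark = mem->used;
    return rv;
}

// Carve a child arena out of a parent; the child's memory lives inside the
// parent's allocation, so nothing here ever calls malloc again.
struct memory_arena memory_arena_child(struct memory_arena *parent,
                                       uintmax_t size) {
    struct memory_arena child = {0};
    child.base = memory_arena_use(parent, size);
    child.max = size;
    return child;
}

// Scrapping the transient arena at the start of a frame is just this; the
// watermark is left alone so it still records the peak for later analysis.
void memory_arena_reset(struct memory_arena *mem) {
    mem->used = 0;
}
```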
Another benefit of using a method like this is that you can now easily give accurate information on how much RAM your game requires. You can also assume that your memory is initialised to zero, because we do that manually, which relates back to the bug they were having. I do want to note that there's some weirdness about how the Linux kernel doesn't actually commit all the memory you ask for until you touch it, but I don't know the details well enough.

Game timing

I just want to make a quick mention of timing in games. I'll make it quick, because it does require some criticism of the code in Cursed Pl@tformer, and I feel especially uncomfortable with criticising code in a game jam project.
The timing code in Cursed Pl@tformer makes no sense to me. The game loop first runs code that does not depend on time, it then sleeps for however much the difference is between delta and 1 / 60, and after this it runs code that depends on time. Delta is calculated from (last - now) just before the sleep, but "last" is calculated from just before the game loop on the first frame, and then from just after the sleep on consecutive frames. When delta is less than 1/60, the code sleeps and then adds that remainder to delta. I do not think this frame timing was thought about deeply, which is fine, but I'll argue for how I do frame timing.
First, "variable time step" makes no sense to me. When I first started learning game development, a form of it that I was taught is to just use the delta time from the previous frame. I think I have since encountered variations of it, where there were accumulators involved, and the math behind this might be more correct than I know, but I doubt it. It is clear that with such a time step, you are moving things an incorrect amount if the current frame doesn't take the exact same amount of time as the previous one, which probably never happens. With a fixed time step however, the amount of time the logic takes does not matter, as long as it isn't longer than the expected frame time, because the rest of the time the process sleeps or waits on vsync. One thing that worries me, if I am correct on this, is that there are AAA titles that run smoother without vsync, which means they either use a variable time step, or a fixed time step with a smaller delta time than the refresh rate. If it is in fact a variable time step, then I don't know what to make of the world.
The way I reason about frame timing is,
int target_frame_time = 16; // This pseudo code assumes these timings are in ms
// I'm actually not sure if this would be more correct with 16.0f / 1000.0f. Need to think about this more
float delta_time = 1.0f / 60.0f;
while (!quit) { // Game loop
    int start_time = some_time_function();
    // process input, update the game using delta_time, render
    int end_time = some_time_function();
    int frame_time = end_time - start_time;
    int difference = target_frame_time - frame_time;
    if (difference < 0) {
        // This means that the frame took longer than 16 ms,
        // so we can decide on how to handle this.
        // Maybe switch down to 30 fps
    } else if (difference > 0) {
        some_sleep_function(difference);
    }
}
The "frame time" is actually how much time it took to process everything this frame, then we sleep until we match the target frame time.


Before I even started putting finger to key, I knew I wanted to come packing with some creative positivity, and use the game tech they made to make something of my own. The result is a non-challenging list of maps that can be entered, in my version of the game, by jumping onto the house in the very first level. I chose to make them a walk in the park, because I am currently at a place in life where platformer challenges are far from my preferred diet, which is why I never got past the first bat in the original game.
Using their tech was a lot of fun! I got nostalgic feelings from when I wrote Lua for Garry's Mod, or otherwise dug into other people's games and learned their innards. The simplicity, due to the constraints, made it effortless to get started, and thanks to the nature of ASCII art I could just draw my maps. When I finished up the last one, I went back through the earlier ones, because I had gotten a bit better at it. I am completely convinced at this point that a limited number of options is what emboldens my creative output at my current skill level, and I intend to figure out how to utilize that more.
I also wanted to do some programming with their tech, and ended up adding two new colors and a new entity type. I won't spoil them here, but they are all used in my maps. I also did fix up the frame timing stuff, to a similar result as described above.
My maps are amateurish, but amateurish creativity is a joy; don't let the system tell you to buy yourself happiness instead.

The original project is on Gitlab, so I registered an account and created a fork there, which can be found here - https://gitlab.com/Smilex1/linux-game-jam-2019-game

[Link] tags:     C  code  Linux  convenience  design  
21 June 2018

I've been using Linux for over a decade now, and I find myself always with several terminals open, each with something running in them that I cycle through.
A problem with desktop environments that have a window bar and support alt+tab, is that each terminal has the same icon, and so I have to cycle through all of them, until I get to the correct one.
I've tried solving this by using some terminal software that supports custom background colors, but XFCE changes the window screenshot, in the Alt+Tab window, to grayscale. Also this doesn't help with the XFCE panel.
I had the idea that these should all have distinct icons, with different silhouettes, so that they are easy and quick to recognise. But I figured this would be a good feature for a desktop environment, and that meant a lot of work.
That was until I came across a tool made by Paul Evans ( http://www.leonerd.org.uk ), called xseticon ( http://www.leonerd.org.uk/code/xseticon/ ). The X protocol allows you to change the icon of other X windows, and that is what xseticon does. So all I had to do was write a program that spawns a new terminal, then looks up the X window that was created for it. This was done by comparing the PID of the window, which I got from the _NET_WM_PID atom, with the PID I got from the fork() call. Once I had the window ID, I found the PNG icon I wanted to use for this window, and then I called execv to xseticon, with the window ID and the PNG path. The way I select PNGs is simply with a global array in the code; I create a file in $HOME/.icongiver that stores the previous iterator value, and the program iterates through the array, modulo the array size.
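The icon-cycling part can be sketched like this; the icon names and the counter path here are placeholders, not the real ones from my program:

```c
#include <stdio.h>
#include <string.h>

// A global array of icon paths and a counter file that persists the last
// index, cycled modulo the array size. Paths are illustrative.
static const char *icons[] = { "red.png", "green.png", "blue.png" };
#define NUM_ICONS (sizeof(icons) / sizeof(icons[0]))

const char *next_icon(const char *counter_path) {
    unsigned idx = 0;
    FILE *f = fopen(counter_path, "r");
    if (f) {
        if (fscanf(f, "%u", &idx) != 1)
            idx = 0;
        fclose(f);
    }
    idx %= NUM_ICONS; // a missing or corrupt file just restarts the cycle
    f = fopen(counter_path, "w");
    if (f) {
        fprintf(f, "%u", (idx + 1) % NUM_ICONS);
        fclose(f);
    }
    return icons[idx];
}
```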
One weird issue that I never figured out is that when the program was launched from the XFCE Application Finder, it wouldn't give the terminal an icon. I solved this by creating a script, with a /bin/bash shebang, that only calls my program. I then copied this script into /usr/local/bin.

I made these icons for myself to use. I'm not trained at this stuff, but it is an enjoyable activity. Mostly I just used the circle select tool in The GIMP, and the bucket tool to fill in color.

This is how it looks on the XFCE panel. The top most icon is what the default is. It's for a terminal that's been running since before I started this project.

And here it is in the Alt+Tab window,

This does not solve the problem entirely, but it does help. And for 2 hours of programming, I expect the effort to have been worth it.
It is only one file of C code, with system dependent variables being #define'd at the top.

Now I want to write about licensing. Of course this program is so simple that any license I put on it is merely symbolic, but I decided to go with GPLv3. And I'm worried about where the community is going with MIT licenses and the like. Worried, though not certain, that it is a very bad direction, one which only propels the giants and furthers consolidation.
It is hard to imagine that the GPL wasn't a big part of why the FOSS community exists today. And most tech companies who have adopted open source have done so because they had to. This implies that open source has beaten proprietary software in at least some fields. What does it then mean to MIT license your software? To my mind, it means that corporations can freely kick and scream about change, and delay it as much as possible, but then don't have to pay anything when they lose. Because the software was MIT licensed, when a corporation decides that it has lost, it just jumps over at no cost and fights again.
I also recognise that the MIT license, and licenses similar to it, can benefit small business because the GPL gets in the way. But if my worry is founded on something, then I don't think the solution is to move to a MIT license. The solution might instead be to have multiple licenses, for different team sizes. However I also see that it isn't that simple, because a large corporation can create a smaller team that gets the MIT licensed version of a FOSS project, but then gives that back to the large corporation.
I realize that I'm hardly adding much to the conversation, but I like to use writing to learn what I think. And I do not have the answer, but I'm also not content with what is currently happening, and so I'd like to link to an article I read that touches on the subject - https://jacobinmag.com/2018/06/github-microsoft-open-source-code-technology

The source code and the icon PNGs can be downloaded here - https://itjac.me/releases/icongiver.tar.bz2

[Link] tags:     code  NASM  assembly  breakout  
05 June 2018

Progress is a bit slow because I'm coding in assembly, which I'm not experienced with, but doubly so because it is exam season.
However, I have now managed to draw image files, flat color rectangles and the game supports input now, so I can move the player from one side to the other, which is pretty much the entirety of user input in Breakout.
For user input, I'm reading the correct /dev/input device, using the poll syscall to check if data is available, and then just the read syscall to read it if so. The Linux input API is decently simple, as it is pretty much only one data structure per event.
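The poll-then-read part can be sketched like this, using the real struct input_event from <linux/input.h>. This is Linux-only, and the function name and timeout handling are my own, not from my game:

```c
#include <linux/input.h>
#include <poll.h>
#include <unistd.h>

// Wait up to timeout_ms for data on the input fd; if an event is ready,
// read exactly one struct input_event into *ev and return 1, else 0.
int read_input_event(int fd, struct input_event *ev, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN))
        return 0;
    return read(fd, ev, sizeof(*ev)) == (ssize_t)sizeof(*ev);
}
```

In the game loop, this gets called with a timeout of 0 so it never blocks the frame.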

In the image, the entire screen is first filled with an image I drew, and then that red box is the player pad. I do not know exactly what the artifacting along the lines in the image is. My guess is it has to do with the blending. I intend to figure that out at a later point.

The movement is currently too sharp, and I think that's because I do not do any blending at all, so the player pad jumps from line to line. I'm quite excited to test out different blending techniques, and seeing the result. I do not know anything about blending, but I want to try on my own first, and my initial guess is that it's a pretty simple case of converting a continuous variable to a discrete one, with steps.

[Link] tags:     C  code  breakout  blog  
02 June 2018

I have managed to get my breakout clone to draw images into a window, and so I would like to post pictures alongside my blog posts from now on. This meant adding features to my blog software.
There were two major features that I had been excited about implementing. One is images, and the other is tags. I implemented both.

When I was thinking about how to implement image support, I had the idea that I wanted my entire blog to be just one file, and therefore one connection, instead of several connections, one for each image plus the blog. I'm not an expert in networking, but I do know about the caravan concept, and I've read some about how connections take time to set up. I also decided on this approach because I've unfortunately had some web development experience, and knew that it was possible to embed image data directly into the img tag, using base64 encoding.
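Base64 itself is small enough to sketch; a minimal encoder like this (my sketch, not my blog_composer code) produces the text that goes inside a data URI in the img tag:

```c
#include <stddef.h>

// Minimal standard base64 encoder. The out buffer must hold at least
// 4 * ((len + 2) / 3) + 1 bytes; the result is NUL-terminated.
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

void base64_encode(const unsigned char *in, size_t len, char *out) {
    size_t i, o = 0;
    for (i = 0; i + 2 < len; i += 3) {
        unsigned v = (unsigned)(in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = b64_alphabet[(v >> 6) & 63];
        out[o++] = b64_alphabet[v & 63];
    }
    if (len - i == 1) { // one trailing byte: two chars plus "=="
        unsigned v = (unsigned)(in[i] << 16);
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = '=';
        out[o++] = '=';
    } else if (len - i == 2) { // two trailing bytes: three chars plus "="
        unsigned v = (unsigned)(in[i] << 16) | (in[i + 1] << 8);
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = b64_alphabet[(v >> 6) & 63];
        out[o++] = '=';
    }
    out[o] = '\0';
}
```

The encoded PNG then gets wrapped in src="data:image/png;base64,..." by the HTML writer.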
What I would have preferred to code, is a large image atlas, with all of the images included in my blog, and then the option to set image coordinates with the img tag. I researched a little, and it seems that it requires javascript, so that option is out.
Once I finished implementing the base64 embed, I decided to test it against just linking to a PNG file, and it turns out that there was no speed difference. I'm not used to analysing network speeds, and I just used my browser's dev tools, but it looks like the browser just waits a bunch of milliseconds to draw the embedded base64 image. The way I read the "waterfall" is that downloading the HTML data takes about the same time, but then one version takes a decent amount of time to connect and download the PNG, while the embedded version just waits for the same amount of time as the connection would have taken.
This is disappointing because with the embedded version, the source of my website becomes much worse to read and people can't download my images, unless they copy them, and paste them into some software like The GIMP.
I have to conclude that I had a bad idea, but I'm going to keep the embedded version, because then the code gets some use and I don't want to make this project a chore. So far it has been a wonderful project that brings result, with almost no effort.

My implementation of tags went very well. I got the idea for tags, during my last development session of the blogging software, and in the beginning I was thinking about how I could do it with some server side code, but I really hated that idea, then I thought about how I could use Javascript, but that also made me sad. I'm very happy that I didn't just keep my head down and work at that time, because a few days later the simplicity of this issue became clear to me. The filesystem is perfect for this job. So now I still only have my blogging software, which generates everything in one run, and then a server that just sends static webpages.
It became clear to me that instead of any complexity, I could just create a file for every tag page. So now I have a tags subdirectory, with a file for each tag, and each file contains the entire HTML necessary to work correctly. This means that everything gets copied however many times is necessary, but I consider it worth the price, when considering the savings on complexity.
It also gave me an exercise in operating on data. I decided that when each tag is parsed, a linked list of tags gets searched; on a hit, a pointer to the blog entry is added to that node, otherwise a new node gets added. Then when I generate the tag files, I go through the list, create a file based on the name, create a new AST for each entry on that tag node, and then I run the exact same AST writing code that creates the full blog on that file.
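In sketch form, that linked list could look like this. The struct layout and fixed capacities are my assumptions, not the actual blog_composer code:

```c
#include <stdlib.h>
#include <string.h>

// Each node holds one tag and the entries carrying it; lookup walks the
// list and appends on a hit, otherwise a new node is pushed to the front.
struct blog_entry; // the parsed entry; its details don't matter here

struct tag_node {
    char name[64];
    struct blog_entry *entries[128];
    size_t num_entries;
    struct tag_node *next;
};

struct tag_node *tag_add(struct tag_node **head, const char *tag,
                         struct blog_entry *entry) {
    for (struct tag_node *n = *head; n; n = n->next) {
        if (strcmp(n->name, tag) == 0) {
            if (n->num_entries < 128)
                n->entries[n->num_entries++] = entry;
            return n;
        }
    }
    struct tag_node *n = calloc(1, sizeof(*n));
    strncpy(n->name, tag, sizeof(n->name) - 1);
    n->entries[n->num_entries++] = entry;
    n->next = *head;
    *head = n;
    return n;
}
```

Generating the tag pages is then just one walk over the list, running the AST writer per node.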

One other major change I made is to make my parser two step. I do not know if I will keep it this way, but I've really wanted to make a two step parser for some time, and so I just decided to make that an excuse for this one.

[Link] tags:     code  NASM  assembly  breakout  
Date lost forever

I have been writing this NASM Breakout clone to draw to fbdev on Linux. This means that in order to run it, I have to jump over to another TTY than the one I'm running X on, and run my application from there. I'm doing it this way, because at first I wanted to write a project that draws to fbdev in C, but then it made sense to also learn assembly, because that's something I've been wanting to do for some time.
I want to make a game that draws to fbdev for many reasons that I don't feel like explaining. It just feels depressing to have to explain myself all the time.

However at this point it's getting in the way that I have to switch TTYs. For example, it's getting in the way of debugging. So my next idea is to write a program in C that draws a memory mapped file onto a window, with help from SDL. I could also call the assembly code from C, or I could call C code from assembly. I'm not certain I'm doing the best thing, but drawing memory is easy.

And it turned out to be pretty easy. I had a wrong assumption that mmap would resize the file for me, but after learning that it didn't, it was smooth sailing, and now my SDL program updates at vsync, which means that any draws I make from my real program, I can see in the SDL window.
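The one gotcha I hit, mmap not resizing the file, can be shown in a short sketch (the function name is mine):

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

// Create (or open) a shared framebuffer file of the given size and map it.
// mmap does not grow the file for you, which is why the ftruncate call is
// needed; without it, writes through the mapping can fault with SIGBUS.
uint8_t *map_shared_framebuffer(const char *path, size_t size) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    uint8_t *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after closing the fd
    return mem == MAP_FAILED ? NULL : mem;
}
```

Both the game and the SDL viewer map the same file; writes from one show up in the other.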

With this working I wrote my first drawing code. Previously I had just created image files that fit my window size, and done a memcpy from the image data to my frontbuffer. Now I have a render_color_rect function that takes as input xmm0, esi and rdx. xmm0 is rectangle data, with each of the four components being a doubleword in size. esi is color data, with each of the four components being a byte in size, and rdx is a pointer to the memory we write to.
I'm writing this according to the System V Application Binary Interface calling convention, but I wasn't sure how to deal with SSE registers, so I guessed that the second register argument should still be rsi. (As it turns out, the ABI classifies integer and SSE arguments separately, so a conforming signature would take the color in edi and the pointer in rsi.)

My drawing code just uses rep stos, because I'm not yet sure how to deal with the fact that SIMD code would handle 4 colors at a time.

[Link] tags:     C  code  design  breakout  blog  
8 May 2018

When I started writing this blog, I immediately wrote a program in C that I call "blog_composer", which in the beginning just iterated through a directory, non-recursively, and combined each file into one HTML file. I copied a quicksort algorithm from Wikipedia so that the files get ordered before I read from them, meaning that the file whose name has the smallest value is the first blog post. Then I just named my files blog001, blog002, etc.

It turns out this was not sufficient, because HTML does not render the newline character (\n) as a new line, and so I had to do some parsing. I decided to do the easiest thing and just replace the \n character with the br tag. But the br tag is more characters than \n, and my code just read the file into a buffer. I then decided to put the work in and write a tokenizer and an AST.

Fortunately I did it in a short day, because I've done this kind of work before. My code now sets a start position, then reads until EOF or until it finds \n. It then sets that as an end position, creates a "token", which is just a range in the buffer, and adds the token to an array, plus one token for the \n if it wasn't EOF. It then sets the start position again, and repeats.

Once the entire file is tokenized, I read the tokens and generate nodes for the AST from that.

This worked, and I felt I could write for some time now at least, without it bothering me. But I had URLs in my first post, and it was disappointing that those weren't clickable, so I couldn't stop thinking about improving it immediately after.

A day or two later I'm again at the code, and now I've decided to properly tokenize it. Every word is a token, and they get added to an array. Tokens are separated by whitespace, and then I have a special test for '\n' that adds an extra token for the newline after adding the word. I was a bit worried that the code would get very complicated, because I don't want every token to be an AST node; I instead want one node for a sequence of text, then one node for a new line, then a text node again, etc. Surprisingly, the code for this turned out much more elegant than I had anticipated. I don't know how to describe it without going into detail about the code, which I do not want to do, but what I realized was that a node only has to be added if the token is a known token, e.g. new line, or if it is not and we are not currently reading text.

With this code working, I had an easy time adding URL support. I did not want to have to write a URL tag every time I wrote a URL; instead I wanted the parser to just match "http" and then recognise that the rest of the token is a URL. This works for both http and https links, but it also meant that I had to have specific code for URLs, which wasn't too bad.

At this point I had no styling information on my blog, and I didn't want any, due to load times. However I came upon this - http://bettermotherfuckingwebsite.com/ - and despite the fact that I consider that website very ugly, which makes their arrogance look laughable, I realized that I'm an outlier and that some of the things written there were true. I know this because I've seen the reactions and heard the opinions of people around me. Personally I really like https://danluu.com and http://nothings.org
One thing from the "better website" website that I consider a good argument is that people are less inclined to read something if each sentence is as wide as your screen. It feels like you are taking on an arduous task.

However, as I said, I consider their example ugly, and that's mostly because of how little of the width they use. So I decided instead to use something I learned a little about quite some time ago: the golden ratio. The golden ratio is supposedly the ratio something has to have, compared to the full size, for humans to enjoy looking at it. More importantly it gave me a goal, and so I set out to calculate what it means for my blog. The definition of the golden ratio is (a + b)/a = a/b, and I know that (a + b) = 1, because that's 100%, which is the entire screen. So I solved 1/a = a/b by observing that b = a^2 solves it, and then I put a + a^2 = 1 into WolframAlpha and got the two roots. I'm only interested in the positive root, and it turns out it's 0.618... while the golden ratio is 1.618. This means that if (a + b) = 1, then a = (the golden ratio) - 1. Ok, it looks like I way overcomplicated things. But although it looks easy, I can't figure out the easier solution.
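Written out, the calculation goes like this (the only given is a + b = 1):

```latex
\frac{a+b}{a} = \frac{a}{b}, \qquad a + b = 1
\;\Rightarrow\; \frac{1}{a} = \frac{a}{b}
\;\Rightarrow\; b = a^2
\;\Rightarrow\; a^2 + a - 1 = 0
\;\Rightarrow\; a = \frac{\sqrt{5}-1}{2} \approx 0.618
```

The easier route, for what it's worth: the golden ratio φ satisfies φ² = φ + 1; dividing through by φ gives φ = 1 + 1/φ, so 1/φ = φ − 1 ≈ 0.618, which is exactly the a above.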

The conclusion was that I just went with width = 62%.

I decided to use all the other style information I got from the "better website" website, mostly because I don't want to do this stuff. But I can also see the argument for lighter font colors, because I have previously been convinced of the idea that you want to keep the baseline in a sort of medium, or moderate, position. This lets you make things louder or quieter.
I was convinced of this idea by people on the internet who mix music.

[Link] tags:     code  NASM  assembly  breakout  
7 May 2018

The next thing I need to do is convert 32 bit floating point RGBA data to byte-sized RGBA data. The process should be as simple as multiplying each 32 bit floating point element by 0xFF, converting to integers, and then shuffling the data such that I get packed 8bit RGBA data. With all I learned from converting Farbfeld 16bit RGBA to 32bit floating point, this should come much more easily.

My assumption turned out to be mostly true.

Because this conversion converts 32bit values down to 8bit values, I used 4 SIMD registers in parallel, and then OR'd them together in the end, to get one 128bit register with packed 8bit RGBA data.

I wrote a simpler version of my previous tool, to convert 8bit color data to PNG. This helped confirm the algorithm. I then focused on getting it to display correctly on the screen. I got incorrect colors when packing them as RGBA, so I wrote shuffle masks for RGBA, ARGB, BGRA and ABGR, and tested each one until I got the desired result. Turns out fbdev on Linux requires BGRA.

It was nice and reassuring that writing this code was much faster for me. It implies improvement.

[Link] tags:     code  NASM  assembly  breakout  
3 May 2018

So let's try this...

I've just read https://sites.google.com/site/steveyegge2/you-should-write-blogs about why everyone should blog, and it has given my curiosity a tool to beat my shyness. I'm also working on something interesting, and I found myself quite excited to write about it. I found the link via https://danluu.com

I've been trying to convert a Farbfeld image file, which is 16bit color information packed as RGBA, to 32bit floating point data in memory. The catch is that I'm doing it in NASM, to learn how to write and read assembly. It turns out that this was also the last straw to a very exciting improvement to my skills.

So my first approach was to write the entire conversion from 32bit floating point color to 8bit color in NASM, which I do not know how to code in. This resulted in me having two pieces of code I struggled to understand. The way I've been writing this NASM code is by asking in ##asm on freenode and reading the Intel documentation on instructions and architecture. So I get tidbits of information, write the code I think should work, and then try it until it does. However, this conversion code is more complicated, and my methods aren't very good.

I figured out immediately that I needed to isolate the first conversion code, and so my first improvement as a programmer presented itself: I wrote a tool in C that converts 32 bit floating point to bytes and outputs a PNG. This is an obvious thing, but it is one of the first times I've dropped my project to write a tool that helps me with the project.

So now I'm reading in the Farbfeld data, running it through my guess code, which is the code I guessed would work, and writing it to a file. I then put that file through my new tool, and see that nothing works. This was to be expected, but the way I try to fix it is by running GDB on my tool written in C, and printing the values I'm processing there while changing the NASM code. But I'm stuck.
Keep in mind that this takes days. I'm only working a few hours some days, because I am also a student.

Here I want to explain what I'm trying to do. Because Farbfeld is 16 bit RGBA data, and I need 32 bit RGBA floating point data, I'm trying to read the Farbfeld data into 128bit SIMD registers, expand the 16 bit data to 32 bit using PSHUFB, convert to floating point using CVTDQ2PS, and then bring the values into the [0;1] interval by dividing each 32 bit float by 0xFFFF, the max value for 16 bit. The follow-up is then that I can modify the image data, sRGB-correct it and finally multiply the values by 0xFF, which gives me 8bit RGBA data.

I do not know how to write NASM, and I do not know how to write SIMD code. So I have these 128bit registers I'm filling with data, using MOVAPS, and I'm trying to write a divisor that will divide each 32 bit element correctly. I'm also trying to shuffle the 16 bit elements correctly, so that they become 32 bit integers with the same value, and I'm trying to convert this to floating point. At this point I don't know if I'm doing any of these steps correctly.

So to simplify my issue, I first simplify my input. Instead of the actual Farbfeld image, I'm now just writing a 128bit constant, and I learned from ##asm that 0x0F0E0D0C0B0A09080706050403020100 is the identity for the PSHUFB instruction, which makes sense. However I'm still getting rubbish output, and my process requires that I convert to 32 bit floating point, because that's what my other tool converts to 8bit.

I've only ever looked at data as a hex dump a few times, and each of those times it's just been something I opened up in Vim, put into hex mode with :%!xxd, and instantly I could see whether the data made sense or not. This time is different. This time it doesn't open in Vim, because the file is too large.

To solve this I found wxHexEditor via googling. I was skeptical about it at first, but it read my file instantly and even rereads the file whenever my program writes new data to it. I can say I was pleasantly surprised, and it can be found here - https://www.wxhexeditor.org/

So now that I'm able to read the data directly, I can eliminate most parts of the code and only have the read and write left. This means that I should have a repeating pattern of the constant I'm reading from, and that is what I get. So MOVAPS and MOVDQA are working correctly.

Next I need to learn about the shuffle. Turning the shuffle code on and off gives me the same result, which means that the identity mask is correct. However, at this point I'm the bearer of two incorrect assumptions. So, in my .data section I have these constants,

g_move_16_to_32_shift_mask_1: dq 0x0F0E0D0C0B0A0908,0x0706050403020100
g_move_16_to_32_shift_mask_2: dq 0x0F0E0D0C0B0A0908,0x0706050403020100

I need two masks because, when converting from 16bit to 32bit, I'm doubling the amount of data. So I read the same 16 bit data into two XMM registers, and I shuffle one of the registers with the first mask, and the other with the second. This way, one of the registers ends up with the first half of the data, and the other register with the second half.

The PSHUFB instruction takes two operands, where the second operand is the mask. It shuffles each byte in the first operand according to the mask, which works as follows. Each byte of the mask determines what data will be put into the byte at the same position in the first operand. If the most significant bit of a mask byte is 1, then that byte in the first operand is set to zero. Otherwise, the least significant 4 bits of the mask byte determine which source byte is copied into the position of the current byte. Notice that 4 bits max out at 15, so there are 16 options, and in a 128 bit register there are 16 bytes.

So my first incorrect assumption was that the least significant 4 bits of a byte determined where the current byte's value from the first operand would be copied to. Secondly, and anyone who's written NASM before might have caught this, my constant is incorrect. I wanted 0x0F0E0D0C0B0A09080706050403020100, but what I've put into my constant is 0x07060504030201000F0E0D0C0B0A0908. Which means that when I thought I had proven that I had the correct identity mask previously, I had really messed up.

I spent some time in my new hex editor and tried different things with the mask constant, until I understood how it worked, and then I came up with this result,

g_move_16_to_32_shift_mask_1: dq 0xFFFF0302FFFF0100,0xFFFF0706FFFF0504
g_move_16_to_32_shift_mask_2: dq 0xFFFF0B0AFFFF0908,0xFFFF0F0EFFFF0D0C

it works.

As it turns out, once I got the shuffle correct, everything (almost) worked, because the division and conversion code was already correct. So why only almost? Well, about the last third of the data is rubbish.
So first I verify that writing to PNG is correct, and it is. Then I verify that my size math is correct, and it all looks correct. If both of those are right, then the input data must be incorrect. To test this, I read again from a constant, and it turns out that the entire image is written correctly if I just write a constant.

This leads me to my next debug attempt. I make the input image be only 0xFF0000FF, and then I run an INT3 when my conversion code gets anything other than 0xFF0000FF. (Krita doesn't let me set color values, so I switch to The Gimp. I switch between these two all the time, and I don't know why. I really need better tools.)

It worked.

So I have a linear memory allocator, and I allocate memory for the 32 bit floating point data, and then for the Farbfeld file. I hadn't allocated enough for the first part, so the two allocations overlapped and the file data clobbered my output. Also I was aligning the loop counter to 16 bits, which was an error.

The code for the test was,

movaps xmm6, xmm4              ; copy the pixel data for testing
pcmpeqq xmm6, [g_test_value]   ; each 64-bit lane becomes all ones on a match

sub rsp, 16
movdqu [rsp], xmm6             ; spill the compare result to the stack
mov rax, [rsp]
cmp rax, -1                    ; mov does not set flags, so compare explicitly
jne test_failed
mov rax, [rsp + 8]
cmp rax, -1                    ; check the high qword too
jne test_failed
add rsp, 16
jmp pass_the_test
test_failed:
int3                           ; break into the debugger on a bad pixel

So now I have working code for converting Farbfeld to 32 bit floating point. Next up is conversion from floating point to bytes, and then in the future I want to make optimization tests. But right now I don't have the code for timing it.