[Back to blog]

[Link] tags:     code  NASM  assembly  breakout  
05 June 2018

Progress is a bit slow because I'm coding in assembly, which I'm not experienced with, but doubly so because it is exam season.
However, I have now managed to draw image files, flat color rectangles and the game supports input now, so I can move the player from one side to the other, which is pretty much the entirety of user input in Breakout.
For user input, I'm reading the correct /dev/input device, using the poll syscall to check if data is available, and then just the read syscall to read if it is. The Linux input API is decently simple, as it pretty much is only one data structure, for each event.

In the image, the entire screen is first filled with an image I drew, and then that red box is the player pad. I do not know exactly what the artifacting along the lines, in the image are. My guess is it has to do with the blending. I intend to figure that out at a later point.

The movement is currently too sharp, and I think that's because I do not do any blending at all, so the player pad jumps from line to line. I'm quite excited to test out different blending techniques, and seeing the result. I do not know anything about blending, but I want to try on my own first, and my initial guess is that it's a pretty simple case of converting a continuous variable to a discrete one, with steps.

[Link] tags:     code  NASM  assembly  breakout  
Date lost forever

I have been writing this NASM Breakout clone to draw to fbdev on Linux. This means that in order to run it, I have to jump over to another TTY than the one I'm running X on, and run my application from there. I'm doing it this way, because at first I wanted to write a project that draws to fbdev in C, but then it made sense to also learn assembly, because that's something I've been wanting to do for some time.
I want to make a game that draws to fbdev for many reasons that I don't feel like explaining. It just feels depressing to have to explain myself all the time.

However at this point it's getting in the way that I have to switch TTYs. For example, it's getting in the way of debugging. So my next idea is to write a program in C that draws a memory mapped file onto a window, with help from SDL. I could also call the assembly code from C, or I could call C code from assembly. I'm not certain I'm doing the best thing, but drawing memory is easy.

And it turned out to be pretty easy. I had a wrong assumption that mmap would resize the file for me, but after learning that it didn't, it was smooth sailing, and now my SDL program updates at vsync, which means that any draws I make from my real program, I can see in the SDL window.

With this working I wrote my first drawing code. Previously I had just created image files that fit my window size, and done a memcpy from the image data to my frontbuffer. Now I have a render_color_rect function that takes as input xmm0, esi and rdx. xmm0 is rectangle data, with each of the four components being a doubleword in size. esi is color data, with each of the four components being a byte in size, and rdx is a pointer to the memory we write to.
I'm writing this according to the System V Application Binary Interface covention, but I'm not sure how to deal with SSE registers. I just guessed that the second register argument still should be rsi.

My drawing code just uses rep stos, because I'm not yet sure how to deal with the fact that SIMD code would handle 4 colors at a time.

[Link] tags:     code  NASM  assembly  breakout  
7 May 2018

The next thing I need to do is convert 32 bit floating point RGBA data to byte sized RGBA data. The process should be as simple as multiplying each 32 bit floating point element with 0xFF, converting to integers, and then shuffling the data such that I get packed 8bit RGBA data. With all I learned from converting Farbfeld 16bit RGBA to 32bit floating point, this should come much easier.

My assumption turned out to be mostly true.

Because this conversion converts 32bit values down to 8bit values, I used 4 SIMD registers in parallel, and then OR'd them together in the end, to get one 128bit register with packed 8bit RGBA data.

I wrote a simpler version of my previous tool, to convert 8bit color data to PNG. This helped confirm the algorithm. I then focused on getting it to display correctly to the screen. I got incorrect colors when packing them as RGBA, so I wrote shuffle masks for RGBA, ARGB, BGRA and ABGR, and tested each one until I got the desired result. Turns out FBdev on Linux requires BGRA.

It was nice and reassuring that writing this code was much faster for me. It implies improvement.

[Link] tags:     code  NASM  assembly  breakout  
3 May 2018

So let's try this...

I've just read https://sites.google.com/site/steveyegge2/you-should-write-blogs about why everyone should blog, and it has given my curiousity a tool to beat my shyness. I'm also working on something interesting, and I found myself quite excited to write about it. I found the link from https://danluu.com

I've been trying to convert a Farbfeld image file, which is 16bit color information packed as RGBA, to 32bit floating point data in memory. The catch is that I'm doing it in NASM, to learn how to write and read assembly. It turns out that this was also the last straw to a very exciting improvement to my skills.

So my first process was to write the entire conversion from 32bit floating point color to 8bit color in NASM, which I do not know how to code in. This resulted in me having two pieces of code I struggled to understand. The way I've been writing this code in NASM is by asking in ##asm on freenode and the intel documentation on instructions and architecture. So I get tidbits of information, and I write the code I think should work, and then try it, until it does. However, this conversion code is more complicated and my methods aren't very good.

I figured out immediately that I need to isolate the first conversion code, and so my first improvement as a programmer presented itself; I wrote a tool in C that converts 32 bit floating point to bytes, and outputs a PNG. This is an obvious thing, but it is one of the first times I drop my project, to write a tool to help me out with the project.

So now I'm reading in the farbfeld data, running it through my guess code, which is the code I guessed would work, and I write it to a file. I then put that file through my new tool, and see how nothing works. This was to be expected, but the way I try to fix it, is by running GDB on my tool written in C, and printing the values I'm processing there while changing the NASM code. But I'm stuck.
Keep in mind that this takes days. I'm only working a few hours some days, because I am also a student.

Here I want to explain what I'm trying to do. Because Farbfeld is 16 bit RGBA data, and I need 32 bit RGBA floating point data, I'm trying to read the Farbfeld data into 128bit SIMD registers, and then expand the 16 bit data to 32 bit, using PSHUFB, convert to floating point using CVTDQ2PS, and then converting the values into the [0;1] interval by dividing each 32 bit floating point by 0xFFFF, the max value for 16 bit. The follow up is then that I can modify the image data, sRGB correct it and finally multiply the values with 0xFF, which gives me 8bit RGBA data.

I do not know how to write NASM, and I do not know how to write SIMD code. So I have these 128bit registers I'm filling with data, using MOVAPS, and I'm trying to write a divisor that will divide each 32 bit element correctly. I'm also trying to shuffle the 16 bit elements correctly, so that they become 32 bit integers with the same value, and I'm trying to convert this to floating point. At this point I don't know if I'm doing any of these steps correctly.

So to simplify my issue, I first simplify my input. Instead of the actual Farbfeld image, I'm now just writing a 128bit constant, and I learned from ##asm that 0x0F0E0D0C0B0A090807060503020100 is the identity for the PSHUFB instruction, which makes sense. However I'm still getting rubbish output, and my process requires that I convert to 32 bit floating point, because that's what my other tool converts to 8bit.

I've only ever looked at data as a hex dump a few times, and each of those times it's just been something I opened up in Vim, put me in Hex mode with :%!xdd and instantely I see if the data makes sense or not. This time is different. This time it doesn't open in Vim, because the file is too large.

To solve this I found wxHexEditor via googling. I was skeptical about it at first, but it read my file instantily and even rereads the file whenever my program writes new data to it. I can say I was pleasantly surprised, and it can be found here - https://www.wxhexeditor.org/

So now that I'm able to read the data directly, I can eliminate most parts of the code and only have the read and write left. This means that I should have a repeating pattern of the constant I'm reading from, and that is what I get. So MOVAPS and MOVDQA are working correctly.

Next I need to learn about the shuffle. Turning the shuffle code on and off gives me the same result, which means that the identity mask is correct. However, at this point I'm the bearer of two incorrect assumptions. So, in my .data section I have these constants,

g_move_16_to_32_shift_mask_1: dq 0x0F0E0D0C0B0A0908,0x0706050403020100
g_move_16_to_32_shift_mask_2: dq 0x0F0E0D0C0B0A0908,0x0706050403020100

I need two masks, because when converting from 16bit to 32bit, I'm doubling the amount of data necessary. So I read the same 16 bit data into two XMM registers, and I shuffle one of the registers with the first mmask, and the other with the second. This way, one of registers will end with the first half of the data, and the other register with the second half.

The PSHUFB instruction takes two operands, where the second operand is the mask. It then shuffles each byte in the first operand, according to the mask. The mask works as follows. Each byte of the mask determines what data will be put into the byte, of the same position in the first operand. Also if the most significant bit, of any byte is 1, then that byte in the first operand will be zero. The least significant 4 bits, of each byte, determine which other byte will be copied into the location of the current byte. Notice that 4 bits max out at 15, so there are 16 options, and in 128 bit registers, there are 16 bytes.

So my first incorrect assumption was that the least significant 4 bits of a byte, determined where the current byte's value from the first operand, would be copied to. Secondly, and anyone who's written in NASM before might have caught this, my constant is incorrect. I wanted 0x0F0E0D0C0B0A090807060503020100, but what I've put into my constant is 0x070605030201000F0E0D0C0B0A0908. Which means that when I thought I had proven that I had the correct identity map previously, I really messed up.

I spent some time in my new hex editor and tried different things with the mask constant, until I understood how it worked, and then I came up with this result,

g_move_16_to_32_shift_mask_1: dq 0xFFFF0302FFFF0100,0xFFFF0706FFFF0504
g_move_16_to_32_shift_mask_2: dq 0xFFFF0B0AFFFF0908,0xFFFF0F0EFFFF0D0C

it works.

As it turns out, once I got the shuffle correct, everything (almost) worked, because the division and conversion code was already correct. So, why almost. Well about the last 1/3 of the data is rubbish data.
So first I verify that writing to PNG is correct, and it is. Then I verify that my size math is correct, and it all looks correct. If so, then the input data must be incorrect. To test this, I read again from a constant, and it turns out that the entire image is written correctly if I just write a constant.

This leads me to my next debug attempt. I make the input image be only 0xFF0000FF, and then I run a INT3 when my conversion code gets anything other than 0xFF0000FF. (Krita doesn't let me set color values, so I switch to The Gimp. I switch between these two all the time, and I don't know why. I really need better tools)

It worked.

So I have a linear memory allocator, and I allocate memory for the 32 bit floating point data, and then for the Farbfeld file. I hadn't allocated enough for the first part, so I got a race condition. Also I was aligning the loop counter to 16 bit, which was an error.

The code for the test was,

movaps xmm6,xmm4 ; for testing
pcmpeqq xmm6, [g_test_value]

sub rsp, 16
movdqu [rsp], xmm6
mov rax, [rsp]
je pass_the_test
mov rax, [rsp + 8]
je pass_the_test

So now I have working code for converting Farbfeld to 32 bit floating point. Next up is conversion from floating point to bytes, and then in the future I want to make optimization tests. But right now I don't have the code for timing it.