Adventures in Linux porting #1

With 9.3.1 freshly released, we made a promise that 9.3.2 will finally bring Linux support!

Now, this isn’t the first time we have promised Linux support, but each time we tried, it turned out to be a very time-consuming task: fixing hundreds of small bugs, most of them related to paths, when we would rather have worked on more pressing issues, like finishing the language and library for one platform first. But we did learn a lot from the path-related issues, and we will introduce a platform-agnostic path class in the library, one that won’t allow you to use the wrong path format on the wrong operating system.

Anyway, each time we attempted Linux, we had to postpone it due to deadlines for the next release, but each time progress was made, so this final attempt should be relatively quick and easy and give great results. In theory! Let’s see how it goes!

While we need to finish this ASAP, it is still not possible to allocate much time for it during the week because of other tasks. This means weekend work! My task for this weekend was to fix as much as possible and prepare a status update for the project under Linux. On Friday night, I dusted off the old virtual machine and let it install all its updates.

On Saturday at 9 AM, I was up and ready to go. The first step was grabbing the compiler sources and seeing if they compiled. There was just one compilation error, because the Linux file system is case-sensitive and Windows’ is not. We have a convention of treating Windows as case-sensitive too in order to avoid such problems, but a single file-naming error snuck through. After fixing this, all the project-specific compilation errors were gone, but one of the third-party libraries we use had an older, incompatible version installed on our Linux machine. This library is available in a third-party repository, not the main Ubuntu ones, but I’ve always had problems installing it from there: the installation works, but invoking the library causes crashes. So I like to build it by hand. I downloaded the latest sources and started the compilation.

18 minutes later it was done. Great! Mental note: now that Linux is a full-time platform, it might be time to stop using VMs and go for a native install, either dual boot or a new PC. After this, the compiler was compiled and ready for testing.

The first test I did was running the compiler from the command line without any parameters. And the first problem manifested: the compiler supports a lot of build methods, so you need to choose one via a command-line parameter. The compiler also lists all the available configurations so you can pick one, but it was complaining that you did not provide one and exiting before showing you the list. The correct order is to show the list first, then complain that you did not choose from it. We use the command-line compiler all the time, but not manually, so we never noticed that it is not that usable when given no parameters.

I fixed this issue and also fine-tuned GCC detection. On Saturday I was convinced the detection was better; by Sunday I was thinking the change was neutral. Anyway, the compiler was up and running, listing build methods and taking parameters correctly, so it was time to compile “Hello world”. The compiler exited after a very short time with a short message telling me that compilation finished correctly.

Naturally, I did not believe it! The elapsed time was a clear giveaway, especially on a VM, and working on the first attempt would have been a minor miracle. Before checking the folder where the native binary should end up, I checked the temporary build folder and found something nasty:

Blasted path bugs! This screenshot also reveals that the compiler was trying to pull in the Windows libraries instead of the POSIX ones. And the builder still thought it was on Windows. I fixed these problems and everything was working as expected: all auxiliary bugs looked fixed, and compilation was failing as expected because some standard library functions are implemented with the Windows API and need to be ported over to Linux.

The compiler looked successfully ported, and I arrived at the phase where the library needs to be ported. Porting it requires examining compilation error messages and jumping around a lot from one source file to another, the perfect job for our resident Z2 IDE, ZIDE. I compiled ZIDE and it ran fine, but I did notice that it too was trying to pull in Windows sources:

I fixed this issue again; in the future, some mechanism needs to be added to solve this problem properly. I also went through the code and removed a lot of Windows bias. For starters, I commented out the offending Windows API calls and tried to compile. ZIDE hung and I had to kill it.

After a lot of research, I found the culprit: under Windows, execution return codes are 32-bit, with negative values meaning failure and positive ones success. Under Linux, return codes are 8-bit, with 0 meaning success and positive values failure. So exactly the opposite. Every time the compiler exited with a 32-bit -1, it was truncated to a positive number and ZIDE was not expecting that. I wrote a quick platform-independent return code mapper and made ZIDE much more robust, so it no longer hung, but reported errors gracefully when something unexpected was received.

So ZIDE is considered ported too. The next step was to look at what the compiler was feeding to the backend. As expected, it was not good, uncompilable even, but there were no major issues. Some required files are put in the right places under Windows for backend support, but under Linux, such an interfacing profile had not been created yet. I created it, and it will be included in all future builds from now on.

After this was fixed, it compiled correctly. Unfortunately, “hello world” was not doing anything. If you remember, I wrote above that I commented out the offending Windows code, and this included console support. This was fixed the last time we attempted Linux, but it broke since. The Z2 compiler has a great dependency analyzer, so all the Windows-specific parts of the library can exist in peace as long as you don’t call them. “Hello world” calls just one platform-specific bit, the one that outputs to the console. And beyond that, the runtime environment we created needs to report errors, and that uses the console too. There was no code to pull the console in through the dependency analyzer, so I fixed that.

Z2 inherited the “extern” import mechanism from C. It turns out that system is not good enough for our needs and will be removed next version. Instead, I came up on the spot with a new “bind” mechanism, much more powerful, and hacked support for it into the compiler in 30 minutes. Now, this is not how we normally do things. Everything is properly designed, sometimes even to the point where it feels like the design work never stops and there is never enough time to code. So starting Monday, this bind feature will be properly discussed and designed. There are still a lot of unknowns, but I think it is better than “extern”.

Anyway, with the “ieee.posix” package made to use the new “bind” feature and a few more fixes, it was finally time to see it all put together:

Too good to be true! So I checked the binary file. It was named “.exe”, something I quickly fixed. And it was almost 1 MiB in size. It turns out the builds were made with debug information and static linking. Good to know that these features work. But without debug information, and using the standard Linux .so dependencies that you inevitably get even in a small hello-world-like program (nothing Z specific here), the executable was around 17 KiB and working properly.

These results are so good (I did further testing to make sure that all major bugs were squashed) that I don’t want to ruin my mood by running the test suite. Next time… I will probably be saying “1 out of 206 tests passed successfully”.

So there is still a lot of work. “Hello world” only uses a single Linux-specific function by coincidence, so porting it over was easy. The Z2 standard library tries to be as native as possible, meaning most functions are implemented in pure Z, with no platform-specific bindings, but for some things, like console output, file system work, getting the time, etc., this can’t be avoided. All those functions need to be ported over to Linux. And not just ported, but ported using the new “bind” feature. Which first needs to be properly designed.

This port is only a 32-bit one. A 64-bit Linux must be installed and tested thoroughly. And then there is Clang support to add too, not just GCC.

But this was a good start and a weekend well spent!

And who knows? Maybe someday there will be a Mac/iOS port? No promises!


Class constructor performance foibles?

So, while working on PT 9.1, the SystemCPU class was added:

class SystemCPU {
	val Vendor = "";
	
	val MMX = false;
	val SSE = false;
	val SSE2 = false;
	val SSE3 = false;
	val SSSE3 = false;
	val SSE41 = false;
	val SSE42 = false;
}

The class isn’t complete yet since it is missing some options, including core count and the brand name of the CPU. But even when it is finished, it is going to be a simple class, since it has no functionality. The System class will instantiate one of these for you and fill it with the appropriate information, so you have a platform-independent way of figuring out the capabilities of your CPU, like the number of cores and what-not.

So this is, and is meant to be, a simple class. It has a String field and several Bool fields. As it turns out, the current implementation of String causes its fields to be zeroed, and false is also 0, so all a constructor for SystemCPU must do is set everything to 0: basically a memset(0). Since we have the C++ backend, one can always output pretty readable equivalent C++ code, and here are the constructors for String and SystemCPU:

String::String() {
	data = 0;
	length = 0;
	capacity = 0;
}

SystemCPU::SystemCPU(): Vendor(_Void) {
	new (&this->Vendor) ::String();
	MMX = false;
	SSE = false;
	SSE2 = false;
	SSE3 = false;
	SSSE3 = false;
	SSE41 = false;
	SSE42 = false;
}

The Vendor(_Void) and new (&this->Vendor) ::String() are a bit weird. The sequence Vendor(_Void) guarantees that the C++ SystemCPU constructor will leave Vendor alone (the optimizer recognizes the Vendor(_Void) construct due to a special constructor in String and generates a NOP for that). With the placement new, new (&this->Vendor) ::String(), we manually call the constructor for String and then initialize the rest of the fields. And all fields get initialized to 0.

The question is: should we optimize this initialization? When working with the C++ backend, we always need to decide if we do an optimization in the front-end or leave it to the backend. Leaving it to the backend can result in more readable code. But does the backend do this? Let’s benchmark!

using sys.core.StopWatch;
using sys.core.SystemCPU;

class BenchSystemCPU {
	const TIMES = 100000;
	
	static def test_Abs(const in: [c]SystemCPU) {
		for (val i = 0; i < TIMES; i++) {
			for (val j = 0p; j < in.Length; j++)
				in[j]{};
		}
	}

	static val buffa: [c 100000]SystemCPU = void;
	
	def @main() {
		{
			val sw = StopWatch{};
			test_Abs(buffa);
			System.Out << test_Abs.Name << " finished in " << sw.Elapsed() / 1000 << " sec.\n";
		}
	}
}

This simple little benchmark will fill up a statically allocated vector of 100000 SystemCPU instances 100000 times. Never mind that the program avoids leaking memory only due to the sheer coincidence that everything is initialized to zero. For it not to leak, each manual constructor call must be accompanied by a manual destructor call, but calling the destructor would detract from the goal of the benchmark, since the destructor is not free. And even if it were to leak, we are benchmarking the time it takes to call the constructor, so it is not important.

So here are the results:

  • 37.1421495178437 seconds on MSC14, 32-bit, release mode.
  • 35.1300343545186 seconds on TDM, 32-bit, release mode.

Fair enough! But what if I am wrong about the _Void bit and the placement new? What if these two constructs completely confuse both compilers? Well, we can easily rewrite the two constructors to be much more standard:

String::String(): data(0), length(0), capacity(0) {
}

SystemCPU::SystemCPU(): Vendor(), MMX(false), SSE(false), SSE2(false), SSE3(false),
						SSSE3(false), SSE41(false), SSE42(false) {
}

Now all fields are initialized in the initializer list and I also followed the declaration order. Let’s see the benchmarks:

  • 37.1537800874641 seconds on MSC14, 32-bit, release mode.
  • 35.1234764643324 seconds on TDM, 32-bit, release mode.

The results are the same, barring a bit of standard variance, so the constructs are not confusing the compiler.

But we are 600 words in, and you may be asking the following question: what is this pointless benchmarking all about? Well, I’ll do one final tweak and replace the initializer list with a memset(0). It is a bit hacky and not as pretty or maintainable, but one would expect the compiler to actually do this behind the scenes, and if we get the same numbers, then that is evidence enough that the memset hack is not needed. Here are the modified constructors:

String::String() {
	memset(this, 0, sizeof(String));
}

SystemCPU::SystemCPU(): Vendor(_Void) {
	memset(this, 0, sizeof(SystemCPU));
}

And the results:

  • 8.30797196732857 seconds on MSC14, 32-bit, release mode.
  • 16.5283915465819 seconds on TDM, 32-bit, release mode.

This is insane! The MSC version is roughly 4 times faster and the TDM version roughly 2 times faster with this hack job. To investigate the 4 vs. 2 times difference, I would need to go into the assembly and into memset to see what is going on, but that is not the point.

The point is: are C++ compilers not equipped with an optimization pass to handle this? Because if they are not, adding such a pass to the front-end would be a huge win.

This issue needs further investigation, and I’ll be back with a follow-up after I dig through some ASM. Hopefully it is not just some mistake on my part!

Post PT 9.0 Updates and OSS

Z2C PT 9.0 was announced here and on Reddit and released as a pre-alpha last week. So the question is: what now?

Development on PT 9.1 has started. It will contain bug-fixes, new features and library additions. But this goes for all versions from now on until the implementation is considered done, so I won’t repeat this every week. Instead I’ll talk about long term goals.

One of the Reddit questions was: why not release the source code as OSS? Now, the standard library is released as OSS, but the compiler itself and ZIDE are not. The reason I gave for this was time and perfectionism. Eventually, everything will be released as OSS. The compiler itself will be included in the standard library, so it must look decent and have a stable API. The code is nowhere near ready for this, and in consequence it is not OSS yet.

A lot of time is needed to achieve this, so to compensate, the compiler library will be split up further. A part of it will be separated out as a low-dependency “scanner”, a library that goes over Z2 code and builds class layouts and other meta-information. This new library will work without the compiler or the assembly and is designed to be very lenient with errors when needed, in order to provide all the code navigation features for ZIDE and other tools. So step one is to do this isolation work and refactor it a bit. This small library will be called “z2c-scan” and will be the first to go OSS. Using it, ZIDE can be OSSed as well, since it has no other dependencies.

The rest of the library will be split up again, with z2c-comp, z2-cpp and z2c-llvm being the compiler, C++ backend and LLVM backend respectively.

And a very long term refactoring process will be executed incrementally on all the code to make it fit for prime time and not leave it as ugly and dense as it can be in some places. In the early stages of the compiler, there was a goal to get it as small as possible. A pretty stupid goal and in consequence the compiler code is very dense and not that easy to maintain. This will change with time.

Other than that, there will be an update every two weeks. Even if these smaller updates are light on features, they do contain bug-fixes and will in turn create a better user experience.