CrashCourse – 004 – Building an Int

With PT8 development starting in the next few days, several parts of the project will get slowly released in different states of completion, the standard library source code being one of them. So it is the right time to describe a few parts of the standard library and how it evolved since its inception.

The numeric types are a good starting point, since they have a lot in common: understand one and you understand them all. Within a standard library, one part may freely use other parts to accomplish its tasks. But let us suppose for a moment that all classes inside the library are independent and only serve to offer an API to clients of the library, without referencing one another. Then what is the minimal Int class?

namespace sys.core.lang;

class Int {
}

That’s it! If nobody expects Int to have a specific API, Z2 as a language does not impose any structure upon it. It is just a normal value class. But the combination of namespace plus name, sys.core.lang.Int, is still special. It is a core class (not to be confused with sys.core; the two “core” terms have separate meanings and maybe we should fix this conflict of terms), meaning the CPU has a native understanding of it. Additionally, it is an arithmetic class. While all classes are value types, some, like Int, are arithmetic implicitly, without needing an explicit API to make them behave like arithmetic types. Other third-party classes do need such an API to conform to the arithmetic requirements. And this special treatment does not apply to classes named Int from other namespaces.

As an implicitly arithmetic class, even though Int is empty, it still behaves as if it had several methods defined inside, like the ones commonly defined through operator overloading. All the operators commonly used in C-like languages work on Int instances: +, -, *, /, <<, >>, ==, !=, <, >, <=, >=, ++, --, &, |, ^ and ~. They all behave as expected and you are not allowed to override them and change their meaning. Using these operators one can write complex expressions and, with a few exceptions, expressions involving Ints could be copied over from C or Java into Z2.
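As a quick illustration, here is a minimal sketch of such expressions. The operators are the ones listed above; the val declaration syntax and the // comments are assumptions made for this example only.

// a minimal sketch; 'val' declarations are an assumption for illustration
val a = 10;
val b = 3;
val c = (a + b) * 2 - 1;   // arithmetic operators work on the empty class
val mask = (a << 4) | b;   // so do the shift and bitwise operators
val less = a < b;          // comparisons presumably yield a Bool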

Another thing that one does with numerical values is convert from one type to another, a task commonly done with casting. As a historical note, early versions of the Z2 design had casts, but it was found that they greatly overlapped with constructors and they were eliminated. Today, Z2 has no casts and all conversions are handled through constructors. You do not cast a type to another; you construct a new instance of the appropriate type, based on another instance. This is mostly a theoretical and style-based distinction, because the end result and the generated machine code are the same. As a normal class, Int has a default constructor, Int{}. Conversion constructors usually have one parameter, the input value that needs to be converted. If we have a Float variable called floaty or a literal Float constant, -7.4f, we can “cast” them to Int with Int{floaty} and Int{-7.4f} respectively. And this works for all built-in numeric types, even with Bool values, like Int{true}.
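Putting the examples from this paragraph together into a sketch (the val declarations are again assumed for illustration; the constructor calls are the ones described above):

val floaty = -7.4f;    // a Float variable
val i = Int{floaty};   // “cast” by constructing an Int from a Float
val j = Int{-7.4f};    // the same, on a literal
val k = Int{true};     // Bool converts too
val d = Int{};         // the default constructor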

As mentioned in a previous post, Z2 does not like to force you to write code that it can figure out itself or that is just boilerplate. The standard Int class could have had some 20 overloaded operators, each with all the parameter combinations, totaling hundreds of methods, and additionally all the conversion constructors. Instead, we chose to have this core functionality available implicitly. Thus, the class is perfectly functional while empty.

And things could be left as is. The standard library could have just a bunch of numerical classes with empty bodies, offering a few expected built-in operations. But Z2 chooses to add a bit of extra functionality to such classes. Not a huge amount; we don’t want these classes to become bloated, especially since third parties can reopen these classes and add any extra functionality they might need. Today I will show a little blast from the past: the Int class as it was a few months back. Today it is almost identical, but small changes and tweaks have been made. This simpler Int class will serve as a fine introduction to how to add value to such types, and in the next posts I’ll detail how the evolution of the language has led to some changes to this class.

namespace sys.core.lang;

class Int {
	const Zero: Int = 0;
	const One: Int = 1;
	const Default: Int = Zero;

	const Min: Int = -2'147'483'648;
	const Max: Int = 2'147'483'647;

	const IsSigned = true;
	const IsInteger = true;

	const MaxDigitsLow = 9;
	const MaxDigitsHigh = 10;

	property Abs: Int {
		return this > 0 ? this : -this;
	}

	property Sqr: Int {
		return this * this;
	}

	property Sqrt: Int {
		return Int{Double{this}.Sqrt};
	}

	property Floor: Int {
		return this;
	}

	property Ceil: Int {
		return this;
	}

	property Round: Int {
		return this;
	}

	def GetMin(min: Int): Int; const {
		return this >= min ? min : this;
	}

	def GetMax(max: Int): Int; const {
		return this <= max ? max : this;
	}

	def Clamp(min: Int, max: Int): Int; const {
		if (this <= min)
			return min;
		else if (this >= max)
			return max;
		else
			return this;
	}
}

This is a rather bare-bones Int class, but it still offers a lot more functionality than an empty class and also serves to show our approach to library design: using this style, the difference between language features and library features is blurred. The absolute value of -7 can be obtained with -7.Abs and it looks a bit like a language feature, but the implementation is actually part of the library. Additionally, all the numeric types are extremely similar and share a similar API, giving you the necessary feature parity in some situations, like when working with templates.

But let’s go slower. On lines 4-6, we have a few simple constants that do not seem that useful, giving you the 0, 1 and default values for the class. They are mostly here for feature parity with more complex numeric types, like multi-dimensional points.

On lines 8 and 9 we have two extremely important constants: Min and Max, giving us the minimum and maximum Int values. Adding these two constants to the class solves an old problem quite nicely: where to stick these values? In C/C++, you need to include a header to access INT_MIN and INT_MAX, and the recommended header changes depending on whether you are using C or C++. These constants could be a #define, thus sharing the myriad of well-documented problems of the pre-processor. If you are using C++ and doing things the C++ way, you need std::numeric_limits<int>::min() and std::numeric_limits<int>::max(). Or, starting with C++11, besides min there is also lowest. Why are there two? What is the difference between them? The answer is not self-evident and you need to google it to find any answer. This approach is better than using #defines, and Z2 could easily go this route, but it was decided that such a simple task should not be handled by templates. Does your type have a minimum value? If yes, just add a constant to it! You can use Int.Min to get the minimum value for Int and Foo.Max to get the maximum value for Foo, if it has one. Or you can use existing instances, even literal constants, so the following are perfectly legal expressions:

A + C * (C.Max / C.Max.Min);
A + C * (Int.Max / Int.Max.Min);
Int{Bool{Bool{Int{Bool{A}.Min.Max}}.Max}};
(true <= 6).Min <= (1 < 5).Max;

Please don’t write code like this!

On line 17 we have the Abs property defined, which returns the absolute value of the instance. On line 21 we find the very simple property that returns the square of the value. This is useful as a shorthand when having to square some complex expression: using Sqr, you don’t need to type the expression twice with a * between the two copies, minding the side effects of the expression, or resort to a temporary variable multiplied with itself. We find it useful and it is implemented easily inside Int, so why not have it? On line 25, we have the Sqrt property, which returns the square root of the value. This already shows the interconnection of classes within the standard library: the easiest implementation of square roots on integer values is converting the value to a Double, taking the square root and converting the result back to an integer.

On lines 29, 33 and 37 we have properties that return the floor, ceiling and rounded values. For floating point values these make sense, but for integer values they don’t really: by definition, the floor of an integer is the value itself. They are included again for feature parity. As an example, you may have a template vector and run a summing lambda on it that adds together the floors of the values in the vector. This will run fine on a vector of Double, for example, but would fail to compile on a vector of Int. But because we added these feature parity APIs, the types are interchangeable and it is easier to write generic algorithms.
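As a small sketch of why Sqr pays off on complex expressions (the val declarations are assumed, as before; the property itself is the one defined above):

// with Sqr, the expression is written (and read) once
val q = (a + b * c).Sqr;
// without it, you repeat the expression and must mind its side effects
val r = (a + b * c) * (a + b * c);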

These methods are also logically grouped: we have one “block” doing one kind of task, followed by other blocks. The final block is the comparison one. Having two or more values, we often need to find their minimum or maximum, or clamp one to a range. This is why most types in Z2, when applicable, have methods like GetMax, GetMin and Clamp. Or had, to be more precise. This is where we found that having these methods, which are almost always implemented identically, added to each class contradicts the principle of Z2 not making you write boilerplate code, and this was changed. As explained earlier, this is how the numerical types were a few months back.
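Their use is straightforward; a minimal sketch based on the class above (val declarations assumed, as before):

val lo = a.GetMin(b);      // the smaller of a and b
val hi = a.GetMax(b);      // the larger of a and b
val c = x.Clamp(0, 100);   // x constrained to the range [0, 100]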

Next time we’ll see how we fixed this and evolved the Int class closer to its current form.

Z2 Compiler PT7 in feature freeze

There has been a lack of new information on the blog in 2016. Sorry, I didn’t have time to write posts since I was busy with PT7. For PT7, we wanted to simplify a lot of the complexity that can be found deep within the heart of the implementation of some of the standard classes. This turned out to be a far lengthier task than expected. But now is the right time to iron out the last few remaining kinks, even if this means breaking compatibility.

Module system fine-tuning

First, we changed the module system. The using keyword was used to make a source file refer to another source file. After the using clause, a sequence formatted like foo.bar.(...).baz was interpreted as a reference to an existing module/source file on disk and imported as such. Like a lot of things in Z2, this is an evolution of a system from C++: a more advanced form of #include, but coupled with a powerful module system. And it worked very well. But we did find a small problem that in practice may have been a moot point, yet was still worth fixing. Top-level source files were referring to lower-level modules and so on, until the bottom-level ones were reached. Using this hierarchy, we noticed a net, semi-unavoidable increase in the number of imported symbols. The module system gave you the power to choose what to make available and what to keep private, but still a small increase was leaking from layer to layer.

So we changed the sequence after the using clause to refer to a fully qualified entity and decoupled it from its location on disk. I shall explain this in detail one day on the blog, but it is a variant of the C# system. In short, a source file can refer to one or more fully qualified classes and other entities, and it is the job of the compiler to supply their on-disk location. You can still organize source code in any way you see fit and there is no compounding public symbol pollution. And since we had made sure that the standard library’s fully qualified class names were in the right place on disk, this change caused zero compatibility breakage. Compilation has gotten slightly slower because of this change, but we’ll fix this in the next versions.
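To illustrate the difference with a hypothetical sketch (the exact paths and class name here are made up for the example; only the foo.bar.(...).baz shape of the clause comes from the description above):

// before PT7: the sequence named a module/source file on disk
using sys.core.lang;       // imported the file at that on-disk path

// PT7: the sequence names a fully qualified entity; the compiler
// locates the file that declares it on its own
using sys.core.lang.Int;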

Greatly simplified parameter type system

Z2 is all about combining good, best-practice-inspired designs with powerful compilers to reduce the complexity and fussiness of problems. This is why we use the term “dependency/declaration order baby-sitting”, declared it a “bad thing”, and went ahead and eliminated it. Another thing we wanted to eliminate was ambiguity, especially when it came to calling functions. Like most things in life and programming, ambiguity is not binary but a spectrum. Things that have low to medium ambiguity are often resolved in programming languages by conventions and rules. In Z2 we took this to its limit and created a language that can resolve any level of ambiguity, if it is possible to resolve, of course. As a consequence, the rules became extremely complicated when dealing with the types of formal parameters. We set out to create a language more powerful than C++ but with better and saner rules, and for the most part we think we came very close to achieving this. But for formal parameters and eliminating all ambiguity by rules, we failed: we created a rule-set that is easy to learn, but almost impossible to master. The exact opposite of our goal.

We tried a solution in 2015 and in January 2016 we tried another one. Things were better, but still too complicated. So we reevaluated the problem and the value proposition of having all ambiguities resolved and came to the conclusion that… it is not worth it! We rewrote the system and now common, expected and useful ambiguities are resolved by a set of rules that is easy to learn and master; for the rest, we do not attempt resolution at all. We give an ambiguity-related compilation error! This brings Z2 in line with other languages when it comes to the effort of learning, and overall we feel that this is a far better place to be in terms of complexity.

Low level array clean up

Z2 has a wealth of high-level containers you should use, like Vector. It also has a couple of very low-level static vectors: RBuffer and SBuffer. Unless you are writing bindings for an existing C library, using some existing OS API or declaring static look-up tables, there is no good reason to use these low-level buffers. Still, they are used at the heart of the standard library when calling OS features, and there was a small problem with them.

RBuffer (short for RawBuffer) is the new name starting with PT7; before, it was called Raw. It is a raw, static, fixed-size low-level array, like standard arrays in C. Unlike a C array, an RBuffer has an immutable Length and Capacity, but they are not stored in RAM: when needed, they are supplied based on compile-time information. So if you have an RBuffer of Byte with a Capacity of 4, it will occupy exactly 4 bytes of RAM, not 4 bytes plus bookkeeping overhead. SBuffer (short for SizeBuffer) was previously called Fixed and is a low-level, static, fixed-capacity array that has a mutable Length stored in RAM and an immutable Capacity that is not stored in RAM. So an SBuffer of Byte with a Capacity of 4 will occupy RAM like this: a PtrSize-sized chunk to store the Length, plus exactly 4 bytes. The Capacity is supplied based on compile-time information, without it taking up memory. So the difference between SBuffer and RBuffer is that SBuffer has an extra field to store the Length.

So far so good. The small library-design problem came from the way we were using these two types. We noticed that in most cases an RBuffer was passed as an input, but when using an RBuffer as an output, we always had an extra reference parameter to store the “length”. So we refactored the standard library and now, in 99% of cases, a const RBuffer is used as an input and a ref SBuffer is used as an output parameter. Additionally, the low-level parts of the library no longer use pointers when those pointers are meant to be C arrays, but use RBuffer instead. This creates a cleaner standard library.
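As a hypothetical sketch of what this refactoring looks like (the function name and the declaration-only form are made up for the example; only the const RBuffer/ref SBuffer convention and the extra length parameter come from the description above):

// before: an output RBuffer needed a companion reference parameter for the length
def ReadBytes(const input: RBuffer, ref output: RBuffer, ref length: PtrSize);

// after: SBuffer stores its own mutable Length, so the extra parameter disappears
def ReadBytes(const input: RBuffer, ref output: SBuffer);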

Function overloading code refactored

All these welcome simplifications worked together and allowed us to refactor and greatly simplify the function overloading code. Simple rules give simple code and now the old super complicated implementation is gone and replaced with a far shorter one that is faster than ever!

Conclusion and future versions

This chunk of simplifications turned out very well. The language is in a far better place right now: easier to learn, easier to master, and with a cleaner API overall.

On the downside, implementing all this took longer than expected. Starting with PT8, we want to do shorter release cycles. This means that the target final PT version moves up from about PT15 to about PT20. To compensate, we won’t wait until we have a very stable and feature-complete compiler before making it available for testing; instead, we will release a super early pre-alpha version of the compiler and ZIDE for adventurous people.

PT7 is in feature freeze and we need a couple more weeks to fix some bugs, but starting with PT8 the standard library code will begin to be uploaded to GitHub. An account has been created and I’ll write a couple of explanatory posts related to the standard library, and then, one by one, classes will be tested, documented and uploaded.