CrashCourse – 007 – Vector literal introduction

Today I shall introduce the concept of vector classes. There are several vector classes in the standard library, but I shall focus only on the most common and simplest of them, Vector. Vector is a template class and it is very similar to std::vector from C++, so it can be used like any normal template class, through the API defined by its members. You can see the source code for this class in sys.core\container\Vector.z2.

But using functions to manipulate a Vector is a bit boring and also easy to figure out by looking at the class interface. Instead, I’ll focus on phase two features and talk about easier and more expressive ways to work with vectors through Vector literals. So, let us first see what a vector with 3 Int items, 1, 2, 3, in this order, looks like:

[1, 2, 3]

Using this syntax, there is no mention of Vector, or Int. The [] syntax denotes (in this case) a Vector instance and the first element always dictates the type of all the elements. The first element has a type of Int, giving the whole literal a type of Vector<Int>. All further elements beyond the first must be compatible with the first, meaning that their values could be assigned to a variable of the first element’s type without any loss. So the following example works:

[1, Byte{2}, 7u, 2'147'483'647u, 1l]

Byte{2} is an 8 bit unsigned value but it fits into a 32 bit signed value, so it is not important that the first element is an Int and the second a Byte. The same goes for 7u, a DWord with value 7, and for 2'147'483'647u, another unsigned value that is the maximum DWord value that still fits inside Int. 1l is a signed 64 bit value, but the value 1 still fits into a 32 bit signed value. So the final type of the literal is Vector<Int>.

On the other hand, the following example does not compile:

[1, 1.0f, 2'147'483'648u]

The first element gives us a Vector<Int>, but the second element is 1.0f, a Float. Due to how floating point numbers work, a floating point value may not be able to perfectly represent its integer counterpart, so Floats and Doubles are not compatible with Int and you need to convert them manually. So the second element causes the literal to not compile. So does the third: 2'147'483'648u is too big as an unsigned value to fit into Int, being one greater than the maximum value that would fit, 2'147'483'647u.
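The compatibility rule can be modelled as a plain range check. The Python sketch below only illustrates the rule as described here and is not the compiler's actual algorithm. The names INT_MIN, INT_MAX, fits_in_int and literal_is_valid are made up for this example, and Python's int and float stand in for the Z2 integer and floating point types.

```python
# Illustrative model only: these names are hypothetical, not part of Z2.
INT_MIN = -2**31       # -2'147'483'648
INT_MAX = 2**31 - 1    #  2'147'483'647

def fits_in_int(value):
    """True if value could be stored in a 32 bit signed Int without loss."""
    if isinstance(value, float):
        # Floating point elements are rejected outright, whatever their value.
        return False
    return INT_MIN <= value <= INT_MAX

def literal_is_valid(elements):
    """The first element fixes the type (Int here); the rest must fit into it."""
    return all(fits_in_int(element) for element in elements[1:])

print(literal_is_valid([1, 2, 7, 2147483647, 1]))  # models the accepted literal
print(literal_is_valid([1, 1.0, 2147483648]))      # models the rejected literal
```

Note that the check looks at values, not only at declared types, which is why a 64 bit 1 passes while an unsigned 2'147'483'648 does not.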

This short inferred syntax can be handy, especially when writing some code that is very obvious in what it does and further syntax sugar is not wanted. But in Z2, inferred classes can still be manually specified, even though most of the time they would be redundant. So:

val a = Foo{};

is 100% identical to:

val a: Foo = Foo{};

and since there is no such thing as an uninitialized instance in Z2, the two snippets above are 100% identical to:

val a: Foo;

One can do the same thing with literal vectors. As I mentioned before, the type of that literal I used as an example is Vector<Int>, so:

val p = [1, 2, 3];

is 100% identical to:

val p: Vector<Int> = [1, 2, 3];

Now that we have a variable called p, we can do some stuff with it, like printing it:

val p = [1, 2, 3];
for (val i = 0p; i < p.Length; i++)
	System.Out << p[i] << ' ';
System.Out << "\n";

1 2 3 

This was a normal traversal using a for loop, a PtrSize index local variable called i and the Length of the vector. An easier way to traverse the vector is using the foreach loop:

val p = [1, 2, 3];
foreach (v in p)
	System.Out << v << ' ';
System.Out << "\n";

The class can also print itself and by default will print out the number of elements followed by the elements, so [1, 2, 3] printed will be “3 1 2 3”:

System.Out << p << "\n";

3 1 2 3

This variable is also mutable, so we can change the values of some elements, add elements to the end, insert and delete:

p << 4;
System.Out << p << "\n";
		
p << 5 << 6;
System.Out << p << "\n";
		
p.Delete(3);
System.Out << p << "\n";
		
p.Fill(4);
System.Out << p << "\n";
		
p.Add(7000);
System.Out << p << "\n";
		
p.DeleteAll(4);
System.Out << p << "\n";
		
p.DeleteAll(7000);
System.Out << p << "\n";

4 1 2 3 4
6 1 2 3 4 5 6
5 1 2 4 5 6
5 4 4 4 4 4
6 4 4 4 4 4 7000
1 7000
0

The final line in the output is interesting: the vector is left with a count of zero elements. Vector instances can have zero elements, either after several operations, like in the example above, or by being created from the get-go with zero elements. The latter needs more attention, since you can’t just write [] to express an empty literal: the compiler can’t infer the type of the elements in the vector. A few paragraphs ago I mentioned how these literals are just normal class instances and the class name is Vector<Int>, so you can instantiate them as a normal class, Vector<Int>{}, in order to get an empty vector instance. There is also a shorter syntax: <Int>[]. This is equivalent to the one before and it allows you to create a vector of Ints with zero elements, an empty vector. To keep the post short, I won’t explain today how and why this works. But it is a useful short syntax to remember.

One interesting tidbit to note is that these empty vectors do not interact with the heap, so creating an empty vector is very fast and doesn’t allocate any RAM.

When declaring all the previous literals, the number of elements, the Length of the vector was determined implicitly by the compiler counting how many values you provided. This number can be explicitly specified by using the syntax of [number_of_elements: list_of_elements]. So [3: 1, 2, 3] is identical to [1, 2, 3], but this time we explicitly specified the number of items. The explicit number must be equal to the implicit number, so [2: 1, 2, 3] or [10: 1, 2, 3] will not compile.

What use is it then to be able to provide the item number if it must be the same as the actual number of elements? The secondary reason is safety. Let’s say you have a table that must be introduced verbatim in code and if you know how many elements there are, the compiler can help find the error of leaving out an item or two.

But the primary reason is the ellipsis syntax, [number_of_elements: list_of_elements, ...]. The ... at the end of the sequence means to repeat the last element so many times that the total element count of the sequence is equal to the provided explicit count. This means that while [10: 1, 2, 3] is a compilation error, [10: 1, 2, 3, ...] is equal to [1, 2, 3, 3, 3, 3, 3, 3, 3, 3]. You may be inclined to think that it is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], but as said, the syntax means repeat the final element.

But it is not a dumb copy or anything similar. In this context, “repeat” means evaluate the last element a number of times: execute the same code sequence multiple times. This works very well with literals that have only one item and a count (but they can have any number of items) and can be used to achieve a lot of interesting stuff and even some meta-programming. But now I’m getting ahead of myself. Going back to the repeated evaluation paradigm, the syntax of [5: 1, ...] means evaluate 1 five times and the syntax of [5: foo(), ...] means evaluate foo() five times, while the syntax of [5: 1, 2, foo(), ...] means evaluate 1 once, 2 once and foo() three times (5 - 2). Here it is in action:

namespace org.z2legacy.ut.misc;

class VectorSample {
	def @main() {
		val a = [5: 1, ...];
		System.Out << a << "\n";
		
		val b = [5: foo(), ...];
		System.Out << b << "\n";
		
		val c = [5: 1, 2, foo(), ...];
		System.Out << c << "\n";
	}
	
	def foo(): Int {
		return dummy++;
	}
	
	val dummy = 100;
}

5 1 1 1 1 1
5 100 101 102 103 104
5 1 2 105 106 107

The fact that the last item is executed multiple times shows how this feature can be used for meta-programming, but there is so much more to this subject. And since I gave the example that [10: 1, 2, 3, ...] is not [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], one can do something like val i = 1; [10: i++, ...] to get [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
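For readers who want to play with the repeat rule outside of Z2, here is a small Python model of it. It is only a sketch of the semantics described above: build_literal and Counter are hypothetical helpers invented for this example, each element is passed as a zero-argument callable so that “evaluating” it can be repeated, and only the ellipsis form of the literal is modelled.

```python
def build_literal(count, *producers):
    """Model of [count: a, b, last, ...]; each producer is a zero-argument callable."""
    if not producers or len(producers) > count:
        # Stands in for the compile error on a count that is too small.
        raise ValueError("explicit count does not match the listed elements")
    values = [produce() for produce in producers]  # every listed element runs once
    last = producers[-1]
    while len(values) < count:                     # the last one runs again until full
        values.append(last())
    return values

class Counter:
    """Stands in for a variable used with post-increment (dummy++ or i++)."""
    def __init__(self, start):
        self.value = start

    def post_increment(self):
        old = self.value
        self.value += 1
        return old

dummy = Counter(100)
a = build_literal(5, lambda: 1)
b = build_literal(5, dummy.post_increment)
c = build_literal(5, lambda: 1, lambda: 2, dummy.post_increment)
i = Counter(1)
d = build_literal(10, i.post_increment)
print(a, b, c, d, sep="\n")
```

The lists a, b and c hold the same elements as the three vectors printed by VectorSample, and d is the 1 to 10 sequence obtained by repeating a post-increment.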

One final but very important note is that this evaluation and all literal construction is executed, whenever possible, at compile time. Even if it is delegated to run time, it is done as efficiently as possible. To take the [10: i++, ...] example, supposing it is executed at run time, this will not result in an empty vector that grows to accommodate one element, with i++ computed in a temporary and the value then copied into the vector, after which the vector grows again to accommodate another i++ and so on. A single memory allocation happens with the correct number of elements (10) and each element is not copy constructed (if possible) but instead constructed in place, so you should not have any extra copy constructors called. So if we have a class Foo with a default constructor with a side effect, a copy constructor with a side effect and an assignment operator with a side effect and we do val v = [100: Foo{}, ...], this will result in one Vector<Foo>{} constructor call and 100 Foo{} default constructor calls. The side effect of the default constructor is executed 100 times, while the side effects of the copy constructor and assignment operator, as well as their main effect (making copies), never happen, since these methods are not called at all.

So how does this all function? Vector is a dynamic buffer with a Length and a Capacity, similar to std::vector. The Capacity property gives it an amortized growth rate. All vector types have Length and Capacity, but not all of them have these properties mutable. One special case is where neither is mutable, so we have a fixed Length and Capacity vector. Since accessing elements beyond Length is an error, such vectors are considered fixed Length. If you pass such a vector as a reference to a function, the function will be able to change items, but not the length. You know how many items there are but you can’t modify this number. If you pass it as const, the elements will be read only and the Length will remain immutable. This is the base case, the CArray, a class from the standard library not yet introduced.

At the other extreme is Vector, a class with mutable Length and Capacity. You both know the Length and can change it. Changing the Length might change Capacity too. And changing Capacity directly might allow for faster insertions if you know the number of elements that you are going to add to a vector. Passing such a Vector as a ref parameter will allow you to modify both the items and the Length and Capacity of the Vector.

So when designing the interface to something that needs a vector, the question to ask is: is the Length mutable or not? This will allow you to determine the correct flavor of Vector. But as a general rule, Vector is the main and most commonly encountered vector flavor. Unless you have a good reason not to use it, always use Vector.
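The split between “can change items” and “can change Length” is not unique to Z2. As a loose analogy only, Python's standard library shows the same three levels: an array.array can grow, a memoryview over it has a fixed length but writable items, and a read-only memoryview allows neither. The mapping to Vector, CArray and const parameters is my own comparison, and double_items is a hypothetical function for this sketch.

```python
from array import array

def double_items(view):
    """Receives a fixed-length view: items may be changed, the length may not."""
    for index in range(len(view)):
        view[index] = view[index] * 2

values = array("i", [1, 2, 3])            # growable, plays the role of Vector

with memoryview(values) as view:          # fixed-length window over the same items
    double_items(view)
    view_can_grow = hasattr(view, "append")
    with view.toreadonly() as read_only:  # comparable to passing the vector as const
        try:
            read_only[0] = 99
            write_rejected = False
        except TypeError:
            write_rejected = True

values.append(4)                          # the owner can still change its length
print(list(values), view_can_grow, write_rejected)
```

A function that takes the view can only touch the items, so its signature already tells the caller that the length is safe.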


CrashCourse – 006 – Templates and relationals

Last time I introduced the Intrinsic class together with some handy relational operator related functionality. But since Intrinsic is now the home of operations like Clamp and Max and these operations are defined by templates, without filtering the template parameters they can receive, anything can be passed to these functions. Even things that are not comparable!

So let’s see what problems this can cause and how to solve them. Let us consider a simple Version class that holds the major, minor and revision number of some product. A very simple class, not meant to be functional, just a simple example:

class Version {
	val major = 0;
	val minor = 0;
	val revision = 0;
	
	this(maj: Int, min: Int, rev: Int) {
		major = maj;
		minor = min;
		revision = rev;
	}
}

To break up the monotony of large blocks of text on the blog, I shall show a screenshot of the typical ZIDE workflow that is used when developing the samples for this blog and anything else Z2 related:

[Screenshot 006less01: the sample open in ZIDE]

A quick tour of the sample. On line 3 we define our simple Version class and on line 15 we define a class with a @main slot method to test the version class. On lines 17 and 18 we instantiate two version variables.

On line 20, we test the equality of these two variables. As described in CrashCourse – 003 – What you get for free, Z2 will figure out common tasks for you from a small pool of common tasks. Simple straightforward comparison of value types is one of these tasks. The compiler takes one look at Version and has zero problems testing the equality of two instances. You as a programmer shouldn’t have to write such easy, boring code that only compares 3 Ints. The same goes for line 21, where we test for inequality. These two lines work out of the box because aggregate value type equality is a well-defined, unambiguous operation. Such methods that are provided automatically by the compiler for you are called automatic methods. They are always provided, but you can suppress this for a class/method combination if not needed.

And this is the reason why line 22 fails to compile. On this line, we don’t use operators == or !=, but operator <. The “less” operator can’t be defined unambiguously for aggregate types and the compiler can’t resolve v1 < v2. This is what the error says. Maybe the error message could be improved though.

The solution for this compilation error is to provide a < since the compiler can’t provide one for us. In Z2, method names that start with @ are called slots and they are just regular methods, but in some contexts the compiler will call them implicitly. Like @main. The calling mechanism of slots makes them perfect candidates for defining operators in classes. This was deemed a better solution than using a keyword, like in other programming languages, together with the literal operator. It is easier to read, type and manually call. And it solves some additional problems, like with pre and post ++ operators.

The < can be defined using the @less method:

class Version {
	val major = 0;
	val minor = 0;
	val revision = 0;
	
	this(maj: Int, min: Int, rev: Int) {
		major = maj;
		minor = min;
		revision = rev;
	}
	
	def @less(const v: Version): Bool; const {
		System.Out << "call " << class << '.' << def.Name << " of class " << @neq.class << "\n";
		if (major < v.major)
			return true;
		else if (major > v.major)
			return false;
		else  {
			if (minor < v.minor)
				return true;
			else if (minor > v.minor)
				return false;
			else
				return revision < v.revision;
		}
	}
}

This updated Version class defines on line 12 the @less slot. Now, when we do v1 < v2, the @less method will be called on the v1 instance and v2 will be passed in as a parameter. This is of course just syntactic sugar since @less is just a normal method with a name that starts with @, so it can be called normally: v1 < v2 and v1.@less(v2) are equivalent and result in the same machine code. The actual implementation is not important. I used here a simple implementation off the top of my head for comparing versions and it may not be the optimal way to compare versions in production code. It is just a sample. This method could be implemented to not work as a relational less operator, even though it represents that slot. It is a very good idea not to do this, to avoid major confusion.

One strange thing about this method is line 13. This is just for this sample, to test that this method is actually called. Normally, you wouldn’t add such tests in real code. It could have been a simple System.Out << "Hey, @less has been called"; but I opted for this more complicated statement to demonstrate some reflection. Z2 has both compile and run time reflection and these features combined with templates can be quite powerful. But in this sample we only use reflection for basic debugging. First we print out class. This is equivalent to this.class and returns the compile time class information for this, a reference to the current instance. def is similar to this, but does not represent the current instance, but instead the current method. Like class.Name, def.Name will return the name of the current method. And similarly to this.class, def.class returns the class information for the current method. In Z2 everything is an instance of a class, even methods. The class of all methods is Def. But instead of printing out def.class, I printed out the class of another method: @neq.class. It is the same class and I used this in the sample both to demonstrate that the classes are the same and that automatic methods are there and present, even if it is not obvious that they are: @eq is the == operator and @neq is the != operator. So the output of this program after the fix will be:

false
true
call Version.@less of class Def
true
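The chain of comparisons inside @less is a lexicographic comparison: major decides first, then minor, then revision. The Python sketch below restates it in two forms and checks that they agree on a small grid of versions. This is an illustration in another language, not Z2 code; the Version dataclass and both function names are stand-ins written for this example.

```python
from dataclasses import dataclass

@dataclass
class Version:
    major: int = 0
    minor: int = 0
    revision: int = 0

def less_nested(a, b):
    """Same decision chain as the @less method in the sample."""
    if a.major < b.major:
        return True
    if a.major > b.major:
        return False
    if a.minor < b.minor:
        return True
    if a.minor > b.minor:
        return False
    return a.revision < b.revision

def less_tuple(a, b):
    """Tuples already compare field by field, left to right."""
    return (a.major, a.minor, a.revision) < (b.major, b.minor, b.revision)

samples = [Version(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
agree = all(less_nested(a, b) == less_tuple(a, b) for a in samples for b in samples)
print(agree, less_tuple(Version(1, 2, 7000), Version(2, 0, 1)))
```

Whichever form is used, the important property is that the result depends on the first field that differs and on nothing after it.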

With @less working, now we can try a much more complicated sample, where we use all relational operators, Min, Max, Clamp and so on:

class Test {
	def @main() {
		val v1 = Version{1, 2, 7000};
		val v2 = Version{2, 0, 1};
		
		System.Out << (v1 < v2) << "\n";
		System.Out << (v1 > v2) << "\n";
		System.Out << (v1 <= v2) << "\n";
		System.Out << (v1 >= v2) << "\n";
		
		System.Out << "Min: " << Intrinsic.Min(v1, v2) << "\n";
		System.Out << "Max: " << Intrinsic.Max(v1, v2) << "\n";
		
		val v3 = Version{1, 0, 0};
		Intrinsic.Clamp(v3, v1, v2);
		System.Out << "Clamp: " << v3 << "\n";
		
		val v4 = Intrinsic.Clamped(Version{7, 5, 6}, v1, v2);
		System.Out << "Clamped: " << v4 << "\n";
	}
}

call Version.@less of class Def
true
call Version.@less of class Def
false
call Version.@less of class Def
true
call Version.@less of class Def
false
call Version.@less of class Def
Min: 1 2 7000
call Version.@less of class Def
Max: 2 0 1
call Version.@less of class Def
Clamp: 1 2 7000
call Version.@less of class Def
call Version.@less of class Def
Clamped: 2 0 1

The interesting part of this sample starts on lines 6-9. We only defined operator <, but we can use >, <= and >=. Think of it as the compiler making up for the fact that it couldn’t provide you with a free < operator. If you have @less defined, but not @more (the > operator), the compiler can still handle > by swapping the two operands: v1 > v2 is compiled as v2 < v1. And since we provided @less, and @eq, the equality operator, is an automatic method, the compiler can use them both to provide operators <= and >=, called @lesseq and @moreeq. And once all these operators are defined manually or automatically, you can now use all the stuff from Intrinsic. The same applies when you have @more defined but not @less.

This part of the language design may be a bit confusing at first, so let me reiterate the rules:

  • @eq (==) and @neq (!=) are automatic. You get them for free, but you can of course override them and do something different. For POD types you rarely need to, but for non-POD types and types that embed pointers, you may need to provide a better implementation since the automatic one might not do what you need.
  • @less (<) and @more (>) are not automatic. You need to define at least one of them! If you define both, they are used appropriately. If you define only one, its opposite is resolved by swapping around the operands.
  • @lesseq (<=) and @moreeq (>=) are automatic only if you defined at least one of @less and @more. Once you define at least one of these two, you get all 4. You are free to override @lesseq and @moreeq as with the others, and sometimes it is worth it from a performance point of view. To do <=, the compiler may need to do both < and ==. It is sometimes possible to implement <= in a more efficient way.

So to reiterate, in order to get full relational operator coverage, you need to define either @less, @more or both!
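These rules can be tried out in Python, where a dataclass gets == for free, much like the automatic @eq. The sketch below derives the other relations from a single hand-written less in the way the rules describe: > by swapping the operands, <= and >= by combining with ==. The functions more, less_eq, more_eq and clamped are hypothetical helpers written for this illustration; they are not the Z2 compiler's or Intrinsic's implementation.

```python
from dataclasses import dataclass

@dataclass  # generates __eq__, playing the role of the automatic @eq
class Version:
    major: int = 0
    minor: int = 0
    revision: int = 0

    def less(self, other):
        """The single hand-written relation, standing in for @less."""
        return (self.major, self.minor, self.revision) < (
            other.major, other.minor, other.revision)

def more(a, b):
    return b.less(a)             # a > b resolved by swapping the operands

def less_eq(a, b):
    return a.less(b) or a == b   # < combined with ==

def more_eq(a, b):
    return b.less(a) or a == b   # swapped < combined with ==

def clamped(value, low, high):
    """Non-mutating clamp that only ever calls less."""
    if value.less(low):
        return low
    if high.less(value):
        return high
    return value

v1 = Version(1, 2, 7000)
v2 = Version(2, 0, 1)
relations = (v1.less(v2), more(v1, v2), less_eq(v1, v2), more_eq(v1, v2))
print(relations, clamped(Version(7, 5, 6), v1, v2))
```

Python's standard library offers functools.total_ordering for the same purpose: it fills in the missing comparison methods once a class defines equality and one ordering method.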

In the git repository you can find these samples and more, part of the daily unit-testing.

With this, the very basics of the core numerical types are covered. Next time I’ll extend upon this as a jumping off point to introducing vectors!

CrashCourse – 005 – Int and Intrinsic

Last time I wrote about the basics of the Z2 library using an older and shorter version of the Int class. It is an archetypal value type and behaves similarly in a lot of languages, so it is easy to understand. I described how it handles conversions and operators using intrinsic functionality, how one can use constants to allow a class to offer some basic information about its value range and showed a few methods and properties.

The design looks viable, but has a few problems. It is easy to see this once you try to expand upon the library by adding a few more basic types. Just adding a single class, like Double, the only dependency of Int in this sample, would see us repeat the same code with minor changes. Defining the constants each time makes sense, since they have different values. But how about some methods? Like GetMin and GetMax? Sure, they are short and having them copied over into each class, including third-party classes is not a big issue, but surely there must be a better method.

This is where intrinsics come in! Last time we talked about two types of intrinsics: conversion constructors and operators, both in the context of numerical types. These represent the highest level of intrinsic functionality: they just exist and are part of their respective classes without any formal element to hint at their existence. But there are more traditional ways to access intrinsic functionality, with the main one being the Intrinsic class. This is a class with only static methods, offering a wide set of common functionality. And this functionality is accessed using normal methods in a normal class, so it becomes easier to gain awareness of what is available.

Determining minimum and maximum values is an example of such functionality. Intrinsic.Min will return the minimum of the provided parameters. Instead of using:

5.GetMin(9);

…you now use the much more natural syntax of:

Intrinsic.Min(5, 9);

This approach has multiple advantages, beyond the already mentioned more natural syntax. It solves the problem of having to repeat the body of GetMin in each class. Intrinsic.Min is now a template method, so it only needs to be defined once and works with all types. Additionally, while some methods inside the Intrinsic class don’t have a visible implementation, Min does and it can be useful to see what it does. And finally, this method, and its counterpart, Max, is designed to work not on individual values, but value providers, so you will be able to pass it any combination of containers.

With this first change, we eliminated two methods not only from Int, but from all comparable value types from the library. What about Clamp? First, let us ignore Intrinsic and focus on naming conventions. During the development of the library we introduced a convention related to actions that can be applied to instances: these actions are implemented using verbs. A verb in its base form describes a mutating action, one that modifies the instance. A few examples: Add, Insert, Delete, Clamp, Sort and so on. Naturally, these methods can’t be called on const instances. Verbs using the past tense do the same thing as the base form action, but do not modify the instance, instead returning a new instance and leaving the original unchanged. The same examples: Added, Inserted, Deleted, Clamped, Sorted and so on. This is just a convention and there is no obligation for third-parties to respect it. So using this convention, in our Int class, we should have two methods. If the variable a is an Int with value 5, a.Clamp(10, 100); would modify a to be clamped to the range of 10-100, in this case making it have the value of 10, while a.Clamped(10, 100); would leave a as 5, but return 10. Additionally, a.Clamp(10, 100); and a = a.Clamped(10, 100); are equivalent. This holds as a general rule, with foo.Bar(); being equivalent to foo = foo.Bared();, but the former may or may not be more efficient, depending on what operator = does and the quality of the compiler’s optimizer.

So using this convention, our second version of Int would have two methods instead of one: Clamp and Clamped. Which leaves us with the same problem: two methods which are almost always the same, having to be copied over to a bunch of classes. Intrinsic solves this again, by having two methods, Clamp and Clamped:

class TestClamp {
	def @main() {
		val a = 5;
		Intrinsic.Clamp(a, 10, 100);
		
		System.Out << a << " " << Intrinsic.Clamped(-5, -100, -10) << "\n";
	}
}

10 -10
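The verb and past-tense convention has a close parallel in Python, where list.sort() reorders a list in place and sorted() returns a new list. The sketch below applies the same split to clamping. Gauge is a hypothetical mutable holder invented for this example, needed because Python integers cannot be modified in place.

```python
class Gauge:
    """Small mutable value holder (hypothetical, for illustration only)."""
    def __init__(self, value):
        self.value = value

    def clamp(self, low, high):
        """Base-form verb: modifies this instance and returns nothing."""
        self.value = max(low, min(self.value, high))

    def clamped(self, low, high):
        """Past-tense verb: leaves this instance alone and returns a new one."""
        return Gauge(max(low, min(self.value, high)))

a = Gauge(5)
b = a.clamped(10, 100)     # a is untouched, b holds the clamped value
before = a.value
a.clamp(10, 100)           # now a itself is modified

numbers = [3, 1, 2]
ordered = sorted(numbers)  # non-mutating: returns a new list
numbers.sort()             # mutating: reorders numbers in place
print(before, b.value, a.value, ordered, numbers)
```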

This solves the problem, but there is more to it. 0.GetMax(-1) wasn’t the most natural syntax, but a.Clamp(min, max) is. In some cases we want a class to have a “clamp” method independently from the Intrinsic class. We could just add the method to such classes, ignoring the code repetition. But there is a better method: method aliasing! In Z2, a method can be an alias for another method. Their parameters must be compatible and there are a few other requirements too which I won’t describe right now. Luckily for us, parameter compatibility includes the case where a non-static method of a class Foo is an alias of a static method from another class with N + 1 parameters, where the first parameter is of class Foo. Using method aliasing, we can add only the signature of the method to classes and let the compiler forward the call to another method, with zero performance overhead. Using this, we can add the following two methods to Int:

	def Clamp(min: Int, max: Int); Intrinsic.Clamp;
	
	def Clamped(min: Int, max: Int): Int; const Intrinsic.Clamped;

Int.Clamp(Int, Int) is now an alias for Intrinsic.Clamp(ref Int, Int, Int).

Int.Clamped(Int, Int) is now an alias for Intrinsic.Clamped(const Int, Int, Int).

I shall talk more about parameters in a future post, including how ref works, but for now it is important to understand that these are just aliases. The parameters match up, are compatible, and when you call Int.Clamp, the compiler actually generates code for a call to Intrinsic.Clamp. An alias is just a formal way to say “hey, I’d like to add a new method to an interface for some purpose which leaves the heavy lifting to someone else”. The method names do not need to be identical. They are identical here because it makes sense, but the alias name can be anything.
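The “N + 1 parameters, first one is the instance” rule has a direct counterpart in Python, where a plain function assigned inside a class body becomes a method that receives the instance as its first argument. The sketch below uses that to imitate aliasing. It is an analogy only: intrinsic_clamped and Celsius are hypothetical names, and Python resolves the call at run time, so the zero overhead claim applies to Z2, not to this code.

```python
def intrinsic_clamped(value, low, high):
    """Free function with N + 1 parameters: the value first, then the bounds."""
    if value < low:
        return low
    if high < value:
        return high
    return value

class Celsius(int):
    """Hypothetical value type that reuses the function instead of copying its body."""
    # Alias under the same name: x.clamped(lo, hi) runs intrinsic_clamped(x, lo, hi).
    clamped = intrinsic_clamped
    # The alias name is free to differ from the name of its target.
    limited = intrinsic_clamped

reading = Celsius(5)
print(reading.clamped(10, 100), reading.limited(-100, -10))
```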

Now it is time to see our second version of the Int class:

namespace sys.core.lang;

class Int {
	const Zero: Int = 0;
	const One: Int = 1;
	const Default: Int = Zero;

	const Min: Int = -2'147'483'648;
	const Max: Int = 2'147'483'647;

	const IsSigned = true;
	const IsInteger = true;

	const MaxDigitsLow = 9;
	const MaxDigitsHigh = 10;

	property Abs: Int {
		return this > 0 ? this : -this;
	}

	property Sqr: Int {
		return this * this;
	}

	property Sqrt: Int {
		return Int{Double{this}.Sqrt};
	}

	property Floor: Int {
		return this;
	}

	property Ceil: Int {
		return this;
	}

	property Round: Int {
		return this;
	}

	def Clamp(min: Int, max: Int); Intrinsic.Clamp;
	
	def Clamped(min: Int, max: Int): Int; const Intrinsic.Clamped;
	
#region Saturation

	this Saturated(value: Int) {
		this = value;
	}

	this Saturated(value: DWord) {
		this = value > DWord{Max} ? Max : Int{value};
	}

	this Saturated(value: Long) {
		if (value > Max)
			this = Max;
		else if (value < Min)
			this = Min;
		else
			this = Int{value};
	}

	this Saturated(value: QWord) {
		this = value > QWord{Max} ? Max : Int{value};
	}

	this Saturated(value: Double) {
		if (value > Max)
			this = Max;
		else if (value < Min)
			this = Min;
		else
			this = Int{value};
	}

#endregion
}

We can see the changes from version 1: GetMin and GetMax are gone, replaced with calls to Intrinsic when needed, Clamp is now an alias to Intrinsic.Clamp and we added Clamped. Additionally, a new section has been added to the class that handles saturation. Z2 as a systems programming language is designed to have rich and performant numerical processing capabilities. Things like clamping and saturation are considered common tasks and as such receive full support. Saturation is a lengthy section that will get repeated in multiple classes, but here we consider it not to be a problem since third party value types will generally not offer generic saturation support and us covering the basic numerical types is sufficient. This section is surrounded by the #region/#endregion tags, a purely syntactical construct that allows you to create logically related blocks in code as a tool to facilitate organizing.
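What the Saturated constructors do is easy to restate in Python: a value outside the 32 bit signed range is pinned to the nearest bound, anything else is converted normally. Because Python integers have no fixed width, one function covers what the Z2 class spreads over several overloads. The name saturated_int is hypothetical, and the rounding of in-range floating point input shown here is Python's behaviour, which I am not claiming for Z2.

```python
INT_MIN = -2**31
INT_MAX = 2**31 - 1

def saturated_int(value):
    """Convert an int or float to the 32 bit signed range, pinning overflow to the bounds."""
    if value > INT_MAX:
        return INT_MAX
    if value < INT_MIN:
        return INT_MIN
    # In range: plain conversion. Python truncates floats toward zero here,
    # and a NaN input would reach this line and raise ValueError.
    return int(value)

print(saturated_int(4_000_000_000), saturated_int(-1e20),
      saturated_int(2**64 - 1), saturated_int(42.9))
```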

And finally, this version 2 also has one additional change from what it could do in version 1, but this change is remarkable not by adding something, but by omitting something that was planned to be added but was ultimately not. Z2 supports bit rotation, not just bit shifting. This is supported with the Intrinsic class and some time ago, we had two aliases in Int for this:

	def GetRol(bits: DWord): Int; const Intrinsic.Rol32;

	def GetRor(bits: DWord): Int; const Intrinsic.Ror32;

During the design process it was decided that bit rotations are useful enough to be fully supported but not common enough to have an alias for them in Int, so these two aliases were eliminated from all core numerical types. If you need bit rotation, you can use Rol8/Rol16/Rol32/Rol64 and Ror8/Ror16/Ror32/Ror64 directly from Intrinsic.
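For readers unfamiliar with rotation: unlike a shift, the bits pushed out at one end come back in at the other. Below is a minimal Python sketch of a 32 bit rotate left and rotate right. The names rol32 and ror32 merely echo the Intrinsic names. As assumptions of this sketch, values are treated as unsigned 32 bit patterns (the Z2 aliases shown above return Int) and the bit count is reduced modulo 32.

```python
MASK32 = 0xFFFFFFFF

def rol32(value, bits):
    """Rotate a 32 bit pattern left; bits leaving the top re-enter at the bottom."""
    bits %= 32
    value &= MASK32
    return ((value << bits) | (value >> (32 - bits))) & MASK32

def ror32(value, bits):
    """Rotate a 32 bit pattern right; bits leaving the bottom re-enter at the top."""
    bits %= 32
    value &= MASK32
    return ((value >> bits) | (value << (32 - bits))) & MASK32

print(hex(rol32(0x12345678, 8)), hex(ror32(0x12345678, 8)))
```

Rotating left and then right by the same amount returns the original pattern, which is a quick sanity check for any implementation.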

This second version of Int, together with Double and Intrinsic, has been committed to a branch on GitHub. The main branch also has some associated UT.

Next time we’ll investigate how this generic solution for clamping and other operations works with third party classes.

CrashCourse – 004 – Building an Int

With PT8 development starting in the next few days, several parts of the project will get slowly released in different states of completion, the standard library source code being one of them. So it is the right time to describe a few parts of the standard library and how it evolved since its inception.

The numeric types are a good point to start, since they have a lot in common: understand one and you understand them all. As a standard library, one part of it may freely use other parts of it to accomplish some tasks. But let us suppose for a moment that all classes inside the library are independent and only serve to offer an API to clients of the library, without one referencing another one within the library. Then what is the minimal Int class?

namespace sys.core.lang;

class Int {
}

That’s it! If nobody expects Int to have a specific API, Z2 as a language does not impose any structure upon it. It is just a normal value class. But the combination of namespace plus name, the class sys.core.lang.Int is still special. It is a core class (not to be confused with sys.core, the two “core” terms have separate meanings; maybe we should fix this conflict of terms), meaning the CPU has a special understanding of it. Additionally, it is an arithmetic class. While all classes are value types, some, like Int are arithmetic implicitly, without them having an explicit API to make them behave like arithmetic types. Other third-party classes do need to have an API to conform to the arithmetic requirements. And this special treatment does not apply to other classes named Int from other namespaces.

As implicitly arithmetic, even though the Int class is empty, it still behaves as if it had several methods defined inside, like the ones commonly defined through operator overloading. All the commonly used operators in C-like languages work on Int instances, like +, -, *, /, <<, >>, ==, !=, <, >, <=, >=, ++, --, &, |, ^ and ~. They all behave as expected and you are not allowed to override them and change their meaning. Using these operators one can write complex expressions and, with a few exceptions, expressions involving Ints could be copied over from C or Java into Z2.

Another thing that one does with numerical values is convert from one type to another, a task commonly done with casting. As a historical note, early versions of the Z2 design had casts, but it was found that they greatly overlapped with constructors and they were eliminated. Today, Z2 has no casts and all conversions are handled through constructors. You do not cast a type to another, you construct a new instance of the appropriate type, based on another instance. This is mostly a theoretical and style based distinction, because the end result and the generated machine code are the same. As a normal class, Int has a default constructor Int{}. Conversion constructors usually have one parameter, the input value that needs to be converted. If we have a Float variable called floaty or a literal Float constant, -7.4f, we can “cast” them to Int with Int{floaty} and Int{-7.4f} respectively. And this works for all built-in numeric types, even with Bool values, like Int{true}.

As mentioned in a previous post, Z2 does not like to force you to write code that it can figure out itself or that is just boilerplate code. The standard Int class could have had like 20 operators overloaded, all of them with all the parameter combinations, totaling hundreds of methods, and additionally have all the conversion constructors. Instead, we chose to have this core functionality be available implicitly. Thus, the class is perfectly functional while empty.

And things could be left as is. The standard library could have just a bunch of numerical classes with empty bodies, offering a few expected built-in operations. But Z2 chooses to add a bit of extra functionality to such classes. Not a huge amount, we don’t want these classes to become bloated, especially since third parties can reopen these classes and add any extra functionality they might need. Today I will show a little bit of a blast from the past, the Int class as it was a few months back. Today it is almost identical, but small changes and tweaks have been made. This simpler Int class will serve as a fine introduction on how to add value to such types and in the next posts I’ll detail how the evolution of the language has led to some changes to this class.

namespace sys.core.lang;

class Int {
	const Zero: Int = 0;
	const One: Int = 1;
	const Default: Int = Zero;

	const Min: Int = -2'147'483'648;
	const Max: Int = 2'147'483'647;

	const IsSigned = true;
	const IsInteger = true;

	const MaxDigitsLow = 9;
	const MaxDigitsHigh = 10;

	property Abs: Int {
		return this > 0 ? this : -this;
	}

	property Sqr: Int {
		return this * this;
	}

	property Sqrt: Int {
		return Int{Double{this}.Sqrt};
	}

	property Floor: Int {
		return this;
	}

	property Ceil: Int {
		return this;
	}

	property Round: Int {
		return this;
	}

	def GetMin(min: Int): Int; const {
		return this >= min ? min : this;
	}

	def GetMax(max: Int): Int; const {
		return this <= max ? max : this;
	}

	def Clamp(min: Int, max: Int): Int; const {
		if (this <= min)
			return min;
		else if (this >= max)
			return max;
		else
			return this;
	}
}

This is a rather bare-bones Int class, but it still offers a lot more functionality than an empty class and also serves to show our approach to library design: using this style, the difference between language features and library features is blurred. The absolute value of -7 can be obtained with -7.Abs and it looks a bit like a language feature, but the implementation is actually part of the library. Additionally, all the numeric types are extremely similar and share a similar API, giving you the necessary feature parity in some situations, like when working with templates.

But let’s go slower. On lines 4-6, we have a few simple constants that do not seem that useful, giving you the 0, 1 and default values for the class. They are mostly here for feature parity with more complex numeric types, like multi-dimensional points.

On lines 8 and 9 we have two extremely important constants: Min and Max, giving us the minimum and maximum Int values. Adding these two constants to the class solves an old problem quite nicely: where to stick these values? In C/C++, you need to include a header to access INT_MIN and INT_MAX. The recommended header changes depending on whether you are using C or C++. These constants could be a #define, thus sharing the myriad of well documented problems of the pre-processor. If you are using C++ and doing things the C++ way, you need std::numeric_limits<int>::min() and std::numeric_limits<int>::max(). Or starting with C++11, besides min, there is also lowest. Why are there two? What is the difference between them? The answer is not self-evident and you need to google it to find any answer. This approach is better than using #defines, and Z2 could easily go this route, but it was decided that such a simple task should not be handled by templates. Does your type have a minimum value? If yes, just add a constant into it! You can use Int.Min to get the minimum value for Int and Foo.Max to get the maximum value for Foo if it has one. Or you can use existing instances, even literal constants, so the following samples are examples of perfectly legal expressions:

A + C * (C.Max / C.Max.Min);
A + C * (Int.Max / Int.Max.Min);
Int{Bool{Bool{Int{Bool{A}.Min.Max}}.Max}};
(true <= 6).Min <= (1 < 5).Max;

Please don’t write code like this!
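To make the min versus lowest question from above concrete, here is a small C++ sketch (C++, not Z2). IntLike is a made-up name for this illustration; it only mimics the idea of keeping the limits as constants inside the type itself.

```cpp
#include <climits>
#include <limits>

// Hypothetical C++ stand-in for a type that carries its own limits as constants,
// in the spirit of Int.Min and Int.Max. The name IntLike is invented for this sketch.
struct IntLike {
    static constexpr int Min = INT_MIN;
    static constexpr int Max = INT_MAX;
};

// For integers, min() and lowest() agree.
static_assert(std::numeric_limits<int>::min() == INT_MIN, "min() is the most negative int");
static_assert(std::numeric_limits<int>::lowest() == INT_MIN, "lowest() is the same for int");

// For floating point they differ: min() is the smallest positive normalized value,
// while lowest() is the most negative finite value.
static_assert(std::numeric_limits<double>::min() > 0.0, "min() is positive for double");
static_assert(std::numeric_limits<double>::lowest() == -std::numeric_limits<double>::max(),
              "lowest() is the negated max() for double");

// The class constants give the same answers without going through a template.
static_assert(IntLike::Min == std::numeric_limits<int>::min(), "constants match the limits");
static_assert(IntLike::Max == std::numeric_limits<int>::max(), "constants match the limits");
```

So for integer types the two functions are interchangeable, and the difference only shows up with floating point types.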

On line 17 we have the Abs property defined, which returns the absolute value of the instance. On line 21 we find the very simple property that returns the square of the value. This is useful as a shorthand when having to square some complex expression. Using Sqr, you don’t need to type it twice with a * between the two, minding side effects of the expression, or use a temporary variable and multiply it with itself. We find it useful and it is implemented easily inside Int, so why not have it? On line 25, we have the Sqrt property, which returns the square root of the value. This already shows interconnection of classes within the standard library: the easiest implementation of square roots on integer values is casting them to double, getting the square root and casting that result back to an integer. On lines 29, 33 and 37 we have properties that return the floor, ceiling and rounded values. For floating point values these make sense, but for integer values they don’t really, since by definition the floor of an integer is the value itself. They are included for feature parity again. As an example, you may have a template vector and run a summing lambda on it that adds together the floors of the values in the vector. This will run fine on a vector of Double, but without Floor it would fail to compile on a vector of Int. But because we added these feature parity APIs, the types are interchangeable and it is easier to write generic algorithms.
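The feature parity argument can be sketched in C++ as well (again C++, not Z2). The names floor_of and sum_of_floors are invented for this example; floor_of plays the role of the Floor property, and the int overload simply returns its argument, like Int.Floor does.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical free functions standing in for the Floor property.
// The int overload returns its argument unchanged, mirroring Int.Floor.
inline double floor_of(double v) { return std::floor(v); }
inline int floor_of(int v) { return v; }

// Generic algorithm: sums the floors of all elements.
// It works for any T that has a matching floor_of overload.
template <typename T>
T sum_of_floors(const std::vector<T>& values) {
    T total{};
    for (const T& v : values) {
        total += floor_of(v);
    }
    return total;
}

// Small self-check exercising both instantiations.
inline void check_sum_of_floors() {
    assert(sum_of_floors(std::vector<double>{1.5, 2.7, -0.5}) == 2.0);  // 1 + 2 + (-1)
    assert(sum_of_floors(std::vector<int>{1, 2, 3}) == 6);
}
```

The same generic function serves both element types, which is the whole point of giving integers a trivial floor.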

These methods are also logically grouped. We have one “block” doing one kind of task, followed by other blocks. The final block is the comparison one. Having two or more values, we often need to find the minimum and maximum of them or clamp one to a range. This is why most types in Z2, when applicable, have methods like GetMax, GetMin and Clamp. Or had, to be more precise. This is where we found that having these methods, which are almost always implemented identically, added to each class contradicts the principle of Z2 not making you write boilerplate code, and this was changed. As explained earlier, this is how numerical types were a few months back.

Next time we’ll see how we fixed this and evolved the Int class closer to its current form.

CrashCourse – 003 – What you get for free

Happy New Year!

The winter holidays are done for now and it is time to get back to work! In December things worked out as planned. Z2 PT6 was finished, but we did not do any announcements since there is no reason to announce compiler versions which are not publicly available. PT7 development has started and it will have most of the planned features, but we are diverging a bit away from the planned features for this release. We consider Z2 to be syntactically a relatively clean language considering it aims to have a feature set that is comparable to C++, but we did get the feedback that deep inside the implementation of some of the system classes, especially in containers and OS interaction, the language is not necessarily cluttered, but too complex instead. So we will try to address this in PT7, without breaking compatibility with the rest of the language of course.

But back to CrashCourse. Last time we talked about the object model, how literal constants are still instances of classes and about constants in general. Today we shall talk about instances and what the so-called “values” are.

Z2 is a value based language, like C++. It is not a reference based language, like Java. When it comes to most languages, core numerical types are often value types and in Z2, since everything is a class and everything tries to follow one set of rules, everything is a value: all class instances are values. This does not mean that you can’t use references in Z2: you can and they behave as expected. The distinction is made by the ref keyword, which introduces references. In the absence of it, entities are values. I shall use the following short C snippet to illustrate what it means to use values, since C was at least at some point so ubiquitous:

int a = 10;
int b = a;
a = 0;

If you are ever so slightly familiar with programming, that code should be pretty self-explanatory, as should “int” and the way these two variables, these two values behave. On line 1, we declare the variable “a” and assign 10 to “a”. On line 2 we declare “b” and assign to it the value of “a”. We have two separate forever independent entities here, “a” and “b”, which are stored in two different memory locations and at both memory locations you can find 10. On line 3, we assign 0 to “a”, but since “a” and “b” are independent, this does not affect “b”. This is the core principle of value types. When dealing with references, two references may refer to the same memory location and changing one variable might “change” the other too (it is not really a change, since there is just one entity accessed under two names), but this is impossible with values. In Z2 every instance of a class is a value, thus no matter how simple or complicated the class is, it behaves like “int” in the sample above.
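The contrast between a value and a reference can be put side by side in a short C++ sketch (C++ rather than Z2, since the snippet above is already C). The two function names are made up for the illustration.

```cpp
// Values: b is an independent copy of a.
constexpr int copy_then_change() {
    int a = 10;
    int b = a;   // b gets its own storage holding 10
    a = 0;       // does not affect b
    return b;
}

// References: r is another name for a, not a separate entity.
constexpr int alias_then_change() {
    int a = 10;
    int& r = a;  // r refers to the same memory location as a
    a = 0;       // visible through r as well
    return r;
}

static_assert(copy_then_change() == 10, "the copy keeps its own value");
static_assert(alias_then_change() == 0, "the reference sees the change");
```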

For simple classes, this value semantic is natural and comes for free. For more complicated classes, classes that manage some resources, you need to write code in order to impose this value semantic. Without additional code, some classes, when copied, might do a “shallow” copy and you can wind up with two separate instances that are not logically independent. As an example, think about implementing a very simple string class that has two members: a pointer to the bytes in the string and its length. Without code to handle the copying of the string, a shallow copy will leave two different string instances pointing to the same buffer. There are of course cases where you want a shallow copy, but for now we’ll consider that we want all classes to respect value semantics.

Which leads us to the distinction between classes that behave like values by definition and classes where you need to write code to ensure this behavior. The first case is called “plain old data” (POD for short). All core types are POD, as are static vectors of POD classes and classes in which all members are POD. The primary goal of a POD class is to store data in memory. The other classes are called “manager classes”: these classes often own or manage some resources and the act of managing these resources is more important and often more complex than just storing things into memory. So the primary goal of a manager class is its side effect. If at least one member of a class is not POD, the entire class is considered not POD. But still, this distinction is mostly unimportant for now and even once we hit more advanced topics, the distinction comes down to one rule: manager classes have a destructor, a copy constructor and an assignment operator. If you add at least one of these to a class, it is automatically considered non POD and you must add all 3. But otherwise, there are no distinctions and you generally don’t care about POD or not POD. Containers like Vector, which can do special optimizations for POD, do care, but as a client of such containers you do not. The introduction of POD here was probably premature, but I included it for completeness’ sake.
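The simple string example can be sketched in C++ (not Z2) to show both sides. ShallowString and OwningString are hypothetical names for this illustration: the first relies on the compiler-generated copy and ends up sharing its buffer, the second is a “manager” style class with a destructor, a copy constructor and an assignment operator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Shallow: the compiler-generated copy duplicates the pointer, not the bytes.
struct ShallowString {
    char* data;
    std::size_t length;
};

// Manager style: destructor, copy constructor and assignment operator together.
class OwningString {
public:
    explicit OwningString(const char* text)
        : length_(std::strlen(text)), data_(new char[length_ + 1]) {
        std::memcpy(data_, text, length_ + 1);
    }
    OwningString(const OwningString& other)
        : length_(other.length_), data_(new char[other.length_ + 1]) {
        std::memcpy(data_, other.data_, length_ + 1);
    }
    OwningString& operator=(const OwningString& other) {
        if (this != &other) {
            // Allocate and copy first, then release the old buffer.
            char* fresh = new char[other.length_ + 1];
            std::memcpy(fresh, other.data_, other.length_ + 1);
            delete[] data_;
            data_ = fresh;
            length_ = other.length_;
        }
        return *this;
    }
    ~OwningString() { delete[] data_; }

    char* data() { return data_; }
    std::size_t length() const { return length_; }

private:
    std::size_t length_;  // declared first so it is initialized before data_
    char* data_;
};

inline void check_copies() {
    char buffer[] = "abc";
    ShallowString s1{buffer, 3};
    ShallowString s2 = s1;          // shallow copy: both share one buffer
    s1.data[0] = 'X';
    assert(s2.data[0] == 'X');      // the change shows through the copy

    OwningString o1("abc");
    OwningString o2 = o1;           // deep copy: independent buffers
    o1.data()[0] = 'X';
    assert(o2.data()[0] == 'a');    // o2 is unaffected
}
```

Only the second class behaves like the “int” from the C snippet, and it needs all three special members to do so.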

Now it’s time for some practical examples in which I will be using a POD class. To keep things simple, I won’t be using pointers inside the POD class, even though they are valid inside a POD class. Since POD values are so simple and natural, maybe the compiler can take care of a lot of things for you? Since Z2 is also a research project, we are interested in seeing how much the compiler can give you for free while still being useful and general. Values are so straightforward that, in most cases, what you do for copying one, verifying equality or serializing it to disk is self-evident. Why should the programmer have to write this code? How about having to write code only when the general solution is not good enough? So let’s see the class we shall be using:

class Point {
	val x = 0;
	val y = 0;
	val z = 0;
}

This is an incredibly simple 3D Point class. You should never have to write such a class in real programming situations, since the standard Z2 library comes with geometric types, but as a didactic example it will do just fine. For numerical types, we know we can get access to instances using literal constants, but how do we create a new instance of Point?

class Test {
	def @main() {
		Point{};
	}
}

By using the “Foo{}” syntax. This creates a new instance of Foo, Point in our case. The “{}” syntax was selected to not conflict with the function calling syntax of “()”. When you see Foo{} you immediately know that is a constructor and when you see Foo() you immediately know that it is a function call. A “box” is created somewhere, probably in memory and a constructor is called using this syntax. In this case, a memory location large enough to hold a Point instance is reserved on the stack and the Point constructor is called upon it. The execution of a constructor is the only supported way of getting a new instance of a class. For numerical classes one can logically assume that each literal constant is the result of a call to a constructor, but this is just a logical abstraction. You can always call the constructor of core numerical types, so Int{} is absolutely identical to the literal constant 0, and DWord{7} is identical to 7u.

The next question: where did the constructor come from? Well, this is one of the first things the compiler offers for free: default constructors. In Z2 there is no such thing as an implicitly uninitialized variable/instance/value. Everything is initialized and every new instance is the result of a constructor. Z2 is a systems programming language, so you can explicitly have a non-initialized instance using a special syntax, but that is an advanced topic that is rarely needed in practice. So everything is initialized by a default constructor and that constructor is provided by the compiler. You can of course write your own constructor, but Z2 discourages the writing of constructors that do the default initialization logic. If the compiler provided constructor and your own do the same things, why write one? You can also write constructors that take parameters and Z2 supports named constructors. And when writing these constructors, again the compiler will help you with initialization, so these constructors should only have code that differs from the default constructor. And you can disable the default constructor for a class if you want it to only be constructible using parameters or a named constructor.

After the execution of the Point constructor, the instance is valid and usable, so things like Point{}.x are readable. But how long is the instance valid? Until the destructor is called. The destructor is again generated by the compiler for you. The destructor will be called in most cases at the end of the statement, but the compiler might delay the execution a bit. Still it is guaranteed to be called before the end of the block. So in most cases, by the time execution hits the “;” before the end of line 3 the destructor is called. This is why I wanted to introduce the concept of POD: for POD types the destructor is guaranteed to be a “non-operation” (a NOP). Logically we still consider that the destructor was executed, but the compiler generates zero instructions for a destructor with POD types. It does nothing. Still, the instance is no longer accessible. If we want to make the instance available after the end of the statement, we need to bind it to a name using the “val” keyword:

class Test {
	def @main() {
		val p = Point{};
	}
}

This new snippet is almost identical to the previous one: a “box” is still reserved for a new Point instance and the constructor is called. But this time, the name “p” is bound to this instance and the execution of the destructor is delayed to the end of the block. Thus we have created a local variable called “p” that can be used to read or write into our instance and is scope-bound, meaning it will be valid from the point of its declaration to the end of the block. The keyword is called “val”, not “var” as encountered in many other languages, though it is functionally identical. “val” is short for “value”, contrasting with the other keyword that allows you to bind a name, “ref”, short for “reference”, which is used for references.

The same “val” keyword is used when declaring the Point class. The variables x, y and z are scope bound. Since they are inside the body of the class, the class itself is the scope. This means that the 3 variables are constructed when a Point is constructed and destructed when a Point is destructed. I mentioned before that Int{} and 0 are identical, so “val x = 0;” is identical to “val x = Int{};”. I prefer the first version since it is shorter and more natural to people coming from other programming languages.

But free constructors and destructors are not such a big deal. C++ is doing this right now! Let’s see what else we get for free looking at the full sample and its output:

class Point {
	val x = 0;
	val y = 0;
	val z = 0;
}

class FreeStuffTest {
	def @main() {
		val first = Point{};
		
		val second = Point{};
		second.x = 1;
		second.y = 10;
		second.z = 100;
		
		val third = Point{} {
			x = 1;
			y = 10;
			z = 100;
		};
		
		if (first == second)
			System.Out << "first is equal to second\n";
		else
			System.Out << "first is NOT equal to second\n";
		
		if (second != third)
			System.Out << "second is NOT equal to third\n";
		else
			System.Out << "second is equal to third\n";
		
		System.Out << "first: " << first << "\n";
		System.Out << "second: " << second << "\n";
		System.Out << "third: " << third << "\n";
	}
}

first is NOT equal to second
second is equal to third
first: 0 0 0
second: 1 10 100
third: 1 10 100

On line 9 we declare first, a default constructed Point. All its members will be 0. On lines 11-14 we create a second variable, called “second”. Not happy with its default values, we initialize them to 1, 10 and 100, in order. Don’t worry about the multiple initializations, first by the constructor, then by the statements. The back-end compiler should take care of them. This is not a good place to use the constructor bypassing method I mentioned before. But the initialization is a bit verbose, so on lines 16-20 we initialize a third variable, called “third”, with the same values, but using a shorter syntax, available only immediately after a constructor.

Next, we get a taste of some other compiler provided features on lines 22-30: default equality checks. The compiler will automatically take care of == and != checks, using member-wise == and logical and for == and member-wise != and logical or for !=. Their purpose is to model value equality. This implementation covers most cases, and when the default is not good enough, all you need to do is provide your own implementations. If you only implement ==, you get != for free as a negation of == and the other way around. And you can implement both if you think you can write a more optimized logical expression. Default equality checks combined with standard library implementations means that you have a wide set of testable entities. Integers, strings, colors, hashmaps, hashmaps of hashmaps and so on are all testable. Other comparisons like < and > are not provided by default by the compiler, but the standard library covers this when appropriate.
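For comparison, here is roughly what those defaults amount to when written by hand, as a C++ sketch (C++14 or later, not Z2). The Point struct mirrors the one from the sample; the two operators are my own hand-written equivalents of the described behavior, not compiler output.

```cpp
struct Point {
    int x = 0;
    int y = 0;
    int z = 0;
};

// Member-wise == joined with logical and.
constexpr bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// != written as the negation of ==.
constexpr bool operator!=(const Point& a, const Point& b) {
    return !(a == b);
}

static_assert(Point{} != Point{1, 10, 100}, "first is NOT equal to second");
static_assert(Point{1, 10, 100} == Point{1, 10, 100}, "second is equal to third");
```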

Finally, on lines 32-34 we see another big feature of the compiler capabilities: marshaling! The variables of class Point can be written to a stream without you having to implement this. This is not a case of the compiler generating a call to some toString() method and printing that string. General “toString” support is available though if needed, and yes, the compiler will generate that for you, but this is a case of the compiler generating marshaling code for Point instances. The default implementation uses a member-wise approach, marshaling each member to the stream. If this default implementation is not good enough, again, you can write your own where you can do just about anything. This marshaling solution is provided by a combination of compiler features and library support. Initially you have support for text streams and binary serialization, though the “sys.xml” and “sys.json” packages, when added to your compilation, provide automatic support for “xmlizing” or “jsonizing” most user classes. You can basically take any combination of classes and marshal them to a valid destination using statements about as complex as the lines above.
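In C++ this member-wise text output has to be written by hand. The sketch below (C++14 or later, not Z2) shows one way to get the same “1 10 100” shape as the output above. The helper to_text is a made-up name that exists only to make the result easy to check.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Point {
    int x = 0;
    int y = 0;
    int z = 0;
};

// Member-wise text output, space separated, matching the "1 10 100" shape.
inline std::ostream& operator<<(std::ostream& out, const Point& p) {
    return out << p.x << ' ' << p.y << ' ' << p.z;
}

// Helper for this sketch: renders a Point into a string via the operator above.
inline std::string to_text(const Point& p) {
    std::ostringstream out;
    out << p;
    return out.str();
}

// Small self-check against the values used in the sample.
inline void check_output() {
    assert(to_text(Point{}) == "0 0 0");
    assert(to_text(Point{1, 10, 100}) == "1 10 100");
}
```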

When you have some technical specification where the binary layout of serialization is a requirement, you’ll want to implement your own compliant methods. But when only wanting to get the data to disk, the default marshaling solution is designed to be sufficient.

CrashCourse – 002 – The object model

Today I am going to talk about the Z2 object model. I’m afraid for the second post in a row, I will be forced to move fairly quickly and not have time to fully explain all the concepts. Hopefully, starting with post 3 in this series, I can slow down a bit.

Last time I showed the “hello world” snippet and introduced the concept of pure OOP languages: everything you manipulate is an object, a.k.a. an instance of a class. In that sample we printed a literal string constant to the STDOUT and it too must then be an object. Objects have members, so we might try and use some of them. The standard Z2 library uses the convention that if something can be directly counted in a straight-forward manner, it will have two members, called Length and Capacity. These two members must be at least immutable, but some countable classes, like vectors, can have them mutable.

I shall attach a cropped screenshot now of a sample that uses these two members, together with the compilation and execution result from ZIDE, but for the rest of the post I’ll use inline source code (to allow selection and preserve space):

[Screenshot: ZIDE showing the sample together with its compilation and execution result]

As expected, the “Hello world!\n” string literal is a vector-like object with countable elements, having a Length of 13. It also has a Capacity of 13. The difference between Capacity and Length signals how much extra storage beyond Length is in use as a buffer available for future growth, and Capacity is always greater than or equal to Length. In this case it is equal, but do not be surprised when running this sample to find it greater, probably rounded to 16. The class that is in use here is String and it is UTF8.

Note
In Z2 strings are not meaningfully “null terminated”. The String class and friends make sure that there is a ‘\0’ (null) character at the end of the string, at index = Length, outside the valid index range, but the String class and the entire library does not care about that character. Strings have proper lengths and a String made out of 50 ‘\0’ characters is a legitimate String with Length 50. Reading or writing to the ‘\0’ character is a run-time error when done in user code. The only reason the ‘\0’ character is appended automatically is to make possible the passing of Strings to different APIs, which more often than not have traditionally expected null terminated strings. If you create a String for the purpose of passing it to such APIs, it might be a good idea to not store ‘\0’ characters in the middle of it.

The design choice of still adding a token ‘\0’ character to the end of a string does have consequences though. It makes having String slices extremely difficult, if not impossible.

Hold on a minute! The string is an object, but so is everything! Does that mean that String.Length is an object too?

class Hello {
	def @main() {
		System.Out << "The class of Length is: " << "Hello world!\n".Length.class << "\n";
	}
}

The class of Length is: PtrSize

Yes! The class of Length is PtrSize. In Z2 there are classes for signed and unsigned integers. Like Int and DWord. Here the systems programming language nature of Z2 is exposed a bit. Why isn’t it a signed or unsigned integer and instead it is a PtrSize? When counting random stuff, you should use the appropriate type required by the problem you are trying to solve. When counting eggs in a basket, you may use a DWord to store unsigned values. Or you may use Int out of convenience or maybe you want to use negative values to store stolen eggs. Or maybe even floating point numbers if those eggs are fancy. But when counting, offsetting or indexing into heap, and memory in general, you must use PtrSize. It is an unsigned integer large enough to be used for addressing heap size on your platform. PtrSize is almost always associated with traversing containers, so we can safely ignore it for now and focus on the bread and butter integer classes.

Like Int! Int is always signed and currently on all supported platforms it is a 32 bit value. Literal constants like 0, 1, -1, -55566847, 0xFF, 0b101, -2'147'483'648 and 2'147'483'647 are all instances of the Int class. The syntax of an integer literal constant in Z2 is an optional sign prefix (+ or -), an optional base prefix (0x, 0o or 0b), at least one digit fitting said base and an optional suffix. The optional suffix generally tells you the actual class of the constant. No suffix always means Int. The ‘u’ suffix always means DWord, the 32 bit unsigned integer. So 0 is an Int, 0u is a DWord. The ' character is used to optionally separate thousands, but it can be placed anywhere (except before the first digit). So 10000, 10'000 and 1000'0 are all the same constant. Using ' is pure syntactic sugar.
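C++14 happens to have very similar literal rules, so the points above can be checked with a few compile-time assertions. This is C++, not Z2: C++ has no 0o prefix (octal uses a leading 0), and the last assertion assumes a platform where int is 32 bits wide.

```cpp
#include <type_traits>

// Digit separators (C++14) may sit between any digits and do not change the value.
static_assert(10000 == 10'000, "separator at the thousands position");
static_assert(10'000 == 1000'0, "separator at an arbitrary position");

// Base prefixes: hexadecimal and binary. C++ writes octal with a leading 0, not 0o.
static_assert(0xFF == 255, "hexadecimal");
static_assert(0b101 == 5, "binary");
static_assert(017 == 15, "octal, C++ style");

// The u suffix selects an unsigned type; no suffix means int when the value fits.
static_assert(std::is_same<decltype(0), int>::value, "0 is an int");
static_assert(std::is_same<decltype(0u), unsigned int>::value, "0u is unsigned");
// Assumes a 32 bit int, which is the case on common platforms.
static_assert(std::is_same<decltype(2'147'483'647), int>::value, "largest 32 bit int");
```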

So far so good. But performance purists might frown upon core numerical types being classes. The Int class is a normal class with source code of several KiB found in the standard library. It has static and non-static members. Isn’t this slow? Especially for a system programming language?

No! Int is not a class that boxes or unboxes some hidden more fundamental type (and there is no automatic boxing in Z2). It may have a lot of members and it looks and behaves like a normal user class, but this is just syntactic sugar and compiler technology. Behind the scenes, all Int instances are “plain ints”, using the C/C++ definition of “int”. There is native support for manipulating them with the hardware. Int instances can be loaded into CPU registers and manipulated in assembly code directly. It is a strict requirement for all compliant Z2 compilers to have the same performance when using all core numerical classes as the equivalent optimized C code. There are even benchmarks in place making sure of that.

This design ensures that there is 0% overhead to having Int be a class rather than some intrinsic keyword introduced type, but the strict performance requirements do mean that the Int class, together with all other core numeric classes, has some limitations that other user classes do not have. These classes can’t have fields for starters. An Int is an Int, with a fixed hardware imposed structure. If you stick something inside it, like another Int, it will no longer be an Int. Static fields are permitted. Another limitation is that these classes can’t have virtual methods. There is a workaround for this in some specific situations, but that is a fairly advanced topic. But everything else goes. You can add as many symbolic constants, properties or methods to Int as you like (static or not, both are allowed). These classes (and all classes in general) can be reopened, so third party libraries might add extra functionality to them.

I described Int and DWord as core classes. Core classes are a bit special because they have those hardware and performance required limitations. There is just a small number of core classes that I won’t list for now, but we already saw Int, DWord and PtrSize. They all map directly to some native hardware resource. String is not a core class, since current CPUs do not have some atomic intrinsic understanding of strings. String is a non-core class, being able to benefit from all the features of the language. But it is still a “system” class. System classes are normal classes that are part of the system package, meaning they are available on all supported platforms. “Hello”, the class used in the snippets above, is not a system class, since we wrote it from scratch and it was not available before that.

Classes are introduced by the “class” keyword. The following snippet introduces 3 new classes into the default namespace:

class Foo {
}

class Bar {
}

class Baz {
}

This is the minimum syntax required to define a valid class. Class members use a “block model”, meaning that in the block defined by { and } you must include the class members. All classes must have at least one block, but can have an arbitrary number of blocks. The one required block is called the default block. A block also imposes access rights upon the members declared inside. The default block is public, so anything declared inside will be fully visible to everyone. Standard OOP access rights apply to Z2, so when designing classes, one would generally use a mix of public, private or protected blocks.

So let’s add some members to the classes. I have only just introduced the concept of literal numerical constants and last post I introduced the @main method, so that is all I shall be using today to give a final but more complicated example:

class Foo {
	const AA = 7;
	
	def @main() {
		System.Out << AA << "\n";
		System.Out << Foo.AA << "\n";
		
		System.Out << Bar.AA << "\n";
		System.Out << Bar.BB << "\n";
		
		System.Out << Baz.AA << "\n";
	}
}

class Bar {
	const AA = Baz.AA + 100;
	const BB = 1000;
}

class Baz {
	const AA = 99;
}

Here is the output of the program:

7
7
199
1000
99

Literal constants are useful and convenient, but sometimes you want to define a symbolic constant. Line 2 does just that. 7 is our literal constant and we could use that, but instead, using the “const” keyword, we bind the name “AA” to 7, thus creating a symbolic constant with a class of Int. On line 5, we print the constant using its name, rather than the literal. The constant we defined inside the class Foo can be accessed directly since the @main method is in the same class. But if we wish, we can fully qualify the constant using the class.member syntax, like we did on line 6. AA and Foo.AA are identical and refer to the same symbolic constant.

Things change a little when on line 8, we try to print a constant from the class Bar. @main is in Foo so we can’t refer to Bar.AA without fully qualifying it. This line also shows that Foo.AA and Bar.AA are two different entities, even though they have the same names. So is Baz.AA. Names are locally unique within a class only.

One thing you may find strange when coming from C++ is that on lines 8 and 9 we refer to entities we declare on lines that come after. Not only is Foo referring to Bar.AA, but Bar.AA is dependent on Baz.AA, which in turn is declared even later in the source code. C++ uses a rather archaic declaration model. One can learn to deal with its limitations, but juggling around hundreds of include files in a large project is never as easy as it should be. In Z2 you have no such concerns. The compiler is just a bunch of algorithms running on powerful enough hardware relative to the tasks it is trying to accomplish. It has no problems with declaration orders and dependencies. The compiler sees all and is all-knowing. If you let it! What we do is artificially and willingly limit its ability to see. When dealing with classes in the same module, we use access rights like private and protected to hide members from classes. When in multi-module situations, we do the same to hide full classes.
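To see what the C++ declaration model asks for, here is the same trio of constants as a C++ sketch (C++, not Z2, with structs standing in for the classes). A class has to be fully defined before its static members can be used, so the definitions must appear in dependency order.

```cpp
// In C++ the definitions have to appear in dependency order: Baz, then Bar, then Foo.
// Swapping Baz and Bar would make Baz::AA unknown at the point where Bar needs it.
struct Baz {
    static constexpr int AA = 99;
};

struct Bar {
    static constexpr int AA = Baz::AA + 100;  // depends on Baz being defined already
    static constexpr int BB = 1000;
};

struct Foo {
    static constexpr int AA = 7;
};

// Same values as the program output above.
static_assert(Foo::AA == 7, "Foo.AA");
static_assert(Bar::AA == 199, "Bar.AA");
static_assert(Bar::BB == 1000, "Bar.BB");
static_assert(Baz::AA == 99, "Baz.AA");
```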

And while the compiler is a machine, we as programmers are not. We certainly benefit from having a sensible and maintainable project structure, so while the compiler allows you to structure modules in the most difficult to use and counter-intuitive way possible if you really wish to, it is good design to use natural and intuitive structures and declaration orders. The sample above becomes slightly more readable if we change the declaration order to Baz, Bar and Foo. It also becomes more readable if we give more meaningful names to constants.

CrashCourse – 001 – Hello World!

In this series I shall be exploring the capabilities of Z2, giving a hopefully comprehensive tour of the language features and design implications, without going into the full depth of these features. Full details are reserved for documentation and the specification documents, while this series is a more casual approach to presenting the language to a new audience. In fact, in the first few parts of the series I will need to go so fast that some concepts warranting multiple posts will be introduced in only a phrase or two, leaving further explanations for later. And during the first few posts, one might have difficulty clearly identifying what Z2 is all about and what new things it brings to the table. Z2 is not about introducing radical new ways to solve real world problems. It is also not about a few large “killer” features that can be easily summarized on a single PowerPoint slide. It is more about systematically improving upon existing features and programming styles in subtle ways. It is about the consistency of applying these tweaks to existing patterns and seeing what it all adds up to.

Z2 has a lot in common with C++. It tries to solve the same problems using a similar style, but it is an attempt at a more sanitized language, where every single design element is the result of years of real world experience with C++ and other programming languages, solving real world problems. As a consequence, it is a language designed to be practical, emphasizing features that are considered useful and de-emphasizing, and sometimes downright eliminating, undesirable features. Z2 is still a complicated language, but it is complicated because it has a lot of features, not because those features have complicated rules full of exceptions and idiosyncrasies or interact poorly with each other. Z2 tries to have simple and consistent rules, with a focus on avoiding ambiguity: it generally has one syntax to achieve a goal and tries to minimize exceptions and caveats to rules.

Now that this brief introduction is done with, it is time to start with the traditional “hello world” snippet:

class Hello {
	def @main() {
		System.Out << "Hello world!\n";
	}
}

The syntax highlighting does not fully understand Z2, but at least the text can be selected.

Note
The problem with embedding Z2 code into posts is that WordPress’ syntax highlighters do not support it. The Z2 compiler can actually output properly syntax highlighted HTML code, but editing documents with huge embedded HTML snippets is kind of cumbersome, so for now I shall be using the default WordPress code highlighter.

In this particular sample, keywords and some identifiers are not properly highlighted, and neither is the String constant. When editing code we use an IDE that supports Z2, called ZIDE. For comparison I will now show the same snippet as it is displayed in ZIDE:

[Image 01_01: the “hello world” snippet as displayed in ZIDE]

The “hello world” sample is always a very simple one, used as a quick introduction to the language features. We can use it to make some basic assumptions about the language.

The “class” keyword introduces a class. Z2 is a pure object-oriented language, meaning that all entities one manipulates are objects, instances of a class. Classes can belong to namespaces, but for this sample I am not using a namespace. Z2 permits the declaration of classes outside of an explicit namespace and such classes are part of the implicit/default namespace. Declaring classes in the implicit namespace is not recommended because it can lead to class name clashes. Inside a given namespace, all class names must be unique.

The “def” keyword introduces a member function. The function’s name is “@main”. The ‘@’ character can only be used as the first character of an identifier and acts pretty much like the underscore character (‘_’) found in most other programming languages. Members that start with ‘@’ are normal class members, with no other special conditions and requirements, but the compiler can use them to achieve several goals. If the name and signature of such a member matches a certain goal, the compiler can and will use it. Operator overloading and out of the box marshaling of user classes are two examples of the compiler automatically using such methods. The official specification (once it is done) will of course include the full list of such members that are used for special purposes, and third-party libraries can add new items to the list. Existing classes are always available to be “reopened”, so one can retroactively enhance the standard library with features using a third-party library. Using this approach it is possible to blur the line between a language feature and a library feature.

When Z2 compiles a file or a package, it needs to start the execution of the program somewhere: the main method. Since the language already has a mechanism of denoting members as having an additional use, the main method is called “@main”, not “main”. It was not an easy choice to change this, since traditionally this method has been called “main” for decades now. In the end our commitment to not having unnecessary exceptions to rules in the language won and we changed it to “@main”. Imagine the scenario of someone teaching Z2 to somebody else: “Members starting with ‘@’ can be used as special meaning members by the compiler”. “The compiler gives special meaning to one of the functions in a class and uses it as a startup point of the execution”. “This special function is called “main” (not “@main”)”. Contradiction!

Additionally, packages can contain multiple classes of course, and multiple classes might have a “@main” method. If there is only one “@main”, that method is always picked. If there are multiple, you can specify the desired class to be used as a startup class as a command line parameter when using the command line compiler or using ZIDE (which simply forwards the option to the command line compiler). Project files can store this option so you do not need to specify it at each compilation. If the option is not specified in the command line for the compiler or it is not part of the project file, when running the executable, it will prompt you to choose the class you wish to use as startup. This prompt happens logically before the startup point, at a point in time when the classes in your project can be considered to not exist yet, so there is no risk of programs with multiple startup points behaving differently once an option is chosen at run time versus programs with an explicit startup point chosen at compile time. But the executable size will be bigger since multiple classes must be compiled and included into the executable.

Using the mechanism described above, a method called “@main” from a class (“Hello” in this case) will be used as a startup point. This means that a new instance of said class is created using a constructor and the “@main” method is called. After the execution of the method, a destructor is called. Depending on compilation flags, several diagnostic steps can be run, especially in “debug” mode builds. A memory leak warning is turned on by default in all builds. Finally the program exits. The use of a constructor and destructor means that the startup class must have a public default constructor and a destructor. Z2 is not in the habit of making you write code it can figure out on its own, so you do not need to write a constructor or a destructor. In this case the code generated for both is of course literally nonexistent. There are zero assembly instructions required to construct or destroy such a simple class as “Hello”.
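The startup sequence described above (construct an instance, call “@main”, destroy the instance) can be pictured with a small C++ analogue. This is only a sketch of the order of events, not what the Z2 compiler actually generates: `startup_log`, `main_method` and `run_startup` are hypothetical names introduced for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical log, used only to make the order of events observable.
std::vector<std::string> startup_log;

class Hello {
public:
    Hello() { startup_log.push_back("constructor"); }   // public default constructor
    ~Hello() { startup_log.push_back("destructor"); }

    // Stands in for the @main method of the Z2 class.
    void main_method() { startup_log.push_back("@main"); }
};

// Mirrors the described startup: construct, call the entry method, destroy.
void run_startup() {
    Hello instance;          // 1. a new instance is created
    instance.main_method();  // 2. the entry method is called
}                            // 3. the destructor runs when the instance goes out of scope
```

After a call to `run_startup()`, the log holds the three steps in the order constructor, @main, destructor.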

One thing that is not apparent from this sample is another set of the less than ideal features of C++ that Z2 tries to fix: the lack of module support, the need for separate declarations and definitions, the need for forward declarations and finally the need for the dreaded pre-processor, all of these just so that the compiler can find the definitions it needs. Z2 has strong module support, with modules in either text or binary form. There is no such concept as a forward declaration and there are no problems with finding declarations. Z2 uses the following principle: if a person using formal dependency analysis with a pen and paper in hand can reach the conclusion that the definition of a given entity should be visible to another entity, so can the compiler, so it just works! You don’t need to take special care, use special orders and generally spend any time babysitting “include” files from various sources, as one does in C/C++. Z2 has no pre-processor that is completely unaware of the rules of the language, but there is conditional compilation if needed.
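To make the C++ side of this comparison concrete, here is a minimal sketch of two classes that refer to each other. The classes `Ping` and `Pong` and the method `bounce` are hypothetical and exist only for illustration. In C++ this requires a forward declaration and a split between declaration and definition, which is the bookkeeping the text says Z2 makes unnecessary.

```cpp
class Pong;  // forward declaration: Ping mentions Pong before Pong is defined

class Ping {
public:
    // Declaration only; the body needs the full definition of Pong.
    int bounce(const Pong& other, int remaining) const;
};

class Pong {
public:
    // Ping is complete here, so the body can be written in place.
    int bounce(const Ping& other, int remaining) const {
        return remaining <= 0 ? 0 : 1 + other.bounce(*this, remaining - 1);
    }
};

// Definition placed after Pong is complete.
int Ping::bounce(const Pong& other, int remaining) const {
    return remaining <= 0 ? 0 : 1 + other.bounce(*this, remaining - 1);
}
```

Removing the first line, or moving the body of `Ping::bounce` into the class, makes the C++ compiler reject the code, even though a reader can see at a glance what is meant.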

Finally, we reach the statement that prints “Hello world!\n” to STDOUT and we see that some small parts of the standard library are available in all source files without the need to have their definition “imported” using the “using” keyword. The list of default classes that are visible is fairly short and mostly includes only the System class and the classes for fundamental types, like Int, Float, Vector, String and so on.

This concludes the first episode of the CrashCourse! I am sorry I had to use such a drive-by approach to introducing concepts, but there is much to say as an introduction and I would like to keep each post reasonably short. I mentioned in passing the whole “pure OOP” design decision, so next time I shall talk more about the object model Z2 uses and detail some implications of this design.