CrashCourse – 003 – What you get for free

Happy New Year!

The winter holidays are done for now and it is time to get back to work! In December things worked out as planned. Z2 PT6 was finished, but we did not make any announcements, since there is no reason to announce compiler versions which are not publicly available. PT7 development has started and it will have most of the planned features, but we are diverging a bit from the plan for this release. We consider Z2 to be syntactically a relatively clean language, considering it aims to have a feature set that is comparable to C++. But we did get the feedback that deep inside the implementation of some of the system classes, especially in containers and OS interaction, the language is not necessarily cluttered, but too complex. So we will try to address this in PT7, without breaking compatibility with the rest of the language of course.

But back to CrashCourse. Last time we talked about the object model, how literal constants are still instances of classes and about constants in general. Today we shall talk about instances and about what the so-called “values” are.

Z2 is a value-based language, like C++. It is not a reference-based language, like Java. In most languages, core numerical types are value types. In Z2, since everything is a class and everything tries to follow one set of rules, everything is a value: all class instances are values. This does not mean that you can’t use references in Z2: you can and they behave as expected. The distinction is made by the ref keyword, which introduces references. In the absence of it, entities are values. I shall use the following short C snippet to illustrate what it means to use values, since C was, at least at some point, so ubiquitous:

int a = 10;
int b = a;
a = 0;

If you are ever so slightly familiar with programming, that code should be pretty self-explanatory, as should “int” and the way these two variables, these two values, behave. On line 1, we declare the variable “a” and assign 10 to “a”. On line 2 we declare “b” and assign to it the value of “a”. We have two separate, forever independent entities here, “a” and “b”, which are stored in two different memory locations, and at both memory locations you can find 10. On line 3, we assign 0 to “a”, but since “a” and “b” are independent, this does not affect “b”. This is the core principle of value types. When dealing with references, two references may refer to the same memory location and changing one variable might “change” the other too (it is not really a change, since there is just one entity accessed under two names), but this is impossible with values. In Z2 every instance of a class is a value, thus no matter how simple or complicated the class is, it behaves like “int” in the sample above.
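To make the contrast with references concrete, here is a small sketch in C++ (not Z2, and the function names are made up for this post). The first function repeats the three lines above, the second does the same steps through a reference:

```cpp
#include <cassert>

// Value semantics: b is an independent copy of a.
int copy_then_modify_value() {
    int a = 10;
    int b = a;        // b gets its own storage, holding 10
    assert(&a != &b); // two distinct memory locations
    a = 0;            // only a changes
    return b;         // b still holds 10
}

// Reference semantics: r is a second name for a's storage.
int copy_then_modify_reference() {
    int a = 10;
    int& r = a;       // no new storage, r aliases a
    assert(&r == &a); // one memory location, two names
    a = 0;            // visible through r as well
    return r;         // r now reads 0
}
```

The value version returns 10 and the reference version returns 0, which is exactly the difference described above.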

For simple classes, this value semantic is natural and comes for free. For more complicated classes, classes that manage some resources, you need to write code in order to impose this value semantic. Without additional code, some classes, when copied, might do a “shallow” copy and you can wind up with two separate instances that are not logically independent. As an example, think about implementing a very simple string class that has two members: a pointer to the bytes in the string and its length. Without code to handle the copying of the string, a shallow copy will leave two different string instances pointing to the same buffer. There are of course cases where you want a shallow copy, but for now we’ll consider that we want all classes to respect value semantics.

This leads us to the distinction between classes that behave like values by definition and classes where you need to write code to ensure this behavior. The first case is called “plain old data” (POD for short). All core types are POD. Classes in which all members are POD, static vectors and static vectors of POD classes are also cases of working with POD. The primary goal of a POD class is to store data in memory. The other classes are called “manager classes”: these classes often own or manage some resources, and the act of managing these resources is more important and often more complex than just storing things in memory. So the primary goal of a manager class is its side effect. If at least one member of a class is not POD, the entire class is considered not POD.

Still, this distinction is mostly unimportant for now, and even once we hit more advanced topics it comes down to one rule: manager classes have a destructor, a copy constructor and an assignment operator. If you add at least one of these to a class, it is automatically considered non-POD. Otherwise there are no distinctions and you generally don’t care about POD or not POD. Containers like Vector, which can do special optimizations for POD, do care, but as a client of such containers you do not. The introduction of POD here was probably premature, but I included it for completeness’ sake.
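The string example can be sketched in C++ (again not Z2). SimpleString is a hypothetical class invented for this illustration, and std::is_trivially_copyable is only the closest C++ notion I can point to, not Z2's definition of POD:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <utility>

// Plain data: copying the members one by one is already a correct copy.
struct Point {
    int x = 0;
    int y = 0;
    int z = 0;
};

// Hypothetical minimal string: a pointer to the bytes plus a length.
class SimpleString {
public:
    explicit SimpleString(const char* text)
        : length_(std::strlen(text)), data_(new char[length_ + 1]) {
        std::memcpy(data_, text, length_ + 1);
    }

    // Copy constructor: allocate a fresh buffer so the copy owns its own bytes.
    SimpleString(const SimpleString& other)
        : length_(other.length_), data_(new char[other.length_ + 1]) {
        std::memcpy(data_, other.data_, length_ + 1);
    }

    // Assignment operator: build a copy, then swap it in (safe for self-assignment).
    SimpleString& operator=(const SimpleString& other) {
        SimpleString copy(other);
        std::swap(length_, copy.length_);
        std::swap(data_, copy.data_);
        return *this;
    }

    // Destructor: release the buffer this instance owns.
    ~SimpleString() { delete[] data_; }

    char& at(std::size_t index) {
        assert(index < length_);
        return data_[index];
    }
    const char* c_str() const { return data_; }

private:
    std::size_t length_;  // declared first, so it is initialized before data_
    char* data_;
};

static_assert(std::is_trivially_copyable<Point>::value, "Point is plain data");
static_assert(!std::is_trivially_copyable<SimpleString>::value,
              "SimpleString manages a resource");

// Changing the copy must not change the original.
bool copies_are_independent() {
    SimpleString a("hello");
    SimpleString b = a;
    b.at(0) = 'j';
    return std::strcmp(a.c_str(), "hello") == 0 &&
           std::strcmp(b.c_str(), "jello") == 0;
}
```

Remove the copy constructor and the assignment operator and the compiler falls back to copying the pointer, which is the shallow copy described above (and, with this destructor, a double delete).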

Now it’s time for some practical examples in which I will be using a POD class. To keep things simple, I won’t be using pointers inside the POD class, even though they are valid inside a POD class. Since POD values are so simple and natural, maybe the compiler can take care of a lot of things for you? Since Z2 is also a research project, we are interested in seeing how much the compiler can give you for free while still being useful and general. Values are so straightforward that in most cases what you need to do for copying one, verifying equality or serializing it to disk is self-evident. Why should the programmer have to write this code? How about having to write code only when the general solution is not good enough? So let’s see the class we shall be using:

class Point {
	val x = 0;
	val y = 0;
	val z = 0;
}

This is an incredibly simple 3D Point class. You should never have to write such a class in real programming situations, since the standard Z2 library comes with geometric types, but as a didactic example it will do just fine. For numerical types, we know we can get access to instances using literal constants, but how do we create a new instance of Point?

class Test {
	def @main() {
		Point{};
	}
}

By using the “Foo{}” syntax. This creates a new instance of Foo, Point in our case. The “{}” syntax was selected to not conflict with the function calling syntax of “()”. When you see Foo{} you immediately know that it is a constructor and when you see Foo() you immediately know that it is a function call. Calling a constructor is a static allocation, not a dynamic one. A “box” reserved for this new instance is created somewhere, almost always in memory, and the appropriate constructor is called on the address of that box. In this case, a memory location large enough to hold a Point instance is reserved on the stack and the Point constructor is called upon it. The execution of a constructor is the only supported way of getting a new instance of a class. For numerical classes one can logically assume that each literal constant is the result of a call to a constructor, but this is just a logical abstraction. You can always call the constructor of core numerical types, so Int{} is absolutely identical to the literal constant 0, and DWord{7} is identical to 7u.

The next question: where did the constructor come from? Well, this is one of the first things the compiler offers for free: default constructors. In Z2 there is no such thing as an implicitly uninitialized variable/instance/value. Everything is initialized and every new instance is the result of a constructor. Z2 is a systems programming language, so you can explicitly have a non-initialized instance using a special syntax, but that is an advanced topic that is rarely needed in practice. So everything is initialized by a (default) constructor and in this case that constructor is provided by the compiler. You can of course write your own constructor, but Z2 discourages the writing of constructors that only repeat the default initialization logic. If the compiler-provided constructor and your own do the same things, why write one? You can also write constructors that take parameters, and Z2 supports named constructors. When writing these constructors, the compiler will again help you with initialization, so these constructors should only have code that differs from the default constructor. And you can disable the default constructor for a class if you want it to only be constructible using parameters or a named constructor.
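For readers coming from C++, a rough analogy (C++ code, not Z2; Handle and the helper functions are hypothetical names) is default member initializers plus a defaulted or deleted default constructor:

```cpp
#include <cassert>
#include <type_traits>

struct Point {
    int x = 0;  // default member initializers, similar in spirit to "val x = 0;"
    int y = 0;
    int z = 0;

    Point() = default;  // nothing to write: the compiler does the default work

    // A constructor with a parameter only states what differs;
    // y and z still get their default values.
    explicit Point(int x_value) : x(x_value) {}
};

// Hypothetical class that can only be built from a parameter.
struct Handle {
    int id;
    Handle() = delete;  // default construction is disabled
    explicit Handle(int id_value) : id(id_value) {}
};

static_assert(std::is_default_constructible<Point>::value,
              "Point can be default constructed");
static_assert(!std::is_default_constructible<Handle>::value,
              "Handle cannot be default constructed");

// A default constructed Point has all members set to 0.
int sum_of_default_point() {
    Point p;
    return p.x + p.y + p.z;
}

// Only x was given, so y and z keep their defaults.
int sum_of_untouched_members() {
    Point p(7);
    assert(p.x == 7);
    return p.y + p.z;
}
```

One difference to keep in mind: in C++ a member without an initializer, like Handle::id in a hand-written constructor that forgets it, can still end up uninitialized, which is exactly what Z2 rules out by default.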

After the execution of the Point constructor, the instance is valid and usable, so things like Point{}.x are readable. But how long is the instance valid? Until the destructor is called. The destructor is again generated by the compiler for you. The destructor will be called in most cases at the end of the statement, but the compiler might delay the execution a bit. Still it is guaranteed to be called before the end of the block. So in most cases, by the time execution hits the “;” before the end of line 3 the destructor is called. This is why I wanted to introduce the concept of POD: for POD types the destructor is guaranteed to be a “non-operation” (a NOP). Logically we still consider that the destructor was executed, but the compiler generates zero instructions for a destructor with POD types. It does nothing. Still, the instance is no longer accessible. If we want to make the instance available after the end of the statement, we need to bind it to a name using the “val” keyword:

class Test {
	def @main() {
		val p = Point{};
	}
}

This new snippet is almost identical to the previous one: a “box” is still reserved for a new Point instance and the constructor is called. But this time, the name “p” is bound to this instance and the execution of the destructor is delayed to the end of the block. Thus we have created a local variable called “p” that can be used to read or write into our instance and is scope-bound, meaning it will be valid from the point of its declaration to the end of the block. The keyword is called “val”, not “var” as in many other languages, though it is functionally identical. “val” is short for “value”, contrasting with the other keyword that allows you to bind a name, “ref”, short for “reference”, which is used for references.
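The difference between an unnamed instance and a named one can be observed in C++ (not Z2) by counting live instances. Tracked and the counter are made up for this sketch, and note that C++ is stricter than what I described for Z2: it always destroys the temporary at the end of the statement rather than possibly a bit later:

```cpp
#include <cassert>

// Number of Tracked instances that are currently alive.
static int live_instances = 0;

struct Tracked {
    Tracked() { ++live_instances; }
    Tracked(const Tracked&) { ++live_instances; }
    ~Tracked() { --live_instances; }
};

// Unnamed instance: constructed and destroyed within one statement.
int live_after_temporary() {
    Tracked{};              // similar in spirit to "Point{};"
    return live_instances;  // the temporary is already gone
}

// Named instance: stays alive until the end of its block.
int live_inside_block() {
    int seen = 0;
    {
        Tracked t;              // similar in spirit to "val p = Point{};"
        seen = live_instances;  // t is alive here
    }                           // t is destroyed here
    assert(live_instances == 0);
    return seen;
}
```

The first function reports zero live instances right after the statement, the second reports one while the name is still in scope.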

The same “val” keyword is used when declaring the Point class. The variables x, y and z are scope bound. Since they are inside the body of the class, the class itself is the scope. This means that the 3 variables are constructed when a Point is constructed and destructed when a Point is destructed. I mentioned before that Int{} and 0 are identical, so “val x = 0;” is identical to “val x = Int{};”. I prefer the first version since it is shorter and more natural to people coming from other programming languages.

But free constructors and destructors are not such a big deal. C++ is doing this right now! Let’s see what else we get for free looking at the full sample and its output:

class Point {
	val x = 0;
	val y = 0;
	val z = 0;
}

class FreeStuffTest {
	def @main() {
		val first = Point{};
		
		val second = Point{};
		second.x = 1;
		second.y = 10;
		second.z = 100;
		
		val third = Point{} {
			x = 1;
			y = 10;
			z = 100;
		};
		
		if (first == second)
			System.Out << "first is equal to second\n";
		else
			System.Out << "first is NOT equal to second\n";
		
		if (second != third)
			System.Out << "second is NOT equal to third\n";
		else
			System.Out << "second is equal to third\n";
		
		System.Out << "first: " << first << "\n";
		System.Out << "second: " << second << "\n";
		System.Out << "third: " << third << "\n";
	}
}

first is NOT equal to second
second is equal to third
first: 0 0 0
second: 1 10 100
third: 1 10 100

On line 9 we declare first, a default constructed Point. All its members will be 0. On lines 11-14 we create a second variable, called “second”. Not happy with its default values, we set its members to 1, 10 and 100, in order. Don’t worry about the multiple initializations, first by the constructor, then by the statements. The back-end compiler should take care of them. This is not a good place to use the constructor bypassing method I mentioned before. But the initialization is a bit verbose, so on lines 16-20 we initialize a third variable, called “third”, with the same values, but using a shorter syntax, available only immediately after a constructor.

Next, we get a taste of some other compiler-provided features on lines 22-30: default equality checks. The compiler will automatically take care of == and != checks. Member-wise == and the logical “and” operator are used to implement ==. Member-wise != and the logical “or” operator are used to implement !=. Their purpose is to model value equality. This implementation covers most cases, and when the default is not good enough, all you need to do is provide your own implementations. If you only implement ==, you get != for free as a negation of ==, and the other way around. And you can implement both if you think you can write a more optimized logical expression. Default equality checks combined with standard library implementations mean that you have a wide set of testable entities. Integers, strings, colors, hashmaps, hashmaps of hashmaps and so on are all testable. Other comparisons, like < or >, are not provided by default by the compiler, but the standard library covers this when appropriate.
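Written out by hand in C++ (not Z2), the member-wise logic described above would look roughly like this. It is only a sketch of the idea, not the code the Z2 compiler actually generates, and make_point is a hypothetical helper:

```cpp
#include <cassert>

struct Point {
    int x = 0;
    int y = 0;
    int z = 0;
};

// Equality: member-wise == combined with logical "and".
bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Inequality: member-wise != combined with logical "or".
bool operator!=(const Point& a, const Point& b) {
    return a.x != b.x || a.y != b.y || a.z != b.z;
}

// Hypothetical helper that builds a Point from three values.
Point make_point(int x, int y, int z) {
    Point p;
    p.x = x;
    p.y = y;
    p.z = z;
    return p;
}

// The two operators must always disagree with each other.
bool operators_are_consistent(const Point& a, const Point& b) {
    assert((a == b) == !(a != b));
    return true;
}
```

With these two operators the comparisons from the full sample give the same answers: a default Point is not equal to (1, 10, 100), and two points built from the same values are equal.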

Finally, on lines 32-34 we see another big feature of the compiler: marshaling! The variables of class Point can be written to a stream without you having to implement this. This is not a case of the compiler generating a call to some toString() method and printing that string. General “toString” support is available though if needed, and yes, the compiler will generate that for you, but this is a case of the compiler generating marshaling code for Point instances. The default implementation uses a member-wise approach, marshaling each member to the stream. If this default implementation is not good enough, again, you can write your own, where you can do just about anything. This marshaling solution is provided by a combination of compiler features and library support. Initially you have support for text streams and binary serialization, though the “sys.xml” and “sys.json” packages, when added to your compilation, provide automatic support for “xmlizing” or “jsonizing” most user classes. You can basically take any combination of classes and marshal them to a valid destination using statements about as complex as the lines above.
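In C++ (not Z2) you have to write the member-wise stream output yourself. The sketch below does that by hand and assumes the space-separated text format seen in the output above. The to_text helper is a made-up name and is only there to make the result easy to check:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Point {
    int x = 0;
    int y = 0;
    int z = 0;
};

// Member-wise output: each member goes to the stream in declaration order.
std::ostream& operator<<(std::ostream& out, const Point& p) {
    return out << p.x << ' ' << p.y << ' ' << p.z;
}

// Hypothetical helper: marshal a Point into an in-memory text stream.
std::string to_text(const Point& p) {
    std::ostringstream buffer;
    buffer << p;
    assert(buffer.good());
    return buffer.str();
}
```

Because the operator works on any std::ostream, the same code writes to the console, to a file or to a string buffer, which is the property that the Z2 default marshaling gives you without writing the operator at all.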

When you have some technical specification where the binary layout of serialization is a requirement, you’ll want to implement your own compliant methods. But when only wanting to get the data to disk, the default marshaling solution is designed to be sufficient.
