CrashCourse – 007 – Vector literal introduction

Today I shall introduce the concept of vector classes. There are several vector classes in the standard library, but I shall only focus on the most common and simple of them, Vector. Vector is a template class and it is very similar to std::vector from C++, so it can be used as any normal template class using it’s API defined by the members it has. You can see the source code for this class in sys.core\container\Vector.z2.

But using functions to manipulate a Vector is a bit boring and also easy to figure out by looking at the class interface. Instead, I’ll focus on phase two features and talk about easier and more expressive ways to work with vectors though Vector literals. So, let us first see how a vector with 3 Int items, 1, 2, 3, in this order, looks like:

[1, 2, 3]

Using this syntax, there is no mention of Vector, or Int. The [] syntax denotes (in this case) a Vector instance and always the first element dictates the type of all the elements. The first element has a type of Int, giving the whole literal a type of Vector<Int>. All further elements beyond the first must be compatible with the first, meaning that one could assign any element to the first without any losses. So the following example works:

[1, Byte{2}, 7u, 2'147'483'647u, 1l]

Byte{2} is an 8 bit unsigned value but it fits into a 32 bit signed value, so it is not impartant that the first element is an Int and the second a Byte. This is the same for 7u, a DWord with value 7 and 2'147'483'647u, another unsigned that is the maximum DWord value that still fits inside Int. 1l is a signed 64 bit values, but it still fits into a 32 bit signed value. So the final type of the literal is Vector<Int>.

On the other hand, the following example does not compile:

[1, 1.0f, 2'147'483'648u]

The first element gives us a Vector<Int>, but the second element is 1.0, a Double. Due to how floating point numbers work, a floating point value may not be able to perfectly represent its integer counterpart, so Floats and Doubles are not compatible with Int and you need to convert them manually. So the second element causes the literal to not compile. So does the third. 2'147'483'648u is too big as an unsigned to fit into Int, it being one greater than the maximum value that would fit, 2'147'483'647u.

This short inferred syntax can be handy, especially when writing some code that is very obvious in what it does and further syntax sugar is not wanted. But in Z2, inferred classes can still be manually specified, even though most of the time they would be redundant. So:

val a = Foo{};

is 100% identical to:

val a: Foo = Foo{};

and since there is no such thing as an uninitialized instance in Z2, the two snippets above are 100% identical to:

val a: Foo;

Once can do the same thing with literal vectors. As I mentioned before, the type of that literal I used as an example is Vector<Int>, so:

val p = [1, 2, 3];

is 100% identical to:

val p: Vector<Int> = [1, 2, 3];

Now that we have a variable called p, we can do some stuff with it, like printing it:

val p = [1, 2, 3];
for (val i = 0p; i < p.Length; i++)
	System.Out << p[i] << ' ';
System.Out << "\n";

1 2 3 

This was a normal traversal using a for loop, PtrSize index local variable called i and the Length of the vector. An easier way to traverse the vector is using the foreach loop:

val p = [1, 2, 3];
foreach (v in p)
	System.Out << v << ' ';
System.Out << "\n";

The class can also print itself and by default will print out the number of elements followed by the elements, so [1, 2 3] printed will be “3 1 2 3”:

System.Out << p << "\n";

3 1 2 3

This variable is also mutable, so we can change the values of some elements, add, elements to the end, insert and delete:

p << 4;
System.Out << p << "\n";
		
p << 5 << 6;
System.Out << p << "\n";
		
p.Delete(3);
System.Out << p << "\n";
		
p.Fill(4);
System.Out << p << "\n";
		
p.Add(7000);
System.Out << p << "\n";
		
p.DeleteAll(4);
System.Out << p << "\n";
		
p.DeleteAll(7000);
System.Out << p << "\n";

4 1 2 3 4
6 1 2 3 4 5 6
5 1 2 4 5 6
5 4 4 4 4 4
6 4 4 4 4 4 7000
1 7000
0

The final line in the output is interesting: the vector remains with a count of zero elements. Vector instances can have zero elements, both after several operations, like in the example above, or they can be created from the go with zero elements. The later needs more attention, since you can’t just go [] to express an empty literal because the compiler can’t infer the type of elements in the vector. A few paragraphs ago I mentioned how these literals are just normal class instances and the class name is Vector<Int>, so you can instance them as a normal class, Vector<Int>{} in order to get an empty vector instance. There is also a shorter syntax: <Int>[]. This is equivalent to the one before and it allows you to create a vector of Ints with zero elements, an empty vector. I won’t explain today how and why this works to keep the post short. But it is a useful short syntax to remember.

One interesting tidbit to note is that these empty vectors do not interact with the heap, so creating an empty vector is very fast and doesn’t allocate any RAM.

When declaring all the previous literals, the number of elements, the Length of the vector was determined implicitly by the compiler counting how many values you provided. This number can be explicitly specified by using the syntax of [number_of_elements: list_of_elements]. So [3: 1, 2, 3] is identical to [1, 2, 3], but this time we explicitly specified the number of items. The explicit number must be equal to the implicit number, so [2: 1, 2, 3] or [10: 1, 2, 3] will not compile.

What use is it then to be able to provide the item number if it must be the same as the actual number of elements? The secondary reason is safety. Let’s say you have a table that must be introduced verbatim in code and if you know how many elements there are, the compiler can help find the error of leaving out an item or two.

But the primary reason is the ellipsis syntax, [number_of_elements: list_of_elements, ...]. The ... at the end of the sequence means to repeat the last element so many times that the total element count of the sequence is equal to the provided explicit count. This means that while [10: 1, 2, 3] is a compilation error, [10: 1, 2, 3, …] is equal to [1, 2, 3, 3, 3, 3, 3, 3, 3, 3]. Yo may be inclined to think that it is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], but as said, the syntax means repeat the final element.

But it is not a dumb copy or anything similar. In this context, “repeat” means evaluate the last element a number of times: execute the same code sequence multiple times. This works very well with literals that have only one item and a count (but they can have any number of items) and can be used to achieve a lot of interesting stuff and even some meta-programming. Now I’m getting ahead of myself. Going back to the repeat code execution/evaluation paradigm the syntax of [5: 1, …] means to evaluate 1 five times and the syntax of [5: foo(), …] means evaluate foo() 5 times, while the syntax of [5: 1, 2, foo(), …] means evaluate 1 once, 2 once and foo 3 times (5 – 2). Here it is in action:

namespace org.z2legacy.ut.misc;

class VectorSample {
	def @main() {
		val a = [5: 1, ...];
		System.Out << a << "\n";
		
		val b = [5: foo(), ...];
		System.Out << b << "\n";
		
		val c = [5: 1, 2, foo(), ...];
		System.Out << c << "\n";
	}
	
	def foo(): Int {
		return dummy++;
	}
	
	val dummy = 100;
}

5 1 1 1 1 1
5 100 101 102 103 104
5 1 2 105 106 107

The fact that the last item is executed multiple times shows the way how this feature can be used for meta-programming, but there is so much more to this subject. And since I gave the example that [10: 1, 2, 3, …] is not [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], one can do something like val i = 1; [10: i++] to get [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].

One final but very important note is that this evaluation and all literal construction is executed, whenever possible, at compile time. Even if it is delegated to run time, it is done as efficiently as possible. To take the [10: i++] example, supposing it is executed at run time, this will not result in an empty vector that grows to accommodate one element, i++ is computed in a temporary and then the value gets copied into the vector, then the vectors grows again to accommodate another i++ and so on. A single memory allocation happens with the correct number of elements (10) and each element is not copy constructed (if possible) but instead in-place constructed, so you should not have any extra copy constructors called. So if we have a class Foo with a default constructor with a side effect, a copy constructor with a side effect and an assignment operator with a side effect and we do val v = [100: Foo{}, …], this will result in one Vector<Int>{} constructor and 100 Foo{} default constructors, the side effect of said constructor being executed 100 times and the side effect of the copy constructor and assignment operator as well as the main effect of these methods (making copies) will not be executed since they are not called at all.

So how does this all function? Vector is a dynamic buffer with a Length and a Capacity, similar to std::vector. The Capacity property gives it amortized growth rate. All vector types have Length and Capacity, but not all have these properties mutable. One special case is where neither are mutable, so we have a fixed Length and Capacity vector. Since accessing elements beyond Length is an error, such vectors are considered fixed Length. If you pass such a vector as a reference to a function, the function will be ale to change items, but not the length. You know how many items there are but you can’t modify this number. If you pass it as const, the elements will be read only and the Length will remain immutable. This is the base case, the CArray, a class from the standard library not yet introduced.

At the other extreme is Vector, a class with mutable Length and Capacity. You both know the Length and can change it. Changing the Length might change Capacity too. And changing Capacity directly might allow for faster insertions if you know the number of elements that you are going to add to a vector. Passing such a Vector as a ref parameter will allow you to modify both the items and the Length and Capacity of the Vector.

So when designing the interface to something that needs a vector, the question to ask is: is the Length mutable or not? This will allow you to determine the correct flavor of Vector. But as general rule, Vector is the main and most commonly encountered vector flavor. Unless you have a good reason not to use it, always use Vector.

Advertisements

Z2C PT9.2 “Kitchen Sink” Released

For the last couple of weeks I’ve been looking at a white on black console output, checking automated unit-testing results and frantically fixing bugs every time a test failed. But the output of the text suite has been stable and promising in the last few days:

1386 out of 1386 tests have passed in a total of 330.129 seconds.

We are golden! Roll the tape with the release music…

So finally, 4 months too late, with a total of 7 months passed since the last release, a new version is finished! PT 9.2 is out! Normally, we like to keep releases smaller, having a fixed set of features and work on that until the version is stable enough. But this time we did not manage that and in the 4 months over the planned time and even in the 3 months before, a lot of non-planned features managed to receive attention. Some are in the latest release, some are not quite done (still!) and are deactivated for this release and some were so experimental that they were removed for good. So this release is more of a kitchen-sink release, hence the code-name.

Let us take a look at the main new features:

Reimplemented package system

The old design for the package system has been updated. Things are still pretty similar, but the new design is more robust, scalable and has more defined roles and rules. In addition to this conceptual and design cleanup/enhancement, the implementation has been redone pretty much from scratch.

Cache system for building

The new design for the package system needs to locate resources and this is so much more important because the entire Z2 building process is controlled by the dependency manager. So the dependency manager needs to be able to locate resources that are later used to locate other types of resources for optimal builds. All these lookups are now helped by a cache.

This cache is only one of the needed features for really fast compilation times. The goal is for Z2 to compile faster than C++, even when using the C++ backend. The second pillar of such compilation times is binary packages. Binary packages are not implemented yet, but even the cache on its own helps a lot. I will detail the results in another post.

Redesigned vectors

The existing vector design has proven to be not ideal and there has been an attempt to replace it for ages now. But vector design is really complicated and might be one of the most complicated subjects, so replacing them was no easy task.

The new design is much better and more natural to use. It is still not perfect so expect minor updates, but no more compatibility breaking major changes.

Function parameters are const

Z2 has inherited the concept of const-correctness from C++. But it does need some updates. If we can’t improve on it, maybe it is better to remove this concept from the language.

One of the improvements is the solution to the problem of mandatory const keywords in function parameters. As in C++, the de-facto standard for passing parameters to functions is by const and often forgetting to add the const is a (minor and often harmless) error. Yet the syntax for this de-facto standard is more verbose than the alternative which is rarely used.

Starting with PT9.2, all function parameters come in two flavors: const by default or reference parameters. Const parameters do not need to be decorated by a keyword and are immutable. Reference parameters are introduced by the ref keyword and they can mutate the passed-in object.

This change is experimental and at the same time quite controversial. But this is the optimal time to test its viability in practice, when the user base of the language is low.

Fancy logs

There are now new fancy HTML5 Bootstrap logs. Really fancy! With color coding, collapsible panels and navigation options.

Going over these logs can be a joy when compared to simple non-navigable logs. They are also very slow to generate and not that useful if I’m honest, so they have been deactivated for this build. It is better to compile fast rather than having Bootsrap enabled logs.

Bug fixing

While a longer development cycle can introduce more bugs, there is also more time to fix existing bugs. A ton of bugs has been fixed and Z2 has never been this stable. Probably another 4 months of bug-fixing without adding any new features and I would call the compiler production ready.

Java backend

I’m mentioning this last because the new Java backend does not work. It tries to, but it fails. But if it will work, this might add a new backend. I’m not sure if Java is a good fit for the Z2 object model and that’s putting it politely.


So what are the next steps? Well, there are two sets of plans for what will make it into PT 9.3. I can’t tell exactly which one will make it though. And after 9.3, PT 10 will come out. This should be a really major and mature version.

But until then, even with the long development cycle, PT 9.2 still has some problems:

  • Code completion. This was developed and tested in the isolating context of a code editor. It works decently in such a context. But in the context of a full IDE, it can be sometimes annoying, eating shortcut keys, popping up when not needed and so on.
  • Some const-correctness problems. The new changes to const parameters has introduced some const-correctness problems. These problems are hard to find so it shouldn’t effect potential users, but they are there.
  • Bugs. So many bugs solved, but so many new features. While beyond the 2 problems mentioned above there are no known bugs in this version (but there are of course many incomplete features we know about), I’m sure this larger release has an above average number of bugs.

In about two weeks, PT 9.2.1 will come out, addressing these and other problems!