CrashCourse – 007 – Vector literal introduction

Today I shall introduce the concept of vector classes. There are several vector classes in the standard library, but I shall only focus on the most common and simple of them, Vector. Vector is a template class and it is very similar to std::vector from C++, so it can be used as any normal template class using it’s API defined by the members it has. You can see the source code for this class in sys.core\container\Vector.z2.

But using functions to manipulate a Vector is a bit boring and also easy to figure out by looking at the class interface. Instead, I’ll focus on phase two features and talk about easier and more expressive ways to work with vectors though Vector literals. So, let us first see how a vector with 3 Int items, 1, 2, 3, in this order, looks like:

[1, 2, 3]

Using this syntax, there is no mention of Vector, or Int. The [] syntax denotes (in this case) a Vector instance and always the first element dictates the type of all the elements. The first element has a type of Int, giving the whole literal a type of Vector<Int>. All further elements beyond the first must be compatible with the first, meaning that one could assign any element to the first without any losses. So the following example works:

[1, Byte{2}, 7u, 2'147'483'647u, 1l]

Byte{2} is an 8 bit unsigned value but it fits into a 32 bit signed value, so it is not impartant that the first element is an Int and the second a Byte. This is the same for 7u, a DWord with value 7 and 2'147'483'647u, another unsigned that is the maximum DWord value that still fits inside Int. 1l is a signed 64 bit values, but it still fits into a 32 bit signed value. So the final type of the literal is Vector<Int>.

On the other hand, the following example does not compile:

[1, 1.0f, 2'147'483'648u]

The first element gives us a Vector<Int>, but the second element is 1.0, a Double. Due to how floating point numbers work, a floating point value may not be able to perfectly represent its integer counterpart, so Floats and Doubles are not compatible with Int and you need to convert them manually. So the second element causes the literal to not compile. So does the third. 2'147'483'648u is too big as an unsigned to fit into Int, it being one greater than the maximum value that would fit, 2'147'483'647u.

This short inferred syntax can be handy, especially when writing some code that is very obvious in what it does and further syntax sugar is not wanted. But in Z2, inferred classes can still be manually specified, even though most of the time they would be redundant. So:

val a = Foo{};

is 100% identical to:

val a: Foo = Foo{};

and since there is no such thing as an uninitialized instance in Z2, the two snippets above are 100% identical to:

val a: Foo;

Once can do the same thing with literal vectors. As I mentioned before, the type of that literal I used as an example is Vector<Int>, so:

val p = [1, 2, 3];

is 100% identical to:

val p: Vector<Int> = [1, 2, 3];

Now that we have a variable called p, we can do some stuff with it, like printing it:

val p = [1, 2, 3];
for (val i = 0p; i < p.Length; i++)
	System.Out << p[i] << ' ';
System.Out << "\n";

1 2 3 

This was a normal traversal using a for loop, PtrSize index local variable called i and the Length of the vector. An easier way to traverse the vector is using the foreach loop:

val p = [1, 2, 3];
foreach (v in p)
	System.Out << v << ' ';
System.Out << "\n";

The class can also print itself and by default will print out the number of elements followed by the elements, so [1, 2 3] printed will be “3 1 2 3”:

System.Out << p << "\n";

3 1 2 3

This variable is also mutable, so we can change the values of some elements, add, elements to the end, insert and delete:

p << 4;
System.Out << p << "\n";
		
p << 5 << 6;
System.Out << p << "\n";
		
p.Delete(3);
System.Out << p << "\n";
		
p.Fill(4);
System.Out << p << "\n";
		
p.Add(7000);
System.Out << p << "\n";
		
p.DeleteAll(4);
System.Out << p << "\n";
		
p.DeleteAll(7000);
System.Out << p << "\n";

4 1 2 3 4
6 1 2 3 4 5 6
5 1 2 4 5 6
5 4 4 4 4 4
6 4 4 4 4 4 7000
1 7000
0

The final line in the output is interesting: the vector remains with a count of zero elements. Vector instances can have zero elements, both after several operations, like in the example above, or they can be created from the go with zero elements. The later needs more attention, since you can’t just go [] to express an empty literal because the compiler can’t infer the type of elements in the vector. A few paragraphs ago I mentioned how these literals are just normal class instances and the class name is Vector<Int>, so you can instance them as a normal class, Vector<Int>{} in order to get an empty vector instance. There is also a shorter syntax: <Int>[]. This is equivalent to the one before and it allows you to create a vector of Ints with zero elements, an empty vector. I won’t explain today how and why this works to keep the post short. But it is a useful short syntax to remember.

One interesting tidbit to note is that these empty vectors do not interact with the heap, so creating an empty vector is very fast and doesn’t allocate any RAM.

When declaring all the previous literals, the number of elements, the Length of the vector was determined implicitly by the compiler counting how many values you provided. This number can be explicitly specified by using the syntax of [number_of_elements: list_of_elements]. So [3: 1, 2, 3] is identical to [1, 2, 3], but this time we explicitly specified the number of items. The explicit number must be equal to the implicit number, so [2: 1, 2, 3] or [10: 1, 2, 3] will not compile.

What use is it then to be able to provide the item number if it must be the same as the actual number of elements? The secondary reason is safety. Let’s say you have a table that must be introduced verbatim in code and if you know how many elements there are, the compiler can help find the error of leaving out an item or two.

But the primary reason is the ellipsis syntax, [number_of_elements: list_of_elements, ...]. The ... at the end of the sequence means to repeat the last element so many times that the total element count of the sequence is equal to the provided explicit count. This means that while [10: 1, 2, 3] is a compilation error, [10: 1, 2, 3, …] is equal to [1, 2, 3, 3, 3, 3, 3, 3, 3, 3]. Yo may be inclined to think that it is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], but as said, the syntax means repeat the final element.

But it is not a dumb copy or anything similar. In this context, “repeat” means evaluate the last element a number of times: execute the same code sequence multiple times. This works very well with literals that have only one item and a count (but they can have any number of items) and can be used to achieve a lot of interesting stuff and even some meta-programming. Now I’m getting ahead of myself. Going back to the repeat code execution/evaluation paradigm the syntax of [5: 1, …] means to evaluate 1 five times and the syntax of [5: foo(), …] means evaluate foo() 5 times, while the syntax of [5: 1, 2, foo(), …] means evaluate 1 once, 2 once and foo 3 times (5 – 2). Here it is in action:

namespace org.z2legacy.ut.misc;

class VectorSample {
	def @main() {
		val a = [5: 1, ...];
		System.Out << a << "\n";
		
		val b = [5: foo(), ...];
		System.Out << b << "\n";
		
		val c = [5: 1, 2, foo(), ...];
		System.Out << c << "\n";
	}
	
	def foo(): Int {
		return dummy++;
	}
	
	val dummy = 100;
}

5 1 1 1 1 1
5 100 101 102 103 104
5 1 2 105 106 107

The fact that the last item is executed multiple times shows the way how this feature can be used for meta-programming, but there is so much more to this subject. And since I gave the example that [10: 1, 2, 3, …] is not [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], one can do something like val i = 1; [10: i++] to get [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].

One final but very important note is that this evaluation and all literal construction is executed, whenever possible, at compile time. Even if it is delegated to run time, it is done as efficiently as possible. To take the [10: i++] example, supposing it is executed at run time, this will not result in an empty vector that grows to accommodate one element, i++ is computed in a temporary and then the value gets copied into the vector, then the vectors grows again to accommodate another i++ and so on. A single memory allocation happens with the correct number of elements (10) and each element is not copy constructed (if possible) but instead in-place constructed, so you should not have any extra copy constructors called. So if we have a class Foo with a default constructor with a side effect, a copy constructor with a side effect and an assignment operator with a side effect and we do val v = [100: Foo{}, …], this will result in one Vector<Int>{} constructor and 100 Foo{} default constructors, the side effect of said constructor being executed 100 times and the side effect of the copy constructor and assignment operator as well as the main effect of these methods (making copies) will not be executed since they are not called at all.

So how does this all function? Vector is a dynamic buffer with a Length and a Capacity, similar to std::vector. The Capacity property gives it amortized growth rate. All vector types have Length and Capacity, but not all have these properties mutable. One special case is where neither are mutable, so we have a fixed Length and Capacity vector. Since accessing elements beyond Length is an error, such vectors are considered fixed Length. If you pass such a vector as a reference to a function, the function will be ale to change items, but not the length. You know how many items there are but you can’t modify this number. If you pass it as const, the elements will be read only and the Length will remain immutable. This is the base case, the CArray, a class from the standard library not yet introduced.

At the other extreme is Vector, a class with mutable Length and Capacity. You both know the Length and can change it. Changing the Length might change Capacity too. And changing Capacity directly might allow for faster insertions if you know the number of elements that you are going to add to a vector. Passing such a Vector as a ref parameter will allow you to modify both the items and the Length and Capacity of the Vector.

So when designing the interface to something that needs a vector, the question to ask is: is the Length mutable or not? This will allow you to determine the correct flavor of Vector. But as general rule, Vector is the main and most commonly encountered vector flavor. Unless you have a good reason not to use it, always use Vector.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s