CrashCourse – 002 – The object model

Today I am going to talk about the Z2 object model. I’m afraid for the second post in a row, I will be forced to move fairly quickly and not have time to fully explain all the concepts. Hopefully, starting with post 3 in this series, I can slow down a bit.

Last time I showed the “hello world” snippet and introduced the concept of pure OOP languages: everything you manipulate is an object, a.k.a. an instance of a class. In that sample we printed a literal string constant to STDOUT, so it too must be an object. Objects have members, so we might try to use some of them. The standard Z2 library uses the convention that if something can be directly counted in a straightforward manner, it will have two members, called Length and Capacity. Both members must be at least readable (immutable), but some countable classes, like vectors, make them mutable.

I shall now attach a cropped screenshot of a sample that uses these two members, together with the compilation and execution result from ZIDE, but for the rest of the post I’ll use inline source code (to allow selection and save space):

[Screenshot: a sample using Length and Capacity, with its compilation and execution result in ZIDE]

As expected, the “Hello world!\n” string literal is a vector-like object with countable elements, having a Length of 13. It also has a Capacity of 13. The difference between Capacity and Length signals how much extra storage beyond Length is held as a buffer for future growth, and Capacity is always greater than or equal to Length. In this case they are equal, but do not be surprised when running this sample to find Capacity greater, probably rounded up to 16. The class in use here is String and it is UTF8.
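For readers coming from C++, the Length and Capacity pair behaves much like `size()` and `capacity()` on `std::string`. The sketch below is a C++ analogy rather than Z2 code, and the helper names `spare_capacity` and `hello` are made up for illustration. The exact capacity is implementation-defined, so it only checks the invariant and does not print a specific number.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: how many more characters fit before the string
// has to reallocate. This is the "Capacity minus Length" gap.
std::size_t spare_capacity(const std::string& s) {
    return s.capacity() - s.size();
}

// Builds the same text as the Z2 literal and checks the invariant.
std::string hello() {
    std::string s = "Hello world!\n";  // 12 visible characters plus '\n'
    assert(s.capacity() >= s.size());  // capacity never drops below length
    return s;
}
```

Calling `hello().size()` gives 13, matching the Length reported for the Z2 literal, while `spare_capacity(hello())` depends on the standard library in use.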

Note
In Z2 strings are not meaningfully “null terminated”. The String class and friends make sure that there is a ‘\0’ (null) character at the end of the string, at index = Length, outside the valid index range, but the String class and the rest of the library do not care about that character. Strings have proper lengths, and a String made out of 50 ‘\0’ characters is a legitimate String with Length 50. Reading or writing the ‘\0’ character at index = Length is a run-time error when done in user code. The only reason the ‘\0’ character is appended automatically is to make it possible to pass Strings to external APIs, which more often than not expect null-terminated strings. If you create a String for the purpose of passing it to such APIs, it might be a good idea not to store ‘\0’ characters in the middle of it.

The design choice of still adding a token ‘\0’ character to the end of a string does have consequences though. It makes having String slices extremely difficult, if not impossible.
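The same distinction between "has a length" and "is null terminated" exists in C++, so it can be sketched there. This is an analogy with `std::string`, not Z2 code, and the function names are invented for the example. One difference worth noting: C++ allows reading the terminator through `c_str()`, whereas the text above says Z2 treats access at index = Length as a run-time error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// A string made of 50 '\0' characters: its length is 50, not 0.
std::string fifty_nulls() {
    std::string s(50, '\0');
    // The extra terminator sits just past the last valid index.
    assert(s.c_str()[s.size()] == '\0');
    return s;
}

// What a C-style API that scans for the first '\0' would believe
// the length to be.
std::size_t length_seen_by_c_api(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Here `fifty_nulls().size()` is 50, but `length_seen_by_c_api(fifty_nulls())` is 0, which is why embedded ‘\0’ characters are a poor fit for strings handed to such APIs.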

Hold on a minute! The string is an object, but so is everything! Does that mean that String.Length is an object too?

class Hello {
	def @main() {
		System.Out << "The class of Length is: " << "Hello world!\n".Length.class << "\n";
	}
}

The class of Length is: PtrSize

Yes! The class of Length is PtrSize. In Z2 there are classes for signed and unsigned integers, like Int and DWord. Here the systems programming language nature of Z2 is exposed a bit. Why is Length a PtrSize rather than one of those general-purpose integer classes? When counting random stuff, you should use the type required by the problem you are trying to solve. When counting eggs in a basket, you may use a DWord to store unsigned values. Or you may use Int out of convenience, or maybe you want to use negative values to record stolen eggs. Or maybe even floating point numbers if those eggs are fancy. But when counting, offsetting or indexing into the heap, and memory in general, you must use PtrSize. It is an unsigned integer large enough to address the whole heap on your platform. PtrSize is almost always associated with traversing containers, so we can safely ignore it for now and focus on the bread and butter integer classes.

Like Int! Int is always signed and currently on all supported platforms it is a 32 bit value. Literal constants like 0, 1, -1, -55566847, 0xFF, 0b101, -2'147'483'648 and 2'147'483'647 are all instances of the Int class. The syntax of an integer literal constant in Z2 is an optional sign prefix (+ or -), an optional base prefix (0x, 0o or 0b), at least one digit valid in said base and an optional suffix. The optional suffix generally tells you the actual class of the constant. No suffix always means Int. The ‘u’ suffix always means DWord, the 32 bit unsigned integer. So 0 is an Int, 0u is a DWord. The ' character is used to optionally separate thousands, but it can be placed anywhere (except before the first digit). So 10000, 10'000 and 1000'0 are all the same constant. Using ' is pure syntactic sugar.
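C++14 happens to use very similar literal rules, so the claims about separators, base prefixes and the ‘u’ suffix can be checked at compile time there. This is a C++ comparison and not Z2 code: C++ spells octal with a leading 0 rather than 0o, and the two function names are invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// Digit separators are cosmetic and may sit anywhere between digits.
constexpr bool separators_are_cosmetic() {
    return 10000 == 10'000 && 10'000 == 1000'0;
}

// Base prefixes change notation only: 0xFF is 255 and 0b101 is 5.
constexpr int base_prefix_sum() {
    return 0xFF + 0b101;
}

// No suffix gives the plain signed type; the 'u' suffix gives unsigned.
static_assert(std::is_same<decltype(0), int>::value, "0 is an int");
static_assert(std::is_same<decltype(0u), unsigned int>::value,
              "0u is an unsigned int");
static_assert(separators_are_cosmetic(), "separators do not change the value");
```

If any of these assumptions were wrong the file would fail to compile, which makes `static_assert` a convenient way to document literal rules.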

So far so good. But performance purists might frown upon core numerical types being classes. The Int class is a normal class whose source code, several KiB in size, is found in the standard library. It has static and non-static members. Isn’t this slow? Especially for a systems programming language?

No! Int is not a class that boxes or unboxes some hidden more fundamental type (and there is no automatic boxing in Z2). It may have a lot of members and it looks and behaves like a normal user class, but this is just syntactic sugar and compiler technology. Behind the scenes, all Int instances are “plain ints”, using the C/C++ definition of “int”. There is native support for manipulating them with the hardware. Int instances can be loaded into CPU registers and manipulated in assembly code directly. It is a strict requirement for all compliant Z2 compilers to have the same performance when using all core numerical classes as the equivalent optimized C code. There are even benchmarks in place making sure of that.

This design ensures that there is 0% overhead to having Int be a class rather than some keyword-introduced intrinsic type, but the strict performance requirements do mean that the Int class, together with all other core numeric classes, has some limitations that user classes do not have. These classes can’t have fields, for starters. An Int is an Int, with a fixed hardware-imposed structure. If you stick something inside it, like another Int, it will no longer be an Int. Static fields are permitted. Another limitation is that these classes can’t have virtual methods. There is a workaround for this in some specific situations, but that is a fairly advanced topic. But everything else goes. You can add as many symbolic constants, properties or methods to Int as you like (static or not, both are allowed). These classes (and all classes in general) can be reopened, so third party libraries might add extra functionality to them.
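The reasoning behind these limitations can be illustrated in C++, where a class that holds exactly one `int` and only non-virtual methods keeps the layout of a plain `int`. This is only an analogy: `Int32` and `WithVirtual` are hypothetical types written for this sketch, and nothing here describes how a Z2 compiler actually implements Int.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a core class: one hardware value plus
// constants and methods, but no extra fields and no virtual methods.
struct Int32 {
    int value;

    static constexpr int Max = 2147483647;  // symbolic constant, costs no space

    constexpr bool is_negative() const { return value < 0; }
    constexpr Int32 plus(Int32 other) const {
        return Int32{value + other.value};
    }
};

// Methods and static members do not change the object itself.
static_assert(sizeof(Int32) == sizeof(int), "same size as a plain int");
static_assert(std::is_trivially_copyable<Int32>::value, "copied like an int");
static_assert(std::is_standard_layout<Int32>::value, "laid out like an int");

// Adding a virtual method changes the nature of the type: it becomes
// polymorphic and is no longer a standard-layout wrapper around an int.
struct WithVirtual {
    int value;
    virtual ~WithVirtual() = default;
    virtual int get() const { return value; }
};

static_assert(std::is_polymorphic<WithVirtual>::value, "carries dynamic type info");
static_assert(!std::is_standard_layout<WithVirtual>::value, "not a plain int anymore");
```

The checks mirror the two rules above: extra instance state or virtual dispatch would stop the value from being "just an int", while static members and ordinary methods are free.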

I described Int and DWord as core classes. Core classes are a bit special because they have those limitations imposed by hardware and performance requirements. There is just a small number of core classes, which I won’t list for now, but we already saw Int, DWord and PtrSize. They all map directly to some native hardware resource. String is not a core class, since current CPUs do not have an atomic, intrinsic understanding of strings. String is a non-core class, able to benefit from all the features of the language. But it is still a “system” class. System classes are normal classes that are part of the system package, meaning they are available on all supported platforms. “Hello”, the class used in the snippets above, is not a system class, since we wrote it from scratch and it was not available before that.

Classes are introduced by the “class” keyword. The following snippet introduces 3 new classes into the default namespace:

class Foo {
}

class Bar {
}

class Baz {
}

This is the minimum syntax required to define a valid class. Class members use a “block model”, meaning that the class members must be declared inside a block delimited by { and }. All classes must have at least one block, but can have an arbitrary number of blocks. The one required block is called the default block. A block also imposes access rights upon the members declared inside. The default block is public, so anything declared inside will be fully visible to everyone. Standard OOP access rights apply to Z2, so when designing classes, one would generally use a mix of public, private and protected blocks.

So let’s add some members to the classes. I have only just introduced the concept of literal numerical constants, and last post I introduced the @main method, so that is all I shall be using today to give a final but more complicated example:

class Foo {
	const AA = 7;
	
	def @main() {
		System.Out << AA << "\n";
		System.Out << Foo.AA << "\n";
		
		System.Out << Bar.AA << "\n";
		System.Out << Bar.BB << "\n";
		
		System.Out << Baz.AA << "\n";
	}
}

class Bar {
	const AA = Baz.AA + 100;
	const BB = 1000;
}

class Baz {
	const AA = 99;
}

Here is the output of the program:

7
7
199
1000
99

Literal constants are useful and convenient, but sometimes you want to define a symbolic constant. Line 2 of the snippet does just that. 7 is our literal constant and we could use that, but instead, using the “const” keyword, we bind the name “AA” to 7, thus creating a symbolic constant whose class is Int. On line 5, we print the constant using its name, rather than the literal. The constant we defined inside the class Foo can be accessed directly since the @main method is in the same class. But if we wish, we can fully qualify the constant using the class.member syntax, like we did on line 6. AA and Foo.AA are identical and refer to the same symbolic constant.

Things change a little when on line 8, we try to print a constant from the class Bar. @main is in Foo so we can’t refer to Bar.AA without fully qualifying it. This line also shows that Foo.AA and Bar.AA are two different entities, even though they have the same names. So is Baz.AA. Names are locally unique within a class only.

One thing you may find strange when coming from C++ is that on lines 8 and 9 we refer to entities we declare on lines that come after. Not only is Foo referring to Bar.AA, but Bar.AA depends on Baz.AA, which in turn is declared even later in the source code. C++ uses a rather archaic declaration model. One can learn to deal with its limitations, but juggling hundreds of include files in a large project is never as easy as it should be. In Z2 you have no such concerns. The compiler is just a bunch of algorithms running on hardware powerful enough for the tasks it is trying to accomplish. It has no problems with declaration orders and dependencies. The compiler sees all and is all-knowing. If you let it! What we do is artificially and willingly limit its ability to see. When dealing with classes in the same module, we use access rights like private and protected to hide members from other classes. In multi-module situations, we do the same to hide full classes.

And while the compiler is a machine, we as programmers are not. We certainly benefit from having a sensible and maintainable project structure, so while the compiler allows you to structure modules in the most difficult to use and counter-intuitive way possible if you really wish to, it is good design to use natural and intuitive structures and declaration orders. The sample above becomes slightly more readable if we change the declaration order to Baz, Bar and Foo. It also becomes more readable if we give more meaningful names to constants.
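For comparison, here is a rough C++ rendering of the same three classes. It is a sketch of the contrast and not a translation of Z2 semantics: the `render` function is a made-up stand-in for @main, and `static constexpr int` is only an approximation of a Z2 symbolic constant. In C++ the Baz, Bar, Foo order is not a matter of taste, because `Baz::AA` must already be declared when `Bar::AA` is initialized.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Baz {
    static constexpr int AA = 99;
};

struct Bar {
    // Would not compile if Bar were placed above Baz.
    static constexpr int AA = Baz::AA + 100;
    static constexpr int BB = 1000;
};

struct Foo {
    static constexpr int AA = 7;

    // Hypothetical stand-in for @main: builds the five output lines.
    static std::string render() {
        std::ostringstream out;
        out << AA << "\n";        // unqualified: resolved inside Foo
        out << Foo::AA << "\n";   // fully qualified: the same constant
        out << Bar::AA << "\n";   // a different AA, owned by Bar
        out << Bar::BB << "\n";
        out << Baz::AA << "\n";   // yet another AA, owned by Baz
        return out.str();
    }
};

static_assert(Bar::AA == 199, "Bar::AA is computed from Baz::AA");
```

Writing `Foo::render()` to standard output produces the same five lines as the Z2 program: 7, 7, 199, 1000 and 99.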
