CrashCourse – 002 – The object model

Today I am going to talk about the Z2 object model. I’m afraid for the second post in a row, I will be forced to move fairly quickly and not have time to fully explain all the concepts. Hopefully, starting with post 3 in this series, I can slow down a bit.

Last time I showed the “hello world” snippet and introduced the concept of pure OOP languages: everything you manipulate is an object, a.k.a. an instance of a class. In that sample we printed a literal string constant to the STDOUT and it too must then be an object. Objects have members, so we might try and use some of them. The standard Z2 library uses the convention that if something can be directly counted in a straight-forward manner, it will have two members, called Length and Capacity. These two members must be at least immutable, but some countable classes, like vectors, can have them mutable.

I shall attach a cropped screenshot now of a sample that uses these two members, together with the compilation and execution result from ZIDE, but for the rest of the post I’ll use inline source code (to allow selection and preserve space):

2

As expected, the “Hello world!\n” string literal is a vector like object with countable elements, having a Length of 13. It is also has a Capacity of 13. The difference between Capacity and Length signals how much extra storage beyond Length is in use as a buffer available for future growth and Capacicty always greater or equal to Length. In this case it is equal, but do not be surprised when running this sample to find it greater, probably rounded to 16. The class that is in use here is String and it is UTF8.

Note
In Z2 strings are not meaningfully “null terminated”. The String class and friends make sure that there is a ‘\0’ (null) character at the end of the string, at index = Length, outside the valid index range, but the String class and the entire library does not care about that character. Strings have proper lengths and a String made out of 50 ‘\0’ characters is a legitimate String with Length 50. Reading or writing to the ‘\0’ character is a run-time error when done in user code. The only reason the ‘\0’ character is appended automatically is to make possible the passing of Strings to different APIs, which more often than not have traditionally expected null terminated strings. If you create a String for the purpose of passing it to such APIs, it might be a good idea to not store ‘\0’ characters in the middle of it.

The design choice of still adding a token ‘\0’ character to the end of a string does have consequences though. It makes having String slices extremely difficult, if not impossible.

Hold on a minute! The string is an object, but so is everything! Does that mean that String.Length is an object too?

class Hello {
	def @main() {
		System.Out << "The class of Length is: " << "Hello world!\n".Length.class << "\n";
	}
}

The class of Length is: PtrSize

Yes! The class of Length is PtrSize. In Z2 there are classes for signed and unsigned integers. Like Int and DWord. Here the systems programming language nature of Z2 is exposed a bit. Why isn’t it a signed or unsigned integer and instead it is a PtrSize? When counting random stuff, you should use the appropriate type required by the problem you are trying to solve. When counting eggs in a basket, you may use a DWord to store unsigned values. Or you may use Int out of convenience or maybe you want to use negative values to store stolen eggs. Or maybe even floating point numbers if those eggs are fancy. But when counting, offsetting or indexing into heap, and memory in general, you must use PtrSize. It is an unsigned integer large enough to be used for addressing heap size on your platform. PtrSize is almost always associated with traversing containers, so we can safely ignore it for now and focus on the bread and butter integer classes.

Like Int! Int is always signed and currently on all supported platforms it is a 32 bit value. Literal constants like 0, 1, -1, -55566847, 0xFF, 0b101, -2’147’483’648 and 2’147’483’647 are all instances of the Int class. The syntax of a integer literal constant in Z2 is an optional sign prefix (+ or -), an optional base prefix (0x, 0o or 0b), at least one digit fitting said base and an optional suffix. The optional suffix generally tells you the actual class of the constant. No suffix always means Int. The ‘u’ suffix always means DWord, the 32 bit unsigned integer. So 0 is an Int, 0u is a DWord. The ‘ character is used to optionally separate thousands, but it can be placed anywhere (except before the first digit). So 10000, 10’000 and 1000’0 are all the same constant. Using ‘ is pure syntactic sugar.

So far so good. But performance purists might frown upon core numerical types being classes. The Int class is a normal class with source code of several KiB found in the standard library. It has static and non-static members. Isn’t this slow? Especially for a system programming language?

No! Int is not a class that boxes or unboxes some hidden more fundamental type (and there is no automatic boxing in Z2). It may have a lot of members and it looks and behaves like a normal user class, but this is just syntactic sugar and compiler technology. Behind the scenes, all Int instances are “plain ints”, using the C/C++ definition of “int”. There is native support for manipulating them with the hardware. Int instances can be loaded into CPU registers and manipulated in assembly code directly. It is a strict requirement for all compliant Z2 compilers to have the same performance when using all core numerical classes as the equivalent optimized C code. There are even benchmarks in place making sure of that.

This design assures that there is 0% overhead to having Int be a class rather than some intrinsic keyword introduced type, but the strict performance requirements does mean that the Int class, together with all other core numeric classes, do have some limitations that other user classes do not have. These classes can’t have fields for starters. An Int is an Int, with a fixed hardware imposed structure. If you stick something inside it, like another Int, it will no longer be an Int. Static fields are permitted. Another limitation is that these classes can’t have virtual methods. There is a workaround for this in some specific situations, but that is a fairly advanced topic. But everything else goes. You can add as many symbolic constants, properties or methods to Int (static or not, both are allowed). These classes (and all classes in general) can be reopen, so third party libraries might add extra functionality to them.

I described Int and DWord as core classes. Core classes are a bit special because they have those hardware and performance required limitations. There is a just a small number of core classes that I won’t list for now, but we already saw Int, DWord and PtrSize. They all map directly to some native hardware resource. String is not a core class, since current CPUs do not have some atomic intrinsic understanding of strings. String is a non-core class, being able to benefit from all the features of the language. But it is still a “system” class. System classes are normal classes that are part of the system package, meaning they are available on all supported platforms. “Hello”, the class used in the snippets above is not a system class, since we wrote it from scratch and was not available before that.

Classes are introduced by the “class” keyword. The following snippet introduces 3 new classes into the default namespace:

class Foo {
}

class Bar {
}

class Baz {
}

This is the minim syntax required to define a valid class. Class members use a “block model”, meaning that in the block defined by { and } you must include the class members. All classes must have at least one block, but can have an arbitrary amount of blocks. The one required block is called the default block. A block also imposes access rights upon the members declared inside. The default block is public, so anything declared inside will be fully visible to everyone. Standard OOP access rights apply to Z2, so when designing classes, one would generally use a mix of public, private or protected blocks.

So lets add some members to the classes. I have only just introduced the concept of literal numerical constants and last post I introduced the @main method, so that is all I shall be using today to give a final but more complicated example:

class Foo {
	const AA = 7;
	
	def @main() {
		System.Out << AA << "\n";
		System.Out << Foo.AA << "\n";
		
		System.Out << Bar.AA << "\n";
		System.Out << Bar.BB << "\n";
		
		System.Out << Baz.AA << "\n";
	}
}

class Bar {
	const AA = Baz.AA + 100;
	const BB = 1000;
}

class Baz {
	const AA = 99;
}

Here is the output of the program:

7
7
199
1000
99

Literal constants are useful and convenient, but sometimes you want to define a symbolic constant. Line 2 does just that. 7 is our literal constant and we could use that, but instead using the “const” keyword, we bind the name “AA” to 7, thus creating a symbolic constant. With a class of Int. On line 5, we print the constant using its name, rather than the literal. The constant we defined inside the class Foo can be accessed directly since the @main method is in the same class. But if we wish, we can fully qualify the constant using the class.member syntax, like we did on line 6. AA and Foo.AA are identical and refer to the same symbolic constant.

Things change a little when on line 8, we try to print a constant from the class Bar. @main is in Foo so we can’t refer to Bar.AA without fully qualifying it. This line also shows that Foo.AA and Bar.AA are two different entities, even though they have the same names. So is Baz.AA. Names are locally unique within a class only.

One thing you may find strange when coming from C++ is that on lines 8 and 9 we refer to entities we declare on lines that come after. Not only is Foo referring to Bar.AA, but Bar.AA is dependent on Baz.AA, which in turn is declared after in the source code. C++ uses a rather archaic declaration model. One can learn to deal with its limitations, but juggling around hundreds of include files in large project is never as easy as it should be. In Z2 you have no such concerns. The compiler is just a bunch of algorithms running on powerful enough hardware relative to the tasks it is trying to accomplish. It has no problems with declaration orders and dependencies. The compiler sees all and is all-knowing. If you let it! What we do is artificially and willingly limit its ability to see. When dealing with classes in the same module, we use access rights like private and protected to hide members from classes. When in multi-module situations, we do the same to hide full classes.

And while the compiler is a machine, we as programmers are not. We certainly benefit form having a sensible and maintainable project structure, so while the compiler allows you to structure modules in the most difficult to use and counter-intuitive way possible if you really wish to, it is good design to use natural and intuitive structures and declaration orders. The sample above becomes slightly more readable if we change the declaration order to Baz, Bar and Foo. It also becomes more readable if we give more meaningful names to constants.

Advertisements

Compiler versioning system

The Z2 compiler, called z2c has almost reached its next internal stable version, PT6. It is in the fixing stage of the last few known bugs and undergoing additional testing and should be done before Christmas. Since the detailed change-log is lost (see the first post and the blog) and the change-log is meaningless anyway to people who are not already using the compiler, I thought this would be a great moment to detail the versioning scheme, release structure and schedule of the compiler.

The compiler is developed in two stages. The first stage will have the full set of core features and a full standard library and will officially be labeled “1.0”. Stage two will enhance the language with some optional meta-programming related features and add a few others inspired by some scripting languages. Stage two can be safely ignored for now. While we do have the exact plan on what features we want to add into stage two, development on them won’t begin until stage one is done.

For stage one, we are labeling each internal compiler version with PT##, where ## is a number. PT1 though PT5 are done and PT6 is just about finshed. PT comes from “pre-tier”, signaling the alpha unreleased quality of the software.

Depending on the progress we actually achieve, either PT8 or PT9 will be the first version available for public preview, so right now it is not possible to download a public working compiler from anywhere. Since these milestones are already labeled with “pre”, there is no use to create special preview releases. You’ll be able to download the exact compiler package we use for testing. Starting with PT10, the software should reach beta level quality and by PT15 we’d like to reach 1.0.

Each compiler package comes with at least the following components:

  • z2c: the command line compiler and full build tool. The compiler is also a build tool capable of building arbitrarily complex projects out of the box with a single execution of the tool.
  • zide: a GUI cross platform IDE designed to offer a minimal golden standard of features to new adopters of the language who wish to edit code using an IDE.
  • zsyntax: a command line tool that can output syntax highlighted HTML code using various options for single files or full projects.
  • zut: a command line unit-test execution tool, serving as a sanity check for the compiler and standard library.
  • a full copy of the standard library as source code

Most of these tools are self explanatory, but I’d like to focus a bit on ZIDE, the Z2 IDE. During the past few years, several programming languages have cropped up, some broad scoped, some niche, some popular, some not. But in general, when a new language pops up, especially if it is not a scripting language, the tool support at release is very poor. It can take a lot of time before even a few syntax highlighters crop up for some popular IDEs and it can take years before any decent tools get created. It is very common for somebody picking up such a language to have to edit code in some random editor that does not understand the syntax of the language, have to compile in the command line because said editor can’t launch the appropriate compilation command and be forced to use “printf” debugging, since a real debugger with breakpoints and watches (as a bare minimum) is years away.

It is our goal to eliminate this problem by having the official package come with an IDE, called ZIDE. Since PT5 the package includes this GUI editor and in each PT it is getting improved as part of the official development schedule. The command line compiler is married to ZIDE as development efforts go, so even if the compiler and library are 100% ready for the next release, if ZIDE has not been updated, we will delay the release until some new features have been added to it. ZIDE is supported on all platforms where the compiler is officially supported and requires a X server or Windows.

ZIDE is really not meant to be the best IDE in the world. Hopefully, other better tools will be created by third parties. ZIDE is there to offer a few much needed features and conveniences from day one! A few major features like syntax highlighting, code browsing, project creation and navigation, auto-complete, compiling and debugging are considered by us as being a worthwhile development effort. An early adopter will not be forced to use whatever tools one can muster because ZIDE will always be there if needed. To make sure that ZIDE is pleasant to use and has all the necessary features, the Z2 standard library is fully developed using ZIDE and ZIDE only. Additionally, non-automated testing is also done mostly using ZIDE.

Now that I presented what is in the compiler package, I shall finish by describing how it compiles.

Z2 uses the common IR paradigm (intermediate representation). The compiler has a front-end, which when compiling source code will create an IR tree. And with this the job of the front-end is done. A back-end is needed to output some meaningful machine code based on IR.

The compiler does not have a back-end capable of outputting machine code and this is intentional. It wouldn’t take use more than a couple of months to create one such back-end and it would function properly, but it would produce terribly un-optimized machine code and would also be completely tied to one single machine architecture. Even with a team 20 times larger and working for years, it would still be highly improbable to produce a better optimized machine code generator than some of the solutions that are available today. GCC has been in development for 28 years and LLVM for 12, just to name a couple. So we want the back-ends to use existing solutions for machine code generation. Solutions supporting every major combination of architecture and operating system, while still being capable of outputting highly optimized code.

In order to achieve this, we are developing two back-ends: one that is looking back and one that is looking forward.

The one that is looking back is the C/C++ back-end. This back-end converts IR to C/C++. This is not a binary switch between C and C++: there is a whole spectrum of options related to what kind of C/C++ constructs to use and which parts of the code base to output. As an example, one of the configurations outputs C++ code with maximal formatting, meant to be as readable as hand written code and includes all the code available in a package. This option is meant to make Z2 code available as a library for C++ projects. This way, the Z2 standard library can be used from C++ without having to maintain two libraries for two programming languages. Using ghost-classes and this back-end, it is also possible for Z2 code to call C++ code, for maximum interoperability. Another example is a set of options that produces very short and ugly code, without any meaningful formatting and using short mangled names. This option is used when you only care about the resulting binary and want to potentially speed up compilation. And pretty much everything in between these two extremes is supported.

Since this back-end outputs C/C++ code, but Z2 tries to offer convenience features like ZIDE, it won’t let you to your own devices to compile the resulting C/C++ code. The command line compiler will detect installed GCC versions on Linux and use them to compile the resulting back-end output. Under Windows, a package optionally bundled with a MINGW will be available for binary generation out of the box. Alternatively, Visual Studio versions 7.1, 8, 9, 10, 11 and 12 are auto-detected and can be selected for compilation. These build methods can be edited or added to after detection in order to support custom paths towards other compilers.

The second back-end, the one looking forwards is the LLVM back-end. In consequence it supports the features, code generation models and general interoperability capabilites of LLVM. Currently, this back-end is in early stages of prototyping and we will first finish the C/C++ back-end 100% and then focus fully on the LLVM back-end.

CrashCouse – 001 – Hello World!

In this series I shall be exploring the capabilities of Z2, giving a hopefully comprehensive tour of the language features and design implications, without going into the full depth of these features. Full details are reserved for documentation and the specification documents, while this series is a more casual approach to presenting the language to a new audience. In fact, in the first few parts of the series I will need to go so fast that some concepts warranting multiple posts will only be introduced using only a phrase or two, leaving further explanations for later. And during the first few posts, one might have difficulty in clearly identifying what Z2 is all about and what it brings new to the table. Z2 is not about introducing radical new ways to solve real world problems. It is also not about a few large “killer” features that can be easily summarized on a single PowerPoint slide. It is more about systematically improving upon existing features and programming styles in subtle ways. It is about the consistency of applying these tweaks to existing patterns and seeing what it all adds up to.

Z2 has a lot in common with C++. It tries to solve the same problems using a similar style, but it is an attempt at a more sanitized language, where every single design element is the result of years of real world experience with C++ and other programming languages, solving real world problems. In consequence it is a language designed to be practical, emphasizing features that are considered useful and de-emphasizing and sometimes downright eliminating undesirable features. Z2 is still a complicated language, but it is complicated because it has a lot of features, not because it has a lot of features with complicated rules, that have a lot of exceptions and idiosyncrasies and which sometimes interact poorly with each other. Z2 tries to have simple and consistent rules, with a focus on avoiding ambiguity, generally having one syntax to achieve a goal and it tries to minimize exceptions and caveats to rules.

Now that this brief introduction is done with, it is time to start with the traditional “hello world” snippet:

class Hello {
	def @main() {
		System.Out << "Hello world!\n";
	}
}

The syntax highlighting does not fully understand Z2, but at least the text can be selected.

Note
The problem with embedding Z2 code into posts is that WordPress’ syntax highlighters do no support it. The Z2 compiler can actually output properly syntax highlighted HTML code, but editing documents with huge embedded HTML snippets is kind of cumbersome, so for now I shall be using the default WordPress code highlighter.

In this particular sample, keywords and some identifiers are not properly highlighted, neither is the String constant. When editing code we use an IDE that supports Z2, called ZIDE. For comparison I will now show the same snippet as it is displayed in ZIDE:

01_01

The “hello world” sample is always a very simple one used as a quick introduction of the language features. We can use it to make some basic assumption about the language.

The “class” keyword introduces a class. Z2 is a pure object-oriented language, meaning that all entities one manipulates are objects, instances of a class. Classes can belong to namespaces, but for this sample I am not using a namespace. Z2 permits the declaration of classes outside of an explicit namespace and such classes are part of the implicit/default namespace. Declaring classes in the implicit namespace is no recommended because it can lead to class name clashes. Inside a given namespace, all class names must be unique.

The “def” keyword introduces a member function. The function’s name is “@main”. The ‘@’ character can only be used as the first character of an identifier and acts pretty much as the underscore character (‘_’) found in most other programming languages. Members that start with ‘@’ are normal class members, with no other special conditions and requirements, but the compiler can use them to achieve several goals. If the name and signature of such a member matches a certain goal, the compiler can and will use it. Operator overloading and out of the box marshaling of user classes are two examples of the compiler automatically using such methods. The official specification (once it is done) will of course include the full list of such members that are used for special purposes and third-party libraries can add new items to the list. Existing classes are always available to be “reopen”, so one can retroactively enhance the standard library with features using a third-party library. Using this approach it is possible to blur the line between a language feature and a library feature.

When Z2 compiles a file or a package, it needs to start the execution of the program somewhere: the main method. Since the language already has a mechanism of denoting members as having an additional use, the main method is called “@main”, not “main”. It was not an easy choice to change this, since traditionally this method has been called “main” for decades now. In the end out commitment to not having the language have unnecessary exceptions to rules won and we changed it to “@main”. Imagine the scenario of someone teaching Z2 to somebody else: “Members starting with ‘@’ can be used as special meaning members by the compiler”. “The compiler gives special meaning to one of the functions in a class and uses it as a startup point of the execution”. “This special function is called “main” (not “@main”). Contradiction!

Additionally, packages can contain multiple classes of course, and multiple classes might have a “@main” method. If there is only one “@main”, that method is always picked. If there are multiple, you can specify the desired class to be used as a startup class as a command line parameter when using the command line compiler or using ZIDE (which simply forwards the option to the command line compiler). Project files can store this option so you do not need to specify it at each compilation. If the option is not specified in the command line for the compiler or it is not part of the project file, when running the executable, it will prompt you to choose the class you wish to use as startup. This prompt happens logically before the startup point, at a point in time when the classes in your project can be considered to not exist yet, so there is no risk of programs with multiple startup points behaving differently once an option is chosen at run time versus programs with an explicit startup point chosen at compile time. But the executable size will be bigger since multiple classes must be compiled and included into the executable.

Using the mechanism described above, a method called “@main” from a class (“Hello” in this case) will be used as a startup point. This means that a new instance of said class is created using a constructor and the “@main” method is called. After the execution of the method, a destructor is called. Depending on compilation flags, several diagnostic steps can be run, especially in “debug” mode builds. A memory leak warning is turned on by default in all builds. Finally the program exists. The use of a constructor and destructor means that the startup class must have a public default constructor and a destructor. Z2 is not in the habit of making you write code it can figure out on its own, so you do not need to write a constructor or a destructor. In this case the code generated for both is of course literary nonexistent. There are zero assembly instructions required to construct or destroy such a simple class as “Hello”.

One thing that is not apparent from this sample is another set of the less than ideal features of C++ that Z2 tries to fix: the lack of a module support, the need of declarations and definitions, the need of forward declarations and finally the need for the dreaded pre-processor, all of these just so that the compiler can find the definitions it needs. Z2 has strong module support, together with text or binary modules. There is no such concept as a forward declaration and there are no problems with finding declarations. Z2 uses the following principle: if a person using formal dependency analysis with a pen and paper in hand can reach the conclusion that the definition of a given entity should be visible to another entity, so can the compiler, so it just works! You don’t need to take special care, use special orders and generally spend any time babysitting “include” files from various sources, as one does in C/C++. There is no pre-processor that is completely unaware of the rules of the language in Z2, but there is conditional compilation if needed.

Finally, we reach the statement that prints “Hello world!\n” to STDOUT and we see that some small parts of the standard library are available in all source files without the need to have their definition “imported” using the “using” keyword. The list of default classes that are visible is fairly short and mostly includes only the System class and the classes for fundamental types, like Int, Float, Vector, String and so on.

This concludes the first episode of the CrashCourse! I am sorry I had to use such a drive-by approach at introducing concepts, but there is much to say as an introduction and I would like to have each post reasonably short. I mentioned in passing the whole “pure OOP” design decision, so next time I shall talk more about the object model Z2 uses and detail some implications of this design.

The site is dead, long life the site!

Hi and welcome to our new blog!

This is the official blog of the Z2 project! The project is a programming language and a research project. As a programming language, Z2 is a modern pure OOP general-use performance-centric design-driven systems programming language, with a rich standard library and a programming style designed around clear object relationships and lifetime rules in order to achieve automatic deterministic management of most resources without the need for garbage collection. It is similar to C++ in capabilities and performance profile, but tries to improve upon the less desirable features of C++ and better support modern programming styles and some best practices.

But enough about the project for now! We have the entire future of the blog to talk about said subject. Today I will explain why the move to a blog and what happened to the old site.

We had a full site, domain and hosting, where I started uploading some scarce content. The progress on the site was very slow because we are not web developers, nor do we enjoy such tasks. C++ programmers do not always enjoy HTML and CSS work.

When creating the site, there was an offer for multiple sites with a deep discount, so we went for it. We had a third party buy the plan so that the rest of the sites could be used for different purposes not related to the project. This third party did not inform me that the hosting plan was approaching the renewal period, nor about the multiple warning e-mails that arrived in their inbox and the site got suspended, with the content deleted.

Luckily, there wasn’t that much content lost. The detailed version-by-version changelog was lost, as were the source code statistics. We also had a fairly long description of what the language is about.

Since nothing that important was lost, it was decided to move forward using a wordpress.com blog. The old site idea was ambitious, with it being a SSO solution containing project related content, documentation, blog, news, forum and a code browser, to name the main features it was supposed to have. At the current moment a simple blog should serve us better so we can concentrate on just publishing information, rather than web development. If we ever outlive the capabilities of this blog, the site will return, but this time we shall hire a dedicated web developer and the site should be finished in a couple of months at most.

The plan is to provide frequent updates using this blog. The only problem is the source code. Z2 will be open-sourced, starting with the standard library. The old site idea had the code browser component meant to offer proper syntax highlighting for the language.

With the site being dead, we will upload the code to some form of public repository, using probably either GitHub or Gitorious. This will work well as a version control and distribution platform, but shall lack proper syntax highlighting for our language, so using those services for online code browsing will be a less than ideal solution.

For the immediate future, I shall create a post detailing the project structure and versioning scheme. And I shall create a long running CrashCourse series designed to walk you though the main language features. And I shall try to pick a better theme for the blog and customize it a bit.

See you next time!