PT 9.2 Preview #1

PT 9.2 has turned out to be a much larger release than anticipated. It includes a major rewrite of the packaging system, making it a lot faster, in the hope of bringing the compiler closer to a production-ready state. These packaging changes will still take some time to finish, so I won’t talk about them for now, but other features will also make their way into the release, and I can talk about those!

To reiterate, some of the long term goals are:

  • greatly increase Z2/C++ interoperability
  • increase compilation speed
  • do the final disambiguation and language feature cleanup rounds

With the C++ backend, Z2 can be compiled down to plain and simple C++. We generally don’t talk about or show this resulting C++, since it is an ever-changing beast. The backend has multiple compilation options, and if you use a well-defined set of them, the compiler gives guarantees on what the C++ result will look like. But if you just use the compiler with the goal of creating executables, the actual form of the resulting C++ is decided on the fly, is implementation defined and is generally tailored to be smaller and uglier for quicker compilation times. The compiler might even decide to skip all whitespace and output the whole code on 128-character-long lines (configurable). Or it might change all your names to encoded short strings (configurable, BASE64 and other options).

This is why we talk about two separate entities. One is ad-hoc C++ code: it is meant to be quickly processed by a back-end compiler, is implementation defined and can vary arbitrarily between readable and obfuscated. Ad-hoc code serves a single purpose: feeding a back-end compiler when generating a final binary deliverable. These are scenarios where you don’t care how the code looks, just what it does, and you need it compiled.

The second one is interoperability code. In this mode, the resulting C++ tries to look as close as possible to both your Z2 code and the equivalent hand written code if it were originally written in C++, not Z2, but with compromises to handle the differences and needs of both languages.

So today I shall show some of the resulting interoperability mode code to demonstrate the new features. First, let us introduce a very simple test class that has a single field called Name of type Int, and a sample of its use:

class Test {
	val Name = 0;
}
val t = Test{};
t.Name = 7;
t.Name += 1;
t.Name /= 4;
System.Out << t.Name << "\n";

The snippet would of course print 2 (7 + 1 is 8, and 8 / 4 is 2). In the past, if you had a plain class with a simple member that you wanted unrestricted read and write access to, and there was no design document or other reason to expect this member to ever be read or written in a more complicated way, we would argue for the use of a variable over a property. If things changed later, you could then, and only then, change the variable to a property. And the result was always the same: OOP purists would object to the use of a public variable. They would suggest that you always use a property for public members, even if you are sure that you will never have complicated side-effect-based getters and setters:

class Test {
	property Name: Int {
		return name;
	}
	set (value) {
		name = value;
	}
}
private {
	val name = 0;
}

Z2 does not adopt a “one size fits all” approach to things like this and lets you decide. Whether you feel that you should decide on a case-by-case basis if each public field should be a variable or a property, or instead go with a rule that all public fields should be properties, Z2 lets you decide. Because, in the end, it might not even matter. The two versions of Test are identical, even to the point that both the front-end and back-end compilers will strip away the property, leaving you with just a variable. The public API is the same for both versions, but to say exactly what happens, other questions must be answered first: whether the build is in debug or release mode, the optimization level, inlining, whether the class is intended for dynamically linked libraries and so on.

Z2 will instead do just two things: first, give you the courtesy of keeping your property around when compiling in C++/Z2 interoperation mode, so your API looks nice and clean. Second, Z2 realizes that while there are some complex cases of getters and setters out there, in this simple case, where the property is read and write and only updates a single variable, the current syntax is too verbose. So PT 9.2 introduces a new syntax that is equivalent to the verbose property above:

class Test {
	property Name = name;
}
private {
	val name = 0;
}

Using this syntax, the compiler will “provide” you with getters and setters that will affect the variable that is at the right of the = sign.

Now it is time to see the resulting C++ code:

class Test {
public:
	int32 name;

	inline Test() {
		memset(this, 0, sizeof(Test));
	}

	inline Test(const Void&) {}

	inline int32 Name() const {
		return name;
	}

	inline void Name(int32 value) {
		name = value;
	}
};

The conversion is convention based. Randomly outputted C++ code from Z2 code can always be made to work with other C++ code, but the results might not be pretty. “Autogenerated” code has a reputation of being difficult to work with. So a conventions system is used to make everything look good and have predictable results. But there is also some heavy but standardized compromising always in use, because the object model, calling conventions and other details are subtly different between C++ and Z2.

But overall the class looks nice and clean. I won’t discuss the details of getting this code into header files for now. Instead, let’s focus on the class. It has the same name as in Z2, but you will notice that the name variable is public. This is one of the compromises I talked about before. In C++, public/private/protected can affect your API/ABI compatibility, so by default Z2 sidesteps these possible troubles by using public. One added benefit is that if you change a field from private to public in Z2, the C++ code does not need to be recompiled. There is an option for turning on protected and private access modifiers, but it is off by default.

Another thing you’ll notice is the second constructor (I’ll talk about the first one later). This is another compromise. In Z2, everything is a class and there is no such thing as an implicitly uninitialized object. All Z2 constructors will fully initialize the instance. But C++ can skip this, mostly for built-in types. The second constructor is present in every single class and is a NOP: it will leave your instance completely uninitialized. This is not for public use from C++ code, but it still must be present to satisfy Z2 API requirements. So all classes have a constructor that accepts a Void const reference, and it can always be ignored because it is guaranteed to do nothing! Nothing implies a full NOP: all members, at any depth, will remain uninitialized, even virtual tables and other internals. Using an instance resulting from this constructor is a guaranteed error. Don’t use it!

Apart from the automatic getters and setters, which by default use the “short getter/setter” naming convention (there is an option for this; with the long convention, the methods are called GetFoo and SetFoo), there is the question of the first constructor. It uses memset. In the post "Class constructor performance foibles?" I detailed the problems. Cutting-edge compilers, especially Clang, are great at consolidating multiple small fields that are initialized with 0, even using SSE and every trick in the book to create the fastest constructors possible. In these compilers, using memset instead of initializing each field has the same performance and results in the same ASM code, because memset is treated as an intrinsic. Other compilers are not that great at consolidating values, especially 8-bit ones, and will routinely be outperformed by memsetting the instance. All supported compilers have been tested, and using memset is always as fast as or faster than setting all fields in order. The same applies to initializer lists. In conclusion, using a memset is as fast as the fastest method of setting everything to zero on all supported platforms. So we have gotten around to adding this optimization to the compiler, and by default you will see memsets in constructors whenever possible. But this optimization is intentionally not aggressive. It will only be used when all the fields in the class are initialized with 0 bits. Otherwise, it will initialize them as usual. It handles pointers, as we can see with String:

String::String() {
	this->data = nullptr;
	this->length = 0;
	this->capacity = 0;
}

becomes…

String::String() {
	memset(this, 0, sizeof(String));
}

It is also aware of types it has already optimized, so even if you have a non-POD class that is embedded in another class, the whole deal can be optimized and flattened down to a single memset, instead of the host class calling the child class’ constructor, which memsets, and then doing a separate memset for the rest of the members:

SystemCPU::SystemCPU(): Vendor(_Void) {
	new (&this->Vendor) ::String();
	this->MMX = false;
	this->SSE = false;
	this->SSE2 = false;
	this->SSE3 = false;
	this->SSSE3 = false;
	this->SSE41 = false;
	this->SSE42 = false;
}

becomes…

SystemCPU::SystemCPU(): Vendor(_Void) {
	memset(this, 0, sizeof(SystemCPU));
}

And finally, it also handles classes with virtual methods correctly:

Stream::Stream() {
	memset(&this->pos, 0, sizeof(Stream) - __Z_MEMBER_OFFSET);
}

For classes with virtual members, the memset starts at the offset of the first field, making sure to not nuke the vtable. The actual memset that is generated varies based on backend compiler and class layout, so don’t take the actual code as set in stone, only what it does: if a constructor logically ends up writing only 0 bits into the entire instance, barring vtables and what-not, an appropriate memset optimization will kick in, and this will guarantee that constructors on really old compilers are more competitive with the latest Clang.

Next, let us look at one of the samples from the org.z2legacy.ut package, in the access folder:

namespace org.z2legacy.ut.access;

using org.z2legacy.ut.access.Foo;

class FailPrivate01 {
	def @main() {
		val p = Foo{};
	}
}

The contents of the sample are not important here: it just tests that the private constructor of Foo is indeed not accessible. The new minor feature is that you can now write:

namespace org.z2legacy.ut.access;

using Foo;

class FailPrivate01 {
	def @main() {
		val p = Foo{};
	}
}

When the using statement is followed by an unqualified class name, it will always assume that it is in the same namespace as the one specified in the namespace statement, so using org.z2legacy.ut.access.Foo means the same as using Foo. This may lead you to the question: how do you handle classes that are not within a namespace? The short answer is: you can’t.

But fret not! We removed the ability to have classes outside of namespaces! Not for the above-mentioned reason, but because, try as we might, as soon as packages started to grow, public namespace pollution became more and more of an issue. If anybody can add names to the public namespace, it is only a question of time before two different packages define two different classes with the same name. With mandatory namespaces, this issue is greatly lessened.

So starting with Z2 PT 9.2, all your classes must be added to a namespace. This is the first breaking change in the language, but the language is very young, so it should not be a problem. It was either adding a breaking change, or realizing years from now, when it is too late, that namespace pollution is indeed a severe problem affecting actual code.

I’ll go in and change the content on the site to reflect this change.

As a tangentially related curiosity, Z2 does away with declaration order being meaningful when it is not needed, as stated before. This is great for classes and methods, where one class can have access to another class that is defined in the same file, only later. You no longer have to babysit declaration order and only need to care about public or private access rights. But this also affects the namespace statement, so the above example is 100% identical to the case where the namespace statement is not the first line in the file, but placed somewhere more awkward, like:

using Foo;

class FailPrivate01 {
	def @main() {
		val p = Foo{};
	}
}

// DO NOT LOOK HERE!!!!!!! I AM HIDING!!!!
namespace org.z2legacy.ut.access;

You can have only one namespace statement per source file, so all the classes in a file must be in the same namespace. It makes sense for it to be in the beginning of the file, maybe even the first statement, since it affects the whole file, but you can place it anywhere outside of a class definition.

The error reporting within the compiler has been upgraded. A new component was introduced to centrally handle all error reporting. This will also allow for internationalization of error messages and the assignment of unique error codes, but the list is not agreed upon yet. Additionally, some error messages have been improved, as one can see in this command line screenshot:

[Screenshot: command-line output showing the improved error messages]

To review the contents of this preview, I talked about:

  • new shorthand syntax for properties
  • memset optimization for zeroing constructors
  • new shorthand syntax for the using statement
  • new error reporting component

At least one more preview and maybe even a minor release will be created before PT 9.2 is released.

Compiler versioning system

The Z2 compiler, called z2c, has almost reached its next internal stable version, PT6. It is in the stage of fixing the last few known bugs and undergoing additional testing, and should be done before Christmas. Since the detailed change-log is lost (see the first post on the blog) and the change-log is meaningless anyway to people who are not already using the compiler, I thought this would be a great moment to detail the versioning scheme, release structure and schedule of the compiler.

The compiler is developed in two stages. The first stage will have the full set of core features and a full standard library and will officially be labeled “1.0”. Stage two will enhance the language with some optional meta-programming related features and add a few others inspired by some scripting languages. Stage two can be safely ignored for now. While we do have the exact plan on what features we want to add into stage two, development on them won’t begin until stage one is done.

For stage one, we are labeling each internal compiler version with PT##, where ## is a number. PT1 through PT5 are done and PT6 is just about finished. PT comes from “pre-tier”, signaling the alpha, unreleased quality of the software.

Depending on the progress we actually achieve, either PT8 or PT9 will be the first version available for public preview, so right now it is not possible to download a public working compiler from anywhere. Since these milestones are already labeled with “pre”, there is no use to create special preview releases. You’ll be able to download the exact compiler package we use for testing. Starting with PT10, the software should reach beta level quality and by PT15 we’d like to reach 1.0.

Each compiler package comes with at least the following components:

  • z2c: the command line compiler and full build tool. The compiler is also a build tool capable of building arbitrarily complex projects out of the box with a single execution of the tool.
  • zide: a GUI cross platform IDE designed to offer a minimal golden standard of features to new adopters of the language who wish to edit code using an IDE.
  • zsyntax: a command line tool that can output syntax highlighted HTML code using various options for single files or full projects.
  • zut: a command line unit-test execution tool, serving as a sanity check for the compiler and standard library.
  • a full copy of the standard library as source code

Most of these tools are self explanatory, but I’d like to focus a bit on ZIDE, the Z2 IDE. During the past few years, several programming languages have cropped up, some broad scoped, some niche, some popular, some not. But in general, when a new language pops up, especially if it is not a scripting language, the tool support at release is very poor. It can take a lot of time before even a few syntax highlighters crop up for some popular IDEs and it can take years before any decent tools get created. It is very common for somebody picking up such a language to have to edit code in some random editor that does not understand the syntax of the language, have to compile in the command line because said editor can’t launch the appropriate compilation command and be forced to use “printf” debugging, since a real debugger with breakpoints and watches (as a bare minimum) is years away.

It is our goal to eliminate this problem by having the official package come with an IDE, called ZIDE. Since PT5 the package has included this GUI editor, and in each PT it is improved as part of the official development schedule. The command line compiler is married to ZIDE as far as development efforts go, so even if the compiler and library are 100% ready for the next release, if ZIDE has not been updated, we will delay the release until some new features have been added to it. ZIDE is supported on all platforms where the compiler is officially supported and requires an X server or Windows.

ZIDE is really not meant to be the best IDE in the world. Hopefully, other better tools will be created by third parties. ZIDE is there to offer a few much needed features and conveniences from day one! A few major features like syntax highlighting, code browsing, project creation and navigation, auto-complete, compiling and debugging are considered by us as being a worthwhile development effort. An early adopter will not be forced to use whatever tools one can muster because ZIDE will always be there if needed. To make sure that ZIDE is pleasant to use and has all the necessary features, the Z2 standard library is fully developed using ZIDE and ZIDE only. Additionally, non-automated testing is also done mostly using ZIDE.

Now that I have presented what is in the compiler package, I shall finish by describing how it compiles.

Z2 uses the common IR paradigm (intermediate representation). The compiler has a front-end, which when compiling source code will create an IR tree. And with this the job of the front-end is done. A back-end is needed to output some meaningful machine code based on IR.

The compiler does not have a back-end capable of outputting machine code, and this is intentional. It wouldn’t take us more than a couple of months to create one such back-end and it would function properly, but it would produce terribly un-optimized machine code and would also be completely tied to one single machine architecture. Even with a team 20 times larger and working for years, it would still be highly improbable to produce a better optimized machine code generator than some of the solutions that are available today. GCC has been in development for 28 years and LLVM for 12, just to name a couple. So we want the back-ends to use existing solutions for machine code generation: solutions supporting every major combination of architecture and operating system, while still being capable of outputting highly optimized code.

In order to achieve this, we are developing two back-ends: one that is looking back and one that is looking forward.

The one that is looking back is the C/C++ back-end. This back-end converts IR to C/C++. This is not a binary switch between C and C++: there is a whole spectrum of options related to what kind of C/C++ constructs to use and which parts of the code base to output. As an example, one of the configurations outputs C++ code with maximal formatting, meant to be as readable as hand written code and includes all the code available in a package. This option is meant to make Z2 code available as a library for C++ projects. This way, the Z2 standard library can be used from C++ without having to maintain two libraries for two programming languages. Using ghost-classes and this back-end, it is also possible for Z2 code to call C++ code, for maximum interoperability. Another example is a set of options that produces very short and ugly code, without any meaningful formatting and using short mangled names. This option is used when you only care about the resulting binary and want to potentially speed up compilation. And pretty much everything in between these two extremes is supported.

Since this back-end outputs C/C++ code, but Z2 tries to offer convenience features like ZIDE, it won’t leave you to your own devices to compile the resulting C/C++ code. The command line compiler will detect installed GCC versions on Linux and use them to compile the resulting back-end output. Under Windows, a package optionally bundled with MinGW will be available for binary generation out of the box. Alternatively, Visual Studio versions 7.1, 8, 9, 10, 11 and 12 are auto-detected and can be selected for compilation. These build methods can be edited or added to after detection in order to support custom paths to other compilers.

The second back-end, the one looking forward, is the LLVM back-end. Consequently, it supports the features, code generation models and general interoperability capabilities of LLVM. Currently, this back-end is in the early stages of prototyping, and we will first finish the C/C++ back-end 100% and then focus fully on the LLVM back-end.

The site is dead, long live the site!

Hi and welcome to our new blog!

This is the official blog of the Z2 project! The project is a programming language and a research project. As a programming language, Z2 is a modern pure OOP general-use performance-centric design-driven systems programming language, with a rich standard library and a programming style designed around clear object relationships and lifetime rules in order to achieve automatic deterministic management of most resources without the need for garbage collection. It is similar to C++ in capabilities and performance profile, but tries to improve upon the less desirable features of C++ and better support modern programming styles and some best practices.

But enough about the project for now! We have the entire future of the blog to talk about said subject. Today I will explain why the move to a blog and what happened to the old site.

We had a full site, domain and hosting, where I started uploading some scarce content. The progress on the site was very slow because we are not web developers, nor do we enjoy such tasks. C++ programmers do not always enjoy HTML and CSS work.

When creating the site, there was an offer for multiple sites with a deep discount, so we went for it. We had a third party buy the plan so that the rest of the sites could be used for different purposes not related to the project. This third party did not inform me that the hosting plan was approaching the renewal period, nor about the multiple warning e-mails that arrived in their inbox, so the site got suspended and its content deleted.

Luckily, there wasn’t that much content lost. The detailed version-by-version changelog was lost, as were the source code statistics. We also had a fairly long description of what the language is about.

Since nothing that important was lost, it was decided to move forward using a wordpress.com blog. The old site idea was ambitious: an SSO solution containing project-related content, documentation, blog, news, forum and a code browser, to name the main features it was supposed to have. At the moment a simple blog should serve us better, so we can concentrate on just publishing information rather than on web development. If we ever outgrow the capabilities of this blog, the site will return, but this time we shall hire a dedicated web developer and the site should be finished in a couple of months at most.

The plan is to provide frequent updates using this blog. The only problem is the source code. Z2 will be open-sourced, starting with the standard library. The old site idea had the code browser component meant to offer proper syntax highlighting for the language.

With the site being dead, we will upload the code to some form of public repository, using probably either GitHub or Gitorious. This will work well as a version control and distribution platform, but shall lack proper syntax highlighting for our language, so using those services for online code browsing will be a less than ideal solution.

For the immediate future, I shall create a post detailing the project structure and versioning scheme. And I shall create a long-running CrashCourse series designed to walk you through the main language features. And I shall try to pick a better theme for the blog and customize it a bit.

See you next time!