Octane: Blog - Post #2

The Osiris Programming Language: Overview

Overview

Osiris has been developed from the Anubis[1] programming language, originally designed by Richard Loxley (Gibbs), Andrew Tindale and myself in 1991 at the University of St. Andrews. Osiris is targeted at delivering applications for the Internet of Things (IoT) running on boards based on the ARM Cortex-M platform.

Features:

Object-Oriented (OO): simple, yet powerful class-based model:
- All class data items are private
- Methods can be public (called interface procedures) or protected (default)
- There are no access modifiers to allow data items to be made public or protected - enforcing data encapsulation and good OO programming practice
- Supports single inheritance and interfaces - unlike Java, all classes can be inherited as interfaces by subclasses
Simple: easy to read syntax and consistent semantic model means it's quick to learn and master
Powerful: fully block-structured with features like nested procedures / methods, first-class functions (including closures), lazy object instantiation and recursive class definitions (to support data structures like linked lists etc.)
Resilient: statically type checked with safe memory access - references are implicit and handled by Osiris itself. There are no null pointer exceptions because there are no explicit pointers and every item is declared with a value; for the same reason, classes and methods cannot be abstract
Lightweight: designed from the ground up to be compact - ARM Cortex-M boards have limited memory resources e.g. 256KB RAM. There is no virtual machine overhead and the Anubis Object-Oriented model has been re-designed for Osiris for simple, deterministic and efficient memory management
High Performance: applications execute at native code performance levels - the generated C code has been designed to allow modern C compilers to optimise the executable code and the Object-Oriented model has been written to reduce the overheads associated with method calls
Portable: the environment can be run on any platform supporting Java 5 (and above) and a C99-compliant C compiler (including gcc, clang, tcc and ARM Keil MDK). It currently supports bare metal ARM Cortex-M3/M4, Keil RTX, Linux (x86/x64 and ARM), OS X 10.4-10.10 (x86/x64 & PowerPC) and Microsoft Windows Vista, 7, 8 & 10 (x86/x64)
Secure: the remote code update mechanism that is currently in development will enable simple, reliable and secure updates to IoT applications running in the field

The basics

The quickest way to get a feel for the syntax is to look at some simple code examples.

Osiris supports the declaration of variables and constants using a syntax similar to Pascal:

# An integer variable
let x := 5

# A real variable
let y := 7.5

# A string constant
let name = "Maurice"

# A vector (array) of int literals
let numbers = vector @1 of int[8,42,67]

# A triangular array
let pascalTriangle = @1 of *int[@1 of int[1],
				@1 of int[1,1],
				@1 of int[1,2,1],
				@1 of int[1,3,3,1],
				@1 of int[1,4,6,4,1]]
 
# A 4 by 3 vector of strings
let strings = vector 1 to 4 of vector 1 to 3 of ""

Variables are declared with the assignment operator := and constants with the initialising operator =. For simple declarations like integer, real and string literals, the type can be inferred by the compiler. Vectors are type constructors and their identifiers are typed e.g. in the above example, numbers is of type const *int - a constant vector of integers, and pascalTriangle is of type const **int - a constant vector of vectors of integers. The other point to note about Osiris vectors is that the bounds do not have to start at zero or one, and can indeed be negative. As we can see, the Osiris vector (array) model allows for the creation of complex data structures that are type safe.
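To make the point about bounds concrete, here is a minimal Python sketch of a vector indexed from an arbitrary lower bound. It is an analogy only and says nothing about how Osiris implements vectors: the BoundedVector class and the readings example are invented for illustration, and the sketch assumes that @1 in the declarations above denotes a lower bound of 1.

```python
class BoundedVector:
    """Fixed-size vector indexed from an arbitrary lower bound.

    Hypothetical helper for illustration; not part of Osiris.
    """

    def __init__(self, lower, values):
        self.lower = lower
        self._items = list(values)

    @property
    def upper(self):
        # Highest valid index, derived from the lower bound and the length
        return self.lower + len(self._items) - 1

    def _offset(self, index):
        # Translate a user-facing index into a zero-based list position
        if not self.lower <= index <= self.upper:
            raise IndexError(f"index {index} is outside {self.lower}..{self.upper}")
        return index - self.lower

    def __getitem__(self, index):
        return self._items[self._offset(index)]

    def __setitem__(self, index, value):
        self._items[self._offset(index)] = value


# Mirrors 'vector @1 of int[8,42,67]', assuming @1 means "lower bound 1"
numbers = BoundedVector(1, [8, 42, 67])

# A vector indexed from -2 to 2, initialised to zero
readings = BoundedVector(-2, [0] * 5)
readings[-2] = 5
```

Every access goes through the same bounds check, so an index outside the declared range fails immediately rather than silently reading a neighbouring element.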

Procedures / functions

As procedures are first-class functions and have the same citizenship as integers, reals etc., their declaration follows the same syntax:

# A variable function that returns the result of adding two integers
let add := proc(int x,y -> int)
begin
	x+y
end

# A constant, single-line function that subtracts two integers
let subtract = proc(int x,y -> int); x-y

# A procedure that takes another procedure as a parameter and calls it
let doIt = proc(proc(int,int -> int) p); write p(5,4)

# A function that returns another (nested) function
let getMultiply = proc(-> proc(int,int -> int))
begin
	# Declare the multiply function
	let multiply = proc(int x,y -> int); x*y
        
	# return it...
	multiply
end

# A vector of procedures that provide arithmetic operations (add, subtract and multiply above) 
let operators = vector @1 of proc(int,int ->int)[add,subtract,getMultiply()]

As above, Osiris functions are declared by including the return type in the proc signature e.g. -> int, and the last expression before the end of the function (end or newline) is returned - there is no explicit return statement (as in Java or C/C++) or assignment to a pseudo variable (as in Delphi). Procedures and functions can also be declared on a single line by including a semicolon after the signature; everything up to the newline is accepted as the body. Osiris allows braces {} and begin/end to be freely interchanged as long as they are matched pairs and not mixed e.g. begin ... }.

As getMultiply in the above example shows, procedures / functions can be nested and passed back as return values due to their first-class citizenship. Since Osiris is fully block-structured, it has true closures and variables / constants declared in the enclosing environment of a function are accessible wherever that function is called - we'll return to this in more detail in a later post. For now, it is worth noting that this mechanism is similar to accessing object member variables within methods and has been optimised in the Osiris runtime.
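For readers more familiar with closures in other languages, the following Python sketch shows the same two ideas: a nested function returned as a value, and a function that keeps access to a variable in its enclosing environment. It is an analogy, not Osiris code; make_accumulator and its names are hypothetical and do not appear in the examples above.

```python
def get_multiply():
    # Nested function returned as a value, as getMultiply does above
    def multiply(x, y):
        return x * y
    return multiply


def make_accumulator(start):
    # 'total' lives in the environment enclosing 'add_to_total'
    total = start

    def add_to_total(amount):
        nonlocal total  # update the captured variable, not a local copy
        total += amount
        return total

    return add_to_total


multiply = get_multiply()
running = make_accumulator(10)
first = running(5)    # total is now 15
second = running(7)   # total is now 22
```

Each call to make_accumulator creates a fresh environment, so two accumulators never share their totals.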

The final point to note is that procedure / function identifiers can be reassigned to new definitions if they have the same signature (parameter and return types) as the original declaration. We will see how the same syntax for variable and constant procedure declarations is carried over for methods, making the overriding syntax consistent and easy to understand.
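The reassignment rule can be sketched in Python as a holder that only accepts a replacement function whose parameter and return types match the original. This is a rough analogy under stated assumptions: Osiris performs the check statically in the compiler, whereas this sketch checks annotations at runtime, and ProcVariable, signature_types, saturating_add and describe are all hypothetical names.

```python
import inspect


def signature_types(proc):
    """Return (parameter types, return type) taken from a function's annotations."""
    sig = inspect.signature(proc)
    params = tuple(p.annotation for p in sig.parameters.values())
    return params, sig.return_annotation


class ProcVariable:
    """Holds a function that may only be replaced by one with the same signature."""

    def __init__(self, proc):
        self._types = signature_types(proc)
        self._proc = proc

    def assign(self, proc):
        # Parameter names may differ; only the types have to match
        if signature_types(proc) != self._types:
            raise TypeError("signature differs from the original declaration")
        self._proc = proc

    def __call__(self, *args):
        return self._proc(*args)


def add(x: int, y: int) -> int:
    return x + y


def saturating_add(a: int, b: int) -> int:
    return min(a + b, 255)


def describe(x: int) -> str:
    return str(x)


op = ProcVariable(add)
op.assign(saturating_add)   # accepted: same (int, int) -> int signature
```

Attempting op.assign(describe) raises a TypeError, because describe takes one int and returns a str.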

Classes, objects and methods

As mentioned above, the Osiris class-based object model is very simple as there are no access modifier keywords and only methods can be protected or public (referred to as interface procedures). This follows accepted good OO design, as outlined in Arthur J. Riel's Object-Oriented Design Heuristics[2] and Item 22 of Scott Meyers' Effective C++[3]:

Heuristic 2.1: All data should be hidden within its class.
Heuristic 5.3: All data in a base class should be private; do not use protected data.

Item 22: Declare all data members private
• Declare data members private. It gives clients syntactically uniform access to data, affords fine-grained access control, allows invariants to be enforced, and offers class authors implementation flexibility.
• protected is no more encapsulated than public

Encapsulation was a key element that defined Object-Oriented languages when Anubis was designed in 1991, and the protection of data within a class by making it private is perhaps of greater importance today, given the vast number of available class libraries and systems written using OO languages. With this in mind, Osiris refines the Anubis model further by making all data private and not just protected.

The private data model supports the concept that class definitions are protected blocks, where only the enclosed procedures (methods) can access the data - just like data in the environment enclosing a nested procedure, and like nested procedures, they have to be returned from a function to be accessible. This is the inspiration for the Osiris public method or interface procedure syntax - referred to as the class' protocol, akin to a procedure's signature:

Procedure signature: proc(int x,y -> int)
Class protocol: Person(getAge(-> int), setAge(int))
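The idea of a class as a protected block whose procedures must be returned to become accessible can be sketched with plain closures in Python. This is an illustrative analogy rather than a description of the Osiris runtime; the person function, the clamp helper and its behaviour are hypothetical.

```python
from types import SimpleNamespace


def person(initial_age):
    # Private data: visible only to the procedures nested in this block
    age = initial_age

    def clamp(value):
        # Not returned below, so callers cannot reach it (a "protected" method)
        return max(0, value)

    def get_age():
        return age

    def set_age(new_age):
        nonlocal age
        age = clamp(new_age)

    # The returned names play the role of the protocol Person(getAge, setAge)
    return SimpleNamespace(get_age=get_age, set_age=set_age)


alice = person(30)
alice.set_age(31)
```

Callers can only reach what was handed back: alice exposes get_age and set_age, while age and clamp remain hidden inside the enclosing block.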

The design of Osiris, by continually applying Occam's Razor, strives to keep the syntax simple and consistent, without reducing the language's power or expressiveness. The following example shows the declaration of classes and methods in Osiris:

# The base class with one interface procedure (public method) getArea that returns an int
class Shape(getArea(->int)) is 
begin
	
	# A default function to be overridden in subclasses
	let getArea := proc(-> int); 0
end

# A class that extends Shape and implements the interface procedure
class Square(setWidth(int), getWidth(-> int)) inherits Shape is
begin
	let width := 0

	let setWidth = proc(int w); width := w
	let getWidth := proc(-> int); width

	# Override the getArea interface procedure declared in Shape.
	let getArea := proc(-> int); width * width
end

# A class that extends Square and adds the set/get length interface procedures
class Rectangle(setLength(int), getLength(-> int)) inherits Square is
begin
	let length := 0

	let setLength = proc(int l); length := l
	let getLength := proc(-> int); length

	# Note: this interface procedure overrides that defined in Square
	# 	and calls the getWidth() method because 'width' is private to the class Square
	let getArea := proc(-> int); length * getWidth()
end

As far as possible, the Osiris syntax consists of common English words arranged in an order that allows natural reading; this makes it easier to remember and, hopefully, makes programs written in it easier to understand. Experience with more symbolic language syntaxes, e.g. Perl, suggests that this approach will help make Osiris systems easier to maintain over time.

The above example is fairly trivial but there are some key points to note:

A method must be declared variable using the assignment operator := if it is to be overridden in subclasses.
A constant method is equivalent to a Java final method.
All constant items in a class, including methods, are declared at class level - there is only one copy of them for all instances (objects) of the class. As they cannot be changed once declared in the class definition, we don't need a copy for each object, which is especially important for a memory constrained device such as a Cortex-M based board.
Methods can only be made public by including them in the class interface procedure list (or protocol).
Only item declarations can be made within class definitions e.g. let statements, not clauses such as while loops or procedure calls etc.
Unlike Java, all items must be declared before they are used e.g. member variables cannot be declared after they are referenced in a method. A forward declaration construct is provided to allow procedures and classes to be referenced before they are defined to allow mutually recursive references, but the compiler checks to see that these items are actually defined before the end of the block they are declared within. Normal recursive procedures / methods do not require a forward declaration as the signature is known before the body is defined.
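The distinction between constant and variable methods in the points above can be approximated in Python with a small check that rejects any subclass overriding a method marked as constant. This is only a sketch: Osiris enforces the rule at compile time, whereas this check runs when the subclass is defined, and the constant decorator and Checked base class are hypothetical helpers.

```python
def constant(method):
    """Mark a method as constant, i.e. not overridable (hypothetical decorator)."""
    method._is_constant = True
    return method


class Checked:
    """Base class that rejects subclasses overriding a constant method."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for base in cls.__mro__[1:]:
            for name, attr in vars(base).items():
                if getattr(attr, "_is_constant", False) and name in vars(cls):
                    raise TypeError(f"{cls.__name__} overrides constant method {name}")


class Square(Checked):
    def __init__(self):
        self._width = 0

    @constant
    def set_width(self, w):      # like 'let setWidth = proc(int w); ...'
        self._width = w

    def get_width(self):         # like 'let getWidth := ...' (variable)
        return self._width

    def get_area(self):          # variable, so subclasses may override it
        return self._width * self._width


class Rectangle(Square):
    def __init__(self):
        super().__init__()
        self._length = 0

    def set_length(self, l):
        self._length = l

    def get_area(self):          # allowed: get_area is not constant
        return self._length * self.get_width()
```

Defining a subclass of Square with its own set_width raises a TypeError, which is the runtime counterpart of the compiler rejecting an override of a constant method.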

If a class definition does not implement all the interface procedures declared in its protocol, it is marked as abstract and the compiler will generate an error. Osiris doesn't allow abstract classes as they would introduce the null pointer problem when we use an abstract class as a root for objects in a collection or returned from a generic function. For example:

# Get a Shape from a collection
let getNextShape = proc(-> Shape) { ... }  

let obj := getNextShape()

write obj.getArea()

If Shape were abstract then we would get a compiler error when we tried to call obj.getArea(), as obj is of type Shape (inferred from the return type of getNextShape()) and the compiler cannot determine in this case whether it is in fact an object of type Square or Rectangle. The check could, as in Anubis, be deferred to runtime and the compiler could allow the call to be made. For an embedded systems programming language that strives to be deterministic, this is undesirable. It might seem that the problem can be overcome by introducing casting into the language to tell the compiler that it is dealing with the desired type. However, this would mean that we could get invalid casting errors at runtime if the returned object isn't of the type (or a subclass of the type) in the cast. For example, (Rectangle)obj would fail if obj is of type Square as we can't cast further down the hierarchy - an object of type Square cannot satisfy the protocol of Rectangle as it doesn't implement the getLength(-> int) procedure. In Osiris, as Shape can't be abstract, the compiler allows the obj.getArea() call to be made and everything is fine. However, abstract classes are a nice way to implement programming to the interface without having to implement a new, explicit construct as found in Java. As Anubis allowed classes to be abstract, it supported this programming paradigm and Osiris looks to be lacking in this regard...
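The casting hazard described above can be illustrated with a short Python sketch. Python has no cast operator, so checked_cast is a hypothetical helper standing in for a cast such as (Rectangle)obj, and get_next_shape is an invented stand-in for the elided getNextShape body.

```python
class Shape:
    def get_area(self):
        return 0


class Square(Shape):
    def __init__(self, width):
        self._width = width

    def get_width(self):
        return self._width

    def get_area(self):
        return self._width * self._width


class Rectangle(Square):
    def __init__(self, width, length):
        super().__init__(width)
        self._length = length

    def get_length(self):
        return self._length

    def get_area(self):
        return self._length * self.get_width()


def checked_cast(target, obj):
    """Return obj if it is an instance of target, otherwise fail at runtime."""
    if not isinstance(obj, target):
        raise TypeError(f"cannot cast {type(obj).__name__} to {target.__name__}")
    return obj


def get_next_shape():
    # Hypothetical stand-in for the elided getNextShape body
    return Square(3)


obj = get_next_shape()
area = obj.get_area()   # always safe: every Shape implements get_area
```

Calling checked_cast(Rectangle, obj) here raises a TypeError, and nothing in the program text reveals that until it runs; this is the kind of runtime failure that the no-abstract-classes rule avoids.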

The solution for Osiris is to allow classes to be inherited as interfaces - a subclass can use the implements, rather than the inherits, keyword to inherit the protocol (public interface) of a base class, without inheriting its member variables (data), methods or interface procedure implementations. In effect, the subclass inherits the base class' protocol and must implement the interface procedures in full or it will be marked as abstract by the compiler. The solution can be implemented in Osiris because, although the subclass doesn't include any of the internal representation (member variables etc.) of the base class, it can be used anywhere as a substitute for the base class due to the constraint that Osiris only allows methods to be public. This allows Osiris to mimic multiple inheritance hierarchies without introducing the diamond problem and adding complexity / additional memory requirements to the runtime for resource constrained embedded systems. We'll cover this in more detail in a later post.
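As a rough analogy for inheriting only a protocol, the Python sketch below uses typing.Protocol to describe the public interface of Shape and a class that satisfies it without inheriting anything from Shape. It is not Osiris syntax or semantics: ShapeProtocol, Triangle, total_area and the _name field are hypothetical, and Python checks the interface structurally rather than through an implements keyword.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class ShapeProtocol(Protocol):
    """The public interface of Shape: just getArea(-> int)."""

    def get_area(self) -> int: ...


class Shape:
    def __init__(self):
        self._name = "shape"   # hypothetical member data, for illustration

    def get_area(self) -> int:
        return 0


class Triangle:
    """Takes on Shape's protocol without inheriting Shape's data or methods."""

    def __init__(self, base: int, height: int):
        self._base = base
        self._height = height

    def get_area(self) -> int:
        return self._base * self._height // 2


def total_area(shapes: Iterable[ShapeProtocol]) -> int:
    # Works with anything that satisfies the protocol
    return sum(shape.get_area() for shape in shapes)


total = total_area([Shape(), Triangle(4, 6)])
```

Triangle carries none of Shape's internal representation, yet it can stand in wherever the protocol is expected, because only methods are part of the public interface.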

I hope you agree that the design choices for Osiris have resulted in a language that is optimised for embedded systems and the IoT, without imposing additional development effort or unnecessary restrictions. In the next post we'll see how Osiris operates under the covers and look at some of the C code that gets generated as a result.

References

1. R. Gibbs, M. Jamieson, A. Tindale, 1991. Anubis Language Report: Section 3 - Design. Unpublished BSc. (Hons.) dissertation. University of St. Andrews.
2. A. J. Riel, 1996. Object-Oriented Design Heuristics. Addison-Wesley. Boston, United States. ISBN 0-201-63385-X.
3. S. Meyers, 2005. Effective C++ Third Edition. Addison-Wesley. Boston, United States. ISBN 0-321-33487-6.