C with Classes

“C with Classes” (CwC) is a dialect of C++ 2011 used in this course. CwC has three main goals:
  • be easy to learn for Java developers: to that end, CwC encourages a programming style that emphasizes class-based programming with dynamic dispatch,

  • expose developers to manual memory management: students are faced with design choices between stack and heap allocation and must be hygienic in their use of memory,

  • act as a style guide to ease sharing code among teams: by removing choices, CwC makes it easier to write anonymous code.

We do not aim to teach C++ (other classes do) or teach low-level or systems programming (other classes do that). CwC is a vehicle for exploring interesting problems in software design in the presence of finite resources.

C++ supports separate compilation: the code that implements classes would normally live in independent .cpp files compiled independently of one another. CwC does not rely on this, since a program has a single .cpp file. While CwC will therefore not scale to large code bases where compile times are substantial, it should serve us well enough.

Files: A CwC program consists of one .cpp file and multiple .h files. Each file should start with a comment that documents the language used: //lang::CwC for files written in the subset described here, and //lang::Cpp for files written in C++11. All C library functions are available.

Congruences: Any feature syntactically and semantically congruent to Java is available in CwC. This includes control structures (if-then, for, while, switch, break, return) and primitive types (int, char, long, double, float, short, bool). Modifiers unsigned, long, extern and static can be used. The type size_t is an alias for an unsigned integer type; on typical 64-bit platforms it is unsigned long (i.e. 64-bit values).

Arrays: Arrays in CwC have a simple representation: a sequence of values allocated contiguously in memory. Arrays can be allocated in the heap, on the stack, or within another data type. An example of a heap allocated array that is accessed by a pointer to the first element is shown next:

int *ia = new int[20]; // allocate an array of 20 integers on the heap

The consequence of this simple representation is that the length of an array isn’t recorded for us. This, in turn, means that it’s the programmer’s responsibility to keep track of the bounds of an array. Reading or writing outside of an array’s bounds means potentially accessing memory that is not owned by our program. There are no "out of bounds" exceptions; such an access is simply *undefined behavior* and can lead to nasty bugs. Array elements are indexed from 0 and are accessed similarly to Java:

ia[0] = 20;
ia[1] = 30;
ia[2] = ia[0] + ia[1];
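Since the length is not stored with the array, one common approach is to pass it alongside the pointer. The following is a minimal sketch of that idea; the names sum and sum_demo are hypothetical and not part of CwC.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: sums the first len elements of arr.
// The caller must pass the correct length; nothing checks it for us.
int sum(int* arr, size_t len) {
  assert(arr != nullptr);
  int total = 0;
  for (size_t i = 0; i < len; i++)
    total += arr[i];
  return total;
}

// Hypothetical usage: allocate, fill, sum, and free an array of 3 ints.
int sum_demo() {
  size_t len = 3;
  int* ia = new int[len];
  ia[0] = 20;
  ia[1] = 30;
  ia[2] = ia[0] + ia[1];
  int result = sum(ia, len);
  delete[] ia;
  return result;
}
```

Passing a len larger than the real size would read past the end of the array, which is exactly the undefined behavior described above.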

Another style of accessing array elements uses pointer arithmetic:

*ia = 20;
ia++;
*ia = 30;
ia++;
*ia = *(ia-2) + *(ia-1);

The code uses *ia to "dereference" the pointer, that is, to tell the compiler to write, e.g. 20, into the location pointed to by ia. On the first line that location is ia[0], then ia[1], etc. We can also write *(ia-1) to request the "previous" value. Note that after the two increments, ia no longer points to the first element of the array.

A heap-allocated array can be freed using delete[] (note the square brackets):

delete[] ia;
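When pointer arithmetic and delete[] are combined, the pointer handed to delete[] must be the one that new[] returned. One way to arrange this, sketched below under the assumption that a second variable is acceptable, is to walk the array with a separate cursor (the names cur and cursor_demo are hypothetical).

```cpp
#include <assert.h>

// Fills the first three elements through a cursor pointer, leaving the
// original pointer untouched so that it can be passed to delete[].
int cursor_demo() {
  int* ia = new int[20];
  int* cur = ia;                    // hypothetical cursor; starts at element 0
  *cur = 20;                        // writes ia[0]
  cur++;
  *cur = 30;                        // writes ia[1]
  cur++;
  *cur = *(cur - 2) + *(cur - 1);   // writes ia[2]
  assert(cur == ia + 2);            // the cursor moved, ia did not
  int result = ia[2];
  delete[] ia;                      // must be the pointer returned by new[]
  return result;
}
```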

An array can also be allocated on the stack. In this case we would write:

int ia[20];
ia[0] = 20;
ia[1] = 30;
ia[2] = ia[0] + ia[1];

In this case the array need not be deleted (in fact you can’t) as it will be reclaimed when the function where the definition occurs returns.

Lastly, one can declare an array as part of an enclosing data structure. The following is a string class for strings that can have at most 10 characters:

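class String {
public:
  char val_[11]; // up to 10 characters plus the terminating '\0'
 
  String(const char* c) {
    assert(strlen(c) <= 10);
    strcpy(val_, c); // val_ decays to a pointer to its first element
  }
};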

The field val_ is an array of 11 characters (10 plus the terminator) that is created each time a String object is created and deleted when the string is deleted. The constructor makes sure that the argument string is not too long and copies it into the array with strcpy.

To manipulate a group of objects, say strings, one can either define an array of strings or an array of pointers to strings. There are reasons for picking one or the other depending on the application. Imagine that we wanted to create an array of 10 strings, and query the size of the array’s first element:

String as1[10];
as1[0].size();
 
String* as2[10];
as2[0] = new String;
as2[0]->size();
 
String* as3 = new String[10];
as3[0].size();
 
String** as4 = new String*[10];
as4[0] = new String;
as4[0]->size();

The above code gives four variants. as1 is a stack-allocated array of strings. The strings are initialized with their default constructor. as2 is a stack-allocated array of pointers. We must create the object and set a pointer to it in the array. as3 is a pointer to an array of objects. The array and the objects it contains are heap-allocated in one go. Finally, as4 is a pointer to a heap-allocated array of pointers.

To choose one of these alternatives, one must answer the following questions:
  • Is the array of a fixed size or may it have to grow? Fixed-sized solutions are as1 and as2.

  • Will the array only hold String instances, or is it possible that we need to store subclasses? If we need to store subclasses then use pointers, as2 or as4.

  • Can some of the String entries be nullptr? If yes, then use pointers, as2 or as4.

  • Should the array outlive the current function? If yes, then use heap-allocation, as3 or as4.

  • May the array have to be shared with other parts of the system? If yes, then use heap-allocation, as3 or as4.

  • Should the strings in the array be shared with other parts of the system? If yes, use pointers to strings, as2 or as4.
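To make the as4 choice concrete, here is a sketch of a heap-allocated array of pointers that holds a subclass instance and an empty slot, and is then cleaned up. The classes Item and SpecialItem and the two helper functions are hypothetical stand-ins for String and its subclasses; they only count live objects so that the cleanup can be checked.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for String: counts how many objects are alive.
class Item {
public:
  static int live_;
  Item() { live_++; }
  virtual ~Item() { live_--; }
  virtual int kind() { return 0; }
};
int Item::live_ = 0;  // definition of the static field

// Hypothetical subclass, stored through an Item* in the same array.
class SpecialItem : public Item {
public:
  virtual int kind() { return 1; }
};

// Builds the as4 layout: a heap-allocated array of pointers.
Item** make_items(size_t len) {
  Item** arr = new Item*[len];
  for (size_t i = 0; i < len; i++)
    arr[i] = nullptr;  // slots may stay empty
  return arr;
}

// Frees every object, then the array itself. delete[] alone would
// release the array of pointers but not the objects they point to.
void free_items(Item** arr, size_t len) {
  assert(arr != nullptr);
  for (size_t i = 0; i < len; i++)
    delete arr[i];  // deleting a nullptr does nothing
  delete[] arr;
}
```

The price of the flexibility is that the owner of the array must remember to delete each element before the array itself.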

C strings: are defined as arrays of characters terminated with the special character '\0'. To compute the length of a string:

size_t length(char* p) {
  size_t len = 0;
  while(p[len] != '\0')
    len++;
  return len;
}

Traditionally, the length does not count the trailing terminator. Thus a string is constructed by allocating memory for the number of characters that it should hold plus one. The empty string is a character array of length one. To create the string Hi one could write:

size_t len = 2;
char* c = new char[len + 1];
c[0] = 'H'; c[1] = 'i'; c[2] = '\0';

Lastly, string literals, i.e. string constants written "hi", have type const char* to make it obvious that they should not be modified.

const char* c = "hi";
c[0] = 'H'; // not allowed by the compiler
char* no = const_cast<char*>(c);
no[0] = 'H'; //  allowed but will cause an error

The above code block declares a variable c that refers to a constant string hi. Trying to change the first letter of c to an upper case H is caught by the C++ compiler. One can cast the const away, but the program will likely stop with an access violation if you try to modify the string literal, as literals are typically stored in read-only memory. If you really need to change a literal string, copy it (e.g. with strdup()).
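A copy made with strdup() must be released with free(). As an alternative that pairs with delete[], the copy can be made by hand with new[]. The helper below is a sketch; the name duplicate is hypothetical.

```cpp
#include <assert.h>
#include <string.h>

// Hypothetical helper: returns a heap-allocated copy of the C string s.
// The caller owns the result and must release it with delete[].
char* duplicate(const char* s) {
  assert(s != nullptr);
  size_t len = strlen(s);          // does not count the terminator
  char* copy = new char[len + 1];  // one extra slot for '\0'
  for (size_t i = 0; i <= len; i++)
    copy[i] = s[i];                // the last iteration copies '\0'
  return copy;
}
```

The copy lives in ordinary heap memory, so writing to it (for example changing its first character) is allowed.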

Allocation and deallocation: To allocate a value on the heap, use the new operator, and to deallocate use delete. Calling delete on a nullptr does nothing. The delete[] operator is used for arrays: it calls the destructor of every element of the array before freeing the array itself. Note that delete[] deletes objects but does not follow pointers to objects.

char* c1 = new char;
char* c2 = new char[1]; // also one char, but allocated as an array: free with delete[]
char* c3 = new char[0]; // meaningless. Don't
2DPoint* p = new 2DPoint(0,0);  // a single point
2DPoint** arr = new 2DPoint*[10]; // an array of 10 pointers to points

Stack v. Heap: Values can be allocated on the heap (via new) or on the stack. The main difference is that a stack allocated value will be automatically deallocated when the current function returns.

String* f() {
  String s1("hi");
  String* s0 = new String("hi");
  return s0;
}

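The function f creates two string objects: one is stack allocated and will be destructed when the function returns; the other is heap allocated and will continue to exist after the function returns. It would be an error to return s1 through a pointer, since that pointer would refer to an object that no longer exists. The compiler rejects return s1; here because s1 is not a String*, and most compilers warn about returning &s1.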

Heap allocated objects are typically referenced through pointer variables. To get to a field or method, we must use the arrow operator (e.g. s0->size_ or s0->size()).

Stack allocated objects are accessed by reference and we use the dot operator (e.g. s1.size_ or s1.size()). The compiler will remind you when you forget.
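The difference in lifetime can be observed by counting live objects. The sketch below mirrors the function f above with a hypothetical class Tracked, whose constructor and destructor update a counter; none of these names come from CwC.

```cpp
#include <assert.h>

// Hypothetical class that counts how many of its instances are alive.
class Tracked {
public:
  static int live_;
  Tracked() { live_++; }
  ~Tracked() { live_--; }
  int id() { return 7; }
};
int Tracked::live_ = 0;  // definition of the static field

// One stack object and one heap object, as in f.
Tracked* make() {
  Tracked t1;                    // stack: destroyed when make returns
  Tracked* t0 = new Tracked();   // heap: survives until deleted
  assert(t1.id() == t0->id());   // dot for the object, arrow for the pointer
  assert(Tracked::live_ == 2);
  return t0;
}
```

After make returns, only the heap object remains alive, and it stays alive until the caller deletes it.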

Pointers v. References: Each value is stored at some particular location in memory. When we need to talk about a memory location we can use a pointer (*) or a reference (&). There are a number of design choices linked to the two. The compiler creates references by telling us where a particular value starts in memory. To get a reference you must have a value, thus a reference can never be nullptr. A pointer, on the other hand, can be nullptr. A reference, if it denotes a local variable, will become invalid as soon as the function where the variable resides returns. This leads us to a programming style where pointers are used for values that may outlast a call, values whose lifetime must be thought about carefully. References, on the other hand, are used for values that are on loan to a function and should not be retained.

void recall(String* s) { cache_ = s;  }
 
void read(String& s) { len_ = s.size(); }

The first function, recall, gets a pointer to a string and stores it into a field. The second function, read, gets a reference to a string and uses it to call a method but does not retain the object.
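For a compilable version of this convention, the two methods need a class to live in. The sketch below assumes a hypothetical Cache class holding the cache_ and len_ fields, and a hypothetical Str class standing in for String.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for String: it only records a length.
class Str {
public:
  size_t size_;
  Str(size_t n) : size_(n) {}
  size_t size() { return size_; }
};

// Hypothetical owner of the cache_ and len_ fields.
class Cache {
public:
  Str* cache_;  // not owned; may be nullptr
  size_t len_;  // size seen by the last call to read

  Cache() : cache_(nullptr), len_(0) {}

  void recall(Str* s) { cache_ = s; }      // retains the pointer
  void read(Str& s) { len_ = s.size(); }   // uses s, keeps nothing
};

// A stack object is lent to read; a heap object is handed to recall.
size_t loan_demo(Cache* c, Str* kept) {
  Str tmp(3);
  c->read(tmp);     // fine: tmp is alive for the whole call
  c->recall(kept);  // kept must stay alive as long as the cache uses it
  assert(c->cache_ == kept);
  return c->len_;
}
```

Passing &tmp to recall would compile, but cache_ would dangle as soon as loan_demo returns; the pointer-versus-reference convention exists to make that mistake stand out.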

Assert: The assert() function is useful to check that the state of the program is as expected. The following function checks that its first argument is not a nullptr and that the second argument is larger than 10. If either check fails, the program will stop. To use assert, #include <assert.h>.

void larger_than_ten(char* s, size_t len) {
   assert(s && len > 10);
}

Const: The const modifier can be used to specify that a value should not be modified. One common use is const char* for strings that should not be modified.
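As a small illustration of const on a parameter, the sketch below only reads its string argument, so it can accept string literals. The name count_char is hypothetical.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: counts occurrences of c in s. The const in the
// parameter type promises callers that s is only read.
size_t count_char(const char* s, char c) {
  assert(s != nullptr);
  size_t n = 0;
  for (size_t i = 0; s[i] != '\0'; i++)
    if (s[i] == c) n++;
  // s[0] = c;  // would not compile: s points to const char
  return n;
}
```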

Classes: CwC classes use a subset of the C++ syntax and features. A CwC class has a name, a parent class, zero or more fields, one or more constructors, a destructor, and zero or more methods.

class ClassName  : public ParentName  {
  public:
    fields...
    constructors...
    destructor
    methods...
};

Why are CwC fields public? Like dynamic languages (Python, Smalltalk, Lua), CwC has no support for access modifiers. In C++, access control induces a rich design space: (1) a class can inherit from public, protected or private parents, (2) a class can have a public interface for its clients, (3) a class can have another interface for its subclasses (protected), (4) there is an escape hatch with friend. For our purposes this is too much to properly explain and use.

The keyword public on the parent and before the methods and fields is used to ensure that all declarations are accessible to other classes in the project. Note the semi-colon after the closing brace; if you forget it, the compiler will remind you.

/**
 * 2DPoint represents two dimensional coordinates with a pair of
 * integer variables. 2DPoints are immutable.
 * author: jo@husky.neu.edu
 */
class 2DPoint : public Object {
public:
  int x_;  // x coordinate
  int y_;  // y coordinate
 
  /** Default constructor initializes point to the origin */
  2DPoint() : Object() {
     x_ = 0;
     y_ = 0;
  }
 
  /** Initializing constructor. */
  2DPoint(int x, int y) : Object(), x_(x), y_(y) {}
 
  /** Copy constructor: initializes this point from another point. */
  2DPoint(2DPoint &p) : Object() {
     x_ = p.x_;
     y_ = p.y_;
  }
 
  /** Destructor */
  ~2DPoint() {}
 
  /** Getter for x */
  virtual int get_x() { return x_; }
 
  /** Getter for y */
  virtual int get_y() { return y_; }
};

The above is an example of class declaration for an immutable point class. The getter methods are declared virtual to allow subclasses to override them if need be. The third constructor takes a reference to another point (rather than a pointer) to emphasize that it does not retain that object. The destructor is not marked virtual because we assume the parent class did that for us.

Fields: Fields are declared by giving a type name and a field name. Do not declare multiple fields on the same line as it makes it harder to write comments for each of them.

There is another reason for not declaring multiple variables in one line: char* c,d; is equivalent to char* c; and char d;. This is part of the C legacy.

Constructors: A class may have multiple constructors. A constructor should initialize all the fields of a class. Unlike in Java, fields are not initialized to zero in C++. Uninitialized fields can hold random values and can cause surprising bugs. A constructor consists of the name of the class followed by arguments; it can then invoke the parent constructor, list field initializers, and have a block of code. For example, here is a constructor for a color point:

class 2DColorPoint : public 2DPoint {
public:
  char* col_; // owned; the color of this point
 
  /** Constructor */
  2DColorPoint(int x, int y, const char* color) : 2DPoint(x,y) {
     assert(color != nullptr);
     size_t len = strlen(color);
     col_ = new char[len+1]; // room for the terminating '\0'
     strcpy(col_,color);
  }
};

The parent constructor can be left out, in which case the default constructor will be invoked. Fields can be either initialized in the constructor’s body or with field initializers after the colon.

Destructor: There is a single destructor per class. It can be declared virtual if the class will have subclasses. The destructor is run whenever an object is deleted or goes out of scope. The destructor should delete all owned data and finalize any external resources owned by the object.
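As an illustration of deleting owned data, the sketch below gives a color point a destructor that releases its color string. Two assumptions are made: the class is named ColorPoint2D here because a C++ identifier cannot begin with a digit, and Object is a minimal stand-in for the course's root class, assumed to declare a virtual destructor.

```cpp
#include <assert.h>
#include <string.h>

// Minimal stand-in for the course's Object class (an assumption).
class Object {
public:
  Object() {}
  virtual ~Object() {}
};

// Hypothetical color point that owns a copy of its color string.
class ColorPoint2D : public Object {
public:
  int x_;      // x coordinate
  int y_;      // y coordinate
  char* col_;  // owned; the color of this point

  ColorPoint2D(int x, int y, const char* color) : Object(), x_(x), y_(y) {
    assert(color != nullptr);
    col_ = new char[strlen(color) + 1];
    strcpy(col_, color);
  }

  /** Destructor: releases the owned color string. */
  ~ColorPoint2D() { delete[] col_; }
};
```

Because the destructor of Object is virtual, deleting a ColorPoint2D through an Object* still runs ~ColorPoint2D and frees col_.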

Methods: are declared by giving the keyword virtual, the type of the return value (or void), the name of the method, a list of arguments and a body. Multiple methods of the same name but with different argument types are allowed (this is called overloading). The virtual keyword can be omitted if we are redefining a method.
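The sketch below shows the three ideas together: a virtual method, a redefinition in a subclass that omits the virtual keyword, and two overloads that differ only in argument type. The classes Shape and Triangle and the function dispatch_demo are hypothetical.

```cpp
#include <assert.h>

// Hypothetical base class with one virtual method and two overloads.
class Shape {
public:
  virtual ~Shape() {}
  virtual int sides() { return 0; }
  virtual int scaled(int k) { return sides() * k; }        // overload on int
  virtual double scaled(double k) { return sides() * k; }  // overload on double
};

// Hypothetical subclass that redefines sides.
class Triangle : public Shape {
public:
  int sides() { return 3; }  // virtual omitted: this redefines Shape::sides
};

// Calls go through a Shape*, yet Triangle::sides is the one that runs.
int dispatch_demo() {
  Shape* s = new Triangle();
  assert(s->sides() == 3);
  int n = s->scaled(2);  // picks the int overload
  delete s;
  return n;
}
```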

Casts: We will use dynamic_cast to check that an object is an instance of a type (or of one of its subclasses), static_cast for casts that are always safe, and reinterpret_cast and const_cast in exceptional situations.

There is one last kind of cast, (T)exp. While this seems familiar from Java, it is unsafe and should not be used. It can behave like reinterpret_cast: it tells the compiler to treat expression exp as if it were of type T.

bool equals(Object* o) {
   String* other = dynamic_cast<String*>(o);
   String* other2 = reinterpret_cast<String*>(o);
   ...

The difference between the above two casts is that the first one is evaluated at run-time. It uses a few cycles, and either returns o or returns nullptr. The other cast is performed by the compiler and always returns o. Dynamic casts are what you should use for checking the type of an object. Other casts are typically not to be used.
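A complete equals method built on dynamic_cast might look like the sketch below. Object and Str are hypothetical stand-ins for the course's Object and String classes; comparing only a length field is a simplification made for the example.

```cpp
#include <assert.h>

// Minimal stand-in for the course's Object class (an assumption).
class Object {
public:
  virtual ~Object() {}
  virtual bool equals(Object* o) { return this == o; }
};

// Hypothetical string-like class that compares by length only.
class Str : public Object {
public:
  int len_;
  Str(int len) : Object(), len_(len) {}

  bool equals(Object* o) {
    Str* other = dynamic_cast<Str*>(o);  // nullptr if o is not a Str
    if (other == nullptr) return false;
    return len_ == other->len_;
  }
};

// Compares a Str against another Str and against a plain Object.
void equals_demo() {
  Str a(3);
  Str b(3);
  Object plain;
  assert(a.equals(&b));
  assert(!a.equals(&plain));
}
```

With reinterpret_cast in place of dynamic_cast, the comparison against plain would read a len_ field that does not exist, which is undefined behavior.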

Static: Static fields and methods are similar to Java: they are fields and methods shared by the entire class. Static methods are declared with the static keyword and are invoked by prefixing the name of the class. The following is an example where a static field is declared and a static method is used to initialize it:

class A {
public:
   static int I;
   static void initialize(int i) { I = i; }
};
int A::I = 0; // a static field must also be defined once, outside the class
...
A::initialize(42);

Including files: The #include directive tells the compiler to copy the specified file into the current file. Included files are found in the current directory (#include "helper.h") or looked up in the C++ distribution (#include <assert.h>). Note that we suggest using C-style standard header names inside #include (e.g. assert.h instead of cassert) because that guarantees the declarations are in the global namespace. To avoid including the same file multiple times, start every .h file with #pragma once.

The main function: The entry point of a CwC program is the main function. This function takes two arguments: the number of command line arguments (by convention called argc), and an array of strings with the actual arguments (by convention argv). Note that the first element of the argument array, argv[0], is the executable used to invoke the program. Actual command-line arguments (if any) start at argv[1].

Example:

#include <stdio.h>
 
int main(int argc, char **argv) {
  if (argc == 0) return 0;
  printf("Program invoked as: %s\n", argv[0]);
  printf("Additional arguments:\n");
  for (int i = 1; i < argc; i++)
    printf("  %s\n", argv[i]);
  return 0;
}

Anti-features: The following are not going to be used in CwC.

Towards the end of the class, we will evaluate the impact of these choices on your code and identify the features that would be most useful to incorporate in your code base.

Errata: The function length was incorrect; it returned a result off by one. There was an error in the allocation of an array of object pointers: 2DPoint[10] instead of 2DPoint*[10]. Fixed.