Design Issues

In general, the solutions looked pretty solid. Everyone made good use of the Publisher-Subscriber pattern.

Who is the Subscriber?

In most solutions, Cell inherited from Publisher and implemented the Subscriber interface. Some solutions had the Spreadsheet as the Publisher, the Cell as the Subscriber. Others had the Cell as the Publisher, and an auxiliary Dependent object as the Subscriber. What are the advantages and disadvantages of each design?

Spreadsheet as Publisher: For this design to work, the spreadsheet manages all cell modifications (clients can't change cells directly). The Spreadsheet then notifies all dependents whenever any of its cells change value. It could pass the changed cell as the "void *what" parameter. (The solutions I saw didn't do that.)
One problem is that all changes are broadcast to all potential subscribers; each subscriber has to check to see if they're really interested in the notification by checking the "what" parameter. Without the "what" parameter, this model really becomes unusable. This solution has scaling problems.

Cell as Publisher, Dependent as Subscriber: In this design, the cells publish their changes via notify(), which is picked up by special auxiliary Dependent objects. Each Dependent object is associated with one cell, and it updates the cell's value when its update() method is called. This generally works OK, and cells that just contain simple values don't need to worry about an update() function. One problem is that the extra object requires additional bookkeeping. This model might be very useful if we wanted to apply the Proxy pattern (for example, to be able to dynamically insert some other object between the Dependent and the Cell, perhaps to do extra processing for each update).

A nice variant of this had two kinds of Cells, a base Cell class and a derived Dependent class which inherited from Cell and implemented Subscriber. Only those cells which implement functions are Dependents; all the others are plain Cells. Setting a function for a spreadsheet location involves swapping out a Cell object for a Dependent object, so modifications are a bit more complicated, but it does isolate the update() to just the code that cares about it.

Cell as Publisher and Subscriber: In this design, Cell broadcasts changes to its subscribers and receives notifications from its publishers. Every Cell has to implement update(), even those that just have simple data values and never receive an update. On the other hand, there's only one concrete class to think about rather than two. Overall, I think this is the best solution for this problem, but the Cell/Dependent solution is reasonable as well.

Client Interface

What does the C++ client interface look like? This is the interface that your code presents to another programmer writing C++. If the code is to be reusable, this interface should be simple, self-documenting, and hard to use incorrectly.

Here are the big client interface problems that I saw: Embedding console UI within reusable framework components, requiring clients to perform downcasts, and requiring clients to perform all memory management. Each of these hurts reusability, increases complexity, and/or increases the chance of clients making a mistake.

In general, all classes exposed to clients should either follow the Orthodox Canonical Form (OCF) or explicitly forbid operations they do not support. Almost all classes should be default constructible. Most should be copy constructible, and those that aren't should make the copy constructor private to prevent accidental use. The same applies to operator=. Some people explicitly forbade unsupported operations (good!).

My solution fails OCF because copying a Cell doesn't copy its associated Function, and Cell doesn't disable copying. Mea culpa. I'll fix it in the next version.

Switching on Type IDs

Several of the solutions ended up, in one place or another, doing a big switch statement based on some kind of type or object ID. In moderation, this is fine. It is a reasonable way to deal with non object-oriented data; for example,

switch (ch) {
case '+': ...
case '-': ...
...
};

However, if you find yourself writing code like this

switch (obj.getType()) {
case TYPE_X: /*x processing*/ ;
case TYPE_Y: /*y processing*/ ;
...
};

take a step back and see if there's another way to do things. Downcasts within the switch statements are a bad sign too. The problem with this is that as new types are added, this switch statement needs to be extended. This would not be too bad if this were the only switch statement, but in a real world system you usually end up writing one of these for every operation you want to support. When you start writing the second switch statement, it is definitely time for a redesign. Trying to maintain all the switch statements in parallel will be a maintenance nightmare.

C++ offers a good alternative: virtual functions. Take the switch statements and turn them inside out, making each case block into its own function within the derived class. Give the whole operation a name, and make that name be a virtual function in the base class.

XObject::Process(...) {/*x processing*/}
YObject::Process(...) {/*y processing*/}
...
obj.Process(...); // Calls XObject::Process, YObject::Process, etc. automatically based on actual type.

This is often expressed as a rule of thumb: "Replace switches with virtual functions."

Implementation Issues

Everything in Header Files

A couple of solutions put a lot of code into header files (including functions several dozen lines long). This is not a scalable technique.

Sidebar: A "free function" is a function that doesn't belong to a class. If you put the body of a free function in its header file, and neglect to put the "inline" keyword in front of it, you may get linker errors. The reason is that the compiler will see this function's definition each time it #includes the header file in a new .cpp file. You're only allowed one such definition per program, so the linker will complain if you have more than one .cpp file. It's all right to put the declaration in the header file:
Declaration: int foo(); // OK to put in .h file
Definition: int foo() {} // Don't put in .h file unless you put "inline" in front of it (but see below)
What about member functions? If you put a member function definition inside the class declaration in the header, the compiler assumes that you want to make it inline. It's as if you put the keyword "inline" in front of the member function inside the class declaration. The compiler may or may not actually inline the function.
Declaration: class x {int foo();} // x::foo defined elsewhere in a .cc or .cpp file
Definition: class x {int foo() {} } // x::foo is an inline function
If you put a member function definition inside a header, but outside the class declaration, the rules are the same as for a free function. You will get linker errors if you have more than one .cpp file.
If you are having problems moving from one .cpp file to two or more because the linker is complaining about duplicate symbol definitions, this may be your problem.
Even if you get the inline issues worked out, there are good reasons to keep code out of .h files. If a compiler decides not to inline a function, it basically makes duplicate copies of it in every .cpp file but tells the linker to ignore the duplicate definitions.

Putting code for non-trivial functions into headers instead of .cpp or .cc files is bad for a couple of reasons:

1) If you're unlucky, the compiler will actually inline these and you'll get huge executables. If you're lucky, the compiler will just make duplicate static definitions in every object file and you'll get really big executables. If you're really, really lucky, the linker will eliminate the duplicate definitions automatically, and the only effect is to waste a bunch of time compiling things that are never used. None of these alternatives are good.

2) Every change to the internals of these functions will force a recompile of every client of the interface. This is a big problem for even a medium size project. For a library or framework, it's even worse. Even if it only takes a couple of minutes to rebuild everything, it kills productivity to have to wait two minutes to test every minor modification.

Memory Management

Many solutions had better memory management than in solution #1. But many solutions still paid little attention to memory management. Either memory is leaked, or clients are expected to provide all memory management. The latter is fine for a test driver but won't usually work in the real world. In particular, in Assignment 3 users are allowed to create new cells at well at runtime. So the spreadsheet has to be able to add new cells dynamically, and it should manage their memory. It is not usually possible to retrofit reasonable memory management onto a design which doesn't take it into account -- things often need to be redesigned.

Cell Storage Mechanisms and Cell Access

There were three basic solutions for storing the Cell objects within a spreadsheet. One is to use the "array of arrays" or "vector of vectors" approach. This is the approach I used in my solution. Another is to use a single 1D array or vector, but use arithmetic -- (row*NUMCOLS+col) -- to map a 2D coordinate system onto the 1D container. Either of these solutions is reasonable. (The second one may run into problems when the container needs to be dynamically expanded without losing its current contents, but as long as it's well encapsulated within Spreadsheet it can be changed to a different mechanism without affecting the rest of the system.)

A third solution is to use a sequence of Cells -- typically a list -- and store the row, column coordinates with each cell. Accessing a cell is done by linear search which looks for a cell with specific row, column values. Only cells with actual values need to be stored in this system (it's sparse). This solution has some appeal, but suffers from one big drawback: The lookup time is proportional to the total number of cells, i.e., it's O(n). Now consider the following typical loop to display the contents of the spreadsheet:

for(int row=0;row<numrows;++row) {
    for(int col=0;col<numcols;++col) {
        cout << getCell(row,col);
    }
    cout << endl;
}

If getCell() takes O(n) time, how much time does the entire loop take? Answer: O(n^2). Is this a problem? Not for our simple 3x4 spreadsheet. But scale it up, and you'll start to run into difficulties. This is part of our main display loop and we want it to be fast (that's why we are caching the cell values for computed cells). The O(n^2) behavior means that our caching strategy is mostly irrelevant. So it's probably not a good fit for our problem..

Another solution, which no one used, would be to use an associative array. Here's an example (this gives the general idea, but details need to be filled in):

typedef map<pair<int,int>, Cell*> CellMap; // pair and map are STL template classes
CellMap cellMap; // Create a map from co-ordinates to Cell pointers
cellMap[pair(3,2)] = new Cell(...); // Insert a cell at 3,2
Cell *c = cellMap[pair(3,2)]; // Retrieve cell at 3,2 (O(log n) time)

Using this data structure, we can access cells in O(log n) time, a big improvement for large spreadsheets. It's also possible to iterate through the container in row-major order and print everything out in O(n) time. This might be an attractive solution if we had a "sparse" spreadsheet with few non-null cells relative to the total size. For instance, if the spreadsheet was 500000x500000, but we only used 100 cells, this might be the superior solution.

But for our application, this is probably not an issue. Sparse spreadsheets are probably fairly rare. This solution also requires more complicated code.

So overall, the first two solutions (array of arrays or a mapping to a 1D array) best fit this problem.