Assignment 4: Indexer

In this assignment, you will expand the cross referencer from Assignment 3 into a full indexer that works across multiple files (or other kinds of locations).

To make things interesting, this indexer should be designed keep track of arbitrary kinds of locations, and the design shouldn't be restricted to just a line position within an input file. For example, a file+line number could be a location, or a file+offset, or a URL. New kinds of locations might be added in the future (a paragraph within a Microsoft Word document, for example). In this first version, it will handle file/line number locations.

When run, the indexer should look at the current directory and index all the files within it. It should then stop and wait for input; for each word entered on its command line, it should search for all the locations that word can be found in and print them out in the following form:

file1:line#
file2:line#
...

If asked to search for the special word '*', it should instead print out a report displaying all its indexed information, in the form:

word1 : file1:line# file2:line# ...
word2 : ...

Implementation

Handling arbitrary kinds of locations is a natural place to use polymorphism. Create an abstract base class named Location, and have the indexer's storage and output deal exclusively in Locations. Each word is associated with one or more Locations. Implement TextFileLocation, derived from Location, as a simple class that associates a line number with a File object. It is useful for an indexer to keep track of the last modification timestamp for a file, so File objects contain both the file name (as a string) and the last file modification timestamp. TextFileLocation objects should share File objects for efficiency (no point in duplicating all that data).

Your internal data structures should deal with Locations rather than concrete types wherever possible. They'll also need to share File objects. This implies that they'll need to use either pointers or smart pointers to the Location-derived objects and to File objects. For this assignment, you choose which to use: Either bare pointers (Location* and File*) or shared_ptr<Location> and shared_ptr<File>. If you choose to use bare pointers, make sure that the code tracks ownership of the objects, deletes every object it allocates, and doesn't generate any dangling pointers. If you choose to use shared_ptr, you'll need to download smart_ptr.hpp and support files from the boost web site (www.boost.org). If you use shared_ptr, make sure there are no circular reference chains.

The main program for the indexer will now need more than a single input stream; it will need to iterate over the files in a directory. Unfortunately this is OS-specific. For Unix systems, you can use opendir(), readdir(), closdir(), and stat(). For Windows systems, there are other functions in the Win32 API. You might also try the boost::dir_it directory iteration library (optional, may not work for Windows).

Turn in the code, description of the program input, and sample output.

Extra Credit Assignment: Custom Iterator Adaptors (5pts):

In associative containers, you often want to look at just the key or just the value, but the standard iterator gives you an iterator to the key,value pair structure. This can make it difficult to work with many algorithms. Write two custom iterators, key_iterator_t and value_iterator_t, along with generation functions key_iterator and value_iterator, that wrap an existing iterator-to-pair type. Here's an example of usage:

typedef map<string,string> SSMap;
SSMap mymap; ... for(key_iterator_t<SSMap::iterator> ki(mymap.begin()); ki!=mymap.end(); ++ki) cout << "Key: " << *ki;

// Find element with given _value_ rather than by key:
SSMap::iterator j = find(value_iterator(mymap.begin()),value_iterator(mymap.end()),"some value");

// Print out keys again, using an algorithm:
copy(key_iterator(mymap.begin()),key_iterator(mymap.end()),ostream_iterator<string>(cout,"\n"));

The value_iterator_t and key_iterator_t classes should provide an implicit conversion to the wrapped iterator type. Turn in the header file and test code.