2012-01-25

SICP Exercise 2.76: Compare and Contrast Data Representation Strategies

As a large system with generic operations evolves, new types of data objects or new operations may be needed. For each of the three strategies -- generic operations with explicit dispatch, data-directed style, and message-passing-style -- describe the changes that must be made to a system in order to add new types or new operations. Which organization would be most appropriate for a system in which new types must often be added? Which would be most appropriate for a system in which new operations must often be added?

Let's examine each of the strategies in turn and outline the changes required for adding types and operations before discussing the appropriateness of different strategies for different systems.

Generic Operations with Explicit Dispatch

When using generic operations with explicit dispatch each generic operation must be able to identify the type of passed data and invoke an appropriate type-specific procedure to perform the operation. As a result each generic operation contains type-specific code and so the changes required for additions to the system are as follows:
  • Adding a New Type: Once the representation for the type has been decided upon, procedures must be implemented that perform each of the existing generic operations for that type, along with appropriate constructors for the type, and then each generic operation must be updated so that it tests for the new type and invokes the appropriate procedure when the new type is identified.
  • Adding a New Operation: First a procedure must be implemented for each existing type that performs the new operation for that type. Once these are in place the generic operation itself can be implemented. This will follow the pattern of existing generic operations, testing for each of the existing types in turn and delegating to the appropriate type-specific procedure when the matching type is found.
Both cases expose this strategy's lack of modularization. The developer of a new type must have access to all of the generic procedures, while the developer adding the new operation must write code to test for each type and delegate to the appropriate type-specific procedure.

Data-Directed Style

Under the data-directed style each procedure that performs a particular operation for a specific type is stored in a table keyed under tags for the operation and type. The generic operations simply extract the type tag from the passed data, look up the appropriate procedure in the table and, if it's present, invoke it. Under this style we make changes to the system as follows:
  • Adding a New Type: Once the representation for the type has been decided upon, procedures must be implemented that perform each of the existing generic operations for that type, along with appropriate constructors for the type, and then those procedures must be registered in the table under the appropriate tags.
  • Adding a New Operation: First a procedure must be implemented for each existing type that performs the new operation for that type. Once these are in place the generic operation itself can be implemented. This simply invokes a utility procedure to perform the procedure look-up (such as the apply-generic procedure defined in section 2.4.3) and then delegates the execution of the operation to the returned procedure.
Note that in the first case there is no need to update the existing generic operations to support the type, while in the second case there is no need to add type-specific support into the new generic operation. However, this doesn't quite tell the full story...

In section 2.4.3 we are shown how to group together the set of procedures that implement all of the the operations for a specific type into a package that both defines the procedures and installs them in the table. In part (d) of exercise 2.73 we also explored inverting this packaging so that we group together the procedures that implement a specific operation for all of the types.

So depending upon how the system is factored we may find that, in order to make the changes for adding a type or operation and leave system tidy, we may need to either add a new package or modify all of the packages. For example, if in a particular system a package groups together all of the operations for a specific type then adding a new type will involve adding a new package, while adding a new operation will involve updating all of the packages, adding a procedure and its installation to each package.

Message-Passing Style

The message-passing style produces data objects that can receive and respond to messages indicating the type of operation to perform. We can still have generic operations under such a system - they simply invoke the apply-generic procedure defined in the Message Passing section appropriately. Under the message-passing system we make changes to the system as follows:
  • Adding a New Type: This involves implementing a new constructor procedure for the type that returns a procedure that can dispatch messages appropriately. We've actually just done this in exercise 2.75.
  • Adding a New Operation: First each type's dispatching procedure must be updated so that it can handle a message corresponding to the new operation appropriately. Once these are in place a generic operation can be implemented that simply invokes a utility procedure to perform the message passing (such as the apply-generic procedure defined in the Message Passing section).
Under this strategy the handling of messages (i.e. the operations) is performed by the dispatch procedure associated with each type. As a result the message handling is tightly bound to the data objects, meaning that new types are far more straightforward to add than operations.

Note that the author's indicated in a footnote that message-passing, as implemented in the book, restricts you to generic operations of one argument. This isn't strictly true. A generic operation could be implemented with multiple parameters that then gathered those parameters together into a single list, which could then be passed to the message.

What Strategy When?

We've seen that the approach of generic operations with explicit dispatch suffers from a lack of modularization, meaning that adding either a type or an operation requires large changes to the system.

We've also seen that, under the data-directed system, to add either a type or an operation we only have to register appropriate procedures in a table and then the generic operations can access the appropriate procedures without having to know about all types. However, we further noted that the cleanliness of additions for a particular system under this style may depend upon how said system is factored.

Finally, we saw how message-passing produces data objects that tightly bind the operations to the type, which can produce well-contained types, but does not necessarily make adding operations easy.

Based upon this I would suggest using the message-passing style for a system in which new types are frequently added, and the data-directed style (with appropriate factoring) for a system in which new operations are frequently added.

No comments:

Post a Comment