2012-05-30

SICP Exercise 2.90: Supporting Dense and Sparse Polynomials - Part 1

Suppose we want to have a polynomial system that is efficient for both sparse and dense polynomials. One way to do this is to allow both kinds of term-list representations in our system. The situation is analogous to the complex-number example of section 2.4, where we allowed both rectangular and polar representations. To do this we must distinguish different types of term lists and make the operations on term lists generic. Redesign the polynomial system to implement this generalization. This is a major effort, not a local change.
They're not kidding - it's not a local change. I've split the solution into two posts. In this post I'll discuss the changes required, and only give a few snippets of code. In the second post I'll present the code itself and talk about various interesting bits of it.

Using the Analogy

We're told that the situation is analogous to the complex-number example, so let's first examine the complex-number example to see how it handles its different representations. Then we can think about how we can apply that approach to this exercise.

In the complex-number example the rectangular and polar representations are extracted to their own packages, named rectangular and polar respectively. These two packages expose identical sets of primitive operations: constructors and accessors for the components of the complex number. They do not expose, or implement in any way, any complex operations such as arithmetic operations. Nor are data objects of these types directly constructed. Instead an overarching complex type is added to support these.

This complex package exposes two different construction operations that allow for the creation of complex number representations appropriate to the semantics of the passed parameters. I.e. rectangular representations are used for {real, complex} tuples and polar representations for {magnitude, angle} tuples. These delegate much of the construction of the representation to the rectangular and polar packages, simply tagging the results as 'complex types. It also provides representation-agnostic implementations of arithmetic operations that utilize the set of primitive operations exposed by the two representations and that select the best representation for the result based upon the operation.

The authors show how this fits into the type system in figure 2.23 at the start of section 2.5. Here it is again:
Programs that use numbers
add sub mul div
Generic arithmetic package
add-rat sub-rat
mul-rat div-rat
add-complex sub-complex
mul-complex div-complex
+ - * /
Rational arithmetic Complex arithmetic Ordinary arithmetic
Rectangular Polar

By analogy this suggest that we should identify a set of primitive operations that are common to both the sparse and dense polynomial representations that will allow for the implementation of the higher-level operations to be defined in terms of these operations only. Doing so will allow us to provide an over-arching polynomial package that can perform operations on either (or both!) of the two representations without bypassing this abstraction layer.

We can show how this will fit into the system by extending the figure above:
Programs that use numbers
add sub mul div
Generic arithmetic package
add-rat sub-rat
mul-rat div-rat
add-complex sub-complex
mul-complex div-complex
add-poly sub-poly
mul-poly div-poly
+ - * /
Rational arithmetic Complex arithmetic Polynomial arithmetic Ordinary arithmetic
Rectangular Polar Dense Sparse

Identifying the Interface

If we look at the original polynomial package implementation (which you can find in its entirety at the top of my solution to exercise 2.87), then you can see that the authors have carefully split out the term-list-specific portions of the operations in the package. add-terms is split out from add-poly, mul-terms is split out from mul-poly and so on.

We've retained this split throughout the subsequent exercises. As a result, if we compare the sparse and dense polynomial packages we've produced then we can see that they differ only from the level of these operations down. This suggests that we should treat these operations as the primitive operations that we should extract. In other words, we'll use different representations for the term-lists, but the surrounding polynomial representation (i.e. a tuple consisting of a variable and a term-list) will remain unchanged.

If we do choose these as our primitive operations then we'll quickly hit a stumbling block: we can't add two polynomials together which have different term-list representations. The implementation of add-terms for sparse term-lists will only work with, and will only be installed for, pairs of sparse term-lists. Similarly the implementation of add-terms for dense term-lists will only work with, and will only be installed for, pairs of dense term-lists. So we'll get an error if we try to add a sparse and a dense term-list together.

Obviously we can't work around this by simply e.g. installing the dense term-list operations for all permutations of term-list representations. The internal representations are different and so errors will occur. Instead we'll need to coerce the types of parameters to some common type (probably sparse term-lists, as we'd not want to expand out a sparse term-list with, say, an order 1,000,000 term to a dense term-list with all the zero-padding that would involve!).

Coercing Dense and Sparse Term-Lists

In section 2.5.2 and the related exercises we developed a mechanism for automatic coercion of types based upon a tower-of-types. Surely all we need to do then is to add the sparse and dense term-list types into the tower-of-types and all will work, won't it?

Well, not quite.

In our solution to this exercise sparse and dense term-lists will not be "primary" types. I.e like the rectangular and polar representations of complex numbers, they are not directly constructable and do not expose the full set of arithmetic operations. Instead they will be "secondary" (or sub-) types of the polynomial type. As a result they do not fit into the tower-of-types we have been using so far. What we need is an entirely separate tower-of-types for term-list representations and have apply-generic support this.

Thankfully this isn't as difficult as it sounds. Our current implementation for raising and dropping types is dependant upon there being an unbroken chain of coercion operations that can raise a type from any point in the tower-of-types right the way up to the top of the tower. This requirement is made explicit in apply-raise, where it tries to obtain an appropriate coercion operation to raise the type one level in the tower. If no such operation can be found it raises an error:
…
(let ((raiser (get-coercion (type-tag x) (cadr types))))
  (if raiser
      (raiser (contents x))
      (error "No coercion procedure found for types"
             (list (type-tag x) (cadr types)))))
And as apply-raise is also used by project to lower types, (by passing in the reversed tower-of-types to apply-raise), the current implementation also depends upon their being a similar chain of coercion operations that can lower any type all the way to the bottom of the tower.

If we loosen this restriction and allow for breaks in the chain of coercion operations up and down the tower then this effectively allows multiple towers-of-types to be represented with a single tower. All we need is for the implementation to assume that a missing coercion operation indicates a point at which a type should not be lowered any further. To do this all we need to do is to return the value passed in instead of raising an error. Here's the updated apply-raise:
(define (apply-raise x types)
  (cond ((null? types)
         (error "Type not found in the tower-of-types"
                (list (type-tag x) tower-of-types)))
        ((eq? (type-tag x) (car types))
         (if (null? (cdr types))
             x
             (let ((raiser (get-coercion (type-tag x) (cadr types))))
               (if raiser
                   (raiser (contents x))
                   x))))
        (else (apply-raise x (cdr types)))))
Having made this change we can just add the sparse and dense term-list types to the tower, and add the appropriate coercion procedures, and it'll all work. We'll leave the coercion procedures for now, as they'll fall out of the work we do to create the term-lists.

Coercing Polynomials

While we're talking about coercions it's worth noting that we can also fit our overarching polynomial package into the tower-of-types too.

Let's start with raising a complex type to a polynomial. What we need to produce is a polynomial of order 0 whose coefficient of the order 0 term is the complex value we're raising. We need to consider, however, what variable we will create the polynomial with. If we choose e.g. 'x as our variable then we'll be unable to use it in any calculations with polynomials whose variable is 'y and vice versa - despite the fact that the polynomial has no actual terms in which the variable is important.

One way of addressing this is to introduce a special marker variable that is only used for polynomials of order 0 that indicates that it can be used in calculations with any other polynomial. Effectively it late-binds the variable at the point the calculation is performed. I chose to call this marker 'unbound. Supporting this first requires a simple extension to same-variable? so that it matches 'unbound with any other variable:
(define (same-variable? v1 v2)
  (and (variable? v1)
       (variable? v2)
       (or (eq? v1 v2)
           (eq? v1 'unbound)
           (eq? v2 'unbound))))
It also requires a modification to the arithmetic operations so that they select the appropriate variable for any polynomial resulting from a calculation. At the moment our arithmetic operations just use the variable of the first polynomial involved in any calculation. All we need to do is to modify these so that they identify whether or not one this variable is 'unbound and, if so, use the other polynomial's variable instead. This selection can be encapsulated in a separate operation as follows:
(define (select-variable p1 p2)
  (let ((v1 (variable p1)))
    (if (eq? v1 'unbound)
        (variable p2)
        v1)))
Once we can raise complex types to polynomial types in this manner, it's a trivial matter to drop them again. We know that the order 0 term of a polynomial is equivalent to a constant numerical value. As a result, we can successfully drop a polynomial of order 0 to a complex by simply taking the coefficient of the order 0 term. Now due to the manner in drop works (i.e. it projects down the tree, which is allowed to drop information, then raises the result and checks that this is the same), this means that our coercion procedure can follow this approach, regardless of the order of the polynomial. I.e. it takes the coefficient of the order 0 term, and discards the rest of the polynomial.

After we incorporate these changes we'll have a new tower-of-types that effectively looks like this:
    sparse-terms
         ↑
    dense-terms

     polynomial
         ↑
      complex
         ↑
      rational
         ↑
      integer

Creating Polynomials

I noted above that coercing between sparse and dense term-lists would fall out of the work on creating polynomials, so let's look at that next.

The implementations of the polynomial package that we've dealt with in the preceding exercises for sparse and dense polynomials have produced two ways in which we can construct polynomials. For sparse polynomials we create a polynomial using a list of terms, with each term being a tuple of the term's order and its coefficient. For dense polynomials we create a polynomial using a list of the term coefficients, with the order of the term corresponding to a particular coefficient being implicit in the position of the coefficient in that list.

Now let's suppose we implement our system so that we can create either term-list representation in either manner. I.e. we can create a term list with either a list of terms or a list of coefficients, regardless of whether we're using the sparse or dense representations. In such a system we would be able to easily convert (i.e. coerce) a sparse term-list into a dense term-list by simply creating a dense term-list using the sparse term-list's terms. Similarly, we would be able to easily convert a dense term-list into a sparse term-list by creating a sparse term-list using the dense term-list's coefficients.

So, as you can see, by implementing our system in this manner the coercion operations will fall out of the work on creating polynomials.

The creation operations themselves are pretty straightforward. We already have appropriate operations for creating sparse polynomials from term lists and dense polynomials from coefficient lists. In order to create a sparse polynomial from a coefficient list we simply run through the list and add a term of the appropriate order for each coefficient (i.e. the term order for a coefficient is dictated by its position in the list). To create a dense polynomial from a term list we perform a similar operation to the one of building a sparse polynomial from a term list, except that the list we build consists of coefficients only, and so we determine whether or not we're inserting at the right location based upon the length of the list.

Dense or Sparse?

One side-effect of supporting coercion between dense and sparse term-list representations is that, thanks to the work we did in exercise 2.85, apply-generic will automatically drop a sparse term-list to a dense term-list. Now we don't necessarily want to do this, as the result of an arithmetic operation on two sparse term-lists may itself be a term-list that's best represented sparsely. We'll want to block the coercion in this case. One way of doing this is to simply have the coercion procedure decide whether or not it should coerce and, if not, just return the term-list unchanged (well, apart from re-tagging it). Our implementation of drop already checks for coercion procedures doing this and uses it to indicate that the value could not be dropped any further.

To make things a bit more complex, I'd like to generalize my statement above... The result of an arithmetic operation on any pair of term-lists, regardless of their representations, may be a term-list that's best represented sparsely. We should also make sure that the result of dense term-list arithmetic operations is provided in the best representation. The raising won't happen automatically for us here, so we'll need to do it ourselves.

So when should we use a dense term-list and when a sparse term-list? The book defines the difference as follows: "A polynomial is said to be dense if it has nonzero coefficients in terms of most orders. If it has many zero terms it is said to be sparse." There's no hard and fast rule about what having "nonzero coefficients in terms of most orders" or "many zero terms" mean, so I opted for the following: we'll represent a term-list sparsely when more than 10% of its terms have a zero coefficient, with a special rule to deal with term-lists whose highest order term is order 10 or lower.

Right, I think that's enough discussion of it. Let's move on to the code!

No comments:

Post a Comment