|
Memoization (not to be confused with memorization) is an optimization technique that most programmers are familiar with, even if the name might sound somewhat foreign. In brief, it's a caching technique where the results of a computationally expensive function are cached to avoid recomputing the same values multiple times. (Oh, the irony. Did I really say "In brief" before writing that?)
In functional circles, if I say I'm going to memoize a function, I'm not actually going to hack it until I've added a bunch of ugly one-off caching code that obscures the way the original code worked. I'm not going to modify the original function at all. Rather, I'm going to use a memoize function to automatically create an optimized version of the original.
In truly functional languages, this is possible because:
Functions are predictable. Which is to say, given a specific set of parameters, a function will always returns the same value(s).
Functions are first class -- they can be created dynamically, passed around as parameters, etc.
Given the right building blocks one can write a memoize function that, when given our original function as a parameter, creates and returns a new equivalent function, that's been optimized.
Automatic memoization is obviously a beautiful thing. So it shouldn't be surprising that it's not solely the plaything of functional programmers. It is possible to write memoize in many multi-paradigm languages like Lisp, Ruby, and Python.
Vying Games, Ruby, and Memoization
Since Vying Games is written in Ruby (a language with a slight performance handicap), and includes many computationally intense methods, I've become fairly well acquainted with memoize. So, I'd like to discuss, some of the gotchas involved with using memoize in an object oriented environment.
But first, memoization is a been there, done that in the Ruby world. Here's a list of some of the more interesting implementations:
The memoize function I use is closest to JEG2's, which takes a fairly standard approach. It uses alias_method to replace the original method with a memoized version. A hash -- mapping parameters to return values -- is used as its cache.
What to do with self?
In Ruby, we don't really have functions. We have objects with methods. Objects have state, and that state (sometimes) effects the results of method calls. Consider:
"hello".include?( "lo" )
"goodbye".include?( "lo" )
If we were to memoize String#include?, the keys to the cache should include, not only its parameters ("lo"), but also self ("hello" or "goodbye"). If we relied only on the parameters, we'd have a seriously broken include? method.
But this isn't a realistic example. If self does not effect the outcome of the method call, we don't need to include it in the key to the cache. Including self in the key has the potential to reduce cache hits, hurting performance -- so, it would be nice to exclude self where possible.
Unfortunately, we really can't know when self can or cannot be excluded. Which is why I've modified my version of memoize to always include self in the cache's key.
What to do with mutable return values?
Again, objects have state -- including our return values. If we cache mutable objects we leave ourselves open to all kinds of possible bugs. Consider a method that returns a new array every time it's called. When using the unmemoized version it's safe to modify the returned array -- the next time the method is called a brand new array will be returned. However, when the method's memoized, the array is cached and returned with every call. Now, if we modify the returned array, we've changed the value returned by all future calls.
This is bad. Really bad. Speaking from experience, it can take hours to track down such bugs.
Imagine, we have some working, but slow, code that we'd like to optimize. Suppose, we find a method that's a good candidate for memoization. Let's say this code returns a freshly created, mutable object. What if existing code depends on the fact that its safe to modify the object? We won't be able to use memoize without also rewriting that dependent code. That's ugly. That makes Haskell programmers cry. (Or gloat?)
There are a couple possible solutions to this problem. In Ruby, our memoize can freeze all cached return values. Now code that changes our mutable, cached objects will raise an exception. This will at least make it easier to track down bugs, but we still might have to rewrite some code.
Another possible solution would be to dup or clone the cached object before we return it. This is only helpful if the object can be cloned faster than created -- which I suspect is usually true. But, since there's no programmatic way to tell whether an object is mutable, we'd have to take this hit across the board. There's yet another, bigger problem with this approach: Ruby only provides a shallow clone. So, unless, care is taken, object corruption is still possible.
In the end, I went with the freeze approach, though perhaps the best option is to use immutable objects wherever possible.
Memoize and Object Creation
If a class of objects are immutable, but object creation is slow or done frequently, memoizing new can be a good idea. Depending on actual object use, you might save both time and memory. But be careful, if you memoize new your garbage collector will either thank you (you've reduced object churn), or hate you (you've caused a memory leak).
And, of course, in the case where new doesn't take any parameters, memoization will essentially turn the class into a Singleton.
If your class of objects are mutable, the clone trick described above can be useful. Though you won't save any memory, you might be able to save on computation time.
A Little Beaten Up, But Still a Looker
So while our beautiful, functional memoize loses some of its luster in an object oriented world -- it still has a certain something something, if you know what I mean? No? Err, alright. I just meant that, in Ruby, memoize has a bit of an edge to it. It's not as carefree as functional memoize.
Just be careful if you mess around with it. And, as always, don't forget to benchmark your code.
|
Alexander Botero-Lowry says,