Higher Order Blog

home

One aspect of lazy computation

23 Apr 2009

I've often needed to do a combination of filtering and mapping on arrays. E.g. in a Ruby on Rails app, I might have a list of "RecurringActivation" model objects which have a product_id and an integer period. Now I would like a list of product_ids where the current time "matches" the period; in code this might be:
#version 1 - 'functional'
cur_period = Time.now.month - 1
RecurringActivation.find(:all, :select => "product_id, period").select { |r|
  cur_period % r.period == 0
}.map &:product_id


#version 2 - 'imperative'
cur_period = Time.now.month - 1
result = []
RecurringActivation.find(:all, :select => "product_id, period").each { |r|
  result << r.product_id  if cur_period % r.period == 0
}
In the first version I compose functions (methods) 'select' and 'map': the blocks have no side-effects, and this is why I call it 'functional.' This is shorter and more clear. Of course, this is implemented by the library using iteration and imperative assignment, but at least my code feels functional. In the second version I use 'each' with a block that does both filtering and 'mapping' in one step: I add the product_id explicitly to the products array for each object which satisfies my criterion. Notice that the first version first produces a complete filtered list which is then passed on to the map method which produces a complete mapped and filtered list: the list is iterated twice, and an intermediate array containing only the filtered objects exists in memory and is later collected as garbage. In the second version the list is only iterated once, and such an intermediate array of filtered objects is never allocated. Hence one might choose the latter for performance reasons. Now consider the same in Clojure or any lazy functional language. We might have (no active record here):

krukow:~/Projects/private/okooko-prod/tmp$ cl
Clojure
user=> (def recurring_activations '({:product_id 1 :period 2} 
                                    {:product_id 2 :period 3}
				    {:product_id 3 :period 2}))
#'user/recurring_activations
user=> (def res
         (map #(do (println "map") (:product_id %))
             (filter #(do (println "filter") (= 0 (mod (:period %) 2)))
                 recurring_activations)))
#'user/res
user=> 
Nothing has been mapped or filtered yet since Clojure is fully lazy. So res is now a lazy seq which will compute exactly what is needed on demand. Now think about what will happen if I just peek at the first element of res… In a non-lazy language, already the entire recurring_activations list would have been filtered (printing "filter" three times), then that result would have been mapped (printing "map" twice) and finally the first element would be looked up. So the output would be "filter" "filter" "filter" "map" "map" (return first element of list). What happens in Clojure? It prints "filter" "map" and gives the first element:

user=> (first res)
filter
map
1
user=> 
In effect we don't need an intermediate list containing the filtered elements which is then garbage collected. Notice also that if we just look at the entire list (recomputing res first) we get:

user=> res
(filter
map
filter
filter
map
1 3)
user=>
So the side effects are evaluated in a different order than had we realized the entire filtered list first. This can be counter intuitive when one is used to eager languages (which all mainstream languages are); however, once understood laziness can be extremely powerful and elegant. In effect, in lazy languages we can compose functions that work on entire sequences, e.g., map and filter, to obtain a lazy sequence which evaluates the all of the composed functions on each element in sequence. If that sounded abstract and poorly phrased, I can try to say it more concretely: In Ruby (or Java) I would need to write a new function: filter_and_map taking two "blocks"/"procs" and a list, then using e.g. "each" on the list and apply first the filter "proc" then the map "proc" -- the existing functions "map" and "select" can't help me. In Clojure (or Haskell) I can simply compose the existing library functions filter and map to obtain the same thing:

(def filter_map
	(comp (partial map :product_id)
	          (partial filter #(= 0 (mod (:period %) 2)))))
or just use it inline

user=> (def res
	(map :product_id
	 (filter #(= 0 (mod (:period %) 2)) recurring_activations)))
#'user/res
user=> res
(1 3)
user=>