2018-01-06

Transducers

One of the nice things that you’ll come across in Clojure is transducer. In this post I’ll go over what transducers are, how you can use them, how you can make one, and what transducible contexts are.

What are transducers?

In simple terms, transducers are composable transformation pipelines. A transducer does not care about where the input comes from or where the output will go; it simply cares about the transformation of the data that that flows through the pipeline. Let’s look at an example:

(let [xf (comp 
           (map inc)
           (filter even?))]
  (sequence xf (range 10)))

Here xf (external function) is our transducer which will increment every number and then will keep only the even numbers. Calling sequence functions like map, filter, etc. with single arity returns a transducer which you can then compose. The transducer doesn’t know where it will be used - will it be used with a collection or with a channel? So, a transducer captures the essence of your transformation. sequence is responsible for providing the input to the transducer. This is the context in which the transducer will run.

Here’s how the same thing can be done using threading macro:

1
2
3

(->> (range 10)
     (map inc)
     (filter even?))

The difference here is that the 2-arity version of map and filter will create intermediate collections while the 1-artity versions won’t. Transducers are much more efficient than threading together sequence functions.

Chaining
Transducing

Source for images

Inside a transducer

Let’s look at the 1-arity version of map and see what makes a transducer.

(defn map
  ([f]
    (fn [rf]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
           (rf result (f input)))
        ([result input & inputs]
           (rf result (apply f input inputs))))))
   ...)

When you call 1-arity version of map you get back a transducer which, as shown above, is a function. Functions like map, filter, etc. take a collection and return a collection. Transducers, on the otherhand, take one reducing function and return another. The function returned is expected to have three arities:

0-arity (init): This kickstarts the transformation pipeline. The only thing you do here is call the reducing function `rf`.

2-arity (step): This is where you'll perform the transformation. You get the result so far and the next input. In case of `map`, you call the reducung function `rf` by applying the function `f` to the input. How the value is going to be added to the result is the job of `rf`. If you don't want to add anything to the result, just return the result as-is. You may call `rf` once, multiple times, or not at all.

1-arity (end): This is called when the transducer is terminating. Here you must call `rf` exactly once and call the 1-arity version. This results in the production of the final value.

So, the general form of a transducer is this:

(fn [rf]
  (fn 
    ([] (rf))
    ([result] (rf result))
    ([result input] ... )))

Using transducers

You can use a transducer in a context. There’s four contexts which come out of the box — into, transduce, sequence, and educe.

into

The simplest way to use a transducer is to pass it to into. This will add your transformed elements to an already-existing collection after applying the transducer. In this example, we’re simply adding a range into a vector.

1
2
3

(let [xf (comp (map inc)
               (map dec))]
  (into [] xf (range 10)))

Internally, into calls transduce.

transduce

transduce is similar to the standard reduce function but it also takes an additional xform as an argument.

1
2
3

(let [xf (comp (map inc)
               (map dec))]
  (transduce xf + (range 10)))

sequence

sequence lets you create a lazy sequence after applying a transducer. In contrast, into and transduce are eager.

1
2
3

(let [xf (comp (map inc)
               (map dec))]
  (sequence xf (range 10)))

eduction

eduction lets you capture applying a transducer to a collection. The value returned is an iterable application of the transducer on the collection items which can then be passed to, say, reduce.

(let [xf (comp (map inc)
               (map dec))
      coll (eduction xf (range 10))]
  (reduce + 0 coll))

Inside a transducible context

As mentioned before, transducers run in transducible contexts. The ones that come as a part of clojure.core would suffice most real-world needs and you’ll rarely see yourself writing new ones. Let’s look at transduce.

(defn transduce
  ([xform f coll] (transduce xform f (f) coll))
  ([xform f init coll]
     (let [f (xform f)
           ret (if (instance? clojure.lang.IReduceInit coll)
                 (.reduce ^clojure.lang.IReduceInit coll f init)
                 (clojure.core.protocols/coll-reduce coll f init))]
       (f ret))))

transduce is just like reduce. The 3-arity version expects an initial value to be supplied by calling the 0-arity version of the supplied function. The 4-arity version is slightly more involved. IReduceInit is an interface implemented by collections to let them provide an initial value. It lets a collection reduce itself. If not, the call goes to coll-reduce which is a faster way to reduce a collection than using first/next recursion.

Stateful transducers

It’s possible for transducers to maintain reduction state.

(defn multiply-xf
  []
  (fn [rf]
    (let [product (volatile! 1)]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         (let [new-product (vswap! product * input)]
           (rf result new-product)))))))

Here’s a transducer which will multiply all the incoming numbers. We maintain state by using a Volatile. Whenever we get a new input we multiply it with the product and update the state of Volatile using vswap!. Let’s see this in action:

1 2	(into [] (multiply-xf) [1 2 3]) => [1 2 6]

Early Termination

The way the above transducer is written, it’ll process all the inputs even if one of the inputs is zero. We know that once we encounter a zero, we can safely end the reduction process and return a zero. reduced lets you return a reduced value and end the reduction. Let’s make a minor change to the above transducer and add in early termination.

(defn multiply-xf
  []
  (fn [rf]
    (let [product (volatile! 1)]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         (let [new-product (vswap! product * input)]
           (if (zero? new-product)
             (reduced result)
             (rf result new-product))))))))

In the 2-arity function, we check if the new-product is zero. If it is, we know we have a reduced value. We end the reduction by returning the result we have so far. Let’s see this in action:

(into [] (multiply-xf) [1 2 3])
=> [1 2 6]
(into [] (multiply-xf) [1 2 0 3])
=> [1 2]

Conclusion

Transducers can be a very useful tool in your Clojure toolkit that let you process large collections, channels, etc. effectively by letting you make composable transformation pipelines that process one element at a time. They require a little getting used-to but once you’re past the learning curve, performance galore!