clojure

Practical applications of Y Combinators in Clojure

When the term Y Combinator is mentioned in most technology circles today, the image it usually conjures up is of the well known startup incubator with their orange logo and a list of famous funded companies. There’s a very interesting reason they chose this name, and their FAQ page 1 gives some context on why they did so,

The Y combinator is one of the coolest ideas in computer science. It’s also a metaphor for what we do. It’s a program that runs programs; we’re a company that helps start companies.

Which of course begs the question –  why are Y Combinators so cool and are there any real reasons to know about them?

This post aims to provide insight into some practical applications of Y Combinators, and I highly recommend reading some of the following resources on the topic that do a great job in articulating the theory behind them. 2 3 4

A gentle introduction

Let’s assume we want to write a recursive function that does something useful, but with a catch – we don’t want to invoke this recursive function directly from within itself. Why would we want to do this? Multiple reasons might force us to – perhaps we’d like to avoid stack overflows, or perhaps there are other deeper reasons which are out of the scope of this post. 5 6. Irrespective – how then would we go about doing so?

In other words, the problem we’re trying to solve is to “introduce a self reference to the function, without a direct self reference”.

Before diving into the programming aspects, let’s consider this in a purely mathematical form. A recursive function can be defined partly by itself. So, in the case of the Fibonacci series,


\text{Fib}(0)=0\text{ as base case 1,}
\text{Fib}(1)=1\text{ as base case 2,}
\text{For all integers }n>1,~\text{ Fib}(n):=\text{Fib}(n-1) + \text{Fib}(n-2).

Now, how would we give such a recursive function an input, such that its output is the same as its input? For example, in the following function,

\text{F}(x) = x * x

there are two inputs: 0 and 1, for which the function’s output is the same as the input provided to it. This is defined as the fixed-point of this function.

If we can now find a fixed-point p of F such that F(p) is equivalent to p, we can use F(p) or p interchangeably (since they are the same thing) as the “recursive” function without direct self-reference 7.

It turns out that for any generic λ-expression f , (\lambda x. f(x x))(\lambda x. f(x x)) is a fixed-point of f 8.

Given this, we can build a function that returns a fixed-point for any function f by taking the function in as an argument:

\lambda f. (\lambda x. f(x x))(\lambda x. f(x x))

This is known as the fixed point or Y combinator. Therefore, for any function f, Y(f) is a fixed-point of f. That is, f(Y(f)) is equivalent to Y(f).

A real world example

So far so good. But how is this useful and why should you know about it? Let’s examine the case of a Clojure function that gives us the sum of a sequence of numbers 9.

;; An example of a recursive function to sum a sequence of numbers
(defn sum-seq [x]
 (if (empty? x)
  0
  (+ (first x) (sum-seq (rest x))))) ;; invokes itself

;; (sum-seq [1 9])
;; => 10

As we can see, this function invokes itself  at the very last line, which is the kind of behavior we’re trying to avoid. How would we write this without a direct self reference? One way to think about this is to express it as a sequence of function calls that starts by being given a function, and returns to us not the result itself, but a sequence of “next” functions to compute the sum of that sequence. Here’s an example,

;; Function that returns the next function to compute the sum of a sequence of numbers
(defn sum-seq-fn-gen [func]
  (fn [s]
    (if (empty? s)
      0
      (+ (first s) (func (rest s))))))

;; user> ((sum-seq-fn-gen nil) [])
;;  0

;;  user> ((sum-seq-fn-gen (sum-seq-fn-generator nil)) [9])
;;  9

;;  user> ((sum-seq-fn-gen (sum-seq-fn-gen (sum-seq-fn-gen nil))) [1 9])
;;  10

This looks like it works. But as we can see, in order to get the sum for a two element vector, we needed to invoke this function thrice – clearly not a great use of our editing skills or time. How would we make this simpler? By simply finding the fixed point for this function, we could achieve what we set out to do! 10. Let’s start by writing a fixed point combinator.

;; Y combinator
(defn Y [m] ;;  λf
  ((fn [future] ;; λx
     (future future)) ;; f(x x)
   (fn [future] ;; λx
     (m (fn [arg]
          ((future future) arg)))))) ;; f(x x)

If we were to pass the sum-seq-gen function to this Y combinator we just wrote above, life becomes much simpler.

;; user> ((Y sum-seq-fn-gen) [1 2 3])
;; 6

;; user> ((Y sum-seq-fn-gen) (range 0 1000))
;; 499500

So far so good. But the real power of a combinator isn’t just that it allows us to just write non-direct-reference recursive functions. It is that it allows us to create very useful wrappers around our functions, which can allow us to achieve all sorts of cool things without ever needing to rewrite our original function. As an example, let’s consider the use case of needing to log every internal function call that’s going on in the sum-seq-fn function. In the regular programming model, we would need to add these log lines to the sum-seq-fn itself, which is a huge overhead. But by using combinators, we can just define a LoggingY that will do this for us.

;; A logging fixed point combinator
(defn LoggingY [r]
  ((fn [f]
     (do
       (prn "Logging call: " f)
       (f f)))
   (fn [f]
     (do
       (prn "Logging within second f: " f)
       (r (fn [x]
            (do
              (prn "logging within third fn: " x)
              ((f f) x))))))))

viksit.explorations> ((LoggingY sum-seq-fn-gen) [1 2 3])

"Logging call: " #<explorations$LoggingY$fn__14921 viksit.explorations$LoggingY$fn__14921@1ed99bdc>
"Logging within second f: " #<explorations$LoggingY$fn__14921 viksit.explorations$LoggingY$fn__14921@1ed99bdc>
"logging within third fn: " (2 3)
"Logging within second f: " #<explorations$LoggingY$fn__14921 viksit.explorations$LoggingY$fn__14921@1ed99bdc>
"logging within third fn: " (3)
"Logging within second f: " #<explorations$LoggingY$fn__14921 viksit.explorations$LoggingY$fn__14921@1ed99bdc>
"logging within third fn: " ()
"Logging within second f: " #<explorations$LoggingY$fn__14921 viksit.explorations$LoggingY$fn__14921@1ed99bdc>
6

Without having to change the original function, we’ve just added some deep instrumentation into our function.

A memoization example

Let’s consider a slightly more non trivial example. What if we wanted to make a recursive function more efficient by introducing memoization? Could we write a generic non-recursive function and then apply an equally generic combinator to memoize it? Absolutely!

For this exercise, let’s define a more generic fixed point U combinator 11, which applies an “abstract” function myapply on to the function f. We can use the freedom of choosing myapply, for example, to transparently interpose memoization.

;;  U = λh.(h h)
;; Generic U combinator
(defn U [f]
  (f f))
;; More generic function that can take an application function
(defn UM [myapply f]
  (defn g [v]
    (fn [args]
      (myapply (f v) args)))
  (f g))
;; A non recursive function that gives us the nth fibonacci number
(defn fib-nr [f]
  (fn [n]
    (if (< n 2) 1
        (+ ((f f) (- n 1))
           ((f f) (- n 2))))))

We can now create a combinator ready 12 function that returns an anonymous function that will cache a function’s arguments and results for it.

(defn make-memoizer []
  (let [application-cache (atom {})]
    (fn [function & args]
        (if-let [e (find @application-cache args)]
          (val e)
          (let [result (apply function args)]
            (swap! application-cache assoc args result)
            result)))))

;; Time taken to fetch the 38th fibonacci number
viksit.explorations> (time ((U fib-nr) 38))
"Elapsed time: 9700.194 msecs"
63245986

;; Time taken to fetch the same, but with memoization (!)
viksit.explorations> (time ((UM (make-memoizer) fib-nr) 38))
"Elapsed time: 0.153 msecs"
63245986

As we can see, by never having to modify the original function, we’ve been able to use a variety of specific or generic combinators to provide wrappers that can help in making programs more optimal, easier to debug and ultimately, fine-tune.


 

  1. http://www.ycombinator.com/faq/
  2. https://medium.com/@ayanonagon/the-y-combinator-no-not-that-one-7268d8d9c46
  3. http://matt.might.net/articles/implementation-of-recursive-fixed-point-y-combinator-in-javascript-for-memoization
  4. https://mitpress.mit.edu/sicp/full-text/book/book-Z-H-26.html#%_sec_4.1
  5. http://en.wikipedia.org/wiki/Stack_overflow
  6. http://aegis.sourceforge.net/auug97.pdf
  7. Seriously, highly recommend reading up on what fixed point combinators really mean
  8. http://en.wikipedia.org/wiki/Lambda_calculus
  9. Courtesy http://www.fatvat.co.uk/2009/04/understanding-y-combinator.html
  10. This almost seemed like magic the first time I encountered it!
  11. http://lambda-the-ultimate.org/classic/message5463.html
  12. http://stackoverflow.com/questions/15859673/fixed-point-combinators

Reading Java properties file in Clojure

A simple and effective way to read properties files in Clojure, since they transform into Clojure maps!

(into {} (doto (java.util.Properties.)
         (.load (-> (Thread/currentThread)
         (.getContextClassLoader)
         (.getResourceAsStream "log4j.properties")))))

Next, to actually read this in, using atoms to swap the values like this seems to work,

(def *args* 
  (atom {:a 10, :b 20})) 

(defn -main [] 
  (swap! *args* assoc :a (read) :b (read)))

Mutable vs Immutable datastructures – Serialization vs Performance

In my last post, I was playing around with methods to serialize Clojure data structures, especially a complex record that contains a number of other records and refs. Chas Emerick and others mentioned in the comments there, that putting a ref inside a record is probably a bad idea – and I agree in principle. But this brings me to a dilemma.

Lets assume I have a complex record that contains a number of "sub" records that need to be modified during a program's execution time. One scenario this could happen in is a record called "Table", that contains a "Row" which is updated (Think database tables and rows). Now this can be implemented in two ways,

  • Mutable data structures – In this case, I would put each row inside a table as a ref, and when the need to update happens, just fine the row ID and use a dosync – alter to do any modifications needed.

    • The advantage is that all data is being written to in place, and would be rather efficient.
    • The disadvantage however, is that when serializing such a record full of refs, I would have to build a function that would traverse the entire data structure and then serialize each ref by dereferencing it and then writing to a file. Similarly, I'd have to reconstruct the data structure when de-serializing from a file.

 
{:filename "tab1name",
 :tuples
 #},
      :tup #}
     {:recordid nil,
      :tupdesc
      {:x
       #},
      :tup #}}>,
 :tupledesc
 {:x
  #}}

	

  • Immutable data structures – This case involves putting a ref around the entire table data structure, implying that all data within the table would remain immutable. In order to update any row within the table, any function would return a new copy of the table data structure with the only change being the modification. This could then overwrite the existing in-memory data structure, and then be propagated to the disk as and when changes are committed.

    • The advantage here is that having just one ref makes it very simple to serialize – simply de-ref the table, and then write the entire thing to a binary file.
    • The disadvantage here is that each row change would make it necessary to return a new "table", and writing just the "diff" of the data to disk would be hard to do.

 
#

So at this point, which method would you recommend?

Serializing Clojure Datastructures

I’ve been trying to figure out how best to serialize data structures in Clojure, and discovered a couple of methods to do so. (Main reference thanks to a thread on the Clojure Google Group here )

(def box {:a 1 :b 2})

(defn serialize [o filename]
  (with-open [outp (-> (File. filename) java.io.FileOutputStream. java.io.ObjectOutputStream.)]
    (.writeObject outp o)))

(defn deserialize [filename]
  (with-open [inp (-> (File. filename) java.io.FileInputStream. java.io.ObjectInputStream.)]
    (.readObject inp)))

(serialize box "/tmp/ob1.dat")
(deserialize "/tmp/ob1.dat")

This works well for any Clojure data structure that is serializable. However, my objective is slightly more intricate – I’d like to serialize records that are actually refs. I see a few options for this,

– Either use a method that puts a record into a ref, rather than a ref into a record and then use the serializable, top level map
– Write my own serializer to print this to a file using clojure+read
– Use Java serialization functions directly.

Thoughts?

Stack implementation in Clojure II – A functional approach

My last post on the topic was creating a stack implementation using Clojure protocols and records – except, it used atoms internally and wasn’t inherently “functional”.

Here’s my take on a new implementation that builds on the existing protocol and internally, always returns a new stack keeping the original one unmodified. Comments welcome!

(ns viksit-stack
  (:refer-clojure :exclude [pop]))

(defprotocol PStack
  "A stack protocol"
  (push [this val] "Push element in")
  (pop [this] "Pop element from stack")
  (top [this] "Get top element from stack"))

; A functional stack record that uses immutable semantics
; It returns a copy of the datastructure while ensuring the original
; is not affected.
(defrecord FStack [coll]
  PStack
  (push [_ val]
	"Return the stack with the new element inserted"
	(FStack. (conj coll val)))
  (pop [_]
       "Return the stack without the top element"
	 (FStack. (rest coll)))
  (top [_]
       "Return the top value of the stack"
       (first coll)))

; The funtional stack can be used in conjunction with a ref or atom

viksit-stack> (def s2 (atom (FStack. '())))
#'viksit-stack/s2
viksit-stack> s2
#
viksit-stack> (swap! s2 push 10)
#:viksit-stack.FStack{:coll (10)}
viksit-stack> (swap! s2 push 20)
#:viksit-stack.FStack{:coll (20 10)}
viksit-stack> (swap! s2 pop)
#:viksit-stack.FStack{:coll (10)}
viksit-stack> (top @s2)
10

Stack implementation in Clojure using Protocols and Records

I was trying to experiment with Clojure Protocols and Records recently, and came up with a toy example to clarify my understanding of their usage in the context of developing a simple Stack Abstract Data Type.

For an excellent tutorial on utilizing protocols and records in Clojure btw – check out Kotka.de – Memoize done right .


;; Stack example abstract data type using Clojure protocols and records
;; viksit at gmail dot com
;; 2010

(ns viksit.stack
  (:refer-clojure :exclude [pop]))

(defprotocol PStack
  "A stack protocol"
  (push [this val] "Push element into the stack")
  (pop [this] "Pop element from stack")
  (top [this] "Get top element from stack"))

(defrecord Stack [coll]
  PStack
  (push [_ val]
	(swap! coll conj val))
  (pop [_]
       (let [ret (first @coll)]
	 (swap! coll rest)
	 ret))
  (top [_]
       (first @coll)))

;; Testing
stack> (def s (Stack. (atom '())))
#'stack/s
stack> (push s 10)
(10)
stack> (push s 20)
(20 10)
stack> (top s)
20
stack> s
#:stack.Stack{:coll #}
stack> (pop s)
20

More tutorial links on Protocols,

[1] http://blog.higher-order.net/2010/05/05/circuitbreaker-clojure-1-2/
[2] http://freegeek.in/blog/2010/05/clojure-protocols-datatypes-a-sneak-peek/
[3] http://groups.google.com/group/clojure/browse_thread/thread/b8620db0b7424712

Thrush Operators in Clojure (->, ->>)

I was experimenting with some sequences today, and ran into a stumbling block: Using immutable data structures, how do you execute multiple transformations in series on an object, and return the final value?

For instance, consider a sequence of numbers,

user> (range 90 100)
(90 91 92 93 94 95 96 97 98 99)

How do you transform them such that you increment each number by 1, and then get their text representation,

"[\\]^_`abcd"

Imperatively speaking, you would run a loop on each word, and transform the sequence data structure in place, and the last operation would achieve the desired result. Something like,


>>> s = ""
>>> a = [i for i in range(90,100)]
>>> a
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

>>> for i in range(0,len(a)):
...   s += chr(a[i]+1)
... 
>>> s
'[\\]^_`abcd'

If you knew about maps in python, this could be achieved with something like,

>>> ''.join([chr(i+1) for i in range(90,100)])
'[\\]^_`abcd'

The easiest way to do this in Clojure is using the excellently named Thrush operator (-> and ->>). According the doc,

Threads the expr through the forms. Inserts x as the
second item in the first form, making a list of it if it is not a
list already. If there are more forms, inserts the first form as the
second item in second form, etc.

It is used like this,

user> (->> (range 90 100) (map inc) (map char) (apply str))
"[\\]^_`abcd"

Basically, the line, (-> 7 (- 3) (- 6)) implies that 7 be substituted as the first argument to -, to become (- 7 3). This result is then substituted as the first argument to the second -, to get (- 4 6), which returns -2.

user> (-> 7 (- 3) (- 6))
-2

Voila!

Implementing Binary Search with Clojure

I was trying to implement a simple binary search using a purely functional approach, and after much hacking, googling and wikibooking, came up with this in Clojure.


(defn binarysearch
( [lst n]
(binarysearch lst 0 (dec (count lst)) n))
( [lst lb ub n]
(if (> lb ub) -1 ; this is the case where no element is found
(let [mid (quot (+ lb ub) 2)
mth (nth lst mid)]
(cond
; mid > n, so search lower
(> mth n) (recur lst lb (dec mid) n)
; mid < n, search upper (< mth n) (recur lst (inc mid) ub n) ; else, found, return index (= mth n) mid)))))

Clojure Application with Command Line Arguments

I was recently looking for a method to create an application with Clojure that would allow specification of command line arguments.

I came across an excellent post on Stack Overflow by alanlcode , that provides a spectacular example. I’ve Github’d it for reference.

20 Days of Clojure

Came across an excellent series of blog posts by Lou Franco, where he uses the SICP videos as input to learn more about Clojure.

His explanation of HashMap implementations in Clojure, using multimethods, as well as pointers on parallelizing functional programs are very well written. I’m currently on his day 10 post.