functional

Mutable vs Immutable datastructures – Serialization vs Performance

In my last post, I was playing around with methods to serialize Clojure data structures, especially a complex record that contains a number of other records and refs. Chas Emerick and others mentioned in the comments there that putting a ref inside a record is probably a bad idea, and I agree in principle. But this brings me to a dilemma.

Let's assume I have a complex record that contains a number of "sub" records that need to be modified while the program runs. One scenario where this could happen is a record called "Table" that contains a "Row" which is updated (think database tables and rows). This can be implemented in two ways:

  • Mutable data structures – In this case, I would put each row inside a table as a ref, and when the need to update happens, just find the row ID and use dosync with alter to make any modifications needed.

    • The advantage is that each update is written in place, which should be rather efficient.
    • The disadvantage however, is that when serializing such a record full of refs, I would have to build a function that would traverse the entire data structure and then serialize each ref by dereferencing it and then writing to a file. Similarly, I'd have to reconstruct the data structure when de-serializing from a file.

  • Immutable data structures – This case involves putting a single ref around the entire table data structure, so that all data within the table remains immutable. To update a row, a function would return a new copy of the table in which only that row is changed. This new value would then replace the existing in-memory data structure, and be propagated to disk as and when changes are committed.

    • The advantage here is that having just one ref makes it very simple to serialize – simply de-ref the table, and then write the entire thing to a binary file.
    • The disadvantage here is that each row change would make it necessary to return a new "table", and writing just the "diff" of the data to disk would be hard to do.
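To make the two options concrete, here is a minimal sketch of both. The Row and Table records, their fields, and the helper function names are hypothetical, made up purely for illustration, and are not taken from the previous post.

```clojure
;; Hypothetical records, for illustration only.
(defrecord Row [id data])
(defrecord Table [name rows])

;; Option 1: each row is its own ref inside the table.
(def mutable-table
  (Table. "people" {1 (ref (Row. 1 "alice"))
                    2 (ref (Row. 2 "bob"))}))

(defn update-row-in-place!
  "Finds the row's ref by ID and alters it inside a transaction."
  [table id new-data]
  (dosync
    (alter (get-in table [:rows id]) assoc :data new-data)))

(defn snapshot
  "Walks the table and dereferences every row ref, giving a plain
  immutable value that is easier to serialize."
  [table]
  (update-in table [:rows]
             (fn [rows] (into {} (map (fn [[id r]] [id @r]) rows)))))

;; Option 2: a single ref around an entirely immutable table.
(def immutable-table
  (ref (Table. "people" {1 (Row. 1 "alice")
                         2 (Row. 2 "bob")})))

(defn update-row!
  "Replaces the whole table value with one in which a single row differs."
  [table-ref id new-data]
  (dosync
    (alter table-ref assoc-in [:rows id :data] new-data)))
```

With option 2, serialization only needs @immutable-table, while option 1 needs something like snapshot first. Note also that Clojure's persistent maps share structure between versions, so the "new copy" in option 2 does not duplicate the unchanged rows.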

So at this point, which method would you recommend?

Serializing Clojure Datastructures

I’ve been trying to figure out how best to serialize data structures in Clojure, and discovered a couple of methods to do so. (My main reference was a thread on the Clojure Google Group.)

These methods work well for any Clojure data structure that is serializable. However, my objective is slightly more intricate – I’d like to serialize records that are actually refs. I see a few options for this:

– Either use a method that puts a record into a ref, rather than a ref into a record and then use the serializable, top level map
– Write my own serializer to print this to a file using clojure+read
– Use Java serialization functions directly.
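As a rough sketch of the first option, the ref can wrap a plain value that is dereferenced before printing and wrapped again after reading. The table contents and the serialize/deserialize names below are hypothetical, and the sketch assumes plain maps rather than records, which may need extra handling to round-trip.

```clojure
;; Hypothetical data: one ref around a plain, printable map.
(def table (ref {:name "people"
                 :rows {1 {:id 1 :data "alice"}
                        2 {:id 2 :data "bob"}}}))

(defn serialize
  "Dereferences the ref and prints its value as readable text."
  [r]
  (pr-str @r))

(defn deserialize
  "Reads the text back and wraps the value in a fresh ref.
  read-string should only be used on trusted input."
  [s]
  (ref (read-string s)))

;; Writing to and reading from a file could then use spit and slurp,
;; e.g. (spit "table.clj" (serialize table)), with a path of your choosing.
```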

Thoughts?

Thrush Operators in Clojure (->, ->>)

I was experimenting with some sequences today, and ran into a stumbling block: Using immutable data structures, how do you execute multiple transformations in series on an object, and return the final value?

For instance, consider a sequence of numbers. How do you transform them such that you increment each number by 1, and then get their text representation?

Imperatively speaking, you would run a loop over each element and transform the sequence data structure in place, and the last operation would produce the desired result.

If you know about map in Python, this could be achieved by mapping each transformation over the sequence in turn.

The easiest way to do this in Clojure is using the excellently named Thrush operator (-> and ->>). According to the doc,

Threads the expr through the forms. Inserts x as the
second item in the first form, making a list of it if it is not a
list already. If there are more forms, inserts the first form as the
second item in second form, etc.

It is used like this,

(-> 7 (- 3) (- 6))

Basically, this line implies that 7 be substituted as the first argument to -, to become (- 7 3). This result is then substituted as the first argument to the second -, to get (- 4 6), which returns -2.
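A short sketch contrasting the two operators, and applying ->> to the increment-then-stringify problem from earlier in this post. The var names and the input vector [1 2 3] are just illustrative choices.

```clojure
;; -> inserts the threaded value as the FIRST argument of each form.
(def thread-first (-> 7 (- 3) (- 6)))    ; expands to (- (- 7 3) 6)

;; ->> inserts the threaded value as the LAST argument of each form.
(def thread-last (->> 7 (- 3) (- 6)))    ; expands to (- 6 (- 3 7))

;; Sequence functions such as map take the collection last, so ->> fits:
;; increment each number, then convert each result to a string.
(def incremented-strings
  (->> [1 2 3]
       (map inc)
       (map str)))
```

Here thread-first is -2 and thread-last is 10, which shows how much the argument position matters for a non-commutative function like -.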

Voila!

Clojure AOT compilation tutorial

I was trying to figure out how to AOT compile a Clojure program, in order to really see some fast execution times. The simplest way to describe AOT compilation would be how it's done in Java:

javac file.java
java file

The invocation of the Java compiler (javac) is the pre-compilation of the source file, which is then loaded by the JVM in the next step. In the case of Clojure, when a program is run using,

clj myfile.clj

The code is first compiled and then executed, so even simple programs take a noticeable amount of time to display their output.

The AOT compile process turned out to be trickier to set up than I expected, so I thought I’d put it out there for all those who get stuck after reading the original documentation at http://clojure.org/compilation.

Firstly, the directory structure for my experiment looks like this,

Directory structure

I created dir1 and its subdirectories as below. The code is in clojure/examples/, and the classes/ directory is the default compile path (the value of *compile-path*), something that the documentation neglected to mention. Without this directory, compilation WILL fail.

/dir1/.clojure
/dir1/clojure/
/dir1/classes
/dir1/clojure/examples
/dir1/clojure/examples/hello.clj
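If you would rather not create the classes directory by hand, something like the following could be run at the REPL before compiling. This is only a sketch: ensure-compile-dir! is a hypothetical helper, and it assumes the default "classes" directory whenever *compile-path* is not bound.

```clojure
(import 'java.io.File)

(defn ensure-compile-dir!
  "Creates the compile output directory if it is missing and returns its path.
  Falls back to \"classes\" when *compile-path* has no value."
  []
  (let [dir (File. (or *compile-path* "classes"))]
    (.mkdirs dir)
    (.getPath dir)))
```

The directory is resolved relative to the directory the REPL was started from, so this is best run from dir1. It also still has to be on the classpath for the compile step to work.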

The .clojure file

This file is used with my clj script.
It contains a colon-separated list of directories that are passed to Clojure at compile time. The file looks like,

/dir1:/dir1/classes

hello.clj

This code of course is the default from the Clojure website, as a test.

(ns clojure.examples.hello
  (:gen-class))

(defn -main
  [greetee]
  (println (str "Hello " greetee "!")))

The next step is to invoke the Clojure REPL using the clj script, from the dir1 directory.

Type in the following to compile the program in the clojure.examples namespace,

Clojure 1.2.0-master-SNAPSHOT
user=> (compile 'clojure.examples.hello)
clojure.examples.hello
user=>

Success! And the resulting contents of the classes directory are,

$ ls /dir1/classes/clojure/examples

hello$_main__5.class
hello$loading__4946__auto____3.class
hello.class
hello__init.class

And lastly, to run this program as any other Java program, you can use,

java -cp ./classes:/opt/jars/clojure.jar:/opt/jars/clojure-contrib.jar clojure.examples.hello Viksit
Hello Viksit!


Also, I highly recommend Stuart Holloway’s book “Programming Clojure”. It's turning out to be an excellent read.