Mutable vs Immutable datastructures – Serialization vs Performance

In my last post, I was playing around with methods to serialize Clojure data structures, especially a complex record that contains a number of other records and refs. Chas Emerick and others mentioned in the comments there, that putting a ref inside a record is probably a bad idea – and I agree in principle. But this brings me to a dilemma.

Lets assume I have a complex record that contains a number of "sub" records that need to be modified during a program's execution time. One scenario this could happen in is a record called "Table", that contains a "Row" which is updated (Think database tables and rows). Now this can be implemented in two ways,

  • Mutable data structures – In this case, I would put each row inside a table as a ref, and when the need to update happens, just fine the row ID and use a dosync – alter to do any modifications needed.

    • The advantage is that all data is being written to in place, and would be rather efficient.
    • The disadvantage however, is that when serializing such a record full of refs, I would have to build a function that would traverse the entire data structure and then serialize each ref by dereferencing it and then writing to a file. Similarly, I'd have to reconstruct the data structure when de-serializing from a file.

 
{:filename "tab1name",
 :tuples
 #},
      :tup #}
     {:recordid nil,
      :tupdesc
      {:x
       #},
      :tup #}}>,
 :tupledesc
 {:x
  #}}

	

  • Immutable data structures – This case involves putting a ref around the entire table data structure, implying that all data within the table would remain immutable. In order to update any row within the table, any function would return a new copy of the table data structure with the only change being the modification. This could then overwrite the existing in-memory data structure, and then be propagated to the disk as and when changes are committed.

    • The advantage here is that having just one ref makes it very simple to serialize – simply de-ref the table, and then write the entire thing to a binary file.
    • The disadvantage here is that each row change would make it necessary to return a new "table", and writing just the "diff" of the data to disk would be hard to do.

 
#

So at this point, which method would you recommend?