viksit's blog

Decoding the US Military's Cyber Command Logo code

From this Wired article here, it looks like there's a number that is part of the Cyber Command's logo: 9ec4c12949a4f31474f299058ce2b22a. Well, it's 32 characters long, and looks like a hash. Sure enough, a quick Python check of the organization's mission statement with MD5 results in,

>>> import hashlib
>>> # The b prefix makes this a byte string, which md5 accepts on both Python 2.6+ and Python 3
>>> hashlib.md5(b"USCYBERCOM plans, coordinates, integrates, synchronizes and conducts activities to: direct the operations and defense of specified Department of Defense information networks and; prepare to, and when directed, conduct full spectrum military cyberspace operations in order to enable actions in all domains, ensure US/Allied freedom of action in cyberspace and deny the same to our adversaries.").hexdigest()
'9ec4c12949a4f31474f299058ce2b22a'
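
The "looks like a hash" hunch can be checked mechanically: an MD5 digest is 16 bytes, which is exactly 32 hexadecimal characters. Here is a minimal Python 3 sketch of that check, plus a helper for testing candidate strings against the logo code. The names LOGO_CODE, looks_like_md5, md5_hex and matches are my own, purely for illustration.

```python
import hashlib
import string

# The code printed on the logo, as quoted above
LOGO_CODE = "9ec4c12949a4f31474f299058ce2b22a"

def looks_like_md5(code):
    """True if code has the length and alphabet of a hex-encoded MD5 digest."""
    expected_length = hashlib.md5().digest_size * 2  # 16 bytes -> 32 hex chars
    return len(code) == expected_length and all(c in string.hexdigits for c in code)

def md5_hex(text, encoding="utf-8"):
    """Hex MD5 digest of a text string (Python 3 needs bytes, so encode first)."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

def matches(text, code=LOGO_CODE):
    """True if the MD5 of text equals the given hex code."""
    return md5_hex(text) == code.lower()

print(looks_like_md5(LOGO_CODE))  # True
print(md5_hex("hello"))           # 5d41402abc4b2a76b9719d911017c592
```

Calling matches() on the mission statement string is then a one-liner; any stray whitespace or changed punctuation in the candidate text will change the digest completely.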

Voila!

Mutable vs Immutable datastructures - Serialization vs Performance

In my last post, I was playing around with methods to serialize Clojure data structures, especially a complex record that contains a number of other records and refs. Chas Emerick and others mentioned in the comments there, that putting a ref inside a record is probably a bad idea - and I agree in principle. But this brings me to a dilemma.

Let's assume I have a complex record that contains a number of "sub" records that need to be modified while the program runs. One scenario where this could happen is a record called "Table" that contains a "Row" which is updated (think database tables and rows). Now this can be implemented in two ways,

  • Mutable data structures - In this case, I would put each row inside a table as a ref, and when the need to update happens, just find the row ID and use dosync with alter to make any modifications needed.
    • The advantage is that all data is being written to in place, and would be rather efficient.
    • The disadvantage however, is that when serializing such a record full of refs, I would have to build a function that would traverse the entire data structure and then serialize each ref by dereferencing it and then writing to a file. Similarly, I'd have to reconstruct the data structure when de-serializing from a file.

 
{:filename "tab1name",
 :tuples
 #<Ref@511d89f8:
   #{{:recordid nil,
      :tupdesc
      {:x
       #<Ref@59a683e6:
         [{:type "int", :field "colid"}
          {:type "string", :field "name"}]>},
      :tup #<Ref@411a9435: {:colid 1, :name "akriti"}>}
     {:recordid nil,
      :tupdesc
      {:x
       #<Ref@59a683e6:
         [{:type "int", :field "colid"}
          {:type "string", :field "name"}]>},
      :tup #<Ref@424f8ad5: {:colid 2, :name "viksit"}>}}>,
 :tupledesc
 {:x
  #<Ref@59a683e6:
    [{:type "int", :field "colid"} {:type "string", :field "name"}]>}}

       

  • Immutable data structures - This case involves putting a ref around the entire table data structure, implying that all data within the table would remain immutable. In order to update any row within the table, any function would return a new copy of the table data structure with the only change being the modification. This could then overwrite the existing in-memory data structure, and then be propagated to the disk as and when changes are committed.
    • The advantage here is that having just one ref makes it very simple to serialize - simply de-ref the table, and then write the entire thing to a binary file.
    • The disadvantage here is that each row change would make it necessary to return a new "table", and writing just the "diff" of the data to disk would be hard to do.

 
#<Ref@4a3e7799:
  {:filename "tab1name",
   :tuples
   #{{:recordid nil,
      :tupdesc
      {:x
       [{:type "int", :field "colid"}
        {:type "string", :field "name"}]},
      :tup {:colid 1, :name "viksit"}}
     {:recordid nil,
      :tupdesc
      {:x
       [{:type "int", :field "colid"}
        {:type "string", :field "name"}]},
      :tup {:colid 1, :name "akriti"}}},
   :tupledesc
   {:x
    [{:type "int", :field "colid"} {:type "string", :field "name"}]}}>
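
To make the second option concrete, here is a rough sketch of the "one ref around an immutable table" idea. It is written in Python rather than Clojure purely for illustration, and every name in it (Row, Table, Ref, update_row, to_json, from_json) is hypothetical. It is not how Clojure refs work internally, just the shape of the approach: updates return a new table, and serializing is a single "deref and dump".

```python
import json
from dataclasses import dataclass, asdict, replace

@dataclass(frozen=True)
class Row:
    colid: int
    name: str

@dataclass(frozen=True)
class Table:
    filename: str
    rows: tuple = ()

def update_row(table, colid, **changes):
    """Return a NEW table with one row changed; the old table is untouched."""
    new_rows = tuple(replace(r, **changes) if r.colid == colid else r
                     for r in table.rows)
    return replace(table, rows=new_rows)

class Ref:
    """The single mutable cell, standing in for the ref around the table."""
    def __init__(self, value):
        self.value = value

    def swap(self, fn, *args, **kwargs):
        # Replace the held value with fn(old value, ...) and return the new one
        self.value = fn(self.value, *args, **kwargs)
        return self.value

def to_json(table):
    """Serialize by 'dereferencing' once and dumping the whole value."""
    return json.dumps(asdict(table))

def from_json(text):
    data = json.loads(text)
    return Table(data["filename"], tuple(Row(**r) for r in data["rows"]))

table_ref = Ref(Table("tab1name", (Row(1, "akriti"), Row(2, "viksit"))))
before = table_ref.value
table_ref.swap(update_row, 2, name="vik")

print(before.rows[1].name)            # viksit (old snapshot is unchanged)
print(table_ref.value.rows[1].name)   # vik
print(from_json(to_json(table_ref.value)) == table_ref.value)  # True
```

The sketch also shows the downside mentioned above: every update produces a whole new table value, so writing only the "diff" to disk needs extra bookkeeping that is not shown here.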

       

So at this point, which method would you recommend?

Serializing Clojure Datastructures

I've been trying to figure out how best to serialize data structures in Clojure, and discovered a couple of methods to do so. (Main reference thanks to a thread on the Clojure Google Group here )

(def box {:a 1 :b 2})

;; Write any java.io.Serializable object to the given file
(defn serialize [o filename]
  (with-open [outp (-> (java.io.File. filename) java.io.FileOutputStream. java.io.ObjectOutputStream.)]
    (.writeObject outp o)))

;; Read back the first object stored in the given file
(defn deserialize [filename]
  (with-open [inp (-> (java.io.File. filename) java.io.FileInputStream. java.io.ObjectInputStream.)]
    (.readObject inp)))

(serialize box "/tmp/ob1.dat")
(deserialize "/tmp/ob1.dat")

This works well for any Clojure data structure that is serializable. However, my objective is slightly more intricate - I'd like to serialize records that are actually refs. I see a few options for this,

- Either use a method that puts a record into a ref, rather than a ref into a record and then use the serializable, top level map
- Write my own serializer to print this to a file using clojure+read
- Use Java serialization functions directly.

Thoughts?

Ode to an Orange

A whiff of citrus - vibrant,
shiny, dimpled and thick,
your fingers move, probing
textural ecstasy,
as your tastes await
the sweet tartness within.
Peel away the layers
softly, envelop a piece,
let your tongue steep
in a myriad of flavors,
with the lingering scent
of summer under a blue sky,
look around,
and all is well again.

Stack implementation in Clojure II - A functional approach

My last post on the topic was creating a stack implementation using Clojure protocols and records - except, it used atoms internally and wasn't inherently "functional".

Here's my take on a new implementation that builds on the existing protocol and internally, always returns a new stack keeping the original one unmodified. Comments welcome!

(ns viksit-stack
  (:refer-clojure :exclude [pop]))

(defprotocol PStack
  "A stack protocol"
  (push [this val] "Push element in")
  (pop [this] "Pop element from stack")
  (top [this] "Get top element from stack"))

; A functional stack record that uses immutable semantics
; It returns a copy of the datastructure while ensuring the original
; is not affected.
(defrecord FStack [coll]
  PStack
  ;; Return the stack with the new element inserted
  (push [_ val]
        (FStack. (conj coll val)))
  ;; Return the stack without the top element
  (pop [_]
       (FStack. (rest coll)))
  ;; Return the top value of the stack
  (top [_]
       (first coll)))

; The functional stack can be used in conjunction with a ref or atom

viksit-stack> (def s2 (atom (FStack. '())))
#'viksit-stack/s2
viksit-stack> s2
#<Atom@69af0fcf: #:viksit-stack.FStack{:coll ()}>
viksit-stack> (swap! s2 push 10)
#:viksit-stack.FStack{:coll (10)}
viksit-stack> (swap! s2 push 20)
#:viksit-stack.FStack{:coll (20 10)}
viksit-stack> (swap! s2 pop)
#:viksit-stack.FStack{:coll (10)}
viksit-stack> (top @s2)
10

Resolving Chrome's SSL Error

I recently started getting a number of SSL related errors on accessing https links with Google Chrome on Ubuntu. One looks like,

107 (net::ERR_SSL_PROTOCOL_ERROR)

The top link on Google's search results is pretty fuzzy, so here's the solution that works for me.

Go to Settings -> Options -> Under the hood, and enable both SSL 2.0 and SSL 3.0. This should allow Chrome to talk to the server with either protocol.

There's also a DEFLATE bug that got fixed to solve this issue in release 340 something. http://codereview.chromium.org/1585041

Stack implementation in Clojure using Protocols and Records

I was trying to experiment with Clojure Protocols and Records recently, and came up with a toy example to clarify my understanding of their usage in the context of developing a simple Stack Abstract Data Type.

For an excellent tutorial on utilizing protocols and records in Clojure, by the way, check out Kotka.de - Memoize done right.

;; Stack example abstract data type using Clojure protocols and records
;; viksit at gmail dot com
;; 2010

(ns viksit.stack
  (:refer-clojure :exclude [pop]))

(defprotocol PStack
  "A stack protocol"
  (push [this val] "Push element into the stack")
  (pop [this] "Pop element from stack")
  (top [this] "Get top element from stack"))

(defrecord Stack [coll]
  PStack
  (push [_ val]
        (swap! coll conj val))
  (pop [_]
       (let [ret (first @coll)]
         (swap! coll rest)
         ret))
  (top [_]
       (first @coll)))

;; Testing
stack> (def s (Stack. (atom '())))
#'stack/s
stack> (push s 10)
(10)
stack> (push s 20)
(20 10)
stack> (top s)
20
stack> s
#:stack.Stack{:coll #<Atom@d2c9015: (20 10)>}
stack> (pop s)
20

More tutorial links on Protocols,

[1] http://blog.higher-order.net/2010/05/05/circuitbreaker-clojure-1-2/
[2] http://freegeek.in/blog/2010/05/clojure-protocols-datatypes-a-sneak-peek/
[3] http://groups.google.com/group/clojure/browse_thread/thread/b8620db0b742...

PyCassa vs Lazyboy (updated)

Update

As Hans points out in the comment below, it appears pycassa natively supports authentication with org.apache.cassandra.auth.SimpleAuthenticator. Lazyboy on the other hand doesn't by default.

It's not too hard to do it though. Intuitively, we could do something like this.

NB: Untested code!! I might create a patch for this when I get the time, so this is just an outline.

# Add this to lazyboy's connection package
from cassandra.ttypes import AuthenticationRequest

And in lazyboy's _connect() function, add another parameter called logins, that is a dict of keyspaces and credentials which looks like the following.

# logins format
{'Keyspace1' : {'username':'myuser', 'password':'mypass'}}

def _connect(self, logins=None):
    """Connect to Cassandra if not connected."""

    client = self._get_server()
    if client.transport.isOpen() and self._recycle:
        if (client.connect_time + self._recycle) > time.time():
            return client
        else:
            client.transport.close()

    elif client.transport.isOpen():
        return client

    try:
        client.transport.open()
        # Login code
        # Remember that client is an instance of Cassandra.Client(protocol)
        if logins is not None:
            # Log in to each keyspace with its own credentials
            for keyspace, credentials in logins.items():
                request = AuthenticationRequest(credentials=credentials)
                client.login(keyspace, request)

        client.connect_time = time.time()
    except thrift.transport.TTransport.TTransportException as ex:
        client.transport.close()
        raise exc.ErrorThriftMessage(
            ex.message, self._servers[self._current_server])

    # Hand back the freshly opened client, like the early-return branches above
    return client

Original Post
I've been trying to figure out which Python library for talking to Cassandra is currently more fully featured.

From Reddit,

API-wise, both look like they are pretty much basic wrappers around the Cassandra Thrift bindings. I'd prefer lazyboy over pycassa though, given that firstly, it's being used in production right now at Digg, and because it looks like lazyboy's connection code is more featured than pycassa.

and

The connection code (Lazyboy) seems to be much more suited for use in production (use of auto pooling, auto load balancing, integrated failover/retry, etc.) (than PyCassa)

Thanks to GitHub, I was able to do some analysis of their traffic and commits,

Traffic Data (GitHub traffic graphs for LazyBoy and Pycassa)

Commit Data (GitHub commit graphs for LazyBoy and Pycassa)

A larger number of people know about LazyBoy, but code commits on it are currently at a standstill. Pycassa, on the other hand, seems to be growing at a pretty fast rate.

It looks like LazyBoy is probably a better library to start with, for now. I'll talk about my experiences with both in another post.

Thrush Operators in Clojure (->, ->>)

I was experimenting with some sequences today, and ran into a stumbling block: Using immutable data structures, how do you execute multiple transformations in series on an object, and return the final value?

For instance, consider a sequence of numbers,

user> (range 90 100)
(90 91 92 93 94 95 96 97 98 99)

How do you transform them such that you increment each number by 1, and then get their text representation,

"[\\]^_`abcd"

Imperatively speaking, you would run a loop over each number, transforming the data as you go, and the last operation would produce the desired result. Something like,

>>> s = ""
>>> a = [i for i in range(90,100)]
>>> a
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

>>> for i in range(0,len(a)):
...   s += chr(a[i]+1)
...
>>> s
'[\\]^_`abcd'

If you knew about list comprehensions in Python, this could be achieved with something like,

>>> ''.join([chr(i+1) for i in range(90,100)])
'[\\]^_`abcd'

The easiest way to do this in Clojure is using the excellently named Thrush operators (-> and ->>). According to the doc for ->,

Threads the expr through the forms. Inserts x as the
second item in the first form, making a list of it if it is not a
list already. If there are more forms, inserts the first form as the
second item in second form, etc.

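->> works the same way, except that it inserts the value as the last item of each form instead of the second. It is used like this,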

user> (->> (range 90 100) (map inc) (map char) (apply str))
"[\\]^_`abcd"

Basically, the line, (-> 7 (- 3) (- 6)) implies that 7 be substituted as the first argument to -, to become (- 7 3). This result is then substituted as the first argument to the second -, to get (- 4 6), which returns -2.

user> (-> 7 (- 3) (- 6))
-2
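
For readers coming from the Python examples above, the same threading idea can be approximated with a fold. This is only a sketch under my own naming (thread_first and thread_last are hypothetical helpers, not standard library functions), and since Clojure's -> and ->> are macros that rewrite code, a function-based version is an analogy rather than an equivalent.

```python
import operator
from functools import reduce

def _split(form):
    """A form is either a bare callable or a tuple of (callable, extra args...)."""
    return form if isinstance(form, tuple) else (form,)

def thread_first(value, *forms):
    """Like ->: the running value becomes the FIRST argument of each form."""
    def step(acc, form):
        fn, *args = _split(form)
        return fn(acc, *args)
    return reduce(step, forms, value)

def thread_last(value, *forms):
    """Like ->>: the running value becomes the LAST argument of each form."""
    def step(acc, form):
        fn, *args = _split(form)
        return fn(*args, acc)
    return reduce(step, forms, value)

# (-> 7 (- 3) (- 6))
print(thread_first(7, (operator.sub, 3), (operator.sub, 6)))  # -2

# (->> (range 90 100) (map inc) (map char) (apply str))
text = thread_last(range(90, 100),
                   (map, lambda n: n + 1),
                   (map, chr),
                   "".join)
print(text)  # [\]^_`abcd
```

Each form is either a bare callable or a tuple of a callable plus its extra arguments, which plays the role of the partially filled-in lists like (- 3) or (map inc) in the Clojure version.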

Voila!

Stock Crash

This is what the stock market looked like at 2pm today.

From the Reuters article,

The Dow suffered its biggest ever intraday point drop, which may have been caused by an erroneous trade entered by a person at a big Wall Street bank, multiple market sources said.

and the suspected cause? A UI Glitch!

In one of the most dizzying half-hours in stock market history, the Dow plunged nearly 1,000 points before paring those losses—all apparently due to a trader error.

According to multiple sources, a trader entered a "b" for billion instead of an "m" for million in a trade possibly involving Procter & Gamble, a component in the Dow. (CNBC's Jim Cramer noted suspicious price movement in P&G stock on air during the height of the market selloff.)

Sources tell CNBC the erroneous trade may have been made at Citigroup.

"We, along with the rest of the financial industry, are investigating to find the source of today's market volatility," Citigroup said in a statement. "At this point we have no evidence that Citi was involved in any erroneous transaction."

According to a person familiar with the probe, one focus is on futures contracts tied to the Standard & Poor’s 500 stock index, known as E-mini S&P 500 futures, and in particular a two-minute window in which 16 billion of the futures were sold.

Citigroup’s total E-mini volume for the entire day was only 9 billion, suggesting that the origin of the trades was elsewhere, according to someone close to Citigroup’s own probe of the situation. The E-minis trade on the CME.
