BeautifulSoup on Clojure? Enlive

I was looking for a suitable library for Clojure that would work like Python’s BeautifulSoup or lxml – and found enlive.

An excellent tutorial is available at http://github.com/swannodette/enlive-tutorial.

Clojure AOT compilation tutorial

I was trying to figure out how to AOT (ahead-of-time) compile a Clojure program, in order to really see some fast execution times. The simplest way to describe AOT compilation is to look at how it’s done in Java:

javac file.java
java file

The invocation of the Java compiler (javac) is the pre-compilation of the source file, which is then loaded by the JVM in the next step. In the case of Clojure, when a program is run using,

clj myfile.clj

The code is first compiled and then executed on every run, so even simple programs take a noticeable amount of time to produce any output.

The AOT compile process turned out to be trickier to set up than I expected, so I thought I’d put it out there for all those who get stuck after reading the original documentation at http://clojure.org/compilation.

Firstly, the directory structure for my experiment looks like this,

Directory structure

I created dir1 and its subdirectories as below. The code is in clojure/examples/, and the classes/ directory is the default value of *compile-path* (set through the clojure.compile.path system property), something that the documentation neglected to mention. Without this directory, compilation WILL fail.

/dir1/.clojure
/dir1/clojure/
/dir1/classes
/dir1/clojure/examples
/dir1/clojure/examples/hello.clj

The .clojure file

This file is used by my clj script, which can be obtained from here.
It contains a colon-separated list of directories that are passed to the Clojure compiler at compile time. The file looks like this:

/dir1:/dir1/classes

hello.clj

This code is, of course, the default example from the Clojure website, used here as a test.

(ns clojure.examples.hello
  (:gen-class))

(defn -main
  [greetee]
  (println (str "Hello " greetee "!")))

The next step is to invoke the Clojure REPL using the clj script, from the dir1 directory.

Type in the following to compile the program in the clojure.examples namespace,

Clojure 1.2.0-master-SNAPSHOT
user=> (compile 'clojure.examples.hello)
clojure.examples.hello
user=>

Success! The resulting contents of the classes directory are:

$ ls /dir1/classes/clojure/examples

hello$_main__5.class
hello$loading__4946__auto____3.class
hello.class
hello__init.class

And lastly, to run this program as any other Java program, you can use,

java -cp ./classes:/opt/jars/clojure.jar:/opt/jars/clojure-contrib.jar clojure.examples.hello Viksit
Hello Viksit!


Also, I highly recommend Stuart Holloway’s book “Programming Clojure”. It’s turning out to be an excellent read.

Moving from MySQL to Cassandra – Pros and Cons

Moving on from the question of which NoSQL database you should choose, after reading these excellent posts from Digg and Twitter, I recently asked a question on StackOverflow regarding the pros and cons of moving from MySQL to Cassandra.

The StackOverflow question is here: http://stackoverflow.com/questions/2332113/switching-from-mysql-to-cassandra-pros-cons

I got some excellent insight and feedback, primarily from Jonathan Ellis, one of the maintainers of Cassandra, and a systems architect at Rackspace.

He’s also written a post on the Rackspace blog today as a follow up on the question.

I wanted to highlight a great tip he mentions, via Ian Eure of Digg (also the creator of a Python Cassandra library called LazyBoy), from PyCon ’10:

Ian Eure from Digg (also switching to Cassandra) gave a great rule of thumb last week at PyCon: “if you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL database,” and you should seriously consider using something explicitly designed for that instead.
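To make the rule of thumb concrete, here is a minimal sketch of the cache-in-front-of-the-database pattern it describes. Everything in it is hypothetical: plain dicts stand in for memcache and MySQL, and the names `get_user`, `update_user`, `cache` and `database` are illustrative only. The point is that the application ends up owning the consistency logic (populate on a miss, invalidate on a write) that a purpose-built store would otherwise handle.

```python
# Hypothetical stand-ins: one dict for the relational database, one for memcache.
database = {1: {"name": "alice"}}
cache = {}
db_reads = 0  # counts how often a read falls through to the database


def get_user(user_id):
    """Read path: try the cache first, fall back to the database on a miss."""
    global db_reads
    key = "user:%d" % user_id
    if key in cache:
        return cache[key]
    db_reads += 1
    row = database.get(user_id)
    if row is not None:
        cache[key] = dict(row)  # populate the cache for later reads
    return row


def update_user(user_id, name):
    """Write path: update the database, then invalidate the cached copy by hand."""
    database[user_id] = {"name": name}
    cache.pop("user:%d" % user_id, None)


first = get_user(1)    # miss: goes to the database
second = get_user(1)   # hit: served from the cache
update_user(1, "bob")  # skipping the invalidation here would serve stale data
third = get_user(1)    # miss again, sees the new value
```

Every new access pattern needs another pair of functions like these, which is roughly the "ad-hoc, difficult to maintain" part of the quote.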

Also mentioned are a couple of general caveats about using NoSQL versus relational databases:

The price of scaling is that Cassandra provides poor support for ad-hoc queries, emphasizing denormalization instead. For analytics, the upcoming 0.6 release (in beta now) offers Hadoop map/reduce integration, but for high volume, low-latency queries you will still need to design your app around denormalization.
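As a rough illustration of what designing an app around denormalization means, the sketch below keeps a second, query-shaped copy of the data that is updated at write time, so the read becomes a single key lookup instead of an ad-hoc query. Plain dicts stand in for the data store, and the names `posts_by_id`, `posts_by_author` and `add_post` are hypothetical; this is not Cassandra’s API.

```python
from collections import defaultdict

# Hypothetical stand-ins for two keyed collections (not Cassandra's API).
posts_by_id = {}                     # primary copy, keyed by post id
posts_by_author = defaultdict(list)  # denormalized copy, keyed by author


def add_post(post_id, author, title):
    """Write the same data twice: once per query the app needs to answer."""
    posts_by_id[post_id] = {"author": author, "title": title}
    posts_by_author[author].append({"id": post_id, "title": title})


add_post(1, "alice", "Clojure AOT compilation")
add_post(2, "alice", "MySQL to Cassandra")
add_post(3, "bob", "Unrelated post")

# The read is a direct lookup by key; no scan or join over posts_by_id.
titles = [p["title"] for p in posts_by_author["alice"]]
```

The trade-off is that a question nobody planned for (say, all posts with a given word in the title) has no matching copy and needs either a full scan or a new denormalized view.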

Looks like the Cassandra 0.6 beta is coming out tomorrow, and can already be built from repositories in case anyone’s interested in doing so (and telling me about their experiences!).

How “Aardvark was the 6th idea we tried”

Just came across an excellent blog post by Max Ventilla, the co-founder of Aardvark (a company that Google bought a few weeks ago). His description of how Aardvark was the 6th idea that he and his co-founders tried is pretty uplifting to anyone even remotely interested in entrepreneurship.

What I found interesting was that they would build a prototype, launch it to potential users, and see what the uptake was. If it didn’t work, they’d brainstorm some more and launch a new idea to see whether that worked. It definitely lends credence to the entire build, launch and iterate idea that most people proselytize about, but may not necessarily follow.

Some of these ideas are probably obvious to anyone who’s ever brainstormed about web services – I think the main problem with most people out there is that they get stuck on the execution, trying to make things perfect for launch – which negatively impacts the use and adoption of their product.

Another lesson to take home is to have the ability to take an idea from concept to execution REALLY quickly – which means having an established base of people, code, platforms and frameworks ready to start deploying an idea on. If you were to start from scratch each time, I’m not sure you’d get very far!

Twang,
For posterity’s sake, here’s a list of the early ideas we rejected before committing to Aardvark:

Rekkit – A service to collect your ratings from across the web and give better recommendations to you. The system would also provide APIs to 3rd party websites so they could have richer profile data and better algorithms to do collaborative filtering.

Ninjapa – A way that you could open accounts in various applications through a single website and manage your data across multiple sites. You could also benefit from a single sign-on across the web and streamlined account creation, management, and cancellation.

The Webb – A central phone number that you could call and talk to a person who could do anything for you that you could do on the web. Your account information could be accessed centrally and sequences of simple web tasks could be done easily without the use of a computer.

Web Macros – A way to record sequences of steps on websites so that you could repeat common actions, even across sites, and share “recipes” for how you accomplished certain tasks on the web.

Internet Button Company – A way to package steps taken on a website and smart form-fill functionality. From a button, a user could accomplish tasks, even across multiple sites, quickly without having to leave the site or application where the button was embedded. People could encode buttons and share buttons a la social bookmarking.

Each of these ideas turned out to be interesting but not compelling. My cofounders and I would conceive of an idea, build it in very early prototype form, and get it in the hands of users. People might express enthusiasm for one idea or another but they wouldn’t actually use the product that, in admittedly raw form, offered the particular value proposition. In contrast, Aardvark (a chat buddy that could accept questions and have them answered by people in your network in real-time), got pretty immediate uptake.

As an aside, most of these ideas resemble products that venture funded startups have since brought to market. Even as I see much more impressive implementations of what we prototyped, I’m skeptical of their mass appeal.

Why I quit Google Buzz

Google Buzz seems to have mashed up a number of positive features from Twitter and Friendfeed into itself, and I quite like the idea – or rather, the vision it is supposed to espouse. Unfortunately, it is at a stage where too much of my private data is available to people I would much rather not allow access to.

So I quit it, removed all my buzzes, made my Google profile even more private than it was before, and thought – whew, I’m done. And then, all of a sudden, I’m bombarded with a ton of questions about why. This blog post is to coherently recount my thoughts (and have people correct me if I’m wrong).

First off – the question I hear most is – “Why not post privately to a group of people?”.

Easier said than done, unfortunately. Like many people, my Gmail contacts list is a weird amalgamation of everyone from Craigslist car-ad replies, to close friends, and even some colleagues. The thought of conversations with friends suddenly becoming visible to all of them is a bit unsettling.

Now how is this different from being on Twitter, you ask?

Well, in a number of ways. Anyone with a Google account can follow you, and you’ve got to proactively block them from doing so – in my opinion, a slightly flawed strategy. I’d much prefer the Twitter/FB model of having *you* control who can or can’t follow you from the very beginning. I also noticed some lag issues with Buzz: you may have followed, unfollowed or blocked someone, but 10 hours later that operation still doesn’t seem to have gone through. (Yes, it’s a fledgling service, but still.)

Coming back to private posts – Buzz does not offer a list of followers to post to. As a result, I don’t have the option of posting only to those active users on Buzz who follow me. Instead, I’m expected to create a list of people from my contact list who I can post to. And who do I see here? A list of the Gmail contacts I’ve interacted with most frequently – who may not even be on Buzz! Is it really that hard to implement a “Post to Followers” feature?

Next – any time you mention a user with @, it gets autocompleted to @username@gmail.com – and this data is visible on your public Google profile. While a Twitter username is something that can’t very easily be mapped to a person, their email address is a whole new ball game. And I wouldn’t like my contact list being exposed to the world.

And it’s not just me. If I comment on a friend’s Buzz, and they haven’t bothered to make it private – this information is as easily obtained.

So till the time Buzz becomes a bit more private – I’m going to only follow Buzzes from a distance and not participate till I feel my issues with its privacy controls have been addressed.

Update:

Major privacy flak Google’s getting! Fugitivus has a very .. expressive .. blog post on this!

Misleading Youtube-Visa ad

Misleading Youtube-Visa ad on youtube.com. Try clicking on the “Close this ad” button or the “sound off” button. It just takes you to the visa-youtube homepage! Misdirection? I think so! (see bottom left – shows the link to the ad page)

On the Dubai financial crisis

.. a poem by Shelley comes to mind.


I met a traveller from an antique land
Who said: “Two vast and trunkless legs of stone
Stand in the desert. Near them on the sand,
Half sunk, a shattered visage lies, whose frown
And wrinkled lip and sneer of cold command
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them and the heart that fed.
And on the pedestal these words appear:
`My name is Ozymandias, King of Kings:
Look on my works, ye mighty, and despair!’
Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare,
The lone and level sands stretch far away”.

With prescient élan, the mention of the ‘trunkless legs of stone’ evokes the image of a forlorn tower of stone, much like the increasingly disrupted and abandoned, formerly high-profile construction projects in Dubai. A shattered visage of the Bedouin of old, with a sneer on his wrinkled lips, and the lone sands that stretch far away complete the image of recklessness and decay that now emanates from a city that strove to be a jewel in the crown of the Middle East.

Will it ever recover?

Rendering a drop down box in Django using ModelChoiceField

Usually, if you were to use something like,

class testform(forms.Form):
    n = forms.ModelChoiceField(queryset=Models.objects.filter(id=32773), empty_label="All")


you’ll end up with a drop down box populated with generic “Models object” labels rather than a field from the model.

Instead, this works better,


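from django import forms

# Models is the model class being listed; import it from your app's models module.

class vModelChoiceField(forms.ModelChoiceField):
    # Django calls this once per object to get the text shown in the drop down.
    def label_from_instance(self, obj):
        return "%s" % obj.name

class testform(forms.Form):
    n = vModelChoiceField(queryset=Models.objects.filter(id=32773), empty_label="All")

t = testform()
print(t)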

And you’re done!
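Why this works: when the field builds its choices, it asks label_from_instance for the text of each option, and the stock implementation falls back to the object’s string form. The framework-free sketch below mimics that hook with stand-in classes so that it runs without Django; `FakeModel`, `ChoiceField` and `NameChoiceField` are hypothetical names, not Django classes.

```python
class FakeModel(object):
    """Stand-in for a model instance that has no custom string method."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name

    def __str__(self):
        return "FakeModel object"


class ChoiceField(object):
    """Stand-in for the form field: builds (value, label) pairs from objects."""

    def __init__(self, objects):
        self.objects = objects

    def label_from_instance(self, obj):
        return str(obj)  # default label: the object's string form

    def choices(self):
        return [(obj.pk, self.label_from_instance(obj)) for obj in self.objects]


class NameChoiceField(ChoiceField):
    """Same override as in the post: label each option with obj.name."""

    def label_from_instance(self, obj):
        return "%s" % obj.name


rows = [FakeModel(32773, "first entry")]
default_choices = ChoiceField(rows).choices()
named_choices = NameChoiceField(rows).choices()
```

Defining a string method on the model itself (`__unicode__` on Python 2, `__str__` on Python 3) changes the label in every form that uses the model; the subclass approach is handy when only one form should show a different label.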

Cluud.in

Woo, finally.

Cluud.in is a new service to help you discover new places and follow conversations about your favorite places in your city. This means you can plan out where to go in real time – getting input from everyone who has talked about it! Neat.

Stephen Fry on America’s place in the world

From an address on “America’s place in the world” that Stephen Fry – one of my favorite comedians of all time – gave at the Royal Geographical Society (which you should totally read in full, btw – it’s hilarious and superbly penned).

When referring to the well known idiom of making lemonade if life gives you lemons, he makes a pretty interesting point.

So let me look again at that holy text: ‘if life gives you lemons, make lemonade.’ Huh? But… but… Lemons are amongst the best and most wonderful gifts of nature. They are adaptable, versatile and delicious. A slice for your gin and tonic – juice to zing life into salads, stews, fish and seafood. Oil and sweetness from the rind and zest that is pure and perfumed and precious. They are a staple of what doctors agree is the best dietary regimen we can follow. So if life gives you lemons, shout ‘Thank you, Life, thank you!’ But the American response is ‘make lemonade’ in other words – just add sugar and sell it.

Add sugar and sell it. This can be translated across into culture, can it not?

How very true.