The need for Clojure

It is a need usually provoked after experiencing an emotional conclusion to a difficult life event, such as the breakdown of a close interpersonal relationship [or an unreliable concurrent program, editor] or the death of loved one [or an old familiar programming language, editor]… A person with a high need for closure prefers order and predictability and is decisive [...]

[selected, edited parts from Wikipedia on the need for closure ;-) ]

I’ve just decided to put all my pet projects and programming languages aside to dedicate myself to learning one single language, with the ambition to eventually master it. That language is Clojure.

To force myself to reflect on my learning process and to help the community to the best of my ability, I will write a number of blog postings about my experiences with Clojure. There is already a lot of truly outstanding quality information about the theory and ideas behind Clojure around the net (mostly this excellent quality is due to fact the the creator of Clojure, Rich Hickey, is a great speaker). However, to my knowledge, there is less information about how to actually program Clojure. So this series of postings will explore programming Clojure from a beginners perspective — hopefully others may learn from my mistakes and insights.

In other words: This series is about Beginning Clojure in Practice.

Update: Sat. Oct. 18, 2008: A few points about Clojure, and links to better information.

What is Clojure?
Clojure is a relatively young language (recently, it’s 1-year birthday was celebrated). As I am beginning clojure, I am certainly not the best source of information; however, I would like to share my personal top 5 reasons why Clojure stands out in the ocean of programming languages that seems to be growing only faster by the day. If you want to know more about clojure, I strongly encourage reading the entire homepage and watching the videos on blip.tv.

Why Clojure?

  • Clojure is a LISP (perhaps it is more precise to say that it is strongly inspired by LISP, in some sense it is more general). Most importantly this means the full power of macros.
  • It is compiled to the JVM, and designed to embrace it. This means making use of a powerful infrastructure, e.g., garbage collection and a powerful, mature, optimizing JIT. Rich Hickey makes a good point on distinguishing languages that ‘live on’ the JVM (like JRuby or Jython) and languages that are designed for the JVM like (Groovy and Scala). Since JRuby and Jython are ‘ports’ of a language specified elsewhere, they have to fit on the JVM for better or worse (for example no call/cc in JRuby). In contrast, Clojure is designed for the JVM: Rich calls this ‘embracing it.’
  • Java interoperability. Clojure is designed to inter-operate with Java; there is language-level support [syntax, even ;-) ] for interacting with Java (no wrappers). This means that the enormous number of already existing Java libraries can be used from Clojure. This solves the ‘library problem’ (see below).
  • Clojure has an opinion about how to do in-process concurrency (my favorite point): First, immutability is good; it is the default, and all Clojure data structures are immutable. Yet, Clojure accepts that sometimes mutability is needed and even desired, and support for a mutability is provided in the form of Refs and Agents. These are programming language constructs that allow for mutability, but with a concurrency semantics. Refs come with a software transactional memory system, and agents allow an in-process asynchronous programming model (similar to, but not equal Erlang’s actors). Go and read Rich’s short essay: On state and identity; once you get a feel for the language, read it again ;-) . Persistent data structures, are an amazing and important part of Clojure’s concurrency model. This is what makes immutability viable in Clojure. An ‘update’ operation on a persistent data structure doesn’t actually change the data structure (it is immutable), but returns a new data structure (also immutable) representing the updated structure. Now, one might think this entails copying the entire structure; amazingly via structural sharing this is not the case. Clojures data structures (e.g., a hash map) are immutable and persistent yet retain the performance guarantees of their mutable counter-parts (Rich claims performance from faster-than-Java to up-to at most 4x Java speed). More on this in a later posting.
  • Dynamic and mostly functional; this speaks for itself.

I am really looking forward to exploring this language in depth and in practice, and to sharing what I find with anyone actually reading this blog ;-)

Finally, just as a thought: Eric Kidd wrote about choosing between Ruby and LISP.

So if LISP is still more powerful than Ruby, why not use LISP? The typical objections to programming in LISP are:

  • There aren’t enough libraries.
  • We can’t hire LISP programmers.
  • LISP has gone nowhere in the past 20 years.

My reply to this would be: Well, with Clojure, LISP is definitely going somewhere; only the second objection applies (and perhaps with time this will no longer be the case, who knows).

Posted in Clojure | Tagged , , | 6 Comments

Jeene update: Performance, features and backlog

Don’t know what Jeene is? It will make your JavaScript code run faster ;-) read the intro first,
then come back to this!

Performance examples

I was curious about the performance gains of functions specialized via Jeene, so I ran a simple performance example in Firefox 2 and 3; Safari 3.1.2; Opera 9.52; SquirrelFish Extreme (the new JIT); IE 6, 7 ; Google’s Chrome and finally, TraceMonkey. All with great results! You can run the simple example here.

The test executes the string concatenation example from the previous post. The actual example is not that interesting; what is interesting is that Jeene works in all current popular JavaScript implementations and that it gives good performance boosts also in the new JITs: V8, TraceMonkey and SquirrelFish Extreme. The figures include the time it takes to specialize the function, so they are actual performance gains. (Note that the Windows and Mac tests were on different hardware).

  • Firefox 2: (Windows Vista)
    1. Without specialization: 1430ms;
    2. With specialization: 635ms;
    3. Specialization reduces time by approx. 56%.
  • Firefox 3: (Mac)
    1. Without specialization: 386ms;
    2. With specialization: 230ms;
    3. Specialization reduces time by approx. 40%.
  • Opera 9.52: (Mac)
    1. Without specialization: 502ms;
    2. With specialization: 251ms;
    3. Specialization reduces time by approx. 50%.
  • Firefox 3.1 – TraceMonkey (Mac)
    1. Without specialization: 302;
    2. With specialization: 152ms;
    3. Specialization reduces time by approx. 50%.
  • Chrome – V8: (Windows Vista)
    1. Without specialization: 227ms;
    2. With specialization: 83ms;
    3. Specialization reduces time by approx. 63%.
  • IE6/7 (approx. the same numbers) (Windows Vista)
    1. Without specialization: 2672ms;
    2. With specialization: 1485ms;
    3. Specialization reduces time by approx. 44%.
  • Safari 3.1.2 (Mac)
    1. Without specialization: 378ms;
    2. With specialization: 157ms;
    3. Specialization reduces time by approx. 58%.
  • WebKit nightly — SquirrelFish Extreme (Mac)
    1. Without specialization: 247ms;
    2. With specialization: 94ms;
    3. Specialization reduces time by approx. 61%.

What can Jeene do right now?

Specialize JavaScript functions written in simplified JavaScript which contain
the following constructs (note as of now you can only specialize with primitive values,
e.g., strings and numbers; this will be fixed soon):

  • variables which occur in the function’s formal parameter
  • literals and constants (“abc”, 42, true, null)
  • simple assignment ‘=’
  • and, or, not logical operators ‘&&’, ‘||’, ‘!’
  • plus and minus binary operators (‘+’ and ‘-’); by default it assumes ‘+’
    to be associative, which is not sound in general but is for common case uses (ask me for details)
  • return statements

What Jeene can’t do right now (i.e., what I am working on).

Some of these are easy, others are harder and need careful analysis.

  • global variables (in general variables in the function’s closure)
  • ‘var’ statements (i.e., local variable declaration and initialization)
  • incrementing/decrementing assignments (‘+=’ and ‘-=’)
  • ternary if, i.e., “exp ? v1 : v2″ operator
  • comparison “===”, “!==”, “<”, “<=”, “>”, “>=”
  • multiplication, division (‘*’,'/’)
  • static/dynamic property lookup (‘.’ and ‘[‘)
  • function calls (“fn()”)
  • unary minus (‘-1′)
  • typeof operator (‘typeof x’)
  • function operator (‘function(){}’)
  • expression statements, array and object literals
  • expression statements, array and object literals
  • ‘if’, ‘break’ statements
  • ‘while’ and ‘for’ statements

New in latest svn commit

Jeene can now handle simple assignments of the form (x = exp) where x is a variable and exp is an expression. Further, formal parameters which are specialized disappear from the formal parameters in the specialized function, but now appear as local variables of that function (so that one can assign to them).

Consider this example:

Function.prototype.specialize = net.higherorder.jeene.Jeene.make();
source_fun = function(a,b,c) {
    c = a;
    return b+c;
};
var env = {a:3, b:2};
res = source_fun.specialize(env);

Result: 

(function (c) {var a, b;return 5;})

Note that the assignment c=a; can be eliminated and the static value of a is propagated to c so that the expression returned is static. Neat ;-)

Check back for more later…

Posted in General | 2 Comments

Jeene: Status

I am happy that Jeene was mentioned on Ajaxian today. This is good ;-) it means that the project is more likely to attract potential contributors which is nice since there is quite some work yet to be done.

I thought, I would provide a status update on Jeene and a “backlog” of work that needs to be done.

What is working:

  • Partial evaluation of simple operations: ‘+’, ‘-’, ‘&&’ and ‘||’
  • code generation for ‘function’ expressions and ‘return’ statements

This means that we can run a number of simple examples. For a quick and dirty performance test checkout: performance_simple1

This is all fine, the basic design is in place. Now it is just a matter of “getting it done” to implement other operators. However, there are some constructs that are more difficult:
What I will be working on next

  • Assignment. Since an assignment changes the static/dynamic status of variables, this is more advanced than e.g. ‘+’ which doesn’t have side effects. For example if ‘x’ is a static variable and ‘y’ is dynamic, then ‘y = x;’ should cause ‘y’ to be static from that program point.
  • Handling of function calls. For now, I will simply replace a function call with a function call where we replace static expressions in the arguments with their values. Later I will consider ‘unrolling’/'inlining’ of function calls.

As always, I welcome anyone interested in contributing. I will run a rule: 1-patch approved = commit rights.

Posted in General | Tagged , , | Leave a comment

Jeene: An automatic partial evaluator for JavaScript

The purpose of this posting is to show that is is possible to create an online partial evaluator for JavaScript, written also in JavaScript. As far as I know, this has been not been done before. This post is the first in a series describing the inner workings of Jeene.

A what? A partial evaluator (or program specializer) is a program which takes two inputs: another program and an environment mapping variables to values; it outputs a specialized (i.e., more efficient) version of the input program with respect to the environment. One can think of a partial evaluator as a mix between an interpreter and a compiler: it interprets the static parts of the program and emits code for the dynamic parts.

For a simple example of code specialization, consider the following function for creating a string corresponding to an HTML tag:

var mk_tag = function(tag_name,clazz,contents) {
   return '<'+tag_name+' class="'+clazz+'">'+contents+'</'+tag_name+'>';
};

If we are only interested in making ‘div’ tags with a class of ‘green’ then a more efficient version would be:

var mk_div_green = function(contents) {
   return '<div class="green">'+contents+'</div>';
};

since it would require fewer string concatenations per call. We can think of ‘mk_div_green’ as a version of ‘mk_tag’ which is specialized for writing ‘div’ tags with a class of ‘green’. A partial evaluator for JavaScript could automatically derive the ‘mk_div_green’ function from the ‘mk_tag’ function.

This is exactly what we can do with the evaluator in this posting.

//This code actually works ;-) 
Function.prototype.specialize = net.higherorder.jeene.Jeene.make();

var mk_tag = function(tag,clz,cont) {
   return "<"+tag+" class='"+clz+"'>"+cont+"</"+tag+">";
};

var mk_div_green = mk_tag.specialize({tag:'div', clz: 'green'});

mk_div_green("Pratt rocks!");
//result: <div class='green'>Pratt rocks!</div>

mk_div_green.toSource ? mk_div_green.toSource() : mk_div_green.toString();
//result:
//(function (cont) {return ("<div class='green'>" + cont) + "</div>";})

This last line, shows that the output function is much more efficient than what is created by general JavaScript curriers which have been seen before: these functions merely wait evaluating the function until all parameters are supplied; instead, a partial evaluator will create specialized function taking advantage of the information given.

The first goal is for the partial evaluator to process any function written in an extension of “simplified JavaScript” (a subset of full JavaScript corresponding to what Crockford calls the “good parts“). This partial evaluator will have a number of useful features: it

  • can easily be extended to full JavaScript;
  • is written in simplified JavaScript (think Futamura projections);
  • is a fast extension of Douglas Crockford’s Pratt parser;
  • because functions implement the toString method, it can be run on dynamically generated functions (e.g., a specialized function can be further specialized);
  • can be embedded in any full JavaScript program as long as it is only used to specialize functions which are syntactically in simplified JavaScript;

I have started a new open source project, Jeene, which aims to create an efficient partial evaluator for full JavaScript that works in any ECMAScript 3 compliant implementation (e.g., all major browsers, Rhino, TraceMonkey, V8 etc). Right now the project is at a very early stage; a proof of concept, for example it cannot specialize itself. Let me know if you are interested in contributing: I will use a 1 patch threshold like Rubinius: if you submit one patch which is accepted, you get commit rights.

Stay tuned here for more information about how Jeene is designed and implemented.

Posted in code specialization, instanceof, JavaScript, partial evaluation | Tagged , , | 3 Comments

Probability Theory: The Logic of Science

This is a bit of an unusual posting. It was triggered because I am frustrated, having just written my third review for the journal TAAS, which I accidentally agreed to do reviews for at some point during my PhD studies. I still review papers on computational trust models which was the topic of my PhD dissertation. I have recommended ‘reject’ for all of these papers, not because these papers are any worse than most of what is being and has been published in that field (or what I’ve published myself); but because I accidentally started reading the book: Probability Theory: The Logic of Science by E.T. Jaynes, Cambridge University Press (June 9, 2003). This book radically changed the way I understand probability theory and its applications (which includes computational trust models). Once you read just the basic parts of his book (which is all I’ve read yet), you realize that much of the work being done in this area is waste; I will claim that it could all be done much simpler and with superior results if based on Jaynes formulation of probability theory (which according to Jaynes goes back to Jeffreys and Laplace).

During my PhD studies I was working on something called experience-based trust management. Fundamentally, this topic is about programs that reason about the behaviour of agents (other dynamic programs) in large open distributed systems (think Internet). Such reasoning is based on information, usually in the form of past interactions with agents or in the form of statements made by other agents about such interaction (i.e., reputation information).

After the first two years we had been working hard on creating a formal model for “computational trust” encompassing uncertainty, based on somewhat hardcore mathematical theory of complete lattices and monotonic functions, complete partial orders and continuous functions (domain theory) and even category theory. It was abstract, it was fun, it was warm, nice and cuddly; it turned out, however, to be essentially useless… Fortunately, after approximately two years I somehow realized this and started working on the same problems, but with a less abstract approach. At some point later I somehow came by the book of Jaynes. Now, I only wish I had read that book in 2004…

Anyway, I don’t know how you were taught probability theory (or worse, statistics) but the courses I took had abstract definitions (corresponds to what is on wikipedia) that seemed magical to a first year computer science student, and abstract but at least general later when I encountered measure theory. While this is all very interesting if one is interested in abstract mathematics, when one reads Jaynes account of probability one cannot avoid to think that Jaynes approach is overwhelmingly appealing: at first sight it intuitive and much is simpler; and once one gets into the later chapters, one learns that it is also more powerful, and in fact, the rules of probability theory is proven to be the unique set of rules that satisfy an absolutely reasonable set of qualitative desiderata (this is known as Cox’s theorem which is on wikipedia, but Jaynes’ exposition is much better in my opinion, a version is here (from page 13)).

I won’t even try to give an account of the book here, but only recommend it to anyone even remotely interested in scientific reasoning and logic, but also applied mathematics and computer science. There are some places that are mathematically challenging for your typical CS grad, but it still has value even if the advanced techniques and proofs are skipped. Read it before you submit any paper on trust ;-)

Posted in General | 4 Comments

Fibers: an exercise

I was involved with arranging and hosting the RubyFools 2008 conference (it was a great conference btw). One of my jobs was to come up with a bunch of Ruby exercises for our Ruby cave. I decided to go with exercises on the new Ruby 1.9 Fiber class since it is new and because I’m interested in concurrency.

One of the exercises was a tricky exercise which, as far as I know, nobody solved. Now whether that was because no one tried or cared, or because it was too tricky, I don’t know. In either case, I think it is a pretty neat exercise. Here it is.

Exercise. Using only the Fiber class from Ruby 1.9, write a class FiberStack which is an implementation of the Stack data structure. You can’t use any heap memory (except for Fiber), so using arrays or linked objects is out of the question.

Please don’t paste code into comments. Instead post solutions as a comment here. Then post a comment here linking to that solution. In this way we won’t spoil the fun for those wanting to solve it on their own.

Posted in Extjs, General | 1 Comment

Fibers: a solution

Here I will post a solution to the Fibers exercise… Post your solutions as comments.

Your code should at least pass the following simple test:

require 'fiber_stack'
require 'test/unit'

class FiberStackTest < Test::Unit::TestCase
      
  def test_simple
    fs = FiberStack.new
    assert fs.empty?
    fs.push 'a'
    fs.push 'b'
    assert_equal 2, fs.size
    
    assert_equal 'b', fs.pop
    assert_equal 1, fs.size
    assert_equal 'a', fs.pop
    assert fs.empty?
    assert_equal 0, fs.size
    
    assert_nil fs.pop
    assert fs.empty?
    assert_equal 0, fs.size
    
    fs.push 'c'
    assert_equal false, fs.empty?
    assert_equal 1, fs.size
    
    assert_equal 'c', fs.pop
    assert fs.empty?
    assert_equal 0, fs.size
  end
end

require 'test/unit/ui/console/testrunner'
Test::Unit::UI::Console::TestRunner.run(FiberStackTest)

with the following result:

krukow:~/Projects/private/Fiber/lib$ ruby run_stack.rb
Loaded suite FiberStackTest
Started
.
Finished in 0.002572 seconds.

1 tests, 15 assertions, 0 failures, 0 errors
krukow:~/Projects/private/Fiber/lib$

Have fun ;-)

Posted in General | 4 Comments

Keeping it dry: Generating JavaScript models from Rails models

As I’ve mentioned before, I advocate using a Model-View-Controller pattern for certain types of JavaScript-heavy web-app clients. In spite of recent licensing issues, I still think ExtJS is among the better libraries supporting MVC. For example, (if you don’t know what namespace/using are, please read this)

namespace('dk.okooko.model');
using(dk.okooko.model).run(function(m){
        
        m.OrderItem = Ext.data.Record.create([
          {"type": "int", "name": "id"},
          {"type": "int", "name": "product_id"},
          {"type": "int", "name": "quantity"},
          {"type": "int", "name": "order_id"},
          {"type": "date", "format": "d/m/Y-H:i", "name": "created_at"},
          {"type": "date", "format": "d/m/Y-H:i", "name": "updated_at"}
        ]);

});

The ‘Record.create’ function gives a way of succinctly creating constructor functions for model objects which are supported throughout the Ext functions; you can read more about it here.

If you are using Ruby on Rails at server-side (something which I have been doing quite a bit recently), then you realize that you are manually duplicating all the rails model classes in JavaScript. For example, here is the corresponding migration:

class CreateOrderItems < ActiveRecord::Migration
  def self.up
    create_table :order_items do |t|
      t.integer :product_id
      t.integer :quantity
      t.integer :order_id

      t.timestamps
    end
  end

  def self.down
    drop_table :order_items
  end
end

I was annoyed by this lack of dryness: whenever I changed the migration, I’d have to manually change the corresponding JS model. So I slapped together a really simple code generator which can create the model files from the rails models. It adds a method acts_as_jsmodel which is intended to be called by a rails model class, e.g.,

class OrderItem < ActiveRecord::Base
  belongs_to :order
  belongs_to :product
  has_one :order_item_status
  acts_as_jsmodel
end

So you can annotate model classes that you’d like to generate js models for. The acts_as_jsmodel causes a file to be written to public/javascripts/dk/okooko/model (generally, it depends on a constant APP_NAMESPACE).

Here is the code for the generator:

module Trifork
  module JSModel 
          
    JS_MODEL_DIR = "public/javascripts/#{APP_NAMESPACE}/model".gsub!('.','/')
    MODEL_PACKAGE = APP_NAMESPACE + '.model'
    def acts_as_jsmodel
      
      export_record "#{JS_MODEL_DIR}/#{self}.js" 
      rescue 
        nil
    end
      
    def export_record(fn)
      File.open(fn,'w') do |f|
        s =<<END
namespace('#{MODEL_PACKAGE}');
using(#{MODEL_PACKAGE}).run(function(m){
        
        m.#{self} = Ext.data.Record.create([
          #{self.to_record}
        ]);

});
END
        f.write s
      end
    end
      
      
    def to_record(spec = {},datefm=nil)
      spec[:exclude] ||= []
      keys = spec[:only] ? [*spec[:only]] : 
              (columns.map &:name).reject {|a| spec[:exclude].include?a}
      props = columns_hash
      
      keys.map do |k|
        p = props[k.to_s]
        if p.nil?
          raise "Property #{p} is not a column of #{self}"
        end
        {:name => p.name}.merge!(Trifork::JSModel::to_ext_type(p.type,datefm)).to_json
      end.join(",\n          ")
      
    end
     
    
    def self.to_ext_type(t,datefm)
      type = case t
      when :integer then 'int'
      when :string, :text then 'string' 
      when :boolean then 'bool' 
      when :datetime then 'date'
      else
        'auto'
      end
      res = {:type => type}
      res.merge!(:format => (datefm || 'd/m/Y-H:i')) if t==:datetime
      res
    end
    
  end
end

You can download the files from this entry here.

Posted in General | 5 Comments

Syntax-highlighting in web pages

I’ve recently moved my blog to http://blog.higher-order.net. In this transition I used WordPress’ feature of importing blog content from my old blog: higher-order.blogspot.com. Unfortunately all the line-breaks from the pre-formatted JavaScript source code disappeared in the process, and so, I will have to insert it manually.

This whole process reminded me that I would really prefer to have syntax-highlighting for code-sniplets in the blog. A little bit of searching lead to GNU Source-highlight. After a little fight with the make system, I managed to install it on Mac OS X — it is looks really neat (even in the default style), and it’s simple to use.

However, the main reason I’m using this tool is not presentation: When writing the source code presentation manually in HTML one has to remember to escape characters like <. This means that the transformed source (i.e. the HTML-fragment) is no longer a syntactically correct program in the programming language of the code-snipplet, which means we can’t just paste it into an interpreter to check if it works. Sometimes this leads to people posting code which isn’t even syntactically valid… If you instead maintain a file which holds the syntactically correct source code (and using a smart editor you will immediately see stupid syntax errors instead of having blog-readers find those errors). You can then compile the source code to HTML using the Source-highlight tool. This is significantly less error prone, and the quality improves.

Here is the namespace function from a previous post.

function namespace(spec,context) {
   var validIdentifier = /^(?:[a-zA-Z_]\w*[.])*[a-zA-Z_]\w*$/,
       i,N;
   context = context || window;
   spec = spec.valueOf();
   if (typeof spec === 'object') {
      if (typeof spec.length === 'number') {//assume an array-like object
         for (i=0,N=spec.length;i<N;i++) {
             namespace(spec[i],context);
         }
      }
      else {//spec is a specification object e.g, {com: {trifork: ['model,view']}}
         for (i in spec) if (spec.hasOwnProperty(i)) {
            context[i] = context[i] || {};
            namespace(spec[i], context[i]);//recursively descend tree
         }
      }
   } else if (typeof spec === 'string') {
          (function handleStringCase(){
              var parts;
              if (!validIdentifier.test(spec)) {
                throw new Error('"'+spec+'" is not a valid name for a package.');
              }
              parts = spec.split('.');
              for (i=0,N=parts.length;i<N;i++) {
                  spec = parts[i];
                  context[spec] = context[spec] || {};
                  context = context[spec];
              }
           })();
    }  else {
       throw new TypeError();
   }
}
Posted in General | Leave a comment

blog moved

I’ve moved my blog to blog.higher-order.net — welcome again ;-)

Posted in General | Tagged | Leave a comment