Circuit Breaker: a small but real-life example of Clojure protocols and datatype

(Update July 2nd 2010:
I’ve cleaned up the code and git repo.
Inlined the protocol function definitions in the state records for native-platform speed.
The policy can now specify which exceptions should be considered errors: this is really useful in real life when you don’t want to trip the circuit breaker say on security exceptions.
Now builds with lein 1.1.0.
…]$ lein jar

…]$ javac -cp lib/clojure-1.2.0-master-20100623.220259-87.jar:circuit-breaker.jar src/C.java
…]$ java -cp lib/clojure-1.2.0-master-20100623.220259-87.jar:circuit-breaker.jar:src C
)

Michael Nygards stability pattern “Circuit Breaker” is useful for failing fast when calling integration points that are unstable (which is every integration point I’ve ever dealt with!). This is done by detecting when integration points fail, and subsequently cutting off access for a time-period. Use the circuit breaker to

… preserve request handling threads in the calling system. Very often, when you make a call to an external integration point that’s broken, it will tie up a thread in a blocking synchronous call for an indefinite period of time. [Michael Nygard, QCon interview]

I’ve written a fast, non-blocking functional implementation of the Circuit Breaker in the 1.2 branch of Clojure which will be released shortly. It uses the new Clojure constructs protocols and datatypes for modeling states, for interop and to obtain platform-speed polymorphic calls.

The implementation exposes a simple Java interface which makes it usable from Java, Scala, JRuby et al.

Why?

Design Patterns are a disease, and Clojure is the cure. :-) (the smiley is mine!)
[http://www.nofluffjuststuff.com/blog/stuart_halloway/2009/10/the_case_for_clojure]

I believe this implementation has a couple of advantages compared to these Java and Scala implementations that use the GoF “State Machine” pattern.

First, I find the functional version simpler (see examples below). Second, this version guarantees that only a single call is made to the integration point when the circuit breaker decides to re-test if it is working again (the other versions seem to allow an unbounded number of calls if more threads concurrently try to access the integration point calling “invoke”). Finally, this version encapsulates the (immutable) state in a single Clojure atom (corresponding to a Java AtomicReference), whereas the GoF implementations use at least three atomics for various counters. Why does that matter? Well, it gives you the ability to obtain a consistent (immutable) snapshot of the state of circuit breaker at any given time which can be used to e.g. logging and analysis – this isn’t possible when you have several atomics in play.

These benefits come naturally from following Clojure’s programming model and concurrency constructs. Let me illustrate the Clojure features that I’ve found useful for this problem.

Protocols and records.
I’m using pure polymorphic functions on-success, on-error, on-before-call as transition functions mapping a state to the next state for an event (successfull call, error call and before a call is initiated). A pure function proceed is a predicate on states that decide whether or not the state allows calls to go though to the integration point.

Together these functions form a Clojure protocol (which is similar to a Java interface, but has additional benefits).

(Show with JavaScript: for non-JS User agents, see http://gist.github.com/390485)

Apart from the definition of the protocol, we define a default implementation of the protocol functions that our states can use. The default transition functions are simply the identity function and proceed defaults to false.

We now define datatypes corresponding to each type of state: closed (calls go through, count failures), open (calls fail-fast, stores a time-stamp when IP failed), initial-half-open (a single call goes through), pending-half-open (waiting for a probing call to return). The datatypes are parameterized by a “transition policy” defining how many failures are “needed” to transition to the open state, and how long to wait in the open state.

(For non-JS User agents, see http://gist.github.com/390492 )

This simply defines the states as datatypes. Note that we use defrecord not deftype. This makes our datatypes work like persistent clojure maps which is extremely useful – for example our states can be destructured (see, e.g. defrecord at http://clojure.org/datatypes).

We can make our new types participate in our CircuitBreakerTransitions protocol

(http://gist.github.com/390494)

A couple of notes:

- We use merge to take the default implementations and “override” with the implementations given (this would correspond to an abstract super-class in Java but is more flexible).
- We use destructing in the function definitions for easy access to the “ClosedState” data, e.g., in the body of (fn [{f :fail-count p :policy, :as s}] ... the states fail-count is available as f and similarly for the policy. The :as s clause makes the state itself available as s.
- Finally, we can construct new instances of the states since they are simple dynamically compiled classes, e.g., (ClosedState. p 0) creates the initial state with a policy p.

Pure functions. A clear advantage of the functional approach is the ease of testing. Consider this simple test of some states and transition functions.

(http://gist.github.com/390505)

This is the core of the circuit breaker itself:

(http://gist.github.com/390529)

A circuit breaker is simply an atomic reference to a state. The function wrap-with takes a function, f to wrap – this represents a function that will call an integration point, and a circuit breaker, named state. It then returns a “wrapped” function which is guarded by the circuit breaker. It uses the “transition-by!” function which makes a state-transition from the current state.

An example usage:

(http://gist.github.com/390535)

Notice that the snapshot of the state is available simply with a deref, e.g., as @cb.

A Java-interface

To expose the functionality to Java I have used Clojures gen-class to create a class that exposes two methods given by this protocol which lets you wrap a function and look at the state:


(defprotocol CircuitBreaker
(#^clojure.lang.IFn wrap [this #^clojure.lang.IFn fn])
(#^net.higher_order.integration.circuit_breaker.states.CircuitBreakerTransitions current-state [this]))

Then using gen-class I generate a class that implements the interface corresponding to that protocol. This gives the possibility of using the circuit breaker from Java:

import clojure.lang.IFn;
import clojure.lang.RT;
import net.higher_order.integration.circuit_breaker.AtomicCircuitBreaker;
import net.higher_order.integration.circuit_breaker.CircuitBreaker;

public class C {
	public static void main(String[] args) {
		CircuitBreaker atomicCircuitBreaker = new AtomicCircuitBreaker();
		IFn wrap = (IFn) atomicCircuitBreaker.wrap(new clojure.lang.AFn() {
			public Object invoke(Object arg0) throws Exception {
				if (arg0 == null) throw new IllegalArgumentException("null arg");
				System.out.println("Invoked with: "+arg0);
				return arg0;
			}
		});
		succeed(atomicCircuitBreaker, wrap);
		fail(atomicCircuitBreaker, wrap);
		fail(atomicCircuitBreaker, wrap);
		fail(atomicCircuitBreaker, wrap);
		fail(atomicCircuitBreaker, wrap);
		fail(atomicCircuitBreaker, wrap);
		fail(atomicCircuitBreaker, wrap);
		sleep(1000);
		status(atomicCircuitBreaker);
		fail(atomicCircuitBreaker, wrap);
		sleep(5000);
		succeed(atomicCircuitBreaker, wrap);
	}

	
	private static void sleep(long howlong) {
		try {
			Thread.sleep(howlong);
		} catch (InterruptedException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}

	private static void succeed(CircuitBreaker atomicCircuitBreaker, IFn wrap) {
		try {
			System.out.println(wrap.invoke("KARL"));
			System.out.println(wrap.invoke(42));
		} catch (Exception e) {
			System.out.println(e.getMessage());
		} finally {
			status(atomicCircuitBreaker);
		}
	}

	private static void status(CircuitBreaker atomicCircuitBreaker) {
		System.out.println(RT.printString(atomicCircuitBreaker.current_state()));
	}

	private static void fail(CircuitBreaker atomicCircuitBreaker, IFn wrap) {
		try {
			System.out.println(wrap.invoke(null));
			System.out.println(wrap.invoke(42));
		} catch (Exception e) {
			System.out.println(e.getMessage());
		} finally {
			status(atomicCircuitBreaker);
		}
	}
}

This is getting long. I’ll save the comparison for the next post :-)

Github: http://github.com/krukow/clojure-circuit-breaker

Posted in Clojure | Tagged , , , | 3 Comments

Clojure: Rich Hickey and Stuart Halloway at JAOO

See Danish Clojure Users Group posting.

Posted in General | Leave a comment

VersionManager

[update: First, my apologies to jdalton, my post was not meant to derail Fusebox, merely to show a different approach to similar problems. ]

I read a post on Ajaxian about Fusebox, a JavaScript library which is described as:

[...] The problem is that frameworks / libraries / third-party scripts may overwrite native methods or each other’s custom methods resulting in unpredictable outcomes. Fusebox, a limited version of the sandboxing component found in FuseJS, avoids these issues by creating sandboxed natives which can be extended without affecting the document natives.

I think Fusebox is a good project. The code in this blog post shows a different approach which in some situations may be more useful and in others, it may not.

With Fusebox script developers avoid polluting the globals by writing their scripts using FuseBox and its wrappers. E.g, you write code with a “Fusebox prefix” like this:

  var fb = Fusebox();
  fb.Array.prototype.hai = function() {
    return fb.String("Oh hai, we have " + this.length + " items.");
  };
  fb.Array(1,2,3).hai(); // "Oh hai, we have 3 items."
  typeof window.Array.prototype.hai; // undefined

// like the native Array constructor the sandboxed constructor will return [ , , ]
  var a = fb.Array(3);
  // equiv to square-bracket notation [3]
  var b = fb.Array.create(3);
  // converting a native array to a sandboxed array
  var c = fb.Array.fromArray([1, 2, 3]);

That immediately reminded be of some work I’ve have been doing with colleague Jimmy Junker at Trifork (jju at trifork com) motivated by the following problem:

In portal environments where multiple portlets want to use different JavaScript libraries or different versions of the same JavaScript libraries, you are very likely to run in to problems since almost all libraries (except later YUI versions) are designed to live in the global namespace.

After a brainstorming session we developed a simple prototype (no pun intended), uninspiringly named “VersionManager”, with which we succeeded in loading two different versions of the PrototypeJS library. With VersionManager you can write code like this:

with (VersionManager.version("1.6.1")) {
//"1.6.1" refers to prototype which must have been loaded
  console.log("abcdaba".gsub("a","42"));// console.logs '42bcd42b42'
  console.log(typeof Object.extend);// console.logs 'function'
  $A([1,2,3])._each(function(x){
    console.log(x);
  });// console.logs '1','2','3'
}
VersionManager.clear();//otherwise previous version is lingering...
try {
  console.log(typeof "".gsub);
//console.logs 'function' <= this is generic delegating proxy function
  console.log("abcdaba".gsub("a","42"));//throws error
} catch (e) {
  console.log(e);
  //TypeError: "attempt call to delegate with no version defined (see documentation)"
}
console.log(typeof Object.extend);// console.logs 'undefined'
console.log(typeof $A);// console.logs 'undefined'

So version manager lets you load several libraries on the page, and sandboxes changes made to globals. By telling VersionManager which version you want to use for a particular block you code, you can use each of those libraries, even if they define the same global variables or properties on the prototypes of objects, e.g., Array.prototype.

VersionManager also creates sandboxes, but as opposed to Fusebox, what is sandboxed is "changes made to" the built-ins, and global variables defined. Also, the goal is not to avoid extending the prototypes, but almost the opposite: to allow several scripts/libraries that would otherwise conflict to live in the same page without seeing each other, even if they define the same global names or want to extend built-in prototypes in different ways.

With VersionManager you simply write regular user code, but wrap it in a with statement. This means that existing code can easily be rewritten.

The loaded "sandboxed" libraries must adhere to a few syntactic and semantic constraints.
For an example, see a sandboxed version of prototypejs here

Transformed prototypejs

The transformation applied to prototypejs is very simple and can be done automatically (i.e., it is easy to write a program that performs this script-transformation for you). It is a bit technical, but it results in is adding a declaration with-statement "header" at the beginning and a "footer" and the end. In fact, we are writing a program that transforms any JavaScript program/library into an equivalent one that adheres to these constraints.

An alpha version is on github. It is a proof-of-concept that shows that the fundamental approach seems to work.

jdalton pointed out a couple of things :

  • You may run into problems with minifiers given that your code is wrapped in 'with'. However, you can minify the code first, and then wrap it.
  • It doesn't track changes to event or elements.

So far, we've tested it only in latest Firefox and Safari, but I see no reason why it couldn't work in other recent browsers. We have sucessfully loaded prototypejs in a sandbox, and managed to run all prototypejs unit tests successfully inside the sandbox.

Github project: http://github.com/krukow/versionmanager

/Karl

Posted in JavaScript, namespacing, with | Leave a comment

The Joy of Clojure

In case you haven’t noticed there is a very interesting Clojure book coming out, titled “The Joy of Clojure,” written by two very interesting authors that anyone hanging out in the Clojure community should know: Chris Houser and Michael Fogus.

As an appetizer, the first chapter is available for free:

Clojure—A Lisp for the Java Virtual Machine

I’ve read the first chapter and the book looks very promising! To quote the last paragraph of chapter one:

We’ve talked a little about how this book will go beyond what Clojure is to why it’s designed the way it is and how that design can be exploited through idioms that will help you think in Clojure. So lets stop talking about what this book will do and get on with the doing.
Fasten your seat belts.

I’ve fastened my seat belt and ordered my copy :-)

Posted in Clojure, General | Tagged | 3 Comments

Please help funding Clojure

Rich Hickey, creator of Clojure:

As should be obvious, Clojure is a labor of love on my part. Started as a self-funded sabbatical project, Clojure has come to occupy me far more than full-time. However, Clojure does not have institutional or
corporate sponsorship, and was not, and is not, the by-product of
another profitable endeavor. I have borne the costs of developing
Clojure myself, but 2009 is the last year I, or my family, can bear
that.

I would be such a shame if development were to slow down or stop completely because of such a small amount of money… So, please donate to the project.

See details,

http://groups.google.com/group/clojure/t/cc77df25e98ce46b

Posted in General | 1 Comment

Review: Ext JS 3.0 Cookbook

Intro and motivation. While my recent blogging activity and interests have been (and are) about Clojure, I am still very much interested in, and actively programming, JavaScript. That is why I immediately accepted when Amit Sharma from Packt Publishing asked me to review the new book Ext JS 3.0 cookbook by Jorge Ramon (find links below 1).

Historically, I have been known for advocating the Ext JS framework for RIA development in controlled environments since it is very general and customizable, and because it lifts the abstraction level much higher than when developing at the DOM level with many other frameworks. Let me emphasize: You do need to understand the DOM, event models, and browsers, but that you don’t have to think at this low level constantly when developing (this discussion is the topic for another blog post, based on discussions we had at Trifork).

One of the problems with Ext JS is that is has perhaps the steepest of learning curves among JavaScript frameworks, and while the API documentation is good, it doesn’t tell you much about the programming model and idioms, or how structure and modularize the combining of components and containers into applications. Fortunately, the Ext JS developers have provide a large number of examples of how to use the various Ext JS functions. Unfortunately the examples are code only: they don’t come with explanations of why things are coded as they are. So it is your job as a developer to extract this information from the source code, and decide which bits you can and can’t use in your production code. Depending on you ability and willingness to read the examples and the Ext JS source code itself, you may find this more or less viable learning route.

To me it is obvious that there is need for one or more Ext JS books. In fact, I am contemplating considering starting-to-think-about writing something myself … ;-)

Review. I won’t cover the structure of the book other than to say it is organized as a number of recipes, each showing how to build a mini ExtJS-based GUI for a certain use-case. Other reviewers have already covered recipe structure in detail. Instead, I will take a step back and give you my opinion about what you’ll get and what you won’t get with this book, i.e., it’s all about expectations.

(Anyway, the best way to get a feel of the form and flow of the book is to read a sample chapter. I found the following online; there are many more topics in the book, but the form is the same:

http://www.packtpub.com/files/8709-ext-js-cookbook-sample-chapter-3-load-validate-and-submit-forms.pdf)

Ext JS 3.0 markets itself as: “Quick answers to common problems. 109 great recipes for building impressive rich internet applications using the Ext JS JavaScript library.” As other reviewers have mentioned, the book successfully does what it sets out to do: It gives you concrete detailed solutions to a number of problems selected by the author; no less, but also no more. There is code that comes with the book, to you don’t have to type in the examples from the book. The book is very code-heavy: as you can see from the sample chapter, a 60-80% to 40-20% code-to-text ratio.

Aspects I liked.
The book is focused: apart from the preface which is good, there isn’t a long ramble introduction wasting my time. It is into the meat already from Chapter 1, page 1. Good.

It is fairly complete in that parts of ExtJS that cover making UIs, here I wasn’t missing anything in particular.

It does what it sets out to do, and it is well organized, well written and consistent.

There are fairly concise sections “There’s more…” and “How it works…” which discuss aspects of the code just described.

The “How to do it…” sections are is step-by-step, and explaining what each step does.

Aspects I didn’t like so much.
It is not the kind of book I, personally, was looking for: I don’t want a recipe book, since such a book would often be too focused on too great detail, too much boiler-plate code, and things I could have figured out myself. I would prefer a book that was would focus on concepts, real-world problems, density and concision of code examples, common errors, how to organize and structure large applications, development techniques and best practices (the latter it does to some extent).

Conclusion. As mentioned I think this book delivers what it promises. I am happy to have a copy lying around. I think I will use it, and I do think it is valuable as a guide to the extensive Ext JS example code.

If you are a complete beginner or quite advanced user, I do not think this book is of that much value to you. If you are at an intermediate level, having played around with Ext JS and JavaScript, but not built anything too serious, I recommend it.

If you are looking for a more conceptual book, obviously this book is not for you. If you a looking for a concrete step-by-step, “hands-on” book, this book is for you.

If you would like something to complement the Ext JS Sample code, this book is for you. If you can understand and generalize the Ext JS sample code, this book is not for you.

Hope that helps :-)

1Disclaimer: I agree with Giles Bowkett that many times blogs aren’t the best source of truthful, objective information (aka “Godless Communist Bullshit“). It is your job as a responsible thinking reader to consider if there are hidden agendas and economical interests that might cause the publisher to not present the truth, the whole truth and nothing but the truth. The people at Packt Publishing are smart, they understand this. They’ve found a bunch of bloggers and asked them to do reviews (see below).

I want to make it completely visible that I am participating in Packt Publishing’s affiliate program that gives me a payment, not for the review, but for each purchase originating from this site.

My manifesto here is: while there is value in money, I value integrity more. I have striven to give as fair and objective a review as I possibly can (I hope you can see that; if not, let me know).

This is the Ext JS 3.0 cookbook link to use if you want to support me.

http://www.packtpub.com/ext-js-3-0-cookbook/mid/011209p0gzfz?utm_source=blog.higher-order.net&utm_medium=affiliate&utm_content=blog&utm_campaign=mdb_001692

Otherwise use this link.

http://www.packtpub.com/ext-js-3-0-cookbook

Other reviews.

Josh Holmes. I agree with everything Josh is saying, and I don’t want to repeat it here.
http://www.joshholmes.com/blog/2009/11/19/ReviewOfExtJS30Cookbook.aspx

Arthur Kay. I agree with most of what Arthur says. I don’t agree with everything Arthur is saying: I don’t think there is much to learn for experienced ExtJS developers.

http://blog.akawebdesign.com/index.php/2009/11/20/book-review-extjs-3-0-cookbook

ExtJS Forum response: It is obviously very positive since it is posted to Ext JS enthusiasts.

http://www.extjs.com/forum/showthread.php?t=79850

Posted in General | Leave a comment

Clojure circuit breaker

[Update Jan 1., 2010: A couple of people have been linking and twitting this, so I've made the blog entry match the current Clojure version at github. Code works in the 'new' branch of Clojure: tested with commit 3ae9e8874d43f9fd37e59bb7ea8cce0f85bac101.

There is support for creating several circuit breakers wrapping given functions].

As an exercise in Clojure, I’ve implemented a mostly functional version of Michael Nygaard’s stability pattern “Circuit breaker” (http://www.pragprog.com/titles/mnee/release-it).

The implementation uses the new constructs deftype and defprotocol (which are looking really interesting to me!). The implementation maintains a single identity named “state” which is an atom holding the current state (open, closed, initial-half-open or pending-half-open).

More on deftype and defprotocol can be found here.

Here is the code that defines the states and the state transitions (requires JavaScript; otherwise use this link). Notice how the events ‘on-before-call’, ‘on-success’ and ‘on-error’ are implemented as a protocol defining state-transition functions. I really like the features Rich is adding to Clojure – to me it makes modeling so much more natural (with an OO-background), while retaining the benefits of functional programming.

Notice how this snipplet defines the transition functions, the four states as well as an “abstract” implementation of the state transitions. The abs-transitions can be used as default implementations when extending the protocol to the state-types. In Java, this would correspond to an interface (CircuitBreakerTransitions), an abstract super-class implementing the interface, and four immutable classes (the states). However, Clojures deftype and defprotocol does not create a type-hierarchy, and is much more dynamic (you can always define how a given type extends to a protocol – it is not fixed at type definition time). Here is how I extended my types to the protocol:


(http://gist.github.com/267169)

This defines a completely functional transition-system between the four states. I think this reads really well, for example consider the on-error event in the closed state: this checks to see if the states fail-count is equal to the threshold defined in the policy, if so it transitions to the open state, recording the current time (OpenState p (System/currentTimeMillis)) (preserving the policy); otherwise it transitions to a closed state with a higher fail-count.

The actual circuit breaker uses these transition functions:


(http://gist.github.com/267172)

The state of a circuit breaker is an atom (the only mutable construct in this example). The function wrap takes a function f and returns a “wrapped” version of f, which implements the circuit breaker functionality around f.

Notice how dead-simple this is to test:


(http://gist.github.com/267173)

Full Source:
http://github.com/krukow/clojure-circuit-breaker

Posted in General | 3 Comments

Identity, State and Values

Please watch this video carefully at least once:

http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey

It is pretty hard not to agree, isn’t it… Give in now.

Posted in General | Leave a comment

Understanding Clojure’s PersistentHashMap (deftwice…)

[sept. 8th, 21:22: fixed a +/- 1 error]

In a previous post, I gave a high-level description of how Clojure’s PersistentVector is implemented. While the code has changed, the description was high-level enough that the explanations still hold (although some code snipplets don’t correspond to what’s in Git master.)

In this post, I’ll try to explain (also at a high level) how clojure.lang.PersistentHashMap works internally. Reading the mentioned post on PersistentVector is helpful as some of the concepts are the same (e.g., bit-partitioning).

Persistent
PersistentHashMap is a persistent version of the classical hash table data structure. Persistent means that the data structure is immutable, yet has efficient non-destructive operations that correspond to the operations on the classical hash table. E.g., put(K,V) in hash table corresponds to a side-effect free function assoc(P, K, V) which computes from P a new PersistentHashMap P’ which is like P except that it maps key K to value V. The word “efficient” means “on par” with their mutating counterparts. For Clojure data structures, Rich tries to make them within 1-4 of the Java data structure operations; and read-only operations can even be faster than Java’s. Later I will cover ‘transients’ which are a new optimization that make “batch” operations faster.

Array-mapped hash trie
In his paper Ideal Hash Tries Phil Bagwell describes a data structure “Hash Array Mapped Trie” which is an efficient implementation of a Hash Tree, based on a combination of hashing and the trie data structure. Hash Array Mapped Tries, are not persistent or immutable. What Rich did was create a persistent version of Bagwell’s data structure; clojure.lang.PersistentHashMap.

PersistentHashMap basic idea
PersistentHashMap (PHM) maintains a very-wide tree, each node having up to 32 children. Each node is a concrete implementation of a static inner interface, INode, and there are five implementations of this interface: EmptyNode, LeafNode, FullNode, HashCollisionNode, BitmapIndexedNode. I’ll only cover EmptyNode, LeafNode and BitmapIndexedNode; the latter being where most of the interesting stuff happens.

The INode interface look like this:

static interface INode{
    INode assoc(int shift, int hash, Object key, Object val, Box addedLeaf);
    LeafNode find(int hash, Object key);
    //I've left out a few methods
}

The assoc method “adds” a new key-value pair to the map. The find method searches for the Leaf-node holding a key.

An EmptyNode simply represents the empty hash map. LeafNodes are also pretty simple; they hold the actual entries stored in map.

The root node of the tree is initially an EmptyNode. When assoc is called on EmptyNode, it returns a new LeafNode, which the key-value pair. So EmptyNode “becomes” a LeafNode with assoc. In turn, a LeafNode typically “becomes” a BitmapIndexedNode with assoc. We will go into details with BitmapIndexedNode, but first we need to understand…

Bit-partitioning of hash-codes
When PHM assocs a key object K with value object V, it first computes the hashCode of K, just as a hash-table would. The hash code of K yields an int, which has a 32-bit representation in Java (as I explained in the post on PersistentVector). Here are some example bit representations of numbers:

PersistentHashMap ilustration 1

The trick that PHM uses is to partition this bit representation in to blocks of 5-bits, represented with colors in the above example. Each block corresponds to a “level” in the tree structure; for example, the right-most green block corresponds to root-level, and the orange block corresponds to the children of the root. Exactly what “corresponds” means is described below. Levels are multiples of 5. I.e., the root level is level 0, the children of the root are level 5, the grand-children of the root are level 10, etc. Note that a block of five bits corresponds to a number in the range 0-31.

The reason that levels are multiples of five is the following: You have a bit-representation of a hash-code and you are interested in a particular block corresponding to a level n. You obtain this number in two steps: first move the block of bits to the right, until it is the right-most block. Then null-out all other bits except this right-most block. The is done with two bit-operations: you simply right-shift the bits with the level n and then do a bit-wise ‘and’ (&) with the pattern 00..11111. For example, suppose you want the block corresponding to level 5 (the orange block) of the number 1258 (binary: [00001][00111][01010]). You right shift with the level, 5, which is [00000][00001][00111]; then do the null’ing, yielding [00111], which was exactly the orange block of 1258.

The following function does this.

static int mask(int hash, int shift){
	return (hash >>> shift) & 0x01f;
}

Illustrating the tree structure
I’ll use the following picture (adapted from one of Rich’s slides).


PersistentHashMap ilustration 1

The colored nodes are BitmapIndexedNodes and have between 2 and 31 children (should they get a 32nd child, they become FullNodes). A naive implementation of BitmapIndexedNode might be the following: use an int variable, level, to denote the level that this node lives in, and allocate a full 32 element array of INode references for the children. To add a new child: lookup the index via the bit-block corresponding to the level, i.e. given a hashCode hash for the child, and given the level, call mask(hash, level) to get the index in range [0, 31]. But this strategy wastes a lot of memory: each node has a full 32 element array where most entries are simply null, i.e., if there are 4 children there are 28 null references which are just wasting space.

The hard part is to only use as much space as is needed for each BitmapIndexedNode, i.e., if a BitmapIndexedNode has N children it maintains an array of size N. But then we can’t use mask(hash,shift) as the index into the array since it returns a number in the range [0,31] and we need a number only in range [0, N).

bitpos
So we need a function to map numbers in range [0, 31] to indexes in range [0, N). The function has to be fast constant time, since we are using it to find the child of a node from a hash code, which we will do at each level in the tree. The function is a composition of two functions: bitpos and index. Function bitpos maps numbers [0, 31] to powers of two, i.e., numbers that have a binary representation of the form:

{10n | n >= 0}.

For example, bitpos(7) in binary is 10000000. We always look at bitpos(x) in binary form. Function index we return to shortly.

static int bitpos(int hash, int shift){
    return 1 << mask(hash, shift);
}

bitmap
Each BitmapIndexedNode also maintains an int variable bitmap which we also look at in binary form. The bitmap tells us how many children this node has, and also what their indexes are in the child array. All this is encoded into one int variable! How? The bit-map has a binary representation, e.g.,

00000000000000010000000010000101

The number of children is the number of 1‘s in the binary representation. If the nth bit in bitmap is 1 (counting right-to-left, starting with position 0) then there is a child with index n. So to check if a child exists for a certain hash-code: first compute mask(hash,shift) to get the bit-block and number in range [0, 31]. Then compute bitpos of this. You then have a number of form 10n. Now match that with the bitmap to check if there is a 1 in the n‘th position; this match is simply a bit-wise and, ‘&’, with bitpos. We’d better take an example.

Suppose we are at level 5, and looking up an element with hash-code 1258. Suppose also the bitmap-indexed node has four children, with bitmap
bitmap =binary rep 00000000000000010000000010000101

Now check if there is a child for hashCode 1258 at this level:

mask(1258,5) = 7 (i.e., binary 00111 as we saw before).

bitpos(7) =binary rep 10000000

00000000000000010000000010000101 &
00000000000000000000000010000000 = 1

Which means that the child exists. Now what is its index? This is where the index function comes into play…

index
This is the final piece of bit-trickery, I promise ;-) The index of a child is the number of 1‘s to the right of the child’s bitpos in the bit map. In our example above, the index corresponding to hash code 1258 would be 2, since there are two 1‘s to the right of 10000000 in the bitmap. Now the trick is that on many processors there is an efficient instruction called CTPOP which counts the number of ones in the bit representation of an integer (CTPOP is “count (bit) population”). Note that if we subtract 1 from the bitpos, 10n, we get 01n, then binary ‘and’ with the bit map gives us the same bit map, but where only the 1‘s to the right of bitpos are present. If we do a CTPOP on this, we get the index. Hence,

final int index(int bit){
    return Integer.bitCount(bitmap & (bit - 1));
}

To be continued…
From this you should be able to understand how find works. In combines all this:

final static class BitmapIndexedNode implements INode{
    final int bitmap;
    final INode[] nodes;
    final int shift;
    //some stuff left out
    static int bitpos(int hash, int shift){
	return 1 << mask(hash, shift);
    }
    
    final int index(int bit){
	return Integer.bitCount(bitmap & (bit - 1));
    }
    //...some methods left out
    
   public LeafNode find(int hash, Object key){
       int bit = bitpos(hash, shift);
       if((bitmap & bit) != 0)
	   {
	       return nodes[index(bit)].find(hash, key);
	   }
       else
	   return null;
   }
}

In part 2 we look at how assoc works…

Posted in Clojure | Tagged , , | 6 Comments

Clojure talks in Copenhagen and Aarhus – now with Azul Systems Clojure demo

Upcoming events: I am giving a talk on Clojure in Copenhagen and Aarhus – this is the first chance for a dcug meetup (though also non-dcug members are invited). The events are after work and free – there will even be free sandwiches, compliments of Trifork :-)

Abstract for Aarhus/Cph talks.
Clojure is..

… a new functional, dynamic programming language for Java Virtual Machines. The primary novelty of Clojure is its strong focus on and support for in-process concurrency: a unique concurrency model, combining a notion of persistent (i.e., immutable, fast) data structures, with a lock-free concurrency model. This simplifies concurrent programming greatly and has good scalability properties.
Influenced by LISP and Haskell, Clojure supports pure, lazy functional programming and has a powerful macro system which makes extending the language to support DSLs easy and powerful.

This talk..

… is split in three parts. In the first part, Clojure is introduced for those who don’t know the language. There is so much to cover that this will be a fast tour with pointers to more information, but we will emphasize the unique aspects of the language.

In the second part we go into more depth regarding the implementation of persistent (and transient data structures) – “the secret sauce of Clojure” ;-)

In the third part we get to see Clojure in action running on some very cool technology – a unique opportunity! Azul Systems (www.azulsystems.com) has promised to make available one of their large Vega 3 compute appliances (864 core, 368 GB memory, let’s go concurrent). We will explore how the Clojure concurrency model fares in practice, scaling a demo of a parallel Traveling Sales Problem algorithm. We will also push the implementation to its limits in a high-contention demo. Great fun!


Remember: Active until August 31st:
DCUG members can now get a 15% discount on JAOO tickets.
Simply click the banner below, choose “register here” and use the promotion code: dcug

JAOO Aarhus 2009 - The Conference for the 360 Degree software developer

Posted in Clojure | Tagged , , | Leave a comment