Friday, December 19, 2008

Using a Tree Zipper to reflect a tree drag and drop update

I thought I'd share two interesting approaches that I implemented to update a Tree data structure to reflect a drag and drop operation on a hierarchical navigation tree.

In my current web application project, I have a basic navigation tree, where a user can add and delete items and drag and drop a subtree from one location to another within the tree. The tree component that displays in the browser is a jQuery Javascript plugin. When someone drags and drops a tree item in the browser, an AJAX call sends the server the source item ID (the item being dragged), the target item ID (the item it was dropped onto or next to), and the type of the move (before, after, or inside). The type of the move refers to the position of the dropped source relative to the target.

On the server, the Haskell FastCGI program represents the navigation tree data as a multi-way rose tree (import Data.Tree). I needed a way to transform the tree data to reflect the drag and drop update request. Below is my first attempt to do this, given the source and target IDs and the type of drop in reference to the target. My strategy was to find the source subtree (getSrc) and place it into a new tree that had the source subtree pruned from its original location (exSrc). The putAbove, putBelow, and putInside functions do the work of grafting the source subtree into the proper place in relation to the target subtree.

import Data.Tree
import Data.Maybe (fromJust, mapMaybe)

type ProjectTree = Tree Integer

modTree :: Integer -> Integer -> String -> ProjectTree -> ProjectTree
modTree source target typ rs =
    case typ of
      "before" -> head $ b putAbove
      "inside" -> c putInside
      "after"  -> head $ b putBelow
      _        -> rs
  where
    b proc = proc (fromJust (getSrc source rs)) (fromJust (exSrc source rs))
    c proc = proc (fromJust (getSrc source rs)) (fromJust (exSrc source rs))
    putAbove sub t@(Node a b) =
        if target == a
          then [sub, t]
          else [Node a $ concatMap (putAbove sub) b]
    putBelow sub t@(Node a b) =
        if target == a
          then [t, sub]
          else [Node a $ concatMap (putBelow sub) b]
    putInside sub t@(Node a b) =
        if target == a
          then Node a $ b ++ [sub]
          else Node a $ map (putInside sub) b

-- Return the tree without the source subtree
exSrc :: Integer -> ProjectTree -> Maybe ProjectTree
exSrc src (Node a b) =
    if src == a
      then Nothing
      else Just $ Node a $ mapMaybe (exSrc src) b

-- Return the source subtree
getSrc :: Integer -> ProjectTree -> Maybe ProjectTree
getSrc src t@(Node a b) =
    if src == a
      then Just t
      else case mapMaybe (getSrc src) b of
             []         -> Nothing
             (item : _) -> Just item
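To make the prune-and-graft strategy concrete, here is a self-contained sketch of an "inside" move. The demo tree, the simplified helpers, and the moveInside wrapper are my own illustration rather than the production code:

```haskell
import Data.Tree
import Data.Maybe (fromJust, mapMaybe)

-- A hypothetical demo tree:
--   1
--   +- 2
--   |  +- 4
--   +- 3
--   +- 5
demoTree :: Tree Integer
demoTree = Node 1 [Node 2 [Node 4 []], Node 3 [], Node 5 []]

-- Prune the subtree rooted at the given label (a simplified exSrc).
prune :: Integer -> Tree Integer -> Maybe (Tree Integer)
prune src (Node a b)
  | src == a  = Nothing
  | otherwise = Just $ Node a $ mapMaybe (prune src) b

-- Fetch the subtree rooted at the given label (a simplified getSrc).
fetch :: Integer -> Tree Integer -> Maybe (Tree Integer)
fetch src t@(Node a b)
  | src == a  = Just t
  | otherwise = case mapMaybe (fetch src) b of
                  []          -> Nothing
                  (found : _) -> Just found

-- An "inside" move: prune the source, then graft it in as the
-- target's last child.
moveInside :: Integer -> Integer -> Tree Integer -> Tree Integer
moveInside src tgt t = graft (fromJust (prune src t))
  where
    sub = fromJust (fetch src t)
    graft (Node a b)
      | tgt == a  = Node a (b ++ [sub])
      | otherwise = Node a (map graft b)
```

Dragging item 4 onto item 5 with moveInside 4 5 demoTree yields a tree whose preorder traversal (flatten) is [1,2,3,5,4]: item 4 now hangs under item 5.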



I then decided to use a Tree Zipper (Data.Tree.Zipper) and see if I could come up with a clearer solution. A Zipper is a functional cursor into a data structure, in this case a Tree, and allows directional movement, along with any kind of modification within the tree. This solution was much shorter and arguably clearer and more direct as well.


modTree2 :: Integer -> Integer -> String -> ProjectTree -> ProjectTree
modTree2 source target typ rs =
    case typ of
      "before" -> act insertLeft
      "after"  -> act insertRight
      "inside" -> act insertDownLast
      _        -> rs
  where
    act proc =
        let loc = findNode source $ fromTree rs
        in toTree $ proc (tree loc) (findNode target $ root $ fromJust $ delete loc)
    findNode topic loc =
        fromJust $ find (\n -> topic == getLabel n) $ getSubtrees $ Just loc
    getSubtrees =
        maybe [] (\n -> [n] ++ getSubtrees (firstChild n) ++
                        getSubtrees (right n))



The fromTree and toTree functions bring the Tree structure into and out of the zipper. The main computation simply finds the source and target subtrees by walking down the tree structure with the zipper and inserts the source subtree into the new tree relative to the target. The general expression is embodied in the two lines of code that make up the 'act' function.
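To give a feel for what the zipper is doing under the hood, here is a minimal hand-rolled rose-tree zipper of my own; the real Data.Tree.Zipper module is much richer. A location pairs the focused subtree with enough context, one crumb per ancestor, to rebuild the whole tree:

```haskell
import Data.Tree

-- For each ancestor on the path we keep its label plus the siblings
-- to the left (stored reversed) and right of our descent.
data Crumb a = Crumb a [Tree a] [Tree a]
type Loc a = (Tree a, [Crumb a])

-- Enter the zipper, focused at the root (no context yet).
fromTree :: Tree a -> Loc a
fromTree t = (t, [])

-- Move the focus to the first child, if any.
firstChild :: Loc a -> Maybe (Loc a)
firstChild (Node _ [], _)            = Nothing
firstChild (Node a (c : cs), crumbs) = Just (c, Crumb a [] cs : crumbs)

-- Move the focus back up to the parent, rebuilding one level.
up :: Loc a -> Maybe (Loc a)
up (_, [])                       = Nothing
up (t, Crumb a ls rs : crumbs) = Just (Node a (reverse ls ++ t : rs), crumbs)

-- Rebuild the whole tree from any location.
toTree :: Loc a -> Tree a
toTree loc = maybe (fst loc) toTree (up loc)

-- Apply a change to the focused subtree, leaving the context intact.
modify :: (Tree a -> Tree a) -> Loc a -> Loc a
modify f (t, crumbs) = (f t, crumbs)
```

The point of the structure is that an edit made at the focus is purely local; toTree then zips the context back up around the changed subtree, which is exactly what lets modTree2 delete at one location and insert at another without any hand-written tree surgery.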

I'm pretty sure that these two approaches could be somewhat improved on, especially for the first one. I'm curious if anyone has a more direct approach for functionally moving subtrees around.

Saturday, June 14, 2008

A sort of functional Javascript

I realize my blog is supposed to be about infinitely interesting Haskell programming, but this short post ties in with my previous one. I decided to convert the last Haskell program from my previous posting, the one that processed the PCRE documentation, to Javascript. I was able to translate it fairly closely in spirit to the Haskell one (minus the pure functions and pattern matching, of course) with a few useful tools. The script is wrapped up with the famously awesome jQuery Javascript toolkit, so the Javascript loads from a web page, reads the PCRE.txt from the same server, does the processing in the browser, and inserts the result as an element in the current browser page. I also added Oliver Steele's Functional Javascript library. From this, the script makes use of map, foldl, and curry.

It's entirely possible to do a sort of functional programming in Javascript. It has closures and lambda functions, and functions are indeed first class. However, Javascript wasn't necessarily designed for this, so function calls are relatively expensive, making an already slow-running process even slower.

Better dynamically typed scripting languages might have been chosen for scripting web browsers, but we're stuck with Javascript here for the foreseeable future, and with a little help, it's not too bad a situation. Light and speedy Javascript toolkits like jQuery really ramp up the programming leverage available for web programming.

Compare the Haskell version with the Javascript version.

Friday, May 30, 2008

A little spec writing

In my last posting, I discussed my views on the benefit of exploratory programming. However, even when I decide to start pounding the keyboard, with a priority on exploring the solution space, it's essential to first write a problem specification for all non-trivial programming problems. Most importantly, this practice prevents me from solving the wrong problem, that is, getting to the end of a solution, or getting stuck somewhere after a lot of effort, and finding out that I assumed wrongly and implemented the wrong functionality. Solving the wrong problem can be a learning experience, though a wasteful one, for small projects. For large projects, it sometimes qualifies as a disaster. Almost as important, writing even a short and simple spec gives the writer a much better handle on the boundaries of the problem. This then serves as a launching point for deeper abstract thinking about what kind of solution will best fit the problem. Following is a very small-scale example of mine reflecting this idea:

During one of my Scheme programming sessions last year, I needed to reference the PCRE library documentation. (That's Perl Compatible Regular Expressions, the regular expression library that Chicken Scheme uses.) However, what PCRE provides for documentation on its site is the PCRE man pages, concatenated together into one .txt file more than 150 pages long. No table of contents and, of course, no section links. So, I wrote a Scheme program to parse the PCRE text file and create an HTML page out of the sections that I was interested in, making a table of contents derived from each section's headings and sub-headings, which link back to the correct location within the section. From analyzing the PCRE man-page text, it's easy to pick out main and sub-headings: main headings have text flush with the left margin, sub-headings are indented once (4 spaces), and the remaining body content is all block-indented twice. The only minor complication would be creating the table of contents, because the sub-headings need to be grouped and contained in an HTML UL tag. So a little extra abstraction would be needed to take care of each sub-heading group once it could be determined that a group's span had ended. Here is the implementation of my Scheme script. This is how the resulting HTML page should look.

After I had learned enough Haskell to be slightly dangerous, I decided to see how this little conversion task translated over. I kept the Scheme implementation in mind as I worked through my Haskell one. I soon realized: wow, this is not going to translate very smoothly to Haskell at all. Scheme has good support for its own vision of functional programming (not pure), but Haskell, as a purely functional, referentially transparent language, simply doesn't allow for the update of variables. Haskell forces a much different and usually more elegant code structure, but when writing code, it's usually not trivial to decide on the most appropriate abstractions to use. Since my explanation here is weak without an example, I'll walk through my first implementation in Haskell and then revisit it with a better second implementation that I coded later, inspired by the creation of a simple problem specification.

Haskell script - Take 1

Here is the code of my first attempt. It's less than 100 lines and fairly straight-forward, so it shouldn't take much time to understand how it operates. I'll outline the main implementation ideas below:

It opens the text file and uses a regular expression pattern to match the sections that I am interested in:
-- The regular expression section template.
sectPatt :: String -> String
sectPatt shead = "(?ms)^" ++ shead ++ ".*?^NAME.*?(^PCRE.*?)^SEE ALSO"
Here, the shead parameter indicates the unique man-page pattern.

The main heading and sub-heading patterns match against the section text, resulting, for both heading types, in a list of (heading string, offset) pairs.

The two heading lists are merged into a list of TocEntry data:
-- Level1 is a main heading and Level2 is a sub-heading.
-- Constructor arguments are the heading text, the offset location within
-- the section, and the heading anchor label.
data TocEntry = Level1 String Offset String | Level2 String Offset String
The index entry list is now in the correct order and is used to assemble the table of contents and final section text for each section.
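That merging step can be sketched as follows; the helper name, the numeric anchor labels, and the derived instances are my own additions for this demonstration. Each heading is tagged with its level and the combined list is ordered by its offset into the section text:

```haskell
import Data.List (sortOn)

type Offset = Int

-- Mirrors the post's data type; Eq/Show derived only for this demo.
data TocEntry = Level1 String Offset String
              | Level2 String Offset String
              deriving (Eq, Show)

-- Merge main and sub-headings into document order by offset.
mergeHeads :: [(String, Offset)] -> [(String, Offset)] -> [TocEntry]
mergeHeads majors minors =
    sortOn offset (map (mk Level1) majors ++ map (mk Level2) minors)
  where
    -- Build an entry; the anchor label is derived from the offset here
    -- purely for illustration.
    mk level (s, o) = level s o ("a" ++ show o)
    offset (Level1 _ o _) = o
    offset (Level2 _ o _) = o
```

For example, a major heading at offset 0, a sub-heading at offset 10, and a second major heading at offset 50 come out interleaved in exactly the order they appear in the man-page text.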

This solution abstracts the headings out into a TocEntry data structure so they can be handily processed at the assembly stage. This implementation always seemed a little awkward and excessive for such a short script, but it worked, and I didn't bother contemplating a better abstraction at the time.

Haskell script - Take 2

I had it in the back of my mind that a solution to this problem needed a better abstraction, so several months later, I built up enough resolve to try my hand at it again. This time I decided to write out a simple problem specification, something I probably should have done at the very beginning. Even if a problem seems too small to warrant any kind of formal specification, writing something like this organizes the true nature of the problem, nice-and-neat, in one's mind and can lead to subtle insights into a clearly better implementation. So, here are the basic properties I documented about this problem:
  1. I only want to see two of the man page sections. Take the section beginning with label PCREPATTERN to be the 'Pattern Syntax and Semantics' section and the section beginning with label PCRESYNTAX to be the 'Quick Syntax Reference' section.
  2. The lines without indent are the major headings and the lines
    starting with one indent are the minor headings (sub-headings) within a major
    heading. This is in contrast to the body content which begins with
    two indents.
  3. Each major heading will have a top level entry in the table of
    contents and each minor heading will have a sub level entry under the
    top level entry that it belongs to. Each table of contents (TOC) entry will hyperlink to
    an anchor placed at that entry's heading line inside the body content.
  4. The final structure is the Pattern Syntax and Semantics TOC, the
    Quick Syntax Reference TOC, followed by the body content of the two
    sections in the respective order.
  5. The procedure for building the TOC and anchors within the body
    involves walking down the list of headings matched within the body,
    each time picking the next individual heading based on the offset
    location within the section body.
  6. Each section starts with a major heading. When a minor heading is
    encountered, a sub-list is started that spans until a major heading is
    encountered again or until the section ends.
  7. If both heading lists have run out, check for previous minor headings
    waiting to be made into a sub-list.
That's the bulk of it, leaving out lesser details. Reflecting on this list and thinking around the edges gave me valuable insights that led to a different approach for Version 2.
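Property 2 above, the indentation rule, reduces to a simple three-way test. A minimal sketch (the type and function names are my own, and blank lines would need separate handling in a real parser):

```haskell
-- Classify a line of the man-page text by its indentation.
data LineKind = Major | Minor | Body deriving (Eq, Show)

classify :: String -> LineKind
classify s
  | take 8 s == replicate 8 ' ' = Body   -- two indents: body content
  | take 4 s == replicate 4 ' ' = Minor  -- one indent: sub-heading
  | otherwise                   = Major  -- flush left: major heading
```

So "RECURSION" classifies as a major heading, "    Recursion depth" as a minor one, and an eight-space-indented line as body content.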


You can see that the intermediate data type I had used for the first Haskell script is gone. It's not needed anymore because no explicit merging is performed. Each section's table of contents and body text recursively assemble themselves as the header lists are consumed. Alas, the magnitude of improvement here over the first Haskell script is debatable. I don't think I sped the program up much for something that only takes on the order of milliseconds to complete anyway. I see the real improvement being in the more direct style of implementation: no unnecessary abstractions and no code convolution. Haskell shines at allowing a shorter and more direct implementation than probably any other programming language. This is very important for readability and maintainability in larger programs. Whenever a definite abstraction is needed, Haskell will bend over backwards to accommodate, but it's good practice to carefully consider each additional abstraction and use it sparingly and purposefully.

Wednesday, May 28, 2008

In the context of exploratory programming

Ah, the joys of exploratory programming! Taking a small piece of trivial code, swiftly hacking on several layers of functionality, refactoring occasionally when more flexible abstractions are recognized, and continuing on until Experience is screaming at you to start documenting your work in some sort of design specification. At this point, the need is to stave off madness; a clear and manageable set of assumptions has now become a crowd of sophisticated design decisions dancing Tangos inside your head.

So exploratory programming is just wonderful! Code slinging wearing merely your underwear and a cowboy hat. If you're a lady, I dunno, something similar maybe? Okay, probably not. Seriously, if you're in unfamiliar technical waters, it's a big help to start with example code, adding your own custom code as your solution progresses. It's possible to learn and create from reading only some library reference docs, assuming that they are complete, but it's usually a long and miserable experience.

Sometimes it's a bit of a Catch-22, where you don't understand foreign example code because it uses an unfamiliar library. But after turning to the reference docs, you often discover a chain of dependencies with incomplete explanations that are difficult to make sense of without good matching examples, which are usually absent. So now the situation devolves into hunting around the web with your trusty Google chops and hoping that something useful eventually falls out of the sky. A good answer usually does, or maybe try pleading for a morsel on the newsgroups. Indeed, these are time-consuming hobbies!

To sum up my previous rambling, a clear and adequate context is the foundation of lasting programming progress. This context can be found in the discovery of good examples, in the organization of one's problem solving plans, in reflection on what has already been tried, and in the result of many other useful habits.

The context of past experiences is perhaps the most important guide of all. This is simply known as 'understanding based on experience'. For example, I had decided to continue investing time and effort in learning the Haskell programming language and many of the powerful usage patterns made available by its design, in part because I had used many other programming languages in the past and was able to contrast their capabilities with those of Haskell. If I were just starting out as a programmer and Haskell was the first programming language presented to me, I would probably have had a much harder time justifying the effort of continuing to learn an esoteric technology that can easily appear to make programming more difficult than necessary. I suspect that most beginning programmers are initially unable to see past trivial programming examples when trying to get a feel for the most appropriate programming language to use. I've wondered how many novice programmers have decided Object Oriented Programming was "the bee's knees" after only seeing it used in the classic Shapes example (Square, Circle, Triangle, etc. classes inheriting from class Shape). If a person has always lived just two blocks from where they work and has never encountered a bicycle before, how could they justify the seeming difficulty of riding a bicycle to work and back, when it is so easy to walk there?

Exploratory programming is experience building, which is why it plays such an important part. Without this context, you would have only the wise and obscure words of others to direct your steps; ultimately, not as effective. Doing is learning. Pretty obvious statement, I know.

Contrast this with the philosophy of first deciding upon a formal design specification before any implementation is written. The thinking here is that without mapping out a design, you risk implementing dead ends. In reality, unless you understand your target domain to a high degree and it's already clear in your mind what data structures your solution needs, it's usually more fruitful to begin doing something constructive with code first, then evaluate what's feasible as you progress.

It's a valuable skill to be able to alternate your focus between experimental implementation and formal design specification. Which brings me to a subtle but critical context needed for continued progress: the context of current achievement, deciding how to move forward in light of what you have already achieved. For example, when coding up a complex real-world solution, there are always many unexpected subtleties to encounter and work through. Complications arise that couldn't have been anticipated by preparatory design. Because of this, good insight into which direction to proceed next is often found by studying the micro-decisions already made in the code and recognizing the awkward parts that need to be refactored. True, there is always the risk of wasting lots of time flailing around with implementation changes. Here it is really important to encourage code tractability by implementing in either a language like Haskell, with static typing but automatic type inference, or a dynamically typed language, because when you need to change directions and significantly modify your code, you don't want most of your time and effort soaked up by the hassle and frustration of explicit typing considerations.

In conclusion, it's often better to begin a project by writing code, testing the results, and exploring the solution space, rather than feeling stuck insisting on developing an uncertain design. A realistic design may not be revealed to you except after a little exploration. There is an art in knowing at what point to pause and reflect on the code for the purpose of fleshing out more of the formal design. A programmer shouldn't be afraid to make early implementation mistakes as long as the implementation is scrutinized regularly. Save your greater concerns for the much more serious mistake of solving the wrong problem for lack of considering what your problem space really is, something I will write about in a future posting.