Sunday, January 24, 2010

Code Kata: A data sifter

Here's a problem inspired by a scheme example I discovered another day.

Write a data sifter, sift, that partitions a string into a list of lists.  Start with the case of using letters as a delimiter, and numbers as data.  There can be any number of repetitions of numbers & letters.

user=>(sift "a1b2cd34")
(("a" ("1")) ("b" ("2")) ("c" ()) ("d" ("3" "4")))

Next, add the ability to your sift function to accept a list as input, as well as a string.

user=>(sift ("a" "1" "b" "2" "c" "d" "3" "4"))
(("a" ("1")) ("b" ("2")) ("c" ()) ("d" ("3" "4")))

After that, add the ability to take a vector/array as an input

user=>(sift ["a" "1" "b" "2" "c" "d" "3" "4"])
(("a" ("1")) ("b" ("2")) ("c" ()) ("d" ("3" "4")))

Finally, let your sift accept a collection of any object, and an arbitrary predicate.  If the predicate is true, the object is a delimiter (e.g. a String).  If the predicate is false, the object is data (e.g. a Number).

user=>(sift string? ["a" 1 "b" 2 "c" "d" 3 4])
(("a" (1)) ("b" (2)) ("c" ()) ("d" (3 4)))

I'll be posting my solution in about a week.

1 comment:

  1. Great blog and screencasts you have here.

    I guess it's considered bad style to argue with a specification but I'll do it anyway. It seems to me that the "correct" specification would be to return

    ([("a") (1)] [("b") (2)] [("c" "d") (3 4)])

    In other words, you're essentially partitioning the list into maximal alternating runs of satisfying and non-satisfying elements. With that fixed spec, the solution is easy and natural:

    A natural use case for this function is the standard regexp split function, although it usually returns a flattened version of sift's output where (cons [as cs] ...) would be replaced by (cons as (cons bs ...)).