Sunday, January 3, 2010

1.2 fn Proposal: same & multisame

Hello Clojure Developers,
Writing software frequently follows the same process. Observing & understanding the processes and coming up with effective solutions is the task of library design. An API is judged by how well it fits into the process.

Application design is a slightly different process. For the purposes of this proposal, it involves three oversimplified steps.
  1. Convert problem data to a form the API can understand.
  2. Use the API to come up with a solved version of the problem.
  3. Convert the API produced solution back to the problem domain solution.
One of the features we all love about Clojure is the sequence abstraction. For many problems, it drastically reduces the time required to convert the specific problem to a form that the API understands. Once a problem is transformed to operating on a string, map, vector or set, the sequence functions can take it from there. This brings steps 1 & 2 very close to each other.

However, there currently is not much work done to bring steps 2 & 3 closer to each other. This is evidenced by the fact that there are specialized namespaces in contrib for handling strings (str-utils2), functor application (generic.functor), and I was in the middle of proposing additions to the map-utils library.

All this code duplication started to smell. Here we all are writing specialized routines to AVOID using the sequence functions in our code. This is not right.
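To make the smell concrete, this is roughly the hand-written conversion that each specialized routine ends up wrapping. It uses only core functions, and the sample data is made up for illustration.

```clojure
;; Each form runs a sequence function and then converts the result back,
;; by hand, to the type that went in.
(def reversed-string (apply str (reverse "abc")))                        ; "cba"
(def bumped-vector   (vec (map inc [1 2 3])))                            ; [2 3 4]
(def even-entries    (into {} (filter (comp even? val) {:a 1 :b 2})))    ; {:b 2}
(def bumped-set      (set (map inc #{1 2 3})))                           ; #{2 3 4}
```

The sequence function differs on every line, but so does the conversion back, and that second part is the step that keeps getting rewritten per type.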

I have a proposal to eliminate this smell. I've written a higher order function called same. Here's the doc:

lib.sfd.same/same
([index? seq-fn & args])
"same is a multimethod that is designed to "undo" seq. It expects a seq-fn that returns a normal seq, and the appropriate args. By default it converts the resulting seq into the same type as the last argument. An optional leading integer, index, can be provided to specify the index of the argument that should be used to convert the seq. If it is a sorted seq, the comparator is preserved.

This operation is fundamentally eager, unless a lazy seq is detected. In this case no conversion is attempted, and laziness is preserved."
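A minimal sketch of the idea in that docstring, for readers who want something to try at the REPL. This is not the code in lib.sfd.same: the names rebuild and same-sketch are hypothetical, and the optional leading index argument is left out to keep it short.

```clojure
;; Hypothetical sketch, NOT the implementation in lib.sfd.same.
;; Dispatch on the rough kind of the template collection.
(defmulti rebuild
  (fn [template xs]
    (cond
      (string? template) :string
      (seq? template)    :seq      ; lists and lazy seqs are left alone
      (coll? template)   :coll     ; vectors, maps, sets (sorted or not)
      :else              :default)))

(defmethod rebuild :string  [_ xs] (apply str xs))
(defmethod rebuild :seq     [_ xs] xs)
;; (empty template) keeps the comparator of a sorted collection.
(defmethod rebuild :coll    [template xs] (into (empty template) xs))
(defmethod rebuild :default [_ xs] xs)

(defn same-sketch
  "Apply seq-fn f to args, then convert the result to the type of the last arg."
  [f & args]
  (rebuild (last args) (apply f args)))

(same-sketch map inc [1 2 3])                       ;=> [2 3 4]
(same-sketch reverse "abc")                         ;=> "cba"
(same-sketch filter (comp even? val) {:a 1 :b 2})   ;=> {:b 2}
(same-sketch map inc (sorted-set-by > 1 2 3))       ;=> #{4 3 2}
```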

Please take a moment to review a fairly robust list of examples now:

http://github.com/francoisdevlin/devlinsf-clojure-utils/blob/master/test/lib/sfd/same_test.clj

Afterwards, you can peruse the code here:

http://github.com/francoisdevlin/devlinsf-clojure-utils/blob/master/src/lib/sfd/same.clj

This one function will provide the same functionality as the proposed map-utils, some of c.c.str-utils2, c.c.generic.functor, or any desired set & vector utils. It's based on a multimethod, so you are a simple defmethod addition away from keyword-utils or symbol-utils (assuming you'd want to treat them like strings).
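As an illustration of the "one defmethod away" claim, here is a small self-contained sketch. The multimethod from-seq is a hypothetical stand-in for the conversion step, not the one in lib.sfd.same, and since a keyword is not seqable the caller goes through name first.

```clojure
;; Hypothetical stand-in for the conversion multimethod.
(defmulti from-seq (fn [template xs] (class template)))

(defmethod from-seq String   [_ xs] (apply str xs))
(defmethod from-seq :default [_ xs] xs)

;; The extension: treat a keyword like the string of its name.
(defmethod from-seq clojure.lang.Keyword [_ xs]
  (keyword (apply str xs)))

(from-seq "abc" (reverse "abc"))         ;=> "cba"
(from-seq :abc  (reverse (name :abc)))   ;=> :cba
```

A symbol version would look the same, with symbol in place of keyword.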

I've also designed a method, multi-same, for functions that take a sequence in and split it into several sequences. Here's a quick example, as the uses for multi-same are still being developed.

user=>(multi-same partition 2 "abcd")
("ab" "cd")
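For anyone who wants to experiment, a rough sketch that reproduces the result above. The name multi-same-sketch is hypothetical and this is only my reading of the behaviour, not the code in the repository: run the function, then convert each resulting piece back to the type of the last argument.

```clojure
;; Hypothetical sketch, NOT the multi-same from lib.sfd.same.
(defn multi-same-sketch
  "Apply f to args, then convert every sub-sequence of the result
  to the type of the last arg."
  [f & args]
  (let [template (last args)
        rebuild  (cond
                   (string? template) #(apply str %)
                   (seq? template)    identity
                   (coll? template)   #(into (empty template) %)
                   :else              identity)]
    (map rebuild (apply f args))))

(multi-same-sketch partition 2 "abcd")      ;=> ("ab" "cd")
(multi-same-sketch split-at 2 [1 2 3 4])    ;=> ([1 2] [3 4])
```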

One thing that I find VERY fascinating are the areas where same & multi-same do NOT allow str-utils2 to be replaced out of the box. Some of these can easily be explained. str-utils2/trim is a very string specific piece of code. However, others cannot easily be explained. Why is it that there is no way to split a sequence similar to a regular expression?
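To show the kind of function I mean, here is a sketch of a predicate-based split on plain sequences. The name split-on is hypothetical (it is not claimed to be one of the functions in lib.sfd.seq-utils), and it assumes partition-by is available, which it is in core as of 1.2.

```clojure
;; Hypothetical helper: split coll on runs of elements matching pred,
;; dropping the separators, loosely like splitting a string on a regex.
(defn split-on [pred coll]
  (->> coll
       (partition-by (comp boolean pred))   ; alternate runs of matches / non-matches
       (remove #(pred (first %)))))         ; drop the separator runs

(split-on zero? [1 2 0 3 0 0 4])                         ;=> ((1 2) (3) (4))
(map #(apply str %) (split-on #{\space} "ab  cd"))       ;=> ("ab" "cd")
```

Unlike a regex split, this only looks at one element at a time, so it cannot match a multi-element pattern. That gap is the part a sequence parser would need to fill.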

I think these areas where string processing is easier represent places we need to improve our sequence library. I've included some new functions in lib.sfd.seq-utils, and I would ask this group to consider adding them to c.c.seq-utils or core.

There also isn't a parser that works with predicates & sequences in core yet. I suspect fn-parse may be a start. I'd appreciate help from anyone that is good with parsers/monads.

So, here's a chance to simultaneously reduce the amount of code in contrib and add lots of functionality to Clojure. In summary, here's what I'm proposing:
  1. Add same to core
  2. Add multi-same to core
  3. Add new sequence fns to contrib or core
  4. Add a new sequence parser to contrib or core
There is one more plea I would like to add on a personal note. People are always detracting from Lisp, asking us to provide examples of how it is more productive than other languages. I believe this is an opportunity to build something that replaces several libraries with one function. It would be an example of how Lisp *eliminates the need* for writing code, very concisely.

UPDATE 1/3:  I re-wrote same & multi-same to work with a protocol, per Stuart Sierra's suggestion.
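For readers who have not used protocols yet, a rough sketch of what a protocol-based conversion could look like. The names Rebuildable, convert-back and same-via-protocol are hypothetical; see the repository for the actual code.

```clojure
;; Hypothetical sketch of the protocol approach, NOT the code in lib.sfd.same.
(defprotocol Rebuildable
  (convert-back [template xs]
    "Return xs converted to the same kind of collection as template."))

(extend-protocol Rebuildable
  String
  (convert-back [_ xs] (apply str xs))
  clojure.lang.IPersistentVector
  (convert-back [_ xs] (vec xs))
  clojure.lang.IPersistentMap
  (convert-back [template xs] (into (empty template) xs))
  clojure.lang.IPersistentSet
  (convert-back [template xs] (into (empty template) xs))
  Object                                   ; lists, lazy seqs, anything else
  (convert-back [_ xs] xs))

(defn same-via-protocol [f & args]
  (convert-back (last args) (apply f args)))

(same-via-protocol map inc [1 2 3])   ;=> [2 3 4]
(same-via-protocol reverse "abc")     ;=> "cba"
```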

I look forward to the discussion,
Sean

4 comments:

  1. I’ll have to play with this some more to judge just how useful this is in practice, but from your description and the examples in the test suite I must say that I’m impressed! Anything that results in less code is a win in my book.

    And as a bonus the implementation seems short enough for even me to understand.

    One thing though: the name seems destined to cause confusion. I’m not sure that I can come up with a better alternative (the immediately obvious “unseq” is just plain ugly), but I’d like to suggest “coalesce”.

    Anyway - nice work!

  2. Ian,
    Glad you like it. As always, the name could use some work. I'm open to suggestions.

  3. What about "trans", "carry", or "<--" ?
