citizen428.blog()

Try to learn something about everything

Tom and Michael vs the Monad

So, over the last few weeks Tom Crinson and I read another paper, this time “The essence of functional programming” (available here) by Philip Wadler, one of the principal designers of Haskell. Tom’s post is here.

The paper

Don’t let yourself be fooled by the title: this is not an introduction to functional programming or lambda calculus. Instead it talks about monads, a topic one reads a lot about and that’s often made out to be a lot more complicated than it really is.

The whole paper is 23 pages long, so I won’t really summarize it all, but rather try to give you an idea of whether it might be an interesting read for you. Section 1 lays out some basics and explains the difference between pure and impure code. In section 2 the reader finally encounters the monad, defined using the usual triple (M, unitM, bindM). After that we see several versions of a simple call-by-value interpreter, demonstrating various well-known monads (identity, state, error, list, etc.), as well as a rewrite of the interpreter to implement call-by-name semantics. For some more theory, Wadler lays down the monad laws and provides a definition of monads in terms of mapM and joinM. He also mentions that list comprehension notation generalizes to arbitrary monads, which I found quite interesting. Section 3 introduces continuation-passing style (CPS) and its relation to monads, before section 4 talks about the influence of monads on Haskell, including some interesting tidbits about early Haskell versions that still lacked do-notation and the bind operator (the paper is from 1992 after all). Last but not least there’s the wrap-up, which puts the work in the context of other research.
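
For readers who haven’t seen the paper yet, here’s a rough Haskell sketch of what that triple looks like, using the identity monad (the simplest instance from section 2) to make the signatures concrete; the naming follows the paper, the little example computation is my own:

type I a = a

unitI :: a -> I a
unitI a = a

bindI :: I a -> (a -> I b) -> I b
a `bindI` k = k a

-- A tiny computation chained with bindI; for the identity monad this
-- boils down to ordinary function application.
example :: I Int
example = unitI 5 `bindI` \x -> unitI (x + 1)   -- 6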

Takeaway

Monads are not rocket science; they are just another tool in a programmer’s toolbox, especially for a programmer in a pure functional language. In the end, a monad really isn’t much more than a formalized computation, following certain rules like “I put in an A and get out a B, so whatever comes next has to take a B”. OK, maybe that was a bit oversimplified, but it’s definitely not magic. In fact I wrote up an explanation for Tom, trying to get the basic idea across:

If you imagine a monad as some sort of box, unitM is what puts an item into the box. bindM extracts such a value (M a), processes it with a function (a → M b) and returns another item in a box (M b). Why is that cool? Because you can’t leave the box! If you think about it – and this is an oversimplification – it’s a bit like aligning your types so you can chain method calls (except that unit and bind have to follow the monad laws). Imagine in Ruby you have an intermediary result, but instead of returning a plain value OR an array, you always make sure to return [*values]. This way you can chain anything that expects an array, no matter how many intermediary values there were. That’s a bit like having an item wrapped up in some sort of array monad (once again, we are in oversimplification land here, but maybe the analogy helps).
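
To make the box analogy a little more concrete, here’s a small Haskell sketch (not from the paper, just an illustration) using Maybe as the box: return puts a value into the box, >>= (bind) hands the boxed value to the next step, and the result never leaves the box.

halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- 12 is halved twice; the result stays wrapped in Maybe.
chained :: Maybe Int
chained = return 12 >>= halve >>= halve   -- Just 3

-- One more step would give Nothing (3 is odd), but still inside the box:
-- return 12 >>= halve >>= halve >>= halve   -- Nothing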

Difficulty of the paper

The author states that he assumes no previous knowledge of category theory or Haskell (the language used in the examples), but judging by Tom’s reaction to the paper, I’m not convinced of that. For somebody who has seen Haskell before and/or already has a basic understanding of monads, this is a pretty good and easy-to-follow read though.

Q&A

Since Tom had some trouble with the paper – but bravely fought his way through it – we exchanged quite a few emails about the actual content. This is cool, because it’s exactly the reason we are doing this! First we needed to get some Haskell syntax out of the way:

Let’s do it slow with actual Haskell (this is a stupid function though, you could just do add = (+)):

add :: Int → Int → Int
add x y = x + y

This means that “add” is a function that takes two “Ints” and returns an “Int”. The type after the last → is the actual return type of the function. If you wonder about the intermediate arrows, that’s because Haskell functions technically only take one argument and return a new function, which takes another argument, and so on.

So what happens when you call

add 5 6

is that “add” gets applied to the argument 5, which returns another function roughly equivalent to (add y = 5 + y); applying that function to 6 then yields the number 11. This is a result of Haskell being based on the lambda calculus, where the equivalent expression would be \x.\y.x+y. For practical purposes it’s enough to remember the following though: whatever goes before the last → are the arguments to the function, the thing after the last → is the return type.
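
To see that intermediate function explicitly you can simply give it a name (addFive is just an illustrative name, not something from the paper):

add :: Int -> Int -> Int
add x y = x + y

-- Partially applying add to 5 yields a new one-argument function.
addFive :: Int -> Int
addFive = add 5

eleven :: Int
eleven = addFive 6   -- 11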

What about polymorphic functions? Easy:

add :: (Num a) => a → a → a
add x y = x + y

Here we just say that we take 2 arguments of type “a” and return another “a”, with the constraint that “a” has to be numeric.
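
A quick illustration (again mine, not the paper’s) of what the constraint buys us: the same definition works for any numeric type.

add :: (Num a) => a -> a -> a
add x y = x + y

intSum :: Int
intSum = add 1 2          -- 3

doubleSum :: Double
doubleSum = add 1.5 2.5   -- 4.0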

Since it’s also covered in the paper, here’s an example of a higher-order function, good old map (that’s the same as Ruby’s map):

map :: (a → b) → [a] → [b]
map f [] = []
map f (x:xs) = f x : map f xs

This basically tells us that “map” takes two arguments: the first is itself a one-argument function ((a → b)), the second a list of values ([a]), and we return another list of values ([b]). Note that “a” and “b” can be the same type, but don’t have to be.
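
A quick usage example (mine, not the paper’s) showing both cases – first a and b are the same type, then they differ:

incremented :: [Int]
incremented = map (+ 1) [1, 2, 3]   -- [2, 3, 4]

flags :: [Bool]
flags = map even [1, 2, 3]          -- [False, True, False]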

Could you please, as simply as possible, define a Monad?
I have to quote Brian Beckman on that, “Monads are a theory for interacting things”. Or to paraphrase, Monads are an abstraction that allows you to define how the various stages of a computation interact. It’s a sequence of operations, put into context.
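
A small sketch of what “put into context” means in practice: the same two-step pipeline behaves differently depending on which monad supplies the context. This is purely illustrative and not from the paper.

pipeline :: Monad m => m Int -> (Int -> m Int) -> m Int
pipeline start step = start >>= step >>= step

-- In Maybe, the context means: stop as soon as a step fails.
maybeResult :: Maybe Int
maybeResult = pipeline (Just 3) (\x -> if x < 10 then Just (x * 2) else Nothing)
-- Just 12

-- In the list monad, the context means: explore every combination.
listResult :: [Int]
listResult = pipeline [1, 2] (\x -> [x, x + 10])
-- [1,11,11,21,2,12,12,22]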

What would you use a Monad for?
As seen in the paper, Monads are quite a flexible abstraction and can be used for many different things (take for example the MapReduce Monad). Personally I’ve never used one outside of my limited Haskell experience, but .Net developers may find it interesting to know that LINQ is a monad (more on that here).

Can I use them in Ruby?
You can; in fact there’s a link listed below that shows simple monads implemented in Ruby. It’s more a question of how useful they actually are in a mostly object-oriented language that allows for side effects.

If I wanted to learn more about this sort of thing where should I go?
Brian Beckman recorded an awesome 67-minute video on the topic, which is probably the best introduction I’ve ever seen.

Links

Information Overload 2011-05-15

Information Overload 2011-05-08

So You Want to Learn Clojure?

For last Monday’s Lambdaheads meeting I prepared a list of resources for people interested in learning Clojure, so I thought I’d share it here. The actual presentation had a lot of comments, but I’m afraid I’m too lazy to repeat them in writing. Maybe the list is still useful for some people though:

Books

Books WIP

Online reading

Coding

Screencasts / Videos

Community

Tom and Michael vs Google’s MapReduce

The other day on Twitter I complained about my own laziness when it comes to reading academic papers, and Tom Crinson chimed in to let me know that he’d also like to read more of them. That’s how we decided to set up a sort of online reading group for this type of thing, where we’ll choose a paper to read, discuss it via email and then blog about it (Tom’s post is here).

The first in this series of blog posts is about Google’s MapReduce framework:

MapReduce: Simplified Data Processing on Large Clusters

The paper and a presentation based on the paper

Summary (by Tom Crinson)

This is a paper on an architectural software paradigm and framework that Google uses to process or generate vast amounts of data on commodity hardware.

The basic elements are that each task is split into two stages:

  • Mapping – a user-defined processing function is applied to each piece of the input, converting it into an intermediate key/value format.
  • Reducing – a second stage of processing takes the intermediate output from the mapping stage and applies user-defined aggregation functions to generate the desired output (see the sketch after this list).
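
Here’s a rough, single-machine Haskell sketch of the paper’s word-count example to make the two stages concrete. The function names and the toy “framework” are mine; the real framework of course distributes this work across many machines.

import qualified Data.Map.Strict as Map

-- Map stage: one input record (here a line of text) is turned into a
-- list of intermediate key/value pairs.
mapFn :: String -> [(String, Int)]
mapFn line = [(word, 1) | word <- words line]

-- Reduce stage: all values collected for one intermediate key are aggregated.
reduceFn :: String -> [Int] -> Int
reduceFn _word counts = sum counts

-- Toy driver: group the intermediate pairs by key, then hand each group
-- to the reducer.
mapReduce :: [String] -> Map.Map String Int
mapReduce inputs =
  Map.mapWithKey reduceFn $
    Map.fromListWith (++) [ (k, [v]) | (k, v) <- concatMap mapFn inputs ]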

The paper indicates that many different types of tasks can be split up in this way. Obvious ones for Google are processing pages, indexes, links etc., but people are also using the framework for artificial intelligence tasks. This variety proves the flexibility of the paradigm.

The idea itself is pretty simple, and we do it a lot in day-to-day programming: get a set of data, do some operation on all of the elements and then perform some kind of aggregation – summing, for example. Google just took it one step further, at a huge scale. They talk about setting up clusters of thousands of machines to do the processing. This is where their framework comes in.

The framework takes all of the pain out of distributing the tasks, setting workers onto individual jobs and handling input and output for each. It also takes care of the nitty-gritty of fault tolerance, so for example if a worker dies the master process will notice and assign another worker to re-execute all of the tasks that the dead worker had done. The master will also take care of de-duplicating the jobs so your results are as they should be. There’s also a pretty nice feature for handling “stragglers” (slow tasks): the framework preemptively sets another set of workers to duplicate the effort on the last few jobs, just in case there is an issue with a worker machine that is slowing the process down. This was shown to have a huge effect on task completion time; without the backup processing, completion took something like 44% longer.

In case of master failure the whole process stops, but the master writes periodic “checkpoints” as it goes along, so after a crash a new master can be started up and continue from the last checkpoint. This is a manual process though, as the user should probably check whether any of the input caused the failure.

Another very handy feature is bad record detection. If there is a spurious data element that for some reason causes a worker to hang repeatedly, the master is notified and on the next attempt to process that record set, the bad record is skipped.

There are some other key elements to the system, like splitting the input so it can be assigned to the different workers. This uses a default hash function on the key, but a user-specified function can be supplied if you don’t simply want hash(key) mod numWorkers, as this is quite simplistic. See an article on consistent hashing for alternatives and for why the naive modulo approach can be problematic. The input data tends to consist of plain input files, however an API is exposed to allow custom data providers, which is a nice feature.
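
As a tiny illustration of that default partitioning scheme (my sketch, using the hashable package; the names are not from the paper): an intermediate key is assigned to one of the workers by hashing it and taking the remainder.

import Data.Hashable (hash)

partitionFor :: Int -> String -> Int
partitionFor numWorkers key = hash key `mod` numWorkers
-- e.g. partitionFor 4 "foo" yields a worker index between 0 and 3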

Takeaway

For me the key takeaway was the following sentence: “[it] allows programmers who have no experience with distributed and/or parallel systems to exploit large amounts of resources easily”, which basically means that MapReduce takes an inherently complex task and abstracts all the difficult bits away, so that programmers can focus solely on the data manipulation/transformation aspects.

Difficulty of the paper

Very easy. It’s a simple textual description of MapReduce including good examples. There’s basically no math and the language is easy to understand and not full of computer science jargon, so even less technically oriented people should be able to understand most of it. The presentation also serves as an excellent summary.

Examples and further reading, food for thought

I hacked together a little something in IRB to illustrate the basic workings of the algorithm (it’s not perfect, but good enough for our purposes):

>> words = "foo bar baz qux foo bar lala"
#=> "foo bar baz qux foo bar lala"
>> # map (user-supplied)
..   mapped = words.split(/ /).map { |w| [w,1] }
#=> [["foo", 1], ["bar", 1], ["baz", 1], ["qux", 1], ["foo", 1], ["bar", 1], ["lala", 1]]
>> # aggregate (built-in)
..   aggregated = mapped.group_by { |w, c| w }
#=> {"foo"=>[["foo", 1], ["foo", 1]], "bar"=>[["bar", 1], ["bar", 1]], "baz"=>[["baz", 1]], "qux"=>[["qux", 1]], "lala"=>[["lala", 1]]}
>> # reduce (user-supplied)
>> aggregated.each_with_object({}) { |(k, v), h| h[k] = v.map(&:last).inject(:+) }
#=> {"foo"=>2, "bar"=>2, "baz"=>1, "qux"=>1, "lala"=>1}

When thinking about the algorithm a bit more, it also dawned on me that MapReduce sounds a lot like a Monad. When I started searching for this, it soon turned out that someone else had the same idea:

MapReduce as a monad

Then there’s also a nice article I found on this last week, more or less by accident:

Map / Reduce – A visual explanation

Last but not least some links to MapReduce related frameworks and libraries:

Hadoop
BashReduce
Skynet
Cascalog

Information Overload 2011-05-01

Information Overload 2011-04-24

Ruby Book Recommendations

As Ruby matured and became part of the programming mainstream (at least to a certain extent), publishers got on the bandwagon and now supply us with more and more Ruby books, many of which are really good. Alas, time is limited and most of us won’t manage to read them all, so I thought I’d write up my current recommendations.

From zero to hero

For people new to Ruby, there are three books I consider absolute must-reads:

  • The Well-Grounded Rubyist by David A. Black:
    This book really lives up to its title and gives the reader a solid grounding that will be a great basis for diving deeper into the language (I wrote a longer review here). Alternatively you might want to consider Programming Ruby by Dave Thomas, with Chad Fowler and Andy Hunt, which is also an excellent book and a standard recommendation for Ruby newbies.
  • Eloquent Ruby by Russ Olsen:
    This book is a true gem! Mostly written for people coming to Ruby from other programming languages, Russ Olsen doesn’t dwell on banalities, but instead teaches you good, idiomatic Ruby. The presented material is excellent and the writing is witty and clever. If you are new to programming this is probably not the best book for you, but if you have experience in some other language or “just” want to get more fluent in Ruby, “Eloquent Ruby” is hard to beat.
  • Ruby Best Practices by Gregory T. Brown:
    Once you’ve worked your way through the other two books or have already coded in Ruby for a while, you should definitely check out “Ruby Best Practices”. As the title suggests, this is not about learning Ruby; it’s about getting the most out of the language, for yourself as well as for potential collaborators and colleagues. No matter how long you’ve been doing Ruby, I’m almost certain you’ll still pick up a little trick here and there.

Now obviously you can’t just sit down, read those three books and become a great Rubyist. Start reading, work with the text, use IRB to experiment and find yourself some challenges to try your hands on (e.g. the Ruby Koans or Code Katas).

Advanced Rubyists

  • The Ruby Programming Language by David Flanagan and Yukihiro Matsumoto:
    I guess this wouldn’t really need to be mentioned, but for completeness’ sake I’ll include it here. While I don’t think it’s a great book for learning the language, it’s something every serious Ruby developer should own as a reference.
  • Refactoring: Ruby Edition by Jay Fields:
    Granted, this may not sound exciting, but “Refactoring: Ruby Edition” is a really good book that will come in handy once you have to work on a somewhat significant Ruby codebase. Think of it as design patterns for refactoring.

Of course “Ruby Best Practices” also falls in this category, but I already mentioned it in the first section.

What Ruby books do you recommend and why?

Information Overload 2011-04-17

ClojureX Is Back

Considering how many tools for getting up and running with Clojure are available nowadays (e.g. cljr, cake or Dejour), I’m not sure if anyone still cares, but as of today I’m actively working on ClojureX again.

There’s no particularly good reason for that, but I noticed I’ve been absent from the Clojure community for way too long and would like to change that, so see this as a first step.
