citizen428.blog()

Try to learn something about everything

Information Overload 2010-11-14

Review: Create Your Own Programming Language

Synopsis:
Create Your Own Programming Language by Marc-Andre Cournoyer is a guide on – surprise – creating your own programming language. It consists of a 53-page PDF, exercises and solutions, a toy language written in Ruby and a more full-featured one hosted on the JVM. There’s also a 10-minute screencast on extending the latter, as well as an online community.

The author:
Hailing from Montreal, Marc-Andre is no stranger, especially if you are a Rubyist. Some of his notable projects include the web server Thin, the Ruby interpreter tinyrb and the web site Refactor My Code. He also co-founded a company and is now CTO of another one.

The book:
Obviously 53 pages are not enough to teach you everything about writing your own production-ready programming language, but that’s also not what the author advertises. What you do get is a quite well-made little PDF, explaining all the important components – lexer, parser, interpreter, runtime model, compiler and virtual machine – of a toy language called “Awesome”, which is basically a subset of Ruby with Python-style indentation.

After the introduction, all chapters follow the same basic structure. First comes a brief introduction of the topic at hand, including pointers to well-established projects like Flex, Ragel, Bison or LLVM. After that you’ll find some well-documented and easy-to-understand Ruby code, implementing what you just learned about. Chapters finish with small exercises, e.g. implementing the WhileNode in the interpreter or adding inheritance to the runtime. At the end there are some parting thoughts on topics like homoiconicity and self-hosting, as well as a resources section including pointers to events, forums, blogs and notable languages. If you are a language geek there’s probably not much new here, but it’s a nice addition. Last but not least you’ll find solutions to the exercises. Once you have come this far you can dive into the JVM language and experiment with extending it.

Summary:
Would I recommend this book? That depends on what you are looking for. If you want a detailed academic text, Create Your Own Programming Language is definitely not for you. Likewise, if you have already implemented your own language or have a good idea of what’s involved in doing so, there are better books to spend your money on. If, however, you are looking for a fun, quick intro to the topic, including easy-to-understand and well-documented code as well as the right terms to feed into your preferred search engine, this might be just what you are looking for. You’ll have to decide whether you are willing to spend US$40 on that, which I personally find a bit too steep.

Information Overload 2010-11-07

Viennale 2010

Another Viennale has finished, time for a little recap of the five movies we saw this year:

  • Los labios (Ivan Fund & Santiago Loza)
    This had potential, but didn’t really live up to it. Usually I’m all for mixing professional and amateur actors, but in this case it didn’t work for me. A documentary on this topic would probably have been more interesting.
  • Vincere (Marco Bellocchio)
    Probably my favorite movie of this year’s festival. An interesting – and true – story, great acting and beautiful presentation, what more can you ask for?
  • Un homme qui crie (Mahamat-Saleh Haroun)
    A solid movie about present-day Chad. Told in a very dry manner, I really enjoyed the first half of this, but found it a bit of a drag towards the end. The lead actor was great though!
  • The Oath (Laura Poitras)
    A documentary about two men from Yemen who worked for Osama Bin Laden. I found this very interesting and as always it’s good to hear the other side of a story.
  • Engkwentro (Pepe Diokno)
    The life of two brothers in a ghetto somewhere in the Philippines – violence, drugs and gangs. This seems to have been made with a fairly low budget, and the nervous handheld camera may upset some people. Overall it was pretty interesting though and I’m glad we saw this.

Information Overload 2010-10-31

Information Overload 2010-10-24

Quick and Dirty Simhash in Ruby

Today at work we ended up talking about simhashing (a hash function that generates similar hashes for similar inputs). I found this nice article with a step-by-step explanation of the algorithm, so I wrote a quick Ruby version (needs 1.9):

class String

  def shingles
    self.split(//).each_cons(2).map(&:join).uniq
  end

  def simhash
    v = Array.new(64, 0)
    hashes = shingles.map(&:hash)
    hashes.each do |hash|
      hash.to_s(2).chars.each_with_index do |bit, i|
        bit.to_i & 1 == 1 ? v[i] += 1 : v[i] -= 1
      end
    end
    v.map{ |i| i >= 0 ? 1 : 0 }.join
  end

  def hamming_distance(other)
    other_sh = other.simhash
    self.simhash.chars.each_with_index.inject(0) do |total, (bit, i)|
      total += 1 if bit != other_sh[i]
      total
    end
  end

end

Getting the “features” of a string:

>> "This is a test string".shingles
#=> ["Th", "hi", "is", "s ", " i", " a", "a ", " t", "te", "es", ...]

Simhashing a string:

>> "This is a test string".simhash
#=> "0100110001100001001100010000000010001001001000110110000010101100"

Calculating the Hamming distance between the simhashes of two strings (higher numbers mean less similar, with 64 being the highest possible distance):

>> "This is a test string".hamming_distance("This is another test string")
#=> 8
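
One caveat with the quick version above: `Integer#to_s(2)` drops leading zeros and prefixes negative numbers with a minus sign, so the bit positions of different shingle hashes do not always line up, and `String#hash` values may differ between Ruby processes. Below is a minimal sketch of a fixed-width variant. It is not from the article: the module name `FixedWidthSimhash` is hypothetical, and using the first 8 bytes of an MD5 digest as a stable 64-bit feature hash is my own assumption. It also treats a zero count as a 0 bit, unlike the version above.

```ruby
require 'digest'

# Hypothetical helper module, not part of the original quick version.
module FixedWidthSimhash
  BITS = 64

  # Two-character shingles, the same idea as String#shingles.
  def self.shingles(str)
    str.chars.each_cons(2).map(&:join).uniq
  end

  # Assumption: first 8 bytes of the MD5 digest, read as an unsigned
  # big-endian 64-bit integer, serve as a stable feature hash.
  def self.feature_hash(feature)
    Digest::MD5.digest(feature).unpack1('Q>')
  end

  # Returns the simhash as an Integer in the range 0...(2**64).
  def self.simhash(str)
    v = Array.new(BITS, 0)
    shingles(str).each do |shingle|
      h = feature_hash(shingle)
      # Integer#[] reads bit i, so every hash contributes exactly 64 bits.
      BITS.times { |i| v[i] += h[i] == 1 ? 1 : -1 }
    end
    v.each_with_index.inject(0) do |acc, (count, i)|
      count > 0 ? acc | (1 << i) : acc
    end
  end

  # Hamming distance is the number of set bits in the XOR of both hashes.
  def self.hamming_distance(a, b)
    (simhash(a) ^ simhash(b)).to_s(2).count('1')
  end
end

puts FixedWidthSimhash.hamming_distance("This is a test string",
                                        "This is another test string")
```

Working on integers instead of bit strings keeps every hash at exactly 64 bits and reduces the distance to an XOR plus a bit count. The resulting numbers will not match the outputs shown above, since the feature hash is different.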

Information Overload 2010-10-17

Information Overload 2010-10-09

Ruby and HTTParty: GitHub Issues to CSV

To fill our backlog at work, our project manager needed a list of our current GitHub issues, ideally in a spreadsheet format. A couple of lines of Ruby produce semicolon-separated CSV output consisting of issue title, creation date, reporter and labels, which can then be redirected to a file and opened with any spreadsheet program.

require 'httparty'

class GitHubIssues
  include HTTParty
  base_uri 'http://github.com/api/v2/yaml'

  def self.show
    # username must be of the form '<username>/token:<token>'
    opts = {:basic_auth => {:username => ''}}
    self.get('/issues/list/<user>/<projects>/open', opts)["issues"].each do |issue|
      puts "#{issue["title"]};#{issue["created_at"]};#{issue["user"]};#{issue["labels"].join(',')}"
    end
  end
end

GitHubIssues.show

It’s quick and dirty (e.g. will die if there are no issues), but does exactly what we needed in just a couple of lines thanks to the awesome HTTParty.
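
If the dirtiness ever becomes a problem, a slightly more defensive version of the output step could look like the sketch below. It is only an illustration under a few assumptions: the `issues_to_csv` helper and the sample data are hypothetical (no API call is made), and it uses Ruby's standard `csv` library so that titles containing a semicolon are quoted instead of breaking the columns.

```ruby
require 'csv'

# Hypothetical sample data with the same fields the script prints;
# real data would come from the API response.
issues = [
  { "title" => "Login fails; shows blank page", "created_at" => "2010-10-01",
    "user" => "alice", "labels" => ["bug", "ui"] },
  { "title" => "Add export feature", "created_at" => "2010-10-05",
    "user" => "bob", "labels" => [] }
]

# Hypothetical helper: returns one semicolon-separated line per issue.
def issues_to_csv(issues)
  # `|| []` avoids dying when the issue list is missing (nil).
  (issues || []).map do |issue|
    row = [issue["title"], issue["created_at"], issue["user"],
           issue["labels"].join(',')]
    # CSV.generate_line quotes fields that contain the separator.
    CSV.generate_line(row, col_sep: ';')
  end.join
end

puts issues_to_csv(issues)
```

Keeping the formatting in a method that returns a string makes it easy to redirect the output to a file as before, or to check it without touching the network.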

This has also been published in a slightly different form on our company blog.

Copyright © 2016 - Michael Kohl - Powered by Octopress