un pitone a san luca2014-07-03T03:14:39-07:00http://EnricoGiampieri.github.comStatus reports2013-04-16T07:39:00-07:00http://EnricoGiampieri.github.com/introduction/2013/04/16/status-report<p>First of all, I’m finally moving to python 3.3! As I pointed out in my previous post, <a href="http://enricogiampieri.github.io//introduction/2012/12/10/moving-to-python3-for-numerical-computation/index.html">Moving to python 3k for numerical computation</a>, the situation is still far from perfect, but in the last few months it got better: still no good response from mayavi and tables (and that’s a real shame), but both scikit-learn and biopython (even if only when installing from source) earned the golden status of py3k ready. Given that I rarely use tables or mayavi, almost 100% of my workflow is ready for the transition. The reason for the transition is quite simple: I like new shiny toys :) Aside from that, in the last year I’ve been bitten frequently by the unicode handling in python 2.7, and several times I’ve strongly wished to move to a better behaved language like python 3. The python team did a great job, and the result is a language that has a cleaner and more logical structure. Why should I stick to the less than optimal python 2? I don’t particularly love giving myself trouble for free, and I still have very few strings attached, so my transition can be carefree. The only thing that held me back was the state of python 3 support in my everyday packages. To keep track of how the support of your preferred packages evolves, two good points of reference are <a href="http://python3wos.appspot.com/">python 3 <s>wall of shame</s> superpowers</a> and the official page tracking python 3 support among the top packages, <a href="http://py3ksupport.appspot.com/">http://py3ksupport.appspot.com/</a>. And do me a favor: start writing python 3 compatible scripts with proper use of the <code>__future__</code> statement and the <a href="http://pythonhosted.org/six/">six</a> library!</p>
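<p>As a minimal sketch of what such a script can look like, the snippet below relies only on the <code>__future__</code> imports from the standard library (six is left out so that nothing needs installing). The function name <code>describe</code> and the sample data are made up for illustration.</p>

```python
# Intended to run unchanged on Python 2.7 and Python 3.x.
from __future__ import absolute_import, division, print_function, unicode_literals

import sys


def describe(values):
    """Return a short text summary of a sequence of numbers (hypothetical helper)."""
    # With "division" imported, / is true division on both interpreters.
    mean = sum(values) / len(values)
    return "n={0} mean={1:.2f}".format(len(values), mean)


summary = describe([1, 2, 4])
# With "print_function" imported, print is a function on both interpreters.
print(summary, file=sys.stdout)
```

<p>The two imports that matter most in numerical work are probably <code>division</code> and <code>print_function</code>, since integer division and the print statement are the differences one hits first.</p>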
<p>Aside from that, I spent a lot of time working with the guys of the <a href="http://statsmodels.sourceforge.net/">statsmodels</a> project. I got a pull request accepted that implements the mosaic plot, and two more are awaiting a response: one is a poor man’s implementation of the facet plot and one is targeted at microarray and pathway analysis. I have to admit that this has been a TREMENDOUS experience. I learned a lot from them, first of all the huge gap that exists between writing code that does something and writing code that lets others do the same thing. Code readability, good docstrings, package organization and a lot of new, fun things to do. It has also been a good excuse to get more confident with the git workflow, which I have always wanted to learn but always postponed.</p>
<p>I also tried to post a package on PyPI, the central python package repository. The package is named <a href="https://pypi.python.org/pypi/keggrest/0.1.1">keggrest</a>, and it’s a basic implementation of the REST API of the KEGG biological database. It’s a shame, as it has close to no documentation and no support for python 3. Talk about throwing the first stone :D. In the next few days I will give it some love to turn it into the kind of package that I would expect to use.</p>
<p>See you soon with more updates and material!</p>
Blogging with the ipython notebook and the github/nbviewer combo2013-01-22T15:32:00-08:00http://EnricoGiampieri.github.com/introduction/2013/01/22/blogging-with-ipython-notebook-and-git-github-combination<p>As you can see I’ve just changed the whole look of the blog.
This is a response to a couple of design needs. The lesser one is that all the plots that I will create will have a white background, and against a dark background that is a pain for the eyes. The greater one is that Blogger sucks for posting code and images. I spend half of my time checking the font and color of the code, and every image requires saving it to disk. I can’t even show any more complicated code, as the results would have to be reformatted by hand. And, to be honest, I always dreamed of simply blogging my ipython notebook scripts.</p>
<p>So, thanks to <a href="http://brianegranger.com/">Brian E. Granger</a>, who wrote a simple method to <a href="http://brianegranger.com/?p=215">post an ipython notebook as a frame</a> in the post, I can have my cake and eat it too!
The method is as simple as adding an iframe tag (with the correct dimensions set by hand, but that’s a minor flaw) to the HTML code of the post, and voilà!</p>
<p><code><iframe src="http://nbviewer.ipython.org/3835181/" width="800" height="1500"></iframe></code></p>
<p>Leveraging the magic of the <a href="http://nbviewer.ipython.org/">nbviewer</a> server (which is an amazing service, by the way) I can now write a wonderfully formatted notebook, with code and formulas and plots and everything, and just hand it to you. What I’m going to do is set up a <a href="https://github.com/">git repository</a>, create my notebooks in there and link them with nbviewer in here. By the way, the notebook linked above is the wonderful <a href="http://jakevdp.github.com/blog/2012/10/07/xkcd-style-plots-in-matplotlib/">XKCD plot style</a> created by <a href="http://jakevdp.github.com/">Jake Vanderplas</a>, a great blogger and python developer.</p>
<p>I will try to explain how I’m going to set up and manage the repository.
First of all I create a new repository on GitHub called blogger_notebook.
Once a github account is set up (which is really easy), the next step is to create the local directory that will host my material.</p>
<div class="highlight"><pre><code class="bash">mkdir blogger_code
<span class="nb">cd </span>blogger_code/
</code></pre></div>
<p>GitHub gives some useful information on how to create a new repository. First of all, we create it by writing</p>
<div class="highlight"><pre><code class="bash">git init
</code></pre></div>
<p>This tells git that this directory holds a git repository and that it should keep the version history of its content. Now I tell it the location of the online repository:</p>
<div class="highlight"><pre><code class="bash">git remote add origin https://github.com/EnricoGiampieri/blogger_notebook.git
</code></pre></div>
<p>Ok, we are close to the goal. I copy in the notebook that will be my next post, basemap.ipynb. Now I need to tell git to track it:</p>
<div class="highlight"><pre><code class="bash">git add basemap.ipynb
</code></pre></div>
<p>Now every time I make a modification to this file that I want to remember, I can save it with the commit command. I also need to add a description of the modification:</p>
<div class="highlight"><pre><code class="bash">git commit basemap.ipynb -m <span class="s2">"creation of an example of basemap usage"</span>
</code></pre></div>
<p>Lastly, to keep the online repository up to date, I push my commits to GitHub. This will ask for my username and password and will upload all the modifications to the online repository.</p>
<div class="highlight"><pre><code class="bash">git push -u origin master
</code></pre></div>
<p>You can see the results here:</p>
<div class="highlight"><pre><code class="bash">https://github.com/EnricoGiampieri/blogger_notebook
</code></pre></div>
<p>Now, the last step is to create an nbviewer link to the notebook. Take the link to the raw file (you can obtain it by opening the file on GitHub and looking for the RAW button) and give it to the nbviewer main page. It will give you a nice link to the notebook with all its content:</p>
<p><a href="http://nbviewer.ipython.org/urls/raw.github.com/EnricoGiampieri/blogger_notebook/master/basemap.ipynb">http://nbviewer.ipython.org/urls/raw.github.com/EnricoGiampieri/blogger_notebook/master/basemap.ipynb</a></p>
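<p>Judging only from the single link above, the nbviewer address appears to be the raw GitHub address with a fixed prefix in front. The helper below is a small sketch built on that assumption; the function name <code>nbviewer_url</code> and its parameters are mine, and the service may well accept or require other URL forms.</p>

```python
def nbviewer_url(user, repo, path, branch="master"):
    """Build an nbviewer link for a notebook hosted on GitHub.

    Assumes the URL pattern visible in the link shown in this post;
    this is not taken from any nbviewer documentation.
    """
    raw = "raw.github.com/{0}/{1}/{2}/{3}".format(user, repo, branch, path)
    return "http://nbviewer.ipython.org/urls/" + raw


url = nbviewer_url("EnricoGiampieri", "blogger_notebook", "basemap.ipynb")
print(url)
```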
<p>Obviously this barely scratches the surface of the (super)power of git, but there are tons of manuals online that explain it better than I ever could. This was just a step by step guide on how to set up this “delayed blogging” method.</p>
Sympy for statistics2013-01-12T16:55:00-08:00http://EnricoGiampieri.github.com/introduction/2013/01/12/sympy-for-statistics<p>One of the python modules about which I have the most mixed feelings is without any doubt sympy.</p>
<p>Sympy is a great piece of software that can deal with a huge number of problems in a quite elegant way, and I would really like to use it more in my work. The main drawback was its very poor support for statistics, and doing all those integrals by hand felt a little odd.</p>
<p>It was with a lot of happiness that I read about the development of a new module for statistics in sympy, called sympy.stats, that promised to address all (or at least most) of the needs that someone can have working out statistical problems.</p>
<p>The foundation for this module was put into place by <a href="http://matthewrocklin.com/">Matthew Rocklin</a> in the summer of 2012. He did a good job, and the module has since been extended to support a great number of probability distributions, both continuous and finite. There is no support yet for infinite discrete spaces like the natural numbers, which means that a few very important distributions like the Poisson or the Negative Binomial are still left out, but the overall feeling is very good.</p>
<p>The library is based on the idea of a random variable, which defines a probability measure over a certain domain. For example, a normal variable is defined over the whole real axis and implements the gaussian probability density.</p>
<p>A selected set of operations can be performed on these random variables, notably obtaining the density, the probability of an event or the expectation value.</p>
<p>But let the code speak.
Let’s import sympy and sympy.stats, and create a Normal variable with a fixed variance and a mean represented by a sympy symbol. Remember that any time we create a new sympy symbol we have to declare a name for it. In this case our normal variable will be called simply <code>X</code>.</p>
<div class="highlight"><pre><code class="python"><span class="kn">import</span> <span class="nn">sympy</span>
<span class="kn">import</span> <span class="nn">sympy.stats</span> <span class="kn">as</span> <span class="nn">stats</span>
<span class="n">mu</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">Symbol</span><span class="p">(</span><span class="s">'mu'</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">'X'</span><span class="p">,</span><span class="n">mu</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div>
<p>We can now ask for the mean and variance of our random variable:</p>
<div class="highlight"><pre><code class="python"><span class="k">print</span> <span class="n">sympy</span><span class="o">.</span><span class="n">simplify</span><span class="p">(</span><span class="n">stats</span><span class="o">.</span><span class="n">E</span><span class="p">(</span><span class="n">X</span><span class="p">))</span>
<span class="k">print</span> <span class="n">sympy</span><span class="o">.</span><span class="n">simplify</span><span class="p">(</span><span class="n">stats</span><span class="o">.</span><span class="n">variance</span><span class="p">(</span><span class="n">X</span><span class="p">))</span>
</code></pre></div>
<p>which return, as expected, mu and 1.
We can also create new random expressions based on the original ones.
We know for example that a chi-squared variable with N degrees of freedom is the sum of the squares of N independent standard normal variables, so we can obtain the mean and variance of a chi-squared distribution with 2 degrees of freedom simply by summing the squares of two standard normal variables:</p>
<div class="highlight"><pre><code class="python"><span class="n">X</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">'X'</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Y</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">'Y'</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Chi</span> <span class="o">=</span> <span class="n">X</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">Y</span><span class="o">**</span><span class="mi">2</span>
<span class="n">stats</span><span class="o">.</span><span class="n">E</span><span class="p">(</span><span class="n">Chi</span><span class="p">)</span>
<span class="c"># 2</span>
<span class="n">stats</span><span class="o">.</span><span class="n">variance</span><span class="p">(</span><span class="n">Chi</span><span class="p">)</span>
<span class="c"># 4</span>
</code></pre></div>
<p>which are exactly the values we were expecting (see <a href="http://en.wikipedia.org/wiki/Chi-squared_distribution">Chi Squared distribution</a>).
We can sample our expression with sample or sample_iter, and we can look at the resulting distribution:</p>
<div class="highlight"><pre><code class="python"><span class="n">samples</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">stats</span><span class="o">.</span><span class="n">sample_iter</span><span class="p">(</span><span class="n">Chi</span><span class="p">,</span> <span class="n">numsamples</span><span class="o">=</span><span class="mf">1e4</span><span class="p">))</span>
</code></pre></div>
<p>We can plot the histogram with pylab as simply as:</p>
<div class="highlight"><pre><code class="python"><span class="kn">import</span> <span class="nn">pylab</span>
<span class="n">pylab</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">samples</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
<span class="n">pylab</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img src="http://enricogiampieri.github.io//assets/chi_squared_hist.png" alt="histogram_of_values" /></p>
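<p>As a cross-check that does not depend on sympy at all, the same mean and variance can be estimated with the standard library alone. This is only a sketch: the seed and the sample size are arbitrary choices of mine, and the estimates will be close to 2 and 4 rather than exactly equal to them.</p>

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so that the run is repeatable
n = 100000              # arbitrary sample size for this sketch

# Each sample is the sum of the squares of two independent standard normals.
samples = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(n)]

mean = statistics.mean(samples)
var = statistics.pvariance(samples, mean)
print(round(mean, 2), round(var, 2))
```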
<p>We can also evaluate the conditional probability of events, but for continuous variables this leads to some heavy integrals, so I will demonstrate it using the simpler Die class, which represents the roll of a fair n-sided die.</p>
<div class="highlight"><pre><code class="python"><span class="n">X</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">Die</span><span class="p">(</span><span class="s">'X'</span><span class="p">,</span><span class="mi">6</span><span class="p">)</span>
<span class="n">Y</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">Die</span><span class="p">(</span><span class="s">'Y'</span><span class="p">,</span><span class="mi">6</span><span class="p">)</span>
<span class="n">W</span> <span class="o">=</span> <span class="n">stats</span><span class="o">.</span><span class="n">Die</span><span class="p">(</span><span class="s">'W'</span><span class="p">,</span><span class="mi">6</span><span class="p">)</span>
<span class="n">Z</span> <span class="o">=</span> <span class="n">X</span> <span class="o">+</span> <span class="n">Y</span> <span class="o">+</span> <span class="n">W</span>
</code></pre></div>
<p>We can ask for the probability that a realization of X is greater than 4:</p>
<div class="highlight"><pre><code class="python"><span class="n">stats</span><span class="o">.</span><span class="n">P</span><span class="p">(</span><span class="n">X</span><span class="o">></span><span class="mi">4</span><span class="p">)</span>
<span class="c"># 1/3</span>
</code></pre></div>
<p>or that it equals a certain value, say 3 (the ugly syntax cannot be avoided due to how the equality test is evaluated):</p>
<div class="highlight"><pre><code class="python"><span class="n">stats</span><span class="o">.</span><span class="n">P</span><span class="p">(</span><span class="n">sympy</span><span class="o">.</span><span class="n">Eq</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="mi">3</span><span class="p">))</span>
<span class="c"># 1/6 </span>
</code></pre></div>
<p>We can also ask for the probability that the sum Z of the three dice is more than 10, given that the first die rolled a 4:</p>
<div class="highlight"><pre><code class="python"><span class="n">stats</span><span class="o">.</span><span class="n">P</span><span class="p">(</span><span class="n">Z</span><span class="o">></span><span class="mi">10</span><span class="p">,</span> <span class="n">sympy</span><span class="o">.</span><span class="n">Eq</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="mi">4</span><span class="p">))</span>
</code></pre></div>
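<p>Since three dice have only 216 equally likely outcomes, these probabilities can also be checked by brute-force counting with the standard library. The sketch below does not use sympy; the helper <code>prob</code> is a name I made up for the occasion.</p>

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Every outcome is a tuple (x, y, w); all 216 of them are equally likely.
outcomes = list(product(faces, repeat=3))


def prob(event, given=lambda o: True):
    """P(event | given), obtained by counting equally likely outcomes."""
    pool = [o for o in outcomes if given(o)]
    hits = sum(1 for o in pool if event(o))
    return Fraction(hits, len(pool))


p_x_gt_4 = prob(lambda o: o[0] > 4)
p_x_eq_3 = prob(lambda o: o[0] == 3)
p_z_gt_10_given_x_4 = prob(lambda o: sum(o) > 10, given=lambda o: o[0] == 4)
print(p_x_gt_4, p_x_eq_3, p_z_gt_10_given_x_4)  # prints 1/3 1/6 7/12
```

<p>The conditional case reduces to counting the pairs of the remaining two dice whose sum is at least 7, which are 21 out of 36.</p>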
<p>So, summing up, the stats module of sympy is really promising and I hope that a lot of work will be done on it to make it even better. If I manage to understand the sympy development process and the module class hierarchy, I will surely try to make a contribution.
Despite all this praise, for my needs it still lacks several fundamental features:</p>
<ul>
<li>support for unbounded discrete spaces</li>
<li>better support for mixtures of distributions (right now I still get only errors complaining about the invertibility of the CDF)</li>
<li>better fall-back to numerical evaluation, as a lot of distributions are described by integrals and special functions and, even if the integration routine of sympy is pretty solid, not everything can be solved analytically</li>
</ul>
<p>My best wishes to the sympy team, thank you for your great job! </p>
Moving to python 3k for numerical computation2012-12-10T09:33:00-08:00http://EnricoGiampieri.github.com/introduction/2012/12/10/moving-to-python3-for-numerical-computation<p>In the last few years of working with python, I’ve always suffered from being held back on the 2.x version of python by the needs of the scientific libraries. The good news is that in the last year most of them took the big step and shipped a 3.x ready version (or, to be honest, a 2.x version that can be converted to 3.x).</p>
<p>So right now I’m having fun trying to install everything on my laptop, an Ubuntu 12.10.</p>
<p>The first step is to install python 3.2 and the appropriate version of the pip packaging system:</p>
<pre><code>sudo apt-get install python3 python3-pip
</code></pre>
<p>Then we can follow the normal installation process using pip-3.2:</p>
<pre><code>sudo pip-3.2 install -U numpy
sudo pip-3.2 install -U scipy
sudo pip-3.2 install -U matplotlib
sudo pip-3.2 install -U sympy
sudo pip-3.2 install -U pandas
sudo pip-3.2 install -U ipython
sudo pip-3.2 install -U nose
sudo pip-3.2 install -U networkx
sudo pip-3.2 install -U statsmodels
sudo pip-3.2 install -U cython
</code></pre>
<p>Sadly mayavi, scikit-learn, numexpr, biopython and tables are still working on the transition, so they’re not yet available. This leaves the numerical side of python quite crippled, but I hope that they will soon catch up with the others and allow us to use py3k like the rest of the world out there. </p>
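<p>To see at a glance which packages the python 3 interpreter can actually find after the installation, a short script like the one below may help. It is a sketch that uses only the standard library; the helper name <code>installed</code> is mine, and note that the names checked are import names, which differ from the package names in some cases (<code>sklearn</code> for scikit-learn, <code>Bio</code> for biopython).</p>

```python
import importlib.util


def installed(names):
    """Map each top-level import name to True if the interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


wanted = ["numpy", "scipy", "matplotlib", "sympy", "pandas",
          "networkx", "statsmodels", "sklearn", "Bio", "tables"]
for name, found in sorted(installed(wanted).items()):
    print("{0:12s} {1}".format(name, "ok" if found else "missing"))
```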
write controlled class attribute with descriptors2012-12-02T07:59:00-08:00http://EnricoGiampieri.github.com/introduction/2012/12/02/write-controlled-class-attribute-with-descriptors<p>A few days ago I had the occasion to play around with the descriptor syntax of python. The usual question, “What is a descriptor?”, is always answered with a huge wall of text, but in reality descriptors are quite a simple concept: they are a generalization of properties.</p>
<p>For those not familiar with the concept of properties, they are a trick to call a function with the same syntax as an attribute access. If <code>prop</code> is a property, you can write this assignment as for a normal attribute:</p>
<div class="highlight"><pre><code class="python"><span class="n">A</span><span class="o">.</span><span class="n">prop</span> <span class="o">=</span> <span class="n">value</span>
</code></pre></div>
<p>But the property will allow you to perform checks and similar operations on the value before the real assignment.
The basic syntax starts from normal get/set methods (never use them unless you plan to work with a property!), but then you add a new element to the class that puts these two functions together under the name x:</p>
<div class="highlight"><pre><code class="python"><span class="k">class</span> <span class="nc">A</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_x</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_x</span>
    <span class="k">def</span> <span class="nf">set_x</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="k">print</span> <span class="s">"set the value of x to"</span><span class="p">,</span><span class="n">value</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hidden_x</span> <span class="o">=</span> <span class="n">value</span>
    <span class="n">x</span> <span class="o">=</span> <span class="nb">property</span><span class="p">(</span><span class="n">get_x</span><span class="p">,</span><span class="n">set_x</span><span class="p">)</span>
</code></pre></div>
<p>You can now use it as a normal class attribute, but when you assign a value to it, it will react with the setter function.</p>
<div class="highlight"><pre><code class="python"><span class="n">a</span> <span class="o">=</span> <span class="n">A</span><span class="p">()</span>
<span class="n">a</span><span class="o">.</span><span class="n">x</span> <span class="o">=</span> <span class="mi">5</span>
<span class="c">#set the value of x to 5</span>
<span class="k">print</span> <span class="n">a</span><span class="o">.</span><span class="n">x</span>
</code></pre></div>
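<p>This opens a new world of possible interactions between the class and the user, with a very simple syntax. The only limit is that any extra information has to be stored in the class itself, while sometimes it can be useful to keep it separate. It can also become very verbose, which is something that is frowned upon when programming in python (Python is not Java, remember).</p>
<p>If we have to create several attributes which behave in a similar way, repeating the same code for each property can be quite a hassle. That’s where descriptors start to become precious (yes, they can do a lot more, but I don’t have great requirements).</p>
<p>A descriptor is a class which implements the method <code>__get__</code> and, optionally, the methods <code>__set__</code> and <code>__delete__</code>. These are the methods that will be called when you read, assign or delete an attribute created with the descriptor.</p>
<p>Let’s see a basic implementation of a constant attribute, i.e. an attribute that is fixed in the class and cannot be modified. To do this we need to implement the <code>__get__</code> method to return the value, and the <code>__set__</code> method to raise an error if one tries to modify it. To avoid possible modification, the actual value is stored inside the Descriptor itself (via the self reference). To interact with the object that owns the Descriptor we can use the instance reference.</p>
<div class="highlight"><pre><code class="python"><span class="k">class</span> <span class="nc">ConstantAttr</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">value</span><span class="p">,</span><span class="n">name</span><span class="o">=</span><span class="s">""</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="o">=</span><span class="n">value</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="o">=</span><span class="n">name</span>
    <span class="k">def</span> <span class="nf">__get__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instance</span><span class="p">,</span> <span class="nb">type</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span>
    <span class="k">def</span> <span class="nf">__set__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">instance</span><span class="p">,</span><span class="n">value</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">AttributeError</span><span class="p">(</span><span class="s">'the attribute {} cannot be written'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">))</span>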
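<p>Part of the verbosity can be trimmed with the decorator form of <code>property</code>, which is standard python. The sketch below is written for python 3 and adds a simple check in the setter; the backing name <code>_x</code> and the non-negativity rule are only examples of mine.</p>

```python
class A:
    """Same idea as the class above, written with the decorator form of property."""

    def __init__(self):
        self._x = None  # backing storage; the name is chosen for this sketch

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        # Any check can go here, before the real assignment takes place.
        if value < 0:
            raise ValueError("x must be non-negative")
        self._x = value


a = A()
a.x = 5
try:
    a.x = -1
except ValueError as exc:
    rejected = str(exc)
print(a.x, rejected)
```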
<div class="highlight"><pre><code class="python"><span class="k">class</span> <span class="nc">ConstantAttr</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">value</span><span class="p">,</span><span class="n">name</span><span class="o">=</span><span class="s">""</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="o">=</span><span class="n">value</span>
<span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="o">=</span><span class="n">name</span>
<span class="k">def</span> <span class="nf">__get__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instance</span><span class="p">,</span> <span class="nb">type</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span>
<span class="k">def</span> <span class="nf">__set__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">instance</span><span class="p">,</span><span class="n">value</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">AttributeError</span><span class="p">(</span><span class="s">'the attribute {} cannot be written'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">))</span>
</code></pre></div>
<p>We can now create a class that uses this descriptor. We pass the name of the attribute to <code>__init__</code>, otherwise the Descriptor would have no information about the name under which the class has registered it.</p>
<div class="highlight"><pre><code class="python"><span class="k">class</span> <span class="nc">A</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">ConstantAttr</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="s">'c'</span><span class="p">)</span>
</code></pre></div>
<p>Using an instance of the class we can see that the value is printed correctly as 10, but if we try to modify it, we obtain an exception.</p>
<div class="highlight"><pre><code class="python"><span class="n">a</span> <span class="o">=</span> <span class="n">A</span><span class="p">()</span>
<span class="k">print</span> <span class="n">a</span><span class="o">.</span><span class="n">c</span>
<span class="c">#10</span>
<span class="n">a</span><span class="o">.</span><span class="n">c</span> <span class="o">=</span> <span class="mi">5</span>
<span class="c">#raise AttributeError: the attribute c cannot be written</span>
</code></pre></div>
<p>Now we can create as many constant attributes as we need with almost no code duplication at all! That’s a good start.</p>
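<p>On python 3.6 and later the name does not even need to be passed by hand: when the class body is executed, python calls the optional <code>__set_name__</code> method of each descriptor with the name it was assigned to. The variant below is a sketch of the same constant attribute using that hook; it will not work on the python 2 interpreter used in the rest of this post.</p>

```python
class ConstantAttr:
    """Read-only class attribute that learns its own name (Python 3.6+)."""

    def __init__(self, value):
        self.value = value
        self.name = None

    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created.
        self.name = name

    def __get__(self, instance, owner=None):
        return self.value

    def __set__(self, instance, value):
        raise AttributeError(
            "the attribute {} cannot be written".format(self.name))


class A:
    c = ConstantAttr(10)


a = A()
try:
    a.c = 5
except AttributeError as exc:
    message = str(exc)
print(a.c, message)
```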
<p>The reason I started playing around with descriptors was a little more complicated. I needed a set of attributes that run a validity test on the assigned value, raising an error if the test fails. You can do this with properties, but you can’t use a raise statement in a lambda, which forces you to write a lot of different setters, polluting the class source code and <code>__dict__</code> with a lot of functions. To remove the pollution from the dict you can always delete the functions you used to create the property:</p>
<div class="highlight"><pre><code class="python"><span class="k">class</span> <span class="nc">A</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_x</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_x</span>
    <span class="k">def</span> <span class="nf">set_x</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="c">#do the check</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hidden_x</span> <span class="o">=</span> <span class="n">value</span>
    <span class="n">x</span> <span class="o">=</span> <span class="nb">property</span><span class="p">(</span><span class="n">get_x</span><span class="p">,</span><span class="n">set_x</span><span class="p">)</span>
    <span class="k">del</span> <span class="n">get_x</span><span class="p">,</span><span class="n">set_x</span>
</code></pre></div>
<p>This could work, but you still have 7 or more lines to define something that is no more than a lambda with an error message attached.</p>
<p>So, here comes the Descriptor. To keep the pollution to a minimum, I store all the protected values in an internal dictionary of the instance called props.
What this code does is take a test function that checks whether the given value is acceptable, then store the value if it passes or raise the given error if it does not.</p>
<div class="highlight"><pre><code class="python"><span class="k">class</span> <span class="nc">CheckAttr</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="sd">"""create a class attribute which only check and transform an attribute on setting its value"""</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">default</span><span class="p">,</span> <span class="n">test</span><span class="o">=</span><span class="k">lambda</span> <span class="n">i</span><span class="p">,</span><span class="n">v</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span> <span class="n">error</span><span class="o">=</span><span class="s">'test failed'</span><span class="p">,</span> <span class="n">converter</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span><span class="n">v</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">test</span> <span class="o">=</span> <span class="n">test</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">error</span> <span class="o">=</span> <span class="n">error</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">default</span> <span class="o">=</span> <span class="n">default</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">conv</span> <span class="o">=</span> <span class="n">converter</span>
    <span class="k">def</span> <span class="nf">checkprops</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">instance</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">instance</span><span class="o">.</span><span class="n">props</span>
        <span class="k">except</span> <span class="ne">AttributeError</span><span class="p">:</span>
            <span class="n">instance</span><span class="o">.</span><span class="n">props</span><span class="o">=</span><span class="p">{}</span>
    <span class="k">def</span> <span class="nf">__get__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instance</span><span class="p">,</span> <span class="nb">type</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">checkprops</span><span class="p">(</span><span class="n">instance</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">instance</span><span class="o">.</span><span class="n">props</span><span class="o">.</span><span class="n">setdefault</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span><span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">__set__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instance</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="n">val</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">conv</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">checkprops</span><span class="p">(</span><span class="n">instance</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">test</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">val</span><span class="p">):</span>
            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">)</span>
        <span class="n">instance</span><span class="o">.</span><span class="n">props</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">]</span><span class="o">=</span><span class="n">val</span>
</code></pre></div>
<p>Now it’s time to test it on a simple real case. We want to describe a rectangle, so we have two dimensions, height and width, and we need an attribute to return the area of the rectangle itself.</p>
<div class="highlight"><pre><code class="python"><span class="k">class</span> <span class="nc">Rect</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">CheckAttr</span><span class="p">(</span><span class="s">'h'</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">,</span><span class="n">v</span><span class="p">:</span> <span class="n">v</span><span class="o">>=</span> <span class="mf">0.0</span><span class="p">,</span> <span class="s">'height must be greater than 0'</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span><span class="nb">float</span><span class="p">(</span><span class="n">v</span><span class="p">))</span>
    <span class="n">w</span> <span class="o">=</span> <span class="n">CheckAttr</span><span class="p">(</span><span class="s">'w'</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">,</span><span class="n">v</span><span class="p">:</span> <span class="n">v</span><span class="o">>=</span> <span class="mf">0.0</span><span class="p">,</span> <span class="s">'width must be greater than 0'</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span><span class="nb">float</span><span class="p">(</span><span class="n">v</span><span class="p">))</span>
    <span class="n">area</span> <span class="o">=</span> <span class="nb">property</span><span class="p">(</span><span class="k">lambda</span> <span class="bp">self</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">h</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">w</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">h</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span><span class="n">w</span><span class="o">=</span><span class="mf">0.0</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">w</span> <span class="o">=</span> <span class="n">w</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">h</span> <span class="o">=</span> <span class="n">h</span>
</code></pre></div>
<p>Annnnnd…That’s it. With this descriptor we imposed the condition that both the width and the height cannot be negative, and obtained an <code>area</code> attribute that returns the value without allowing it to be set, in only 7 lines of code. Talk about synthesis!</p>
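<p>As a quick check of this behaviour, here is a self-contained sketch. The argument order of <code>CheckAttr</code> (name, default, test, error, converter) is an assumption inferred from the calls in <code>Rect</code>, and the <code>_props</code> helper and the error messages are simplifications made up for this example.</p>

```python
class CheckAttr(object):
    """Validating descriptor; argument order assumed from the Rect calls."""

    def __init__(self, name, default, test, error, converter):
        self.name = name
        self.default = default
        self.test = test
        self.error = error
        self.conv = converter

    def _props(self, instance):
        # hypothetical helper: create the per-instance storage on first use
        try:
            return instance.props
        except AttributeError:
            instance.props = {}
            return instance.props

    def __get__(self, instance, owner):
        return self._props(instance).setdefault(self.name, self.default)

    def __set__(self, instance, value):
        val = self.conv(value)
        if not self.test(instance, val):
            raise ValueError(self.error)
        self._props(instance)[self.name] = val


class Rect(object):
    h = CheckAttr('h', 0.0, lambda i, v: v >= 0.0, 'height cannot be negative', float)
    w = CheckAttr('w', 0.0, lambda i, v: v >= 0.0, 'width cannot be negative', float)
    area = property(lambda self: self.h * self.w)

    def __init__(self, h=0.0, w=0.0):
        self.w = w
        self.h = h


r = Rect(2, 3)
print(r.area)          # 6.0

try:
    r.h = -1           # rejected by the test function
except ValueError as err:
    print(err)         # height cannot be negative

try:
    r.area = 10        # a property without a setter is read-only
except AttributeError:
    print('area cannot be set')
```

<p>The failed assignment leaves the old value in place, because the check runs before anything is stored.</p>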
<p>To end with something more difficult, let’s try to describe a triangle, where the condition on each side also uses the values of the other two sides. This version is not 100% safe and not tuned for performance, but I guess it is simple enough to be used:</p>
<div class="highlight"><pre><code class="python"><span class="n">infty</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s">'inf'</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span> <span class="n">sqrt</span>
<span class="k">class</span> <span class="nc">Triangle</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="n">l1</span> <span class="o">=</span> <span class="n">CheckAttr</span><span class="p">(</span><span class="s">'l1'</span><span class="p">,</span> <span class="n">infty</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">,</span><span class="n">v</span><span class="p">:</span> <span class="n">i</span><span class="o">.</span><span class="n">l2</span><span class="o">+</span><span class="n">i</span><span class="o">.</span><span class="n">l3</span> <span class="o">></span> <span class="n">v</span> <span class="o">>=</span> <span class="mf">0.0</span><span class="p">,</span> <span class="s">'side 1 must be greater than 0 and smaller than the sum of l2 and l3'</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span><span class="nb">float</span><span class="p">(</span><span class="n">v</span><span class="p">))</span>
    <span class="n">l2</span> <span class="o">=</span> <span class="n">CheckAttr</span><span class="p">(</span><span class="s">'l2'</span><span class="p">,</span> <span class="n">infty</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">,</span><span class="n">v</span><span class="p">:</span> <span class="n">i</span><span class="o">.</span><span class="n">l1</span><span class="o">+</span><span class="n">i</span><span class="o">.</span><span class="n">l3</span> <span class="o">></span> <span class="n">v</span> <span class="o">>=</span> <span class="mf">0.0</span><span class="p">,</span> <span class="s">'side 2 must be greater than 0 and smaller than the sum of l1 and l3'</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span><span class="nb">float</span><span class="p">(</span><span class="n">v</span><span class="p">))</span>
    <span class="n">l3</span> <span class="o">=</span> <span class="n">CheckAttr</span><span class="p">(</span><span class="s">'l3'</span><span class="p">,</span> <span class="n">infty</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">,</span><span class="n">v</span><span class="p">:</span> <span class="n">i</span><span class="o">.</span><span class="n">l2</span><span class="o">+</span><span class="n">i</span><span class="o">.</span><span class="n">l1</span> <span class="o">></span> <span class="n">v</span> <span class="o">>=</span> <span class="mf">0.0</span><span class="p">,</span> <span class="s">'side 3 must be greater than 0 and smaller than the sum of l2 and l1'</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span><span class="nb">float</span><span class="p">(</span><span class="n">v</span><span class="p">))</span>
    <span class="n">p</span> <span class="o">=</span> <span class="nb">property</span><span class="p">(</span><span class="k">lambda</span> <span class="bp">self</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">l1</span><span class="o">+</span><span class="bp">self</span><span class="o">.</span><span class="n">l2</span><span class="o">+</span><span class="bp">self</span><span class="o">.</span><span class="n">l3</span><span class="p">)</span>
    <span class="n">a</span> <span class="o">=</span> <span class="nb">property</span><span class="p">(</span><span class="k">lambda</span> <span class="bp">self</span><span class="p">:</span> <span class="n">sqrt</span><span class="p">(</span> <span class="bp">self</span><span class="o">.</span><span class="n">p</span><span class="o">/</span><span class="mi">2</span> <span class="o">*</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">p</span><span class="o">/</span><span class="mi">2</span><span class="o">-</span><span class="bp">self</span><span class="o">.</span><span class="n">l1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">p</span><span class="o">/</span><span class="mi">2</span><span class="o">-</span><span class="bp">self</span><span class="o">.</span><span class="n">l2</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">p</span><span class="o">/</span><span class="mi">2</span><span class="o">-</span><span class="bp">self</span><span class="o">.</span><span class="n">l3</span><span class="p">)</span> <span class="p">)</span> <span class="p">)</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">l1</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span><span class="n">l2</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span><span class="n">l3</span><span class="o">=</span><span class="mf">0.0</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">l1</span> <span class="o">=</span> <span class="n">l1</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">l2</span> <span class="o">=</span> <span class="n">l2</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">l3</span> <span class="o">=</span> <span class="n">l3</span>
    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s">"Triangle({t.l1}, {t.l2}, {t.l3})"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">t</span><span class="o">=</span><span class="bp">self</span><span class="p">)</span>
    <span class="n">__repr__</span> <span class="o">=</span> <span class="n">__str__</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">Triangle</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mf">1.5</span><span class="p">)</span>
<span class="k">print</span> <span class="n">t</span>
<span class="c"># Triangle(1.0, 2.0, 1.5)</span>
<span class="k">print</span> <span class="n">t</span><span class="o">.</span><span class="n">p</span>
<span class="c"># 4.5</span>
<span class="k">print</span> <span class="n">t</span><span class="o">.</span><span class="n">a</span>
<span class="c"># 0.726184377414</span>
<span class="n">t</span><span class="o">.</span><span class="n">l3</span><span class="o">=</span><span class="mi">5</span>
<span class="c"># ValueError: side 3 must be greater than 0 and smaller than the sum of l2 and l1</span>
</code></pre></div>
<h3 id="edit"><strong>EDIT:</strong></h3>
<p>I forgot two interesting details about the implementation of descriptors. The first one addresses the issue of accessing the descriptor from the class rather than from an instance. I would expect to obtain a reference to the descriptor instance, but I got the default value instead. What I should have done was to check whether the instance is <code>None</code> (meaning access from the class) and return the descriptor itself:</p>
<div class="highlight"><pre><code class="python"><span class="k">def</span> <span class="nf">__get__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instance</span><span class="p">,</span> <span class="nb">type</span><span class="p">):</span>
    <span class="c">#this allows me to access the descriptor instance</span>
    <span class="k">if</span> <span class="n">instance</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">self</span>
    <span class="k">return</span> <span class="n">instance</span><span class="o">.</span><span class="n">value</span>
</code></pre></div>
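<p>A minimal sketch of this idea follows. The <code>Value</code> and <code>Empty</code> classes are hypothetical names made up for the example. The check is written as <code>instance is None</code>, because an instance that evaluates as false (here one whose <code>__len__</code> returns 0) would otherwise be mistaken for access from the class.</p>

```python
class Value(object):
    """Hypothetical minimal descriptor storing its data in the instance __dict__."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def __get__(self, instance, owner):
        # accessed from the class: hand back the descriptor itself
        if instance is None:
            return self
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Empty(object):
    x = Value('x', 42)

    def __len__(self):
        return 0       # instances of this class are falsy


e = Empty()
print(isinstance(Empty.x, Value))   # True: class access returns the descriptor
print(e.x)                          # 42: instance access returns the value
```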
<p>The second bit is about the documentation. If I write the documentation on the descriptor class, I lose the opportunity to have a different documentation for each instance, which is one of the cool features of the property object. This can be solved in a simple way…using a property ;)</p>
<div class="highlight"><pre><code class="python"><span class="k">class</span> <span class="nc">DocDescriptor</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">doc</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">doc</span><span class="o">=</span><span class="n">doc</span>
    <span class="n">__doc__</span> <span class="o">=</span> <span class="nb">property</span><span class="p">(</span><span class="k">lambda</span> <span class="bp">self</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">doc</span><span class="p">)</span>
    <span class="c">#other methods to follow</span>
</code></pre></div>
<p>This allows us to write code like this:</p>
<div class="highlight"><pre><code class="python"><span class="k">class</span> <span class="nc">A</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="n">x</span> <span class="o">=</span> <span class="n">DocDescriptor</span><span class="p">(</span><span class="s">"this is the documentation of x"</span><span class="p">)</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">DocDescriptor</span><span class="p">(</span><span class="s">"and this is for y"</span><span class="p">)</span>
<span class="n">help</span><span class="p">(</span><span class="n">A</span><span class="o">.</span><span class="n">x</span><span class="p">)</span>
<span class="c"># this is the documentation of x</span>
</code></pre></div>
Creating a colormap in matplotlib2012-11-22T18:36:00-08:00http://EnricoGiampieri.github.com/introduction/2012/11/22/creating-colormap-in-matplotlib<p>Matplotlib, as I said before, is quite an amazing graphics library, and can do some powerful heavy lifting in data visualization, as long as you spend some time understanding how it works. Usually it’s quite intuitive, but one area where it can give you a huge headache is the generation of custom colormaps.</p>
<p>This page (<a href="http://matplotlib.org/examples/api/colorbar_only.html">http://matplotlib.org/examples/api/colorbar_only.html</a>) of the matplotlib manual gives some direction, but it’s not really useful. What we usually want is to create a new, smooth colormap with our colors of choice.
To do that the only solution is the <code>matplotlib.colors.LinearSegmentedColormap</code> class…which is quite a pain to use. Actually there is a very useful function that avoids this pain, but I will tell the secret after we see the basic behavior.</p>
<p>The main idea of the <code>LinearSegmentedColormap</code> is that for each color channel (red, green and blue) we divide the colormap into intervals and tell the colormap which two values to interpolate between in each of them. This is the code to create the simplest colormap, a grayscale:</p>
<div class="highlight"><pre><code class="python"><span class="kn">import</span> <span class="nn">matplotlib</span> <span class="kn">as</span> <span class="nn">mpl</span>
<span class="n">mycm</span> <span class="o">=</span> <span class="n">mpl</span><span class="o">.</span><span class="n">colors</span><span class="o">.</span><span class="n">LinearSegmentedColormap</span><span class="p">(</span><span class="s">'mycm'</span><span class="p">,</span>
<span class="p">{</span><span class="s">'red'</span><span class="p">:((</span><span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">),</span> <span class="p">(</span><span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">)),</span>
<span class="s">'green'</span><span class="p">:((</span><span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">),</span> <span class="p">(</span><span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">)),</span>
<span class="s">'blue'</span><span class="p">:((</span><span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">),</span> <span class="p">(</span><span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">)),</span>
<span class="p">},</span><span class="mi">256</span><span class="p">)</span>
</code></pre></div>
<p>The first argument is the name of the colormap, the last is the number of points used for the interpolation, and the middle section is the painful one.
For each color the colormap is described by a sequence of triples: the first number is the position in the colormap, which goes from 0 to 1 and must increase monotonically. The second and the third numbers represent the value of the color just before and just after that position.
This basic example is composed of two points for each color, 0 and 1, and it says that at those positions the color is absent (0) or present (1).</p>
<p>To understand better, we can use a colormap whose red channel goes from 0 to 0.25 in the first half, then jumps to 0.75 just after the midpoint and goes to 1 as the colormap position goes to 1:</p>
<div class="highlight"><pre><code class="python"><span class="kn">import</span> <span class="nn">matplotlib</span> <span class="kn">as</span> <span class="nn">mpl</span>
<span class="n">lscm</span> <span class="o">=</span> <span class="n">mpl</span><span class="o">.</span><span class="n">colors</span><span class="o">.</span><span class="n">LinearSegmentedColormap</span>
<span class="n">mycm</span> <span class="o">=</span> <span class="n">lscm</span><span class="p">(</span><span class="s">'mygray'</span><span class="p">,</span>
<span class="p">{</span><span class="s">'red'</span><span class="p">:((</span><span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">),</span> <span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.75</span><span class="p">),</span> <span class="p">(</span><span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">)),</span>
<span class="s">'green'</span><span class="p">:((</span><span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">),</span> <span class="p">(</span><span class="mf">1.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">)),</span>
<span class="s">'blue'</span><span class="p">:((</span><span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">),</span> <span class="p">(</span><span class="mf">1.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">,</span> <span class="mf">0.</span><span class="p">)),</span>
<span class="p">},</span><span class="mi">256</span><span class="p">)</span>
</code></pre></div>
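<p>To make the role of the three numbers concrete, here is a pure-Python sketch of how a single channel could be evaluated from its triples. It only mimics the behaviour described above and is not matplotlib’s actual implementation; <code>channel_value</code> is a name made up for this example.</p>

```python
from bisect import bisect_right


def channel_value(segments, x):
    """Interpolate one colour channel at position x in [0, 1].

    segments is a sequence of (position, value_before, value_after) triples.
    """
    positions = [p for p, _, _ in segments]
    # index of the interval that contains x (the last interval for x == 1)
    i = min(bisect_right(positions, x) - 1, len(segments) - 2)
    x0, _, start = segments[i]       # value just after the left edge
    x1, end, _ = segments[i + 1]     # value just before the right edge
    return start + (end - start) * (x - x0) / (x1 - x0)


red = ((0., 0., 0.), (0.5, 0.25, 0.75), (1., 1., 1.))
print(channel_value(red, 0.25))   # 0.125, halfway between 0 and 0.25
print(channel_value(red, 0.75))   # 0.875, halfway between 0.75 and 1
```

<p>Within each interval the value runs from the third number of the left triple to the second number of the right triple, which is why the red channel can jump at 0.5.</p>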
<p>Ok, this is really powerful, but it is clearly overkill in most cases! The matplotlib developers realized this, but for some reason didn’t create a whole new class for it in the module, deciding instead to add a static method to <code>LinearSegmentedColormap</code>, called <code>from_list</code>.
This is the magic cure that we need: to make a simple colormap that goes from red to black to blue, we just need this:</p>
<div class="highlight"><pre><code class="python"><span class="n">mycm</span> <span class="o">=</span> <span class="n">lscm</span><span class="o">.</span><span class="n">from_list</span><span class="p">(</span><span class="s">'mycm'</span><span class="p">,[</span><span class="s">'r'</span><span class="p">,</span><span class="s">'k'</span><span class="p">,</span><span class="s">'b'</span><span class="p">])</span>
</code></pre></div>
<p>Of course you can mix named colors with RGB tuples, to your heart’s content!</p>
<div class="highlight"><pre><code class="python"><span class="n">mycm</span> <span class="o">=</span> <span class="n">lscm</span><span class="o">.</span><span class="n">from_list</span><span class="p">(</span><span class="s">'mycm'</span><span class="p">,[</span><span class="s">'pink'</span><span class="p">,</span><span class="s">'k'</span><span class="p">,(</span><span class="mf">0.5</span><span class="p">,</span><span class="mf">0.5</span><span class="p">,</span><span class="mf">0.95</span><span class="p">)])</span>
</code></pre></div>
<p>Ok, now we have our wonderful colormap…but if we have some nan values in our data, things go wrong: those values are drawn in white, out of our control. Don’t worry, as all we need is to set the color to use for the nan values (actually, for the masked ones) with the <code>set_bad</code> method. In this case we set it to green:</p>
<div class="highlight"><pre><code class="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">matplotlib</span> <span class="kn">as</span> <span class="nn">mpl</span>
<span class="kn">from</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">import</span> <span class="n">matshow</span>
<span class="c">#the colormap</span>
<span class="n">mycm</span> <span class="o">=</span> <span class="n">mpl</span><span class="o">.</span><span class="n">colors</span><span class="o">.</span><span class="n">LinearSegmentedColormap</span><span class="o">.</span><span class="n">from_list</span><span class="p">(</span><span class="s">'mycm'</span><span class="p">,[</span><span class="s">'r'</span><span class="p">,</span><span class="s">'k'</span><span class="p">,</span><span class="s">'b'</span><span class="p">])</span>
<span class="n">mycm</span><span class="o">.</span><span class="n">set_bad</span><span class="p">(</span><span class="s">'g'</span><span class="p">)</span>
<span class="c">#the corrupted data</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">10</span><span class="p">)</span>
<span class="n">a</span><span class="p">[</span><span class="mi">5</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span>
<span class="c">#the image with a nice green spot</span>
<span class="n">matshow</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">cmap</span> <span class="o">=</span> <span class="n">mycm</span><span class="p">)</span>
</code></pre></div>
<p>Note: use matshow when you think that nan values can be present, as pcolor doesn’t get along well with them and imshow keeps the white color.</p>
natural sorting (with a hint of regular expressions)2012-11-04T16:35:17-08:00http://EnricoGiampieri.github.com/introduction/2012/11/04/natural-sorting<h3 id="sorting-python-regularexpression">sorting python regularexpression</h3>
<p>When we talk about sorting strings in computer science we usually mean lexicographic ordering, i.e. the same ordering that we have in a dictionary (a paper one, not the python one). This is formally correct, but has a notorious drawback when we have to present those strings to a human.</p>
<p>If we have the following list:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">strings</span> <span class="o">=</span> <span class="p">[</span> <span class="s">'a1'</span><span class="p">,</span> <span class="s">'a2'</span><span class="p">,</span> <span class="s">'a10'</span> <span class="p">]</span>
</code></pre></div>
<p>and we sort it, we encounter an unexpected problem:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="nb">sorted</span><span class="p">(</span><span class="n">strings</span><span class="p">)</span>
<span class="p">[</span><span class="s">'a1'</span><span class="p">,</span> <span class="s">'a10'</span><span class="p">,</span> <span class="s">'a2'</span><span class="p">]</span>
</code></pre></div>
<p>What is happening is that the string <code>'a10'</code> is lexicographically before the string <code>'a2'</code>.
This is very counterintuitive for our users, and in the long run can sometimes give a little headache even to us.</p>
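<p>The reason is that strings are compared one character at a time, so the result is decided at the first position where they differ. A minimal illustration (the variable names are only for this example):</p>

```python
# The first characters are equal ('a' == 'a'); at the second position
# '1' < '2', so 'a10' sorts before 'a2' regardless of what follows.
first_differs = '1' < '2'
whole_string = 'a10' < 'a2'
print(first_differs, whole_string)   # True True

strings = ['a1', 'a2', 'a10']
result = sorted(strings)
print(result)                        # ['a1', 'a10', 'a2']
```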
<p>So, what if we want to sort our objects in a natural order? The basic idea is that we want to order the strings by separating the proper string part from the numeric part.
If we know how our strings are composed, as in the preceding example, we can simply tamper with the <code>key</code> parameter of <code>sorted</code>. This parameter allows us to use a derived object to order our list instead of the original one. In our case what we need is a tuple with a string part and a numeric part:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">splitter</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="p">(</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="nb">int</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">:]))</span>
<span class="o">>>></span> <span class="nb">sorted</span><span class="p">(</span><span class="n">strings</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">splitter</span><span class="p">)</span>
<span class="p">[</span><span class="s">'a1'</span><span class="p">,</span> <span class="s">'a2'</span><span class="p">,</span> <span class="s">'a10'</span><span class="p">]</span>
</code></pre></div>
<p>Ok, this works, but is far from general. The basic idea is good, but we need a way to split a string into its numerical parts, no matter where and how many of them there are!
One method is to use the itertools module (yes, my favourite standard library module), the groupby function, to be exact.
This function runs over an iterable and groups its consecutive elements based on a key function given by the user. In our case we need the isdigit method of the string to identify which pieces are numbers and which aren’t. The solution is a simple one-liner:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">groupby</span>
<span class="o">>>></span> <span class="n">string</span> <span class="o">=</span> <span class="s">'aaa111aaa111aaa111aaa111'</span>
<span class="o">>>></span> <span class="p">[</span> <span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="s">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">b</span><span class="p">))</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span> <span class="ow">in</span> <span class="n">groupby</span><span class="p">(</span><span class="n">string</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="n">s</span><span class="o">.</span><span class="n">isdigit</span><span class="p">())]</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="s">'111'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="s">'111'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="s">'111'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="s">'111'</span><span class="p">)]</span>
</code></pre></div>
<p>Here the first value of each tuple is the result of the isdigit test and the second is the matched text. This is already a solution to our problem, but it is rough around the edges. For example, it misreads the dot inside a floating point number, and it’s not easy to insert any knowledge of the structure of our strings.</p>
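<p>Before smoothing those edges, the grouping can already be used as a sort key for strings that contain only integers. The following is a minimal sketch; <code>natural_key</code> is a name made up for this example, and it assumes plain ASCII digits.</p>

```python
from itertools import groupby


def natural_key(text):
    """Build a sort key from the runs of digits and non-digits in text."""
    key = []
    for is_number, chars in groupby(text, str.isdigit):
        chunk = ''.join(chars)
        # digit runs are compared as integers, everything else as text
        key.append((is_number, int(chunk) if is_number else chunk))
    return key


names = ['a10', 'a2', 'b1', 'a1']
print(sorted(names, key=natural_key))
# ['a1', 'a2', 'a10', 'b1']
```

<p>Keeping the boolean flag in each pair means that a number is never compared directly with a piece of text, which would raise an error in python 3.</p>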
<p>To solve the first problem we can fuse together the number-dot-number triplets, while the second is quite hard to implement.</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">string</span> <span class="o">=</span> <span class="s">'aaa111aaa1.11aaa111aaa111'</span>
<span class="o">>>></span> <span class="n">res</span> <span class="o">=</span> <span class="p">[</span> <span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="s">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">b</span><span class="p">))</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span> <span class="ow">in</span> <span class="n">groupby</span><span class="p">(</span><span class="n">string</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="n">s</span><span class="o">.</span><span class="n">isdigit</span><span class="p">())]</span>
<span class="o">>>></span> <span class="n">res2</span> <span class="o">=</span> <span class="p">[]</span>
<span class="o">>>></span> <span class="n">idx</span> <span class="o">=</span> <span class="mi">0</span>
<span class="o">>>></span> <span class="k">while</span> <span class="n">idx</span><span class="o"><</span><span class="nb">len</span><span class="p">(</span><span class="n">res</span><span class="p">):</span>
<span class="o">...</span>     <span class="k">if</span> <span class="n">idx</span><span class="o"><</span><span class="nb">len</span><span class="p">(</span><span class="n">res</span><span class="p">)</span><span class="o">-</span><span class="mi">2</span><span class="p">:</span>
<span class="o">...</span>         <span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">,</span><span class="n">k</span> <span class="o">=</span> <span class="n">res</span><span class="p">[</span><span class="n">idx</span><span class="p">],</span><span class="n">res</span><span class="p">[</span><span class="n">idx</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span><span class="n">res</span><span class="p">[</span><span class="n">idx</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="o">...</span>     <span class="k">else</span><span class="p">:</span>
<span class="o">...</span>         <span class="n">i</span><span class="o">=</span><span class="bp">None</span>
<span class="o">...</span>     <span class="k">if</span> <span class="n">i</span> <span class="ow">and</span> <span class="n">i</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">j</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">and</span> <span class="n">k</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">and</span> <span class="n">j</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">==</span><span class="s">'.'</span><span class="p">:</span>
<span class="o">...</span>         <span class="n">res2</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="bp">True</span><span class="p">,</span><span class="s">""</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">i</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="n">j</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="n">k</span><span class="p">[</span><span class="mi">1</span><span class="p">]])))</span>
<span class="o">...</span>         <span class="n">idx</span><span class="o">+=</span><span class="mi">3</span>
<span class="o">...</span>     <span class="k">else</span><span class="p">:</span>
<span class="o">...</span>         <span class="n">res2</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">res</span><span class="p">[</span><span class="n">idx</span><span class="p">])</span>
<span class="o">...</span>         <span class="n">idx</span><span class="o">+=</span><span class="mi">1</span>
<span class="o">...</span>
<span class="o">>>></span> <span class="n">res2</span>
<span class="p">[(</span><span class="bp">False</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="s">'111'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="s">'1.11'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="s">'111'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">),</span>
<span class="p">(</span><span class="bp">True</span><span class="p">,</span> <span class="s">'111'</span><span class="p">)]</span>
</code></pre></div>
<p>Ok, this works, but it is ugly as hell. We need a better way, and for that we can borrow the power of regular expressions. Regular expressions (or regex, for short) are a standard way to analyze a string and extract pieces of it, using a road-tested state machine.</p>
<p>To use regexes we need to import the re module and use its findall function to search a string for a given pattern. The pattern is itself described by a string with a special syntax, but we will come to that later.</p>
<p>Let’s see some basic usage of the re module. We feed the findall function a pattern string, in this case the word dog, to search for in the given string. The r before the pattern marks it as a raw string literal: backslashes are not treated as escape characters by Python, which makes patterns simpler to write.</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="kn">import</span> <span class="nn">re</span>
<span class="o">>>></span> <span class="n">string</span> <span class="o">=</span> <span class="s">"i have two dogs, the first one is called fido, while the second dog is rex"</span>
<span class="o">>>></span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r'dog'</span><span class="p">,</span> <span class="n">string</span><span class="p">)</span>
<span class="p">[</span><span class="s">'dog'</span><span class="p">,</span> <span class="s">'dog'</span><span class="p">]</span>
</code></pre></div>
<p>So, the re module replies that it has found two occurrences of the word dog. Note that the resulting list contains only the exact matches: even if the first word was plural (dogs), the matched string is just the <code>'dog'</code> component.</p>
<p>If one of the words starts with a capital letter, the search will find only one of them. If we want to find both cases we can use square brackets, which match any one of the characters listed inside them. So our new code looks like this:</p>
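<p>A short aside on the r prefix used in the patterns: the following is a minimal sketch (written for Python 3, with made-up strings) of what a raw string changes. The <code>\d</code> token, meaning "any digit", is not used elsewhere in this post and appears here only to have a pattern that contains a backslash.</p>

```python
import re

# A normal string literal turns "\n" into a single newline character,
# while a raw string literal keeps the backslash and the letter n.
normal = '\n'
raw = r'\n'
lengths = (len(normal), len(raw))

# Without the r prefix, the backslash of a regex token must be doubled.
same_pattern = (r'\d+' == '\\d+')

# \d+ matches one or more digits.
numbers = re.findall(r'\d+', 'fido is 3, rex is 11')
print(lengths, same_pattern, numbers)
```

<p>For patterns without backslashes, like <code>r'dog'</code>, the prefix makes no difference; it is simply a good habit to always write it.</p>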
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">string</span> <span class="o">=</span> <span class="s">"i have two Dogs, the first one is called fido, while the second dog is rex"</span>
<span class="o">>>></span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r'[Dd]og'</span><span class="p">,</span> <span class="n">string</span><span class="p">)</span>
<span class="p">[</span><span class="s">'Dog'</span><span class="p">,</span> <span class="s">'dog'</span><span class="p">]</span>
</code></pre></div>
<p>Ok, next step: we want to include the s of the plural when it is there. To obtain this, we have to say that the final s is optional: if it is present, include it, but don’t worry if it’s missing. This is done with a question mark right after the element of interest, the letter s.</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r'[Dd]ogs?'</span><span class="p">,</span> <span class="n">string</span><span class="p">)</span>
<span class="p">[</span><span class="s">'Dogs'</span><span class="p">,</span> <span class="s">'dog'</span><span class="p">]</span>
</code></pre></div>
<p>Ok, I will stop here for now: you can find a huge amount of material online that explains how to use regular expressions. Prepare to suffer a little bit, because regexes have quite a learning curve.
The pattern that separates any number of text blocks from the numbers is the following:</p>
<div class="highlight"><pre><code class="python"><span class="s">r'[0-9]+|[^0-9]+'</span>
</code></pre></div>
<p>It says that you can match either (the <code>|</code> operator) one or more (the <code>+</code> operator) digits (<code>[0-9]</code>) or one or more characters that are not digits (<code>[^0-9]</code>).</p>
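<p>To see what each half of the pattern contributes, here is a small sketch (Python 3, with a made-up input string) that runs the two alternatives separately and then together:</p>

```python
import re

text = 'aaa123bbb'

# Only the first alternative: runs of digits.
digits_only = re.findall(r'[0-9]+', text)

# Only the second alternative: runs of non-digit characters.
others_only = re.findall(r'[^0-9]+', text)

# Both alternatives: every run, in order of appearance.
both = re.findall(r'[0-9]+|[^0-9]+', text)
print(digits_only, others_only, both)
```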
<p>Let’s put it to the test:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">test</span> <span class="o">=</span> <span class="p">[</span> <span class="s">'aaa123bbb.tex'</span><span class="p">,</span> <span class="s">'123aaa345.txt'</span> <span class="p">]</span>
<span class="o">>>></span> <span class="k">for</span> <span class="n">string</span> <span class="ow">in</span> <span class="n">test</span><span class="p">:</span>
<span class="o">>>></span>     <span class="n">res</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r'[0-9]+|[^0-9]+'</span><span class="p">,</span> <span class="n">string</span><span class="p">)</span>
<span class="o">>>></span>     <span class="k">print</span> <span class="n">string</span><span class="p">,</span><span class="n">res</span>
<span class="n">aaa123bbb</span><span class="o">.</span><span class="n">tex</span> <span class="p">[</span><span class="s">'aaa'</span><span class="p">,</span> <span class="s">'123'</span><span class="p">,</span> <span class="s">'bbb.tex'</span><span class="p">]</span>
<span class="mi">123</span><span class="n">aaa345</span><span class="o">.</span><span class="n">txt</span> <span class="p">[</span><span class="s">'123'</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">,</span> <span class="s">'345'</span><span class="p">,</span> <span class="s">'.txt'</span><span class="p">]</span>
</code></pre></div>
<p>It’s a bit rough around the edges, but with a little work it can be made perfect. What we can do is specify that a dot that interrupts a number is part of that number, while a dot that is not between digits should stand on its own:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">test</span> <span class="o">=</span> <span class="p">[</span> <span class="s">'aaa123bbb.tex'</span><span class="p">,</span> <span class="s">'123aaa345.txt'</span><span class="p">,</span> <span class="s">"aaa3.14bbb.jpg"</span> <span class="p">]</span>
<span class="o">>>></span> <span class="k">for</span> <span class="n">string</span> <span class="ow">in</span> <span class="n">test</span><span class="p">:</span>
<span class="o">>>></span>     <span class="n">res</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r'[0-9]+\.?[0-9]+|[^.0-9]+|.'</span><span class="p">,</span> <span class="n">string</span><span class="p">)</span>
<span class="o">>>></span>     <span class="k">print</span> <span class="n">string</span><span class="p">,</span><span class="n">res</span>
<span class="n">aaa123bbb</span><span class="o">.</span><span class="n">tex</span> <span class="p">[</span><span class="s">'aaa'</span><span class="p">,</span> <span class="s">'123'</span><span class="p">,</span> <span class="s">'bbb'</span><span class="p">,</span> <span class="s">'.'</span><span class="p">,</span> <span class="s">'tex'</span><span class="p">]</span>
<span class="mi">123</span><span class="n">aaa345</span><span class="o">.</span><span class="n">txt</span> <span class="p">[</span><span class="s">'123'</span><span class="p">,</span> <span class="s">'aaa'</span><span class="p">,</span> <span class="s">'345'</span><span class="p">,</span> <span class="s">'.'</span><span class="p">,</span> <span class="s">'txt'</span><span class="p">]</span>
<span class="n">aaa3</span><span class="o">.</span><span class="mi">14</span><span class="n">bbb</span><span class="o">.</span><span class="n">jpg</span> <span class="p">[</span><span class="s">'aaa'</span><span class="p">,</span> <span class="s">'3.14'</span><span class="p">,</span> <span class="s">'bbb'</span><span class="p">,</span> <span class="s">'.'</span><span class="p">,</span> <span class="s">'jpg'</span><span class="p">]</span>
</code></pre></div>
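<p>As a possible next step, the regex can rebuild the list of (is a number, text) pairs that the loop at the beginning produced by hand. This is only a sketch for Python 3: the names <code>TOKEN</code> and <code>tokenize</code> are made up for the example, and the pattern is a variant of the one above that uses a non-capturing group <code>(?:...)</code> to make the decimal part optional.</p>

```python
import re

# Variant of the pattern above: a number with an optional decimal part,
# or a run of characters that are neither dots nor digits, or a lone dot.
TOKEN = re.compile(r'[0-9]+(?:\.[0-9]+)?|[^.0-9]+|\.')

def tokenize(string):
    """Split a string into (is_number, text) pairs."""
    return [(tok[0].isdigit(), tok) for tok in TOKEN.findall(string)]

tokens = tokenize('aaa3.14bbb.jpg')
short = tokenize('v2.jpg')
print(tokens)
print(short)
```

<p>Every number starts with a digit and no other token does, so checking the first character is enough to tell the two kinds apart.</p>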
<p>Dig deeper in the regex module… a lot of power is in it! </p>
Using an IPython Notebook as a module2012-10-26T11:17:00-07:00http://EnricoGiampieri.github.com/introduction/2012/10/26/using-ipython-notebook-as-module<h3 id="ipython-python-module">ipython python module</h3>
<p>First of all, if you have never worked with the ipython notebook, put this post on pause, go to the <a href="http://www.ipython.org">ipython home</a>, install everything you find, play with it and fall in love.
I’ll be waiting, don’t worry.</p>
<p>Ok, so you love the ipython notebook, work with it every day, and cry every time you have to go back to the usual shell. The only problem is that writing a module in it is very uncomfortable: each time you make a modification, you have to save it as a python script, and if you work over a network, you have to send it to the same directory, fiddle with the permissions and so on.</p>
<p>The secret is that you can start the ipython notebook server with the option <code>--script</code>. This option tells the server to save a copy of the notebook as a script (the .py file, to be clear) any time you save the notebook. My typical line to launch the ipython notebook became:</p>
<pre><code>ipython notebook ./my_notebook_folder --pylab=inline --no-browser --script
</code></pre>
<p>So, every time you edit a notebook, you will have the corresponding module to import from other notebooks, closing the circle. Using a notebook to write your own module has a lot of advantages, in my opinion. First of all, you can explain how your library works with a lot of well-formatted text, accompanying it with links to pages ( use <code><page to be linked></code> ) or multimedia objects ( use <code>[text](link)</code>) and even latex formulas, really a killer feature when you have to explain scientific code.</p>
<p>The only limitation is that, as far as I know, the script must be translated into pure python, so no ipython magic or cell magic. It’s not that bad, but it would have been a real game changer (anybody thinking “seamless cython integration”?).</p>
<p>So you can declare your classes and functions as usual, and put the test code in a block for execution in main:</p>
<div class="highlight"><pre><code class="python"><span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">'__main__'</span><span class="p">:</span>
    <span class="c"># test code goes here</span>
</code></pre></div>
<p>as you would do with a normal script. It will be executed normally when you run the notebook (where <code>__name__</code> is set to <code>__main__</code> by default), but will be skipped when the module is imported.
Remember to use the <code>__all__</code> variable to avoid importing useless names with</p>
<div class="highlight"><pre><code class="python"><span class="kn">from</span> <span class="nn">mylibrary</span> <span class="kn">import</span> <span class="o">*</span>
</code></pre></div>
<p>…no wait, you should never do that. Forget the <code>__all__</code>.</p>
<p>Last thing, remember to document your code! In a matter of a few days you will not remember what each parameter does, so write it down.
If you put a string in the first cell of the notebook, it will be seen as the standard documentation of the whole module.</p>
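<p>Putting the pieces together, a notebook meant to be imported could be laid out roughly like the sketch below (Python 3). The module content and the function <code>mean</code> are invented for the example; only the overall shape (module docstring first, documented functions, test code under the main guard) comes from the text above.</p>

```python
"""Toy statistics helpers (hypothetical module used only as an example)."""

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / float(len(values))

if __name__ == '__main__':
    # Executed when the notebook or script is run directly,
    # skipped when the module is imported from somewhere else.
    print(mean([1, 2, 3, 4]))
```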
<p>Last of all, one of my favorite ways to implement documentation and testing in one shot is the doctest module, which scans all the documentation present in the module, searches for lines that look like shell code, executes them and compares the output with the result that you wrote in the documentation.
If they are not equal it will complain that a test has failed.</p>
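<p>To make the idea concrete, here is a simplified sketch (Python 3) of what "looks like shell code" means. It uses <code>doctest.DocTestParser</code> to extract the examples from a docstring and then checks them by hand with <code>eval</code>. The real module does considerably more than this, and the function <code>add</code> is invented for the example.</p>

```python
import doctest

def add(a, b):
    """
    Return the sum of two objects.

    >>> add(2, 3)
    5
    >>> add('a', 'b')
    'ab'
    """
    return a + b

# Each example pairs a piece of source code with the output that the
# documentation promises for it.
examples = doctest.DocTestParser().get_examples(add.__doc__)
pairs = [(ex.source.strip(), ex.want.strip()) for ex in examples]

# Simplified check: evaluate the source and compare its repr with
# the documented output.
checks = [repr(eval(src, {'add': add})) == want for src, want in pairs]
print(pairs, checks)
```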
<p>This is obviously not the way to test production code, where you should use something like unittest, but it is very practical and, honestly, I found that it compels you to write both documentation with examples and useful tests at the same time. Both are very tedious activities, and any help in doing them is welcome.</p>
<p>The problem is that doctesting a notebook will fail in a spectacular way, due to its dynamic nature. To help with that, I wrote a function that analyzes an object (a class, an instance, a function, a module… whatever) and tests each docstring it finds in it, returning a dictionary of the results for each method.</p>
<p>Here it is:</p>
<div class="highlight"><pre><code class="python"><span class="kn">import</span> <span class="nn">doctest</span>

<span class="k">def</span> <span class="nf">test</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span><span class="n">verbose</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">globs</span> <span class="o">=</span> <span class="nb">globals</span><span class="p">()):</span>
    <span class="sd">"""</span>
<span class="sd">    test the docstring of an object, a function or a module</span>
<span class="sd">    if verbose is set to True, it will report the result</span>
<span class="sd">    of each test, otherwise it will report only</span>
<span class="sd">    the failed ones</span>
<span class="sd">    """</span>
    <span class="n">test</span> <span class="o">=</span> <span class="n">doctest</span><span class="o">.</span><span class="n">DocTestFinder</span><span class="p">()</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">globs</span><span class="o">=</span><span class="n">globs</span><span class="p">)</span>
    <span class="n">runner</span> <span class="o">=</span> <span class="n">doctest</span><span class="o">.</span><span class="n">DocTestRunner</span><span class="p">(</span><span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">)</span>
    <span class="n">results</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="n">name</span> <span class="o">=</span> <span class="s">''</span>
    <span class="c"># collect the report of the runner under the name of the current test</span>
    <span class="k">def</span> <span class="nf">out</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
        <span class="n">results</span><span class="p">[</span><span class="n">name</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">test</span><span class="p">:</span>
        <span class="n">name</span> <span class="o">=</span> <span class="n">t</span><span class="o">.</span><span class="n">name</span>
        <span class="n">results</span><span class="p">[</span><span class="n">name</span><span class="p">]</span><span class="o">=</span><span class="p">[]</span>
        <span class="n">runner</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">out</span><span class="o">=</span><span class="n">out</span><span class="p">)</span>
    <span class="c"># in non verbose mode keep only the tests that reported something</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">verbose</span><span class="p">:</span>
        <span class="n">rimuovi</span> <span class="o">=</span> <span class="p">[</span> <span class="n">k</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span><span class="n">v</span> <span class="ow">in</span> <span class="n">results</span><span class="o">.</span><span class="n">iteritems</span><span class="p">()</span> <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="p">]</span>
        <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">rimuovi</span><span class="p">:</span>
            <span class="k">del</span> <span class="n">results</span><span class="p">[</span><span class="n">k</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">results</span>
</code></pre></div>
<p>How does it work? It’s actually pretty simple, once you understand how the doctest module works. First of all you take the object and parse it with the class DocTestFinder:</p>
<div class="highlight"><pre><code class="python"><span class="n">test</span> <span class="o">=</span> <span class="n">doctest</span><span class="o">.</span><span class="n">DocTestFinder</span><span class="p">()</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">globs</span><span class="o">=</span><span class="n">globs</span><span class="p">)</span>
</code></pre></div>
<p>This class returns a list of DocTest objects to be run. To run these tests you use the class DocTestRunner, then create a dictionary to store the results.</p>
<div class="highlight"><pre><code class="python"><span class="n">runner</span> <span class="o">=</span> <span class="n">doctest</span><span class="o">.</span><span class="n">DocTestRunner</span><span class="p">(</span><span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">)</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">{}</span>
</code></pre></div>
<p>Now the runner takes each test in our list, executes it, then compares the output with the expected result. It would normally print the report to the terminal, but we can override this behavior by giving it a parameter out, which is a function that receives the report string. In our case it stores the report in the dictionary under the name of the tested object.</p>
<div class="highlight"><pre><code class="python"><span class="k">def</span> <span class="nf">out</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
    <span class="n">results</span><span class="p">[</span><span class="n">name</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">test</span><span class="p">:</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">t</span><span class="o">.</span><span class="n">name</span>
    <span class="n">results</span><span class="p">[</span><span class="n">name</span><span class="p">]</span><span class="o">=</span><span class="p">[]</span>
    <span class="n">runner</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">out</span><span class="o">=</span><span class="n">out</span><span class="p">)</span>
</code></pre></div>
<p>In the end, if you select a non-verbose result (the default argument), it scans the resulting dictionary and removes all the tests that didn’t report an error.
Then the resulting dictionary is returned, and if your documentation is up to date, you will obtain an empty dictionary.</p>
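<p>For readers on Python 3, where <code>iteritems</code> no longer exists, the same idea can be sketched as follows. This is an adaptation and not the original function: the name <code>run_doctests</code> and the two toy functions are invented for the example, and the namespace is passed explicitly instead of defaulting to <code>globals()</code>.</p>

```python
import doctest

def run_doctests(obj, verbose=False, globs=None):
    """Run the doctests found in obj, return {test name: [reports]}."""
    if globs is None:
        globs = {}
    tests = doctest.DocTestFinder().find(obj, globs=globs)
    runner = doctest.DocTestRunner(verbose=verbose)
    results = {}
    for t in tests:
        messages = []
        # The runner calls out() with each report instead of printing it.
        runner.run(t, out=messages.append)
        # In non-verbose mode only failing tests produce reports.
        if messages or verbose:
            results[t.name] = messages
    return results

def double(x):
    """
    >>> double(2)
    4
    """
    return 2 * x

def broken(x):
    """
    >>> broken(2)
    5
    """
    return 2 * x

ok = run_doctests(double, globs={'double': double})
bad = run_doctests(broken, globs={'broken': broken})
print(ok)
print(sorted(bad))
```

<p>Here <code>ok</code> comes back empty because the documentation of <code>double</code> is correct, while <code>bad</code> contains one entry with the failure report for <code>broken</code>.</p>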
<p>To print the results in a friendly version, I use this ancillary function, which simply scans and prints the keys and values of the dictionary (doing nothing if it is empty):</p>
<div class="highlight"><pre><code class="python"><span class="k">def</span> <span class="nf">verify</span><span class="p">(</span><span class="n">obj</span><span class="p">):</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">test</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">res</span><span class="p">:</span>
        <span class="k">return</span>
    <span class="k">for</span> <span class="n">k</span><span class="p">,</span><span class="n">v</span> <span class="ow">in</span> <span class="n">res</span><span class="o">.</span><span class="n">iteritems</span><span class="p">():</span>
        <span class="k">print</span> <span class="s">"--------------"</span>
        <span class="k">print</span> <span class="n">k</span>
        <span class="k">for</span> <span class="n">v1</span> <span class="ow">in</span> <span class="n">v</span><span class="p">:</span>
            <span class="k">print</span> <span class="n">v1</span>
</code></pre></div>
<p>Just another short tip: if you need to test a random function, don’t worry. Just set the seed to a fixed value at the beginning of the test, and the program will execute the same steps with the same random numbers each time. In numpy this is done with:</p>
<div class="highlight"><pre><code class="python"><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div>
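<p>The effect is easy to check. The sketch below uses the <code>random</code> module of the standard library as a stand-in for numpy, so that it runs without extra packages; the principle is the same, but the substitution is my own assumption for the sake of the example.</p>

```python
import random

# Same seed, same sequence of "random" numbers.
random.seed(0)
first = [random.random() for _ in range(3)]

random.seed(0)
second = [random.random() for _ in range(3)]

# Without resetting the seed the sequence simply goes on.
third = [random.random() for _ in range(3)]
print(first == second, first == third)
```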
<p>That’s all folks! Have fun, and see you soon!</p>
Splitting a sequence2012-10-20T21:05:17-07:00http://EnricoGiampieri.github.com/introduction/2012/10/20/splitting-a-sequence<h3 id="sequence-python">sequence python</h3>
<p>I would like to start with a common exercise that a lot of people have written about, each one proposing their own version. </p>
<p>Let’s say that we have a sequence of objects, and we want to split it into chunks of a given size. The idea is the following:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">seq</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">12</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">split</span><span class="p">(</span><span class="n">seq</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">],</span> <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">]]</span>
</code></pre></div>
<p>It looks simple enough, doesn’t it? There are probably one hundred different solutions to this problem, and everyone has their own preferred version. I will try to show the most common, curious or instructive ones. </p>
<p>Coming from languages like C or Java, it would be natural to write code like this:</p>
<div class="highlight"><pre><code class="python"><span class="n">L</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">seq</span><span class="p">)</span>
<span class="n">size</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">solution</span> <span class="o">=</span> <span class="nb">list</span><span class="p">()</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">L</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="n">size</span><span class="p">:</span>
        <span class="n">solution</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">seq</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">solution</span><span class="o">.</span><span class="n">append</span><span class="p">([</span><span class="n">seq</span><span class="p">[</span><span class="n">i</span><span class="p">]])</span>
</code></pre></div>
<p>Ok, this gives me the correct answer, but this isn’t python code. This is C written in python.</p>
<p>A better solution is the following:</p>
<div class="highlight"><pre><code class="python"><span class="p">[</span> <span class="n">seq</span><span class="p">[</span><span class="n">size</span><span class="o">*</span><span class="n">i</span><span class="p">:</span><span class="n">size</span><span class="o">*</span><span class="n">i</span><span class="o">+</span><span class="n">size</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">seq</span><span class="p">)</span><span class="o">+</span><span class="n">size</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">//</span><span class="n">size</span><span class="p">)</span> <span class="p">]</span>
</code></pre></div>
<p>It does the same work in a single line of code and, once you are used to the list comprehension magic, it looks a lot more readable. But it still sucks, if I should be honest.
It computes the number of pieces the list will be divided into with a ceiling division ((len(seq)+size-1)//size), so that a shorter last fragment is kept without adding an empty piece when the length is an exact multiple of size, then iterates over the obtained indices. Not so elegant.</p>
<p>A more pythonic version lets range do the stepping, and is still one line of code:</p>
<div class="highlight"><pre><code class="python"><span class="p">[</span> <span class="n">seq</span><span class="p">[</span><span class="n">i</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="n">size</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">seq</span><span class="p">),</span><span class="n">size</span><span class="p">)</span> <span class="p">]</span>
</code></pre></div>
<p>What this piece of code does is iterate over the indices of the list, starting from 0 and increasing by “size” at each step. For each step it takes the slice of the list that starts at the given index and spans the following “size” elements.</p>
<p>This will do the trick in the exact same way as the initial code, but is way more simple to write and to read.</p>
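<p>Wrapped in a function, this gives the <code>split</code> used in the very first example. The sketch below is written for Python 3, where <code>range</code> no longer returns a list, hence the explicit <code>list(...)</code> call:</p>

```python
def split(seq, size):
    """Split seq into consecutive chunks of at most size elements."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

even = split(list(range(12)), 3)
# When the length is not a multiple of size, the last chunk is shorter.
ragged = split(list(range(11)), 3)
print(even)
print(ragged)
```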
<p>But this is not the end of the story. I would like to show you a slightly more esoteric way to split the sequence:</p>
<div class="highlight"><pre><code class="python"><span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="p">([</span><span class="nb">iter</span><span class="p">(</span><span class="n">seq</span><span class="p">)]</span> <span class="o">*</span> <span class="mi">3</span><span class="p">))</span>
</code></pre></div>
<p>This is a trick that uses several effects at the same time. First an iterator is generated from the list. Then this iterator is placed inside a list, and the list is tripled. Each element of the list is now a reference to the same iterator, as you can see by writing</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="k">print</span> <span class="p">[</span><span class="nb">iter</span><span class="p">(</span><span class="n">seq</span><span class="p">)]</span><span class="o">*</span><span class="mi">3</span>
<span class="p">[</span><span class="o"><</span><span class="n">listiterator</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x3900610</span><span class="o">></span><span class="p">,</span> <span class="o"><</span><span class="n">listiterator</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x3900610</span><span class="o">></span><span class="p">,</span> <span class="o"><</span><span class="n">listiterator</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x3900610</span><span class="o">></span><span class="p">]</span>
</code></pre></div>
<p>The address of the iterator will change each time you launch the program. This triple reference to the same iterator means that every time the zip function calls next on one of them to obtain an element, the shared iterator advances for all of them. The result is a split sequence. Because zip stops at the shortest of its inputs, if the length of the sequence is not a multiple of the given size, the leftover elements are dropped.</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">seq</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">11</span><span class="p">)</span>
<span class="o">>>></span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="p">([</span><span class="nb">iter</span><span class="p">(</span><span class="n">seq</span><span class="p">)]</span> <span class="o">*</span> <span class="mi">3</span><span class="p">))</span>
<span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span> <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">)]</span>
</code></pre></div>
<p>This can be avoided using the function izip_longest of the itertools module, which pads the shortest sequence with a series of None. It returns an iterator that can easily be converted into a list:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">izip_longest</span> <span class="k">as</span> <span class="n">lzip</span>
<span class="o">>>></span> <span class="nb">list</span><span class="p">(</span><span class="n">lzip</span><span class="p">(</span><span class="o">*</span><span class="p">([</span><span class="nb">iter</span><span class="p">(</span><span class="n">seq</span><span class="p">)]</span> <span class="o">*</span> <span class="mi">3</span><span class="p">)))</span>
<span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span> <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">),</span> <span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="bp">None</span><span class="p">)]</span>
</code></pre></div>
<p>If None could cause problems, you can specify a different fill value for izip_longest:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">izip_longest</span> <span class="k">as</span> <span class="n">lzip</span>
<span class="o">>>></span> <span class="nb">list</span><span class="p">(</span><span class="n">lzip</span><span class="p">(</span><span class="o">*</span><span class="p">([</span><span class="nb">iter</span><span class="p">(</span><span class="n">seq</span><span class="p">)]</span> <span class="o">*</span> <span class="mi">3</span><span class="p">),</span> <span class="n">fillvalue</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">))</span>
<span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span> <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">),</span> <span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)]</span>
</code></pre></div>
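<p>In Python 3 the function is called zip_longest. Here is a small sketch of the padded version, where the helper name chunks_padded and its fill parameter are my own illustrative choices:</p>

```python
from itertools import zip_longest


def chunks_padded(seq, size, fill=None):
    """Split seq into tuples of length size, padding the last one with fill."""
    # the same iterator repeated size times, as in the zip idiom
    shared = [iter(seq)] * size
    # zip_longest keeps going until every reference is exhausted,
    # filling the missing positions with the fill value
    return list(zip_longest(*shared, fillvalue=fill))


padded_none = chunks_padded(range(11), 3)
padded_minus = chunks_padded(range(11), 3, fill=-1)
print(padded_none[-1])   # (9, 10, None)
print(padded_minus[-1])  # (9, 10, -1)
```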
<p>A similar padding effect can be obtained in Python 2 with the function map, which by default extends the shorter sequences with None up to the length of the longest one.</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="o">*</span><span class="n">s</span><span class="p">:</span> <span class="n">s</span><span class="p">,</span> <span class="o">*</span><span class="p">([</span><span class="nb">iter</span><span class="p">(</span><span class="n">seq</span><span class="p">)]</span> <span class="o">*</span> <span class="mi">3</span><span class="p">))</span>
<span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span> <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">),</span> <span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="bp">None</span><span class="p">)]</span>
</code></pre></div>
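<p>With Python 2.7 you can also pass None instead of the identity lambda *s: s, but it is always a good habit to think about compatibility with Python 3 where possible. Keep in mind that this padding is Python 2 behavior: in Python 3, map stops at the shortest input and no longer accepts None as the function.
To recover the initial result, with a shorter last chunk, one can explicitly strip the None values:</p>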
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="o">*</span><span class="n">s</span><span class="p">:</span> <span class="p">[</span><span class="n">r</span> <span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">s</span> <span class="k">if</span> <span class="n">r</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">],</span> <span class="o">*</span><span class="p">([</span><span class="nb">iter</span><span class="p">(</span><span class="n">seq</span><span class="p">)]</span> <span class="o">*</span> <span class="mi">3</span><span class="p">))</span>
<span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">],</span> <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">]]</span>
</code></pre></div>
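<p>In Python 3, where map no longer pads, the same trimming can be sketched on top of zip_longest. Here a private sentinel object is used as fill value instead of None, so that a None that legitimately belongs to the data is not stripped by mistake. The names chunks_ragged and _MISSING are illustrative assumptions:</p>

```python
from itertools import zip_longest

# unique marker that cannot appear in the user data
_MISSING = object()


def chunks_ragged(seq, size):
    """Split seq into lists of length size; the last one may be shorter."""
    shared = [iter(seq)] * size
    padded = zip_longest(*shared, fillvalue=_MISSING)
    # drop the padding from each chunk, keeping real values (None included)
    return [[item for item in chunk if item is not _MISSING]
            for chunk in padded]


result = chunks_ragged(range(11), 3)
with_none = chunks_ragged([1, None, 3, None], 3)
print(result)
print(with_none)
```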
<p>Going back to the itertools module, one can also think of the groupby function. This function splits an iterator into chunks of consecutive elements that share the same key, and starts a new chunk whenever the key changes. So if we use an integer division as the key, we can split the sequence into equally sized parts:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">groupby</span>
<span class="o">>>></span> <span class="k">for</span> <span class="n">key_value</span><span class="p">,</span> <span class="n">split_generator</span> <span class="ow">in</span> <span class="n">groupby</span><span class="p">(</span><span class="n">seq</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="n">s</span><span class="o">//</span><span class="mi">3</span><span class="p">):</span>
<span class="o">...</span>     <span class="k">print</span> <span class="n">key_value</span><span class="p">,</span> <span class="nb">list</span><span class="p">(</span><span class="n">split_generator</span><span class="p">)</span>
<span class="mi">0</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span>
<span class="mi">1</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
<span class="mi">2</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">]</span>
<span class="mi">3</span> <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">]</span>
</code></pre></div>
<p>This version has just one problem: it works only for the sequence of numbers given by range. To adapt it to a more general case, we need to use the enumerate function to obtain the index of each element, group on that index and then keep only the actual data:</p>
<div class="highlight"><pre><code class="python"><span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">groupby</span>
<span class="n">seq</span> <span class="o">=</span> <span class="s">'abcdefghilm'</span>
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">split_gen</span> <span class="ow">in</span> <span class="n">groupby</span><span class="p">(</span><span class="nb">enumerate</span><span class="p">(</span><span class="n">seq</span><span class="p">),</span> <span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">//</span><span class="mi">3</span><span class="p">):</span>
    <span class="k">print</span> <span class="n">key</span><span class="p">,</span> <span class="nb">list</span><span class="p">(</span><span class="n">i</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">split_gen</span><span class="p">)</span>
<span class="mi">0</span> <span class="p">[</span><span class="s">'a'</span><span class="p">,</span> <span class="s">'b'</span><span class="p">,</span> <span class="s">'c'</span><span class="p">]</span>
<span class="mi">1</span> <span class="p">[</span><span class="s">'d'</span><span class="p">,</span> <span class="s">'e'</span><span class="p">,</span> <span class="s">'f'</span><span class="p">]</span>
<span class="mi">2</span> <span class="p">[</span><span class="s">'g'</span><span class="p">,</span> <span class="s">'h'</span><span class="p">,</span> <span class="s">'i'</span><span class="p">]</span>
<span class="mi">3</span> <span class="p">[</span><span class="s">'l'</span><span class="p">,</span> <span class="s">'m'</span><span class="p">]</span>
</code></pre></div>
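<p>The same idea can be sketched as a reusable Python 3 generator that works on any iterable, generators included. The name chunks_groupby is hypothetical, chosen only for this example:</p>

```python
from itertools import groupby


def chunks_groupby(seq, size):
    """Yield lists of consecutive items whose index shares the same index // size."""
    indexed = enumerate(seq)
    for _, group in groupby(indexed, key=lambda pair: pair[0] // size):
        # each element of group is an (index, item) pair: keep only the item
        yield [item for _, item in group]


result = list(chunks_groupby('abcdefghilm', 3))
print(result)
```

<p>Each group is consumed inside the loop, before groupby moves on to the next key, which is what the groupby documentation asks for.</p>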
<p>We will meet groupby again when we talk about natural ordering.</p>
<p>Last, but not least, the numpy module provides the function array_split, which divides the sequence into a given number n of partitions of nearly equal size. In this case, to obtain the same splitting, you should ask for 4 divisions. It also returns not a list of lists, but a list of numpy arrays.</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array_split</span> <span class="k">as</span> <span class="n">split</span>
<span class="o">>>></span> <span class="n">split</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">11</span><span class="p">),</span> <span class="mi">4</span><span class="p">)</span>
<span class="p">[</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]),</span> <span class="n">array</span><span class="p">([</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]),</span> <span class="n">array</span><span class="p">([</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">]),</span> <span class="n">array</span><span class="p">([</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">])]</span>
</code></pre></div>
<p>This function is quite powerful, as it can also split at specific points or along a given axis for multidimensional arrays. Matplotlib also has a similar function, called pieces, in the submodule matplotlib.cbook, which contains several small recipes from the matplotlib cookbook (http://matplotlib.org/api/cbook_api.html). If you use matplotlib, give it a look: it is not very well documented, but it contains a lot of useful objects.</p>
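<p>Going back to array_split for a moment: asking for a number of sections is not the same as asking for a chunk size. As its documentation states, for a length l and n sections it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n. The sketch below mimics that rule in plain Python (the function name section_sizes is hypothetical, it is not part of numpy) to show when the two approaches agree:</p>

```python
def section_sizes(length, sections):
    """Sizes of the pieces produced by an array_split-like rule."""
    base, extra = divmod(length, sections)
    # the first `extra` sections get one element more than the others
    return [base + 1] * extra + [base] * (sections - extra)


print(section_sizes(11, 4))  # [3, 3, 3, 2]
print(section_sizes(10, 3))  # [4, 3, 3]
```

<p>So 11 elements in 4 sections match the chunks of size 3 seen above, while 10 elements in 3 sections give 4, 3, 3 rather than the 4, 4, 2 you would get from fixed chunks of 4.</p>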
<p>In conclusion, I would like to give you a puzzle that I found on Stack Overflow. It is a piece of BAD PYTHON, very bad indeed, but it is curious how it gets the job done.</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="n">f</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">acc</span><span class="o">=</span><span class="p">[]:</span> <span class="n">f</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">n</span><span class="p">:],</span> <span class="n">n</span><span class="p">,</span> <span class="n">acc</span><span class="o">+</span><span class="p">[(</span><span class="n">x</span><span class="p">[:</span><span class="n">n</span><span class="p">])])</span> <span class="k">if</span> <span class="n">x</span> <span class="k">else</span> <span class="n">acc</span>
<span class="o">>>></span> <span class="n">f</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">11</span><span class="p">),</span> <span class="mi">3</span><span class="p">)</span>
<span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">],</span> <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">]]</span>
</code></pre></div>
<p>I have to admit that my first reaction was “wait…what happened?!?”. I’m actually still confused about how someone could have thought of something like that. Sure as hell no one would be able to debug it if something went wrong. This is a recursive function that takes a list, extracts the first chunk and adds it to the accumulator value acc, then passes the shorter list and the accumulator to itself, until the list is empty and it returns the accumulator. We can visualize what is happening by rewriting the function in a more explicit way and printing the intermediate results:</p>
<div class="highlight"><pre><code class="python"><span class="o">>>></span> <span class="k">def</span> <span class="nf">g</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">acc</span><span class="o">=</span><span class="p">[]):</span>
<span class="o">...</span>     <span class="k">print</span> <span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="n">x</span><span class="p">,</span><span class="n">acc</span><span class="p">)</span>
<span class="o">...</span>     <span class="c"># recursive until exhausted</span>
<span class="o">...</span>     <span class="k">if</span> <span class="n">x</span><span class="p">:</span>
<span class="o">...</span>         <span class="c"># launch the same function on a shorter</span>
<span class="o">...</span>         <span class="c"># version of the list with the</span>
<span class="o">...</span>         <span class="c"># accumulated list of lists</span>
<span class="o">...</span>         <span class="k">return</span> <span class="n">g</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">x</span><span class="p">[</span><span class="n">n</span><span class="p">:],</span> <span class="n">acc</span><span class="o">+</span><span class="p">[(</span><span class="n">x</span><span class="p">[:</span><span class="n">n</span><span class="p">])])</span>
<span class="o">...</span>     <span class="c"># when exhausted return the result</span>
<span class="o">...</span>     <span class="k">else</span><span class="p">:</span>
<span class="o">...</span>         <span class="k">return</span> <span class="n">acc</span>
<span class="o">>>></span> <span class="n">s</span> <span class="o">=</span> <span class="n">g</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">11</span><span class="p">))</span>
<span class="o">>>></span> <span class="k">print</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">,</span><span class="n">s</span>
<span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span> <span class="p">[])</span>
<span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span> <span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">]])</span>
<span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span> <span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]])</span>
<span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span> <span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">]])</span>
<span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[],</span> <span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">],</span> <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">]])</span>
<span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">],</span> <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">]]</span>
</code></pre></div>
<p>Yes, it is close to black magic, but it works. But remember kids: recursion is bad, unless it is the only sensible solution to your problem (and 99% of the time, it is not).</p>
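<p>For the curious, here is how the same recursion could look as a regular function that also runs on Python 3; chunk_recursive is a name chosen for this sketch. Note that each chunk costs one level of recursion, so a long sequence with a small chunk size will hit the interpreter recursion limit:</p>

```python
def chunk_recursive(x, n, acc=None):
    """Recursive chunking, mirroring the lambda shown above."""
    if acc is None:
        acc = []
    # an empty sequence means there is nothing left to split
    if not x:
        return acc
    # recurse on the remainder, with the first chunk appended to a new list
    return chunk_recursive(x[n:], n, acc + [x[:n]])


result = chunk_recursive(list(range(11)), 3)
print(result)
```

<p>The mutable default acc=[] of the original is harmless only because acc + [...] builds a new list at every call instead of modifying the default in place.</p>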
<p>Of course these methods are not the only ones one can think of, but they should cover a wide range of needs, and teach us something along the way.</p>
<p>See you next time! </p>
Introduction2012-10-20T09:04:17-07:00http://EnricoGiampieri.github.com/introduction/2012/10/20/introduction<h3 id="introduction">introduction</h3>
<p>Hi Internet!</p>
<p>How do you do? I’m not sure where I’m headed, but I would like to try and play with you a little bit.</p>
<p>What I would like to share is the feeling of beauty and power that I get every time I open my Python shell.</p>
<p>Who am I? I’m just a physicist who works on biophysics and biological data analysis, and who a few years ago fell in love with this programming language.</p>
<p>I’ve always programmed, in one way or another, but for the first time I actually had fun programming. I’m not an expert, but these are my two cents for the Python community, which gave me this amazing instrument.</p>
<p>Given my field of research, my best friends are:</p>
<ul>
<li>IPython (http://ipython.org/), and especially its notebook feature: simply stunning</li>
<li>Numpy (http://numpy.scipy.org/), for dealing with matrices and arrays</li>
<li>Matplotlib (http://matplotlib.sourceforge.net/) for plotting the data</li>
<li>Scipy (http://www.scipy.org/) for all the numerical algorithms</li>
<li>Sympy (http://sympy.org/) for the symbolic mathematics</li>
</ul>
<p>But sometimes I hang out with these buddies:</p>
<ul>
<li>Networkx (http://networkx.lanl.gov/) for dealing with network analysis</li>
<li>Pandas (http://pandas.pydata.org/) for managing tabular data</li>
<li>PyTables (http://www.pytables.org/moin) for managing HUGE tabular data</li>
<li>Cython (http://cython.org/) when performance is a necessity</li>
</ul>
<p>Too often I have stumbled upon great snippets of code that do all sorts of amazing stuff, without any explanation of how they do it, forcing me to lose an entire day to get the idea if I ever wanted to try and modify them. What I will try to do (will I succeed? we’ll see) is to share with you some code snippets, utility classes, insights and similar, focusing on “what I want to do” and “what is the easiest code to do that”, explaining the idea behind the code.</p>
<p>Have fun, and say hello if you like the work!</p>