Friday, February 24, 2012

Plots with several histograms

Creating a plot with two histograms

Here is a method that you can use to plot two histograms in the same figure sharing the same X-axis, keeping some distance between the histograms:

 def twohists(x1,x2,xmin,xmax,x1leg='$x_1$',x2leg='$x_2$',xlabel='',sharey=False):  
      """  
 Script that plots two histograms of quantities x1 and x2.   
   
 Arguments:  
 - x1,x2: arrays with data to be plotted  
 - xmin,xmax: lower and upper range of plotted values, will be used to set a consistent x-range  
      for both histograms.  
 - x1leg, x2leg: legends for each histogram       
 - xlabel: self-explanatory.  
   
 Inspired by http://www.scipy.org/Cookbook/Matplotlib/Multiple_Subplots_with_One_Axis_Label.  
   
 Rodrigo Nemmen  
 v1 Dec. 2011  
 v1.1 Feb. 2012: Added sharey argument.  
      """  
   
      pylab.clf()  
      pylab.rcParams.update({'font.size': 15})  
      fig=pylab.figure()  
        
      a=fig.add_subplot(2,1,1)  
      if sharey==True:  
           b=fig.add_subplot(2,1,2, sharex=a, sharey=a)  
      else:  
           b=fig.add_subplot(2,1,2, sharex=a)  
        
      a.hist(x1,label=x1leg,color='b')  
      a.legend(loc='best',frameon=False)  
      a.set_xlim(xmin,xmax)  
        
      b.hist(x2,label=x2leg,color='r')  
      b.legend(loc='best',frameon=False)  
        
      pylab.setp(a.get_xticklabels(), visible=False)  
   
      b.set_xlabel(xlabel)  
      b.set_ylabel('Number',verticalalignment='bottom')  
      pylab.minorticks_on()  
      pylab.subplots_adjust(hspace=0.15)  
      pylab.draw()  

... and here is a example script that uses the method above:

 """  
 Illustrates how to use the twohists method.  
 """  
 import nemmen  
 import scipy, pylab  
   
 # Generates a normal distribution  
 x1=scipy.random.standard_normal(100)  
   
 # Generates a uniform random distribution  
 x2=scipy.random.uniform(-3,3,100)  
   
 nemmen.twohists(x1,x2,-3,3,'Normal','Uniform')  
   
 pylab.show()  

... to create the following plot:



Creating a plot with three histograms


I also wrote a recipe that makes a plot with three histograms:

 def threehists(x1,x2,x3,xmin,xmax,x1leg='$x_1$',x2leg='$x_2$',x3leg='$x_3$',xlabel='',sharey=False):  
      """  
 Script that plots three histograms of quantities x1, x2 and x3.   
   
 Arguments:  
 - x1,x2,x3: arrays with data to be plotted  
 - xmin,xmax: lower and upper range of plotted values, will be used to set a consistent x-range  
      for both histograms.  
 - x1leg, x2leg, x3leg: legends for each histogram       
 - xlabel: self-explanatory.  
 - sharey: sharing the Y-axis among the histograms?  
   
 Example:  
 x1=Lbol(AD), x2=Lbol(JD), x3=Lbol(EHF10)  
 >>> threehists(x1,x2,x3,38,44,'AD','JD','EHF10','$\log L_{\\rm bol}$ (erg s$^{-1}$)',sharey=True)  
   
 Inspired by http://www.scipy.org/Cookbook/Matplotlib/Multiple_Subplots_with_One_Axis_Label.  
   
 Rodrigo Nemmen  
 v1 Dec. 2011  
 v1.1 Feb. 2012:     Added sharey keyword.  
      """  
   
      pylab.clf()  
      pylab.rcParams.update({'font.size': 15})  
      fig=pylab.figure()  
        
      a=fig.add_subplot(3,1,1)  
      if sharey==True:  
           b=fig.add_subplot(3,1,2, sharex=a, sharey=a)  
           c=fig.add_subplot(3,1,3, sharex=a, sharey=a)  
      else:  
           b=fig.add_subplot(3,1,2, sharex=a)  
           c=fig.add_subplot(3,1,3, sharex=a)            
        
      a.hist(x1,label=x1leg,color='b')  
      a.legend(loc='best',frameon=False)  
      a.set_xlim(xmin,xmax)  
        
      b.hist(x2,label=x2leg,color='r')  
      b.legend(loc='best',frameon=False)  
   
      c.hist(x3,label=x3leg,color='y')  
      c.legend(loc='best',frameon=False)  
        
      pylab.setp(a.get_xticklabels(), visible=False)  
      pylab.setp(b.get_xticklabels(), visible=False)  
   
      c.set_xlabel(xlabel)  
      b.set_ylabel('Number')  
      pylab.minorticks_on()  
      pylab.subplots_adjust(hspace=0.15)  
      pylab.draw()  

... and as before, here is a script that illustrates how to use the above method:

 """  
 Illustrates how to use the threehists method.  
 """  
 import nemmen  
 import scipy, pylab  
   
 # Generates a normal distribution  
 x1=scipy.random.standard_normal(100)  
   
 # Generates a uniform random distribution  
 x2=scipy.random.uniform(-3,3,100)  
   
 x3=scipy.random.standard_normal(1000)  
   
 nemmen.threehists(x1,x2,x3,-3,3,'Normal ($n=100$)','Uniform','Normal ($n=1000$)')  
   
 pylab.show()  

... creating this plot:



Wednesday, February 22, 2012

Plotting pretty histograms: stay tuned

Do you want to learn how to plot a pretty histogram like this one?


As soon as I have time I will post a sample script that plots it. Stay tuned!

Thursday, February 9, 2012

Additional color names for matplotlib plots

When creating plots with matplotlib, we usually use the "default" color names:
  • b : blue
  • g : green
  • r : red
  • c : cyan
  • m : magenta
  • y : yellow
  • k : black
  • w : white
It turns out that matplotlib accepts not only these default color names, but the full range of html color names! So for example, you can plot some data like this:

 pylab.plot(x, y, 'DarkOrange')  

and get a "dark orange" color.

This is an easy way of specifying colors in addition to the standard ones.

Wednesday, February 8, 2012

How to begin learning Python


Many (perhaps most) people that want to learn Python get confused with the overwhelming number of references sources available. Where to start? So many options! 

Motivated by this, I list in this post the references that I used to learn Python (and object-oriented programming as well), which can serve as a starting point for other people. This post is biased towards the scientists interested in learning Python.

Beginner material

Learned the basic syntax and capabilities of the language with the official Python tutorial. You can download all of this as PDF files. I suggest this for people with previous programming experience. For absolute beginners, have a look at the Think Python book below.

Introductory lecture about Python, its syntax and science applications. It shows what Python is capable of for data analysis and plotting. Inspiring. The audio is also available for download as a MP3 file.

Tutorial on using Python for data analysis! How to replace IDL/Matlab with Python. Includes: plotting, FITS files, signal processing.

I learned object-oriented programming using this material. Very clear and "application-oriented" approach. You don't need to be a biologist to understand this.

Longer introduction for people with no previous extensive programming experience.

Quick reference

This is a cheat sheet with the basic commands needed for data analysis, array processing and plotting.

Migrating from IDL/Matlab to Python.

If you are going to do serious stuff with Python, I suggest using the enhanced interactive Python terminal IPython.

Longer introductory books



Longer reference books


Note: this post is a revised version of the text originally posted here.

Friday, February 3, 2012

Computing the chi-squared and reduced chi-squared of a model

Here are two codes for computing the chi-squared of a model compared to some data. Very useful when judging the goodness-of-fit of a model.

Source code for method that returns the chi-squared:
 def chisqg(ydata,ymod,sd=None):  
      """  
 Returns the chi-square error statistic as the sum of squared errors between  
 Ydata(i) and Ymodel(i). If individual standard deviations (array sd) are supplied,   
 then the chi-square error statistic is computed as the sum of squared errors  
 divided by the standard deviations.     Inspired on the IDL procedure linfit.pro.  
 See http://en.wikipedia.org/wiki/Goodness_of_fit for reference.  
   
 x,y,sd assumed to be Numpy arrays. a,b scalars.  
 Returns the float chisq with the chi-square statistic.  
   
 Rodrigo Nemmen  
 http://goo.gl/8S1Oo  
      """  
      # Chi-square statistic (Bevington, eq. 6.9)  
      if sd==None:  
           chisq=numpy.sum((ydata-ymod)**2)  
      else:  
           chisq=numpy.sum( ((ydata-ymod)/sd)**2 )  
        
      return chisq  




Source code for method that returns the reduced chi-squared of a model. You need to provide the number of free parameters of the model as input to the method.

 def redchisqg(ydata,ymod,deg=2,sd=None):  
      """  
 Returns the reduced chi-square error statistic for an arbitrary model,   
 chisq/nu, where nu is the number of degrees of freedom. If individual   
 standard deviations (array sd) are supplied, then the chi-square error   
 statistic is computed as the sum of squared errors divided by the standard   
 deviations. See http://en.wikipedia.org/wiki/Goodness_of_fit for reference.  
   
 ydata,ymod,sd assumed to be Numpy arrays. deg integer.  
   
 Usage:  
 >>> chisq=redchisqg(ydata,ymod,n,sd)  
 where  
  ydata : data  
  ymod : model evaluated at the same x points as ydata  
  n : number of free parameters in the model  
  sd : uncertainties in ydata  
   
 Rodrigo Nemmen  
 http://goo.gl/8S1Oo  
       """  
      # Chi-square statistic  
      if sd==None:  
           chisq=numpy.sum((ydata-ymod)**2)  
      else:  
           chisq=numpy.sum( ((ydata-ymod)/sd)**2 )  
             
      # Number of degrees of freedom assuming 2 free parameters  
      nu=ydata.size-1-deg  
        
      return chisq/nu