Monday, November 19, 2012

MacPorts update broke your ipython/matplotlib installation?

I recently updated my MacPorts installation, only to find out that it broke my ipython installation, even though my python environment is configured independently of MacPorts.

The problem showed up more or less like this: I run the command

ipython --pylab

and matplotlib fails to load, complaining about the missing library file libpng14.14.dylib with the message "image not found".

If you installed python, ipython and matplotlib following my tutorial (i.e. independently of the MacPorts python installation), you just need to reinstall matplotlib with pip:

pip uninstall matplotlib
pip install matplotlib

This will reconfigure matplotlib with the updated dependencies.

Friday, November 9, 2012

Matplotlib 1.2 and ipython 0.13.1 released

Just found out that there are new releases of ipython and matplotlib!

Some highlights of the matplotlib update are:

  • Python 3.x support
  • streamplot method (I have a friend who plots lots of streamlines from his MHD simulations and will find this interesting)
If your scientific python environment was installed following my tutorial, you can easily update these packages via pip using the commands:

pip install --upgrade matplotlib

pip install --upgrade ipython

Wednesday, August 29, 2012

Want to get Python 3 working with ipython, scipy and matplotlib on Mac OS X Lion or Mountain Lion?

My instructions on how to get Python working with iPython, Scipy and matplotlib on Lion and Mountain Lion work for Python 2.x. A reader of the blog asked whether they also work for Python 3.x.

Well, I tried to get Python 3.x working with iPython etc. on Mountain Lion following my previous tutorial, and it turns out that it does!

The Python binary I tested comes from the python32 package installed via MacPorts. The only change needed in my instructions is in step 2: replace /usr/bin/python there with the MacPorts Python 3 binary (python32).

And when running ipython, you need to replace ipython with ipython3.
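Concretely, the changed commands from the tutorial look like this (the MacPorts binary may be named python32 or python3.2 depending on your installation):

 # step 2 of the tutorial, using the MacPorts Python 3 binary instead of /usr/bin/python
 python3.2 virtualenv.py stdpy
 source stdpy/bin/activate

 # and the Python 3 flavor of ipython
 ipython3 --pylab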

Wednesday, August 15, 2012

New favorite source code editor

I have a new favorite source code editor: Sublime Text. Very nice interface with pleasant fonts, cool color choices for syntax highlighting, and tabs in the upper part of the window rather than on the left or right side (contrast this behavior with TextWrangler or TextMate).

Give it a try.

Downside: it will set you back 60 bucks if you fall in love with the editor and want to disable the sporadic window that pops up reminding you to get a license.


How to install a scientific Python environment on Mac OS X Mountain Lion

I updated my MacBook Pro to Mountain Lion (hereafter ML) and wanted to see whether my previous Python installation tutorial is also valid for ML. It turns out that most of the commands still work, but some of them need a little fine-tuning.

Please use this guide if you want to get a working scientific Python environment (Python + Scipy + iPython + Numpy + matplotlib) under Mac OS X Mountain Lion. The installation is based on the Python binary that comes by default with OS X (installed with Xcode).

Here are the steps:
  1. Install the requirements: Xcode which includes Python (via App Store), gfortran (via Macports), virtualenv, additional libraries required by matplotlib
  2. Create a python environment with virtualenv
  3. Install Numpy, Scipy, matplotlib, ipython with pip. Install readline with easy_install
  4. Create an alias in .profile or .bash_profile (depending on your shell) to run ipython
After these steps are completed, you will get a working Python environment for scientific analysis, visualization and statistics with Mac OS X Mountain Lion. 

1. How to get the requirements working

Xcode + Python

Launch the App Store and download Xcode. After it is installed, open it and go to Xcode's preferences. There, go to the Downloads tab, look for Command Line Tools and click Install. That should install the default OS X Python binary (these instructions were inspired by this post).

gfortran

In my case, I installed gfortran by first installing MacPorts and then installing GCC, which ships with gfortran:

 sudo port install gcc44  

To make gfortran visible to the system, I created a symbolic link in /usr/local/bin:

 cd /usr/local/bin/  
 sudo ln -s /opt/local/bin/gfortran-mp-4.4 gfortran  

virtualenv

I went to the web page that hosts virtualenv and downloaded virtualenv.py; you will use it below.

Additional libraries required by matplotlib (optional)

I use the graphical backend TkAgg, which requires the following additional libraries for matplotlib to work: tk, freetype, libpng. I installed them using macports:

sudo port install tk
sudo port install freetype
sudo port install libpng

2. Create a python environment with virtualenv

Create a directory stdpy (in my example) somewhere and issue the command

 /usr/bin/python virtualenv.py stdpy  

to create an isolated python environment based on the python provided by default with Mac OS X. This avoids trouble with mixing libraries. Activate the environment by running

 source stdpy/bin/activate  

You should now see (stdpy) showing up in your terminal prompt.
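A quick sanity check that the environment is really active is to ask the shell which python it finds; it should point inside the stdpy directory you just created:

 which python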

3. Install Numpy, Scipy, matplotlib, ipython with pip and readline with easy_install

After activating the python environment, let's proceed and install the additional modules with pip and easy_install:

pip install numpy
pip install git+https://github.com/scipy/scipy#egg=scipy-dev
pip install git+https://github.com/matplotlib/matplotlib.git#egg=matplotlib-dev
easy_install readline
pip install ipython  

The reason why simply issuing the commands "pip install scipy" and "pip install matplotlib" does not work is explained in this blog post.

You may need to install additional libraries in order to get matplotlib compiled, depending on the kind of graphical backend that you choose. In my case, I use TkAgg which depends on Tk, freetype and libpng libraries which I installed via macports.

4. Create an alias in .profile or .bash_profile (depending on your shell) to run ipython

In my case I use Bash and I added the following line to the file .bash_profile in my home directory:

 alias ipy='source ~/stdpy/bin/activate && ipython --pylab'   

Now, when I open the terminal and issue the command

 ipy  

it will automatically activate the python environment and run ipython.


Let me know if these instructions work well for you.

Changelog:

  • Aug. 18 2012: added matplotlib compilation requirements
  • Sep. 1st 2012: made explanation about matplotlib dependencies clearer (hopefully)

Thursday, July 19, 2012

How to switch from IDL to Python

There is a new DIY tutorial on how to switch from IDL to Python hosted by AstroBetter. It is especially useful if you are an IDL programmer and want to grasp the basic concepts of python.

Go check it out.

Tuesday, June 12, 2012

Parallel computing in Python for the masses


Typical scenario: you wrote a python routine that does some kind of time-consuming computation. Then you think: wow, my computer has N cores but my program is using only one of them at a time. What a waste of computing resources. Is there a reasonably easy way of modifying my code to make it exploit all the cores of my multicore machine?

The answer is yes, and there are different ways of doing it; how much work it takes depends on how complex your code is and on which method you choose to parallelize your computation.

I will talk here about one relatively easy way of speeding up your code using the multiprocessing python package. I should mention that there are many other options out there, but the multiprocessing package is part of the standard library and comes with any python distribution by default.

I am assuming that you really need to make your code parallel. You will have to stop and spend time thinking about how to break your computation into smaller parts that can be sent to the different cores. And I should mention that debugging parallel code is obviously harder than debugging serial code.

Parallelization is one way of optimizing your code. Other options for optimizing your code are Cython or f2py. Both approaches may yield a >10x speedup and are worth exploring depending on your situation, but both involve using the C or Fortran languages along with your python code.

The ideal case is when your problem is "embarrassingly parallel". What I mean by this is that your problem can be parallelized in a reasonably easy way, since the computations that correspond to the bottleneck of the code can be carried out independently and do not need to communicate with each other. Examples:

  • You have a "grid" of parameters that you need to pass to a time-consuming model (e.g., a 1000x1000 matrix with the values of two parameters). Your model needs to evaluate those parameters and provide some output.
  • Your code performs a Monte Carlo simulation with 100000 trials which are carried out in a loop. You can then easily "dismember" this loop and send it to be computed independently by the cores in your machine.

Instead of giving code examples myself, I will point out the material I used to learn parallelization. I learned the basics of parallel programming by reading the excellent tutorial "introduction to parallel programming" written by Blaise Barney.

The next step was learning how to use the multiprocessing package. I learned this with the examples posted in the AstroBetter blog. I began by reading the example implemented with the pprocess package. The caveat here is that 'pprocess' is a non-standard package. The multiprocessing package which comes with python should be used instead. Somebody posted the original example discussed in the blog ported to the multiprocessing package.

As the posts above explain, the basic idea behind using 'multiprocessing' is to use the parallel map method to evaluate your time-consuming function using the many cores in your machine. Once you figure out a way of expressing your calculation in terms of the 'map' method, the rest is easy.
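That said, just to make the idea concrete, here is a minimal sketch of the pattern (the function expensive_model and the parameter list are placeholders of my own, not code from the posts above):

 import multiprocessing

 def expensive_model(p):
      # placeholder for the time-consuming computation done on one parameter value
      return p**2

 if __name__ == '__main__':
      # hypothetical "grid" of parameters, each element processed independently
      parameters = range(10000)

      # one worker process per core; the parallel map evaluates the function on all of them
      pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
      results = pool.map(expensive_model, parameters)

      # do not forget to shut down the workers after the computation is done
      pool.close()
      pool.join()

Note that expensive_model is a plain top-level function rather than a lambda, which is exactly the point of item 2 in the list below.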

In my experience doing parallel programming in python using 'multiprocessing' I learned a few things which I want to share:

  1. Do not forget to close the parallel engine with the close() method after your computation is done! If you do not do this, you will end up leaving a lot of orphan processes which can quickly consume the available memory in your machine.
  2. Avoid at all costs using lambda functions as the function you pass to the parallel 'map'! Trust me, multiprocessing does not play well with lambda constructs: it relies on pickling to ship the work to the child processes, and lambdas cannot be pickled.
  3. Finally, as I mentioned before, parallelizing a code increases development time and the complexity of debugging. Only resort to parallelization if you really need it, i.e. if you think you will get a big speedup in your code's execution. For example, if your code takes 24 hours to execute and you get a 6x speedup by resorting to 'multiprocessing', the execution time drops to 4 hours, which is not bad.

Thursday, May 24, 2012

Hidden features of Python

I learned about this link with many useful hidden features of Python via Eduardo.

I particularly like:

  • the use of enumerate in loops: for i,x in enumerate(array)
  • decorators as a simple way of enhancing functions and methods: @mydecorator
  • one-line swapping of variables: a,b=b,a
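For instance, a quick toy illustration of these three features together (my own example, just to show the syntax):

 # enumerate gives you the index and the element at the same time
 array = ['a', 'b', 'c']
 for i, x in enumerate(array):
      print(i, x)

 # a decorator that wraps a function to print a message whenever it is called
 def announce(func):
      def wrapper(*args, **kwargs):
           print('calling ' + func.__name__)
           return func(*args, **kwargs)
      return wrapper

 @announce
 def add(a, b):
      return a + b

 add(1, 2)

 # one-line swapping of two variables
 a, b = 1, 2
 a, b = b, a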

Wednesday, May 9, 2012

Parallel programming with Python: coming soon

Coming soon here: parallel programming in python for the masses!

I will write a tutorial as soon as I have time from my work duties.

Stay tuned.

Thursday, April 26, 2012

Using git to manage source code and more

Recently I learned how to use Git to manage source code (thanks to this guy). Let me tell you, it is such a fantastic tool! Especially when you have thousands of lines of source code constantly evolving and you need to keep track of what changed.

In my case, I have been using it to manage the source code I wrote for my different scientific projects. And I will soon begin using it even to manage the writing of one paper.

Let me list the tutorials that I read and that were very useful in getting me started quickly:

  • Git Magic: I began learning git with this one. It goes straight to the point and illustrates the most important commands.
  • Pro Git: need more detailed information and have more time to spend learning git? Have a look at this one.

Quick reference for git:
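A handful of everyday commands covers most routine use (see the tutorials above for the details):

 git init                 # start tracking a project
 git status               # see what has changed
 git add file.py          # stage a file for the next commit
 git commit -m "message"  # record a snapshot of the staged changes
 git log                  # browse the history
 git clone <url>          # copy an existing repository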

I use SourceTree, a Git GUI for the Mac, to check the evolution of the source code.



Changelog
May 24th 2012: replaced suggestion of GUI GitX -> SourceTree.

Wednesday, April 11, 2012

How to easily do error propagation with Python

You can easily do error propagation using the uncertainties package in Python, without having to derive the propagated error analytically or run Monte Carlo simulations.

Example: Suppose you have two arrays x and y which have associated uncertainties errx and erry. Using x and y, you calculate some function z = f(x, y) which can be arbitrarily complicated (but can be expressed in an analytical form) and you want to estimate the resulting uncertainty errz in z from errx and erry.

The script below

  • defines the arrays x, y, errx and erry using numpy
  • defines a function z=log10(x+y^2) for illustration purposes
  • demonstrates how to invoke 'uncertainties' in order to estimate the uncertainty in z from errx and erry (i.e., errz = f(errx, erry) )


Requirements: numpy, the uncertainties package and my nemmen module, which provides the random_normal method defined in the post below.

 import uncertainties as unc  
 import uncertainties.unumpy as unumpy  
 import numpy  
 import nemmen  
   
 # Defines x and y  
 x=numpy.linspace(0,10,50)  
 y=numpy.linspace(15,20,50)  
   
 # Defines the error arrays, values follow a normal distribution  
 # (method random_normal defined in http://astropython.blogspot.com/2012/04/how-to-generate-array-of-random-numbers.html)  
 errx=nemmen.random_normal(0.1,0.2,50);     errx=numpy.abs(errx)  
 erry=nemmen.random_normal(0.3,0.2,50);     erry=numpy.abs(erry)  
   
 # Defines special arrays holding the values *and* errors  
 x=unumpy.uarray(( x, errx ))  
 y=unumpy.uarray(( y, erry ))  
   
 """  
 Now any operation that you carry out on x and y will  
 automatically propagate the associated errors, as long  
 as you use the methods provided with uncertainties.unumpy  
 instead of using the numpy methods.  
   
 Let's for instance define z as   
 z = log10(x+y**2)  
 and estimate errz.  
 """  
 z=unumpy.log10(x+y**2)  
   
 # Print the propagated error errz  
 errz=unumpy.std_devs(z)  
 print errz  

Update Oct. 23rd 2014: code snippet available on github.

How to generate an array of random numbers following a normal distribution with arbitrary mean and standard deviation

Here is a recipe to generate an array of random numbers following a normal distribution with the supplied mean and standard deviation in Python:

 def random_normal(mean,std,n):  
      """  
 Returns an array of n elements of random variables, following a normal   
 distribution with the supplied mean and standard deviation.  
      """  
      import scipy  
      return std*scipy.random.standard_normal(n)+mean  
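For example, a quick sanity check of the recipe (the exact numbers will vary from run to run since the values are random):

 r = random_normal(5.0, 2.0, 100000)
 print(r.mean(), r.std())   # should come out close to 5.0 and 2.0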

Update Oct. 23rd 2014: code snippet available on github

Friday, March 23, 2012

BCES code ported to Python!

Just ported the good old BCES linear regression algorithm (the four methods described in the Akritas & Bershady 1996 paper), originally written in Fortran 77, to Python!!

I am testing the python code and making sure it produces exactly the same answers as the old code.

More news soon...

Update (March 26th 2012): implemented the bootstrapping BCES!

Update (May 19th 2014): the BCES python code is now available as a Github project. Feel free to download it and even contribute improvements.

Thursday, March 22, 2012

Simple progress bar in terminal

If you need to incorporate a simple progress bar in your code, there is a module that does that and is very easy to use: fish.

The following script illustrates how to implement a simple progress bar which advances each time a loop counter increases.

 import fish  
 import time  
   
 steps=input('How many steps? ')  
   
 # Progress bar initialization  
 peixe = fish.ProgressFish(total=steps)       
   
 for i in range(steps):  
      # Progress bar  
      peixe.animate(amount=i)  
        
      time.sleep(0.1)  

Here is a screenshot of what the progress bar looks like in action:


Wednesday, March 7, 2012

How to install a scientific Python environment on Mac OS X Lion

My Mac OS X workstation was just updated to Lion and I had to reinstall Python and associated scientific tools for plotting, statistics etc. What a pain.

After many trial-and-error procedures I finally found a way to get a scientific Python environment (Python + Scipy + iPython + Numpy + matplotlib) working correctly on Mac OS X Lion. I am reporting the steps I carried out hoping that it will help other people.

You will need a Python installation (in my example I use the one that comes by default with OS X), gfortran and Xcode. Here are the steps:
  1. Install the requirements: Xcode which includes Python (via App Store), gfortran (via Macports), virtualenv, additional libraries required by matplotlib
  2. Create a python environment with virtualenv
  3. Install Numpy, Scipy, matplotlib, ipython with pip. Install readline with easy_install
  4. Create an alias in .profile or .bash_profile (depending on your shell) to run ipython
After these steps are completed, you will get a working Python environment for scientific analysis, visualization and statistics with Mac OS X Lion. 

Requirements
  1. Xcode
  2. Python 2.7, which comes pre-installed by default with OS X
  3. gfortran
  4. virtualenv
  5. additional libraries required by matplotlib (optional)

1. How to get the requirements working

gfortran

In my case, I installed it by first installing MacPorts and then installing GCC, which ships with gfortran:

 sudo port install gcc44  

To make gfortran visible to the system, I created a symbolic link in /usr/local/bin:

 cd /usr/local/bin/  
 sudo ln -s /opt/local/bin/gfortran-mp-4.4 gfortran  

virtualenv

I went to the web page that hosts virtualenv and downloaded virtualenv.py; you will use it below.


Additional libraries required by matplotlib (optional)

I use the graphical backend TkAgg, which requires the following additional libraries for matplotlib to work: tk, freetype, libpng. I installed them using macports:

sudo port install tk
sudo port install freetype
sudo port install libpng


2. Create a python environment with virtualenv

Create a directory stdpy (in my example) somewhere and issue the command

 /usr/bin/python virtualenv.py stdpy  

to create an isolated python environment based on the python provided by default with Mac OS X. This avoids trouble with mixing libraries. Activate the environment by running

 source stdpy/bin/activate  

You should now see (stdpy) showing up in your terminal prompt.

3. Install Numpy, Scipy, matplotlib, ipython with pip and readline with easy_install

After activating the python environment, let's proceed and install the additional modules with pip and easy_install:

 pip install numpy  
 pip install scipy  
 pip install matplotlib  
 pip install ipython  
 easy_install readline  

You may need to install additional libraries in order to get matplotlib compiled, depending on the kind of graphical backend that you choose. In my case, I use TkAgg which depends on Tk, freetype and libpng libraries which I installed via macports.

4. Create an alias in .profile or .bash_profile (depending on your shell) to run ipython

In my case I use Bash and I added the following line to the file .bash_profile in my home directory:

 alias ipy='source ~/stdpy/bin/activate && ipython --pylab'   

Now, when I open the terminal and issue the command

 ipy  

it will automatically activate the python environment and run ipython.





Changelog:

  • Aug. 18th 2012: added instructions about additional libraries in matplotlib
  • Sep. 1st 2012: made explanation about matplotlib dependencies clearer (hopefully)

Thursday, March 1, 2012

How to make Homebrew's Python the default version

I am trying out Homebrew and the Python that comes with it. I had the problem that when I tried to run Homebrew's python in the shell, I ended up running the outdated python that ships with OS X instead.

Here is how to make sure that your Mac uses the Homebrew python instead of the version shipped with OS X.

I am assuming you are using bash. To find out which shell you are using, type:
 echo $SHELL  
I am also assuming you installed homebrew in /usr/local (the default installation directory).

Now, edit the file .bash_profile (e.g. using nano) in your home directory and add the following line at the end of the file:

 # Homebrew Python instead of Apple's  
 export PATH=/usr/local/bin:/usr/local/share/python:$PATH  

Now restart your terminal and there you go!
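To confirm that the shell now picks up Homebrew's python first (assuming Homebrew installed it under /usr/local/bin), you can check which binary is found:

 which python

It should print /usr/local/bin/python instead of /usr/bin/python.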

Friday, February 24, 2012

Plots with several histograms

Creating a plot with two histograms

Here is a method that you can use to plot two histograms in the same figure sharing the same X-axis, keeping some distance between the histograms:

 import pylab

 def twohists(x1,x2,xmin,xmax,x1leg='$x_1$',x2leg='$x_2$',xlabel='',sharey=False):
      """  
 Script that plots two histograms of quantities x1 and x2.   
   
 Arguments:  
 - x1,x2: arrays with data to be plotted  
 - xmin,xmax: lower and upper range of plotted values, will be used to set a consistent x-range  
      for both histograms.  
 - x1leg, x2leg: legends for each histogram       
 - xlabel: self-explanatory.  
   
 Inspired by http://www.scipy.org/Cookbook/Matplotlib/Multiple_Subplots_with_One_Axis_Label.  
   
 Rodrigo Nemmen  
 v1 Dec. 2011  
 v1.1 Feb. 2012: Added sharey argument.  
      """  
   
      pylab.clf()  
      pylab.rcParams.update({'font.size': 15})  
      fig=pylab.figure()  
        
      a=fig.add_subplot(2,1,1)  
      if sharey==True:  
           b=fig.add_subplot(2,1,2, sharex=a, sharey=a)  
      else:  
           b=fig.add_subplot(2,1,2, sharex=a)  
        
      a.hist(x1,label=x1leg,color='b')  
      a.legend(loc='best',frameon=False)  
      a.set_xlim(xmin,xmax)  
        
      b.hist(x2,label=x2leg,color='r')  
      b.legend(loc='best',frameon=False)  
        
      pylab.setp(a.get_xticklabels(), visible=False)  
   
      b.set_xlabel(xlabel)  
      b.set_ylabel('Number',verticalalignment='bottom')  
      pylab.minorticks_on()  
      pylab.subplots_adjust(hspace=0.15)  
      pylab.draw()  

... and here is an example script that uses the method above:

 """  
 Illustrates how to use the twohists method.  
 """  
 import nemmen  
 import scipy, pylab  
   
 # Generates a normal distribution  
 x1=scipy.random.standard_normal(100)  
   
 # Generates a uniform random distribution  
 x2=scipy.random.uniform(-3,3,100)  
   
 nemmen.twohists(x1,x2,-3,3,'Normal','Uniform')  
   
 pylab.show()  

... to create the following plot:



Creating a plot with three histograms


I also wrote a recipe that makes a plot with three histograms:

 import pylab

 def threehists(x1,x2,x3,xmin,xmax,x1leg='$x_1$',x2leg='$x_2$',x3leg='$x_3$',xlabel='',sharey=False):
      """  
 Script that plots three histograms of quantities x1, x2 and x3.   
   
 Arguments:  
 - x1,x2,x3: arrays with data to be plotted  
 - xmin,xmax: lower and upper range of plotted values, will be used to set a consistent x-range  
      for both histograms.  
 - x1leg, x2leg, x3leg: legends for each histogram       
 - xlabel: self-explanatory.  
 - sharey: sharing the Y-axis among the histograms?  
   
 Example:  
 x1=Lbol(AD), x2=Lbol(JD), x3=Lbol(EHF10)  
 >>> threehists(x1,x2,x3,38,44,'AD','JD','EHF10','$\log L_{\\rm bol}$ (erg s$^{-1}$)',sharey=True)  
   
 Inspired by http://www.scipy.org/Cookbook/Matplotlib/Multiple_Subplots_with_One_Axis_Label.  
   
 Rodrigo Nemmen  
 v1 Dec. 2011  
 v1.1 Feb. 2012:     Added sharey keyword.  
      """  
   
      pylab.clf()  
      pylab.rcParams.update({'font.size': 15})  
      fig=pylab.figure()  
        
      a=fig.add_subplot(3,1,1)  
      if sharey==True:  
           b=fig.add_subplot(3,1,2, sharex=a, sharey=a)  
           c=fig.add_subplot(3,1,3, sharex=a, sharey=a)  
      else:  
           b=fig.add_subplot(3,1,2, sharex=a)  
           c=fig.add_subplot(3,1,3, sharex=a)            
        
      a.hist(x1,label=x1leg,color='b')  
      a.legend(loc='best',frameon=False)  
      a.set_xlim(xmin,xmax)  
        
      b.hist(x2,label=x2leg,color='r')  
      b.legend(loc='best',frameon=False)  
   
      c.hist(x3,label=x3leg,color='y')  
      c.legend(loc='best',frameon=False)  
        
      pylab.setp(a.get_xticklabels(), visible=False)  
      pylab.setp(b.get_xticklabels(), visible=False)  
   
      c.set_xlabel(xlabel)  
      b.set_ylabel('Number')  
      pylab.minorticks_on()  
      pylab.subplots_adjust(hspace=0.15)  
      pylab.draw()  

... and as before, here is a script that illustrates how to use the above method:

 """  
 Illustrates how to use the threehists method.  
 """  
 import nemmen  
 import scipy, pylab  
   
 # Generates a normal distribution  
 x1=scipy.random.standard_normal(100)  
   
 # Generates a uniform random distribution  
 x2=scipy.random.uniform(-3,3,100)  
   
 x3=scipy.random.standard_normal(1000)  
   
 nemmen.threehists(x1,x2,x3,-3,3,'Normal ($n=100$)','Uniform','Normal ($n=1000$)')  
   
 pylab.show()  

... creating this plot:



Wednesday, February 22, 2012

Plotting pretty histograms: stay tuned

Do you want to learn how to plot a pretty histogram like this one?


As soon as I have time I will post a sample script that plots it. Stay tuned!

Thursday, February 9, 2012

Additional color names for matplotlib plots

When creating plots with matplotlib, we usually use the "default" color names:
  • b : blue
  • g : green
  • r : red
  • c : cyan
  • m : magenta
  • y : yellow
  • k : black
  • w : white
It turns out that matplotlib accepts not only these default color names, but the full range of HTML color names! So, for example, you can plot some data like this:

 pylab.plot(x, y, 'DarkOrange')  

and get a "dark orange" color.

This is an easy way of specifying colors in addition to the standard ones.
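Equivalently, you can pass the name through the color keyword, which is a bit more explicit. A small self-contained sketch:

 import numpy, pylab

 x = numpy.linspace(0, 10, 100)
 y = numpy.sin(x)

 # any HTML color name works here
 pylab.plot(x, y, color='DarkOrange')
 pylab.show()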

Wednesday, February 8, 2012

How to begin learning Python


Many (perhaps most) people who want to learn Python get confused by the overwhelming number of reference sources available. Where to start? So many options!

Motivated by this, I list in this post the references that I used to learn Python (and object-oriented programming as well), which can serve as a starting point for other people. This post is biased towards scientists interested in learning Python.

Beginner material

I learned the basic syntax and capabilities of the language with the official Python tutorial. You can download all of it as PDF files. I suggest this for people with previous programming experience. For absolute beginners, have a look at the Think Python book below.

Introductory lecture about Python, its syntax and science applications. It shows what Python is capable of for data analysis and plotting. Inspiring. The audio is also available for download as an MP3 file.

Tutorial on using Python for data analysis! How to replace IDL/Matlab with Python. Includes: plotting, FITS files, signal processing.

I learned object-oriented programming using this material. Very clear and "application-oriented" approach. You don't need to be a biologist to understand this.

Longer introduction for people with no previous extensive programming experience.

Quick reference

This is a cheat sheet with the basic commands needed for data analysis, array processing and plotting.

Migrating from IDL/Matlab to Python.

If you are going to do serious stuff with Python, I suggest using the enhanced interactive Python terminal IPython.

Longer introductory books



Longer reference books


Note: this post is a revised version of the text originally posted here.

Friday, February 3, 2012

Computing the chi-squared and reduced chi-squared of a model

Here are two methods for computing the chi-squared of a model compared to some data. Very useful for judging the goodness-of-fit of a model.

Source code for method that returns the chi-squared:
 import numpy

 def chisqg(ydata,ymod,sd=None):
      """  
 Returns the chi-square error statistic as the sum of squared errors between  
 Ydata(i) and Ymodel(i). If individual standard deviations (array sd) are supplied,   
 then the chi-square error statistic is computed as the sum of squared errors  
 divided by the standard deviations. Inspired by the IDL procedure linfit.pro.  
 See http://en.wikipedia.org/wiki/Goodness_of_fit for reference.  
   
 ydata, ymod, sd assumed to be Numpy arrays.  
 Returns the float chisq with the chi-square statistic.  
   
 Rodrigo Nemmen  
 http://goo.gl/8S1Oo  
      """  
      # Chi-square statistic (Bevington, eq. 6.9)  
      if sd is None:  
           chisq=numpy.sum((ydata-ymod)**2)  
      else:  
           chisq=numpy.sum( ((ydata-ymod)/sd)**2 )  
        
      return chisq  




Source code for method that returns the reduced chi-squared of a model. You need to provide the number of free parameters of the model as input to the method.

 import numpy

 def redchisqg(ydata,ymod,deg=2,sd=None):
      """  
 Returns the reduced chi-square error statistic for an arbitrary model,   
 chisq/nu, where nu is the number of degrees of freedom. If individual   
 standard deviations (array sd) are supplied, then the chi-square error   
 statistic is computed as the sum of squared errors divided by the standard   
 deviations. See http://en.wikipedia.org/wiki/Goodness_of_fit for reference.  
   
 ydata,ymod,sd assumed to be Numpy arrays. deg integer.  
   
 Usage:  
 >>> chisq=redchisqg(ydata,ymod,n,sd)  
 where  
  ydata : data  
  ymod : model evaluated at the same x points as ydata  
  n : number of free parameters in the model  
  sd : uncertainties in ydata  
   
 Rodrigo Nemmen  
 http://goo.gl/8S1Oo  
       """  
      # Chi-square statistic  
      if sd is None:  
           chisq=numpy.sum((ydata-ymod)**2)  
      else:  
           chisq=numpy.sum( ((ydata-ymod)/sd)**2 )  
             
      # Number of degrees of freedom: number of data points minus 1 minus the number of free parameters  
      nu=ydata.size-1-deg  
        
      return chisq/nu       
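As an illustration, here is a short made-up example of how the two methods could be called, assuming you paste both functions into the same script; the straight-line model and the noise level below are just placeholders:

 import numpy

 # fake data: a straight line plus gaussian noise with known standard deviation
 x = numpy.linspace(0., 10., 50)
 sd = 0.5*numpy.ones(50)
 ydata = 2.*x + 1. + numpy.random.normal(0., 0.5, 50)

 # model evaluated at the same x points (2 free parameters: slope and intercept)
 ymod = 2.*x + 1.

 print(chisqg(ydata, ymod, sd))
 print(redchisqg(ydata, ymod, deg=2, sd=sd))   # should be of order 1 for a good fit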

Friday, January 13, 2012

Patched BCES fit code in Fortran

I have been using the BCES fitting method of Akritas & Bershady (1996) to perform linear regression on data that has uncertainties on both X and Y (with bootstrapping). Unfortunately, the code made available by Akritas et al. has a few bugs which prevented me from getting it compiled with gfortran on my Mac.

I corrected those bugs in the fortran code and now I am able to compile and run the BCES routine without problems.

If you are interested in the patched BCES fortran routine, let me know and I will be glad to share it with you.