Friday, June 24, 2011

Starting a Python script intelligently

Under Linux (or any Unix-style operating system), it is easy to make a Python script executable. You simply add a "shebang" line at the top of the script:
#! /usr/bin/python
and make the script executable:
chmod +x myscript
But real life is sometimes more complicated than this.
  1. The Python interpreter might be somewhere else in the user's $PATH, for example /usr/local/bin/python or $HOME/bin/python.
  2. There might be multiple Python interpreters and you might want to select one intelligently (e.g., use /usr/bin/python2.6 if it is installed, otherwise fall back to /usr/bin/python2.5, but if neither is available then emit a warning and abort).
  3. Other arbitrary work might have to be done before starting the Python interpreter.
So I was looking for a way to use a small blurb of shell code to intelligently start a Python script. Of course one could write a shell script that invokes a separate Python script, but for convenience, I wanted everything in a single file. And I wanted the bulk of the script to look like plain old Python code, not like a giant string that requires extra quoting. Google failed me, so I put together the following solution:
#! /bin/sh
# -*- mode: python; coding: utf-8 -*-

# This file is used as both a shell script and as a Python script.

""":"
# This part is run by the shell.  It looks for an appropriate Python
# interpreter then uses it to re-exec this script.

if test -x /usr/bin/python2.6
then
  PYTHON=/usr/bin/python2.6
elif test -x /usr/bin/python2.5
then
  PYTHON=/usr/bin/python2.5
else
  echo "No usable Python interpreter was found!" >&2
  exit 1
fi

exec $PYTHON "$0" "$@"
" """

# The rest of the file is run by the Python interpreter.
__doc__ = """This string is treated as the module docstring."""

print "Hello world!"
When this script is run:
  • The first line is a shebang that causes the script to be interpreted using /bin/sh; i.e., as a standard shell script.
  • The shell interpreter ignores the second line as a comment.
  • The line """:" is interpreted as the shell command : (yes, the colon character is the name of a shell command). To the shell, "" is an empty string and ":" is a quoted colon, and the two adjacent quoted pieces join into the single word :. The colon command does nothing, so interpretation continues on the next line.
  • The following lines are interpreted, one by one, as shell code. When interpretation reaches the line exec $PYTHON "$0" "$@", the shell process is replaced by the selected Python interpreter, which is started on the same script with the original command-line arguments. Because the shell reads and executes commands one at a time, and exec does not return when it succeeds, the shell never parses the rest of the script, so it does not matter that the remainder is not valid shell syntax.
  • The Python interpreter starts executing the script. It ignores the shebang line, which just looks like a comment.
  • It interprets the second line as an encoding declaration, as per PEP 263. It wouldn't be wise to pick a funky encoding, but utf-8 should be safe. This line also tells Emacs to edit the file in Python mode even though it has a shell shebang line.
  • The lines between """:" and " """ are seen as a Python multiline string and ignored. (Actually, they are used as the module docstring, but we will overwrite that in a moment.) The main constraint is that the string cannot contain three consecutive double quotation marks. Backslashes in the shell code are also read by Python as string escapes, so a malformed escape such as an incomplete \x would be rejected by Python.
  • The line __doc__ = """...""" overwrites the module docstring. This is where you should document the module.
  • The rest of the file can be arbitrary Python code. This is where you put your main Python script.
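The Python half of this walkthrough can be checked without running the shell at all. The following is a minimal sketch, assuming a current Python 3 with the standard ast module; the name POLYGLOT and the shortened one-line shell section are made up for illustration and are not part of the original script.

```python
import ast

# A trimmed-down copy of the dual-purpose file. The shell section is
# shortened to a single exec line (an illustrative assumption).
POLYGLOT = '''#! /bin/sh

""":"
exec python "$0" "$@"
" """

__doc__ = """This string is treated as the module docstring."""
'''

tree = ast.parse(POLYGLOT)

# The shell section is the first statement in the module, so Python
# records it as the initial module docstring.
shell_part = ast.get_docstring(tree, clean=False)

# Executing the module body shows the later assignment replacing it.
namespace = {}
exec(compile(tree, "<polyglot>", "exec"), namespace)
final_doc = namespace["__doc__"]

print(repr(shell_part))
print(repr(final_doc))
```

The first value printed is the raw shell text, which is what Python would otherwise keep as the docstring; the second is the replacement set by the __doc__ assignment.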
This is the tidiest way that I could come up with for making a single file that is both valid shell code and valid Python code. If you know of a better way, please leave a comment.
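To try the technique on a machine that lacks the hard-coded interpreter paths, something like the following sketch may help. It writes a cut-down version of the dual-purpose file to a temporary file, runs it with sh, and lets the shell part re-exec the file under whichever Python is running the sketch. The PYTHON environment variable, the SCRIPT name and the sample arguments are assumptions made for this illustration, and it presumes a Unix-like system with sh on the PATH.

```python
import os
import subprocess
import sys
import tempfile

# Cut-down dual-purpose file. Instead of probing fixed paths, the shell
# part takes the interpreter from a PYTHON environment variable
# (an assumption made for this demonstration).
SCRIPT = '''#! /bin/sh
""":"
exec "$PYTHON" "$0" "$@"
" """
import sys
print("args: " + " ".join(sys.argv[1:]))
'''

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(SCRIPT)
    path = f.name

try:
    # Hand the current interpreter to the shell part via the environment.
    env = dict(os.environ, PYTHON=sys.executable)
    # Run the file as a shell script; the exec line switches to Python.
    output = subprocess.check_output(["sh", path, "a", "b c"], env=env)
finally:
    os.remove(path)

print(output.decode())
```

The arguments given to sh after the file name come out the other side in sys.argv, which is the job of "$0" "$@" on the exec line.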