Python recipe: Functional config file parsing
12 December 2009
Sometimes one has to parse programmatically some file containing
key=value
pairs. In the world of systems administration this means
configuration files most of the time. Also, one thing I like is
functional programming, but in the real world one ends up making
almost all of the code in imperative style. Python allows some
functional constructs, and sometimes I like to use them to make code
most concise, because it express better what the code is trying to do or
just because I wanted to melt my mind doing some functional tricks.
First, let me introduce the code:
from itertools import imap, ifilter
config_items = lambda iterable:
imap(lambda (k, v): (k.strip(), v.strip()),
imap(lambda s: s.split("=", 1),
ifilter(lambda s: s and not s.startswith("#"),
imap(lambda s: s.strip(), iterable))))
Neat, huh? As promised in the title, this is functional. And yes, I am aware of ConfigParser, but I do not need its full power, and also I have found some problems with files containing Unicode strings.
I think this is one of the most beautiful snippets of code I have ever written in Python: it makes just one thing well, and it is terse and concise. Moreover, it is quite easy to explain.
How does it work
I have just written that it is easy to explain how this works. Okay, I
will dissect this beast one line at a time, starting at the innermost.
But first, a quick introduction to imap()
and ifilter
:
imap()
: Works likemap()
, which returns a list whose contents are the results of applying a function (first argument) to each of the elements of another list (the second argument). The difference is thatimap()
uses generators instead.ifilter()
: This one works likefilter()
and will also return a list, whose contents are the items of another list (second argument) for which the result of calling the given function (first argument) isTrue
. This one also works with generators.
Now, let us start with the first one:
imap(lambda s: s.strip(), iterable)
This picks each line, and removes whitespace sitting at the left and and the right of the string.
ifilter(lambda s: s and not s.startswith("#"),
We want to keep interesting lines: empty lines and comment-lines
starting with a hash mark (#
) must be thrashed away. We check for
lines which both are not empty and that do not start with a hash-mark.
imap(lambda s: s.split("=", 1),
That one picks each string and splits it at the first =
character,
thus separating the key from the value. This is what converts each
string into a (key, value)
tuple.
imap(lambda (k, v): (k.strip(), v.strip()),
This is the last remaining detail: Removes extra leading and trailing
whitespace from the keys and values of the generated tuples. This is
needed for removing the spaces around the =
character.
How to use it
Fire in the interpreter, type in (or copy-and-paste) the above code and guess by yourself:
>>> text = """a = 1
... b = this is b"""
>>> tuple(config_items(text.splitlines()))
(('a', '1'), ('b', 'this is b'))
>>> dict(config_items(text.splitlines()))
{'a': '1', 'b': 'this is b'}
>>>
So you pass it a list an iterable which yields lines, and it will
return another iterable, which yields (key, value)
tuples. Thanks to
how dict()
is defined, we can directly pass the result to it and get a
dictionary.
But it would be useful as well to use it on files, so here we go:
>>> file("test.conf", "w").write(text)
>>> dict(config_items(file("test.conf")))
{'a': '1', 'b': 'this is B'}
>>>
For your convenience, you may want to define a helper function if it makes you feel more comfortable:
>>> def config_file_items(path):
... with file(path, "rU") as f:
... return config_items(f)
...
>>> dict(config_file_items("test.conf"))
{'a': '1', 'b': 'this is B'}
>>>
Extra niceties
I have already mentioned that this code uses generators in its
entirety. What is passed from one function to another in the chain of
imap()
and ifilter()
calls are always generators. This means that
if config_items()
is used to read a big file (e.g. some hundreds of
megabytes) only one line is in memory at a given time. This is why I
did not use map()
and filter()
but their “incremental” counterparts
from the itertools
module. So the bottom line is that this may not be
the most efficient implementation out there, but it is good and is
capable of working over arbitrarily long sequences of data while the
function remains small and understandable.
Error Handling
Whenever the input is not well formed, then this function will raise
ValueError
, or when a =
character is not found in some line. This
means that you can do something like this:
import sys
try:
items = dict(config_items(sys.stdin))
except ValueError:
raise SystemExit("Malformed 'key=value' input in standard input")
Of course more elaborate error checking could be done i.e. to be able of showing to the user the exact offending line number, but the goal is to keep things as simple as possible. Also the syntax of those simple configuration files is so simple that it should be fairy simple for the user to spot typos.
Final words and advice
My advice is that if you have the possibility, make your Python code in such a way that it uses generators, unless you are sure that it will always handle reasonably small amounts of data.
I hope that things are explained well enough, and (who knows!) maybe this can help someone to better understand why generators are a good idea. I will also be happy if you came here looking for some code to parse simple configuration files and this did the trick for you.