This “post” is actually just the pyporter2 homepage, reformatted as a blog post. This is mainly so that the link doesn’t disappear on anybody.
This is an implementation of the Porter2 (English) stemming algorithm in Python. It was born out of some academic work I did on clustering algorithms in the spring of 2008. The Porter Stemming Algorithm was first published in this 1980 paper; it is now one of the most widely known and used stemming algorithms. A Python implementation of the original Porter stemmer already existed, but not one of the updated Porter2 stemmer, so I decided to implement a Python version of Porter2 as an exercise.
Python bindings for the official C version of the Porter2 stemmer exist here. If those bindings are an option, they will probably be much more efficient than the pure Python implementation here. pyporter2 is useful when the C bindings are not an option (like in Jython, IronPython, Babble or App Engine).
$ git clone git://github.com/mdirolf/pyporter2.git
The new API matches that of PyStemmer. Here is an example of how to use pyporter2:
>>> import Stemmer
>>> print Stemmer.algorithms()
['english']
>>> stemmer = Stemmer.Stemmer('english')
>>> print stemmer.stemWord('cycling')
cycl
>>> print stemmer.stemWords(['cycling', 'cyclist'])
['cycl', 'cyclist']
>>> print stemmer.stemWords(['cycling', u'cyclist'])
['cycl', u'cyclist']
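Since stemWords takes a list of words and returns a list of stems in the same order, one thing you might do with it is bucket words by their stem. Here is a rough sketch of that idea. The group_by_stem helper is my own illustrative name and is not part of pyporter2 or PyStemmer; it accepts any callable that maps a list of words to a list of stems, so it does not depend on which of the two libraries is installed.

```python
from collections import defaultdict


def group_by_stem(words, stem_words):
    """Group words that share a stem.

    group_by_stem is a hypothetical helper, not part of pyporter2.
    stem_words is any callable that maps a list of words to a list of
    stems in the same order, for example the stemWords method shown above.
    """
    groups = defaultdict(list)
    # Pair each input word with the stem returned for it.
    for word, stem in zip(words, stem_words(words)):
        groups[stem].append(word)
    return dict(groups)


# Usage with pyporter2 (assumes Stemmer.py is importable):
#   import Stemmer
#   stemmer = Stemmer.Stemmer('english')
#   groups = group_by_stem(['cycling', 'cyclist'], stemmer.stemWords)
```

Passing the stemming function in as an argument keeps the helper independent of the backend, so the same call should work with either pyporter2 or the C bindings.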
pyporter2 includes a test suite written using unittest. To run the tests, do:
$ python Stemmer.py
Feel free to leave a comment with any questions. It’d also be cool to let me know if you find pyporter2 useful for anything.