Python 2 versus Python 3

The biggest difference between Python 2 and Python 3 is in their string handling, and this is particularly relevant to Patsy since it parses user input. We follow a simple rule: input to Patsy should always be of type str. That means that on Python 2, you should pass byte-strings (not unicode), and on Python 3, you should pass unicode strings (not byte-strings). Similarly, when Patsy passes text back (e.g. DesignInfo.column_names), it’s always in the form of a str.
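As a rough illustration of this rule, the sketch below checks that a formula is of the native str type before it is used. The helper name require_native_str is hypothetical and is not part of Patsy; it only shows what "always of type str" means on either Python version.

```python
def require_native_str(formula):
    """Hypothetical helper (not a Patsy API): accept only the native str type.

    On Python 2, str is the byte-string type, so unicode objects are rejected.
    On Python 3, str is the unicode type, so bytes objects are rejected.
    """
    if not isinstance(formula, str):
        raise TypeError(
            "formula must be of type str, got %s" % type(formula).__name__
        )
    return formula


# A plain string literal is a str on both Python 2 and Python 3,
# so no b"" or u"" prefix is needed.
formula = require_native_str("y ~ x1 + x2")
```

Because an unprefixed literal is a str on both versions, the same formula string works unchanged in code that targets either one.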

In addition to this being the most convenient for users (you never need to use any b"weird" u"prefixes" when writing a formula string), it’s actually a necessary consequence of a deeper change in the Python language: in Python 2, Python code itself is represented as byte-strings, and that’s the only form of input accepted by the tokenize module. On the other hand, Python 3’s tokenizer and parser use unicode, and since Patsy processes Python code, it has to follow suit.
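To make the Python 3 side concrete, here is a minimal sketch that tokenizes a snippet of Python code held in a unicode str using the standard library. The function name python_tokens and the sample expression are made up for illustration, and this is not a description of how Patsy itself calls the tokenizer.

```python
import io
import tokenize


def python_tokens(code):
    """Tokenize Python source held in a (unicode) str on Python 3.

    Illustrative helper only. tokenize.generate_tokens expects a readline
    callable that returns str lines, so the code is wrapped in io.StringIO.
    """
    readline = io.StringIO(code).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.string.strip()  # skip the empty NEWLINE and ENDMARKER tokens
    ]


tokens = python_tokens("np.log(x) + 1")
```

Every token string that comes back is itself a str, which is why text that passes through the tokenizer on Python 3 stays unicode from input to output.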

