Saturday, October 11, 2008

Debugging the Jython compiler

Debugging the Jython compiler (or most compilers for that matter) isn't the most straightforward process, so here I'll describe the typical steps I go through to figure it out.

The Problem

I've found a bug while trying out the SymPy project:

(jython)pjenvey@golgo13:~/src/python/sympy$ jython -c "import sympy"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "sympy/__init__.py", line 18, in <module>
from polys import *
File "sympy/polys/__init__.py", line 4, in <module>
from polynomial import Poly, PolynomialError, SymbolsError, \
java.lang.VerifyError: (class: sympy/polys/polynomial$py, method: __mul__$52 signature: (Lorg/python/core/PyFrame;)Lorg/python/core/PyObject;) Accessing value from uninitialized register 9

java.lang.VerifyError means the JVM's bytecode verifier found a problem in the bytecode you've attempted to load. In this case the __mul__ method defined in the sympy.polys.polynomial module is the culprit.

Steps

First I open up the polynomial Python module to see what this __mul__ function is doing. Luckily there is only one __mul__ defined. Unluckily it's a large method, almost 100 lines. Usually I'd take a good look at the Python code to flag likely problematic areas in the Jython compiler (like all the brand new 2.4 and 2.5 features we've been working on). Since this is a larger method I do a very quick skim and nothing catches my eye -- no funky with statements or generator expressions.

Then I take a quick look at the bytecode the compiler generated via javap, the JDK's bundled java class disassembler. Just to double check that I'm dealing with the same method the verifier is talking about. This 100 lines of Python code is translated to about 2000 lines of javap output.

So next I narrow down the test case: copy and paste the __mul__ function into its own module and cut it down until I have the smallest piece possible that reproduces the problem. Bisect, bisect, remove that and dedent, bisect, etc. Turns out to be this line of code:
q[sum([ x*y for x, y in zip(monom, degs) ])] = coeff
Interesting -- this is an old list comprehension call inside of an old method call inside of an old item assignment? Old as in the compiler has handled these expressions for a long time now, and they likely haven't been touched lately.

I can probably narrow this statement down even further, for example I doubt the method call to zip has any relevance. Finally I'm down to this:
a[[b for b, c in d]] = e
Removing the lisp comp or even the tuple unpack won't reproduce the bug. I've renamed the variables to make them easily identifiable to me in the bytecode.

This one line of code translates to roughly this javap output:
Compiled from "bad2.py"
public class bad2$py extends org.python.core.PyFunctionTable implements org.python.core.PyRunnable{
static final bad2$py self;

static final org.python.core.PyString _0;

static final org.python.core.PyCode f$0;

public org.python.core.PyObject f$0(org.python.core.PyFrame);
Code:
0: aload_1
1: ldc #4; //String __file__
3: getstatic #10; //Field _0:Lorg/python/core/PyString;
6: invokevirtual #16; //Method org/python/core/PyFrame.setglobal:(Ljava/lang/String;Lorg/python/core/PyObject;)V
9: aload_1
10: iconst_1
11: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
14: aload_1
15: ldc #22; //String e
17: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
20: astore_2
21: aload_1
22: ldc #28; //String a
24: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
27: new #30; //class org/python/core/PyList
30: dup
31: invokespecial #34; //Method org/python/core/PyList."":()V
34: dup
35: ldc #36; //String append
37: invokevirtual #41; //Method org/python/core/PyObject.__getattr__:(Ljava/lang/String;)Lorg/python/core/PyObject;
40: astore_3
41: aload_1
42: ldc #43; //String _[1_3]
44: aload_3
45: invokevirtual #46; //Method org/python/core/PyFrame.setlocal:(Ljava/lang/String;Lorg/python/core/PyObject;)V
48: aconst_null
49: astore_3
50: aload_1
51: iconst_1
52: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
55: aload_1
56: ldc #48; //String d
58: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
61: invokevirtual #52; //Method org/python/core/PyObject.__iter__:()Lorg/python/core/PyObject;
64: astore_3
65: goto 131
68: aload 4
70: iconst_2
71: invokestatic #58; //Method org/python/core/Py.unpackSequence:(Lorg/python/core/PyObject;I)[Lorg/python/core/PyObject;
74: astore 5
76: aload 5
78: iconst_0
79: aaload
80: astore 6
82: aload_1
83: ldc #60; //String b
85: aload 6
87: invokevirtual #46; //Method org/python/core/PyFrame.setlocal:(Ljava/lang/String;Lorg/python/core/PyObject;)V
90: aconst_null
91: astore 6
93: aload 5
95: iconst_1
96: aaload
97: astore 6
99: aload_1
100: ldc #62; //String c
102: aload 6
104: invokevirtual #46; //Method org/python/core/PyFrame.setlocal:(Ljava/lang/String;Lorg/python/core/PyObject;)V
107: aconst_null
108: astore 6
110: aload_1
111: iconst_1
112: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
115: aload_1
116: ldc #43; //String _[1_3]
118: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
121: aload_1
122: ldc #60; //String b
124: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
127: invokevirtual #66; //Method org/python/core/PyObject.__call__:(Lorg/python/core/PyObject;)Lorg/python/core/PyObject;
130: pop
131: aload_1
132: iconst_1
133: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
136: aload_3
137: invokevirtual #69; //Method org/python/core/PyObject.__iternext__:()Lorg/python/core/PyObject;
140: astore 4
142: aload 4
144: ifnonnull 68
147: aload_1
148: iconst_1
149: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
152: aload_1
153: ldc #43; //String _[1_3]
155: invokevirtual #73; //Method org/python/core/PyFrame.dellocal:(Ljava/lang/String;)V
158: aload 6
160: invokevirtual #77; //Method org/python/core/PyObject.__setitem__:(Lorg/python/core/PyObject;Lorg/python/core/PyObject;)V
163: aconst_null
164: astore_2
165: aload_1
166: iconst_m1
167: putfield #81; //Field org/python/core/PyFrame.f_lasti:I
170: getstatic #85; //Field org/python/core/Py.None:Lorg/python/core/PyObject;
173: areturn

}
Tangent

"Accessing value from uninitialized register 9"

By the way JVM 1.5 -- you're a stack based virtual machine and you're bitching to me about unitialized registers? Sure I have bad bytecode, but registers are your problem buddy. I digress..

Tools of the Trade

I've been using javap to look at the actual bytecode we produce, but now I'm also going to look at what that bytecode looks like as Java code with jad, a Java decompiler. I usually start this process with javap because it's more reliable on bad byte code.

Luckily jad handles this one; here's what it looks like:
public PyObject f$0(PyFrame pyframe)
{
pyframe.setglobal("__file__", _0);
pyframe.setline(1);
PyObject pyobject = pyframe.getname("e");
PyList pylist = new PyList();
PyObject pyobject1 = pylist.__getattr__("append");
pyframe.setlocal("_[1_3]", pyobject1);
pyobject1 = null;
pyframe.setline(1);o
pyobject1 = pyframe.getname("d").__iter__();
PyObject pyobject3;
do
{
pyframe.setline(1);
pyobject2 = pyobject1.__iternext__();
if(pyobject2 != null)
{
PyObject apyobject[] = Py.unpackSequence(pyobject2, 2);
pyobject3 = apyobject[0];
pyframe.setlocal("b", pyobject3);
pyobject3 = null;
pyobject3 = apyobject[1];
pyframe.setlocal("c", pyobject3);
pyobject3 = null;
pyframe.setline(1);
pyframe.getname("_[1_3]").__call__(pyframe.getname("b"));
} else
{
break;
}
} while(true);
pyframe.setline(1);
pyframe.dellocal("_[1_3]");
pyframe.getname("a").__setitem__(pylist, pyobject3);
pyobject = null;
pyframe.f_lasti = -1;
return Py.None;
}
There's an oddity in the jad output: pyobject2 doesn't have any type information. Is that related to the bug I'm tracking down or is that a jad 'bug', or maybe both?

So with jad and javap (or even just soley javap) I myself can become a Java byte code verifier by walking through each instruction line by line and eventually (hopefully) noticing a problem.

Fortunately that isn't necessary with a good software byte code verifier. In this context the JVM's own bytecode verifier is *not* good. The error messages from the java.lang.VerifyErrors it throws are always pretty vague.

The ASM project has a great one called CheckClassAdapter. It can't detect every problem, but when it can, it's incredibly helpful. Here it goes:
(jython)pjenvey@golgo13:~/src/java/jython-trunk-clean3/asm-3.1/lib$ java -cp ~/src/java//jython-trunk-clean3/dist/jython.jar:asm-3.1.jar:asm-tree-3.1.jar:asm-analysis-3.1.jar:asm-util-3.1.jar  org.objectweb.asm.util.CheckClassAdapter /tmp/bad2\$py.class
org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 90: Expected an object reference, but found .
at org.objectweb.asm.tree.analysis.Analyzer.analyze(Unknown Source)
at org.objectweb.asm.util.CheckClassAdapter.verify(Unknown Source)
at org.objectweb.asm.util.CheckClassAdapter.main(Unknown Source)
Caused by: org.objectweb.asm.tree.analysis.AnalyzerException: Expected an object reference, but found .
at org.objectweb.asm.tree.analysis.BasicVerifier.copyOperation(Unknown Source)
at org.objectweb.asm.tree.analysis.Frame.execute(Unknown Source)
... 3 more
f$0(Lorg/python/core/PyFrame;)Lorg/python/core/PyObject;
00000 Lbad2$py; PyFrame . . . . . : : FRAME FULL [] []
00001 Lbad2$py; PyFrame . . . . . : : ALOAD 1
00002 Lbad2$py; PyFrame . . . . . : PyFrame : LDC "__file__"
00003 Lbad2$py; PyFrame . . . . . : PyFrame String : GETSTATIC bad2$py._0 : Lorg/python/core/PyString;
00004 Lbad2$py; PyFrame . . . . . : PyFrame String PyString : INVOKEVIRTUAL org/python/core/PyFrame.setglobal (Ljava/lang/String;Lorg/python/core/PyObject;)V
00005 Lbad2$py; PyFrame . . . . . : : ALOAD 1
00006 Lbad2$py; PyFrame . . . . . : PyFrame : ICONST_1
00007 Lbad2$py; PyFrame . . . . . : PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00008 Lbad2$py; PyFrame . . . . . : : ALOAD 1
00009 Lbad2$py; PyFrame . . . . . : PyFrame : LDC "e"
00010 Lbad2$py; PyFrame . . . . . : PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00011 Lbad2$py; PyFrame . . . . . : PyObject : ASTORE 2
00012 Lbad2$py; PyFrame PyObject . . . . : : ALOAD 1
00013 Lbad2$py; PyFrame PyObject . . . . : PyFrame : LDC "a"
00014 Lbad2$py; PyFrame PyObject . . . . : PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00015 Lbad2$py; PyFrame PyObject . . . . : PyObject : NEW org/python/core/PyList
00016 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList : DUP
00017 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList PyList : INVOKESPECIAL org/python/core/PyList. ()V
00018 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList : DUP
00019 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList PyList : LDC "append"
00020 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList PyList String : INVOKEVIRTUAL org/python/core/PyObject.__getattr__ (Ljava/lang/String;)Lorg/python/core/PyObject;
00021 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList PyObject : ASTORE 3
00022 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : ALOAD 1
00023 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame : LDC "_[1_3]"
00024 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame String : ALOAD 3
00025 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame String PyObject : INVOKEVIRTUAL org/python/core/PyFrame.setlocal (Ljava/lang/String;Lorg/python/core/PyObject;)V
00026 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : ACONST_NULL
00027 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList Lnull; : ASTORE 3
00028 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList : ALOAD 1
00029 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyFrame : ICONST_1
00030 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00031 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList : ALOAD 1
00032 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyFrame : LDC "d"
00033 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00034 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyObject : INVOKEVIRTUAL org/python/core/PyObject.__iter__ ()Lorg/python/core/PyObject;
00035 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyObject : ASTORE 3
00036 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : GOTO L0
00037 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : L1
00038 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : FRAME FULL [bad2$py org/python/core/PyFrame org/python/core/PyObject org/python/core/PyObject org/python/core/PyObject] [org/python/core/PyObject org/python/core/PyList]
00039 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 4
00040 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyObject : ICONST_2
00041 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyObject I : INVOKESTATIC org/python/core/Py.unpackSequence (Lorg/python/core/PyObject;I)[Lorg/python/core/PyObject;
00042 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyObject : ASTORE 5
00043 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject . : PyObject PyList : ALOAD 5
00044 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject . : PyObject PyList PyObject : ICONST_0
00045 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject . : PyObject PyList PyObject I : AALOAD
00046 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject . : PyObject PyList PyObject : ASTORE 6
00047 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList : ALOAD 1
00048 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame : LDC "b"
00049 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame String : ALOAD 6
00050 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame String PyObject : INVOKEVIRTUAL org/python/core/PyFrame.setlocal (Ljava/lang/String;Lorg/python/core/PyObject;)V
00051 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList : ACONST_NULL
00052 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList Lnull; : ASTORE 6
00053 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList : ALOAD 5
00054 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject : ICONST_1
00055 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject I : AALOAD
00056 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject : ASTORE 6
00057 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList : ALOAD 1
00058 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame : LDC "c"
00059 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame String : ALOAD 6
00060 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame String PyObject : INVOKEVIRTUAL org/python/core/PyFrame.setlocal (Ljava/lang/String;Lorg/python/core/PyObject;)V
00061 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList : ACONST_NULL
00062 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList Lnull; : ASTORE 6
00063 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList : ALOAD 1
00064 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyFrame : ICONST_1
00065 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00066 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList : ALOAD 1
00067 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyFrame : LDC "_[1_3]"
00068 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00069 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject : ALOAD 1
00070 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject PyFrame : LDC "b"
00071 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00072 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject PyObject : INVOKEVIRTUAL org/python/core/PyObject.__call__ (Lorg/python/core/PyObject;)Lorg/python/core/PyObject;
00073 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject : POP
00074 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : L0
00075 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : FRAME FULL [bad2$py org/python/core/PyFrame org/python/core/PyObject org/python/core/PyObject] [org/python/core/PyObject org/python/core/PyList]
00076 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : ALOAD 1
00077 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame : ICONST_1
00078 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00079 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : ALOAD 3
00080 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyObject : INVOKEVIRTUAL org/python/core/PyObject.__iternext__ ()Lorg/python/core/PyObject;
00081 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyObject : ASTORE 4
00082 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 4
00083 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyObject : IFNONNULL L1
00084 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 1
00085 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyFrame : ICONST_1
00086 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00087 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 1
00088 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyFrame : LDC "_[1_3]"
00089 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.dellocal (Ljava/lang/String;)V
00090 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 6
00091 ? : INVOKEVIRTUAL org/python/core/PyObject.__setitem__ (Lorg/python/core/PyObject;Lorg/python/core/PyObject;)V
00092 ? : ACONST_NULL
00093 ? : ASTORE 2
00094 ? : ALOAD 1
00095 ? : ICONST_M1
00096 ? : PUTFIELD org/python/core/PyFrame.f_lasti : I
00097 ? : GETSTATIC org/python/core/Py.None : Lorg/python/core/PyObject;
00098 ? : ARETURN
The first column shown is the instruction number, the second the contents of the local variable table, the third the contents of the operand stack, then finally the instruction.

#90 is that last ALOAD 6 instruction. This tells me the problem is in the prepartion for invoking the __setitem__ method (the following instruction), which is Jython's method for actually doing item assignment, the final step in the order of operations of this line of code.

Basically an item assignment of a[c] = e results in a Java method call of a.__setitem__(c, e).

The ALOAD 6 instruction pushes the local variable at index 6 onto the stack. That's preparing the second argument to the __setitem__ method -- which should be the right hand side of the statement, variable e.

However at that point there's nothing in the local variable 6 (represented by a '.' in CheckClassAdapter's output) -- not even a null, thus it is uninitialzed. Which explains the error message, though not totally the 'register 9' part.

My point earlier about the word 'register' is that the JVM is not a register based virtual machine, so any notion of machine registers should be abstracted away from a user of it. Though it's obvious that at least some local variables would be mapped onto the underlying real machine's registers for performance, so I'll cut that message some slack.

Comparing CheckClassAdapters' output vs jad's I see that jad's pyobject3 is (was) at index 6. That's not even the value we want to assign -- pyobject3 was a temporary variable used for collecting the items in the list comprehension.

To the compiler!

So what does this mean the compiler is doing? In particular, what is the item assignment code doing when calling __setitem__? Here's the method responsible for genearting the item assignment bytecode, from org.python.compiler.CodeCompiler. Ignore the yellow sections, they're not called in this case:
 @Override
public Object visitSubscript(Subscript node) throws Exception {
if (node.slice instanceof Slice) {
return Slice(node, (Slice) node.slice);
}

expr_contextType ctx = node.ctx;
if (node.ctx == expr_contextType.AugStore && augmode == expr_contextType.Store) {
restoreAugTmps(node, 2);
ctx = expr_contextType.Store;
} else {
visit(node.value);
visit(node.slice);

if (node.ctx == expr_contextType.AugStore && augmode == expr_contextType.Load) {
saveAugTmps(node, 2);
ctx = expr_contextType.Load;
}
}

switch (ctx) {
case Del:
code.invokevirtual("org/python/core/PyObject", "__delitem__", "(" + $pyObj + ")V");
return null;
case Load:
code.invokevirtual("org/python/core/PyObject", "__getitem__", "(" + $pyObj + ")" + $pyObj);
return null;
Store:
code.aload(temporary);
code.invokevirtual("org/python/core/PyObject", "__setitem__", "(" + $pyObj + $pyObj + ")V");
return null;
}
return null;
}
The sole argument passed to visitSubscript is an AST node. So the AST for this assignment is visited as:
item assignment (visitSubscript):
value being e (visit(node.value) in visitSubscript),
key being (visit(node.slice) in visitSubscript) ->
list comp ->
tuple unpack
Our astview.py utility can show you the real AST nitty gritty:
(jython)pjenvey@golgo13:~/src/java/jython-trunk-clean3$ jython ast/astview.py ~/src/python/sympy/bad2.py
('Module',
('body',
('Assign (1,0)',
('targets',
('Subscript (1,0)',
('value', ('Name (1,0)', ('id', 'a'), ('ctx', ('Load',)))),
('slice',
('Index',
('value',
('ListComp (1,3)',
('elt', ('Name (1,3)', ('id', 'b'), ('ctx', ('Load',)))),
('generators',
('comprehension',
('target',
('Tuple (1,5)',
('elts',
('Name (1,9)', ('id', 'b'), ('ctx', ('Store',))),
('Name (1,12)', ('id', 'c'), ('ctx', ('Store',)))),
('ctx', ('Store',)))),
('iter',
('Name (1,17)', ('id', 'd'), ('ctx', ('Load',)))),
('ifs',))))))),
('ctx', ('Store',)))),
('value', ('Name (1,23)', ('id', 'e'), ('ctx', ('Load',)))))))
This means that some interaction with the list comp/tuple unpack has caused the item assignment to lose track of its state for __setitem__.

The bad ALOAD is generated by the line: code.aload(temporary);. Meaning the variable temporary is set to 6. The value we should be assigning, the e object, is one of the first things created in the generated method. We can see from the CheckClassAdapter output that e is stored at local 2:
00009 Lbad2$py; PyFrame . . . . .  : PyFrame  :     LDC "e"
00010 Lbad2$py; PyFrame . . . . . : PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00011 Lbad2$py; PyFrame . . . . . : PyObject : ASTORE 2
00012 Lbad2$py; PyFrame PyObject . . . . : : ALOAD 1
(Index 2 being the PyObject after PyFrame in the last line)

So 6 should be 2. I print out the value of temporary from both before and after we visit the child list comp/tuple unpack nodes -- the lines visit(node.value); visit(node.slice); and as I guessed the values are 2 and 6, respectively.

The Fix


I'm not that familliar with this 'temporary' instance variable in the compiler. It doesn't seem to be the best design as a few different methods use it to store a local variable index which they'll reference later, or which another method will grab later (in this case temporary is already setup for our visitSubscript likely by visitAssign). With multiple methods using it a clash was bound to happen at some point in a Visitor like our compiler.

I briefly look at the lisp comp/tuple unpack bytecode generation code and wonder why it's overwriting the temporary variable on us, but I don't think I'm going to fix this there. Since temporary is correct before visiting the child nodes, I opt to save the correct value of temporary immediately at the beginning of visitSubscript, so I can use it later.

Cleaning up the temporary variable business would be preferable, but this will do for now. Our compiler could use a lot of cleanup but that's for the future and has already started. I do at least double check the other uses of this variable to ensure this isn't a problem elsewhere.

the fix:
Index: src/org/python/compiler/CodeCompiler.java
===================================================================
--- src/org/python/compiler/CodeCompiler.java (revision 5379)
+++ src/org/python/compiler/CodeCompiler.java (working copy)
@@ -1604,6 +1604,7 @@
return Slice(node, (Slice) node.slice);
}

+ int value = temporary;
expr_contextType ctx = node.ctx;
if (node.ctx == expr_contextType.AugStore && augmode == expr_contextType.Store) {
restoreAugTmps(node, 2);
@@ -1627,7 +1628,7 @@
return null;
case Param:
case Store:
- code.aload(temporary);
+ code.aload(value);
code.invokevirtual("org/python/core/PyObject", "__setitem__", "(" + $pyObj + $pyObj + ")V");
return null;
}
Finally

Since this is old code, I double checked if the bug was present in the 2.2 release of Jython. Turns out it is, so this is probably a really old bug, possbily going back 8 years to Jython 2.0 when list comprehensions were first introduced.

Tuesday, May 13, 2008

Jython @ JavaOne 2008

I attended a couple days of JavaOne last week and luckily avoided the norovirus outbreak. However I did notice something else spreading through Moscone Center; interest in languages on the JVM. For example:
  • The number of sessions/BoFs covering Dynamic Language topics: around 8 on Groovy, 5 on JRuby, 2 on Jython and also a couple on Scala and Rhino each, with some very good turnouts. That's not even including CommunityOne. Also featured was a Scripting language bowl: a faceoff between JRuby, Groovy, Scala and Jython.
  • The JavaOne book store was stocked with just about every JRuby and Groovy book out there. Of the top 10 selling books at JavaOne, 3 were about languages on top of the JVM (Groovy and JavaFX). Wednesday's top 10 sellers also included JRuby committer Ola Bini's Practical JRuby on Rails as well as another Rails book. I'm hoping to see a whole lot of Python books on the shelf next year.
  • A number of Ruby/JRuby folks (and not just the ones employed by Sun either) told me that they are quite happy with NetBeans' support for Ruby. It has code completion and even some refactoring support. Tor Norbye and the NetBeans crew are now working on JavaScript support (Look Mom, JavaScript type inference) and are slated to add Python support next. Ted Leung has already begun talking to them about the details.
  • Ruby is still on the rise, and JRuby is a big contributing factor. JRuby definitely had the attention of many attendees and definitely has a growing userbase in Java land.
  • Groovy and the Groovy on Grails combo is also on the rise. linkedin.com, Sky TV and SAP's Composition on Grails Product are a few notable users of Grails. IBM's Project Zero (aka the WebSphere sMash product) is also utilizing Groovy, as well as their own PHP on the JVM implementation.
  • Sun's own new scripting lanuguage on top of the JVM, JavaFX, was of course all over the place.
  • John Rose: "I think we are on the right track here, letting the JVM grow independently of the Java language. (The language, if it has room to grow, will catch up.) James Gosling expressed a similar sentiment at his February Java Users Group talk “The Feel of Java, Revisited”, when he said he sometimes felt more interested in the future of the JVM than that of the Java language. “I don’t really care about the Java language. All the magic is in the JVM specification.” (Yes, I think that is hyperbolic. No, he is not abandoning Java.) I love both Java and the JVM, and I am pushing on the JVM this year."
  • The potential Java 7 new features. They're still potential because the Java 7 JSR isn't out yet. While some are nice (like John Rose's JSR 292, which will be a big help for JVM languages like Jython), some are arguably not nice. I hear more disdain than ever over where Java the language is going, which makes other languages on top of the JVM even more desirable.
  • I met many new people (more than I expected) who have used Jython at some point in their career, and some who are using it now (like pushToTest's Frank Cohen). They're all eager about its recent progress and can't wait for the 2.5 release.
  • Reminder: the Java world is large. Over 10,000 attendees, probably 10 times as many as this year's PyCon. All potential Python converts, right?
Ok, I mostly covered what I observed about different JVM languages at JavaOne, not just Jython. What I'm getting at here is that there's a lot of potential for the Jython 2.5 release (and all that code waiting around on Pypi to run on it) in this large Java ecosystem.

Sunday, May 4, 2008

You Must Construct Additional Pylons (on other VMs)

Occasionally on the Pylons IRC Channel or mailing list someone will reference the game Starcraft, where "Pylons" are structures that act as a power source. Sometimes the game vocally instructs you to construct additional Pylons.

You could say that last month we've constructed a couple additional Pylons, and by that I mean made Pylons run on different environments:

o Pylons on Jython trunk!
o Pylons on Google App Engine via Ian Bicking's appengine-monkey and mako trunk

With the development versions of both Pylons and its dependencies (and the Mako jython branch) I'm now able to create the flicksearch tutorial app from the Pylons Official Docs on Jython.

The Mako jython branch uses Python 2.5's _ast module, which Jython now supports thanks to Frank Wierzbicki's recent work. Mako's also using Armin Ronacher's handy ast module that'll hopefully find its way into the stdlib. Frank and I are still working on smoothing out a couple of our _ast's rough edges, hence the separate Mako jython branch, but Mako is in good shape with 95% of its 214 tests passing (with most of those failures due to the lack of PEP 263 source code encodings, which we'll get to supporting on Jython soon).

According to the Pylons on Jython buildbot, only a few of the Pylons dependency's tests fully pass -- but most of the failing tests are due to unicode issues (again, lack of PEP 263), CPython dict ordering assumed in doctests, and a few other issues that aren't important enough to cover here. Pylons own test failures are only due to unicode issues and lack of support for the Kid and Cheetah templating engines, which Pylons is only testing for the sake of testing alternative templating engines anyway.

The biggest problem with Pylons on Jython so far is running it in development mode with paster serve --reload, which hot reloads your WSGI app whenever a change to one of its files is made. The paster reloader is working but the reloading process is very slow on Jython. Reloading is done by spawning 2 processes; when a file changes, the child process serving your app exits and the parent creates it anew (last time I looked Django and CherryPy's reloaders also work this way). Since CPython startup time, including loading of the entire WSGI app, is pretty quick, this is the safest and probably easiest way to accomplish a reload.

Unfortunately Jython suffers from a lenghty startup time. It takes about 1.5 seconds best case scenario just to get the >>> prompt in Jython on my 2.33ghz MacBook Pro. It takes almost 10 seconds to get paster serve --reload to fully load a simple Pylons app; that's the parent process startup time plus the child process startup time plus the app load time.

I've been experimenting with using Nailgun to improve startup time, but it actually isn't helping Pylons' startup time as much as I thought it might. Jython must have a startup time bottleneck or two that we need to find here.

Nailgun also poses other problems, such as when System.exit()'ing resources like sockets and files are left open unless they're explicitly cleaned up. Worse yet, threads are left running -- and Pylons' paster serve by default loads 10 listener threads.

If we can't drastically improve the startup time issue in Jython sooner rather than later, we can probably come up with a reloader for Jython for the time being that doesn't require a full restart. These can be tricky and in general are avoided, though.

Here's a couple screenshots of Pylons on Jython, showing off a handy feature of dynamic languages on the JVM; just having the ability to play around with Java in an interactive interpreter:

o Pylons-dev WebError on Jython
o Pylons paster shell on Jython

Pylons is also making progress towards an additional release, 0.9.7, which will utilize the work done by the PyCon sprinters (particularly on WebHelpers), WebOb, WebError (and maybe WebTest) and will also include optional support for a basic SQLAlchemy configuration for new projects. As well as Alpha Jython support.

Friday, February 8, 2008

setuptools on Jython

A couple weeks ago setuptools added support for Jython trunk. This was made possible with a number of additions and fixes to Jython for setuptools, including distutils and file descriptor support.

A working setuptools is the first big step in getting Pylons working on Jython. Django on Jython has had some recent success; luckily for them Django doesn't rely on/use setuptools at all. So Leo Soto, Jim Baker and the rest of the Django on Jython folks were able to skip directly to kicking around the Django code on Jython.

Now that we have setuptools, the next step is to go through each of Pylons dependencies and ensure their tests pass. Pylons as well as some of its dependencies use the nose test runner. Nose is almost fully working on Jython (after I submit a few upstream patches to nose).

Other dependencies (such as Paste and Beaker) use py.test for their test runners, but there's been some talk of moving some of these over to nose -- which would actually be a good thing right now for Jython. Though technically nose could run through these projects' tests as they already are, they rely on some py.test functionality, such as py.test.raises.

Not to bad mouth py.test -- it has some nice features, like distributed testing -- but I attempted get it working on Jython during the TurboGears 2 sprint a few weeks ago and it wasn't a small task. Part of the problem is the fact that py.test is part of the larger py package. py isn't just a test runner, it also includes other packages that are intertwined with test portion -- all of which need to work before you're able to use just py.test.