Saturday, October 11, 2008

Debugging the Jython compiler

Debugging the Jython compiler (or most compilers for that matter) isn't the most straightforward process, so here I'll describe the typical steps I go through to figure it out.

The Problem

I've found a bug while trying out the SymPy project:

(jython)pjenvey@golgo13:~/src/python/sympy$ jython -c "import sympy"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "sympy/__init__.py", line 18, in <module>
from polys import *
File "sympy/polys/__init__.py", line 4, in <module>
from polynomial import Poly, PolynomialError, SymbolsError, \
java.lang.VerifyError: (class: sympy/polys/polynomial$py, method: __mul__$52 signature: (Lorg/python/core/PyFrame;)Lorg/python/core/PyObject;) Accessing value from uninitialized register 9

java.lang.VerifyError means the JVM's bytecode verifier found a problem in the bytecode you've attempted to load. In this case the __mul__ method defined in the sympy.polys.polynomial module is the culprit.

Steps

First I open up the polynomial Python module to see what this __mul__ function is doing. Luckily there is only one __mul__ defined. Unluckily it's a large method, almost 100 lines. Usually I'd take a good look at the Python code to flag likely problematic areas in the Jython compiler (like all the brand new 2.4 and 2.5 features we've been working on). Since this is a larger method I do a very quick skim and nothing catches my eye -- no funky with statements or generator expressions.

Then I take a quick look at the bytecode the compiler generated via javap, the JDK's bundled java class disassembler. Just to double check that I'm dealing with the same method the verifier is talking about. This 100 lines of Python code is translated to about 2000 lines of javap output.

So next I narrow down the test case: copy and paste the __mul__ function into its own module and cut it down until I have the smallest piece possible that reproduces the problem. Bisect, bisect, remove that and dedent, bisect, etc. Turns out to be this line of code:
q[sum([ x*y for x, y in zip(monom, degs) ])] = coeff
Interesting -- this is an old list comprehension call inside of an old method call inside of an old item assignment? Old as in the compiler has handled these expressions for a long time now, and they likely haven't been touched lately.

I can probably narrow this statement down even further, for example I doubt the method call to zip has any relevance. Finally I'm down to this:
a[[b for b, c in d]] = e
Removing the lisp comp or even the tuple unpack won't reproduce the bug. I've renamed the variables to make them easily identifiable to me in the bytecode.

This one line of code translates to roughly this javap output:
Compiled from "bad2.py"
public class bad2$py extends org.python.core.PyFunctionTable implements org.python.core.PyRunnable{
static final bad2$py self;

static final org.python.core.PyString _0;

static final org.python.core.PyCode f$0;

public org.python.core.PyObject f$0(org.python.core.PyFrame);
Code:
0: aload_1
1: ldc #4; //String __file__
3: getstatic #10; //Field _0:Lorg/python/core/PyString;
6: invokevirtual #16; //Method org/python/core/PyFrame.setglobal:(Ljava/lang/String;Lorg/python/core/PyObject;)V
9: aload_1
10: iconst_1
11: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
14: aload_1
15: ldc #22; //String e
17: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
20: astore_2
21: aload_1
22: ldc #28; //String a
24: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
27: new #30; //class org/python/core/PyList
30: dup
31: invokespecial #34; //Method org/python/core/PyList."":()V
34: dup
35: ldc #36; //String append
37: invokevirtual #41; //Method org/python/core/PyObject.__getattr__:(Ljava/lang/String;)Lorg/python/core/PyObject;
40: astore_3
41: aload_1
42: ldc #43; //String _[1_3]
44: aload_3
45: invokevirtual #46; //Method org/python/core/PyFrame.setlocal:(Ljava/lang/String;Lorg/python/core/PyObject;)V
48: aconst_null
49: astore_3
50: aload_1
51: iconst_1
52: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
55: aload_1
56: ldc #48; //String d
58: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
61: invokevirtual #52; //Method org/python/core/PyObject.__iter__:()Lorg/python/core/PyObject;
64: astore_3
65: goto 131
68: aload 4
70: iconst_2
71: invokestatic #58; //Method org/python/core/Py.unpackSequence:(Lorg/python/core/PyObject;I)[Lorg/python/core/PyObject;
74: astore 5
76: aload 5
78: iconst_0
79: aaload
80: astore 6
82: aload_1
83: ldc #60; //String b
85: aload 6
87: invokevirtual #46; //Method org/python/core/PyFrame.setlocal:(Ljava/lang/String;Lorg/python/core/PyObject;)V
90: aconst_null
91: astore 6
93: aload 5
95: iconst_1
96: aaload
97: astore 6
99: aload_1
100: ldc #62; //String c
102: aload 6
104: invokevirtual #46; //Method org/python/core/PyFrame.setlocal:(Ljava/lang/String;Lorg/python/core/PyObject;)V
107: aconst_null
108: astore 6
110: aload_1
111: iconst_1
112: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
115: aload_1
116: ldc #43; //String _[1_3]
118: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
121: aload_1
122: ldc #60; //String b
124: invokevirtual #26; //Method org/python/core/PyFrame.getname:(Ljava/lang/String;)Lorg/python/core/PyObject;
127: invokevirtual #66; //Method org/python/core/PyObject.__call__:(Lorg/python/core/PyObject;)Lorg/python/core/PyObject;
130: pop
131: aload_1
132: iconst_1
133: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
136: aload_3
137: invokevirtual #69; //Method org/python/core/PyObject.__iternext__:()Lorg/python/core/PyObject;
140: astore 4
142: aload 4
144: ifnonnull 68
147: aload_1
148: iconst_1
149: invokevirtual #20; //Method org/python/core/PyFrame.setline:(I)V
152: aload_1
153: ldc #43; //String _[1_3]
155: invokevirtual #73; //Method org/python/core/PyFrame.dellocal:(Ljava/lang/String;)V
158: aload 6
160: invokevirtual #77; //Method org/python/core/PyObject.__setitem__:(Lorg/python/core/PyObject;Lorg/python/core/PyObject;)V
163: aconst_null
164: astore_2
165: aload_1
166: iconst_m1
167: putfield #81; //Field org/python/core/PyFrame.f_lasti:I
170: getstatic #85; //Field org/python/core/Py.None:Lorg/python/core/PyObject;
173: areturn

}
Tangent

"Accessing value from uninitialized register 9"

By the way JVM 1.5 -- you're a stack based virtual machine and you're bitching to me about unitialized registers? Sure I have bad bytecode, but registers are your problem buddy. I digress..

Tools of the Trade

I've been using javap to look at the actual bytecode we produce, but now I'm also going to look at what that bytecode looks like as Java code with jad, a Java decompiler. I usually start this process with javap because it's more reliable on bad byte code.

Luckily jad handles this one; here's what it looks like:
public PyObject f$0(PyFrame pyframe)
{
pyframe.setglobal("__file__", _0);
pyframe.setline(1);
PyObject pyobject = pyframe.getname("e");
PyList pylist = new PyList();
PyObject pyobject1 = pylist.__getattr__("append");
pyframe.setlocal("_[1_3]", pyobject1);
pyobject1 = null;
pyframe.setline(1);o
pyobject1 = pyframe.getname("d").__iter__();
PyObject pyobject3;
do
{
pyframe.setline(1);
pyobject2 = pyobject1.__iternext__();
if(pyobject2 != null)
{
PyObject apyobject[] = Py.unpackSequence(pyobject2, 2);
pyobject3 = apyobject[0];
pyframe.setlocal("b", pyobject3);
pyobject3 = null;
pyobject3 = apyobject[1];
pyframe.setlocal("c", pyobject3);
pyobject3 = null;
pyframe.setline(1);
pyframe.getname("_[1_3]").__call__(pyframe.getname("b"));
} else
{
break;
}
} while(true);
pyframe.setline(1);
pyframe.dellocal("_[1_3]");
pyframe.getname("a").__setitem__(pylist, pyobject3);
pyobject = null;
pyframe.f_lasti = -1;
return Py.None;
}
There's an oddity in the jad output: pyobject2 doesn't have any type information. Is that related to the bug I'm tracking down or is that a jad 'bug', or maybe both?

So with jad and javap (or even just soley javap) I myself can become a Java byte code verifier by walking through each instruction line by line and eventually (hopefully) noticing a problem.

Fortunately that isn't necessary with a good software byte code verifier. In this context the JVM's own bytecode verifier is *not* good. The error messages from the java.lang.VerifyErrors it throws are always pretty vague.

The ASM project has a great one called CheckClassAdapter. It can't detect every problem, but when it can, it's incredibly helpful. Here it goes:
(jython)pjenvey@golgo13:~/src/java/jython-trunk-clean3/asm-3.1/lib$ java -cp ~/src/java//jython-trunk-clean3/dist/jython.jar:asm-3.1.jar:asm-tree-3.1.jar:asm-analysis-3.1.jar:asm-util-3.1.jar  org.objectweb.asm.util.CheckClassAdapter /tmp/bad2\$py.class
org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 90: Expected an object reference, but found .
at org.objectweb.asm.tree.analysis.Analyzer.analyze(Unknown Source)
at org.objectweb.asm.util.CheckClassAdapter.verify(Unknown Source)
at org.objectweb.asm.util.CheckClassAdapter.main(Unknown Source)
Caused by: org.objectweb.asm.tree.analysis.AnalyzerException: Expected an object reference, but found .
at org.objectweb.asm.tree.analysis.BasicVerifier.copyOperation(Unknown Source)
at org.objectweb.asm.tree.analysis.Frame.execute(Unknown Source)
... 3 more
f$0(Lorg/python/core/PyFrame;)Lorg/python/core/PyObject;
00000 Lbad2$py; PyFrame . . . . . : : FRAME FULL [] []
00001 Lbad2$py; PyFrame . . . . . : : ALOAD 1
00002 Lbad2$py; PyFrame . . . . . : PyFrame : LDC "__file__"
00003 Lbad2$py; PyFrame . . . . . : PyFrame String : GETSTATIC bad2$py._0 : Lorg/python/core/PyString;
00004 Lbad2$py; PyFrame . . . . . : PyFrame String PyString : INVOKEVIRTUAL org/python/core/PyFrame.setglobal (Ljava/lang/String;Lorg/python/core/PyObject;)V
00005 Lbad2$py; PyFrame . . . . . : : ALOAD 1
00006 Lbad2$py; PyFrame . . . . . : PyFrame : ICONST_1
00007 Lbad2$py; PyFrame . . . . . : PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00008 Lbad2$py; PyFrame . . . . . : : ALOAD 1
00009 Lbad2$py; PyFrame . . . . . : PyFrame : LDC "e"
00010 Lbad2$py; PyFrame . . . . . : PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00011 Lbad2$py; PyFrame . . . . . : PyObject : ASTORE 2
00012 Lbad2$py; PyFrame PyObject . . . . : : ALOAD 1
00013 Lbad2$py; PyFrame PyObject . . . . : PyFrame : LDC "a"
00014 Lbad2$py; PyFrame PyObject . . . . : PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00015 Lbad2$py; PyFrame PyObject . . . . : PyObject : NEW org/python/core/PyList
00016 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList : DUP
00017 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList PyList : INVOKESPECIAL org/python/core/PyList. ()V
00018 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList : DUP
00019 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList PyList : LDC "append"
00020 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList PyList String : INVOKEVIRTUAL org/python/core/PyObject.__getattr__ (Ljava/lang/String;)Lorg/python/core/PyObject;
00021 Lbad2$py; PyFrame PyObject . . . . : PyObject PyList PyObject : ASTORE 3
00022 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : ALOAD 1
00023 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame : LDC "_[1_3]"
00024 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame String : ALOAD 3
00025 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame String PyObject : INVOKEVIRTUAL org/python/core/PyFrame.setlocal (Ljava/lang/String;Lorg/python/core/PyObject;)V
00026 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : ACONST_NULL
00027 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList Lnull; : ASTORE 3
00028 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList : ALOAD 1
00029 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyFrame : ICONST_1
00030 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00031 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList : ALOAD 1
00032 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyFrame : LDC "d"
00033 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00034 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyObject : INVOKEVIRTUAL org/python/core/PyObject.__iter__ ()Lorg/python/core/PyObject;
00035 Lbad2$py; PyFrame PyObject Lnull; . . . : PyObject PyList PyObject : ASTORE 3
00036 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : GOTO L0
00037 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : L1
00038 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : FRAME FULL [bad2$py org/python/core/PyFrame org/python/core/PyObject org/python/core/PyObject org/python/core/PyObject] [org/python/core/PyObject org/python/core/PyList]
00039 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 4
00040 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyObject : ICONST_2
00041 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyObject I : INVOKESTATIC org/python/core/Py.unpackSequence (Lorg/python/core/PyObject;I)[Lorg/python/core/PyObject;
00042 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyObject : ASTORE 5
00043 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject . : PyObject PyList : ALOAD 5
00044 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject . : PyObject PyList PyObject : ICONST_0
00045 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject . : PyObject PyList PyObject I : AALOAD
00046 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject . : PyObject PyList PyObject : ASTORE 6
00047 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList : ALOAD 1
00048 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame : LDC "b"
00049 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame String : ALOAD 6
00050 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame String PyObject : INVOKEVIRTUAL org/python/core/PyFrame.setlocal (Ljava/lang/String;Lorg/python/core/PyObject;)V
00051 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList : ACONST_NULL
00052 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList Lnull; : ASTORE 6
00053 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList : ALOAD 5
00054 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject : ICONST_1
00055 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject I : AALOAD
00056 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject : ASTORE 6
00057 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList : ALOAD 1
00058 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame : LDC "c"
00059 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame String : ALOAD 6
00060 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList PyFrame String PyObject : INVOKEVIRTUAL org/python/core/PyFrame.setlocal (Ljava/lang/String;Lorg/python/core/PyObject;)V
00061 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList : ACONST_NULL
00062 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject PyObject : PyObject PyList Lnull; : ASTORE 6
00063 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList : ALOAD 1
00064 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyFrame : ICONST_1
00065 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00066 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList : ALOAD 1
00067 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyFrame : LDC "_[1_3]"
00068 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00069 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject : ALOAD 1
00070 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject PyFrame : LDC "b"
00071 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00072 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject PyObject : INVOKEVIRTUAL org/python/core/PyObject.__call__ (Lorg/python/core/PyObject;)Lorg/python/core/PyObject;
00073 Lbad2$py; PyFrame PyObject PyObject PyObject PyObject Lnull; : PyObject PyList PyObject : POP
00074 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : L0
00075 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : FRAME FULL [bad2$py org/python/core/PyFrame org/python/core/PyObject org/python/core/PyObject] [org/python/core/PyObject org/python/core/PyList]
00076 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : ALOAD 1
00077 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame : ICONST_1
00078 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00079 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList : ALOAD 3
00080 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyObject : INVOKEVIRTUAL org/python/core/PyObject.__iternext__ ()Lorg/python/core/PyObject;
00081 Lbad2$py; PyFrame PyObject PyObject . . . : PyObject PyList PyObject : ASTORE 4
00082 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 4
00083 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyObject : IFNONNULL L1
00084 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 1
00085 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyFrame : ICONST_1
00086 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyFrame I : INVOKEVIRTUAL org/python/core/PyFrame.setline (I)V
00087 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 1
00088 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyFrame : LDC "_[1_3]"
00089 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.dellocal (Ljava/lang/String;)V
00090 Lbad2$py; PyFrame PyObject PyObject PyObject . . : PyObject PyList : ALOAD 6
00091 ? : INVOKEVIRTUAL org/python/core/PyObject.__setitem__ (Lorg/python/core/PyObject;Lorg/python/core/PyObject;)V
00092 ? : ACONST_NULL
00093 ? : ASTORE 2
00094 ? : ALOAD 1
00095 ? : ICONST_M1
00096 ? : PUTFIELD org/python/core/PyFrame.f_lasti : I
00097 ? : GETSTATIC org/python/core/Py.None : Lorg/python/core/PyObject;
00098 ? : ARETURN
The first column shown is the instruction number, the second the contents of the local variable table, the third the contents of the operand stack, then finally the instruction.

#90 is that last ALOAD 6 instruction. This tells me the problem is in the prepartion for invoking the __setitem__ method (the following instruction), which is Jython's method for actually doing item assignment, the final step in the order of operations of this line of code.

Basically an item assignment of a[c] = e results in a Java method call of a.__setitem__(c, e).

The ALOAD 6 instruction pushes the local variable at index 6 onto the stack. That's preparing the second argument to the __setitem__ method -- which should be the right hand side of the statement, variable e.

However at that point there's nothing in the local variable 6 (represented by a '.' in CheckClassAdapter's output) -- not even a null, thus it is uninitialzed. Which explains the error message, though not totally the 'register 9' part.

My point earlier about the word 'register' is that the JVM is not a register based virtual machine, so any notion of machine registers should be abstracted away from a user of it. Though it's obvious that at least some local variables would be mapped onto the underlying real machine's registers for performance, so I'll cut that message some slack.

Comparing CheckClassAdapters' output vs jad's I see that jad's pyobject3 is (was) at index 6. That's not even the value we want to assign -- pyobject3 was a temporary variable used for collecting the items in the list comprehension.

To the compiler!

So what does this mean the compiler is doing? In particular, what is the item assignment code doing when calling __setitem__? Here's the method responsible for genearting the item assignment bytecode, from org.python.compiler.CodeCompiler. Ignore the yellow sections, they're not called in this case:
 @Override
public Object visitSubscript(Subscript node) throws Exception {
if (node.slice instanceof Slice) {
return Slice(node, (Slice) node.slice);
}

expr_contextType ctx = node.ctx;
if (node.ctx == expr_contextType.AugStore && augmode == expr_contextType.Store) {
restoreAugTmps(node, 2);
ctx = expr_contextType.Store;
} else {
visit(node.value);
visit(node.slice);

if (node.ctx == expr_contextType.AugStore && augmode == expr_contextType.Load) {
saveAugTmps(node, 2);
ctx = expr_contextType.Load;
}
}

switch (ctx) {
case Del:
code.invokevirtual("org/python/core/PyObject", "__delitem__", "(" + $pyObj + ")V");
return null;
case Load:
code.invokevirtual("org/python/core/PyObject", "__getitem__", "(" + $pyObj + ")" + $pyObj);
return null;
Store:
code.aload(temporary);
code.invokevirtual("org/python/core/PyObject", "__setitem__", "(" + $pyObj + $pyObj + ")V");
return null;
}
return null;
}
The sole argument passed to visitSubscript is an AST node. So the AST for this assignment is visited as:
item assignment (visitSubscript):
value being e (visit(node.value) in visitSubscript),
key being (visit(node.slice) in visitSubscript) ->
list comp ->
tuple unpack
Our astview.py utility can show you the real AST nitty gritty:
(jython)pjenvey@golgo13:~/src/java/jython-trunk-clean3$ jython ast/astview.py ~/src/python/sympy/bad2.py
('Module',
('body',
('Assign (1,0)',
('targets',
('Subscript (1,0)',
('value', ('Name (1,0)', ('id', 'a'), ('ctx', ('Load',)))),
('slice',
('Index',
('value',
('ListComp (1,3)',
('elt', ('Name (1,3)', ('id', 'b'), ('ctx', ('Load',)))),
('generators',
('comprehension',
('target',
('Tuple (1,5)',
('elts',
('Name (1,9)', ('id', 'b'), ('ctx', ('Store',))),
('Name (1,12)', ('id', 'c'), ('ctx', ('Store',)))),
('ctx', ('Store',)))),
('iter',
('Name (1,17)', ('id', 'd'), ('ctx', ('Load',)))),
('ifs',))))))),
('ctx', ('Store',)))),
('value', ('Name (1,23)', ('id', 'e'), ('ctx', ('Load',)))))))
This means that some interaction with the list comp/tuple unpack has caused the item assignment to lose track of its state for __setitem__.

The bad ALOAD is generated by the line: code.aload(temporary);. Meaning the variable temporary is set to 6. The value we should be assigning, the e object, is one of the first things created in the generated method. We can see from the CheckClassAdapter output that e is stored at local 2:
00009 Lbad2$py; PyFrame . . . . .  : PyFrame  :     LDC "e"
00010 Lbad2$py; PyFrame . . . . . : PyFrame String : INVOKEVIRTUAL org/python/core/PyFrame.getname (Ljava/lang/String;)Lorg/python/core/PyObject;
00011 Lbad2$py; PyFrame . . . . . : PyObject : ASTORE 2
00012 Lbad2$py; PyFrame PyObject . . . . : : ALOAD 1
(Index 2 being the PyObject after PyFrame in the last line)

So 6 should be 2. I print out the value of temporary from both before and after we visit the child list comp/tuple unpack nodes -- the lines visit(node.value); visit(node.slice); and as I guessed the values are 2 and 6, respectively.

The Fix


I'm not that familliar with this 'temporary' instance variable in the compiler. It doesn't seem to be the best design as a few different methods use it to store a local variable index which they'll reference later, or which another method will grab later (in this case temporary is already setup for our visitSubscript likely by visitAssign). With multiple methods using it a clash was bound to happen at some point in a Visitor like our compiler.

I briefly look at the lisp comp/tuple unpack bytecode generation code and wonder why it's overwriting the temporary variable on us, but I don't think I'm going to fix this there. Since temporary is correct before visiting the child nodes, I opt to save the correct value of temporary immediately at the beginning of visitSubscript, so I can use it later.

Cleaning up the temporary variable business would be preferable, but this will do for now. Our compiler could use a lot of cleanup but that's for the future and has already started. I do at least double check the other uses of this variable to ensure this isn't a problem elsewhere.

the fix:
Index: src/org/python/compiler/CodeCompiler.java
===================================================================
--- src/org/python/compiler/CodeCompiler.java (revision 5379)
+++ src/org/python/compiler/CodeCompiler.java (working copy)
@@ -1604,6 +1604,7 @@
return Slice(node, (Slice) node.slice);
}

+ int value = temporary;
expr_contextType ctx = node.ctx;
if (node.ctx == expr_contextType.AugStore && augmode == expr_contextType.Store) {
restoreAugTmps(node, 2);
@@ -1627,7 +1628,7 @@
return null;
case Param:
case Store:
- code.aload(temporary);
+ code.aload(value);
code.invokevirtual("org/python/core/PyObject", "__setitem__", "(" + $pyObj + $pyObj + ")V");
return null;
}
Finally

Since this is old code, I double checked if the bug was present in the 2.2 release of Jython. Turns out it is, so this is probably a really old bug, possbily going back 8 years to Jython 2.0 when list comprehensions were first introduced.

3 comments:

Anonymous said...

Bravo, bravo.

Anonymous said...

deep breaths ......... feeling a little better, you sir are a brave one

Anonymous said...

Well, this may not be a very encouraging comment, but it strikes me as a good illustration of the principle that high quality routines pay for themselves in saved debugging time later on. It is rather concerning that a well-known project like Jython would have a class that makes such a rookie mistake...but in defense of the project, I must say that I've found that some of the smartest people (who can do things like program compilers in the first place) are also some of the worst programmers.