Grant Report: Changing the Perl 5 optree build process into a Abstract Syntax Tree generation and a code generation step
Mon, 05-Apr-2010 by
Alberto Simões
edit post
TPF is proud to announce the completion of Gerard Goossen grant. As Gerard sent a big report, please check the complete report on the extended new page. I would like to thank you both Gerard for completing the grant, and Jeff Horwitz for managing it.
Below are my results of the "Changing the Perl 5 optree build process into a Abstract Syntax Tree generation and a code generation step" grant. For the original grant request see http://news.perlfoundation.org/2009/08/2009q3-grant-proposal-changing.html
In short the idea was to have a new code generation step instead of
the linked op nodes. This would allow the optree to become more of an
Abstract Syntax Tree.
The code is available at github at
http://github.com/ggoossen/perl/tree/codegen
blead is currently frozen for the 5.12 release and thus it can't be
merged yet.
The new code for the code generation step is in
codegen.c. instruction.c and instruction.h contain the code for the
instructions.
Optree
The optree no longer also functions as a instruction sequence, and as
a result has been simplified. As can be seen by the example below
derived from the output of:
/*./perl -Dx -e 'for (@ARGV) { print "arg: $_\n"; }'
BEFORE:
1 TYPE = leave ===> DONE
2 TYPE = enter ===> 3
3 TYPE = nextstate ===> 4
20 TYPE = leaveloop ===> 1
8 TYPE = enteriter ===> 18
TYPE = null ===> (4)
TYPE = null ===> (7)
4 TYPE = pushmark ===> 5
6 TYPE = rv2av ===> 7
5 TYPE = gv ===> 6
7 TYPE = gv ===> 8
TYPE = null ===> (20)
19 TYPE = and ===> 20
18 TYPE = iter ===> 19
21 TYPE = lineseq ===> 0
9 TYPE = nextstate ===> 10
16 TYPE = print ===> 17
10 TYPE = pushmark ===> 11
TYPE = null ===> (16)
TYPE = null ===> (11)
15 TYPE = concat ===> 16
13 TYPE = concat ===> 14
11 TYPE = const ===> 12
TYPE = null ===> (13)
12 TYPE = gvsv ===> 13
14 TYPE = const ===> 15
17 TYPE = unstack ===> 18
AFTER:
1 TYPE = leave
2 TYPE = enter
3 TYPE = nextstate
4 TYPE = foreach
5 TYPE = list
6 TYPE = rv2av
7 TYPE = gv
8 TYPE = gv
9 TYPE = lineseq
10 TYPE = nextstate
11 TYPE = print
12 TYPE = stringify
13 TYPE = concat
14 TYPE = concat
15 TYPE = const
16 TYPE = rv2sv
17 TYPE = gv
18 TYPE = const
Code generation
How the code generation works is documented in codegen.c. Here I will
only show the result of the code generation step as obtained using the
-DG flag:
./perl -DG -e 'for (@ARGV) { print "arg: $_\n"; }'
Instructions of codeseq (0x96439b8):
0x9648270: enter
0x9648280: nextstate
0x9648290: pushmark
0x96482a0: gv
0x96482b0: rv2ax
0x96482c0: go
0x96482d0: enteriter redo=label1 next=label2 last=label3
label5:
0x96482e0: iter
0x96482f0: instr_cond_jump label4
label1:
0x9648300: nextstate
0x9648310: pushmark
0x9648320: const
0x9648330: gvsv
0x9648340: concat
0x9648350: const
0x9648360: concat
0x9648370: print
label2:
0x9648380: unstack
0x9648390: instr_jump label5
label4:
0x96483a0: leaveloop
label3:
0x96483b0: leave
0x96483c0: instr_end
0x96483d0: (finished)
Memory consumption
The change causes the memory consumption to go down. For example the
maximum amount of memory used by 'perldoc perlfunc' is after 2.91MB
compared to 2.98 that (a 2% reduction), or for example 'perlcritic
autodie/exception.pm' now uses 13.5MB compared to 14.4 before (a 6%
reduction). The memory usage reduction is caused by a smaller optree, the
intermediate code is never generated as long as the code isn't
run. Thus amount of memory reduction would be large for large program
with many modules of which most parts are not used.
Performance
There doesn't appear to be any significant performance difference
as measured by perlbenchapps. perlbench gives mixed result and is
probably too artificial to be indicative of anything.
I find this a bit strange because I would expect it to perform better
given that it uses less memory, and that the instructions are
sequential in memory, which should result in better cache performance,
but apparently this doesn't matter.
A = /home/gerard/perl/inst/blead-compile/bin/perl5.12.0
B = /home/gerard/perl/inst/blead/bin/perl
lcov
A: 4.42(3)*100
B: 4.392(15)*100
lwp-request
A: 1.791(14)*100
B: 1.807(13)*100
net_http_get
A: 1.13(2)*100
B: 1.11(2)*100
net_http_get_config
A: 6.08(3)*10-2
B: 6.25(2)*10-2
perlcritic
A: 1.0875(13)*100
B: 1.084(3)*100
perldoc
A: 9.72(10)*10-1
B: 9.65(4)*10-1
spamassassin
A: 1.726(3)*100
B: 1.773(9)*100
spamd
A: 2.733(16)*101
B: 2.76(2)*101
Easier optimisations
Optimisation which require a different instruction sequence are easier
to make. Previously
this would require manipulating the optree and its linked list, which
is rather difficult. Now it can simply be done by generating a
different instruction list. For making the sort work in-place is
reduced from about 40 lines of code to about 10 lines.
Influence on existing code
Bugs
There are no known new bugs, caused by the change. Possible area where
new bugs are introduced is that compiling happens, when the subroutine
is first needed, and part of the environment might "leak" into the
compiling, for example initially the PL_taiting wasn't set to false,
causing constant folded expression to be tainted if the first time the
subroutine was called was inside a tainted expression.
B modules
Of course things which analyse the opcode tree using B::OP will notice
some changes in the tree, but the new tree should be easier to
analyse. Some of the B modules which can be used to output things
about the optree in execution order are obviously gone/broken.
Incompatible changes
The only change concerning Perl code is the removal of '"goto" into a
construct', but that is already deprecated in 5.12 thus that shouldn't be
problem.
The ops are no longer made into a linked list. Also the optree is a
bit simplified. Also the peephole analyser is gone, instead after the
optree is build finish_optree is called which does some additional
checking and makes the tree thread-safe. When a sub is called if it
doesn't already have a instruction list in CvCODESEQ the optree is
compiled using the compile_cv/compile_op function. The pp_xxx
functions now have two arguments ppflags and pparg (most functions
ignore them and still use PL_op for arguments) and instead of
returning the next op it should set the next instruction if it isn't
the default RUN_SET_NEXTINSTRUCTION.
Future improvements
There are many more things which can be moved from semantic analysis
step to the intermediate code generation step (i.e. from op.c to
codegen.c), for example a 'scope' still generate an 'enter' and a
'leave' op, this could be merged to only one op which is split into
two when generating the instruction list, another example would be the
sort code, where determining whether a optimised comparison can be
used is determined in the intermediate code generation. Doing this
makes the syntax tree more simple and thus easier to analyse, also it
would probably make the code simpler.
Use of instruction flags
To pass arguments to the instruction the op is still used, as an
experiment I changed the pp_helem instruction to use the instruction
arguments instead of the op, with the idea that it should perform
better because it wouldn't have to read from the op memory. But my
benchmarks don't show any significant difference. But if this would be
combined with LLVM the improvement might be significant because the
compiler could optimize away the passed constant.
LLVM
The idea with LLVM was that the instruction list could be compiled by
the LLVM compiler on the fly to one real function, thus optimising
away the overhead of function calls, and possibly optimising the away
code based on the constant values (flags) passed to the instruction
functions. Unfortunately I couldn't figure out how to do this.
Opcode/instruction list
The list of opcode in opcode.pl functions both for the opcode and for
instruction list. Although there are a lot of operations where have
exactly one opcode and one instruction (for example 'add', 'helem',
'open'), there are also instruction-only opcodes and op-only opcodes
(for example 'instr_jump', 'instr_end' which are instruction-only and
'foreach' which op-only). It might be a good idea to separate these
two (especially since I would expect the difference to grow).
Comments (1)
It doesn't even pass its own tests. How on on earth is this considered complete?
Test Summary Report
-------------------
re/qr.t (Wstat: 0 Tests: 5 Failed: 1)
Failed test: 5
op/stash.t (Wstat: 0 Tests: 31 Failed: 0)
TODO passed: 26
porting/manifest.t (Wstat: 0 Tests: 9463 Failed: 1)
Failed test: 9463
../cpan/B-Debug/t/debug.t (Wstat: 65280 Tests: 4 Failed: 3)
Failed tests: 1-3
Non-zero exit status: 255
Parse errors: Bad plan. You planned 7 tests but ran 4.
../cpan/B-Lint/t/lint.t (Wstat: 0 Tests: 29 Failed: 22)
Failed tests: 1-7, 12-20, 22-27
../dist/B-Deparse/t/deparse.t (Wstat: 256 Tests: 85 Failed: 2)
Failed tests: 61, 85
Non-zero exit status: 1
Parse errors: Bad plan. You planned 84 tests but ran 85.
../ext/B/t/concise-xs.t (Wstat: 65024 Tests: 1658 Failed: 774)
Failed tests: 1, 4-6, 10, 17, 21-22, 24, 26, 37, 41-43
272-281, 283-653, 655-659, 661-673, 676-743
797, 801, 804-810, 812-831, 833-842, 849
861-867, 869-870, 872-878, 881-882, 887-902
905, 909-911, 917-921, 925, 928-930, 932
934-935, 950-954, 960, 962-964, 968, 975
980, 984-986, 988, 990-992, 996-1000, 1006-1009
1022-1024, 1030, 1032-1034, 1038-1039, 1045
1050, 1054-1055, 1060-1069, 1071, 1075-1076
1078, 1080, 1089, 1094, 1096-1103, 1105-1107
1109-1117, 1119, 1121, 1124, 1126, 1131
1133-1142, 1144-1151, 1153, 1156, 1158
1163-1168, 1172, 1174, 1176-1177, 1179-1181
1184-1185, 1195, 1198-1214, 1216-1224, 1226-1231
1233-1234, 1237-1256, 1259, 1264-1265, 1267
1269, 1272-1275, 1277-1278, 1280-1284, 1286
1289, 1292, 1352, 1379, 1398-1399, 1460
1469, 1472, 1474, 1538, 1657
Non-zero exit status: 254
../ext/B/t/concise.t (Wstat: 65280 Tests: 30 Failed: 5)
Failed tests: 1-5
Non-zero exit status: 255
Parse errors: Bad plan. You planned 157 tests but ran 30.
../ext/B/t/f_map.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/f_sort.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/optree_check.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/optree_concise.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/optree_constants.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/optree_misc.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/optree_samples.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/optree_sort.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/optree_specials.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/optree_varinit.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/showlex.t (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
../ext/B/t/terse.t (Wstat: 65280 Tests: 1 Failed: 1)
Failed test: 1
Non-zero exit status: 255
Parse errors: Bad plan. You planned 16 tests but ran 1.
../ext/POSIX/t/posix.t (Wstat: 0 Tests: 66 Failed: 0)
TODO passed: 11
../lib/utf8.t (Wstat: 65280 Tests: 46 Failed: 0)
Non-zero exit status: 255
Parse errors: Bad plan. You planned 150 tests but ran 46.
Files=1803, Tests=349370, 701 wallclock secs (119.20 usr 30.02 sys + 644.36 cusr 179.50 csys = 973.08 CPU)
Result: FAIL
*** Error code 41
Some of those failures are quite serious. It also fails a slew of assertions when assertions are enabled.