Gerard Goossen
[hidden email]
$ 5,000
This project is to change the Perl 5 optree build process into a step where the parser generates an Abstract Syntax Tree (AST) and a step where the executable code in the form of a list of function pointers (probably with arguments in some form) is generated from the AST.
Because this project is refactoring of a major part of the Perl 5 core, a good benchmark of Perl 5 to measure the effect on performance of the changes would be required.
Changing the Perl 5 optree build process into two steps does have the following advantages:
Changing the target for the code generation will be easier. Especially using LLVM could benefit from the more flexible code generation step.
The Abstract Syntax Tree will be simpler than the current optree, and could be easier analysed.
The separation of the generation of the Abstract Syntax Tree and the code generation increases the maintainability.
Having two steps separate steps for the code generation might introduce some overhead, but the increased flexibility should compensate for this and might even improve performance.
Optimisations which don't fit very will into the current model of one function per OP would be possible.
code generation could be just-in-time, i.e. at the first time a sub is called. Saving the cost of generating code for not-used subs, which is might also make it advantages to do more aggressive optimisations.
The change does not require any change to Perl code (except of course some B:: modules) nor would it require changes to XS code.
The Perl 5 optree build process is changed to an Abstract Syntax Tree and code generation step.
=item
Documentation of the Abstract Syntax Tree.
Documentation of the code generation from the Abstract Syntax Tree.
A benchmark for Perl 5
Investigate/try using LLVM as backend for Perl 5.
For example the current optree for program for(@ARGV) { print "arg: $_\n"; }
is:
{ 1 TYPE = leave ===> DONE TARG = 1 FLAGS = (VOID,KIDS,PARENS) PRIVATE = (REFCOUNTED) REFCNT = 1 { 2 TYPE = enter ===> 3 } { 3 TYPE = nextstate ===> 4 FLAGS = (VOID) LINE = 1 PACKAGE = "main" } { 20 TYPE = leaveloop ===> 1 FLAGS = (VOID,KIDS) { 8 TYPE = enteriter ===> 18 FLAGS = (LIST,KIDS,STACKED) { TYPE = null ===> (4) (was pushmark) FLAGS = (SCALAR) } { TYPE = null ===> (7) (was list) FLAGS = (LIST,KIDS,MOD) { 4 TYPE = pushmark ===> 5 FLAGS = (SCALAR,MOD) } { 6 TYPE = rv2av ===> 7 TARG = 1 FLAGS = (SCALAR,KIDS,REF,MOD) { 5 TYPE = gv ===> 6 FLAGS = (SCALAR) } } } { 7 TYPE = gv ===> 8 FLAGS = (SCALAR) } } { TYPE = null ===> (20) FLAGS = (VOID,KIDS) { 19 TYPE = and ===> 20 FLAGS = (VOID,KIDS) OTHER ===> 9 { 18 TYPE = iter ===> 19 FLAGS = (SCALAR) } { 21 TYPE = lineseq ===> 0 FLAGS = (VOID,KIDS) { 9 TYPE = nextstate ===> 10 FLAGS = (VOID) LINE = 1 PACKAGE = "main" } { 16 TYPE = print ===> 17 FLAGS = (VOID,KIDS) { 10 TYPE = pushmark ===> 11 FLAGS = (SCALAR) } { TYPE = null ===> (16) (was stringify) FLAGS = (SCALAR,KIDS) { TYPE = null ===> (11) (was pushmark) FLAGS = (SCALAR) } { 15 TYPE = concat ===> 16 TARG = 3 FLAGS = (SCALAR,KIDS,STACKED) { 13 TYPE = concat ===> 14 TARG = 2 FLAGS = (SCALAR,KIDS) { 11 TYPE = const ===> 12 FLAGS = (SCALAR) SV = PV("arg: "\0) } { TYPE = null ===> (13) (was rv2sv) FLAGS = (SCALAR,KIDS) { 12 TYPE = gvsv ===> 13 FLAGS = (SCALAR) } } } { 14 TYPE = const ===> 15 FLAGS = (SCALAR) SV = PV("\n"\0) } } } } { 17 TYPE = unstack ===> 18 FLAGS = (VOID) } } } } } }
This would be changed to an Abstract Syntax Tree of:
{ TYPE = block FLAGS = (VOID) { TYPE = forloop FLAGS = (VOID) { TYPE = null } { TYPE = rv2av FLAGS = (SCALAR,REF,MOD) { TYPE = gv FLAGS = (SCALAR) REF = *ARGV } } { TYPE = block { TYPE = print FLAGS = (VOID) { TYPE = stringify FLAGS = (SCALAR) { TYPE = const FLAGS = (SCALAR) SV = PV("arg: "\0) } { TYPE = rv2sv FLAGS = (SCALAR,KIDS) { TYPE = gvsv ===> 13 FLAGS = (SCALAR) } } { TYPE = const FLAGS = (SCALAR) SV = PV("\n"\0) } } } } } }
Which would be converted to the code:
pp_enter() pp_nextstate(state) pp_pushmark() pp_gv(gv) pp_rv2av(SCALAR|REF|MOD) pp_gv(gv) pp_enteriter() loop: pp_iter() pp_and(&leaveloop) pp_nextstate(state) pp_pushmark() pp_const("arg: ") pp_gvsv(gvsv) pp_concat() pp_const("\n") pp_concat() pp_print(VOID) pp_unstack() goto loop leaveloop: pp_leaveloop() pp_leave()
But for example the print could be more aggressively compiled into something like:
... pp_pushmark() res1 = concat(PV("arg: "), gvsv) res2 = concat(res1, "\n") pp_push(res2) pp_print(VOID) ...
There is a lot of flexibility in this project, because a lot of time has to be spent benchmarking/profiling/optimizing the code.
Writing a benchmark for Perl 5.
Leave the optree building process as it currently is (including linked-list
generation and peephole optimisation). But add a code generation step, which
just uses the op->op_next
linked-list to convert them to pp_xxx(op)
calls
Adjust B modules and other modules which use the hook into runops and similar things.
Directly generate the code without the intermediate op->op_next
linked-list.
Move the peephole optimisations and constant folding to the code generation step.
Simplify the Abstract Syntax Tree, by moving things the logic to the
code generation step, like the for
loop in the example above.
Optimise everything using the benchmark.
Use LLVM as backend.
One month full-time, starting about a month after acceptance of this proposal.
About Two and a halve years ago I start Perl Kurila, an experimental fork of Perl 5, through it I have any about every part of the Perl internals, specifically I have done some optree rewriting. And some optree analysis as part of the Perl5-to-Perl5 converter using the annotated optree generated by Perl with Misc Attribute Decoration (MAD).