I have been working on a new YAML Pure Perl Parser, already on CPAN as YAML::PP. It aims to parse YAML 1.2.
The existing YAML frameworks in Perl all lack important features and don't support YAML 1.2. I will continue development of YAML::PP so that it's able to parse all valid syntaxes (with some minor exceptions). I will complete the Loader to support tags. I will add a Dumper and Emitter.
I will add test cases to the cross-framework YAML Test Suite and continue developing the YAML Test Matrix to compare all frameworks.
While JSON has become popular as a simple format to exchange data, there are still a lot of use cases for YAML. Imagine how verbose an Ansible Playbook would look in JSON. Comments are an important feature, and Aliases come in very handy sometimes.
While PyYAML still only supports YAML 1.1, it at least supports safe loading (see below). Python's ruamel.yaml aims to support 1.2.
I find it very unfortunate that Perl has no support for YAML 1.2, for safe loading, or for booleans.
It would be a nice opportunity for Perl to have a framework that supports all that.
Since the YAML Test Suite is supposed to become the number one source to write tests in any language, it can promote the new Perl framework.
Since I'm aiming for a portable implementation, this framework might also be easily ported to Perl 6, which currently has no full support for YAML, although there is some development going on.
The current state of YAML in Perl is as follows:
YAML.pm: Based on YAML 1.0. It can't do trailing comments and has problems with a lot of valid 1.1 and 1.2 syntaxes.
YAML::XS: Based on libyaml and the most recommended module. It supports YAML 1.1. It diverges from the spec for several edge cases.
YAML::Syck: Supports YAML 1.0. It has problems with a lot of valid YAML 1.1 and 1.2 syntaxes.
YAML.pm and YAML::XS offer no way to disable loading into objects. That means loading an untrusted YAML file can be a security hole. YAML::Syck supports disabling that via "LoadBlessed".
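To make the idea of safe loading concrete, here is a minimal sketch in Python (used only for illustration; it is not code from any of the modules above). It assumes a hypothetical constructor step, construct_node, that builds an object only when the caller has explicitly allowed the tag. All names in it are invented for this example.

```python
# Hypothetical sketch of "safe loading": objects are only constructed for
# tags the caller has explicitly allowed. No name here is a real module API.


class UnsafeTagError(Exception):
    """Raised when a document asks for a class that was not allowed."""


def construct_node(tag, value, allowed_classes=None):
    """Return plain data, or an object only if its tag is allowed."""
    allowed_classes = allowed_classes or {}
    if tag is None:
        return value  # untagged node: plain data, always safe
    if tag in allowed_classes:
        return allowed_classes[tag](value)  # caller opted in to this class
    raise UnsafeTagError(f"refusing to construct object for tag {tag!r}")


class Point:
    def __init__(self, mapping):
        self.x = mapping["x"]
        self.y = mapping["y"]


plain = construct_node(None, {"x": 1, "y": 2})
point = construct_node("!Point", {"x": 1, "y": 2}, {"!Point": Point})
```

With no allowed classes, a tagged node from an untrusted file raises an error instead of silently creating an object.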
The three mentioned modules don't support booleans. If you need to dump your data into JSON or let it be validated, booleans get lost (turned into 1 or 0). Only YAML::XS provides a limited way of keeping booleans when roundtripping.
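The effect of losing booleans can be shown with a small sketch (in Python, for illustration only). It assumes two hypothetical loader results for the YAML line "enabled: true": one where the boolean was turned into 1, and one where the type was kept.

```python
import json

# YAML source (not parsed here):  enabled: true
# A loader without boolean support hands back the integer 1 ...
loaded_without_booleans = {"enabled": 1}
# ... while a boolean-aware loader keeps the type.
loaded_with_booleans = {"enabled": True}

# Dumping to JSON shows the difference: the first result is now a number.
lossy = json.dumps(loaded_without_booleans)
faithful = json.dumps(loaded_with_booleans)
```

A JSON schema that requires a boolean for "enabled" would accept the second document and reject the first.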
The mentioned modules can only be used as complete Loaders. There is no way to put your own Loader on top of a parser.
You can check which test cases these modules are passing or failing: YAML Test Matrix
At the end of 2016 I went over a number of RT tickets for YAML.pm, creating and merging Pull Requests from patches and writing Pull Requests myself.
I'm working a lot with Ingy döt Net, one of the creators of YAML, and Felix Krause, developer of NimYAML, on the YAML Test Suite and on RFCs for creating YAML 1.3.
I created the YAML Test Matrix to show the results of the tests for a growing number of YAML frameworks, based on Ingy's Docker image for YAML Editor.
I started to implement my own parser YAML::PP in 2017, and it currently passes most of the tests with the exception of Flow Style. The loader can already load YAML documents that the parser can parse. It supports booleans and aliases, but no tags yet.
I'm currently transforming it into a tokenizer which allows correct syntax highlighting, making it also easier to spot errors.
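As a rough illustration of what a tokenizer gives a syntax highlighter, here is a small sketch in Python. It is not YAML::PP's tokenizer: the token kinds and the regular expression are assumptions made for this example, and it only handles a tiny subset of the form "key: value # comment". Each token keeps its line and column, so a highlighter or an error message can point at the exact position.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    line: int
    column: int


# Hypothetical token kinds for a tiny subset of block-style YAML.
TOKEN_RE = re.compile(
    r"(?P<INDENT>^[ ]+)"               # leading spaces
    r"|(?P<COMMENT>#.*)"               # comment up to end of line (simplified)
    r"|(?P<COLON>:(?=[ ]|$))"          # key/value separator
    r"|(?P<WS>[ ]+)"                   # spaces between tokens
    r"|(?P<PLAIN>(?:[^\s:#]|:(?=\S))+)"  # plain scalar, may contain "::"
)


def tokenize(text):
    """Split text into tokens, recording 1-based line and column."""
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            tokens.append(
                Token(match.lastgroup, match.group(), line_no, match.start() + 1)
            )
    return tokens


tokens = tokenize("name: YAML::PP  # parser\n  version: 1")
```

Because every character of the input ends up in some token, joining the token texts of a line reproduces that line, which is the property a highlighter relies on.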
I want it to be able to do roundtrips including comments at some point.
At the Perl Toolchain Summit 2017 in Lyon I worked with Ingy on a concept for a new API for YAML loading. The goal is to integrate YAML::PP into that API.
Ingy and I started to implement the API in YAML::Perl, using YAML::PP as a backend.
I also started to implement the new Loader API in Perl 6, currently using the libyaml binding originally written by Curt Tilmes as a backend.
A couple of features are still missing from the parser:
This is the biggest part. Flow Style is not indent based, and some rules are different than in block style. (I estimate 40h.)
This is also a major part, because stacking of parser events is necessary until the parser knows if it's a mapping key or a node. (30h)
Currently no information about line and column is saved. (20h)
Implement the current parser in a way that makes it easy to add support for YAML 1.3
My talk and my published slides will explain why YAML currently is difficult to implement. I also gave this talk at the German Perl Workshop in Hamburg.
I can start to work on this immediately and almost full time over the next two months.
I will release YAML::PP with the features mentioned above implemented. The parser shall pass most of the tests in the YAML Test Suite, with the exception of edge cases. Since the spec is often not very clear, there are some cases where it is unclear what the correct behavior should be, or what behavior actually makes sense. These edge cases are usually not relevant for real use cases and are easy to avoid. I will look at other frameworks and find out the most common behavior.
The Emitter should be able to transform every test input into valid YAML. The style (quotes/block scalar, spaces/newlines etc.) might still differ from the test suite.
The Loader/Dumper API, and especially the Parser and Emitter API, might not be completely fixed at the end of this grant. Ingy can help me out here, provided he has time, and I need feedback from potential users.
Ingy also offered to review the work.
I appreciate new test cases, bug reports, patches and co-maintainers, and I want to keep maintaining this module in the future.
I wrote my first Perl code in 1998 and have been in touch with the Perl Community since about 2001.
I already have two parsing modules on CPAN.
One is HTML::Template::Compiled, one of the fastest (and still feature-rich) pure Perl templating modules, which gains its speed from compiling to Perl code.
The other is Parse::BBCode, which is unique among the Perl BBCode modules in that it provides a parse tree, allows adding your own tags, tries to correct invalid BBCode instead of simply dying, and is fast.
YAML is a bit more complicated to parse, because it's indentation based, but I like solving programming puzzles.
I do a lot of pair programming with Ingy and I'm also in contact with Felix Krause, so I have two people available who know the Spec.
If you are wondering about terminology, here is a short explanation:
Loading YAML can be divided into two steps.
The Parser parses a Stream and returns a list of parsing events. The Constructor then takes these events, decides about numbers, tags, booleans and aliases/anchors and constructs a data structure.
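The split between Parser and Constructor can be sketched as follows (in Python, for illustration only). The event tuples are a simplified assumption, loosely modelled on libyaml-style events, and construct and resolve_scalar are hypothetical names. The sketch resolves only integers and booleans and does not distinguish quoted from plain scalars.

```python
# Hypothetical event stream a parser might return for:
#   defaults: &d {retries: 3, verbose: true}
#   job: *d
# Each event is (type, anchor, value).
EVENTS = [
    ("mapping_start", None, None),
    ("scalar", None, "defaults"),
    ("mapping_start", "d", None),
    ("scalar", None, "retries"),
    ("scalar", None, "3"),
    ("scalar", None, "verbose"),
    ("scalar", None, "true"),
    ("mapping_end", None, None),
    ("scalar", None, "job"),
    ("alias", None, "d"),
    ("mapping_end", None, None),
]


def resolve_scalar(text):
    """Decide the type of a scalar (tiny subset: booleans and integers)."""
    if text in ("true", "false"):
        return text == "true"
    try:
        return int(text)
    except ValueError:
        return text


def construct(events):
    """Build a data structure from parser events, resolving aliases."""
    anchors = {}
    stream = iter(events)

    def node(event):
        kind, anchor, value = event
        if kind == "alias":
            return anchors[value]  # reuse the anchored node itself
        if kind == "scalar":
            result = resolve_scalar(value)
        elif kind == "sequence_start":
            result = []
            for item in stream:
                if item[0] == "sequence_end":
                    break
                result.append(node(item))
        elif kind == "mapping_start":
            result = {}
            for key_event in stream:
                if key_event[0] == "mapping_end":
                    break
                key = node(key_event)
                result[key] = node(next(stream))
        else:
            raise ValueError(f"unexpected event {kind!r}")
        if anchor is not None:
            anchors[anchor] = result
        return result

    return node(next(stream))


data = construct(EVENTS)
```

Here data["job"] and data["defaults"] are the same object, which is how an alias differs from a copy.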
Vice versa, Dumping YAML can be divided into deconstructing and emitting. The Deconstructor creates a list of emitter events from a data structure. The Emitter creates a YAML Stream from these events.
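The dumping direction can be sketched in the same spirit (again in Python, with hypothetical names deconstruct and emit). To stay short, this emitter writes flow style only and does not quote scalars that would need quoting, so it is an assumption-laden toy rather than a usable Emitter.

```python
def deconstruct(data):
    """Yield (type, value) emitter events for dicts, lists and scalars."""
    if isinstance(data, dict):
        yield ("mapping_start", None)
        for key, value in data.items():
            yield from deconstruct(key)
            yield from deconstruct(value)
        yield ("mapping_end", None)
    elif isinstance(data, list):
        yield ("sequence_start", None)
        for item in data:
            yield from deconstruct(item)
        yield ("sequence_end", None)
    elif isinstance(data, bool):
        yield ("scalar", "true" if data else "false")
    else:
        yield ("scalar", str(data))


def emit(events):
    """Turn emitter events into a flow-style YAML string."""
    out = []
    stack = []  # one [kind, nodes_seen] entry per open collection

    def before_node():
        # Write the separator that belongs in front of the next node.
        if not stack:
            return
        kind, count = stack[-1]
        if kind == "map" and count % 2 == 1:
            out.append(": ")  # a value follows its key
        elif count:
            out.append(", ")  # next key or next sequence item
        stack[-1][1] += 1

    for kind, value in events:
        if kind == "scalar":
            before_node()
            out.append(value)
        elif kind == "mapping_start":
            before_node()
            out.append("{")
            stack.append(["map", 0])
        elif kind == "sequence_start":
            before_node()
            out.append("[")
            stack.append(["seq", 0])
        elif kind == "mapping_end":
            stack.pop()
            out.append("}")
        elif kind == "sequence_end":
            stack.pop()
            out.append("]")
    return "".join(out)


yaml_text = emit(deconstruct({"name": "demo", "tags": ["a", "b"], "ok": True}))
```

Because the two halves only talk through events, either one could be swapped out or tested on its own.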
Keeping these things separate allows you to use the language-independent Test Suite to test your parser. It also makes debugging and maintenance easier. You can also use a different parser backend, for example a libyaml-based one.