Samantha McVey has made progress on her grant to improve the robustness of Unicode support in Rakudo. She is working in the following repos: https://github.com/samcv/UCD, https://github.com/samcv/Unicode-Grant.
Here are a few highlights from her complete blog post.
The script individually tests the contents of each grapheme listed in the GraphemeClusterBreak.txt file from the Unicode 9.0 test suite.
Previously we only checked the total number of ‘.chars’ for each string as a whole. We want something more precise than that, since the test specifies the location of each break between codepoints. The new code checks that codepoints are placed in the correct graphemes, in the proper order. In addition, we check the string length.
This new test uses a grammar to parse the file and is generally much more robust than the previous script.
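To illustrate the per-grapheme check described above, here is a minimal Python sketch. It is not the Raku grammar used in the actual test. It assumes the usual layout of the Unicode break test data, where ÷ marks a break, × marks "no break", and everything after # is a comment. The names `parse_break_test_line`, `check_line`, and `SAMPLE` are hypothetical, and the `segment` argument stands in for whatever grapheme segmenter is under test.

```python
BREAK = "\u00f7"     # ÷ : a grapheme boundary is expected here
NO_BREAK = "\u00d7"  # × : no boundary is expected here


def parse_break_test_line(line):
    """Return (string, expected_graphemes) for one test line.

    expected_graphemes is a list of lists of codepoints, one inner list per
    grapheme. Blank and comment-only lines yield None.
    """
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    graphemes, current = [], []
    for token in data.split():
        if token == BREAK:
            if current:  # close the grapheme collected so far
                graphemes.append(current)
                current = []
        elif token == NO_BREAK:
            continue  # the next codepoint joins the current grapheme
        else:
            current.append(int(token, 16))  # hexadecimal codepoint
    if current:
        graphemes.append(current)
    string = "".join(chr(cp) for g in graphemes for cp in g)
    return string, graphemes


def check_line(line, segment):
    """Check that segment(string) reproduces the expected graphemes.

    Comparing the nested lists verifies both the contents and order of each
    grapheme and, implicitly, the total number of graphemes.
    """
    parsed = parse_break_test_line(line)
    if parsed is None:
        return True
    string, expected = parsed
    actual = [[ord(ch) for ch in g] for g in segment(string)]
    return actual == expected


# One line in the test-file format: SPACE + COMBINING DIAERESIS, then SPACE.
SAMPLE = "\u00f7 0020 \u00d7 0308 \u00f7 0020 \u00f7\t# sample comment"
```

Comparing only the number of graphemes would accept a segmenter that puts the breaks in the wrong places but happens to produce the right count; comparing the codepoints of each grapheme rules that out.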
I have some currently unmerged tests that need to wait before they can be merged, although sections of them are complete and are being incorporated into the larger Unicode Database Retrofit, which reuses this code.
I have written grammars and modules to process and provide data on the PropertyValueAliases and PropertyAliases. They will be used for testing that all of the canonical property names and all of the property values themselves properly resolve to separate property codes, and that they are usable in regexes.
As part of my grant work I am working on making Unicode property values distinct per property, and also on allowing all canonical Unicode property values to work.
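The need for property values to be distinct per property can be seen in a small Python sketch. It is only an illustration, not the grammar or modules mentioned above. It assumes the semicolon-separated layout of PropertyValueAliases.txt (property name first, then the aliases of one value, with # starting a comment). The function names and the `SAMPLE_ROWS` list are hypothetical, and loose matching of case and underscores is left out.

```python
from collections import defaultdict


def parse_value_aliases(lines):
    """Map (property, alias) to the tuple of all aliases of that value."""
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        prop, names = fields[0], tuple(fields[1:])
        for alias in names:
            # Keying on the property keeps identical aliases apart.
            table[(prop, alias)] = names
    return table


def ambiguous_aliases(table):
    """Return the aliases that occur under more than one property."""
    seen = defaultdict(set)
    for prop, alias in table:
        seen[alias].add(prop)
    return {alias: sorted(props) for alias, props in seen.items() if len(props) > 1}


# A few rows written in the file's format, used here as sample input.
SAMPLE_ROWS = [
    "# sample rows",
    "AHex; N ; No ; F ; False",
    "ea ; N ; Neutral",
    "gc ; Lu ; Uppercase_Letter",
]

table = parse_value_aliases(SAMPLE_ROWS)
print(ambiguous_aliases(table))
```

In this sample the alias N belongs to both AHex and ea, so a lookup keyed on the value name alone could not tell "No" from "Neutral"; keying on the (property, value) pair keeps them separate.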
I've also started adding some documentation to my Unicode-Grant wiki with information about what is contained in each of the Unicode data files; there are a few other pages as well. I plan to expand this wiki to have many more sections than it does currently.
MAJ