Grant Report: Robust Perl 6 Unicode Support - June 2017
Tue, 06-Jun-2017 by Mark A Jensen
Samantha McVey has made progress on her
[grant](http://news.perlfoundation.org/2017/04/grant-proposal.html) to
improve the robustness of Unicode support in Rakudo. She is working in
the following repos: [https://github.com/samcv/UCD](https://github.com/samcv/UCD),
[https://github.com/samcv/Unicode-Grant](https://github.com/samcv/Unicode-Grant).
Here are a few highlights from
[her complete blog post](https://cry.nu/perl6/grant-status-update-1/).
* "In Roast there is a
[new version of GraphemeBreakTest.t](https://github.com/perl6/roast/pull/267).
> The script tests the contents of each grapheme individually from the
GraphemeBreakTest.txt file from the Unicode 9.0 test suite.
> Previously we only checked the total number of ‘.chars’ for the
string as a whole. Obviously we want something more precise than that,
since the test specifies the location of each of the breaks between
codepoints. The new code checks that codepoints are put in the correct
graphemes in the proper order. We also check the string length.
> This new test uses a grammar to parse the file and generally is much
more robust than the previous script.
* I have some tests that are currently unmerged and need to wait before
being merged, although sections of them are complete and are being
incorporated into the larger Unicode Database Retrofit, which reuses
this code.
* I have written grammars and modules to process and provide data on
the [PropertyValueAliases](ftp://ftp.unicode.org/Public/9.0.0/ucd/PropertyValueAliases.txt)
and [PropertyAliases](ftp://ftp.unicode.org/Public/9.0.0/ucd/PropertyAliases.txt).
They will be used for testing that all of the canonical property names and all the property
values themselves properly resolve to separate property codes, as well
as that they are usable in regex.
* As part of my grant work I am working on making Unicode property
values distinct per property, and also on allowing all canonical
Unicode property values to work.
* I've also started adding some documentation to my Unicode-Grant wiki
with information about what is enclosed in each Unicode data file;
there are a few other pages as
well. [This wiki](https://github.com/samcv/Unicode-Grant/wiki/All-Unicode-Files)
is planned to be expanded to have many more sections than it does
currently."
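To make the grapheme break check described above more concrete, here is a minimal sketch in Python. It is not the Perl 6 grammar used in Roast. It assumes the usual layout of the Unicode break test data, where each line lists hexadecimal codepoints separated by ÷ (a break) or × (no break), followed by a `#` comment. The function names and the `segment` callback are hypothetical and stand in for whatever implementation is under test.

```python
BREAK = "\u00F7"     # the division sign marks a grapheme boundary
NO_BREAK = "\u00D7"  # the multiplication sign marks "no boundary here"


def parse_break_test_line(line):
    """Return the expected graphemes as lists of codepoints.

    Blank lines and comment-only lines yield None.
    """
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    clusters, current = [], []
    for token in data.split():
        if token == BREAK:
            if current:               # close the grapheme collected so far
                clusters.append(current)
                current = []
        elif token == NO_BREAK:
            continue                  # next codepoint joins the same grapheme
        else:
            current.append(int(token, 16))
    if current:                       # tolerate a missing final break marker
        clusters.append(current)
    return clusters


def clusters_as_strings(clusters):
    """Turn lists of codepoints into one string per expected grapheme."""
    return ["".join(chr(cp) for cp in cluster) for cluster in clusters]


def check_segmentation(segment, clusters):
    """Compare a segmenter's output with the expected graphemes.

    `segment` is a hypothetical callable that takes a string and returns a
    list of grapheme strings. Comparing the two lists checks the contents,
    the order, and the count of graphemes at once.
    """
    expected = clusters_as_strings(clusters)
    return segment("".join(expected)) == expected


# A line in the assumed format: SPACE + COMBINING DIAERESIS, then SPACE.
sample = f"{BREAK} 0020 {NO_BREAK} 0308 {BREAK} 0020 {BREAK}\t# illustrative comment"
expected_clusters = parse_break_test_line(sample)
```

A per-grapheme comparison like this catches cases where the total count happens to be right but a codepoint has landed in the wrong grapheme, which is the gap the earlier `.chars`-only check left open.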
MAJ