Spanish Localization of the Perl Core Documentation - Grant Report #3
Fri, 07-Sep-2012 by Alan Haggai Alavi
*Enrique Nell and Joaquin Ferrero reported:*
> Project status: [https://docs.google.com/spreadsheet/ccc?key=0AkmrG_9Q4x15dC1MNWloU0lyUjhGa2NrdTVTOG5WZVE](https://docs.google.com/spreadsheet/ccc?key=0AkmrG_9Q4x15dC1MNWloU0lyUjhGa2NrdTVTOG5WZVE)
>
> CPAN distribution: [http://search.cpan.org/~enell/POD2-ES-5.16.1.02/](http://search.cpan.org/~enell/POD2-ES-5.16.1.02/)
>
> Project host: [https://github.com/zipf/perldoc-es](https://github.com/zipf/perldoc-es)
>
> This month we updated `POD2::ES` from v5.16.0 to v5.16.1. It was a swift operation, since only one document changed (`perlhist.pod`). Translated files not included in the distribution were also updated to v5.16.1.
>
> As mentioned in a previous report, v5.16.0 fixed the issues that prevented `perldoc` from correctly displaying extended characters from UTF-8-encoded files in the console, so we have switched back to UTF-8. To do so, we configured our translation tool (OmegaT) to generate UTF-8-encoded output and modified the post-processing script to check whether the `=encoding utf8` command is present in the pod documents and to add it if it is missing.
>
> Code changes were implemented in `ES.pm` to fix some issues related to `POD2::Base`:
>
> * **`search_perlfunc_re()`**. Since `perldoc` does not decode the text string returned by this method, it couldn't filter the `perlfunc.pod` introduction. As a result, it couldn't find the documentation section requested by the user when using `perldoc` with the `-f` switch. To fix it, the offending characters (those with diacritic marks) were removed from the string returned by this method.
>
> * **`print_pod()`**. To align the actual behavior of `POD2::Base` with the functionality described in this module's documentation, we had to cover both the case where this method is called as a class method and the case where it is called as an object method.
>
> * **`print_pods()`**. As with `print_pod()`, we covered the two possible call types for this method (class method and object method). `print_pods()` is used only to call `POD2::Base`'s `print_pods()` method.
>
>
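The `search_perlfunc_re()` problem described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the project's code is Perl), and the Spanish heading used here is a hypothetical stand-in, not necessarily the text found in the translated `perlfunc.pod`. The helper `as_undecoded` is also an assumption: it models a UTF-8 string whose bytes were never decoded, so each byte shows up as a separate character.

```python
import re

# Hypothetical translated heading; the real perlfunc.pod text may differ.
pod_line = "=head2 Lista de funciones de Perl en orden alfabético"


def as_undecoded(text: str) -> str:
    """Model a UTF-8 string that was never decoded: one character per byte."""
    return text.encode("utf-8").decode("latin-1")


def finds_heading(pattern: str, line: str) -> bool:
    """Return True if the search pattern locates the heading line."""
    return re.search(pattern, line) is not None


# A pattern containing an accented letter no longer matches the decoded text,
# because its two UTF-8 bytes are seen as two separate characters.
pattern_with_accent = as_undecoded("en orden alfabético")

# A pattern restricted to ASCII is identical before and after decoding.
pattern_ascii_only = as_undecoded("Lista de funciones de Perl")

print(finds_heading(pattern_with_accent, pod_line))
print(finds_heading(pattern_ascii_only, pod_line))
```

Under these assumptions, only the ASCII-only pattern finds the heading, which is consistent with the fix of dropping the characters with diacritic marks from the returned string.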
> New files added this month:
>
> * **`perlmod`**
> * **`perlmodinstall`**
> * **`perlhacktut`**
> * **`perlclib`**
>
>
> Reported source pod bugs:
>
> * 2012/07/25 : [rt.cpan.org #78577] [RT #114260] `perlfaq2.pod` internal link error
> * 2012/08/14 : [perl #114486] `perlvar.pod`, line 1337, bad filehandle
>
> ## Stats
>
> The word count increased because we discovered 5 additional pod files that are generated automatically during setup (`perlapi`, `perlintern`, `perlmodlib`, `perltoc`, and `perluniprops`). We only added four of them, since `perltoc.pod` is generated automagically from the source pods that are being translated.
>
> Status of our v5.16 track (currently v5.16.1):
>
> * Total documents: 167 (100 in core docs)
> * Total words: 945,786 (495,813 in core docs)
> * % translated: 31.21% (46.46% of core docs)
> * % reviewed: 11.60%
>
>
> ## Tools
>
> We added more functionality to the post-processing script (see below), added new utilities, and renamed some scripts for consistency.
>
> ### New scripts
>
> * **`get_pods.pl`**
> Gets all the pod files from the Perl distribution and adds them to the OmegaT project source folder
>
> * **`test_pod2es_setup.pl`**
> Checks the `POD2::ES` setup
>
> * **`compare_pods.sh`**
> Shows source and target pods side-by-side to easily spot formatting differences
>
>
> ### Changes in `postprocess.pl`
>
> Since we changed the output encoding to UTF-8, the script now checks whether the `=encoding utf8` command (or an equivalent command for a different encoding) is present. It adds the command if it is missing, or updates it if the specified encoding is different from UTF-8.
>
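The encoding check described above could look roughly like the following sketch. It is written in Python for illustration only and is not the code in `postprocess.pl`. The function name `ensure_utf8_encoding` and the choice to place a missing command at the very top of the file are assumptions. The sketch only rewrites the declaration; it assumes the text itself is already UTF-8, as produced by the translation tool.

```python
import re

# Matches a pod "=encoding NAME" command on a line of its own.
ENCODING_RE = re.compile(r"^=encoding[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def ensure_utf8_encoding(pod_text: str) -> str:
    """Add or update the =encoding command so that it declares utf8."""
    match = ENCODING_RE.search(pod_text)
    if match is None:
        # Command missing: add it at the top, followed by a blank line.
        return "=encoding utf8\n\n" + pod_text
    if match.group(1).lower() in ("utf8", "utf-8"):
        # Already declares UTF-8: leave the document untouched.
        return pod_text
    # A different encoding is declared: rewrite the first command found.
    return pod_text[:match.start()] + "=encoding utf8" + pod_text[match.end():]


if __name__ == "__main__":
    print(ensure_utf8_encoding("=encoding latin1\n\n=head1 NAME\n"))
```

Running the function twice on the same text gives the same result as running it once, so it is safe to apply on every update.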
> We also added a switch to generate HTML diff files that show word-oriented differences.
>
> These reports provide a clear view of the changes made by the reviewer, and can be useful to learn the style and terminology used in the project. Here is an example (not exactly the same view, since we had to translate the HTML to Google Docs format):
>
> [https://docs.google.com/document/d/1wIzsIk9PS1OPz7ixbdG8KhVLvnRS9fnn4K-mpLDIvi4/edit](https://docs.google.com/document/d/1wIzsIk9PS1OPz7ixbdG8KhVLvnRS9fnn4K-mpLDIvi4/edit)
>
> Hopefully, this will help to improve global consistency.
>
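A word-oriented HTML diff of the kind described above can be sketched with Python's standard `difflib` module. This is an illustration only, not the project's Perl implementation. The function name `word_diff_html`, the `<del>`/`<ins>` markup, and the Spanish sample words are all assumptions.

```python
import difflib
import html


def word_diff_html(before: str, after: str) -> str:
    """Return an HTML fragment marking deleted and inserted words."""
    old_words = before.split()
    new_words = after.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_chunk = html.escape(" ".join(old_words[i1:i2]))
        new_chunk = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(old_chunk)
        else:
            # "replace", "delete" and "insert" differ only in which side is empty.
            if old_chunk:
                parts.append(f"<del>{old_chunk}</del>")
            if new_chunk:
                parts.append(f"<ins>{new_chunk}</ins>")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical translator draft versus reviewer version.
    print(word_diff_html("el fichero de entrada", "el archivo de entrada"))
    # el <del>fichero</del> <ins>archivo</ins> de entrada
```

Comparing whole words rather than characters keeps each reviewer change readable as a single terminology or style decision.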
> In addition, these changes can be collected to generate a list of frequent errors/changes in order to do an automated first pass before delivering the files to the reviewers.
>
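One possible way to collect frequent reviewer changes is to count the word-level replacements found across many translated segments. The sketch below is in Python for illustration and does not describe an existing project tool. The function names, the `min_count` threshold, and the sample segments are hypothetical.

```python
from collections import Counter
import difflib


def replaced_phrases(before: str, after: str) -> list:
    """Return (old, new) phrase pairs that the reviewer replaced."""
    old_words, new_words = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [
        (" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


def frequent_changes(segment_pairs, min_count: int = 2) -> list:
    """Count replacements over all segments and keep the recurring ones."""
    counts = Counter()
    for before, after in segment_pairs:
        counts.update(replaced_phrases(before, after))
    return [(change, n) for change, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Hypothetical (translation, reviewed) segment pairs.
    pairs = [
        ("abre el fichero", "abre el archivo"),
        ("cierra el fichero", "cierra el archivo"),
        ("una matriz vacía", "un array vacío"),
    ]
    print(frequent_changes(pairs))
    # [(('fichero', 'archivo'), 2)]
```

Recurring pairs from such a list are candidates for an automated first pass; one-off changes are filtered out by the threshold.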
> ### Other actions
>
> * Apertium Offline is now available in our server OmegaT setup. This provides an alternative machine translation engine (the other one is Google Translate) that can be used to get a first draft.
>
> * During this update we recorded the issues found in the distribution files after the post-processing stage in a spreadsheet added to our project status document. We will use this as a checklist for subsequent updates. All these issues fall into two categories:
>
> * Double spaces (e.g., after a question mark, between a full stop and an opening parenthesis, etc.). In some cases the problem stems from a segmentation error; our customized segmentation settings cover most of the cases quite well, but not all of them. We should be able to fix most of these issues by adding a few regular expressions to `postprocess.pl`.
>
> * Broken links: Links with long names are split across two or three lines. This issue has to do with `Pod::Tidy`. We must check if there is any way to prevent it.
>
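The double-space cleanup mentioned above could be approached with a regular expression like the one in this sketch. It is written in Python for illustration; the actual fix is planned for `postprocess.pl`, which is a Perl script. The function name and the rule of skipping indented lines (treated here as pod verbatim blocks) are assumptions.

```python
import re

# Two or more spaces between non-space characters inside a line of prose.
DOUBLE_SPACE_RE = re.compile(r"(?<=\S) {2,}(?=\S)")


def collapse_double_spaces(pod_text: str) -> str:
    """Collapse runs of spaces in prose lines, leaving verbatim lines alone."""
    fixed = []
    for line in pod_text.split("\n"):
        if line[:1] in (" ", "\t"):
            # Indented lines are verbatim (code) in pod: keep them unchanged.
            fixed.append(line)
        else:
            fixed.append(DOUBLE_SPACE_RE.sub(" ", line))
    return "\n".join(fixed)


if __name__ == "__main__":
    sample = "¿Qué es Perl?  Es un lenguaje.  (Véase perlfaq1.)\n    my  $x = 1;\n"
    print(collapse_double_spaces(sample))
```

Leaving indented lines untouched avoids altering code samples, where the spacing may be intentional.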
> ## Future work
>
> * `perlcheat` didn't change in v5.16.1 (i.e., still contains the bug we reported), but we will include an amended ES version in an upcoming release, later this month.
>
> * We are working on a terminology extraction tool. It will be ready in the next few weeks. A new module will be added to CPAN.
>
> * Add to the tools section the code that generates our project status spreadsheet and a `Readme` containing tool usage guidelines.
>
> * Check how to get an ES `perltoc.pod` generated automatically.