Spanish Localization of the Perl Core Documentation - Grant Report #3

Enrique Nell and Joaquin Ferrero reported:

Project status: https://docs.google.com/spreadsheet/ccc?key=0AkmrG_9Q4x15dC1MNWloU0lyUjhGa2NrdTVTOG5WZVE

CPAN distribution: http://search.cpan.org/~enell/POD2-ES-5.16.1.02/

Project host: https://github.com/zipf/perldoc-es

This month we updated POD2::ES from v5.16.0 to v5.16.1. It was a swift operation, since only one document changed (perlhist.pod). Translated files not included in the distribution were also updated to v5.16.1.

As mentioned in a previous report, v5.16.0 fixed the issues that prevented perldoc from correctly displaying extended characters in UTF-8-encoded files in the console, so we have switched back to UTF-8. To do so, we configured our translation tool (OmegaT) to generate UTF-8-encoded output, and modified the post-processing script to check whether the =encoding utf8 command is present in the pod documents and to add it if it is missing.

Code changes were implemented in ES.pm to fix some issues related to POD2::Base:

  • search_perlfunc_re(). Since perldoc does not decode the text string returned by this method, perldoc couldn't use it to filter out the perlfunc.pod introduction. As a result, perldoc couldn't find the documentation section requested by the user when invoked with the -f switch. To fix this, the offending characters (those with diacritic marks) were removed from the string returned by this method.

  • print_pod(). To align the actual behavior of POD2::Base with the functionality described in this module's documentation, we had to cover both the case where this method is called as a class method and the case where it is called as an object method.

  • print_pods(). As for print_pod(), we covered the two possible call types for this method (class method and object method). print_pods() is used only to call POD2::Base's print_pods() method.
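The search_perlfunc_re() problem described above can be illustrated with a minimal Python sketch (the project code itself is Perl and is not reproduced here). The heading text is hypothetical, since the real string returned by the method does not appear in this report; the sketch only shows why an undecoded pattern fails to match decoded text, and why a pattern without accented characters avoids the problem.

```python
import re

# Hypothetical heading text (an assumption for illustration only).
heading = "Lista de funciones en orden alfabético"

# Simulates a pattern whose UTF-8 bytes were never decoded:
# each byte is treated as a separate character.
undecoded_pattern = heading.encode("utf-8").decode("latin-1")

# A pattern restricted to the ASCII part of the heading.
ascii_pattern = "Lista de funciones en orden alfab"


def finds_heading(pattern: str, line: str) -> bool:
    """Return True if the literal pattern occurs in the line."""
    return re.search(re.escape(pattern), line) is not None


print(finds_heading(undecoded_pattern, heading))  # undecoded pattern vs. decoded text
print(finds_heading(ascii_pattern, heading))      # ASCII-only pattern vs. decoded text
```

The ASCII-only pattern matches regardless of whether either side has been decoded, which is the property the fix relies on.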

New files added this month:

  • perlmod
  • perlmodinstall
  • perlhacktut
  • perlclib

Reported source pod bugs:

2012/07/25 : [rt.cpan.org #78577] [RT #114260] perlfaq2.pod internal link error
2012/08/14 : [perl #114486] perlvar.pod, line 1337, bad filehandle

Stats

The word count increased because we discovered five additional pod files that are generated automatically during setup (perlapi, perlintern, perlmodlib, perltoc, and perluniprops). We added only four of them, since perltoc.pod is generated automatically from the source pods that are being translated. Status of our v5.16 track (currently v5.16.1):

  • Total documents: 167 (100 in core docs)
  • Total words: 945,786 (495,813 in core docs)
  • % translated: 31.21% (46.46% of core docs)
  • % reviewed: 11.60%

Tools

We added more functionality to the post-processing script (see below), added new utilities, and renamed some scripts for consistency.

New scripts

  • get_pods.pl
    Gets all the pod files from the Perl distribution and adds them to the OmegaT project source folder

  • test_pod2es_setup.pl
    Checks the POD2::ES setup

  • compare_pods.sh
    Shows source and target pods side-by-side to easily spot formatting differences

Changes in postprocess.pl

Since we changed the output encoding to UTF-8, the script now checks whether the =encoding utf8 command (or an equivalent command for a different encoding) is present. It adds the command if it is missing, or updates it if the specified encoding is not UTF-8.
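The check described above can be sketched as follows. This is a minimal Python illustration, not the code in postprocess.pl (which is written in Perl); the function name ensure_utf8_encoding and the choice to put a missing command at the very top of the file are assumptions.

```python
import re

# Matches a pod "=encoding <name>" command on a line of its own.
ENCODING_RE = re.compile(r"^=encoding[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def ensure_utf8_encoding(pod: str) -> str:
    """Return the pod text with an '=encoding utf8' command in place."""
    match = ENCODING_RE.search(pod)
    if match is None:
        # No encoding command at all: add one at the top (assumed position).
        return "=encoding utf8\n\n" + pod
    if match.group(1).lower() in ("utf8", "utf-8"):
        # Already declares UTF-8: leave the document untouched.
        return pod
    # A different encoding is declared: replace that command only.
    return pod[:match.start()] + "=encoding utf8" + pod[match.end():]


print(ensure_utf8_encoding("=encoding latin1\n\n=head1 NOMBRE\n"))
```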

We also added a switch to generate HTML diff files that show word-oriented differences.

These reports provide a clear view of the changes made by the reviewer, and can help translators learn the style and terminology used in the project. Here is an example (not exactly the same view, since we had to convert the HTML to Google Docs format):

https://docs.google.com/document/d/1wIzsIk9PS1OPz7ixbdG8KhVLvnRS9fnn4K-mpLDIvi4/edit

Hopefully, this will help to improve global consistency.

In addition, these changes can be collected to generate a list of frequent errors/changes, which would allow an automated first pass before the files are delivered to the reviewers.
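As a rough illustration of both ideas (a word-oriented HTML diff and a tally of frequent reviewer changes), here is a minimal Python sketch built on the standard difflib module. It is not the project's implementation; the function names and the <del>/<ins> markup are assumptions.

```python
import difflib
import html
from collections import Counter


def word_opcodes(old: str, new: str):
    """Split both texts into words and compute the edit operations."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return old_words, new_words, matcher.get_opcodes()


def word_diff_html(old: str, new: str) -> str:
    """Render a word-level diff, marking removals and additions."""
    old_words, new_words, opcodes = word_opcodes(old, new)
    parts = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            parts.append(html.escape(" ".join(old_words[i1:i2])))
            continue
        if tag in ("delete", "replace"):
            parts.append("<del>" + html.escape(" ".join(old_words[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            parts.append("<ins>" + html.escape(" ".join(new_words[j1:j2])) + "</ins>")
    return " ".join(parts)


def frequent_changes(pairs, top=10):
    """Count (before, after) replacements over many translated/reviewed pairs."""
    counts = Counter()
    for old, new in pairs:
        old_words, new_words, opcodes = word_opcodes(old, new)
        for tag, i1, i2, j1, j2 in opcodes:
            if tag == "replace":
                counts[(" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))] += 1
    return counts.most_common(top)


print(word_diff_html("el fichero de entrada", "el archivo de entrada"))
```

The most frequent (before, after) pairs returned by frequent_changes() would be the natural candidates for an automated first pass.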

Other actions

  • Apertium Offline is now available in the OmegaT setup on our server. This provides an alternative machine translation engine (the other one is Google Translate) that can be used to get a first draft.

  • During this update we recorded the issues found in the distribution files after the post-processing stage in a spreadsheet added to our project status document. We will use this as a checklist for subsequent updates. All these issues fall into two categories:

      • Double spaces (e.g., after a question mark, between a full stop and an opening parenthesis, etc.). In some cases the problem stems from a segmentation error; our customized segmentation settings cover most of the cases quite well, but not all of them. We should be able to fix most of these issues by adding a few regular expressions to postprocess.pl.

      • Broken links: links with long names are split across two or three lines. This issue is related to Pod::Tidy. We must check whether there is a way to prevent it.
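Both categories lend themselves to regular-expression fixes. The following Python sketch shows one possible approach, not the expressions that will go into postprocess.pl. It assumes that indented lines are verbatim pod paragraphs that must not be altered, and that links contain no nested formatting codes.

```python
import re

# Two or more spaces between non-space characters.
DOUBLE_SPACE_RE = re.compile(r"(?<=\S) {2,}(?=\S)")

# An L<...> link that continues over one or more line breaks
# (no nested formatting codes, no blank lines inside).
SPLIT_LINK_RE = re.compile(r"L<[^<>\n]*(?:\n[^<>\n]+)+>")


def collapse_double_spaces(pod: str) -> str:
    """Collapse repeated spaces, skipping indented (verbatim) lines."""
    fixed = []
    for line in pod.split("\n"):
        if line[:1] in (" ", "\t"):
            fixed.append(line)  # verbatim paragraph: keep as is
        else:
            fixed.append(DOUBLE_SPACE_RE.sub(" ", line))
    return "\n".join(fixed)


def rejoin_split_links(pod: str) -> str:
    """Put each L<...> link that was split across lines back on one line."""
    return SPLIT_LINK_RE.sub(
        lambda m: re.sub(r"\s*\n\s*", " ", m.group(0)), pod
    )


print(collapse_double_spaces("¿Por qué?  Porque sí.  (Nota)"))
print(rejoin_split_links("Vea L<perlfunc/Lista de funciones\nde Perl> para más."))
```

Rejoining a link produces a long line, so a step like this would need to run after any tidy pass that rewraps paragraphs.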

Future work

  • perlcheat didn't change in v5.16.1 (i.e., still contains the bug we reported), but we will include an amended ES version in an upcoming release, later this month.

  • We are working on a terminology extraction tool. It will be ready in the next few weeks. A new module will be added to CPAN.

  • Add to the tools section the code that generates our project status spreadsheet and a Readme containing tool usage guidelines.

  • Check how to get an ES perltoc.pod generated automatically.

About this Entry

This page contains a single entry by Alan Haggai Alavi published on September 7, 2012 8:53 PM.
