Enrique Nell - Language Lead
Joaquín Ferrero - Tech Lead
Spanish is the third most commonly used language on the Internet, after English and Mandarin. It is also the second most studied language in the world and, after English, the second language in international communication. Currently, there are 400 million native speakers, and Spanish is the official language in 21 countries. However, the number of contributions to CPAN from the Spanish-speaking community is much lower than these figures would lead one to expect.
Our goal is to translate the Perl core documentation into Spanish, in order to make it available to a wider public through the POD2::ES distribution. In this process, we are using and developing sustainable procedures that reuse previous translations and provide a quick update for each new Perl release.
We are requesting a grant to boost our work on POD2::ES.
The availability of translated Perl documentation will bring more Perl programmers to the community and will increase the number of CPAN contributions.
The tools and procedures developed for this project can be used to translate Perl into other languages.
The resulting materials (e.g., translation memories, glossaries, style guides) can be used as a starting point for related projects, like the translation of Perl books, Perl 6 docs and the documentation of CPAN modules into Spanish.
Increase of the percentage of translated & reviewed documents, targeting 60% of translated docs and 25% of reviewed docs (current figures are 43% translated & 3% reviewed)
Documented procedures and tools that can be reused in other projects.
At the time of this writing, the latest version of Perl 5 is Perl v5.14.2. Its documentation comprises 189 documents, with a global word count of 924,435 words. This translation volume, at a typical freelance translation rate (much cheaper than that of a translation agency), would cost well over 120,000 EUR (not counting tasks like DTP, project management, etc.), and it would take approximately 2 man-years (including revision).
Since this is volunteer work, it is not as fast-paced as would be desirable (after 16 months we have reached a translation status of more than 40% of the total documentation), but it is the best you will get while waiting for a real improvement of the available machine translation technology.
We have used Computer-Assisted Translation (i.e., translation memory) technology since the beginning of the project, with project sustainability and reusability in mind: each time a new Perl version is released, translators update the pod files and only have to work on new or changed strings. This reuse strategy ensures that the translated documentation will closely follow the English Perl documentation as it evolves. We use the Perl version numbering for each release, to state unambiguously which version of the original documents the translated documents correspond to.
After evaluating several tools, we finally decided to use OmegaT, a convenient CAT tool that is actively developed, but we also follow current industry standards (e.g. TMX, the standard translation memory format), so contributing to the project does not require using a particular tool.
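As an illustration of why a standard format keeps the project tool-independent, here is a minimal Python sketch that reads source/target pairs from a TMX document using only the standard library. The sample TMX content, the `read_tmx` function name and the language codes are assumptions made for this example; they are not taken from the project's repository.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data: a tiny TMX document with one translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Perl is a language.</seg></tuv>
      <tuv xml:lang="es"><seg>Perl es un lenguaje.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx(tmx_text, source_lang="en", target_lang="es"):
    """Return a dict mapping source segments to target segments."""
    memory = {}
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # Accept both xml:lang and the older plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language subtag ("es-ES" -> "es").
                segments[lang.split("-")[0]] = "".join(seg.itertext())
        if source_lang in segments and target_lang in segments:
            memory[segments[source_lang]] = segments[target_lang]
    return memory


memory = read_tmx(SAMPLE_TMX)
print(memory)
```

Any tool that can produce or consume this structure can exchange translations with the rest of the team, which is the point of following the standard.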
We have split the documentation into core documents on one hand, and perldeltas & readmes on the other, to give priority to the most popular documents.
The published POD2::ES distributions only include fully revised documents, which can be viewed using the following command:
perldoc -L ES 'document'
Translated (and unpublished) documents are available in the project's GitHub repository: PerlDoc-ES at GitHub.
Back in 2006, Joaquín Ferrero was involved in a previous effort that was later abandoned, like many other attempts for other languages. During YAPC::EU 2009 in Lisbon, Enrique Nell proposed relaunching the project. The authors of the present grant application met with the goal of launching a translation project of the Perl core documentation, and kept discussing the idea for some time.
The first release (5.12.3.01) of POD2::ES was published on CPAN on February 4, 2011. On July 16 of that same year, we released the first 5.14.1 version of POD2::ES, one month after the release of Perl 5.14.1, after updating the translated documents to the new Perl version. The first 5.14.2 version was released on October 6, 2011, only 10 days after the release of Perl 5.14.2. For each version upgrade, we were able to easily reuse the work done for previous versions.
Current status: 42% translated. The statistics are available in the following public Google Docs spreadsheet: PerlDoc-ES.Traducción
As long as new Perl versions are released, the project will be alive. For each new Perl version, the corresponding translation percentage will be higher (after the update process, of course).
Reaching the percentages mentioned above will take ~6 months (rough estimate).
Currently, we are 12 to 24 months behind the source (English) documentation, but two new members joined the team recently and we expect to increase the speed in the coming months.
Enrique Nell (aka zipf, aka @blasgordon) has a degree in Physics from Universidad Autónoma de Madrid, but has been working in the software localization industry since 1994. His main interests are natural language processing, data mining and statistics. He has contributed several modules to CPAN and regularly attends Perl events. Enrique translated Act into Spanish and he is the current maintainer of the Spanish translations of Padre and Kephra. He also contributed to Google Code-in 2011 as a mentor for translation tasks issued by The Perl Foundation.
Joaquín Ferrero (aka explorer) studied Software Engineering at Universidad de Valladolid. He has been using Perl since 2003, while working as a programmer in companies and public organizations. During these years he has reported bugs in several CPAN modules. He regularly attends Madrid.pm meetings. Since 2005 Joaquín has been the main moderator of the PerlenEspanol.com website, a forum that provides support to the worldwide Spanish-speaking Perl community. Back in 2006 he was a member of the second attempt at translating the Perl documentation into Spanish (the perlspanish project hosted on SourceForge, now abandoned). During YAPC::EU 2009, Joaquín joined Enrique Nell's BOF to kick off a new PerlDoc-ES project.
Manuel Gómez received his MS degree in Computer Science in 1991 from Universidad Politécnica de Madrid. After over 10 years of professional experience in Research and Development departments, he received a PhD in Computer Science in 2002 from Universidad Politécnica de Madrid. He is now an Associate Professor of Computer Science at Universidad de Granada. His research interests are probabilistic graphical models and decision analysis. Some of the journals where he has published his research papers are Computers and Operations Research, European Journal of Operational Research, Statistics & Computing, International Journal of Approximate Reasoning, Medical Decision Making, Decision Support Systems and Omega. His teaching interests include Programming Fundamentals, Simulation Systems, Data Mining and Bayesian Networks inference algorithms.
Every time someone proposes a translation grant, the same question arises. How will this translation be maintained?
It really doesn't seem useful to translate the docs once and then have *that* be the canonical doc source for Spanish users, even after they've long gone stale.
We always follow the current version.
Whenever a new version rolls out, we use the translation memory to update the documents, translate the new strings, fix the fuzzy matches, and keep working on the new version (and stop working on the previous one; i.e., we don't plan to do a complete translation of 5.14.2). This way, with each version we get a little closer to 100% translation, and eventually we will have parallel documentation in Spanish.
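The update step described above can be sketched in a few lines of Python. This is not how OmegaT implements matching; it is a simplified illustration that uses `difflib` from the standard library, with an assumed similarity threshold of 0.75 and invented sample segments. The function name `classify_segments` is hypothetical.

```python
from difflib import SequenceMatcher


def classify_segments(new_segments, memory, fuzzy_threshold=0.75):
    """Split segments of a new release into exact, fuzzy and new.

    memory maps already translated source segments to their translations.
    fuzzy_threshold is an assumed cut-off, not a value used by the project.
    """
    result = {"exact": [], "fuzzy": [], "new": []}
    for segment in new_segments:
        if segment in memory:
            # Unchanged string: the stored translation is reused as-is.
            result["exact"].append((segment, memory[segment]))
            continue
        # Look for the most similar source segment already translated.
        best_source, best_ratio = None, 0.0
        for source in memory:
            ratio = SequenceMatcher(None, segment, source).ratio()
            if ratio > best_ratio:
                best_source, best_ratio = source, ratio
        if best_ratio >= fuzzy_threshold:
            # Close match: the old translation is offered as a draft to edit.
            result["fuzzy"].append((segment, memory[best_source]))
        else:
            # Nothing similar: the translator starts from scratch.
            result["new"].append((segment, None))
    return result


# Invented sample data for illustration only.
memory = {
    "Perl is a programming language.": "Perl es un lenguaje de programación.",
    "See perlfunc for details.": "Consulte perlfunc para obtener más información.",
}
new_release = [
    "Perl is a programming language.",
    "See perlop for details.",
    "This feature is experimental.",
]
report = classify_segments(new_release, memory)
for kind, items in report.items():
    print(kind, len(items))
```

Only the fuzzy and new groups need a translator's attention, which is why each update costs far less than the initial translation.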
As chair of the Grants Committee I try not to comment or ask any questions. Nevertheless, I will abstain regarding this one (as I know Enrique), so I feel more comfortable commenting.
Is the TMX already available? Will it be available? Is there any terminology behind this? Is it available already? Some TBX file or so?
Also, looking at the repository, you are using OmegaT for the translation task. Are you developing any software that might help other languages for a similar task, or just translating as a common translator would do?
The TMX is available. In fact, we have several translation memories in the repository. The clean one contains the fully reviewed translations that are already published on CPAN; we also keep work memories from the team members in the GitHub repository, and the Padre and Kephra memories as a reference.
Regarding terminology, for the time being we've been using the clean memory to check terms that have been validated already, but we will add a glossary soon (we are working on a terminology extraction tool focused more on machine translation explorations). We discuss new terms using perlglossary.pod as our guide, so perhaps we should give priority to that document.
As for the software that might help other languages, we have already published on GitHub a couple of scripts for postprocessing and memory merge tasks (which will be improved soon), and we plan to develop a program to compare the documentation of different Perl versions (i.e., current stable and current development) that shows new documents, removed documents, documents with the most/least changes, etc., to guide us in our translation strategy. This is a long-term project and, who knows, more goodies may appear some day in our repository.
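A comparison program of the kind described above could look roughly like the following Python sketch. It is only an outline of the idea, not the planned tool: the function name `compare_doc_sets`, the in-memory dictionaries standing in for the pod directories of two Perl versions, and all sample content are assumptions made for the example.

```python
from difflib import SequenceMatcher


def compare_doc_sets(old_docs, new_docs):
    """Compare two {document name: text} mappings from two Perl versions.

    Returns the added and removed document names, plus the documents
    present in both versions ranked from most to least changed.
    """
    old_names, new_names = set(old_docs), set(new_docs)
    changed = []
    for name in sorted(old_names & new_names):
        old_lines = old_docs[name].splitlines()
        new_lines = new_docs[name].splitlines()
        # ratio() is 1.0 for identical documents, so 1 - ratio
        # works as a rough "amount of change" score.
        similarity = SequenceMatcher(None, old_lines, new_lines).ratio()
        changed.append((name, round(1.0 - similarity, 3)))
    changed.sort(key=lambda item: item[1], reverse=True)
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "changed": changed,
    }


# Invented sample data; real input would be read from the pod files.
stable = {
    "perlintro.pod": "line a\nline b\n",
    "oldtutorial.pod": "an old tutorial\n",
    "perlop.pod": "x\ny\nz\n",
}
development = {
    "perlintro.pod": "line a\nline b\n",
    "perlootut.pod": "a new tutorial\n",
    "perlop.pod": "x\ny changed\nz\n",
}
summary = compare_doc_sets(stable, development)
print(summary["added"], summary["removed"])
for name, score in summary["changed"]:
    print(name, score)
```

Ranking by change score gives a quick view of where a new release will require the most translation work, independently of the target language.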
We would like to stress the fact that all these procedures and tools can be applied to any language. So far we haven't worked on Spanish-specific tools (although we may do that as well, of course).
A small side note - perlglossary.pod got some updates recently, but the English version really could do with someone reviewing/updating and cleaning out old references.
If anyone is interested please fork:
In general, I like the idea and particularly the CAT approach, as I can see how that would help keep things up to date (assuming volunteers to respond to deltas).
However, I'm not sure a grant is the right vehicle for a project that is effectively open-ended and ongoing. If a grant is necessary now to boost the project, what happens when the grant is over? Or put differently, what will be different about the sustainability after the grant is over?
The deliverables seem fairly arbitrary. I'd be more supportive if they more clearly achieved some critical mass or milestone. What does the 18-point increase from 42% to 60% achieve? E.g., I care a lot more about the new OO docs than I do about perlhist or perldeltas.
(I do think that translating perldeltas is useful, but they are about a quarter of the pod directory in the perl source and not what I would prioritize.)
Without more clarity about what specifically is getting translated and why that will position the project for long term success without grant support, I can't support this grant.
We have been working on this project without any support for some time now, so work will not cease after eventually completing a grant.
The purpose of applying for this grant is to commit ourselves to spending more time on the project for a few months in order to gain momentum and produce a larger translation memory (i.e., more reviewed segments and approved terms for reference), and better materials (procedures, guidelines, etc.) for the team, which has also started to grow.
You can check our status spreadsheet referenced in the grant application to see which documents we are working on.
The new doc on object-oriented programming (perlootut) will be one of our top priorities as soon as Perl 5.16.0 hits CPAN (we would rather wait until a final version is released). We have already translated a couple of the old OO docs, however, so we may publish them anyway in our 5.14 track.
As we mention in our application, "We have split the documentation into core documents on one hand, and perldeltas & readmes on the other, to give priority to the most popular documents."
Yes, this is an open-ended project. As we all know, Perl 5 is here to stay and Perl 6 is growing healthy and strong, so our goal is to develop a framework to create and maintain localized versions as Perl evolves.