2012Q2: Grant Proposal: Perl module for Linked Data

Thu, 03-May-2012 by Alberto Simões

<dl> <dt id="Name:">Name:</dt> <dd> Tope Omitola. Kjetil Kjernsmo will assist Tope with Perl and CPAN. </dd> <dt id="Amount-Requested:">Amount Requested:</dt> <dd> How much is your project worth? $500 </dd> </dl> <h2 id="Synopsis">Synopsis</h2> The Semantic Web community has developed a vocabulary named VoID for describing RDF datasets. It is published as a World Wide Web Consortium Interest Group Note and is a de facto standard: <a href="http://www.w3.org/TR/void/">http://www.w3.org/TR/void/</a>. The goal of this project is to generate such descriptions, partly automatically and partly from hand-maintained statements, using <a>RDF::Trine</a>. <h2 id="Benefits-to-the-Perl-Community">Benefits to the Perl Community</h2> There is an active community around Semantic Web technologies in Perl. This community believes that generating VoID descriptions is a very important undertaking, as it is an important part of the Linked Data technology stack and should be deployable in Linked Data services. A module such as this will be highly useful for users of Perl in web development, and is likely to drive new users to Perl as it becomes one of the few viable alternatives for deploying a comprehensive Linked Data service. Despite its importance, the Perl+RDF community has not found the manpower to write this module. The community therefore welcomes the assistance of Tope Omitola, a researcher in this field who has been involved, with other researchers from the University of Southampton, in creating similar modules for PHP. The module will be supported by the Perl+RDF community (see <a href="http://www.perlrdf.org/">http://www.perlrdf.org/</a>) and integrated with <a>RDF::LinkedData</a>. The VoID descriptions themselves can be exposed by the existing modules <a>RDF::LinkedData</a> and <a>RDF::Endpoint</a> or by other systems that expose RDF. Dataset publishers can use VoID descriptions for dataset maintenance, administration, and hosting. 
Clients can use VoID descriptions to discover, query, crawl, and index datasets, navigate them, get an idea of the type of data available, and optimize queries on them. <h2 id="Deliverables">Deliverables</h2> A module named <code>RDF::Generator::Void</code>. The module has been started on GitHub <a href="https://github.com/tope/RDF-Generator-Void">https://github.com/tope/RDF-Generator-Void</a> by the primary proposer and has received patches from two members of the Perl+RDF community. This module will be uploaded to CPAN. <h2 id="Project-Details">Project Details</h2> With the rise in the usage and deployment of the Semantic Web, especially of Linked Data, in organisations, industries, and governments, a service is needed that can automatically construct service-level descriptions of these Linked Data modules. This service should also help Linked Data developers and maintainers to manually add further data about these services. This project aims to build a Perl module <code>RDF::Generator::Void</code> that can be used to set up such services, and Perl+RDF community members are already committed to integrating this module with existing modules. Using <a>RDF::Trine</a>, the module will automatically generate the following for a dataset or a SPARQL endpoint: <dl> <dt id="void:triples">void:triples</dt> <dd> The total number of triples contained in the dataset. </dd> <dt id="void:entities">void:entities</dt> <dd> The total number of entities that are described in the dataset. </dd> <dt id="void:classes">void:classes</dt> <dd> The total number of distinct classes in the dataset. </dd> <dt id="void:properties">void:properties</dt> <dd> The total number of distinct properties in the dataset. </dd> <dt id="void:distinctSubjects">void:distinctSubjects</dt> <dd> The total number of distinct subjects in the dataset. </dd> <dt id="void:distinctObjects">void:distinctObjects</dt> <dd> The total number of distinct objects in the dataset. 
</dd> <dt id="void:documents-void:subset-void:Linkset">void:documents, void:subset, void:Linkset</dt> <dd> Express the set of foreign links in a dataset. </dd> <dt id="void:feature">void:feature</dt> <dd> Used for expressing certain technical features of a dataset, such as its supported RDF serialization formats. </dd> <dt id="void:sparqlEndpoint-void:dataDump-void:exampleResource">void:sparqlEndpoint, void:dataDump, void:exampleResource</dt> <dd> Further descriptions of the dataset. </dd> </dl> A full VoID description also contains other properties that in most cases must be hand-maintained, that is, in a practical application they are added from a file or through configuration. We will not enumerate these properties, but the module must be able to accept such statements. Furthermore, the module must have methods to prompt a regeneration of the description: both a forced update and an update that first checks whether the data has changed. <h2 id="Inch-stones">Inch-stones</h2> <ul> <li>Learn sufficient Perl </li> <li>Get an overview of the API provided by <a>RDF::Trine</a> </li> <li>Get an idea of what can be done with <a>Any::Moose</a> </li> <li>Write a set of test suites </li> <li>Create a constructor that can take an <a>RDF::Trine::Model</a> as the basis for computing the description and a model to add the description to. </li> <li>Add a method for adding hand-maintained statements. </li> <li>Add a method to return an <a>RDF::Trine::Model</a> with the description. </li> <li>Create test data </li> <li>Write more tests </li> <li>Create the code to generate the description based on triple counts. </li> <li>Create a method to unconditionally regenerate the description. </li> <li>Create a method to regenerate the description only if the model's etag has changed. </li> <li>Package the module for CPAN distribution </li> </ul> <h2 id="Project-Schedule">Project Schedule</h2> How long will the project take? When can you begin work? It will take 2 months. 
Work will begin in mid-May (16 May 2012). Work will happen between other projects, so the active time spent is much shorter. Learning to use Perl and auxiliary tools is expected to take two weeks. Then, the constructor and initial methods are expected to take one week. Writing tests will then take one week. Generation of the description is the main effort and can take up to two weeks. Finally, wrapping up and reviewing the code is expected to take two weeks before release. <h2 id="Completeness-Criteria">Completeness Criteria</h2> The module will be released to CPAN, either by the proposer or by other members of the Perl+RDF community. Completeness will be judged by the ability of the module to fulfil the goals stated in the detailed project description and by whether it passes the test suite developed for it. <h2 id="Bio">Bio</h2> Who are you? What makes you the best person to work on this project? Tope Omitola: Research Fellow at the University of Southampton. Experienced Semantic Web / Linked Data developer, experienced in developing Semantic Web discovery services. Amongst his research interests is provenance tracking of Linked Data, which involves an extension of VoID called voidp, of which he is the primary author. Has previously been involved with writing a PHP module similar to the one proposed in this project. He is new to Perl. Kjetil Kjernsmo: Ph.D. Research Fellow at the University of Oslo. Has 16 years of Perl experience and several modules on CPAN. Is an active member of the Perl+RDF community and organized the first International Semantic Web with Perl hackathon in London in March 2011. Active member of Dahut.pm and deputy board member of Oslo.pm. Will help Tope get up to speed with Perl and help with packaging and the test suite.
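
To make the automatically generated statistics listed under Project Details more concrete, here is a minimal sketch in core Perl. It is an illustration only and not the proposed module's API: the helper name void_statistics, the in-memory triple layout, and the example.org URIs are all assumptions made for this example, and no RDF::Trine calls are used. It counts a class as any distinct object of an rdf:type triple, and it leaves out void:entities, which depends on how a dataset defines its entities.

```perl
use strict;
use warnings;

# Full URI of rdf:type, used to recognise class membership triples.
my $RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

# Toy dataset: each triple is [subject, predicate, object].
# The example.org URIs are made up for illustration.
my @triples = (
    [ 'http://example.org/alice', $RDF_TYPE, 'http://xmlns.com/foaf/0.1/Person' ],
    [ 'http://example.org/alice', 'http://xmlns.com/foaf/0.1/name', '"Alice"' ],
    [ 'http://example.org/alice', 'http://xmlns.com/foaf/0.1/knows', 'http://example.org/bob' ],
    [ 'http://example.org/bob', $RDF_TYPE, 'http://xmlns.com/foaf/0.1/Person' ],
    [ 'http://example.org/bob', 'http://xmlns.com/foaf/0.1/name', '"Bob"' ],
);

# Hypothetical helper, not part of RDF::Trine or of the proposed module.
# Takes a list of triples and returns a hash of VoID term => count.
sub void_statistics {
    my @data = @_;
    my (%subjects, %properties, %objects, %classes);
    for my $triple (@data) {
        my ($s, $p, $o) = @$triple;
        # Hash keys give us distinct values for free.
        $subjects{$s}   = 1;
        $properties{$p} = 1;
        $objects{$o}    = 1;
        # A class is any resource used as the object of an rdf:type triple.
        $classes{$o} = 1 if $p eq $RDF_TYPE;
    }
    return (
        'void:triples'          => scalar(@data),
        'void:classes'          => scalar(keys %classes),
        'void:properties'       => scalar(keys %properties),
        'void:distinctSubjects' => scalar(keys %subjects),
        'void:distinctObjects'  => scalar(keys %objects),
    );
}

# Print one statistic per line, sorted by term name.
my %stats = void_statistics(@triples);
printf "%-22s %d\n", $_, $stats{$_} for sort keys %stats;
```

For the five toy triples this reports 5 triples, 1 class, 3 properties, 2 distinct subjects and 4 distinct objects. A real implementation would presumably run equivalent counts against an RDF::Trine model or a SPARQL endpoint rather than an in-memory array.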

Category: Grants

Comments (2)

[ 1067 ] | Tue, 08-May-2012 by davidgolden

I'm glad to hear that there is an active Perl community for Semantic Web technologies and I hope that Perl continues to be at the forefront.

That said, I see two yellow flags with this grant request.

First up are the opening inchstones ('Learn sufficient Perl' .. 'Get an idea of what can be done with Any::Moose'). That isn't what I think the grant fund should be used for -- that should be foundational pre-work before applying for a grant.

The second yellow flag is that an "active" community can't find the manpower to accomplish the inch stones independently and feels a very modest grant will make the difference. If this is important and useful and in demand, I don't understand why it hasn't already been written, particularly if there is already a PHP module to use as a basis for the work.

[ 1068 ] | Mon, 21-May-2012 by kjetilkjernsmo

David,

Thanks for the important comments. Basically, Semantic Web is by many seen as an academic endeavor, and to some extent it is, as the community here is driven by academics in their spare time. There's not a lot of users beyond ourselves at this time, even though many aspects of Semantic Web are taking off in the Enterprise. Enterprisey stuff doesn't make the Semantic Web, however; what we really want is wide deployment amongst a large number of developers. The way I see it, completing a usable Perl stack is a key factor in making the Semantic Web take off outside of academia and the Enterprise. So, there's only so much we can do as academics: this is not research, it is spare time work. Doing this is pretty straightforward and could be done in a few days; the trouble is that none of us has a few days to spare.

That's why every hand matters, and that's why we are so happy Tope stepped up to do this bit. The Perl he needs to learn is not a lot, so it is not a risk factor. I think of it as a good opportunity to recruit another good person to our community. I hope TPF can help us do that.
