Wed, 26-Apr-2023 by
We have another grant application from [John Napiorkowski](https://metacpan.org/author/JJNAPIORK), whom you may recall from his work on the Perl bindings for TensorFlow, and the inimitable [Will Braswell](https://metacpan.org/author/WBRASWELL), who was also involved in TensorFlow for Perl, amongst many other projects integral to Perl and RPerl. This time they have paired up to create an AI that speaks Perl: PerlGPT.
John Napiorkowski & Will Braswell
The budget for this project is $8,800 USD.
This grant proposal is for *phase 1* only of the development of PerlGPT, a large language model (LLM) comparable to ChatGPT (GPT-3.5/GPT-4) or Stanford Alpaca, and trained only on Perl-related content.
PerlGPT will be based on Meta's LLaMA language models, with all new components implemented in Perl where possible and released as 100% free-and-open-source software (FOSS), unlike ChatGPT and other proprietary LLM systems.
Phase 1 consists of training a 7B-parameter language model using Perl-related request/response pairs curated from Perl Monks, Stack Overflow, GitLab, GitHub, and other public Perl-specific data sources. Phase 1 will deliver an LLM capable of generating pure-Perl source code in collaboration with a Perl programmer.
Optionally, MetaCPAN may be upgraded to include a live running instance of the phase 1 PerlGPT LLM. In addition to the traditional keyword search query producing a list of CPAN distributions, PerlGPT will enable MetaCPAN to also accept free-form search queries and plain-English written questions, each of which then spawns a new interactive chat session offering specific Perl module suggestions and custom source code examples.
Each phase includes additional request/response training pairs encompassing the modern best practices related to that phase's specific tasks.
* Phase 1: 7B model; train on Perl; generate new CPAN dists
* Phase 2: 13B model; train on XS, C, C++, FFI::Platypus; generate new optimized CPAN dists
* Phase 3: 33B model; train on software development; generate refactored existing CPAN dists
* Phase 4: 65B model; train on computer science & Perl internals; generate refactored interpreter, new language features
**Benefits to the Perl Community**
Benefits for each of the 4 project phases are listed below; please be aware this grant proposal is for phase 1 development ONLY.
*Phase 1* implements PerlGPT v1.0 and benefits the Perl community by enabling the creation of new pure-Perl libraries and applications on CPAN. PerlGPT v1.0 will be trained on pure-Perl source code examples and high-quality POD documentation from CPAN, GitLab, GitHub, and Bitbucket. All versions of PerlGPT will be further trained on plain-English technical discussions pertaining to their respective feature sets, gathered from Perl Monks and Stack Overflow. For example, a programmer may want to create a new Perl API for a third-party web platform such as the Amazon cloud. The programmer can write a plain-English description of the desired API features and functionality for accessing the Amazon cloud. They can also specify design decisions, such as whether or not to use an MVC framework like Catalyst or Mojolicious, and they can even start stubbing out some Perl classes and subroutines, with comments marking where source code should be added. PerlGPT v1.0 will work with the programmer to iteratively implement the desired Amazon cloud API in pure Perl, including a full-coverage test suite and POD documentation. Once the API is working well enough for public release, PerlGPT v1.0 can even help the programmer run the correct Dist::Zilla commands to build the distribution and upload it to CPAN. Finally, many new independent Perl projects can be created with access to the Amazon cloud, thanks to the Perl API created and uploaded to CPAN with the help of PerlGPT v1.0! The same benefits apply to any other non-Amazon API that somebody may want to create in Perl, or to any pure-Perl library or application a programmer can think up. The sky is the limit! PerlGPT v1.0 aims to dramatically increase the effectiveness and efficiency of creating new pure-Perl software.
Optionally, MetaCPAN may be upgraded to include PerlGPT v1.0, which will benefit the Perl community by allowing anyone on the Internet to interactively generate Perl source code as described above, drawing on the high-quality CPAN POD that PerlGPT v1.0 is trained on. For example, a user with little or no previous Perl experience may visit MetaCPAN and type a query asking "how can I swap all the rows and columns in an Excel spreadsheet?" The MetaCPAN website launches an interactive chat session, in which the PerlGPT LLM starts by selecting one of the most popular and stable CPAN distributions for Excel spreadsheet manipulation, then quickly writes a well-commented prototype application as the user watches in real time. PerlGPT also provides a plain-English explanation of the generated source code, and continues to chat with the user if they have any further questions or want the source code changed in any way. PerlGPT offers to run the custom Perl source code on the user's platform of choice, including PerlBanjo, WebPerl, or Perl on a cloud system such as Amazon. PerlGPT also offers to help the user download and install their own local copy of Perl, if they so choose. In a matter of just a few minutes, a new Perl user has succeeded in implementing their first custom Perl application and solving their own real-world problem! This use of PerlGPT on MetaCPAN can serve both as a recruiting tool for new-to-Perl developers and as a retention tool for experienced Perl developers who want to leverage the power of CPAN to the fullest extent possible.
*Phase 2* implements PerlGPT v2.0 and will benefit the Perl community by enabling the creation of new optimized Perl libraries and applications on CPAN. PerlGPT v2.0 is trained on source code and documentation related to XS, C, C++, and FFI::Platypus. For example, a programmer may want to create a Perl video game, but pure Perl is not fast or efficient enough for the shading and texturing subroutines to run during gameplay. Similar to the PerlGPT v1.0 workflow, the programmer writes a plain-English description of exactly how they want the video game to be implemented, including the use of specific graphics libraries, etc. Once the entire project is written and running (slowly) in pure Perl, the programmer and PerlGPT v2.0 can work together to identify the specific shading and texturing subroutines that need to be rewritten in XS and/or C or C++. The programmer tells PerlGPT v2.0 about any design preferences, such as whether or not to use FFI::Platypus. PerlGPT generates the new performance-optimized shading and texturing subroutines as directed, and runs the existing test suite to ensure the optimized code is functionally equivalent to the unoptimized pure-Perl code. Once the optimized code is working as desired, the video game can be released onto CPAN, as with PerlGPT v1.0. The same benefits apply to any Perl project that values performance optimization, such as scientific algorithms, machine learning, video rendering, etc.
*Phase 3* implements PerlGPT v3.0 and will benefit the Perl community by enabling the refactoring and upgrading of existing CPAN distributions based on software engineering principles and modern best practices. PerlGPT v3.0 is trained on educational texts related to software development principles and modern Perl best practices. For example, a programmer may want to upgrade one of their existing CPAN distributions to use the Amazon cloud API described in phase 1 above. To achieve this, several outdated or obsolete Perl software components will need to be either removed or substantially rewritten. Similar to the PerlGPT v1.0 and v2.0 workflows, the programmer writes a plain-English description of precisely how they want their code to be upgraded, starting with the desired new behavior: which Amazon cloud features should be used, and in what ways. The programmer goes on to specify which Amazon cloud API calls should be used in which subroutines, wherever possible, to help PerlGPT v3.0 achieve the upgrade and refactoring goals most effectively. They can even add comments to their existing Perl source code, directing PerlGPT v3.0 to focus on certain software components or design strategies. PerlGPT generates the requested upgrades, including all appropriate changes to the existing test suite to reflect the new Amazon cloud features. The upgraded test suite is then executed to ensure that both the new features and the remaining old features work correctly. Once the upgraded code is working as desired, the new Amazon-enabled Perl distribution can be released onto CPAN. The same benefits apply to any Perl project in need of refactoring or upgrading, such as converting old Perl code to use the new Corinna object-oriented framework, or adopting somebody else's abandoned CPAN distribution and fixing a backlog of bug reports.
*Phase 4* implements PerlGPT v4.0 and will benefit the Perl community by enabling the Perl Steering Council (PSC) and the Perl 5 Porters (P5P) to introduce major new features and upgrades into the Perl interpreter itself. PerlGPT v4.0 is trained on Perl internals documentation, P5P technical discussions, and the Perl interpreter's own source code, written in C89 and C macros. For example, the PSC may ask a P5P developer to refactor the Perl interpreter's threading subsystem, thereby introducing new Perl language features for parallel and asynchronous programming. The PSC requests that these changes be made without breaking the Perl interpreter's long-venerated backward compatibility with existing Perl software. To achieve this, several other subsystems of the Perl interpreter will be affected and will need to be upgraded accordingly. Similar to the workflows of PerlGPT v3.0 and earlier, the P5P developer writes a plain-English description of the desired new threading behavior, along with source code samples showing the new threading keywords or other syntax. The developer can also add comments to the Perl interpreter source code, indicating refactoring design decisions or stubbing out new threading behaviors. PerlGPT v4.0 generates the requested upgrades to the Perl interpreter, including modifications to both the plain C source code and the C macro source code. PerlGPT v4.0 also adds new tests to the Perl interpreter test suite. PerlGPT does not modify existing tests, in order to maintain backward compatibility with previous versions of the Perl interpreter and existing Perl software. The upgraded test suite is then executed to ensure the new threading features work correctly and all previously existing Perl features are unchanged. Once complete, the new Perl interpreter may be released as a testing version or prototype for the PSC and P5P to review, and eventually released as a stable Perl version.
The same benefits apply to any change or upgrade to the Perl interpreter, such as expanding the new Corinna OO framework, upgrading 'use constant' to allow arrays & hashes, or adding native exceptions.
1. An implementation of the PerlGPT v1.0 large language model based on the LLaMA language model, configured and built using Dist::Zilla.
2. A comprehensive Perl test suite with automatically provable coverage of 100% of the PerlGPT v1.0 LLM, using Test2 from CPAN.
3. A carefully written, explanatory collection of documentation covering 100% of the PerlGPT v1.0 LLM, using standard POD fully compatible with CPAN.
4. A small collection of user-friendly example Perl applications, using PerlGPT v1.0 LLM components to effectively showcase this project.
5. A public GitLab repository with all source code and components of the PerlGPT v1.0 LLM, including unstable or experimental components.
6. A public CPAN distribution with all stable source code and components of the PerlGPT v1.0 LLM.
7. A public DockerHub (or equivalent) repository with all stable source code and components of the PerlGPT v1.0 LLM, along with all dependencies, ready to run out-of-the-box.
The PerlGPT v1.0 LLM does NOT yet support anything other than pure Perl source code. Support for other languages, such as XS or C, will be addressed in future grant proposals.
This grant proposal specifically does NOT include PerlGPT phase 2 or beyond, such as XS or C or Perl internals, which are far beyond the scope of a single grant and will be addressed in future proposals.
We will generate the PerlGPT language model by training a LLaMA foundation language model. This training will use a combination of manually curated and automatically selected request/response pairs, collected from public websites and data sources. We will not use any proprietary data, or request/response training sets taken from other proprietary language models such as OpenAI's ChatGPT.
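The proposal does not specify how the curated request/response pairs will be stored. As a minimal sketch only, assuming JSON Lines output with the `instruction`/`input`/`output` keys used by Stanford Alpaca's training data, a curation step in Perl might look like the following. The subroutine names `to_training_record`, `select_pairs` and `to_jsonl`, and the input hash keys `request`, `response` and `context`, are hypothetical; the only module used is JSON::PP, which ships with Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: map one curated request/response pair onto an
# Alpaca-style record with instruction/input/output keys.
sub to_training_record {
    my ($pair) = @_;
    return {
        instruction => $pair->{request},
        input       => $pair->{context} // '',    # optional extra context
        output      => $pair->{response},
    };
}

# Hypothetical "automatic selection" step: drop pairs whose request or
# response is missing or blank, and drop exact duplicates.
sub select_pairs {
    my (@pairs) = @_;
    my ( %seen, @kept );
    for my $pair (@pairs) {
        next unless defined $pair->{request}  && $pair->{request}  =~ /\S/;
        next unless defined $pair->{response} && $pair->{response} =~ /\S/;
        my $key = join "\0", $pair->{request}, $pair->{response};
        next if $seen{$key}++;
        push @kept, $pair;
    }
    return @kept;
}

# Encode the selected pairs as JSON Lines: one compact JSON object per line,
# with sorted keys so the output is reproducible between runs.
sub to_jsonl {
    my (@pairs) = @_;
    my $json = JSON::PP->new->utf8->canonical;
    return join '', map { $json->encode( to_training_record($_) ) . "\n" }
        select_pairs(@pairs);
}

# Small demo when run directly as a script.
unless (caller) {
    print to_jsonl(
        {   request  => 'How do I read a file line by line in Perl?',
            response => q{open my $fh, '<', $path or die $!; while (my $line = <$fh>) { print $line }},
        },
    );
}
```

Each line of the resulting file is one training example. Whether PerlGPT actually uses this layout, or a different prompt template for LLaMA fine-tuning, is left open by the proposal; the deduplication filter here merely stands in for whatever automatic selection the applicants choose.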
Most of the technical details of how to train the PerlGPT language model can be found in the following papers:
[Training Language Models to Follow Instructions with Human Feedback, 2022-03-04](https://arxiv.org/pdf/2203.02155.pdf)
[Teaching Large Language Models to Self-Debug, 2023-04-11](https://arxiv.org/pdf/2304.05128.pdf)
[LLaMA: Open and Efficient Foundation Language Models, 2023-02-27](https://arxiv.org/pdf/2302.13971.pdf)
Total development time is estimated at 3 to 6 months, with the normal disclaimer about the difficulty of predicting software project durations.
During the first work cycle of approximately 1 to 2 months, curate and implement the initial PerlGPT v1.0 training data set.
During the second work cycle, run the LLM training procedure and implement the Perl test suite.
During the third work cycle, write the Perl documentation and implement the Perl example applications.
If a fourth work cycle is required, continue until the public releases on CPAN and DockerHub are complete.
This grant is deemed complete when all the above-listed deliverables are reviewed and accepted by the official TPF-assigned grant manager.
We are both professional CPAN authors ([John Napiorkowski](https://metacpan.org/author/JJNAPIORK) and [Will Braswell](https://metacpan.org/author/WBRASWELL)), with a current total of 106 CPAN distributions between the two of us.
I would be very interested to see this project take off. Googling information will only get you so far; having decades' worth of knowledge and experience from various online resources accessible through standard language would be invaluable. Plain-English searches that can understand intent and context, as well as the memory to "bounce ideas" and concepts off of in a back-and-forth dialog, would be a huge boon to any programmer. The same tools would be very useful in housekeeping efforts for Perl repositories. This seems like the logical evolution for the Perl language, given its roots in natural language constructs. I’m very excited to see where this goes.
I too would be interested in seeing this develop. I'm just starting to play around with these tools, and having something in Perl would be very friendly.
I think this would be a good project, and it makes us relevant to new things.
I support this project being funded.
I believe this will be very useful when I think of all the times I'm searching for an answer to a problem and finding it difficult to condense a complex query down to a small set of carefully crafted keywords. With typical searches, providing more detail as keywords usually just results in a high number of useless matches.
A system that could understand my intent from a plain English description would be extremely valuable, along with the further dialog to refine and guide me to the desired solution.
This would be an amazing project! It would be extremely useful to speed up Perl development dramatically, and it would help move Perl back to the forefront. This would make Perl more accessible to new users, and really turbocharge our toolchain. The GPT technology is a fresh opportunity for Perl to leap forward in usefulness and awareness.
This is a super interesting project that envisions an application with a greater scope than the famous ChatGPT. It is interesting not only from the point of view of usability but also from the point of view of learning for amateur programmers like me, who basically learn by replicating the code of available free-software applications. The deciding point for me is the fact that the project adopts the free software philosophy. I am already looking forward to seeing the evolution of this great project.
It is exciting to be able to have a ChatGPT-like tool for Perl, and I could use this in my daily work to help get work done quicker.
This proves perl never died!
This is greatly needed in the Perl community. I would love to see this come to fruition! Please, let's make this a reality.
This is a critically important genre for Perl to be dabbling in!
Great project!! Hope it gets funded! Looking forward to seeing it working
It's very unclear whether training an AI on copyleft code (GPL, for example) is permissible. AGPL is almost certainly not.
As such, I would be more specific about which licenses are being used when training the AI.
This is such an important project for Perl. Well done, both applicants, for putting together the Grant Application and very much hope you achieve the necessary funding!
It is important to complete this kind of project in Perl. I cannot help much, but maybe I will contribute a little toward the necessary funding.
I would love to see funding for this!
This would be an amazing project! I believe this will be very useful and also would make Perl more accessible to new users!!!
I agree with the general idea for this and think a usable code-writing module is worth the $8k. I think it’d be more interesting in the form of an LSP that reads PODs to implement code, rather than a plain chatbot. I also question any OSS tie-in with AWS, versus basing an interface on Docker or even QEMU (which seems to be popular on OS X). In the end, though, I suppose whatever GitHub Action can be used to test the code would be a good enough implementation for an initial stage.
I'd like to see this project move forward. I've only poked at ChatGPT a couple of times for programming questions and found it useful. I would love to see the results from a GPT project with a strong focus on Perl. I would be grateful if the project could augment and speed up areas from coding to documentation, and help identify modules that already do something I might currently be developing for my business or personal use.
This is huge!!!
Awesome project! It could be very beneficial for the Perl community and I'd like to see it funded.
This sounds awesome. I hope you guys get the grant!
Hope to see this project up soon, and excited to see how this will benefit Perl overall.
Although I am a newbie to Perl programming, I always keep an eye on the latest developments around Perl, since I love Perl so much. I have used ChatGPT a few times (not for programming), and am waiting to see what Perl has to offer in the future 🙂.
That's an amazing idea! Perl is such a good language to work with text, we need more tools to interact with large language models.
As an old-school bioinformatics programmer who has always loved using Perl, I'm incredibly excited about the prospect of PerlGPT. The fact that it will be based on Meta's LLaMa language models and will be 100% free-and-open-source software (FOSS) is truly impressive and demonstrates the power of collaboration.
I think Phase 1 of the project, which involves training a 7B-parameter language model using Perl-related request/response pairs from public Perl-specific data sources, is a fantastic start. I can't wait to see how PerlGPT will generate pure-Perl source code in collaboration with Perl programmers.
The possibility of upgrading MetaCPAN to include a live running instance of the phase 1 PerlGPT LLM is simply amazing. As someone who has spent countless hours searching through CPAN for the perfect module, the idea of using free-form search queries and plain-English written questions to spawn an interactive chat session with specific Perl module suggestions and custom source code examples is truly revolutionary.
Overall, I'm incredibly excited about this proposal and can't wait to see how PerlGPT will transform the Perl ecosystem. The future looks bright for Perl and bioinformatics, and I'm honored to be a part of such an innovative and collaborative community.
Looking forward to getting this funded.
I am happy to hear that the PerlGPT project is being developed for Perl programmers whose projects run on Perl.
I support this proposal because Perl needs to make inroads into the machine learning and AI space. The proposers appear to have excellent credentials and history for tackling this task. It would be nice to see this be only the first of many such projects leveraging Perl.
Expanding into AI is obviously a good utility for Perl. The integration into MetaCPAN would certainly help in searching for existing modules, helping new-to-Perl programmers be more productive. Both these points increase the attractiveness of Perl.
Sounds like a very useful project to me. I've asked Perl-programming questions of ChatGPT, but its responses are far from optimal, as it wasn't really trained in Perl programming. Having PerlGPT would be a boon for those seeking to avoid "reinventing the wheel" by finding what existing Perl modules are truly applicable to their problem.
That's an amazing idea! This project could be a great way to explore the capabilities of Perl in working with large language models. Looking forward to getting this funded.
This project is incredibly intriguing as it has a much wider scope than the well-known chat-gpt. It is not only valuable in terms of its practical use but also from an educational perspective for beginners.
For someone who has spent countless hours searching through CPAN for the perfect module, this feature would be a game-changer.
I hope this project will be completed soon.
Anything that can help me implement Perl faster.
I find this project especially valuable for intermediate Perl developers like myself who commonly use other programming languages.
Perl has proven to be a great supporting tool for my work. However, I can’t spend as much time exploring Perl in-depth as I wish.
I look forward to this project as a great learning opportunity.
This is a great idea! This could help introduce people to Perl in a way that they (maybe) wouldn’t have otherwise been. ++
This is a brilliant idea! Keeping perl on the cutting edge. John Napiorkowski & Will Braswell are the correct people to make this happen.
I would love to see this implemented in the Perl community! This could help introduce people to Perl in a way that they (maybe) wouldn’t have otherwise been. +++
This seems to be an amazing project! Perl is a beautiful language, and this project will make Perl more interesting than ever before. Hope to see it get funded.
I question whether all the comments here are from genuine (non-bot) people, given they are all very similar, some word-for-word identical to others.
well that's a rather tacky thing to post on someone's grant application...
As a budding Perl developer, I find the idea of the PerlGPT to be an invaluable asset for the community. The potential of having such a model like GPT-4 adapted for Perl is both exciting and intriguing. I strongly believe that this tool can greatly improve the learning curve for beginners, and at the same time, provide seasoned developers with a smart assistant for their tasks. I wholeheartedly support this initiative and am looking forward to its development. Best of luck with the proposal!
Kindly grant as early as possible. We are waiting!!!!!
I am so excited about the promise of dramatically increasing the effectiveness and efficiency of creating new pure-Perl software. Truth be told, the "TIMTOWTDI" principle sometimes, if not oftentimes, frustrates me, because I have to spend time weighing whether one way is better than another. It seems to me that I can offload the bulk of the thinking or weighing part to PerlGPT.
The suggestion of MetaCPAN being upgraded to PerlGPT is also promising. With a myriad of modules that often have overlapping functionalities but come with their own nuances about their respective strengths and weaknesses, it looks like having GPT-like functionality for it would immediately help narrow down the selection of which modules we should be choosing for our respective
The combination of AI and Perl holds immense potential for advancing technology and driving innovation. I think it will be a great asset for the Perl community, and I am really excited to see this.