Grant Proposal: Raku Ecosystem

Thu, 30-Jul-2020 by Jason A. Crome

## Raku Ecosystem

- Name: tony o'dell
- Amount Requested: $12,000

## Synopsis

Redesign the raku/zef ecosystem to be robust and to make distribution submission easier for the raku ecosystem.

## Benefits to the Raku Community

Currently there are two ways to maintain the ecosystem in raku. The first is uploading to cpan, which comes with its own set of limitations because cpan was not designed to handle the way raku uses distributions (the same distribution name can be used multiple times and pared down by the consumer by using :auth and/or :ver). The other is a github repo containing a file that the distribution authors must update in order to keep the ecosystem fresh, which comes with its own set of challenges and barriers to entry for the user.

This project would create an ecosystem that is friendly to the end user, provides secure access and storage to distribution consumers, and promotes the development of distributions by raku distribution authors.

## Deliverables

- Fault tolerant ecosystem
- Provide hooks for author tooling

## Project Details

The primary deliverable of this project would be a fault tolerant ecosystem that is consumable both via zef (so, a zef plugin) and via a website similar to metacpan for browsing and finding packages. It also includes guidelines for distribution authors and tooling for them to test the quality of their uploads.

The secondary deliverable would be to create an expandable API for further development (testing, quality checks, health checks, etc). It is limited in scope to the design accommodation and not the implementation of what the hooks will or could be used for.

The operating costs of this project will be paid for by donations from the community or out of my pocket. Donations exceeding operating costs will be used to further develop tooling and expand on the secondary deliverable.
The domain suggested for this project is zef.pm, but this is not set in stone and this proposal is open to using a different domain.

## Project Schedule

The primary deliverable of this project is reckoned to take about two months to complete. This includes the administration and automation of server maintenance, storage space, and writing the necessary code to store/save/retrieve these distributions both from a zef plugin (or another CLI) and from the WWW.

The secondary deliverable is likely to take closer to a month to design and build.

## Bio

I am Tony O'Dell. I have written a good number of raku modules and have been writing in perl for about 15 years. I have sys admin experience and am the initial committer and co-author of zef. Prior to writing software as a full time job, my primary area of expertise was in data warehouse management and design, and statistics.

# Addendum

The ecosystem in the context of this grant means a repository where packages can be downloaded/consumed by raku package managers and processes. The structure of the repository will be similar to CPAN, maintaining the ability to be mirrored, with modifications to handle the added complexity of `:ver`, `:auth`, `:api` in raku.

The API in the context of this grant is meant to enable some process that allows authors to write to the repository and upload their dists (more analogous to PAUSE). The functions of the API will be around user creation and uploading distributions to the ecosystem.

This project is in no way meant to replace modules.raku.org or metacpan. It will make modules.raku.org's job much easier in displaying/searching for modules.

## Milestones

1. Design and architecture of dist storage on servers, permissions, indexing, and mirroring
2. Build out for distribution with a `sandbox` environment and tests
3. Writing tooling around the API to manage users and allowing users to manage their dists available in the repository.
   This milestone includes the zef plugin mentioned below:
   1. Registering a user
   2. Manage user information including user deletion (GDPR)
   3. Managing packages
   4. Uploading via zef plugin

## Reasons for not mending what's there

- CPAN
  1. .. isn't designed to handle multiple modules of the same name, something that very easily happens in Raku
  2. .. indexing by name doesn't allow you to find the right `ACME-Test-0.0.0.tar` module when 35 different authors have tried to upload `ACME`
  3. The amount of code surrounding mending is the same regardless of "rewrite" or "mend"; divorcing the two makes sense because the specs and needs are different.
- Github
  1. .. the repo with the index in it is a list of github repos that may or may not exist
  2. .. this is clunky and is a poor experience for both the consumers and authors
  3. .. in order for this to be fully indexed you need something running to maintain a translation from repo name to dist name
  4. The level of effort to make this a pleasant experience for both consumers and authors is large. It involves a somewhat fragile architecture of indexing services; retrieving repositories for display on things like modules.raku.org has a lot of overhead; and searching for modules in a package manager is tedious, through no fault of the package manager or the modules.raku.org design.

## Out of Scope

This is a list of things that are out of scope for this grant.

### Search Engine

This grant is not meant to produce a search engine. It is not meant to replace metacpan or modules.raku.org. The ecosystem file structure will be indexed and searchable by any front end or package manager.

### Package Management

Zef will persist and tooling will be built around enabling this gateway through the zef cli as part of milestone 3.
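To make the `:auth`/`:ver` requirement concrete, here is a minimal sketch of how a consumer might pare down same-named distributions from a flat index, and how a mirror-friendly path could encode the extra fields. Nothing here comes from the proposal's actual design: the `Dist` record, the `dists/<name>/<auth>/<ver>/` layout, the purely numeric version comparison, and the author names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple
from urllib.parse import quote


@dataclass(frozen=True)
class Dist:
    """One uploaded distribution (hypothetical index record)."""
    name: str
    auth: str
    ver: str


def ver_key(ver: str) -> Tuple[int, ...]:
    # Assumption: versions are dotted integers such as "0.1.0".
    return tuple(int(part) for part in ver.split("."))


def dist_path(dist: Dist) -> str:
    # Hypothetical directory layout; the auth is percent-encoded so that
    # values like "zef:alice" are safe as a single path segment.
    auth = quote(dist.auth, safe="")
    return f"dists/{dist.name}/{auth}/{dist.ver}/{dist.name}-{dist.ver}.tar.gz"


def resolve(index: Iterable[Dist], name: str,
            auth: Optional[str] = None,
            ver: Optional[str] = None) -> Optional[Dist]:
    """Return the newest dist matching name and any auth/ver constraints."""
    candidates = [
        d for d in index
        if d.name == name
        and (auth is None or d.auth == auth)
        and (ver is None or d.ver == ver)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda d: ver_key(d.ver))


# Three uploads share one name; authors and versions are made up.
INDEX = [
    Dist("ACME-Test", "zef:alice", "0.0.1"),
    Dist("ACME-Test", "zef:bob", "0.0.2"),
    Dist("ACME-Test", "zef:alice", "0.1.0"),
]
```

Because the lookup only needs the index and the files, a layout along these lines could be mirrored as a plain directory tree, with the upload API remaining a separate, optional component.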
## Expected Costs

None of what is below is designed or set in stone; it is merely meant to show roughly that the cost of the redesign can and will be kept minimal, both for the managing body and for the operating costs moving forward.

### Ecosystem

The expected costs for hosting the ecosystem itself are minimal. If S3 were to be used and we calculate costs from that:

A mirror of the current ecosystem in github is roughly 65MB without supporting prior versions. The cost for S3 per GB is $0.023. So, the cost of hosting on S3 is less than $1/mo. Using the S3 calculator, the ecosystem could be downloaded 157 times per month and still have operating costs of less than $1/mo.

S3 may be found unsuitable but the point being made here is that there are inexpensive options.

Another option is to use a service user on CPAN and create a separate index file that can index the SHA1 hashed modules.tar.gz. This is hacky, less secure, and difficult to consume for things like modules.raku.org.

### API

AWS Lambda is an explorable option. With _very_ conservative numbers and guesses:

The compute price is roughly $0.0000002083 / GB-s with 128MB of memory allocated. If it takes 20s to upload and index a package then the operating costs for running this function 100k/mo can be kept below $1/mo.

(the calculation from https://aws.amazon.com/lambda/pricing/ - .0000002083 * 20 * 100000 * (128/1024) = $0.05)
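The estimates above can be reproduced with a few lines of arithmetic. This sketch takes the proposal's figures at face value (65MB, $0.023 per GB-month, and the Lambda rate exactly as quoted) and does not check them against current AWS pricing. The data transfer rate of $0.09 per GB is an assumption that does not appear in the proposal; it is included only to show how a download count feeds into the S3 total.

```python
MB_PER_GB = 1024

# Figures quoted in the proposal.
ECOSYSTEM_SIZE_MB = 65
S3_STORAGE_USD_PER_GB_MONTH = 0.023
LAMBDA_RATE_AS_QUOTED = 0.0000002083

# ASSUMPTION: outbound transfer rate, not stated in the proposal.
S3_EGRESS_USD_PER_GB = 0.09


def s3_monthly_cost(size_mb: float, downloads: int) -> float:
    """Storage plus outbound transfer for `downloads` full copies per month."""
    storage = size_mb / MB_PER_GB * S3_STORAGE_USD_PER_GB_MONTH
    egress = size_mb * downloads / MB_PER_GB * S3_EGRESS_USD_PER_GB
    return storage + egress


def lambda_monthly_cost(seconds: float, invocations: int, memory_mb: int) -> float:
    """The proposal's formula: rate * seconds * invocations * (memory / 1024)."""
    return LAMBDA_RATE_AS_QUOTED * seconds * invocations * (memory_mb / MB_PER_GB)


storage_only = s3_monthly_cost(ECOSYSTEM_SIZE_MB, downloads=0)
with_downloads = s3_monthly_cost(ECOSYSTEM_SIZE_MB, downloads=157)
lambda_total = lambda_monthly_cost(seconds=20, invocations=100_000, memory_mb=128)

print(f"S3 storage only:        ${storage_only:.4f}/mo")
print(f"S3 with 157 downloads:  ${with_downloads:.2f}/mo")
print(f"Lambda, 100k uploads:   ${lambda_total:.2f}/mo")
```

Under these inputs both totals stay below $1/mo, which is the point the section is making; different rates or request charges would shift the exact figures.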

Category: Grants

Comments (10)

[ 2202 ] | Thu, 30-Jul-2020 by dakkar

sounds good, but please try to build something where the logic is in the client / interfaces, not the package repository itself.
the best aspect of CPAN is that the package repository itself is just a directory tree, so setting up a custom one is trivial, and tools don't need special logic to deal with it

[ 2207 ] | Thu, 30-Jul-2020 by tonyo

hi dakkar - responding here too (after we've chatted in irc) just to provide some extra details.

the base directory structure has to change a little bit to help accommodate raku's use of :ver:auth on module names but for the most part providing a mirror to the ecosystem should be very similar to cpan.

the api is not a required component for downloading or searching for modules and, as you pointed out, will be more analogous to PAUSE.

[ 2203 ] | Thu, 30-Jul-2020 by jjmerelo

I think it's an excellent idea. The split nature of the ecosystem is a lot of trouble, and anything with an API that helps the creation of an ecosystem of tools is going to be awesome. I wholeheartedly support this.

[ 2204 ] | Thu, 30-Jul-2020 by dakkar

forgot to write in my previous comment: having a repository that's just a bunch of files makes it also trivial to mirror, which helps lower upkeep costs

[ 2205 ] | Thu, 30-Jul-2020 by coke

This proposal seems short on details; I've got several questions: what specifically do you mean by ecosystem? How will you address fault tolerance? Would the web site reuse code from metacpan or be a group up rewrite? If rewrite, what tooling? You mention that the API is limited - what specific endpoints will be available as part of this grant? What are the milestones (we just have the 2 month estimation)? What are the estimated costs for maintenance? Have you spoken with anyone at TPF outside of the grant process regarding maintenance costs or donations "exceeding operating costs" (Any discussion of donations outside this grant process needs to include at least the Treasurer)? Why "zef.pm" given the rename from Perl 6 to Raku? If this is intended to be the primary ecosystem, why does this need a plugin to zef? (Wouldn't it be a core feature?).

[ 2206 ] | Thu, 30-Jul-2020 by coke

"ground up rewrite"

[ 2208 ] | Thu, 30-Jul-2020 by codesections

This is a very interesting proposal – thank you very much.

I have a few big-picture questions about the goals for this proposal. The past few years have seen a *lot* of new language package managers, from the rise of NPM, to Golang's package manager, to Rust's Cargo. Many of these have attempted to fix perceived flaws in past package managers (though of course they could have introduced new flaws of their own). Do any existing package managers/language ecosystems serve as positive or negative examples for this project? You briefly discussed the limitations of relying on cpan for Raku packaging, but I'm curious about your thoughts about lessons we can learn from existing language ecosystems.

Another notable development in the last few years is the dramatic rise in the number of malicious packages uploaded to package repositories – including, just within the past few days, cpan (https://news.perlfoundation.org/post/malicious-code-found-in-cpan-package). I recognize that there are real limits to how much a package manager can do to protect the ecosystem from malicious packages. At the same time, most existing package managers were developed before the scale of the problem was widely recognized and I wonder if there's a way for us to lay the groundwork for better security practices while our ecosystem is still small enough for best practices to spread.

In particular, it strikes me that much of the risk comes from large dependency graphs, with many transitive dependencies. In many language ecosystems, it's easy for a developer to transitively depend on many, many packages – but it's very hard for a developer to have much visibility into what they are depending on. Could Raku do something that encourages more awareness of transitive dependencies? (Raku has a strong advantage here – we have a deep standard library and an *extremely* expressive language. Together, this means that many features that would be supplied by a package in a different ecosystem are either built-in or are trivial to hand code. Could we build on that advantage and further encourage low-dependency development?)

In any event, I imagine you have thought about all of the above more deeply than I have. I look forward to hearing your thoughts. Thanks again!

[ 2209 ] | Thu, 30-Jul-2020 by jdv

I would like to see more details about why the existing solutions either can't be fixed or aren't worth fixing as well as why this new solution is worthwhile.

On CPAN:

How will this new "dist repo" part compare to cpan in terms of fault tolerance and simplicity? Those are the main strengths, in my mind, of the cpan option. For example: anyone can do a cpan mirror (quite efficiently with rsync), anyone can set up a private "cpan" for various reasons, and when a particular mirror is unavailable the clients have plenty of others to choose from.

I believe the only significant (read perhaps unfixable) issue with cpan is the manual pause approval process.

Another, I believe fixable, issue related to cpan is that the raku indexes in cpan are not useful and/or aren't being used. Zef is instead using an index alongside the p6c one at https://github.com/ugexe/Perl6-ecosystems. This breaks, at least, the fault tolerance aspect of cpan.

Also, exactly what about "auth and ver" does cpan not support?

On modules.raku.org:

Why can't this solution be used as a base to build on? It already has the basic functionality of a "metacpan" (docs/code view/browse, search, etc...). Perhaps it could be used to also more easily manage the p6c index.

Don't get me wrong - I'm all for more choice and improvements and such. But what is being proposed here carries along with it significant additional costs, both initial and recurring, in both funds and man hours to tpf/the raku community. And, as addressed above - at least in my opinion, isn't yet justified in sufficient detail.

[ 2215 ] | Mon, 03-Aug-2020 by moritzlenz

I think Raku does need ecosystem improvements, and needs them badly.

This grant proposal, however, leaves me with more questions than answers.

For example, the fault tolerant ecosystem, which faults can it handle?

The website "similar to metacpan", which features are included? metacpan has (off the top of my hat, so likely incomplete) search, rendered documentation, highlighted source code view, raw source code view, stars, dependency explorer, links to issue trackers and so on.

If users are allowed to upload files, it'll need some kind of authentication, and thus either access to some sort of identity management, or implement its own. Which one will it be?

Will there be any kind of moderation features (like the options for admins to remove malicious code or spam uploads)?

What are the plans for involving the raku infrastructure operators, or really anybody else from the community, to ensure that in the end it doesn't become a platform with a single point of failure?

I ask these questions partly because depending on the answers, I might want or not want to support this project, and partly because clearly defining the scope of a project is tremendously important for a grant that is paid on completion. In fact, if you find any way to split this grant into milestones that are useful on their own, I'd highly encourage you to structure it so that reaching one milestone can trigger a partial payout of the grant.

[ 2216 ] | Tue, 04-Aug-2020 by cromedome

At Tony's request, I have amended the original post with an addendum he submitted in hopes that it will clear up some things that have been brought up (here and privately). Please check it out.
