Some thoughts on NDS Labs


(I hope you didn't get this twice -- I had an email hiccup and it might have gone out twice. My apologies.)

Hi all,

John Towns suggested we write to this mailing list with our thoughts, and what we'd like to see in "an NDS Labs," so here are some (somewhat organized) thoughts, following up on the productive discussions at the meeting last week in DC. I've tried to think through "What would *I* like to see?" without weighting the things I'm interested in too heavily. I'd also emphasize that I'm just one voice, and I'd really like to hear others' thoughts, or feedback on what I've put below.

I'd also encourage you to check out jujucharms.com/sidebar/, which demonstrates the Ubuntu "Juju Charm" method of describing and deploying services.

What I'd Like To See In NDS Labs
================================

Below I've outlined the various things that I would like to see in an NDS Labs first pass. Many of these describe abstract services, and support for those services, that would provide infrastructure for building sets of services operating in concert.

The remainder of this outline aims to describe an NDS Labs setup that would be fully transportable, independent of the location or hardware running it. All that is presupposed is an OpenStack cluster. This would mean that the cyberinfrastructure for NDSLabs could run essentially anywhere: supercomputing centers, institutional clusters, commercial clusters, and the like. The value-add, however, will be in providing a place to collaborate and to explore.

I think the most important things we can provide in an NDSLabs infrastructure will be:

* Pre-built VMs and containers, ready to spin up, and the ability to modify these and contribute modifications back (individuals would have root access on their own VMs)
* A place to deploy and experiment with these
* Support for integrating services
* Community infrastructure for collaboration -- maybe an "NDS Labs dashboard" or "app store" model

This provides both a playground and an "incubation" path, allowing apps to be phased gradually from experiment, to demo, to production. We may wish to consult with Chris Mattmann or others at the Apache Software Foundation, as they have considerable experience with incubation and project stewardship.

Base Infrastructure
-------------------

I believe there are a few things we can build on top of:

* VMs
* Containers
* Orchestration systems
* VM image repository
* Docker image repository
* Block storage devices (these are limited to a relatively small size)
* NFS mounts of large block devices (this will require more effort, and may initially be exposed only read-only to sidestep permission issues)
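Containers would often need to see these NFS-backed data volumes. Here is a minimal sketch of the read-only case, assuming the NFS export is already mounted on the Docker host at a hypothetical path `/mnt/ndslabs-data`. It uses Docker's `-v HOST:CONTAINER:ro` bind-mount syntax and only builds the `docker run` argument list rather than executing anything; the helper name is hypothetical.

```python
import shlex
from typing import List, Optional


def docker_run_readonly_mount(image: str, host_path: str, container_path: str,
                              name: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build (but do not run) a `docker run` command
    that bind-mounts host_path into the container read-only."""
    if not host_path.startswith("/") or not container_path.startswith("/"):
        raise ValueError("bind-mount paths must be absolute")
    cmd = ["docker", "run", "--rm"]
    if name:
        cmd += ["--name", name]
    # The ':ro' suffix makes the mount read-only inside the container, so
    # the container cannot write to (or chmod/chown) the shared NFS data.
    cmd += ["-v", f"{host_path}:{container_path}:ro", image]
    return cmd


if __name__ == "__main__":
    # Hypothetical NFS mount point on the host, for illustration only.
    cmd = docker_run_readonly_mount("ubuntu:14.04", "/mnt/ndslabs-data", "/data")
    print(shlex.join(cmd))
```

Exposing the data read-only means the container's user IDs never have to match those on the NFS server, which is the permission problem deferred above.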

Note that I am describing two separate execution models: the Virtual Machine (VM), similar to traditional cloud execution, and the "container" (similar to Docker, sandstorm.io, and so on). I believe that running both on OpenStack is the simplest way to proceed. Containers may eventually be supported directly by the OpenStack compute engine ("Nova") in a future release, but for now there are probably two modes of operation to explore:

* Provision and execute on VMs directly using Nova
* Provision an orchestration system (Kubernetes, Deis, Mesos/Mesosphere, Stackato) and dispatch containers within that system

This should accommodate both "full-stack" systems, such as existing applications like Dataverse, SEAD, Medici2 and the like, and rapid prototyping of new applications suited to container-based deployment.

To enable rapid deployment, we will provide Vagrant, Chef and/or Ansible files that provision a common set of components: orchestration systems as well as VMs drawn from the VM image repository.

The workflow for VM deployment would look something like:

* Build VM, upload to NDSLabs Glance
* Provision using the NDSLabs Vagrant, Ansible or Chef setup

The workflow for container deployment would look something like:

* Build Docker image, upload to the NDSLabs Docker hub
* Provision the set of nodes on which containers will be deployed, using the NDSLabs Vagrant, Ansible or Chef setup to deploy a container orchestration system
* Deploy containers on this infrastructure
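Both workflows can be sketched as an ordered list of commands. This is a hypothetical illustration with several stated assumptions: Ansible as the provisioning tool (Vagrant or Chef would slot in the same way), qcow2 VM images, Kubernetes as the orchestration system, and a made-up registry host `registry.ndslabs.example.org`. The `openstack image create`, `docker build`/`docker push`, `ansible-playbook` and `kubectl apply` invocations are real CLI forms; the inventory, playbook and manifest file names are placeholders.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Hypothetical registry host for NDS Labs images (not a real service).
REGISTRY = "registry.ndslabs.example.org"


@dataclass
class Step:
    description: str
    command: List[str]


def vm_workflow(image_file: str, image_name: str,
                inventory: str, playbook: str) -> List[Step]:
    """VM path: upload an image to Glance, then provision from it."""
    return [
        Step("Upload VM image to Glance",
             ["openstack", "image", "create",
              "--disk-format", "qcow2", "--container-format", "bare",
              "--file", image_file, image_name]),
        Step("Provision VMs with the shared Ansible setup",
             ["ansible-playbook", "-i", inventory, playbook]),
    ]


def container_workflow(context_dir: str, name: str, tag: str, inventory: str,
                       orchestration_playbook: str, manifest: str) -> List[Step]:
    """Container path: build and push an image, stand up the orchestration
    system (Kubernetes assumed), then deploy containers onto it."""
    image = f"{REGISTRY}/{name}:{tag}"
    return [
        Step("Build Docker image", ["docker", "build", "-t", image, context_dir]),
        Step("Push image to the NDS Labs registry", ["docker", "push", image]),
        Step("Provision nodes and the orchestration system",
             ["ansible-playbook", "-i", inventory, orchestration_playbook]),
        Step("Deploy containers", ["kubectl", "apply", "-f", manifest]),
    ]


def run(steps: List[Step], dry_run: bool = True) -> None:
    """Print each step; only execute it when dry_run is False."""
    for step in steps:
        print(f"# {step.description}\n{' '.join(step.command)}")
        if not dry_run:
            subprocess.run(step.command, check=True)


if __name__ == "__main__":
    # Placeholder file names; nothing is executed in dry-run mode.
    run(container_workflow(".", "myapp", "0.1", "hosts.ini",
                           "k8s-cluster.yml", "myapp.yaml"))
```

Keeping each plan as data makes it easy to print and review before anything touches the cluster. Swapping the orchestration system would change only the last step of the container path.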

Software Infrastructure
-----------------------

OpenStack and AWS both provide, or promise to provide, a set of such services. Both offer these at two levels, allowing individual apps (run in groups of containers or VMs) to use them independently; two NDS Labs projects running at the same time will likely need separate instances.

What we would provide, then, are pre-configured, pre-built containers and VMs running these services, set up from a small amount of configuration input. This will enable "clean slate" operation as well as isolation from other NDSLabs projects.

* Event system, such as RabbitMQ
* Authentication system, such as a "fake" LDAP server (prepopulated or not) or a generic user registry supporting OAuth2, PAM, etc.
* Relational databases
* Non-relational databases (MongoDB / SimpleDB-like)
* Storage infrastructure and coordinators, such as Globus Online endpoints and iRODS
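To make "pre-configured, given some input" concrete, here is a minimal sketch in which the only input is a project name. Every container name and network is derived from it, and fresh credentials are generated per call, so each NDS Labs project gets its own clean-slate instances. It assumes Docker containers built from the official `rabbitmq`, `postgres` and `mongo` images (tags illustrative) and the environment variables those images document; the `project_services` function and the dictionary layout are hypothetical.

```python
import re
import secrets
from typing import Dict


def project_services(project: str) -> Dict[str, dict]:
    """Hypothetical helper: derive isolated, per-project service definitions.

    Container names and networks are prefixed with the project name, and
    each call generates new random passwords, so two projects never share
    a service instance or its credentials.
    """
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", project):
        raise ValueError("project name must be lowercase letters, digits or '-'")

    # Official Docker Hub images; the tags are illustrative assumptions.
    services = {
        "rabbitmq": {
            "image": "rabbitmq:3",
            "environment": {
                "RABBITMQ_DEFAULT_USER": project,
                "RABBITMQ_DEFAULT_PASS": secrets.token_urlsafe(16),
                "RABBITMQ_DEFAULT_VHOST": project,
            },
        },
        "postgres": {
            "image": "postgres:15",
            "environment": {
                "POSTGRES_DB": project,
                "POSTGRES_PASSWORD": secrets.token_urlsafe(16),
            },
        },
        "mongodb": {
            "image": "mongo:6",
            "environment": {
                "MONGO_INITDB_ROOT_USERNAME": project,
                "MONGO_INITDB_ROOT_PASSWORD": secrets.token_urlsafe(16),
            },
        },
    }
    for service, spec in services.items():
        spec["container_name"] = f"{project}-{service}"
        spec["network"] = f"{project}-net"  # one private network per project
    return services


if __name__ == "__main__":
    for service, spec in project_services("alpha").items():
        print(service, spec["container_name"], spec["image"])
```

An LDAP server, iRODS server or Globus endpoint could be added as further entries in the same way, though each needs more configuration than shown here.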

Note that production systems would approach this quite differently, and we should have a second level of systems available for that purpose. In the case of iRODS, we may wish to set up an "NDSLabs" zone with which individual servers can federate. In that case, however, we would run into issues of admin rights, specifying iRODS rule engine rules, and so on, which can be limiting; this will need careful consideration later. For a Globus Online endpoint, we'd probably also want to consolidate into a single service, but I would defer on that topic to others more knowledgeable.

Support Infrastructure
----------------------

In addition to developing the cyberinfrastructure that supports developing applications, we should provide the support infrastructure for particular pain points:

* How to develop horizontally scalable apps?
* What is the best way to store and retrieve data?
* What kind of security models are necessary?
* How to implement authorization and authentication?
* How to execute tests?

Perhaps most importantly, we should provide guidelines and in-depth support for hardening apps and making them robust enough for production deployment. These roles would be filled by individuals as well as the community, and would effectively act as a consultancy. This is an opportunity for industry connections, as well as for drawing on the resources of various supercomputing centers.

Community Infrastructure
------------------------

The community support structures will be crucial to ensuring that NDSLabs is not simply "another academic cloud." NDSLabs *must* provide mechanisms for:

* Communication
* Collaboration
* Direct technology transfer

For communication, having channels that are both *open* and offer different levels of latency will be quite valuable (as we do in yt; see http://arxiv.org/abs/1301.7064). I would suggest, but am not tied to, something like:

* IRC (low-latency)
* Mailing list (discuss@xxxxxxxxxxxxxxxxxxxxxxx is one option, though it may be less convenient than something like an ndslabs group on Google Groups)
* In-person workshops, or "Office Hours" via Google Hangouts

We may also consider something like Discourse, which provides a nice email/webforum gateway that shares characteristics with Stack Overflow.

The communication channels must be populated not only by app developers, but also by experienced individuals from collaborating sites: people who can provide support on issues like "How do I pass authentication tokens?", "What's the best way to get my filesystem into my docker container?" or "Are there any VM images in Glance that do [...]?" For questions about specific services (e.g., "How do I add Globus Nexus authentication?"), individuals would probably want to go to that project's mailing lists or other discussion venues.

We may want to explore a hosted Kallithea instance (kallithea-scm.org), which could provide some interesting functionality for push-to-deploy workflows. At a minimum, though, encouraging individuals to use the existing Bitbucket/GitHub hosting, as well as providing some planning or "advertising" space, would be nice.

Hardware and Software Necessary
-------------------------------

Looking over what I've placed above, I think even a modest (but real) set of hardware and software could be quite useful.

* OpenStack cluster, perhaps 256 cores total, connected to allocated storage divided between object storage and block storage
* Persistent Docker repository
* Space in Glance (the OpenStack VM image catalog) for VMs
* Pre-built or pre-configured Docker images and VMs for the various items enumerated above

The first three of these will need to be in place at the outset; the fourth need neither be in place at the outset nor be developed from scratch.


-Matt


