Microsoft has started external developer testing of a number of interrelated parallel/distributed technologies for Windows Server that are part of the codename “Dryad” family.
According to a December 17 blog post on the Windows HPC (High Performance Computing) Team Blog, Microsoft is making available to testers via its Connect test site the first Community Technology Preview (CTP) test builds of its Dryad, DSC and DryadLINQ technologies.
Dryad is Microsoft’s competitor to Google MapReduce and Apache Hadoop. In the early phase of its existence, Dryad was a Microsoft Research project dedicated to developing ways to write parallel and distributed programs that can scale from small clusters to large datacenters. There’s a DryadLINQ compiler and runtime that is related to the project. Microsoft released builds of Dryad and DryadLINQ code to academics for noncommercial use in the summer of 2009. Microsoft moved Dryad from its research arm to its Technical Computing Group this year.
According to a presentation from August, the team’s plan was to deliver a first CTP build of the stack in November 2010 and to release a final version of it running on Windows Server High Performance Computing servers by 2011.
This initial preview is intended for “developers who are exploring data-intensive computing,” according to the Softies. The prerequisite for the CTP is an HPC Pack 2008 R2 Enterprise-based cluster with Service Pack 1 installed.
As I noted in a previous blog post, there are a number of interesting components that make up Dryad, including a new distributed filesystem (codenamed “TidyFS”), a set of related data-management tools (codenamed “Nectar”) and a scheduler for distributed clusters (codenamed “Quincy”).
Dryad is an ongoing Microsoft Research project dedicated to developing ways to write parallel and distributed programs that can scale from small clusters to large datacenters. There’s a DryadLINQ compiler and runtime that is related to the project. Microsoft released builds of Dryad and DryadLINQ code to academics for noncommercial use in the summer of 2009.
It looks like Dryad is ready to take the next step. Microsoft is planning to move the Dryad stack from Microsoft Research to Microsoft’s Technical Computing Group. The plan is to deliver a first Community Technology Preview (CTP) test build of the stack in November 2010 and to release a final version of it running on Windows Server High Performance Computing servers by 2011, according to a slide from an August 2010 presentation by one of the principals working on Dryad.
But wait, there’s one more thing. (Actually, there are three more things.)
The Dryad stack is getting more detailed as the researchers continue to work on it. Here’s the existing Dryad stack diagram:
Here’s an updated version of the stack diagram from the aforementioned August 2010 presentation by one of the Dryad team members:
The Dryad layer of the stack handles scheduling and fault-tolerance, while the DryadLINQ layer is more about parallelization of programs.
The latest Dryad stack diagram includes mention of a new distributed filesystem, codenamed TidyFS, for parallel computation with Dryad. This file system “provides fault tolerance and data replication similar to GFS (the Google File System) or the Cosmos store.” (Cosmos, according to the previous stack diagram, was the codename for the Dryad file system which complemented the NT File System. TidyFS is either the new name for Cosmos or its successor, I’d say.)
There’s also a set of related data-management tools, codenamed “Nectar.” I found a white paper from Microsoft Research on Nectar, which explains its purpose this way:
“In a Nectar-managed data center, all access to a derived dataset is mediated by Nectar. At the lowest level of the system, a derived dataset is referenced by the LINQ program fragment or expression that produced it. Programmers refer to derived datasets with simple pathnames that contain a simple indirection (much like a UNIX symbolic link) to the actual LINQ programs that produce them.”
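To make that idea a little more concrete, here is a minimal Python sketch of the general pattern the white paper describes: a derived dataset is identified by the program that produced it, and a friendly pathname is just an indirection to that program. This is not Microsoft's code and it is not LINQ. The class `DerivedDataStore`, its methods, the pathname and the `count_words` helper are all hypothetical names made up for illustration, and the evict-then-recompute behavior is my own assumption about how such a store could behave.

```python
import hashlib


def count_words(lines):
    """Example 'program': count word occurrences across input lines."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


class DerivedDataStore:
    """Toy store (hypothetical): datasets are keyed by the program that makes them."""

    def __init__(self):
        self._programs = {}    # fingerprint -> (function, inputs)
        self._cache = {}       # fingerprint -> computed result
        self._aliases = {}     # pathname -> fingerprint (symlink-like indirection)
        self.computations = 0  # how many times a program actually ran

    def register(self, pathname, program_text, func, inputs):
        # The fingerprint stands in for "the program fragment that produced it".
        digest_input = (program_text + repr(inputs)).encode("utf-8")
        fingerprint = hashlib.sha256(digest_input).hexdigest()
        self._programs[fingerprint] = (func, inputs)
        self._aliases[pathname] = fingerprint
        return fingerprint

    def read(self, pathname):
        # All access goes through the store, which resolves the pathname first.
        fingerprint = self._aliases[pathname]
        if fingerprint not in self._cache:
            func, inputs = self._programs[fingerprint]
            self._cache[fingerprint] = func(inputs)
            self.computations += 1
        return self._cache[fingerprint]

    def evict(self, pathname):
        # Assumption: a dropped result can be rebuilt by re-running its program.
        self._cache.pop(self._aliases[pathname], None)


store = DerivedDataStore()
store.register("/derived/word_counts", "count_words", count_words, ("a b a", "b c"))
result = store.read("/derived/word_counts")
```

The point of the sketch is only the indirection: callers ask for a pathname, and the store decides whether to hand back a stored result or re-run the program behind it.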
There’s one more new Dryad-related codename worth noting: “Quincy.” Quincy is a scheduling system for distributed clusters. (Quincy, Wash., also happens to be the location of one of Microsoft’s major datacenters.)
Microsoft is continuing to step up its work in the HPC space, hoping to ice out Linux in that arena. The Softies are seemingly counting on Dryad to keep up their momentum both on premises, with Windows Server, and in the cloud with Windows Azure in its datacenters.