University Information Services at Georgetown University

E-NOTES, SEPTEMBER-OCTOBER 2002 -- BEOWULF CLUSTER


The UIS Beowulf Cluster: High Performance Computing

The Advanced Research Computing Group

Imagine you're in charge of shelving new books in a library. In this library, books aren't shelved by subject, or author, or even by title; they're shelved according to which pages have the word "got" printed on them. Each time a new book comes in, before you can tell where it belongs, you must list all the places that the word "got" appears in the text. Then you must compare that list to the "got lists" for all the other books in the library and find the closest matches. Only then will you know where to shelve it.

Now imagine that you're classifying proteins. Each protein is a chain of hundreds of amino acids; you've got to compare its sequence to the known sequences of the proteins in your database and find the closest matches. But you've already got 900,000 proteins in your database, and each month thousands of new proteins requiring classification come pouring in. This makes shelving books at our fantasy library look like child's play. Even after you write software to automate the tasks of identifying and matching the proteins, how do you get it to check through those 900,000 entries in a timely manner? Baris Suzek of the Protein Information Resource (PIR) is answering that question, and a large part of the answer is Abba, the Department of Chemistry/UIS Beowulf cluster.
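To make the matching task concrete, here is a minimal Python sketch of a "closest match" search. It is an illustration only, not PIR's software: the similarity measure (the share of three-letter fragments two sequences have in common), the function names, and the tiny database are all assumptions made up for this example.

```python
def kmers(sequence, k=3):
    """Return the set of all length-k fragments of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def similarity(a, b, k=3):
    """Share of fragments two sequences have in common (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def closest_matches(query, database, top=2):
    """Rank database entries by similarity to the query, best first."""
    scored = [(similarity(query, seq), name) for name, seq in database.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top]]


# Hypothetical toy database; real entries are hundreds of letters long.
database = {
    "protein_a": "MKTAYIAKQRQISFVKSHFSRQ",
    "protein_b": "MKTAYIAKQRQISFVKAHFSRQ",
    "protein_c": "GSSGSSGLLVPRGSHMASMTGG",
}

print(closest_matches("MKTAYIAKQRQISFVKSHFSRQ", database))
```

Every new sequence has to be scored against every entry already in the database, which is why the work grows so quickly as the collection approaches a million proteins.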

What is a Beowulf cluster?
Originally conceived at NASA, a Beowulf cluster is a high performance computer system consisting of several small, fast computers (nodes) harnessed together and controlled by a single computer (the master node). In this way, all the nodes can work in tandem, generating tremendous computing power. The individual nodes and the master node are referred to collectively as a Beowulf cluster.

According to Woonki Chung of Georgetown's Advanced Research Computing (ARC) program, a Beowulf cluster provides great power at relatively low cost by using "commodity off-the-shelf" components: cheap and fast PCs; affordable, high-bandwidth internal networking; a Linux operating system; and parallel programming software.

The important thing to remember is that while each individual node may not be particularly powerful, hooking eight or more of them together does create a very powerful system.
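The division of labor between the master node and the other nodes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Abba itself is programmed: threads on a single machine stand in for the separate nodes, the function names are invented for the example, and a real cluster would typically pass the work between machines with a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_nodes):
    """Deal items out round-robin so every node gets a near-equal share."""
    return [items[i::n_nodes] for i in range(n_nodes)]


def count_matches(chunk, target="got"):
    """Work done on one node: count occurrences of target in its share."""
    return sum(text.count(target) for text in chunk)


def run_on_cluster(items, n_nodes=8):
    """Master node: hand out the work, let the nodes run, add up the results."""
    chunks = split_into_chunks(items, n_nodes)
    # Threads are only a stand-in for the eight separate machines.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_counts = list(pool.map(count_matches, chunks))
    return sum(partial_counts)


# Hypothetical "library" of short texts to search.
books = ["I got it", "she got what he got", "nothing here"] * 100
print(run_on_cluster(books))
```

The pattern only pays off when the job can be cut into pieces that do not depend on one another, as with comparing one new item against many separate database entries.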

The Georgetown Abba cluster
Georgetown's Abba cluster was built when Professor Miklos Kertesz of the chemistry department apprised UIS of the need for a local high performance computer cluster. Partnering with the Department of Chemistry and Dell Computers, UIS obtained the necessary components. 

Composed of eight rack-mountable Dell PowerEdge 1550 servers, each with dual 800 MHz Pentium III processors, the Abba cluster runs the Scyld Beowulf Cluster Operating System. It was created by Woonki Chung and is maintained by Arnie Miles, systems administrator at ARC.

Using the Abba cluster
The Abba cluster will be used primarily for Department of Chemistry computation, but it can also be made available for other research projects. Baris Suzek was able to test the system and reported that Abba vastly shortened his processing times: jobs that could have taken weeks on a stand-alone computer took just days on Abba. If you are interested in using a Beowulf cluster for your research, please visit the Abba Web site and contact moores@georgetown.edu.

 
