The UCSF Computer Graphics Laboratory, home to the Resource for Biocomputing, Visualization and Informatics (RBVI), operates a cluster of high-performance servers to meet the compute- and data-intensive needs of our user community. The cluster appears to users as a single computing environment and comprises both hardware and system software, as described below.

Our server hardware is based on Hewlett-Packard's (HP) AlphaServer family of computers and includes a 32-processor GS1280 and four 4-processor ES45s. These systems are high-end multiprocessor servers organized in a symmetric multiprocessing (SMP) architecture. They are the same types of servers used at the Pittsburgh Supercomputing Center to implement its terascale computing system. The GS1280 system has thirty-two 1.15 GHz Alpha EV7 processors and 64 GB of memory, while each ES45 system has four 1 GHz Alpha EV68 processors and 16 GB of memory. Detailed performance data is available for both the GS1280 and the ES45. All servers are connected by a high-bandwidth, low-latency interconnect technology known as Memory Channel, which supports 90 MB/s of channel bandwidth between any two server nodes and 2.1 microseconds of end-to-end latency.

The server hardware described above is integrated into a single software environment known as a "cluster." Our cluster software is based on HP's TruCluster Server system and provides high-performance, scalable, highly available services. All server nodes use the same "single system image" of the operating system, and home directories, user files, and system files are accessible from all nodes in the cluster, so all application software is location independent. This makes it possible to share application load among cluster nodes so that, for example, large compute-intensive jobs can run on separate nodes from interactive jobs. The technology also provides a highly available computing environment: when a hardware or software failure occurs on one member of the cluster, the services provided by that node migrate to the remaining active nodes.

The entire cluster is accessed through a common cluster address (the cluster alias), "socrates.ucsf.edu". Depending on which server nodes are available and which specific service is being accessed (e.g., our web server), the cluster alias resolves to a specific node, which then provides the requested service. Additional technical details on TruCluster Server are available here. Sun Grid Engine is used to control the execution of compute-intensive jobs on the cluster.
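
As a quick check on the hardware figures quoted above, the following minimal Python sketch totals the processors and memory across the five server nodes. The `Node` class, the `NODES` list, and the `cluster_totals` function are hypothetical names introduced only for this illustration; the per-node numbers are the ones stated in the hardware description, and nothing here reflects how the cluster itself is configured or managed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One server node (hypothetical structure, for illustration only)."""
    model: str
    processors: int
    clock_ghz: float
    memory_gb: int


# Per-node figures as stated in the hardware description:
# one GS1280 (32 x 1.15 GHz EV7, 64 GB) and four ES45s (4 x 1 GHz EV68, 16 GB each).
NODES = [Node("GS1280", 32, 1.15, 64)] + [Node("ES45", 4, 1.0, 16) for _ in range(4)]


def cluster_totals(nodes):
    """Sum node count, processors, and memory over the given nodes."""
    return {
        "nodes": len(nodes),
        "processors": sum(n.processors for n in nodes),
        "memory_gb": sum(n.memory_gb for n in nodes),
    }


if __name__ == "__main__":
    print(cluster_totals(NODES))
```

Under these figures the cluster as a whole has 48 processors and 128 GB of memory spread over five nodes.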