The CPD Opteron cluster - hardware information
Description of the cluster
The configuration of the Opteron cluster is shown in Fig. 1. It consists of 114 compute nodes connected by InfiniBand and gigabit Ethernet networking. All machines run Red Hat Enterprise Linux 4 and Platform OCS (a vendor-supported, "Rocks"-like distribution). All compute nodes are accessed from the front end. A RAID array is directly attached to the front end and contains the home directories and two globally accessible scratch spaces of 1 TB each. In addition, a set of three I/O nodes serves a 1.5 TB parallel file system, which is accessible from the compute nodes and the front end via the InfiniBand network.
Fig. 1: Diagram of the CPD Opteron cluster

Technical details
The following table lists the details of each machine in the cluster:
| # | Model | Node name | Processor | Memory | Local disk |
|---|---|---|---|---|---|
| 1 | Dell PowerEdge 2970 | pzt.wm.edu | Opteron 2220 / 2.8 GHz | 8 GB | 380 GB |
| 98 | Dell PowerEdge SC1435 | c1-[1-28],c2-[29-57],c4-[58-86],c5-[87-102] | Opteron 2218 / 2.6 GHz | 8 GB | 120 GB |
| 4 | Dell PowerEdge SC1435 | c5-[103-106] | Opteron 2222 SE / 3.0 GHz | 32 GB | 420 GB |
| 8 | Dell PowerEdge SC1435 | c5-[107-114] | Opteron 2222 SE / 3.0 GHz | 16 GB | 420 GB |
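
The node names in the table use a compact range notation such as `c5-[103-106]`. As a minimal sketch, the following Python function expands that notation into individual names. It assumes the resulting hostnames take the form `c5-103`, which the notation suggests but the table does not state explicitly; the function name `expand_nodes` is hypothetical.

```python
import re


def expand_nodes(spec):
    """Expand a spec like 'c1-[1-28],c5-[103-106]' into a list of node names.

    Assumption: each range 'prefix[first-last]' maps to names
    'prefix<first>' ... 'prefix<last>', e.g. 'c5-103'.
    """
    names = []
    for part in spec.split(","):
        part = part.strip()
        match = re.fullmatch(r"(.+)\[(\d+)-(\d+)\]", part)
        if match is None:
            # No range present (e.g. 'pzt.wm.edu'): keep the name as written.
            names.append(part)
            continue
        prefix = match.group(1)
        first, last = int(match.group(2)), int(match.group(3))
        names.extend(f"{prefix}{n}" for n in range(first, last + 1))
    return names


print(expand_nodes("c5-[103-106]"))
# ['c5-103', 'c5-104', 'c5-105', 'c5-106']
```

A list produced this way could be used, for example, to loop over the large-memory nodes when checking their status from the front end.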
Table 2 lists the filesystems that users may access within the cluster. The /home, /scr1 and /scr2 filesystems are all mounted via NFS over the gigabit Ethernet network. A 10 GB quota is currently set on each user's /home directory. /scr1 and /scr2 are global scratch space, intended for temporarily storing large files. Neither scratch filesystem is backed up, and purging is not done automatically. Users with large amounts of data (>100 GB) will be required to reduce their usage when the scratch space starts to get too full (more than about 70%).
| name | size | notes |
|---|---|---|
| /home/$USER | 1 TB | users get a 10 GB quota |
| /scr1/$USER | 1 TB | Global NFS scratch 1 |
| /scr2/$USER | 1 TB | Global NFS scratch 2 |
| /pvfs | 1.5 TB | Global parallel scratch |
| /lscr | 120 GB / 420 GB | Local scratch on compute node |
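
Because the scratch filesystems are not purged automatically, it can be useful to check your own usage against the limits described above. The following is a rough sketch, not an official tool: the function names are hypothetical, the 70% and 100 GB thresholds are taken from the text, and treating 1 GB as 10^9 bytes is an assumption.

```python
import os
import shutil

FULL_THRESHOLD = 0.70             # "more than about 70%" from the text
LARGE_USER_BYTES = 100 * 10**9    # ">100 GB"; decimal gigabytes are an assumption


def fraction_used(path):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def should_clean_up(fs_fraction, user_bytes):
    """True if the filesystem is over the threshold and the user holds >100 GB."""
    return fs_fraction > FULL_THRESHOLD and user_bytes > LARGE_USER_BYTES


if __name__ == "__main__":
    user = os.environ.get("USER", "")
    for scratch in ("/scr1", "/scr2"):
        user_dir = os.path.join(scratch, user)
        if not user or not os.path.isdir(user_dir):
            continue  # not on the cluster, or no directory on this filesystem
        used = fraction_used(scratch)
        mine = dir_size_bytes(user_dir)
        print(f"{scratch}: {used:.0%} full, {mine / 10**9:.1f} GB in {user_dir}")
        if should_clean_up(used, mine):
            print("  consider removing files you no longer need")
```

Walking a large directory tree over NFS can be slow, so a check like this is best run occasionally rather than inside a job script.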