The King in Alice in Wonderland said it best, “Begin at the beginning ….” The general goal of HPC is either to run applications faster or to run problems that can’t or won’t run on a single server. To do this, you need to run parallel applications across separate nodes. Although you could use a single node and then create two VMs, it’s important to understand how applications run across physically different servers and how you administer a system of disparate physical hardware. With this goal in mind, you can make some reasonable assumptions about the HPC system.

If you are interested in parallel computing using multiple nodes, you need at least two separate systems (nodes), each with its own operating system (OS). To keep things running smoothly, the OS on both nodes should be identical. (Strictly speaking, it doesn’t have to be this way, but otherwise, it is very difficult to run and maintain.) If you install a package on node 1, then it needs to be installed on node 2 as well. This lessens a source of possible problems when you have to debug the system.
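To make the “install it everywhere” rule concrete, here is a minimal sketch of one way to push the same package to every node from the master node. It assumes passwordless SSH to each node, sudo rights, an apt-based distribution, and hypothetical hostnames (`node1`, `node2`); none of these details come from the article itself, so adjust them for your own cluster.

```python
#!/usr/bin/env python3
"""Install the same package on every node in a small cluster.

Sketch only: assumes passwordless SSH from the master node, sudo rights,
and an apt-based distribution; the hostnames are placeholders.
"""
import subprocess
import sys

NODES = ["node1", "node2"]  # hypothetical compute node hostnames

def install_everywhere(package: str) -> None:
    """Run a non-interactive install of `package` on each node over SSH."""
    for node in NODES:
        print(f"Installing {package} on {node} ...")
        subprocess.run(
            ["ssh", node, "sudo", "apt-get", "install", "-y", package],
            check=True,  # stop immediately if any node fails
        )

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: install_everywhere.py <package>")
    install_everywhere(sys.argv[1])
```

In practice, a parallel shell such as pdsh or a configuration management tool does the same job more robustly; the point is simply that whatever lands on node 1 should land on node 2 in exactly the same way.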
The second thing your cluster needs is a network to connect the nodes so they can communicate to share data, the state of the solution to the problem, and possibly even the instructions that need to be executed. The network can theoretically be anything that allows communication between nodes, but the easiest solution is Ethernet. In this article, I am initially going to consider a single network, but later I will consider more than one.

Storage in each node can be as simple as an SD card to hold the OS, the applications, and the data. In addition to some basic storage, and to make things a bit easier, I’ll create a shared filesystem from the master node to the other nodes in the cluster.

The most fundamental HPC architecture and software is pretty unassuming. Most distributions have the basic tools for making a cluster work and for administering the tools; however, you will most likely have to add the tools and libraries for the parallel applications (e.g., a message-passing interface library or libraries, compilers, and any additional libraries needed by the application). Perhaps surprisingly, the other basic tools are almost always installed by default on an OS; however, before discussing the software, you need to understand the architecture of a cluster.

The architecture of a cluster is pretty straightforward. You have some servers (nodes) that serve various roles in a cluster and that are connected by some sort of network. Typically the nodes are as similar as possible, but they don’t have to be; however, I highly recommend that they be as similar as possible because it will make your life much easier. Figure 1 is a simple illustration of the basic architecture.

Almost always you have a node that serves the role of a “master node” (sometimes also called a “head node”). The master node is the “controller” node or “management” node for the cluster. It controls and performs the housekeeping for the cluster and many times is the login node for users to run applications. For smaller clusters, the master node can be used for computation as well as management, but as the cluster grows larger, the master node becomes specialized and is not used for computation.

Other nodes in the cluster fill the role of compute nodes, which describes their function. Typically compute nodes don’t do any cluster management functions; they just compute. Compute nodes are usually systems that run the bare minimum OS – meaning that unneeded daemons are turned off and unneeded packages are not installed – and have the bare minimum hardware.

As the cluster grows, other roles typically arise, requiring that nodes be added. For example, data servers can be added to the cluster. These nodes don’t run applications; rather, they store and serve data to the rest of the cluster.
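Circling back to the parallel software mentioned above, the quickest way to see the master/compute split in action is a tiny message-passing test that reports where each process actually ran. This is only a sketch: it assumes an MPI implementation (e.g., Open MPI) and the mpi4py Python bindings are installed identically on every node, which the article itself does not require.

```python
#!/usr/bin/env python3
"""Minimal MPI "hello" that reports which node each process landed on.

Sketch only: assumes an MPI implementation and the mpi4py bindings are
installed (identically) on all nodes in the cluster.
"""
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # this process's ID within the job
size = comm.Get_size()            # total number of processes in the job
node = MPI.Get_processor_name()   # hostname of the node running this rank

print(f"Hello from rank {rank} of {size} on {node}")
```

Launched from the master node with something like `mpirun -np 4 --hostfile hosts python3 hello_mpi.py` (where the hostfile lists the compute nodes), each rank prints the node it is running on, which is a quick sanity check that the network, the hostnames, and the identical software installs all line up.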