Architecture - Multiple Interfaces For LNET
Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
Use Cases
A node may have one or more link sets; a link set is a group of nids that will be used in an aggregated fashion.
- one-to-many, many-to-one, many-to-many, and rail situations need to be supported; precisely, a link set with K elements should be able to connect to a link set with L elements.
- clients with one interface connecting to servers with two
- vice versa
- rail situations
- a link set requires an aggregation descriptor:
- bandwidth aggregation behavior
- link level failover/failure recovery model
- Some of these are optional or for future versions.
- These descriptors need to go into /etc/modprobe.conf (see the sketch after this list)
- The MGS will be reached by passing multiple remote addresses describing a failover link set
- Aggregation is desirable for links on a single or on multiple LNETs
- Utilities like lctl ping <nid> can send packets to an individual nid of an interface and to an aggregated link set (a link set probably needs to be named with a nid)
- Clients will connect to servers by naming the server link set. This requirement is to allow clients outside a firewall to connect to a server behind a firewall where the server has non-reachable nids (like 192.168.1.*) which might have a different meaning near the client.
- Lustre will see multiple nids only for failover, i.e. no new connection behavior
- The nids in a link set will be made available through the LNET management node (probably the MGS) to allow dynamic server addition.
- Configuration will allow "real failover IP addresses" to be configured.
- Desirable implementation constraint: a linkset will be a nid on a Lustre LNET, and the routing mechanisms will be used to reach the nid and to implement aggregation behavior (LNET bandwidth sharing, failure handling, etc.).
- It shall be possible to specify Lustre configurations for simultaneous use of different linksets on the same server targets.
- If modified nids are used, they shall be big enough to contain both linkset modifiers and IPv6 addresses
- modprobe.conf shall remain a clusterwide file
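As a sketch only, such aggregation descriptors might take the following form in the clusterwide /etc/modprobe.conf, using the linkset syntax proposed under "linkgroup indicators in NIDs" below; the linkset names, aggregation parameters and interface names are purely illustrative, not a committed format:
linkset=eth-all{failover,noloadbalance}(eth0 eth1)
linkset=ib-bond{loadbalance}(ib0 ib1)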
CONFIGURATION MANAGEMENT
Lustre configuration adaptations
A Lustre configuration specification must be able to describe the linksets for each node that shall be used during Lustre setup.
The existing ip2nets configuration directive is extremely similar to what we need here.
options lnet 'ip2linkgroup="eth-oss-vib-mds 192.168.0.[2-20]@tcp0:ethall; eth-oss-vib-mds 132.6.1.[2,3]@vib0; vib-all *@vib0"'
The ethall suffix is a linkset nid modifier, as defined under "linkgroup indicators in NIDs" below.
Specifying this as a modprobe.conf parameter is very desirable, because every node would have linkgroup descriptors which it could use to establish routes to aggregated linksets.
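For comparison, the existing ip2nets parameter already lives in the same clusterwide /etc/modprobe.conf and uses a very similar pattern syntax to map address ranges onto networks; the addresses and interface names below are examples only:
options lnet 'ip2nets="tcp0(eth0) 192.168.0.[2-20]; vib0 132.6.1.*"'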
The mount command can be given a parameter:
mount -t lustre -o linkgroup=<name> <mgs-nids-seq>:fsname /mnt/pt
The MGS will map a linkgroup name to a linkset nid (using one of the two alternatives below) for each server, to be used by nodes connecting to it. These linkset nids will be in the configuration log and can be interpreted by LNET.
This allows, for example, the MDS to connect to OSS's over IB while clients connect to the OSS's over TCP.
mount -t lustre -o linkgroup=vib-all /dev/mds-dev /mnt/pt
mount -t lustre -o linkgroup=eth-oss-vib-mds <mgs-nids-seq>:fsname /mnt/pt
linkgroup indicators in NIDs
Define linksets in /etc/modprobe.conf without a requirement to define a unique nid, e.g.:
linkset=<linkset-name>[{<aggr params>}](<iface list>)
linkset=eth-all{failover,noloadbalance}(eth0 eth1)
Extend the syntax of a nid from
nid = <address>[@<network>]
to
nid = <address>[@<network>][:<linkset name>]
Now:
192.168.1.5@tcp0:eth0
192.168.1.5@tcp0:eth-all
become valid nids.
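Such extended nids could then be used wherever an ordinary nid is accepted, for example with the ping utility from the Use Cases above; depending on whether a single interface or a linkset name follows the colon, the packets would go to one interface or to the aggregated set (illustrative only, assuming this alternative is adopted):
lctl ping 192.168.1.5@tcp0:eth0
lctl ping 192.168.1.5@tcp0:eth-all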
ISSUES
- Suppose servers are added with a previously unspecified network. In this case the MGS needs to learn this at addition time; in particular, the MGS would have to reparse its own /etc/modprobe.conf file or get information from the new servers.
- This requirements discussion does not address the naming of nodes, which might be an additional useful requirement.