Architecture - Multiple Interfaces For LNET

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Use Cases
A node may have one or more link sets. A link set is a set of nids that are used in an aggregated fashion.


 * 1)  One-to-many, many-to-one, many-to-many and rail situations need to be supported. More precisely, link sets with K elements should be able to connect to link sets with L elements.
 * 2)  clients with one interface to servers with 2
 * 3)  vice versa
 * 4)  rail situations
 * 5) a link set requires an aggregation descriptor:
 * 6) bandwidth aggregation behavior
 * 7) link level failover/failure recovery model
 * 8)    Some of these are optional or for future versions.
 * 9)    These descriptors need to go into /etc/modprobe.conf
 * 10)  The MGS will be reached by passing multiple remote addresses that describe a failover link set.
 * 11)  Aggregation is desirable for links on a single LNET or on multiple LNETs.
 * 12)  Utilities like lctl ping can send packets to an individual nid of an interface and to an aggregated link set (a link set probably needs to be named with a nid).
 * 13)  Clients will connect to servers by naming the server link set. This requirement allows clients outside a firewall to connect to a server behind a firewall where the server has non-reachable nids (like 192.168.1.*), which might have a different meaning near the client.
 * 14)  Lustre will see multiple nids only for failover, i.e. there is no new connection behavior.
 * 15) The nids in a link set will be made available through the LNET management node (probably the MGS) to allow dynamic server addition.
 * 16)  Configuration will allow "real failover IP addresses" to be configured.
 * 17) Desirable implementation constraint: a linkset will be a nid on a Lustre LNET, and the routing mechanisms will be used to reach the nid and implement aggregation behavior (LNET bandwidth sharing, failure handling, etc.).
 * 18) It shall be possible to specify Lustre configurations for simultaneous use of different linksets on the same server targets.
 * 19) If modified nids are used, they shall be big enough to contain both linkset modifiers and IPv6 addresses.
 * 20) modprobe.conf shall remain a clusterwide file.

Lustre configuration adaptations
A Lustre configuration specification must be able to describe the linksets that each node shall use during Lustre setup.

The ip2net configuration directive is extremely similar to what we need here.

options lnet 'ip2linkgroup="eth-oss-vib-mds 192.168.0.[2-20]@tcp0:ethall; eth-oss-vib-mds 132.6.1.[2,3]@vib0; vib-all *@vib0"'

The ethall directive is a linkset nid modifier as defined in B below.
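To make the shape of the directive concrete, the following is a minimal Python sketch of how such a value could be split into rules and matched against a node's IP address. It is not LNET code. The rule grammar (a linkgroup name, then an address pattern, "@", a network and an optional ":" modifier), the treatment of "*" as a match-all pattern, and the function names parse_ip2linkgroup, ip_matches and linkgroups_for are all assumptions inferred from the example above.

```python
def parse_ip2linkgroup(value):
    """Split an ip2linkgroup value into (linkgroup, ip_pattern, net, modifier) rules.

    Assumed grammar per rule: "<linkgroup> <ip-pattern>@<net>[:<modifier>]",
    with rules separated by ';'. This grammar is inferred from the example only.
    """
    rules = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        group, target = entry.split()
        pattern, _, rest = target.partition("@")
        net, _, modifier = rest.partition(":")
        rules.append((group, pattern, net, modifier or None))
    return rules


def octet_matches(spec, octet):
    """Match one octet against '*', a literal, or a bracket list like [2-20] or [2,3]."""
    if spec == "*":
        return True
    if spec.startswith("[") and spec.endswith("]"):
        for part in spec[1:-1].split(","):
            lo, _, hi = part.partition("-")
            if int(lo) <= octet <= int(hi or lo):
                return True
        return False
    return int(spec) == octet


def ip_matches(pattern, ip):
    """Return True if a dotted IPv4 address matches the pattern (a bare '*' matches all)."""
    if pattern == "*":
        return True
    specs = pattern.split(".")
    octets = [int(o) for o in ip.split(".")]
    return len(specs) == len(octets) and all(
        octet_matches(s, o) for s, o in zip(specs, octets)
    )


def linkgroups_for(rules, ip):
    """List the (linkgroup, net, modifier) triples whose pattern matches the address."""
    return [(g, net, mod) for g, pat, net, mod in rules if ip_matches(pat, ip)]


VALUE = ("eth-oss-vib-mds 192.168.0.[2-20]@tcp0:ethall; "
         "eth-oss-vib-mds 132.6.1.[2,3]@vib0; vib-all *@vib0")
RULES = parse_ip2linkgroup(VALUE)
```

Under these assumptions, linkgroups_for(RULES, "192.168.0.5") reports membership in eth-oss-vib-mds on tcp0 (with the ethall modifier) and in vib-all on vib0. Whether a bare "*" should match addresses on every network is a design question this sketch does not settle.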

Specifying this as a modprobe.conf parameter is very desirable, because every node would have linkgroup descriptors which it could use to establish routes to aggregated linksets.

The mount command can take a linkgroup parameter:

mount -t lustre -o linkgroup= :fsname /mnt/pt

The MGS will map a linkgroup name to a linkset nid (using one of the two alternatives below) for each server, to be used by nodes connecting to that server. These linkset nids will be in the configuration log and can be interpreted by LNET.

This allows, for example, the MDS to connect to OSS's over IB while clients connect to the OSS's over TCP.

mount -t lustre -o linkgroup=vib-all /dev/mds-dev /mnt/pt
mount -t lustre -o linkgroup=eth-oss-vib-mds :fsname /mnt/pt

linkgroup indicators in NIDs
Define linksets in /etc/modprobe.conf without requiring a unique nid to be defined, e.g.:

linkset= [{ }]

linkset=eth-all{failover,noloadbalance}(eth0 eth1)
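As an illustration only, a linkset line of this shape could be parsed as follows. The sketch assumes the grammar is a name, an optional comma-separated option list in braces, and a whitespace-separated interface list in parentheses. That reading is inferred from the single example above, and the function name parse_linkset is hypothetical.

```python
import re

# Assumed grammar: linkset=<name>[{opt,opt,...}](iface iface ...)
LINKSET_RE = re.compile(
    r"^linkset=(?P<name>[\w-]+)"      # linkset name, e.g. eth-all
    r"(?:\{(?P<opts>[^}]*)\})?"       # optional {failover,noloadbalance}
    r"\((?P<ifaces>[^)]*)\)$"         # (eth0 eth1)
)


def parse_linkset(line):
    """Parse one linkset definition into its name, options and interfaces."""
    m = LINKSET_RE.match(line.strip())
    if m is None:
        raise ValueError("not a linkset definition: %r" % line)
    opts = [o.strip() for o in (m.group("opts") or "").split(",") if o.strip()]
    return {
        "name": m.group("name"),
        "options": opts,
        "interfaces": m.group("ifaces").split(),
    }
```

For the example line, this yields the name eth-all, the options failover and noloadbalance, and the interfaces eth0 and eth1.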

Extend the syntax of the nid from nid = <address>[@<network>] to

nid = <address>[@<network>][:<linkset>]

Now 192.168.1.5@tcp0:eth0 and 192.168.1.5@tcp0:eth-all become valid nids.
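A short Python sketch of splitting such an extended nid into its parts is given here, purely as an illustration. It assumes an IPv4-style address, an optional "@" network and an optional trailing ":" linkset modifier. The Nid type and the parse_nid function are hypothetical names. Because ":" also appears inside IPv6 addresses, this simple split would not satisfy the requirement above that modified nids accommodate IPv6, so a real syntax would need a different delimiter or quoting rule.

```python
from typing import NamedTuple, Optional


class Nid(NamedTuple):
    address: str
    network: Optional[str]
    linkset: Optional[str]


def parse_nid(text):
    """Split 'address[@network][:linkset]' into its parts (IPv4-style addresses only)."""
    head, at, tail = text.partition("@")
    if at:
        address = head
        network, colon, linkset = tail.partition(":")
        if not network:
            raise ValueError("empty network in nid: %r" % text)
    else:
        # No network given: the modifier, if any, follows the address directly.
        address, colon, linkset = head.partition(":")
        network = None
    if not address or (colon and not linkset):
        raise ValueError("malformed nid: %r" % text)
    return Nid(address, network, linkset or None)
```

With this sketch, parse_nid("192.168.1.5@tcp0:eth-all") separates the address 192.168.1.5, the network tcp0 and the linkset modifier eth-all, while a plain nid such as 192.168.1.5@tcp0 parses with no linkset.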

ISSUES

 * 1) Suppose servers are added with a previously unspecified network. In this case the MGS needs to learn this at addition time; in particular, the MGS would have to reparse its own /etc/modprobe.conf file or get the information from the new servers.
 * 2) This requirements discussion does not address the naming of nodes,  which might be an additional useful requirement.