FAQ - Networking
(Updated: Dec 2009)
Which interconnects and protocols are currently supported?
Today, Lustre supports TCP/IP (commonly over gigabit or 10-gigabit ethernet), OFED, Myrinet MX, and Cray's Seastar networks. Other older networking technologies were also supported, but are virtually unused today and will be dropped in future releases.
The currently supported version of each network type is listed at the beginning of the lnet/ChangeLog file and in each release announcement.
Can I use more than one interface of the same type on the same node?
Yes, with Lustre 1.4.6 and later.
Can I use two or more different interconnects on the same node?
Yes, with Lustre 1.4.x, subject to the particular limitations of the interconnect. For example, we are told that it is not possible to use both Elan 3 and Elan 4 in the same node at the same time.
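As a rough illustration of the two answers above, multiple networks on one node are typically declared through the "networks" option of the lnet kernel module. The sketch below only assembles such an option line; the helper name, the network names (tcp0, tcp1, o2ib0) and the device names (eth0, eth1, ib0) are assumptions chosen for the example, and the exact syntax should be checked against the manual for your release.

```python
def lnet_networks_option(interfaces):
    """Build an 'options lnet networks=...' line for a modprobe config file.

    interfaces: list of (network_name, device) pairs, e.g. ("tcp0", "eth0").
    This is a hypothetical helper for illustration, not part of Lustre.
    """
    if not interfaces:
        raise ValueError("at least one network is required")
    # Each entry has the form network(device), e.g. tcp0(eth0).
    entries = [f"{net}({dev})" for net, dev in interfaces]
    return 'options lnet networks="' + ",".join(entries) + '"'


# Two interfaces of the same type on one node (assumed device names).
same_type = lnet_networks_option([("tcp0", "eth0"), ("tcp1", "eth1")])

# Two different interconnects on one node (assumed device names).
mixed = lnet_networks_option([("tcp0", "eth0"), ("o2ib0", "ib0")])

print(same_type)
print(mixed)
```

Each interface gets its own network name, which is how the node distinguishes the two links even when they are of the same type.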
Can I use TCP offload cards?
Probably -- but we've tried many of these cards, and for various reasons we didn't see much improvement, if any. First, because Lustre runs entirely in the kernel, it uses kernel networking APIs which are often not supported (or at least not optimized) by the offload drivers.
Second, the problem isn't the overhead of checksum calculation or the need for interrupt coalescing; lots of commodity ethernet cards already support these features. The big overhead is memory copying and buffering, which these cards rarely do anything to address.
Does Lustre support crazy heterogeneous network topologies?
Yes, although the craziest of them are not yet fully supported.
Because Lustre supports native protocols on top of high speed cluster interconnects (in addition to TCP/IP), some special infrastructure is necessary.
Lustre uses its own implementation of the Portals message passing API, on top of which we have implemented gateway nodes that route between two native protocols. These are commodity nodes with, for example, both gigabit ethernet and InfiniBand interfaces. The gateway software translates the Portals packets between the interfaces to bridge the two networks.
These routers are in use today, and may become more popular as more enterprises connect multiple clusters with special interconnects to a single global Lustre file system. On the other hand, TCP/IP on GigE is the interconnect of choice for most organizations, and it requires no additional Portals routing.
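The gateway idea described above can be pictured with a small toy model. This is not Lustre's routing code; it is a minimal sketch that assumes node addresses written as address@network and a hypothetical per-node route table mapping a remote network to the gateway that reaches it. All addresses and names are made up for the example.

```python
def split_nid(nid):
    """Split an 'address@network' identifier into (address, network)."""
    addr, _, net = nid.partition("@")
    if not addr or not net:
        raise ValueError(f"malformed identifier: {nid!r}")
    return addr, net


def next_hop(local_nets, routes, dest_nid):
    """Return the identifier a message should be sent to first.

    local_nets: set of networks this node is directly attached to.
    routes:     dict mapping a remote network to a gateway identifier
                (hypothetical table for this sketch).
    """
    _, dest_net = split_nid(dest_nid)
    if dest_net in local_nets:
        # Same network: no gateway is involved.
        return dest_nid
    gateway = routes.get(dest_net)
    if gateway is None:
        raise LookupError(f"no route to network {dest_net!r}")
    _, gw_net = split_nid(gateway)
    if gw_net not in local_nets:
        raise LookupError(f"gateway {gateway!r} is not on a local network")
    return gateway


# A client attached only to an ethernet network (tcp0) that reaches an
# InfiniBand network (o2ib0) through a gateway with both interfaces.
client_nets = {"tcp0"}
client_routes = {"o2ib0": "192.168.0.1@tcp0"}

via_gateway = next_hop(client_nets, client_routes, "10.10.0.5@o2ib0")
direct = next_hop(client_nets, client_routes, "192.168.0.7@tcp0")
print(via_gateway)
print(direct)
```

In the all-ethernet case the route table is simply empty, which matches the point that a TCP/IP-only site needs no additional routing.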