Architecture - CTDB with Lustre

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Summary
CTDB with Lustre provides a failsafe solution for Windows pCIFS.

Implementation constraints

 * 1) Use CIFS for interconnection between CTDB/Samba and Windows clients.
 * 2) The pCIFS driver filters the Windows CIFS client, i.e. LanmanRedirector.

SHARING-VIOLATION issue
With CTDB, all Samba servers in a CTDB cluster share the same databases to manage session state, such as connections, file handles, and locks. So when pCIFS tries to open an OST file while that file is already open on the MDS server, Samba checks the shared oplock and share-modes databases and reports a SHARING-VIOLATION conflict.

We need the Samba servers on the OSTs to ignore share modes and oplocks and simply pass the OPEN request through to Lustre, which can handle this case with ease.
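A minimal sketch of what the per-share configuration might look like, assuming stock Samba share-level options; the share name and path are illustrative, and the actual solution may need a Samba patch or VFS module rather than configuration alone:

```ini
[lustre-ost]
    path = /mnt/lustre
    # Do not grant oplocks, so opens on the OST share cannot
    # conflict with oplocks recorded for the MDS share.
    oplocks = no
    level2 oplocks = no
    # Do not honour share modes on open; pass the OPEN through
    # and let Lustre's own locking arbitrate concurrent access.
    share modes = no
    locking = no
```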

pCIFS failover model
The pCIFS failover model depends on the failover support of Lustre and CTDB. Here are the main issues to be addressed during implementation:


 * 1) CTDB cannot dynamically add or remove a node to/from a working CTDB cluster. Tridge said this is planned for the future, but work won't start for the moment. Currently there is no way to add a new node to CTDB, and removing a node will trigger CTDB failover. We need to implement this functionality so that Heartbeat can renew the CTDB cluster while Lustre failover occurs.
 * 2) We must enhance srvmap to collect the Lustre servers' public IP addresses; pCIFS clients will access Lustre volumes through these IPs. Lustre itself cannot provide this information, since the cluster could be running on non-IP networks. Fortunately, Heartbeat can do the job instead, following a scheme we've prepared in advance. Another enhancement is socket communication with pCIFS clients, so that Lustre failover events, triggered by Heartbeat, can be sent to the pCIFS clients.
 * 3) CTDB selects an arbitrary node inside the CTDB cluster to substitute for a dead node, and requires that the two nodes be in the same subnet. We need to adapt CTDB to our election policy, which decides which node (inside or outside the CTDB cluster) takes over the dead node's IP; the top-priority candidate should be the standby node. We could also place nodes in different subnets so that they do not fail over to each other within a CTDB cluster. This ensures that MDS and OST servers won't take over each other's IPs, which would reintroduce the SHARING-VIOLATION issue.
 * 4) There is a timing issue between the two failover processes: CTDB confirms a node's death quickly, while Lustre is slow to do so.
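The election policy in issue 3 can be sketched as follows. This is a hypothetical illustration, not CTDB code: the node names, fields, and the idea of a `role` attribute are all assumptions layered on top of the policy described above (prefer the standby node; never let MDS and OST nodes take over each other's IPs).

```python
# Hypothetical sketch of the IP-takeover election policy: prefer the
# standby node, and never allow a node of a different role (MDS vs. OST)
# to take over the dead node's IP, to avoid the SHARING-VIOLATION issue.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    role: str          # "MDS" or "OST" (illustrative)
    is_standby: bool   # dedicated standby for this role
    alive: bool

def elect_takeover(dead: Node, candidates: list[Node]) -> Optional[Node]:
    """Pick which node should take over the dead node's public IP."""
    live = [n for n in candidates if n.alive and n.name != dead.name]
    # Top-priority candidate: a live standby node of the same role.
    standbys = [n for n in live if n.is_standby and n.role == dead.role]
    if standbys:
        return standbys[0]
    # Fall back to any live node of the same role; never cross roles,
    # even if that means no takeover happens at all.
    same_role = [n for n in live if n.role == dead.role]
    return same_role[0] if same_role else None
```

In this sketch, returning `None` rather than a node of the other role is the point of the policy: a cross-role takeover would let an MDS answer opens for OST files (or vice versa), recreating the sharing-violation conflict described earlier.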