WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Search results

From Obsolete Lustre Wiki

Page title matches

Page text matches

  • ...e consisting of a short prefix combined with regularly incremented decimal node numbers (e.g., n0001, n0002, etc.) works well with an automated tool like ' * '''Collect syslogs in one place.''' In addition to collecting logs on a per node basis, collecting syslogs in one location lets an administrator monitor a s
    2 KB (378 words) - 06:10, 22 February 2010
  • For information about recovering from a node or network failure, see the following:
    414 bytes (51 words) - 10:35, 20 January 2011
  • ...ly power off the failed node. Otherwise, there is a chance that the "dead" node could wake up, start using the disk at the same time, and cause massive cor ...and MDT level. Lustre failover handles the failure of an MDS or OSS node as a whole, which, in our experience, is not very common.
    4 KB (657 words) - 08:31, 22 February 2010
  • '''Can I use more than one interface of the same type on the same node?''' '''Can I use two or more different interconnects on the same node?'''
    2 KB (377 words) - 08:28, 22 February 2010
  • ...nitiator to request that file data is read from one location and stored in another. # '''3rd Party IO''' - A node requests through a Lustre client that data can be read/written through a 3
    1 KB (210 words) - 14:12, 22 January 2010
  • node, run: node, run:
    5 KB (758 words) - 11:53, 20 January 2011
  • ...iguous set of fids; all fids from a given sequence belong to a single specific node ; '''fld''': persistent sequence-to-node mapping
    3 KB (382 words) - 14:09, 22 January 2010
  • |<strong>network type</strong>||TCP/IP||<strong>MGS node</strong>||10.2.0.1@tcp0 |<strong>block device</strong>||/dev/sdb||<strong>OSS 1 node</strong>||oss1
    5 KB (684 words) - 11:02, 22 February 2010
  • ...the consistency of these share modes and oplocks, you should use a single node to export CIFS. '''What is the typical MDS node configuration?'''
    6 KB (956 words) - 08:26, 22 February 2010
  • A node may have one or more link sets - these are a set of nids that # The nids in a link set will be made available through the LNET management node (probably the MGS) to allow dynamic server addition.
    4 KB (668 words) - 14:18, 22 January 2010
  • ...he Reduce node uses the HTTP protocol to retrieve Map results from the Map node protocol. The HTTP protocol is not a good choice for large data transfers b
    2 KB (362 words) - 12:00, 22 February 2010
  • ...then OSTs. Unmounting a block device causes Lustre to be shut down on that node. :a. Unmount the clients. On each client node, run:
    10 KB (1,542 words) - 10:23, 20 January 2011
  • ...er than the product of the total number of nodes and maximum processes per node. ...cs_per_node'' is the maximum number of cores (CPUs) on a single Catamount node. Portals must know this value to properly clean up various queues. LNET is
    3 KB (438 words) - 08:54, 22 February 2010
  • ...number of Lustre targets and, in case of a failure, the active/non-failed node takes over the Lustre targets of the failed nodes and makes them available ...nodes are equipped with a service processor that allows a failed node to be shut down using IPMI. For other methods of fencing, refer to the Red Hat Cluster docum
    16 KB (2,207 words) - 09:13, 20 December 2010
  • The Lustre™ node file system ''ldiskfs'' (based on ext3/ext4) is limited to an 8 TB maximum ...ct any data corruption introduced into the network between the application node and the disk drive in the Lustre storage system.
    4 KB (617 words) - 11:26, 10 September 2010
  • ...an also cause problems by, for example, making it difficult to debug multi-node issues or correlate logs, which depend on timestamps. We recommend that you ...re (new in Lustre 1.8), which consumes memory as it caches data on the OSS node.
    9 KB (1,347 words) - 10:17, 20 January 2011
  • *For each client node, create a lustre_root principal and generate keytab. *Install the keytab on the client node.
    10 KB (1,660 words) - 09:26, 12 April 2013
  • '''[[FAQ - Sizing|Sizing]]''' - File system, file, I/O request, OSS, and node limitations.
    1 KB (189 words) - 11:16, 22 February 2010
  • ;'''Migration''': moving file data from one set of OSTs to another ...ts get sent to it, it does locking and optionally can redirect a client to another server where data access will happen without further locking
    11 KB (1,647 words) - 14:22, 22 January 2010
  • ...ster failure || failover support to avoid any failure of a Lustre or Samba node stopping the whole cluster. ...|| all resources (like opened handles or locks) grabbed by a dead client node should be gracefully released, or other clients will be forbidden to access
    16 KB (2,220 words) - 13:14, 2 February 2010
  • ...es containing millions of files, and we have several customers with 10,000-node clusters (or larger) and a single metadata server. '''What is the typical MDS node configuration?'''
    7 KB (1,169 words) - 08:27, 22 February 2010
  • ...ion. Source can be either a single client node (WBC case), a single server node (meta-data proxy), or a collection of server nodes (proxy cluster); ...file system state, that transfers file system from one consistent state to another consistent state. Typical example of an operation is a system call;
    11 KB (1,784 words) - 14:07, 22 January 2010
  • ...the address are cleared. Thus the configuration file is independent of any node and can be copied to all nodes. Local node ID (...)
    23 KB (3,679 words) - 09:59, 4 February 2011
  • The ''lctl'' command ''MUST'' be run on the MGS. Another requirement for managing OST pools is to either have the MDT and MGS on the same node or have a Lustre
    5 KB (904 words) - 11:21, 3 December 2010
  • The node specified is on ''o2ib'' network ''3'' using HCA ''ib3''.
    2 KB (267 words) - 11:57, 20 January 2011
  • * [[Recovering from a Node or Network Failure]]
    3 KB (359 words) - 10:53, 24 July 2013
  • An Object Storage Server (OSS) is a server node, running the Lustre software stack. It has one or more network interfaces a ..., Lustre conforms to the most reasonable interpretation of what the single-node POSIX requirements would mean in a clustered environment.
    9 KB (1,512 words) - 08:25, 22 February 2010
  • <node uuid='mdev10_UUID' name='mdev10'> |KERNEL||output of `uname -a` from the node on which lustre is being run
    8 KB (1,092 words) - 13:13, 16 December 2009
  • '''mkdir''' always creates the new directory on another MDS ...hip of resources varies among file systems. In local file systems, a single node owns all resources. No parallelism can be achieved with this. In traditiona
    20 KB (3,407 words) - 14:00, 22 January 2010
  • many CPUs are on each OSS node (1 thread / 128MB * num_cpus). If the load on the OSS node is high, new service threads will be started in order to process more
    9 KB (1,630 words) - 05:57, 30 April 2010
  • ...Lustre]</ins> Manage one or more Lustre filesystems from an administrative node.
    3 KB (503 words) - 11:18, 24 July 2013
  • ; '''dnode''' : DMU node, 512 bytes in original ZFS implementation and includes "bonus buffer" for u
    2 KB (382 words) - 14:27, 22 January 2010
  • ; '''management node''' : a Lustre client used to spawn and dispatch instructions to agents on o ...inked from the filesystem namespace. If the migration agent fails, or the node on which it is running fails, the objects will be destroyed by MDS-OST orph
    9 KB (1,467 words) - 14:23, 22 January 2010
  • ...ure - Simple Space Balance Migration|Object Migration]] from a set of OSTs to another pool of OSTs ...e configuration llog is parsed, nodes that require the pool information will build a hash table mapping pool names to lists of OSTs.
    3 KB (470 words) - 19:48, 2 February 2010
  • '''''Note:''''' There is no risk from an OSS/MDS node crashing, only if the DDN itself fails. ...(262144 4 KB blocks) as it can consume up to this amount of RAM on the OSS node per OST.
    7 KB (1,063 words) - 11:11, 22 February 2010
  • ...failover; a service is failed over, the software is updated on the stopped node, the service is failed back, and the failover partner is upgraded in the sa
    3 KB (500 words) - 08:32, 22 February 2010
  • '''What is a typical OSS node configuration?''' ...protocol, if a client requests a lock which conflicts with a lock held by another client, a message is sent to the lock holder asking for the lock to be drop
    7 KB (1,290 words) - 08:29, 22 February 2010
  • |Lustre Image||An image that can run and boot on a Lustre node. It is expected the Deployed LREs will include client systems configured w
    5 KB (723 words) - 14:15, 22 January 2010
  • node, which holds the root inode for a fileset. Clients will contact * ''mkdir'' always creates the new directory on another MDS.
    5 KB (760 words) - 11:36, 10 September 2010
  • ...VFS layer, implementing an API extension to make intent locking possible. Another substantial set of changes was made to ''ext3'' to make it more scalable a We have tested Lustre in the past on an Altix node, but we don't have regular access to this hardware and cannot test this on
    5 KB (916 words) - 12:06, 24 May 2010
  • ...n is handled. When grant is renewed. How multiple mount points on the same node are handled.
    4 KB (654 words) - 14:00, 22 January 2010
  • |'''Commit on sharing'''||before new or updated metadata is shared with a node that is not the updater or creator, the metadata is committed. ...nt is making a lot of updates. It starts computing, the server crashes, and another client wants to access these updates.
    5 KB (785 words) - 14:22, 22 January 2010
  • ...the liblustre client and the normal VFS client if you are also using that node for the liblustre client.
    5 KB (819 words) - 10:19, 22 February 2010
  • </pre> a single-node Lustre cluster
    5 KB (800 words) - 11:19, 22 February 2010
  • :Failover test for all pair-wise combinations of node failures. The default Lustre configuration is a single node setup with mdscount=1 and ostcount=2. All devices are loop back devices. YA
    21 KB (3,353 words) - 10:46, 10 March 2010
  • * Reduces recovery problems when multiple node failures occur The replicated system may be another Lustre file system or any other file system. The replica is an exact copy o
    6 KB (957 words) - 11:23, 20 January 2011
  • ...pgrade (and downgrade) is performed in a piecemeal fashion, node after node. ...esponse:'''|| OLD client unmounts, OLD.x release is installed on a cluster node. Client connects to the MDT, requesting OBD_CONNECT_FID, which is not grant
    34 KB (5,038 words) - 00:19, 4 October 2012
  • ...yed, you can briefly mount and unmount the ext3 filesystem directly on the node with Lustre stopped (''NOT'' via Lustre), using a command similar to:
    8 KB (1,444 words) - 12:50, 27 March 2010
  • ...quals the Lustre nid of the receiver. The nid can be an IP address or Elan node-id. || This field is set to the final destination Lustre work id: IP addr o ...g the packet when the packet is routed. || This field is set to the Lustre node-id from which the packet originates.
    36 KB (5,757 words) - 14:26, 22 January 2010
  • |'''Stimulus:'''|| Peer node crash / hang / reboot or network failure
    12 KB (1,843 words) - 13:57, 22 January 2010
  • |[[Media:Ols2003.pdf|'''Lustre: Building a cluster file system for 1,000 node clusters''']]||A technical presentation about successes and mistakes during
    11 KB (1,510 words) - 17:57, 18 December 2009
  • ...ructure; it handles locks between clients and servers and locks local to a node. Different kinds of locks are available with different properties. Also as ...rrupts and other notifiers from lower levels to Lustre. Liblustre includes another set of LNDs that are able to work from userspace.
    51 KB (8,203 words) - 06:28, 11 April 2010
  • ...tions'''|| How to handle the case of failure of Master lock server? Choose another master lock server via election? |colspan=2|'''Scenario:'''|| DAG node for read fails.
    26 KB (3,768 words) - 14:23, 22 January 2010
  • ...ses. This made it possible to incorrectly access the mballoc bitmap while another process was modifying it, causing a sanity assertion to fail. While no on- ...kage is installed on a Lustre node. When the filesystem is mounted, a local-node service tag will be created. See http://inventory.sun.com/ for more inform
    166 KB (24,668 words) - 06:38, 22 February 2010
  • '''Server authorization:''' The server performs another authorization check. The server assumes the identity and group membership o Lustre provides hooks for a client node to invoke the services of the
    52 KB (8,446 words) - 14:23, 22 January 2010
  • Another drawback of that is the need to drop the inode mutex on truncate before taking avoided. That fixed yet another deadlock between direct i/o reads: those who
    188 KB (28,583 words) - 05:09, 24 July 2013