Handling Full OSTs

Should this be a subtopic under an umbrella topic called "Managing the File System"?

Subtopics in Torben's paper include:

* Using Stripes

* Handling Full OSTs

* Migrating Data within a File System

(Do we need to change these headings to reuse this content?)

Sometimes the file system becomes unbalanced, often due to changed stripe settings. If an OST is full and an attempt is made to write more information to the file system, an error occurs.

The example below shows an unbalanced file system:

[root@LustreClient01 ~]# lfs df -h
UUID                 bytes    Used   Available  Use%  Mounted on
lustre-MDT0000_UUID   4.4G   214.5M    3.9G      4%   /mnt/lustre[MDT:0]
lustre-OST0000_UUID   2.0G   751.3M    1.1G     37%   /mnt/lustre[OST:0]
lustre-OST0001_UUID   2.0G   755.3M    1.1G     37%   /mnt/lustre[OST:1]
lustre-OST0002_UUID   2.0G     1.7G  155.1M     86%   /mnt/lustre[OST:2] <-
lustre-OST0003_UUID   2.0G   751.3M    1.1G     37%   /mnt/lustre[OST:3]
lustre-OST0004_UUID   2.0G   747.3M    1.1G     37%   /mnt/lustre[OST:4]
lustre-OST0005_UUID   2.0G   743.3M    1.1G     36%   /mnt/lustre[OST:5]

filesystem summary:  11.8G     5.4G    5.8G     45%   /mnt/lustre
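Imbalance in a listing like the one above can also be spotted programmatically. As a rough illustration (not a Lustre tool), a short Python function can scan `lfs df`-style output and flag any OST whose usage exceeds a threshold; the embedded sample text and the 80% cutoff are assumptions for this sketch:

```python
# Sketch: flag over-full OSTs in `lfs df -h`-style output.
# The sample text and the 80% threshold are illustrative assumptions.

DF_SAMPLE = """\
lustre-MDT0000_UUID  4.4G  214.5M   3.9G   4%  /mnt/lustre[MDT:0]
lustre-OST0000_UUID  2.0G  751.3M   1.1G  37%  /mnt/lustre[OST:0]
lustre-OST0002_UUID  2.0G    1.7G 155.1M  86%  /mnt/lustre[OST:2]
"""

def full_osts(df_output, threshold=80):
    """Return (uuid, use%) for each OST line at or above the threshold."""
    hits = []
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or "OST" not in fields[0]:
            continue  # skip headers, MDT lines, and the summary line
        use_pct = int(fields[4].rstrip("%"))
        if use_pct >= threshold:
            hits.append((fields[0], use_pct))
    return hits

print(full_osts(DF_SAMPLE))
```

Running this against the sample reports only OST0002, the OST marked with the arrow in the listing above.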

In this case, OST:2 is almost full and when an attempt is made to write additional information to the file system (even with uniform striping over all the OSTs), the write command fails as follows:

[root@LustreClient01 ~]# lfs setstripe /mnt/lustre 4M 0 -1
[root@LustreClient01 ~]# dd if=/dev/zero of=/mnt/lustre/test_3 bs=10M count=100
dd: writing `/mnt/lustre/test_3': No space left on device
98+0 records in
97+0 records out
1017192448 bytes (1.0 GB) copied, 23.2411 seconds, 43.8 MB/s

To enable continued use of the file system, the full OST must be taken offline or, more precisely, made read-only for new allocations using the lctl command. This is done on the MDS, since the MDS allocates space for writing.

1. Log in to the MDS server:

[root@LustreClient01 ~]# ssh root@192.168.0.10
root@192.168.0.10's password:
Last login: Wed Nov 26 13:35:12 2008 from 192.168.0.6

2. Use the lctl dl command to show the status of all file system components:

[root@mds ~]# lctl dl
  0 UP mgs MGS MGS 9
  1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
  5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
  8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
 10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
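The numbers in the first column of this listing are the device numbers that later `lctl --device N` calls refer to. As a hedged illustration (a Python sketch, not part of Lustre), the OSC device number for a given OST can be extracted from `lctl dl`-style text; the embedded sample lines are assumptions based on the listing above:

```python
# Sketch: map an OST name to its OSC device number in `lctl dl` output.
# The sample text is an illustrative assumption, not live command output.

DL_SAMPLE = """\
 5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
 6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
 7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
"""

def osc_device(dl_output, ost_name):
    """Return the device number of the OSC for ost_name, or None."""
    for line in dl_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "osc" and fields[3].startswith(ost_name):
            return int(fields[0])
    return None

# The device number found here is what would be passed to
# `lctl --device N deactivate` in the next step.
print(osc_device(DL_SAMPLE, "lustre-OST0002"))
```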

3. Use the lctl deactivate command to take the full OST offline:

[root@mds ~]# lctl --device 7 deactivate

4. Display the status of the file system components:

[root@mds ~]# lctl dl
  0 UP mgs MGS MGS 9
  1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
  5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  7 IN osc lustre-OST0002-osc lustre-mdtlov_UUID 5
  8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
 10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5

The device list shows that device 7, the OSC for OST0002, is now inactive (IN). If a new file is now written to the file system, the write succeeds because its stripes are allocated across the remaining active OSTs. Once space has been freed on the full OST, it can be returned to service with the corresponding lctl activate command.