WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.
Change Log 1.8
(Updated: July 2013)
Changes from v1.8.7 to v1.8.8-x1
Support for networks:
- socklnd - any kernel supported by Lustre,
- qswlnd - Qsnet kernel modules 5.20 and later,
- openiblnd - IbGold 1.8.2,
- o2iblnd - OFED 1.3, 1.4.1, 1.4.2, 1.5.1 and 1.5.2
- viblnd - Voltaire ibhost 3.4.5 and later,
- ciblnd - Topspin 3.2.0,
- iiblnd - Infiniserv 3.3 + PathBits patch,
- gmlnd - GM 2.1.22 and later,
- mxlnd - MX 1.2.10 or later,
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Support for kernels:
- 2.6.32-279.2.1.el6 (RHEL 6)
- 2.6.32-279.2.1.el6 (OEL 6)
Client support for unpatched kernels: (see http://wiki.lustre.org/index.php?title=Patchless_Client)
- 2.6.32-279.2.1.el6 (RHEL 6)
- 2.6.32-279.2.1.el6 (OEL 6)
Recommended e2fsprogs version:
- 1.42.6.x1-mrp.107-8
The async journal commit feature (bug 19128) is off by default
Severity : minor
Bugzilla : MRP-1086 debug CWARN removed
Severity : normal
Bugzilla : MRP-1053 use mutex for cl_loi_list_lock instead of spinlock
Description: Async page operations are not guaranteed not to block, so a spinlock is not appropriate for protecting the structures they access. This patch replaces the spinlock with a mutex.
Severity : normal
Bugzilla : MRP-1033 rpc.sh defect: LUSTRE is not set properly
Description: make do_nodes(), do_node() and rpc.sh set LUSTRE more accurately
Severity : minor
Bugzilla : MRP-1057 check lustre.conf for modprobe
Description: Add a check for /etc/modprobe.d/lustre.conf to get lnet module parameters during testing
Severity : minor
Bugzilla : MRP-1008 make lustre-iokit rpmbuildable
Severity : minor
Bugzilla : MRP-1007 update config files for rhel6
Severity : normal
Bugzilla : 24670 allow building with a wider range of OFED versions
Severity : normal
Bugzilla : 24668 fix broken sles10 build
Severity : normal
Bugzilla : 24668 fix for semaphore mess in ext4_ext_walk_space
Severity : normal
Bugzilla : 24554 noatime fix
Severity : normal
Bugzilla : 24554 noatime,nodiratime fix
Severity : normal
Bugzilla : 20128 Allow objects larger than 2TB in size
Severity : normal
Bugzilla : 24606 Misc changes
Description: - Remove unneeded patch file: ext4-store-tree-generation-at-find.patch
- Remove the hack for fsfilt_ext3_statfs()
- Use the correct spec file for rpmbuild
- Update the ChangeLog
Severity : normal
Bugzilla : 24606 Stop hacking around i_data_sem
Description: - Let ext4_ext_walk_space() itself handle the semaphore.
- Remove macro WALK_SPACE_HAS_DATA_SEM.
- Redefine macro fsfilt_up_truncate_sem().
Severity : normal
Bugzilla : 24606 ldiskfs changes for the new kernel
Description: Ldiskfs related changes for kernel 2.6.18-308.24.1.el5:
- Update related patches.
- Add Force over 24TB option.
- Add upstream patch to avoid loading bitmaps from full groups.
- Update the series file.
Severity : normal
Bugzilla : 24606 Update RHEL5 and OEL5 kernel patches
Description: The kernel is updated to 2.6.18-308.24.1.el5.
Details : Kernel related changes:
- Update some kernel patches to adapt to the new kernel.
- Remove unneeded kernel patch: md-avoid-corrupted-ldiskfs-after-rebuild.patch.
- Add a new upstream patch (soft RAID6 bug): make-bi_phys_segments-uint.patch.
- Update kernel configs, series, and targets, etc.
Severity : normal
Bugzilla : 24580 add OEL6 server support
Severity : normal
Bugzilla : 24580 quota fix
Description: specify QFMT_VFS_V1 if available
Severity : normal
Bugzilla : 24580 define ext4_mb_discard_inode_preallocations for rhel5
Severity : normal
Bugzilla : 24580 disable dump_trace for rhel6
Severity : normal
Bugzilla : 24580 use inode version in rhel6 server
Severity : normal
Bugzilla : 24580 update ldiskfs patches
Severity : normal
Bugzilla : 24580 ldiskfs for 2.6.32-279
Severity : normal
Bugzilla : 24580 update to 2.6.32-279
Severity : normal
Bugzilla : 24580 long long s_mount_opt for rhel6
Severity : normal
Bugzilla : 24580 deadlock fix
Severity : normal
Bugzilla : 24580 minor conflict resolving
Severity : normal
Bugzilla : 24580 RHEL6 server support
Description: Add RHEL6 server (kernel version is 2.6.32-279.2.1.el6) support. This introduces many changes and new features of ldiskfs (ext4) such as mmp, large EA, fs data in dirent, open file by inode number, etc.
NOTE: This patch only suffices for mounting; further tuning is needed for other file operations, which will be addressed in later patches.
Severity : normal
Bugzilla : 19526 conf-sanity test_46a fix
Description: LU-743 conf-sanity: test_46a failure
Details : The failure occurs because the client has not yet seen the newly added OSTs, so decoding the lsm fails: the number of OSTs exceeds the target count on the client side.
Severity : normal
Bugzilla : 24645 build kernel debuginfo rpm for sles11sp1
Description: In order to build a debuginfo rpm for SLES11 SP1, we need to modify the SLES11 kernel spec file in the following way:
- explicitly declare __debug_package as true(1).
- use debugfiles.list as the %files content instead of the default file in spec.
- change the file attributes.
- ignore some missing/unpackaged files while doing rpmbuild.
Also, we need to increase the BUILD_GEN in order to avoid future RPM reuse of the testing builds.
Severity : normal
Bugzilla : 24596 skip metabench for rhel 6.2 nfs client
Description: rhel 6.2 nfs client bug
Details : https://bugzilla.redhat.com/show_bug.cgi?id=790729
Severity : normal
Bugzilla : 24515 test_7 activate osc failed
Description: take into account the possible race between activation from lctl and activation from pinger thread
Severity : normal
Bugzilla : 24580 RHEL6 support in b1_8 branch
Description: RHEL6.2 support along with build code refactor.
Details : This patch is largely based on the patches in the following bugs:
22375 RHEL6 patchless client support.
24089 Avoid reuse cache storage collisions.
24090 Distro and target autodetection.
24091 Find_linux_rpms utility.
24092 Build src.rpm for lustre if requested.
24300 Don't run autogen.sh in the spl and zfs repos.
LU-62 Adds support to build RHEL6 patchless client.
LU-73 Re-org of rhel* build code to max code reuse.
LU-402 Check if dump_trace wants address argument
LU-1116 Update RHEL6.2 kernel to 2.6.32-220.7.1.el6.
For more information, please refer to the individual bug.
Severity : normal
Bugzilla : 22065 ko2iblnd failover deadlock fix
Severity : normal
Bugzilla : 20288 IB bonding & fix kiblnd_check_conns deadlock
Bugzilla : 20153 IB bonding & fix kiblnd_check_conns deadlock
Description: Combined patch for IB bonding issues of Bug 20288 (att 25001) and Bug 20153 (att 26145) from Atul.
Severity : normal
Bugzilla : LU-278 build: Only warn for tag/version mismatch
Description: The configure process should NOT abort just because the most recent tag is not of the form that upstream uses to tag Lustre. Downstream developers may use their own tags, or just add extensions to upstream's version tags.
Severity : normal
Bugzilla : 24458 files sometimes show up as zero size or missing
Description: LU-274 Update LVB from disk when the glimpse callback returns an error
Details : Client ll_glimpse_callback() can fail to get the inode if the inode has already been cleared, in which case the glimpse callback fails with -ELDLM_NO_LOCK_DATA. The server should then update the LVB from disk (in filter_intent_policy()) when it receives such an error from the client.
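A minimal sketch of the server-side decision described above, assuming a simplified LVB that carries only the file size. The constant SKETCH_NO_LOCK_DATA and the function glimpse_result_size() are hypothetical stand-ins, not Lustre code. How the real filter_intent_policy() treats other errors is not described in this entry, so the sketch simply keeps the cached value in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ELDLM_NO_LOCK_DATA; the real value is not assumed. */
#define SKETCH_NO_LOCK_DATA 1000

/* Minimal stand-in for the lock value block: only the file size. */
struct lvb_sketch {
    unsigned long long size;
};

/* Pick the size to report after a glimpse to the lock holder.
 * rc is the glimpse result, reply is the client's LVB (used only when rc == 0),
 * cached is the server's current LVB, disk_size is what the object holds on disk. */
static unsigned long long glimpse_result_size(int rc,
                                              const struct lvb_sketch *reply,
                                              const struct lvb_sketch *cached,
                                              unsigned long long disk_size)
{
    if (rc == 0) {
        assert(reply != NULL);
        return reply->size;          /* client answered with fresh data */
    }
    if (rc == -SKETCH_NO_LOCK_DATA)
        return disk_size;            /* client's inode already gone: refresh from disk */
    assert(cached != NULL);
    return cached->size;             /* other errors: this sketch keeps the cached LVB */
}
```

Without the disk refresh, a stale cached LVB (for example one still reporting size 0) would be returned, which matches the "zero size" symptom in the bug title.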
Severity : normal
Bugzilla : 22281 This patch combines patches from bug 22281
Description: This patch combines all the patches from bug 22281.
Details : It mainly deals with the build subsystem:
- add config opts like --downstream-release, --enable-dist, etc.
- add BUILDID support.
- build lustre with an external ldiskfs package.
Check bug 22281 for details.
Severity : normal
Bugzilla : 24450 new test: check bast timeout serialization
Severity : normal
Bugzilla : 24646 fix a bug for raid6 driver from upstream
Description: For more info, refer to this link: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=581392
Severity : normal
Bugzilla : 20997 skip peer health check for non-routers
Description: this is the patch from LU-630
Severity : normal
Bugzilla : 24636 compile fix for sles11 when jbd debug is turned on
Severity : normal
Bugzilla : 24376 do not shrink busy pages
Description: llap_shrink_cache_internal() used to avoid shrinking only dirty pages and pages being written. This patch also makes it avoid shrinking pages that are in use.
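The shrink rule can be sketched as a loop that skips every page it must not drop. The struct cpage type and shrink_cache() are hypothetical; the sketch only illustrates the policy (skip dirty, under-writeback and in-use pages), not how llite actually tracks page state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached page's state. */
struct cpage {
    bool dirty;       /* modified, not yet written */
    bool writeback;   /* write in progress */
    int  users;       /* extra references held by someone else */
    bool cached;      /* still in the cache */
};

/* Drop up to 'want' pages; returns how many were dropped. */
static size_t shrink_cache(struct cpage *pages, size_t n, size_t want)
{
    size_t freed = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n && freed < want; i++) {
        struct cpage *p = &pages[i];

        if (!p->cached || p->dirty || p->writeback)
            continue;           /* skipped before the change as well */
        if (p->users > 0)
            continue;           /* the new rule: page is in use */
        p->cached = false;
        freed++;
    }
    return freed;
}
```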
Severity : normal
Bugzilla : 23206 osc_precreate, osc_create: check OSCC_FLAG_NOSPC after checking for preallocated objects
Severity : normal
Bugzilla : 24531 replace generic_write_sync with ll_write_sync
Description: generic_write_sync() takes the inode mutex, which leads to a deadlock because the mutex is now already taken in ll_file_aio_write/ll_file_writev.
Details : replace generic_write_sync() with ll_write_sync(), which skips taking i_mutex
Severity : normal
Bugzilla : 24419 ldlm_pools_shrink algorithm change
Description: -shrink namespaces in batches of 64 namespaces; the batch is implemented as a list
-stop shrinking once the required number of elements is freed
-make ldlm_pools_recalc operate on namespaces similarly to ldlm_pools_shrink
-use global counters of unused locks on clients and granted locks on servers to avoid iterating over namespaces
-port b=21519&LU-499, a race between shrink or recalc and namespace_free
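A rough sketch of the batched shrink with early exit and a global counter, assuming a flat array of namespaces. ns_sketch, global_unused, SHRINK_BATCH and pools_shrink() are hypothetical names. In the real code the batch is a list built under a lock; here the batch boundary only marks where such a lock could be dropped. Treating a zero scan count as a pure query follows the Linux shrinker convention of that era, which is an assumption about how this code is called.

```c
#include <assert.h>
#include <stddef.h>

#define SHRINK_BATCH 64          /* mirrors "batches of 64 namespaces" */

struct ns_sketch {
    unsigned long unused;        /* unused locks cached in this namespace */
};

static unsigned long global_unused;  /* hypothetical global counter of unused locks */

/* Free up to 'want' locks from one namespace; returns the number freed. */
static unsigned long ns_shrink(struct ns_sketch *ns, unsigned long want)
{
    unsigned long n = ns->unused < want ? ns->unused : want;

    ns->unused -= n;
    global_unused -= n;
    return n;
}

/* Shrink batch by batch until nr_to_scan locks are freed.
 * nr_to_scan == 0 is a query answered from the global counter,
 * so no namespace has to be walked. */
static unsigned long pools_shrink(struct ns_sketch *ns, size_t count,
                                  unsigned long nr_to_scan)
{
    unsigned long freed = 0;

    if (nr_to_scan == 0)
        return global_unused;

    for (size_t start = 0; start < count && freed < nr_to_scan;
         start += SHRINK_BATCH) {
        size_t end = start + SHRINK_BATCH < count ? start + SHRINK_BATCH : count;

        /* A real implementation would collect this batch under a lock,
         * release the lock, then shrink the batch. */
        for (size_t i = start; i < end && freed < nr_to_scan; i++)
            freed += ns_shrink(&ns[i], nr_to_scan - freed);
    }
    assert(freed <= nr_to_scan);
    return freed;
}
```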
Severity : normal
Bugzilla : 24531 vfs locking simplification and lockless i/o for direct i/o
Description: ll_file_write used to lock in the following order:
"lli_write_sem; ldlm extent lock; inode mutex (taken in generic_file_write)".
Direct I/O read, on the other hand, used the opposite order: "inode mutex; ldlm extent lock on
server". That led to a deadlock.
Another drawback was the need to drop the inode mutex on truncate before taking the ldlm extent lock.
This patch fixes the problem by simplifying the locking: it uses a version of the generic_file_write routine that does not take the inode mutex, so write now locks in the order "inode mutex; ldlm extent lock". That makes lli_write_sem in write and the mutex re-lock in truncate unnecessary.
DIO read takes the inode mutex as before.
One more fix ensures that fast lock matching is avoided for DIO reads. That fixes yet another
deadlock between direct I/O reads: those that got a fast lock locked in the order "ldlm lock; inode mutex",
while those that ran lockless reads locked in the opposite order: "inode mutex; ldlm lock on server".
Details : The below summarizes read, write, truncate locking rules:
read: trunc sem, ldlm
write: mutex, ldlm
read direct: mutex, server ldlm
write direct: mutex, server ldlm
truncate: mutex, trunc sem, ldlm
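One way to check rules like these is to give every lock a rank and require each path to take locks in strictly increasing rank, which rules out ABBA deadlocks between paths. The ranks below are an assumption derived from the rules above (server ldlm is folded into the ldlm rank, and lli_write_sem is kept only to model the old write path); follows_order() is a hypothetical helper, not part of Lustre.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lower ranks must be acquired first. */
enum lock_rank {
    L_WRITE_SEM   = 0,   /* lli_write_sem, old write path only */
    L_INODE_MUTEX = 1,
    L_TRUNC_SEM   = 2,
    L_LDLM        = 3    /* client or server ldlm extent lock */
};

/* True if the acquisition sequence respects the global order. */
static bool follows_order(const enum lock_rank *seq, size_t n)
{
    assert(seq != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (seq[i] <= seq[i - 1])
            return false;    /* lower-ranked lock taken later: possible inversion */
    return true;
}
```

With these ranks every path listed above passes, while the old write order "lli_write_sem; ldlm extent lock; inode mutex" fails because the inode mutex is taken after the ldlm lock.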
Severity : normal
Bugzilla : 24592 ENOSUPP migratepage
Description: the rhel6 kernel has a "memory compaction" feature which appears to be slightly inaccurate: it does not set page->private to 0 for pages allocated for migration.
Details : Detect kernels with that feature and add a migratepage address space operation that returns ENOSUPP as a workaround for the problem
Severity : normal
Bugzilla : 23206 handle_async_create(): do not return ENOSPC if there are preallocated objects
Severity : normal
Bugzilla : 24628 OEL6 support in 1.8 branch
Description: Add OEL6 support in b1_8 branch. Kernel version is 2.6.32-279.2.1.el6.
Severity : normal
Bugzilla : 24580 RHEL6 support in b1_8 branch
Description: Update RHEL6 patchless client kernel to 2.6.32-279.2.1.el6.
Severity : normal
Bugzilla : 23206 return 0 if precreation succeeded even partially
Severity : normal
Bugzilla : 20569 count bad lines correctly
Description: -make parse_buffer() count lines with bogus headers correctly
-simplify end-of-line detection in parse_buffer()
Severity : normal
Bugzilla : 20569 test_170 fix
Description: use perl instead of sed to process binary files properly; verify that bad and good files differ; minor cleanup
Changes from v1.8.6 to v1.8.7
Support for networks:
- socklnd - any kernel supported by Lustre,
- qswlnd - Qsnet kernel modules 5.20 and later,
- openiblnd - IbGold 1.8.2,
- o2iblnd - OFED 1.3, 1.4.1, 1.4.2, 1.5.1 and 1.5.2
- viblnd - Voltaire ibhost 3.4.5 and later,
- ciblnd - Topspin 3.2.0,
- iiblnd - Infiniserv 3.3 + PathBits patch,
- gmlnd - GM 2.1.22 and later,
- mxlnd - MX 1.2.10 or later,
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Server support for kernels:
- 2.6.16.60-0.69.1 (SLES 10),
- 2.6.32.19-0.2.1 (SLES11),
- 2.6.18-194.17.1.el5 (RHEL 5)
- 2.6.18-194.17.1.0.1.el5 (OEL 5)
Client support for unpatched kernels: see "Patchless Client"
2.6.16 - 2.6.32 vanilla (kernel.org)
Recommended e2fsprogs version:
- 1.41.12.2-ora1
The async journal commit feature (bug 19128) and the cancel lock before replay feature (bug 16774) are disabled by default.
- Bugzilla: 24548
Severity: normal
Description: regression test: make sure that data written concurrently do not get discarded on file close
Details: write_disjoint.c modifications:
- several new options
- minor cleanup (rank=0: open the file once; close the file at the end; add usage ())
- new parallel-scale write_disjoint2 () regression test
- new mpi_run() --quiet option to skip lfs df
- Bugzilla: 24450
Severity: normal
Description: update the comment on top of ptlrpc_check_set()
Details: ptlrpc_check_set() returns result of set_condition hook if it is defined
- Bugzilla: 24450
Severity: normal
Description: ldlm_run_bl_ast_work: use ptlrpc_set_wait() with condition
Details: ldlm_run_bl_ast_work() sends ASTs in sets of PARALLEL_AST_LIMIT requests, waits for the whole set to complete, then sends another set of requests and waits again. If at least one request per set times out, the timeouts are serialized. This patch changes ldlm_run_bl_ast_work() so that, having sent a request set, it waits for any of the sent requests to complete and refills the running set with requests that are yet to be sent. When the number of timing-out requests is smaller than PARALLEL_AST_LIMIT, this should eliminate timeout serialization. This patch uses the ability to specify a wait condition for ptlrpc_set_wait() (proposed in https://bugzilla.lustre.org/attachment.cgi?id=33099)
- Bugzilla: 24450
Severity: normal
Description: ptlrpc_set_wait flexibility
Details: ptlrpc_set_wait() waits until all requests in a set complete. This patch makes it possible to specify a condition on which ptlrpc_set_wait() will wait instead of the default condition "no remaining requests". With that, it will be possible to add requests to a set as earlier ones complete, without waiting for all requests to finish.
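The effect of refilling the request set can be seen in a small timing model, assuming each request has a fixed duration and a timed-out request simply takes longer. batch_time() models the old behaviour (wait for the whole set before sending more) and window_time() models the new one (start the next request as soon as any slot frees). Both are hypothetical and ignore real RPC details.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 64

/* Old behaviour: send 'limit' requests, wait for all of them, repeat. */
static unsigned batch_time(const unsigned *dur, size_t n, size_t limit)
{
    unsigned total = 0;

    assert(limit > 0);
    for (size_t i = 0; i < n; i += limit) {
        unsigned longest = 0;

        for (size_t j = i; j < n && j < i + limit; j++)
            if (dur[j] > longest)
                longest = dur[j];
        total += longest;   /* the slowest request gates the whole set */
    }
    return total;
}

/* New behaviour: whenever an in-flight request completes, start the next one. */
static unsigned window_time(const unsigned *dur, size_t n, size_t limit)
{
    unsigned finish[MAX_SLOTS] = { 0 };   /* completion time of each slot */
    unsigned end = 0;

    assert(limit > 0 && limit <= MAX_SLOTS);
    for (size_t i = 0; i < n; i++) {
        size_t s = 0;

        /* the next request goes to the slot that frees up first */
        for (size_t k = 1; k < limit; k++)
            if (finish[k] < finish[s])
                s = k;
        finish[s] += dur[i];
        if (finish[s] > end)
            end = finish[s];
    }
    return end;
}
```

For twelve requests with a limit of 4, where every fourth request takes 10 time units and the rest take 1, batch_time() returns 30 while window_time() returns 12: the three slow requests overlap instead of adding up.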
- Bugzilla: 22936
Severity: normal
Description: remove wrong assertion
Details: The assertion underestimates the exp_refcount of obd_export. exp_refcount is incremented when a lock is added to the export's hash table, and with a decent amount of RAM there can be millions of locks in memory. A similar problem is reported in 23265, 17924, 24376
- Bugzilla: 22221
Severity: normal
Description: use read-write semaphore for lov_lock
Details: After obd_getref() was added to lov_prep_async_page(), read performance degraded. lov_getref() uses mutex_down(), so concurrent reads apparently got stuck on that mutex. This fix replaces the mutex with an r/w semaphore so that reads do not block on it, which restored the performance.
- Bugzilla: 23978
Severity: normal
Description: avoid unnecessary dentry rehashing (v2)
Details: In the patchless case, the sequence __d_drop(); d_rehash_cond() creates a race window in which a dentry incorrectly looks unhashed when it is not. If the dentry is not unhashed, rehashing can apparently be avoided.
- Bugzilla: 17764
Severity: normal
Description: accessing files via nfs test
Details: -- add nfsserver MOUNT2 cleanup
- Bugzilla: 22060
Severity: normal
Description: use interval tree to calculate kms
Details: with an interval tree of locked extents, iterating over the granted list can be avoided, which should save CPU time when the granted lock list is long
- Bugzilla: 17764
Severity: normal
Description: correct assertion
Details: an orphan inode can be reached in mds_open when opening by fid, which happens when files are accessed via nfs; correct the assertion accordingly
- Bugzilla: 17764
Severity: normal
Description: accessing files via nfs test
Details: -- new nfsread_orphan_file test -- rmultiop_start(), rmultiop_stop() modification: add the possibility to run several multiop_bg instances on a remote node
- Bugzilla: 21937
Severity: normal
Description: never resend glimpse ASTs
Details: when a connection to a client fails, the glimpse AST is resent endlessly because the request does not have the rq_noresend flag set. Set the flag to avoid resends.
- Bugzilla: 21812
Severity: normal
Description: generate warnings in case of discarding dirty pages
Details: When a client is evicted, dirty pages may get silently discarded. The caller of a successful write(2) will not know that the written data was discarded due to eviction before it could be flushed to the OSS. With this patch, the system administrator is warned about dirty page discards.
- Bugzilla: 23858
Severity: normal
Description: do not compare unsigned < 0
Details: this is also supposed to catch overflow of lqs_bwrite_pending
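A minimal illustration of why "x < 0" is meaningless for unsigned values and how the checks can be written instead. The helper names are hypothetical; only the field name lqs_bwrite_pending comes from the entry above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if taking 'count' from 'pending' would go below zero.
 * Writing "pending - count < 0" never fires for unsigned types,
 * because the subtraction wraps around instead of going negative. */
static bool sub_would_underflow(unsigned long pending, unsigned long count)
{
    return count > pending;
}

/* True if adding 'count' to 'pending' would wrap past ULONG_MAX.
 * Unsigned addition is defined to wrap, so a wrapped sum is smaller than 'pending'. */
static bool add_would_overflow(unsigned long pending, unsigned long count)
{
    return pending + count < pending;
}
```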
- Bugzilla: 24423
Severity: normal
Description: ext3_dx_find_entry: check directory entry consistency before ext3_match
Details: to avoid getting into infinite loop when directory block contains wrong data
- Bugzilla: 24141
Severity: normal
Description: llite: -EIO instead of LBUG for multi-referenced object
Details: Whenever an inode is used with a DLM lock, the client checks that no other inode references the same OST object, since that is a sign of filesystem corruption on the MDS (or of some other code bug that behaves this way). If the client detected the same OST object referenced from multiple inodes at the same time, it would LASSERT() and print a message to this effect, rather than continue to corrupt the data files: "osc_set_data_with_check() ASSERTION(old_inode->i_state & I_FREEING) failed: Found existing inode ffff880587d15d10/222311317/67781718 state 0 in lock: setting data to ffff88046b7f8d50/223489633/67781099". Instead of LASSERTing on this condition, return EIO for this file. This allows the problem to be analyzed and fixed without rebooting the client node.
- Bugzilla: 24264
Severity: normal
Description: Avoid corrupted ldiskfs after MD rebuild on RHEL5/CentOS5.
- Bugzilla: 24546
Severity: normal
Description: limit bio size to BIO_MAX_PAGES
Details: this is needed because bio_alloc_bioset()->bvec_alloc_bs() refuses to allocate bigger bios
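The limit amounts to splitting a large I/O into several bios. The sketch below only computes the split. It assumes BIO_MAX_PAGES was 256 in the kernels of this era and uses a local constant rather than the kernel header; bios_needed() and bio_pages() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value of BIO_MAX_PAGES for kernels of this era; a local copy, not the kernel macro. */
#define SKETCH_BIO_MAX_PAGES 256

/* Number of bios needed to carry npages pages when one bio holds at most max_pages. */
static size_t bios_needed(size_t npages, size_t max_pages)
{
    assert(max_pages > 0);
    return (npages + max_pages - 1) / max_pages;
}

/* Number of pages carried by bio number idx (0-based) in that split. */
static size_t bio_pages(size_t npages, size_t max_pages, size_t idx)
{
    size_t start = idx * max_pages;
    size_t left = npages > start ? npages - start : 0;

    assert(max_pages > 0);
    return left < max_pages ? left : max_pages;
}
```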
- Bugzilla: 19944
Severity: normal
Description: set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node (LU-196)
Details: The current set_default_debug_nodes() could not pass the values of $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE to the remote nodes while they are specified from the command line on the local node. This patch is to fix this issue.
- Bugzilla: 24437
Severity: normal
Description: fix deadlock caused by original fix b=24525 (LU-146)
Details: Get open lock inside mds_get_parent_child_locked() to avoid deadlock. Never get open lock if child is newly created to avoid deadlock.
- Bugzilla: 24548
Severity: normal
Description: fix v1
Details: a lock being canceled may cover data that is still being sent to OSTs. Change the find_cbdata iterator to take that into account
- Bugzilla: 24303
Severity: normal
Description: kernel BUG at fs/inode.c:323!
Details: workaround patch to avoid the race at truncate_inode_pages_range()
- Bugzilla: 24508
Severity: normal
Description: racer: general protection fault (LU-286)
- Bugzilla: 23485
Severity: normal
Description: fsync for directories
- Bugzilla: 23884
Severity: normal
Description: allow lnet to talk to gnilnd
- Bugzilla: 24490
Severity: normal
Description: obdfilter-survey cleanup
- Bugzilla: 24050
Severity: normal
Description: add an -s option to set an alternative order of service start
Details: -s start services in the order MGS->OST(s)->MDT(s). The default order is MGS->MDT(s)->OST(s).
- Bugzilla: 22638
Severity: normal
Description: add lst stat --count
- Bugzilla: 21103
Severity: normal
Description: ORNL LCE Router features/fixes
Details: Only squawk when md->start is NULL on non-zero length v2
- Bugzilla: 24512
Severity: normal
Description: lfs find -s doesn't seem to work quite with >2GB args
Details: fix the wrong size type in find_value_cmp()
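A size argument above 2GB overflows a 32-bit type. The sketch below parses sizes into a 64-bit unsigned value and compares them in that type. parse_size() and size_cmp() are hypothetical, and the suffix set is an assumption rather than the exact lfs syntax.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<number>[k|K|M|G|T]" into a byte count held in 64 bits.
 * Returns 0 on success, -1 on malformed input or overflow. */
static int parse_size(const char *s, unsigned long long *bytes)
{
    char *end;
    unsigned long long v;
    unsigned shift = 0;

    assert(s != NULL && bytes != NULL);
    v = strtoull(s, &end, 10);
    if (end == s)
        return -1;                      /* no digits */
    switch (*end) {
    case '\0':          break;
    case 'k': case 'K': shift = 10; end++; break;
    case 'M':           shift = 20; end++; break;
    case 'G':           shift = 30; end++; break;
    case 'T':           shift = 40; end++; break;
    default:            return -1;
    }
    if (*end != '\0')
        return -1;                      /* trailing junk */
    if (shift != 0 && v > (ULLONG_MAX >> shift))
        return -1;                      /* would not fit even in 64 bits */
    *bytes = v << shift;
    return 0;
}

/* Compare a file size against the limit in 64-bit arithmetic:
 * -1 if smaller, 0 if equal, 1 if larger. */
static int size_cmp(unsigned long long file_size, unsigned long long limit)
{
    return file_size < limit ? -1 : file_size > limit;
}
```

Storing the same value in a 32-bit type silently drops the high bits: "5G" becomes 1073741824 when truncated to uint32_t, so a comparison done in that type gives the wrong answer.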
- Bugzilla: 22221
Severity: normal
Description: client nodes crash on fs with inactive OST
Details: take lov reference in lov_prep_async_page()
- Bugzilla: 20831
Severity: normal
Description: replay-dual: ldlm_lock.c:1622:ldlm_lock_cancel()) LBUG type: PLN
Details: fix a race between do_requeue and client_disconnect_export
- Bugzilla: 24032
Severity: normal
Description: add lctl push
- Bugzilla: 18750
Severity: normal
Description: remove OBD_CHECK_FAIL_CHECK_ONCE
- Bugzilla: 24464
Severity: normal
Description: Load Lustre modules before mounting targets to avoid race conditions.
- Bugzilla: 24498
Severity: normal
Description: wait_osc_import_state () fixes
Details: -- increase maxtime to cover the timeout of the first request, taking the at_min value into account; -- clean up wait_osc_import_state () to use _wait_import_state (); -- ost-pools test_1 fix: use a local variable instead of the global NAME
- Bugzilla: 24504
Severity: normal
Description: sanity test_133* and check_stats() fix
- Bugzilla: 24487
Severity: normal
Description: canonicalize the devices names
- Bugzilla: 21047
Severity: normal
Description: ->commit should always be called after successful ->prep on b1_8
Changes from v1.8.5 to v1.8.6
Support for networks:
- socklnd - any kernel supported by Lustre,
- qswlnd - Qsnet kernel modules 5.20 and later,
- openiblnd - IbGold 1.8.2,
- o2iblnd - OFED 1.3, 1.4.1, 1.4.2, 1.5.1 and 1.5.2
- viblnd - Voltaire ibhost 3.4.5 and later,
- ciblnd - Topspin 3.2.0,
- iiblnd - Infiniserv 3.3 + PathBits patch,
- gmlnd - GM 2.1.22 and later,
- mxlnd - MX 1.2.10 or later,
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Server support for kernels:
- 2.6.16.60-0.42.8 (SLES 10),
- 2.6.27.39-0.3.1 (SLES11),
- 2.6.18-194.3.1.el5 (RHEL 5)
- 2.6.18-194.3.1.0.1.el5 (OEL 5)
Client support for unpatched kernels: see "Patchless Client"
2.6.16 - 2.6.30 vanilla (kernel.org)
Recommended e2fsprogs version:
- 1.41.12.2-ora1
The async journal commit feature (bug 19128) and the cancel lock before replay feature (bug 16774) are disabled by default.
- Bugzilla: 19064
Severity: normal
Description: Allow OSTs to be created with no primary node (LU-57)
Details: Add a --servicenode parameter for mkfs.lustre to treat all service nodes equally.
- Bugzilla: 23935
Severity: normal
Description: append truncate race
- Bugzilla: 21847
Severity: normal
Description: obdfilter-survey: Syntax error in some locales
- Bugzilla: 21501
Severity: normal
Description: Properly cleanup flock lock on disconnect
Details: Properly wakeup flock waiters on eviction. Destroyed lock for flock completion ast is not an error, return success to avoid double lock decref.
- Bugzilla: 24437
Severity: normal
Description: revoke open lock for executable files if needed
Details: When a normal lustre client opens a file for write/exec, the open lock on that file needs to be revoked in case an NFSD lustre client still holds it.
- Bugzilla: 22729
Severity: normal
Description: Remove LPSZ & LPSSZ
Details: Code cleanup patch for 1.8 which removes the use of LPSZ/LPSSZ to improve the build portability.
- Bugzilla: 24418
Severity: normal
Description: run autogen if a Makefile.am is patched (LU-53)
- Bugzilla: 21137
Severity: normal
Description: Sles11 with 1.8 is slower than 1.6 sles10 for O_DIRECT single file IOR writes
Details: Fix ptlrpc_main() condition to start service threads correctly.
- Bugzilla: 23049
Severity: normal
Description: t-f do_node() VERBOSE fix
- Bugzilla: 24479
Severity: normal
Description: files and dirs missing in dist tarball (LU-92)
Details: Some files and dirs are missing in the "dist" tarball.
- Bugzilla: 19494
Severity: normal
Description: "lfs find" hangs when searching for an OST index
Details: - new test_88 "lfs find identifies the missing striped file segments" - exit_status () egrep pattern fix
- Bugzilla: 24194
Severity: normal
Description: increase reseed count to mitigate inconsistency in OST allocation
Details: in alloc_rr, "LOV_CREATE_RESEED_MULT" and "LOV_CREATE_RESEED_MIN" are increased to mitigate the inconsistency in OST allocation.
- Bugzilla: 24451
Severity: normal
Description: racer test cleanup
Details: - modify racer/racer.sh to wait for the killed processes and exit 1 if any of them still exist; - remove runracer;
- Bugzilla: 19649
Severity: normal
Description: sanity test_77j fix
- Bugzilla: 24426
Severity: normal
Description: add ERRLOG suffix to avoid overwriting the lustre logs
- Bugzilla: 24420
Severity: normal
Description: avoid an LASSERT on recovery
- Bugzilla: 24375
Severity: normal
Description: Fix a race between completion and enqueue
Details: ldlm_enqueue_tail does not take proper locking when checking the lock mode to see whether the lock is granted, so there is a window in which ldlm_handle_completion_ast can update the lvb with correct data, but before it has a chance to update the lock mode, ldlm_enqueue_tail checks the lock mode and, since the lock is not yet granted, overwrites the correct lvb with the stale value from enqueue time.
- Bugzilla: 24050
Severity: normal
Description: fix lustre_start to start server targets in the order of MGS->MDT->OST(s)
- Bugzilla: 24426
Severity: normal
Description: run_one(): run error() once
Details: there is no reason to run error() (and thereby lctl dk) more than once; a second lctl dk overwrites the most important logs obtained by the first lctl dk
- Bugzilla: 23787
Severity: normal
Description: Modified struct lprocfs_percpu to be C99 compliant.
- Bugzilla: 24432
Severity: normal
Description: mount_lustre.c/parse_options() fix to differentiate between 'force*' and 'force'
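The bug class behind this fix is a prefix comparison that cannot tell an exact option from a longer one sharing the same prefix. classify_force() and the option name "force_example" are hypothetical; the real option names handled by parse_options() are not listed in this entry.

```c
#include <assert.h>
#include <string.h>

/* Classify a mount option: 1 for exactly "force", 2 for any longer option
 * that merely starts with "force", 0 otherwise. A bare
 * strncmp(opt, "force", 5) == 0 test would return true for both cases. */
static int classify_force(const char *opt)
{
    assert(opt != NULL);
    if (strcmp(opt, "force") == 0)
        return 1;
    if (strncmp(opt, "force", strlen("force")) == 0)
        return 2;
    return 0;
}
```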
- Bugzilla: 22168
Severity: normal
Description: write-append-truncate: retry write when it receives EINTR.
- Bugzilla: 22984
Severity: normal
Description: change all references to tune.ldiskfs in lustre to tunefs.ldiskfs
- Bugzilla: 21135
Severity: normal
Description: calculate Use% for "lfs df" the same way as standard "df"
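For reference, GNU coreutils df computes Use% from used and available blocks (blocks reserved for root are left out of the denominator) and rounds up. A sketch of that arithmetic follows; use_percent() is a hypothetical helper, not the lfs code.

```c
#include <assert.h>

/* Use% the way GNU df computes it: used / (used + available), rounded up.
 * "available" is what non-root users can still allocate, so blocks reserved
 * for root do not dilute the percentage. Returns -1 when there is nothing to report. */
static int use_percent(unsigned long long used, unsigned long long avail)
{
    unsigned long long total = used + avail;
    unsigned long long u100 = used * 100;

    if (total == 0)
        return -1;
    return (int)(u100 / total + (u100 % total != 0));
}
```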
- Bugzilla: 19944
Severity: normal
Description: adjust debug size to be -gt num_possible_cpus()
- Bugzilla: 23670
Severity: normal
Description: exit_status () fix
- Bugzilla: 23430
Severity: normal
Description: fix sanity-quota test 14a to write file in O_DIRECT mode
- Bugzilla: 24374
Severity: normal
Description: lov_dump_user_lmm_header () fix
- Bugzilla: 23064
Severity: normal
Description: create proper macro check for bdi interface
- Bugzilla: 14846
Severity: normal
Description: dynamically grow/shrink connd threads pool
Details: if multiple nodes are down, all socklnd connds could be blocked for a long while. We can work around this by increasing the default nconnds, but that always requires an unnecessary number of threads. This patch dynamically grows and shrinks the connd thread pool: it creates a new thread if there is a pending active connection, and kills some threads if there are too many idle connds.
- Bugzilla: 24218
Severity: normal
Description: fix contention on ksock_tx_t
Details: If a connection is closed before ksocknal_transmit() returns to ksocknal_process_transmit(), nobody holds a refcount on conn::ksnc_sock, and all pending ZC requests will be finalized by ksocknal_connsock_decref->ksocknal_finalize_zcreq; ksocknal_finalize_zcreq marks a not-acked ZC request as failed by setting tx::tx_resid = -1. This is a race because ksocknal_process_transmit() checks tx::tx_resid right after calling ksocknal_transmit(), so it can see tx->tx_resid != 0 with rc == 0 and then hit the later LASSERT(rc < 0).
- Bugzilla: 23983
Severity: normal
Description: mmp test_10 fix
- Bugzilla: 23499
Severity: normal
Description: ASSERTION(atomic_read(&client_stat->nid_exp_ref_count) == 0)
Details: In lprocfs_exp_setup(), we need release old stats in all cases.
- Bugzilla: 23729
Severity: normal
Description: cancel_lru_locks not working because some locks from mmap files are still in cache
Details: Fix sanity-benchmark.sh to remove files after fsx otherwise client keeps locks acquired for mmap files in cache.
- Bugzilla: 21581
Severity: normal
Description: change wrong URL
- Bugzilla: 21581
Severity: normal
Description: Fix a typo. Add Fedora for the yum cases per Andreas. (LU-47)
- Bugzilla: 24427
Severity: normal
Description: hopefully the last libcfs_memory_pressure_* fix for liblustre
- Bugzilla: 24427
Severity: normal
Description: another userspace fix for libcfs_memory_pressure_restore()
- Bugzilla: 24427
Severity: normal
Description: define libcfs_memory_pressure_get for userspace
- Bugzilla: 21581
Severity: normal
Description: too long file / path names for old tar
Details: Instruct automake to use tar's ustar format to prevent errors when pathnames are longer than 99 characters. This requires automake >= 1.9, so adjust the build accordingly, including handling multiple installed versions of automake.
- Bugzilla: 24410
Severity: normal
Description: exit with error if NFSCLIENT is set, but no nfs export found
- Bugzilla: 24388
Severity: normal
Description: remove files inadvertently added by previous commit
- Bugzilla: 24388
Severity: normal
Description: sgpdd-survey fix: use node_var_name() for variables
- Bugzilla: 21776
Severity: normal
Description: Set PF_MEMALLOC on outgoing path to prevent deadlock on memory allocation under pressure
- Bugzilla: 22980
Severity: normal
Description: init_logging does not exist in 1.8
- Bugzilla: 24417
Severity: normal
Description: Update Build-Depends
Details: - remove texlive-latex-recommended as a build requirement - add missing "| automake1.7 | automake1.8 | automake1.9" to debian/control.main
- Bugzilla: 24416
Severity: normal
Description: debian packaging fixes
Details: 1. don't make a patch out of anything in /debian
2. exclude noise files from the debian built source tarball
3. fake debian/patche{s,d} for make dist
4. a few more reasons to run autogen.sh
5. figure out if the dist tarball needs autogen.sh and include it if so
6. look for and run autogen.sh in the build subdir
7. make debdiff as part of make dist
8. add a debian/source/format file
9. mv the orig tarball and the debdiff to the debs dir
10. don't try to dist /debian for non-dpkg-using build targets
- Bugzilla: 24413
Severity: normal
Description: fix for automake > 1.9.6
Details: We seem to be using a Makefile variable that does not exist in more recent versions of automake. This fixes that problem.
- Bugzilla: 22980
Severity: normal
Description: Support unlocked_ioctl
Details: Adding 'unlocked_ioctl' for performance sensitive ioctls, such as OBD_IOC_BRW_READ/WRITE
- Bugzilla: 24320
Severity: normal
Description: do not fork a new thread in mem pressure
Details: We already check for PF_MEMALLOC in the ldlm shrinker and pass this flag to the blocking thread, but a new thread was still started without checking this flag.
- Bugzilla: 24245
Severity: normal
Description: fix SA perf test to support SA disabled by default
- Bugzilla: 17275
Severity: normal
Description: make lustre client less verbose at startup time for Cray
- Bugzilla: 24360
Severity: normal
Description: fix NULL pointer deref in mds_verify_child() when ll_lookup_one_len() fails
- Bugzilla: 20563
Severity: normal
Description: Fix fid_flatten() after 1 trillion SEQ numbers
Details: Fix the fid_flatten() function to properly handle FID mapping to 64-bit inode numbers, after the first 1 trillion SEQ numbers have been granted out. Even with CMD this would only happen after 1024 MDTs have each had 1B client mounts, so there is little risk of introducing collisions as a result of this change, and at worst this is a client-local phenomenon that is not persistent.
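The "1 trillion" threshold corresponds to about 2^40 (1,099,511,627,776). As a hypothetical illustration, and not Lustre's actual fid_flatten(): if an inode number is formed by shifting the 64-bit sequence left by 24 bits and placing the low 24 bits of the object ID underneath, every sequence bit above bit 39 is shifted out, so sequences that differ only in those bits collide. Folding the high bits back into the low bits, as `flatten_folded()` does, separates those cases, although collisions remain possible in principle because more than 64 bits of input are squeezed into a 64-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: sequence in the high bits,
 * low 24 bits of the object ID in the low bits. */
static uint64_t flatten_naive(uint64_t seq, uint32_t oid)
{
    /* Bits 40..63 of seq fall off the top of the 64-bit result. */
    return (seq << 24) | (oid & 0xffffffu);
}

/* Same layout, but XOR the lost high sequence bits back into the low bits
 * so sequences beyond 2^40 no longer alias their low-bit twins. */
static uint64_t flatten_folded(uint64_t seq, uint32_t oid)
{
    return flatten_naive(seq, oid) ^ (seq >> 40);
}
```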
- Bugzilla: 20563
Severity: normal
Description: Fix fid_flatten32() to not lose OID bits
Details: The original implementation of fid_flatten32() was broken due to an error in the shift calculation (note to self: "0x00" is 8 bits, not 16 bits). This could negatively impact 32-bit clients that were creating more than 64k files in the same directory. This 32-bit inode number is visible only within a single client mount and is not used in any persistent storage, and it only comes into play when a 32-bit client uses a 2.x server (which is basically none today), so there is no issue with changing it at this time.
- Bugzilla: 22660
Severity: normal
Description: Return the kernel's locking return code when Lustre reports success
- Bugzilla: 23352
Severity: normal
Description: modified value of at_min is not taken into account
- Bugzilla: 22378
Severity: normal
Description: Correct MDS client stats
Details: sanity test_133b fails with "The getattr counter on mds is wrong" message.
- Bugzilla: 15962
Severity: normal
Description: disable statahead by default due to important races found in the code
- Bugzilla: 22882
Severity: normal
Description: MMP might sleep negative time
- Bugzilla: 21456
Severity: normal
Description: Patch to support lnet v1 pings in 'lctl ping'
- Bugzilla: 23988
Severity: normal
Description: Remove sd iostats patch from sles11 patch series
- Bugzilla: 24039
Severity: normal
Description: actually add exit_traps.sh to EXTRA_DIST
- Bugzilla: 23122
Severity: normal
Description: make exit_traps.sh executable
Details: While bug 24093 added exit_traps.sh to the make dist list, it is not an executable file to start with. Fix this in the git repo.
- Bugzilla: 24093
Severity: normal
Description: not all build files/scripts being distributed
Details: Some files that need to be included in the tarball are not being included when make dist is run.
- Bugzilla: 24087
Severity: normal
Description: reverse order of $LINUX{,_OBJ}/include
Details: It is important that /usr/src/linux-...-obj/include is searched for includes before /usr/src/linux-.../include so that the inclusion of "include/linux/autoconf.h" picks up the one for the kernel we are trying to build against, and not the one for the currently running kernel, which is what the /usr/src/linux-.../ copy contains.
- Bugzilla: 24294
Severity: normal
Description: test_pios: take the OST sizes into account; remove the obsolete bug 19657 workaround
- Bugzilla: 23793
Severity: normal
Description: MOUNTOPT "-o" cleanup
- Bugzilla: 23051
Severity: normal
Description: improve summary of acc-sm to include test times
Details: acceptance-small test suites name cleanup: - rename sanityN -> sanityn, lfscktest -> lfsck - add racer.sh, liblustre.sh scripts - remove fsx,bonnie,dbench,iozone.lfsck parts
- Bugzilla: 23051
Severity: normal
Description: improve summary of acc-sm to include test times
- Bugzilla: 23081
Severity: normal
Description: Move llap page to tail instead of head.
- Bugzilla: 24226
Severity: normal
Description: typo fix for sanity test 72
- Bugzilla: 20394
Severity: normal
Description: correct check for transno value in filter_finish_transno
- Bugzilla: 24048
Severity: normal
Description: Set body->eadatasize in mdc_getattr_pack()
- Bugzilla: 18717
Severity: normal
Description: make "lfs check" output consistent on stdout
- Bugzilla: 23049
Severity: normal
Description: canonicalize disk names
- Bugzilla: 23049
Severity: normal
Description: various t-f.sh patches
Details: rundbench is a bash script; obdfilter-survey is a bash script; don't su if MPI_USER == "";
- Bugzilla: 23049
Severity: normal
Description: set path to truncate
- Bugzilla: 22544
Severity: normal
Description: delete module_setup.sh
- Bugzilla: 24039
Severity: normal
Description: lfs setstripe --pool broken
- Bugzilla: 24239
Severity: normal
Description: use SAMPLE_FILE instead of termcap
- Bugzilla: 24266
Severity: normal
Description: increase replay-single test_70d dbench duration for HARD failure mode
- Bugzilla: 24226
Severity: normal
Description: Only force the mode change if we're changing the size as well
Details: The offending code was added by commit 77ba4b2141d04180211efa8a75c11ab0abf7fafb to remove setgid/setuid bits when do_truncate() is called on the file. We should only force the change when that occurs, similarly to ll_setattr() in lustre/llite/llite_lib.c
- Bugzilla: 19808
Severity: normal
Description: fix d_obtain_alias() misuse due to compat macro
- Bugzilla: 24055
Severity: normal
Description: a patch to detect if quota is turned on properly
- Bugzilla: 22546
Severity: normal
Description: fix errors in test_18c
- Bugzilla: 24245
Severity: normal
Description: skip sanity test 123 under 1.8 <-> 2.x interoperability mode
Details: statahead is disabled automatically under 1.8 <-> 2.x interoperability mode
- Bugzilla: 23821
Severity: normal
Description: Limit bio_alloc() to BIO_MAX_PAGES iovecs.
Details: Fix logic error when patch was originally landed from b=9945.
- Bugzilla: 23786
Severity: normal
Description: make lh_exit code C99 compliant
Details: Based on the patch from Kenneth D. Matney, Sr. <matneykdsr@ornl.gov>
- Bugzilla: 23157
Severity: normal
Description: do not crash on wrong network message in filter_connect_internal
- Bugzilla: 24270
Severity: normal
Description: need to mkdir mntpt before mount
- Bugzilla: 16605
Severity: normal
Description: don't LASSERT on unverified client data in filter_parent
- Bugzilla: 13698
Severity: normal
Description: llapi_get_version
Details: this uses OBD_GET_VERSION ioctl to obtain lustre version
- Bugzilla: 23961
Severity: normal
Description: fix for setup with several network interfaces
Details: 1. metadata-updates: fix for setups where several interfaces are UP on a host; the hostname could be assigned to an IP different from the LNET network in use, so the hostnames of NODES_TO_USE are now stored in HOSTS
2. new SHUTDOWN_ATTEMPTS: the tunable number of attempts to shut down a node
3. shutdown_node_hard() fix: do not call "power off" each time; wait until the node is no longer pingable before the next "power off" attempt
4. unused check_port() is removed
- Bugzilla: 4424
Severity: normal
Description: Reserve obd_connect_data.ocd_max_easize field
Details: To avoid potential incompatible changes between b1_8 and master, reserve the ocd_max_easize field. The corresponding connect flag OBD_CONNECT_MAX_EASIZE has been reserved for some time already. Add several other OBD_CONNECT_ flags that have already been defined to the wirecheck/wiretest tools.
- Bugzilla: 22376
Severity: normal
Description: sanity test for non-root exec-only file execution
- Bugzilla: 23766
Severity: normal
Description: interop bits for sanity/203
- Bugzilla: 24118
Severity: normal
Description: test_70b rundbench load failed
Details: give rundbench a chance to start before the dbench load check; add new check_for_process() and killall_process() to check/kill any specified programs instead of only "dbench"; fix 70a and 70b to mount the clients on MOUNT instead of DIR
- Bugzilla: 24228
Severity: normal
Description: fix test duration check to be more accurate
- Bugzilla: 23535
Severity: normal
Description: sgpdd-survey.sh should check for sg_map
Details: check that iokit sgpdd-survey and sg_map are installed
- Bugzilla: 22157
Severity: normal
Description: combined mgs/mds fix for single node setup
Details: For a combined MGS/MDS configuration on a single-node setup, we do not need to unload the modules because conf-sanity keeps the MGS mounted during all tests.
- Bugzilla: 23402
Severity: normal
Description: mmp_fini () multiple oss fix
- Bugzilla: 23575
Severity: normal
Description: O2iblnd credit deadlock regression
Details: This fixed a regression of bug 14425.
- Bugzilla: 23868
Severity: normal
Description: fix "sanity-quota test_18c: @@@@@@ FAIL: quotaon failed!"
- Bugzilla: 23954
Severity: normal
Description: MGS device has stopped when we try to start the second mgs
Details: add test_24b to the ALWAYS_EXCEPT list for configurations where the MGS and MDS are not combined
- Bugzilla: 23869
Severity: normal
Description: HARD failure mode fixes
Details: facet_failover() has to restart only those affected facets which were UP before the node failure. replay-single tests which use shutdown_facet() && reboot_facet() instead of facet_failover() have to take care of the affected facets.
- Bugzilla: 23956
Severity: normal
Description: change conf-sanity test_37 to be functional on remote setup
Details: fix test_37 so it is not skipped on remote setups; use the existing MDS device instead of creating a new one
- Bugzilla: 24020
Severity: normal
Description: lustre doesn't start with ext4 based ldiskfs.
- Bugzilla: 24201
Severity: normal
Description: add procfs tunable to enable/disable lockless direct I/O
Details: llite.lustre-*.lockless_direct_io=0 will disable default semantics of direct I/O that forces it to be lockless. lockless_direct_io value, however, will be ignored if per-file LL_FILE_LOCKED_DIRECTIO bit is set.
- Bugzilla: 21804
Severity: normal
Description: make sure the request is protected by rq_refcount while
- Bugzilla: 21760
Severity: normal
Description: start bulk unregistering at the same time as reply unlink
- Bugzilla: 23820
Severity: normal
Description: ptlrpc_check_set()) ASSERTION(req->rq_phase == RQ_PHASE_BULK) failed
Details: Handle unsent requests with rq_net_err in ptlrpc_check_set().
Changes from v1.8.4 to v1.8.5
Support for networks:
- socklnd - any kernel supported by Lustre,
- qswlnd - Qsnet kernel modules 5.20 and later,
- openiblnd - IbGold 1.8.2,
- o2iblnd - OFED 1.3, 1.4.1, 1.4.2, 1.5.1 and 1.5.2
- viblnd - Voltaire ibhost 3.4.5 and later,
- ciblnd - Topspin 3.2.0,
- iiblnd - Infiniserv 3.3 + PathBits patch,
- gmlnd - GM 2.1.22 and later,
- mxlnd - MX 1.2.10 or later,
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Server support for kernels:
- 2.6.16.60-0.69.1 (SLES 10),
- 2.6.32.19-0.2.1 (SLES11),
- 2.6.18-194.17.1.el5 (RHEL 5)
- 2.6.18-194.17.1.0.1.el5 (OEL 5)
Client support for unpatched kernels: see "Patchless Client"
- 2.6.16 - 2.6.30 vanilla (kernel.org)
Recommended e2fsprogs version:
- 1.41.10-sun2
The async journal commit feature (bug 19128) and the cancel lock before replay feature (bug 16774) are disabled by default.
- Bugzilla: 23179
Severity: normal
Description: MDS node unresponsive
Details: improve hash distribution, doubled hash size for the lnet cookie hash.
- Bugzilla: 23683
Severity: normal
Description: Bring upstream patch for ldiskfs.
- Bugzilla: 21610
Severity: normal
Description: add support for SLES11 SP1
- Bugzilla: 23766
Severity: normal
Description: atime is not properly updated on an MDS
- Bugzilla: 22514
Severity: enhancement
Description: Update to RHEL5.5 kernel 2.6.18-194.17.1.el5. Update to OEL5.5 kernel 2.6.18-194.17.1.0.1.el5.
- Bugzilla: 20744
Severity: enhancement
Description: Update to SLES10 SP3 kernel 2.6.16.60-0.69.1.
- Bugzilla: 20744
Severity: normal
Frequency : only with SLES10
Description: Use OFED "KMP" provided by Novell
Details: SLES10 SP3 ships with OFED in a separate "KMP" package. Lustre is now built against this package. That means you need to install the ofed-kmp package from Novell for the patchless client and from our download site for the server. Note that the ofed-kmp that Novell ships may not exactly match the kernel version but should still be compatible.
- Bugzilla: 21610
Severity: enhancement
Description: Update SLES11 SP1 kernel to 2.6.32.19-0.2.1.
- Bugzilla: 21174
Severity: normal
Description: Enabling quotas fails with non-consecutive OST numbering.
- Bugzilla: 23645
Severity: normal
Description: Fix kernel warning due to lookup_one_len() being called without holding i_mutex.
- Bugzilla: 23596
Severity: normal
Description: Account direct i/o inflight rpcs separately from non-direct i/o so that direct i/o, which is limited by max_rpcs_in_flight, should not block non-direct i/o, which is not limited by max_rpcs_in_flight.
- Bugzilla: 23827
Severity: normal
Description: Fix per-NID reporting on outstanding writes
- Bugzilla: 23701
Severity: normal
Description: Reduce stack pressure by uninlining some mds and ptlrpc functions.
- Bugzilla: 22770
Severity: normal
Description: Remove LASSERT in lprocfs_rd_conn_uuid() since conn == NULL is a legitimate case.
- Bugzilla: 23781
Severity: normal
Description: fix obdo leak issue in ll_setattr_raw()
- Bugzilla: 22117
Severity: normal
Description: limit MMP interval
- Bugzilla: 20101
Severity: enhancement
Description: add several lfs ost enhancements
- Bugzilla: 22820
Severity: normal
Description: Too many default ACLs break directory access on new directories
- Bugzilla: 23174
Severity: normal
Description: Lustre inode size is not coherent across nodes.
Details: Update lvbo from disk when AST fails with EINVAL. Lvbo is updated on EINVAL error in ldlm_handle_ast_error(). The updates in filter_intent_policy() and ldlm_cb_interpret() have been removed as redundant.
- Bugzilla: 23503
Severity: normal
Description: Oops at __percpu_counter_add+0x1b
Details: Use bdi_init()/bdi_destroy() to properly initialize the backing_dev_info structure.
- Bugzilla: 20563
Severity: normal
Description: add mount option to generate 32bit ino, this can be used for 32bit application compatibility.
- Bugzilla: 22935
Severity: normal
Description: keep a reference count on "lli_sai" to prevent it from being released during "statahead_enter()"
- Bugzilla: 21174
Severity: normal
Description: allow quotacheck over OSTs with sparse indices
- Bugzilla: 22891
Severity: normal
Description: Objects not getting deleted for files which have been removed
Details: ll_have_md_lock() should differentiate between CR and CW OPEN locks.
- Bugzilla: 22107
Severity: normal
Description: pin object's inode in memory to avoid certain timeouts
- Bugzilla: 21745
Severity: normal
Description: fix LBUG when obdfilter-survey is interrupted.
Changes from v1.8.3 to v1.8.4
Support for networks:
- socklnd - any kernel supported by Lustre,
- qswlnd - Qsnet kernel modules 5.20 and later,
- openiblnd - IbGold 1.8.2,
- o2iblnd - OFED 1.3, 1.4.1, 1.4.2 and 1.5.1
- viblnd - Voltaire ibhost 3.4.5 and later,
- ciblnd - Topspin 3.2.0,
- iiblnd - Infiniserv 3.3 + PathBits patch,
- gmlnd - GM 2.1.22 and later,
- mxlnd - MX 1.2.10 or later,
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Server support for kernels:
- 2.6.16.60-0.42.8 (SLES 10),
- 2.6.27.39-0.3.1 (SLES11),
- 2.6.18-194.3.1.el5 (RHEL 5)
- 2.6.18-194.3.1.0.1.el5 (OEL 5)
Client support for unpatched kernels: see "Patchless Client"
- 2.6.16 - 2.6.30 vanilla (kernel.org)
Recommended e2fsprogs version:
- 1.41.10-sun2
The async journal commit feature (bug 19128) and the cancel lock before replay feature (bug 16774) are disabled by default.
- Bugzilla: 18456
Severity: normal
Description: Reduce group prealloc size and skip groups with little free space.
- Bugzilla: 22237
Severity: normal
Description: Fix issue with proc_remove.
- Bugzilla: 23368
Severity: normal
Description: Disable delayed allocation by default for ext4-based ldiskfs on RHEL5.5
- Bugzilla: 23368
Severity: normal
Description: A mount failure can corrupt the slab. This is a bug in the latest RHEL5.5 kernel and only ext4-based ldiskfs is impacted.
- Bugzilla: 23076
Severity: normal
Description: With peer health detection, o2iblnd makes only one attempt to reconnect, which is not enough with nodes running Lustre 1.6 because of the protocol version mismatch. Fix o2iblnd to retry one more time.
- Bugzilla: 22771
Severity: normal
Description: add mount option to disable mb_cache since it can cause slowdown.
- Bugzilla: 16909
Severity: enhancement
Description: Quiet some LNET messages
- Bugzilla: 22787
Severity: enhancement
Description: Add OFED 1.5.1 support
- Bugzilla: 21678
Severity: enhancement
Description: The peer health code lacked some important debugging info in lnd_query code paths. We've added necessary debug prints, not just for bug 21678, but also for future troubleshooting.
- Bugzilla: 22514
Severity: enhancement
Description: Update RHEL5.5 kernel to 2.6.18-194.3.1.el5 and OEL5.5 kernel to 2.6.18-194.3.1.0.1.el5.
- Bugzilla: 22514
Severity: enhancement
Description: use the in-kernel OFED stack for RHEL5 and OEL5.
- Bugzilla: 22481
Severity: enhancement
Description: Add "lfs_migrate" script from manual into lustre/scripts and RPMs
Details: lfs_migrate does a "poor man's" migration of files from their current OST layout to a new OST layout as chosen by the MDS.
- Bugzilla: 22679
Severity: normal
Description: mds_orphan_add_link()) error linking orphan to PENDING
Details: quota limits might prevent linking orphans to PENDING when unlinking a file; temporarily raise the threads' privileges when processing unlinks.
- Bugzilla: 15253
Severity: enhancement
Description: add conf-param -d option to remove permanent settings.
Details: Add the ability to remove permanent lctl conf_param settings. (Previously conf_param settings could only be changed, not removed.) This also provides a method to change failover nid locations. Improve lctl man page.
- Bugzilla: 22455
Severity: enhancement
Description: add list_param to b1_8 and add "-R" option to list params recursively
- Bugzilla: 22194
Severity: enhancement
Description: lfs quota output is not very convenient for awk/sed-parsing
Details: Some positions in the lfs quota output table could be either empty or non-empty, which made the output hard to parse with scripts. Now a dash is printed instead of a space where there is no data.
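A placeholder helps because awk-style parsers split on runs of whitespace, so an empty column simply disappears and shifts every later field to the left. A small sketch, with hypothetical helper names (not the lfs source), of formatting an empty column as "-" and counting fields the way awk does:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one numeric column; print "-" when there is no value to show. */
static void fmt_field(char *buf, size_t len, int has_value,
                      unsigned long long value)
{
    if (has_value)
        snprintf(buf, len, "%llu", value);
    else
        snprintf(buf, len, "-");
}

/* Count whitespace-separated fields, as awk's default splitting would. */
static int count_fields(const char *line)
{
    int n = 0;
    int in_field = 0;

    for (; *line != '\0'; line++) {
        if (*line == ' ' || *line == '\t') {
            in_field = 0;
        } else if (!in_field) {
            in_field = 1;
            n++;
        }
    }
    return n;
}
```

With the dash, a row always yields the same number of fields, so `$3` in awk refers to the same column on every line.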
- Bugzilla: 15685
Severity: enhancement
Description: fix obdfilter-survey script to work properly with remote OSSs
- Bugzilla: 22402
Severity: enhancement
Description: add new OBDFILTER_SURVEY test suite
- Bugzilla: 20326
Severity: enhancement
Description: add new multiple mount protection (MMP) test suite
- Bugzilla: 21647
Severity: enhancement
Description: add support for async journal commit in echo client
- Bugzilla: 21244
Severity: enhancement
Description: allow userland programs to include <lustre/lustre_idl.h> from standard include directories
- Bugzilla: 18399
Severity: enhancement
Description: The prune-icache-use-trylock is no longer needed now that the patch from bug 20008 is landed.
- Bugzilla: 22755
Severity: normal
Description: The shrink grant feature is still active on the client although the connect flag is not set.
- Bugzilla: 22755
Severity: normal
Description: Don't leak grant space if the write failed with quota exceeded.
- Bugzilla: 22755
Severity: normal
Description: Don't consume grant space twice on a recoverable resend.
- Bugzilla: 22610
Severity: normal
Description: a race condition could lead to SIGBUS being sent to an application using mmap-ped files from Lustre
Details: truncate_complete_page implementation for the patchless client could arbitrarily unset PG_Uptodate flag for a page being kicked from the page cache, an uptodate check right after a readpage call in filemap_fault could fail because of that as though the page read had been unsuccessful.
- Bugzilla: 22476
Severity: normal
Description: dlm lock slab shrinking is not efficient
Details: The dlm_locks slab can grow significantly and consume a lot of memory on the server. Set a hard limit on grant_plan.
- Bugzilla: 22850
Severity: normal
Description: Lustre does not do 1MB IOs to HW RAID
Details: Bump MAX_PHYS/HW_SEGMENTS and SG_ALL to 256 in the RHEL5 kernel. This is what we do already for SLES kernels.
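Why 256: assuming 4 KiB pages and, in the worst case, one page per scatter/gather segment (no merging of physically adjacent pages), the largest request the block layer can build is the segment limit times the page size. The helper below is a hypothetical illustration of that arithmetic, not kernel code.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, one page per segment (worst case, no merging). */
#define SKETCH_PAGE_SIZE 4096UL

/* Largest single I/O, in bytes, for a given segment limit. */
static unsigned long max_io_bytes(unsigned long max_segments)
{
    return max_segments * SKETCH_PAGE_SIZE;
}
```

Under these assumptions 256 segments allow a full 1 MiB request, while a limit of 128 caps a request at 512 KiB, so a 1 MB I/O would have to be split in two.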
- Bugzilla: 22223
Severity: normal
Description: bump maximum number of phys/hw segments in the SLES11 kernel until s/g chaining works properly.
- Bugzilla: 17086
Severity: normal
Description: LSI Fusion MPT driver hacks to improve performance
Details: Set CONFIG_FUSION_MAX_SGE to 256 for RHEL5
- Bugzilla: 22509
Severity: enhancement
Description: increase default md stripe_cache_size to 16k
Severity: normal
Description: don't handle security.capability xattr
Details: CONFIG_SECURITY_FILE_CAPABILITIES is enabled by default on SLES11. This results in additional getxattr calls, causing VBR test failures as well as a performance drop when writing.
- Bugzilla: 22749
Severity: normal
Description: obdfilter-survey is no longer working
Details: revert patch from bug 20355 to resolve an issue with lctl --threads not working correctly with $(PTHREAD_LIBS) being linked to lctl.
- Bugzilla: 22786
Severity: normal
Description: ll_shrink_cache does not handle __GFP_FS properly
- Bugzilla: 19102
Severity: normal
Description: lfs getstripe shows wrong info for directories
Details: Set correct filesystem-wide default LOVEA values.
- Bugzilla: 11742
Severity: normal
Description: FSX checksum false positives due to mmap IO
Details: Use OBD_FL_MMAP flag for IOs on a memory mapped file. Do not print checksum errors, if the flag is set on a request.
- Bugzilla: 22360
Severity: normal
Description: file operations after eviction have successful return values
Details: use vfs ->flush callback to return any pending async errors on file close.
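A minimal sketch of the "remember the error, report it on close" pattern. `file_sketch`, `record_async_error()` and `file_flush()` are hypothetical names; in the kernel the hook is the `flush` member of `struct file_operations`, which is invoked from close().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-open-file state. */
struct file_sketch {
    int async_err; /* first error from background writeback, 0 if none */
};

/* Background writeback failed after write() had already returned success:
 * remember the first error so it is not lost. */
static void record_async_error(struct file_sketch *f, int err)
{
    assert(err < 0);
    if (f->async_err == 0)
        f->async_err = err;
}

/* flush-style hook run on close(): report the pending error once, then clear it. */
static int file_flush(struct file_sketch *f)
{
    int rc = f->async_err;

    f->async_err = 0;
    return rc;
}
```

Keeping only the first error and clearing it once reported means close() fails exactly once per batch of failed writeback, rather than on every later close.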
- Bugzilla: 20433
Severity: normal
Description: mdsrate fails to write after 1.3+M files opened
Details: decrease memory usage on clients by recycling dentries and inodes.
- Bugzilla: 17382
Severity: normal
Description: obdfilter-survey gives unreasonably high numbers
Details: Wait for all threads to complete when running test_brw.
- Bugzilla: 22299
Severity: normal
Description: do not set the Lustre device read-only when the server unmounts, and keep client records for recoverable clients
- Bugzilla: 22241
Severity: normal
Description: move sync_on_lock_cancel tunable to the obdfilter layer
Details: move the tunable to trigger a journal flush on lock cancel from the ost layer to the obdfilter layer. This tunable is useful when using the async journal commit feature.
- Bugzilla: 21871
Severity: normal
Description: exp->exp_nid_stats == NULL in filter_tally()
Details: fix race with per-nid stats by delaying procfs cleanup until exp_refcount == 0
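The fix defers teardown to the last reference holder. A minimal sketch of that pattern with C11 atomics follows; `export_sketch` and its fields are hypothetical stand-ins for an export and its per-NID procfs stats, not Lustre's structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an export that owns per-NID stats. */
struct export_sketch {
    atomic_int refcount;
    bool stats_registered; /* stands in for the procfs entries */
    int cleanup_calls;     /* how many times cleanup ran */
};

static void export_init(struct export_sketch *exp)
{
    atomic_init(&exp->refcount, 1);
    exp->stats_registered = true;
    exp->cleanup_calls = 0;
}

static void export_get(struct export_sketch *exp)
{
    atomic_fetch_add(&exp->refcount, 1);
}

static void export_cleanup(struct export_sketch *exp)
{
    exp->stats_registered = false;
    exp->cleanup_calls++;
}

/* Drop a reference; only the final put tears down the stats, so a
 * concurrent user holding a reference never sees them disappear. */
static void export_put(struct export_sketch *exp)
{
    int old = atomic_fetch_sub(&exp->refcount, 1);

    assert(old > 0);
    if (old == 1)
        export_cleanup(exp);
}
```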
- Bugzilla: 21556
Severity: normal
Description: extent lock cancellation on client can keep the cpu busy for too long.
- Bugzilla: 22658
Severity: normal
Description: Do not fail OST activation when a llog is not found, just issue an error message.
- Bugzilla: 22911
Severity: normal
Description: Don't enable extents by default for MDT.
- Bugzilla: 21877
Severity: normal
Description: Protect bitfield access to ptlrpc_request's rq_flags, since the AT code can access it concurrently while sending early replies.
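Adjacent bitfields normally share one storage unit, so setting one flag is a read-modify-write of the whole word and can silently undo a concurrent update to its neighbor. The sketch below uses hypothetical names and a C11 `atomic_flag` spinlock standing in for a kernel spinlock; it shows the usual remedy, where every writer of the bitfield word takes the same lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical request with two 1-bit flags that typically share one word. */
struct req_sketch {
    atomic_flag lock;             /* stands in for a spinlock */
    unsigned int replied : 1;     /* written by one thread */
    unsigned int early_reply : 1; /* written concurrently by another */
};

static void req_lock(struct req_sketch *r)
{
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ; /* spin until the holder clears the flag */
}

static void req_unlock(struct req_sketch *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Each setter rewrites the shared word, so both must hold the lock. */
static void req_set_replied(struct req_sketch *r)
{
    req_lock(r);
    r->replied = 1;
    req_unlock(r);
}

static void req_set_early_reply(struct req_sketch *r)
{
    req_lock(r);
    r->early_reply = 1;
    req_unlock(r);
}
```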
- Bugzilla: 23175
Severity: normal
Description: Disable lockless truncate by default since it is sometimes flawed and causes the write_disjoint test to fail.
- Bugzilla: 23139
Severity: normal
Description: OSSs which don't have the patch from bug 20278 can trigger an LBUG on 1.8 clients.
- Bugzilla: 21528
Severity: enhancement
Description: don't print message to the console when we have not managed to cancel all locks.
- Bugzilla: 23305
Severity: normal
Description: The MDS fails to synchronize OSTs which registered with the MGS after the MDT. The problem is that OBD_NOTIFY_CREATE events are raised too early and thus discarded by the MDT stack. The fix consists of issuing OBD_NOTIFY_CREATE event in the lov layer.
- Bugzilla: 23192
Severity: normal
Description: Fix race when the ping evictor and a service thread execute target_recovery_check_and_stop() concurrently.
- Bugzilla: 23196
Severity: normal
Description: quota broadcast can trigger a LBUG on the MDT if there are inactive OSCs.
- Bugzilla: 17485
Severity: enhancement
Description: Resetting the lov_objid values to last_id reported by the OST during orphan recovery is incorrect and can cause the same objects to be allocated twice.
- Bugzilla: 21452
Severity: enhancement
Description: "weak-modules" support
Details: Implement "weak-modules" support which enables kernel modules to be used with any kernel that implements the same kABI. In order to achieve this modules are now installed in /lib/modules/$(uname -r)/updates/kernel on all distributions.
- Bugzilla: 22464
Severity: enhancement
Description: add writeconf as mount option
- Bugzilla: 22846
Severity: enhancement
Description: produce debuginfo packages for SLES.
- Bugzilla: 15253
Severity: enhancement
Description: add failover nidlist to the import proc file.
- Bugzilla: 20563
Severity: enhancement
Description: fix LUSTRE_SEQ_MAX_WIDTH for interoperability between 1.8 clients and 2.0 servers.
- Bugzilla: 22938
Severity: enhancement
Description: lfs find -s does not work correctly because of a bug in find_value_cmp().
- Bugzilla: 22309
Severity: normal
Description: ll_read_ahead_page() must validate the dlm lock before using it.
- Bugzilla: 22656
Severity: normal
Description: Prevent failover nids from registering with MGS first.
- Bugzilla: 11063
Severity: normal
Description: fix lock inversion in ll_setattr_raw().
- Bugzilla: 22884
Severity: normal
Description: object allocation is not balanced across OSTs.
Details: osc_precreate() should return 0, if there are enough objects left.
Changes from v1.8.2 to v1.8.3
Support for networks:
- socklnd - any kernel supported by Lustre™
- qswlnd - Qsnet kernel modules 5.20 and later
- openiblnd - IbGold 1.8.2
- o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, 1.4.1 and 1.4.2
- viblnd - Voltaire ibhost 3.4.5 and later
- ciblnd - Topspin 3.2.0
- iiblnd - Infiniserv 3.3 + PathBits patch
- gmlnd - GM 2.1.22 and later
- mxlnd - MX 1.2.10 or later
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Support for kernels:
- 2.6.16.60-0.42.8 (SLES 10)
- 2.6.27.39-0.3.1 (SLES11, i686 & x86_64 only)
- 2.6.18-164.11.1.el5 (RHEL 5)
- 2.6.18-164.11.1.0.1.el5 (OEL 5)
Client support for unpatched kernels: (see Patchless Client)
- 2.6.16 - 2.6.30 vanilla (kernel.org)
Recommended e2fsprogs version: 1.41.10-sun2
The async journal commit feature (bug 19128) and the cancel lock before replay feature (bug 16774) are disabled by default.
- Bugzilla: 22363
Severity: normal
Description: fix for a race condition in linux quotas implementation
Details: dq_flags(struct dquot) access is not properly locked which could lead to certain inconsistencies when accessing it using non-atomic bit operations like __set_bit in do_set_dqblk. This patch replaces non-atomic __set_bit calls with atomic set_bit calls.
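In user-space terms, the difference between `__set_bit` and `set_bit` is the difference between a plain `|=` and an atomic fetch-or. The sketch uses C11 `<stdatomic.h>`; the function names are hypothetical and deliberately differ from the kernel's bitops.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Non-atomic, like __set_bit(): load, OR and store are separate steps, so
 * two CPUs setting different bits in the same word can lose one update. */
static void set_flag_nonatomic(unsigned int nr, unsigned long *word)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    *word |= 1UL << nr;
}

/* Atomic, like set_bit(): one indivisible read-modify-write, so concurrent
 * setters of other bits in the same word cannot clobber each other. */
static void set_flag_atomic(unsigned int nr, atomic_ulong *word)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    atomic_fetch_or(word, 1UL << nr);
}
```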
- Bugzilla: 22307
Severity: normal
Description: initialize the child_res_id for OPEN lock
Details: in mds_open, initialize the child_res_id before enqueuing the OPEN lock for the child inode, to avoid sending the wrong ldlm_res_id to the client.
- Bugzilla: 22556
Severity: normal
Description: lst: check # of remaining RPCs before aborting
Details: lstcon_rpc_trans_postwait() calls lstcon_rpc_trans_abort() only when the transaction times out, so if we get "end_session" while waiting on the transaction, we can hit the assertion failure ASSERTION(crpc->crp_stamp != 0)
- Bugzilla: 16909
Severity: normal
Description: Suppress "changing the import ..." warning.
Details: This warning will always be printed when the MDT reconnects to an OST after the MDT is restarted. There is nothing wrong here and more importantly there is nothing the admin should do or care about so I'm moving the warning to D_HA.
- Bugzilla: 16909
Severity: normal
Description: Use INFO/WARN instead of WARN/ERROR for the slow messages.
Details: We should use INFO/WARN instead of WARN/ERROR for the slow messages. Not only is there no real error here but it fixes an annoying quirk of the message formatting. With the old levels you would see the messages formatted differently based on the time.
- Bugzilla: 22385
Severity: normal
Description: Computation on an unsigned variable may produce a result < 0.
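The classic form of this bug is a subtraction that is "expected" to go negative. With unsigned operands it wraps around to a huge positive value instead, so any later `< 0` check can never fire. A minimal sketch with hypothetical function names:

```c
#include <assert.h>
#include <limits.h>

/* Buggy: the subtraction is done in unsigned arithmetic, so when
 * needed > avail the result wraps to ULONG_MAX - (needed - avail - 1). */
static unsigned long remaining_buggy(unsigned long avail, unsigned long needed)
{
    return avail - needed;
}

/* Fixed: compare first, and clamp at zero instead of wrapping. */
static unsigned long remaining_fixed(unsigned long avail, unsigned long needed)
{
    return avail > needed ? avail - needed : 0;
}
```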
- Bugzilla: 22252
Severity: major
Description: allow multiple instances of the same nid in NID hash
Details: Case of multiple separate clients from the same NID (as with liblustre) is legitimate and so we should allow multiple instances of the same NID in nid hash.
- Bugzilla: 22423
Severity: normal
Description: rely on pings to issue reconnects
Details: Don't wake up pinger on reconnect failures and rely on regular pings to trigger the next reconnection. Please note that the pinger already uses a smaller interval if the import is disconnected.
- Bugzilla: 20615
Severity: normal
Description: print more debug info for timed-out ZC-req
Details: 1. output more information for timed-out ZC-req and partially received connections
2. close the connection for timed-out ZC-req
3. always send ZC_ACK on a non-blocking connection (BULK_IN)
- Bugzilla: 22307
Severity: normal
Description: remove lock acquisition while holding a spinlock
Details: in ras_update, "lov_get_info" could be called while increasing the readahead window; it tries to take the mutex "lov_lock" while the spinlock "ras_lock" is held, which causes a system lockup.
- Bugzilla: 20278
Severity: normal
Description: ASSERTION(cli->cl_avail_grant >= 0) failed
Details: This patch tries to address several issues:
1. osc_init_grant(): calculate avail_grant according to recovery status.
2. osc_reconnect(): request grant should include cl_dirty.
3. filter_grant(): besides a server reboot, also grant the requested amount on a normal reconnect.
4. round the grant amount up instead of down; otherwise the client could still end up with dirty > granted.
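Item 4 is easiest to see with numbers. The sketch below is a generic illustration, not the osc/filter grant code, and it assumes a 4096-byte grant unit: with 5000 bytes dirty, rounding down grants only 4096 bytes (so dirty > granted), while rounding up grants 8192.

```c
#include <assert.h>

/* Round n down to a multiple of unit (unit must be non-zero). */
unsigned long round_down_to(unsigned long n, unsigned long unit)
{
        assert(unit != 0);
        return n / unit * unit;
}

/* Round n up to a multiple of unit, so the result is never below n. */
unsigned long round_up_to(unsigned long n, unsigned long unit)
{
        assert(unit != 0);
        return (n + unit - 1) / unit * unit;
}
```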
- Bugzilla: 20805
Severity: normal
Description: Use CNETERR in specific places in the portal's LNET driver
- Bugzilla: 22108
Severity: normal
Description: include last created object in precreate slow case
- Bugzilla: 20373
Severity: normal
Description: don't do rep-ack if nothing was created
Details: mds_open currently always puts a lock into a rep-ack regardless of whether anything was created. This is pointless and only creates needless contention; the intent was to do this only for real creates, as recovery protection.
- Bugzilla: 22409
Severity: normal
Description: Spurious error messages from smp_processor_id() on preemptible kernel
Details: Disable preemption by grabbing the lock in fs_trace_get_tcd() first. The function fs_trace_get_tcd() was moved up.
- Bugzilla: 21500
Severity: normal
Description: 2.6.31-fc12 patchless client support.
- Bugzilla: 17258
Severity: normal
Description: give the BUILD_TESTS love to ldiskfs as well
Details: Because ldiskfs re-uses so (too?) much of the lustre auto* goop, we need to stub the BUILD_TESTS assignment into its autoMakefile.am, even though it is completely unused/unneeded there.
- Bugzilla: 22181
Severity: normal
Description: interval_erase() fix
Details: interval_erase() calls update_maxhigh() properly when child == NULL
- Bugzilla: 21945
Severity: normal
Description: Adding WIRE_ATTR attribute to LNET types
Details: LST nodes on different platforms might not communicate well due to the lack of WIRE_ATTR attribute in some LNET structures traversing network. The patch fixes the problem by adding WIRE_ATTR where needed.
- Bugzilla: 22069
Severity: normal
Description: replace server_major_version with connect_flags for quota utils interoperability
- Bugzilla: 22233
Severity: normal
Description: do_div arguments not cross-platform compatible
- Bugzilla: 22177
Severity: normal
Description: fix error message in mds_mfd_close()
Details: Fix error messages in mds_mfd_close() since it is now legitimate to have i_nlink = 1 for dirs in /PENDING.
- Bugzilla: 22327
Severity: normal
Description: "lfs df" does not print stats for all mountpoints
Details: Print all mounted lustre filesystems with "lfs df"
- Bugzilla: 21957
Severity: normal
Description: debug_mb not correctly initialized on newer kernels (2.6.31)
Details: Fixed the debug_mb initialization problem for kernel 2.6.31
- Bugzilla: 19919
Severity: normal
Description: support relative path in llapi_search_fsname()
Details: Use realpath() to provide absolute pathname.
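A minimal sketch of the idea using POSIX realpath(); the helper name is hypothetical and the actual llapi_search_fsname() code is not reproduced here. A relative argument such as "." is resolved to an absolute, symlink-free path that can then be compared against absolute mount points.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: turn a possibly relative path (e.g. "." or
 * "../scratch") into an absolute path. buf must hold at least
 * PATH_MAX bytes. Returns 0 on success, -1 on failure (errno is set
 * by realpath()). */
int make_path_absolute(const char *path, char *buf)
{
        assert(path != NULL && buf != NULL);
        return realpath(path, buf) != NULL ? 0 : -1;
}
```

Note that realpath() fails for paths that do not exist, which is acceptable when the path must name a mounted filesystem anyway.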
- Bugzilla: 21486
Severity: normal
Description: fix for truncated reply buffer
Details: the reply buffer could be referenced by reply_in_callback after being released
- Bugzilla: 22194
Severity: normal
Description: Add quiet -q option to lfs quota
- Bugzilla: 21619
Severity: normal
Description: hash MEs on RDMA portal
Details: The RDMA portal can have a very long ME list on the client side, and searching that list can trigger a soft lockup. Hashing the MEs on the RDMA portal resolves this problem.
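To show why hashing helps, here is a simplified, hypothetical C sketch (not the LNET data structures). It assumes MEs on this portal are matched exactly on their 64-bit match bits with no ignore bits, so a lookup only walks one short bucket chain instead of the whole list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ME_HASH_BITS 8
#define ME_HASH_SIZE (1u << ME_HASH_BITS)

/* Hypothetical, simplified match entry: only the match bits matter here. */
struct me {
        uint64_t   match_bits;
        struct me *next;        /* chain within one hash bucket */
};

struct me_table {
        struct me *bucket[ME_HASH_SIZE];
};

/* Multiplicative hash folded down to ME_HASH_BITS bits. */
static unsigned int me_hash(uint64_t match_bits)
{
        return (unsigned int)((match_bits * 0x9E3779B97F4A7C15ULL) >>
                              (64 - ME_HASH_BITS));
}

/* Push the entry onto the head of its bucket chain. */
void me_insert(struct me_table *t, struct me *e)
{
        unsigned int h = me_hash(e->match_bits);

        assert(t != NULL && e != NULL);
        e->next = t->bucket[h];
        t->bucket[h] = e;
}

/* Search only the bucket the match bits hash to. */
struct me *me_lookup(const struct me_table *t, uint64_t match_bits)
{
        struct me *e;

        for (e = t->bucket[me_hash(match_bits)]; e != NULL; e = e->next)
                if (e->match_bits == match_bits)
                        return e;
        return NULL;
}
```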
- Bugzilla: 21259
Severity: normal
Description: udev rule to set /dev/obd perms 666
Details: Provide a udev rules file for Lustre, so that /dev/obd permissions are now 666.
- Bugzilla: 22301
Severity: normal
Description: lustre.lov error when backing up symlinks with extended attributes
Details: Improved logic in ll_listxattr()
- Bugzilla: 22187
Severity: normal
Description: properly handle null value for setfattr -n lustre.lov
Details: Running "setfattr -n trusted.lov ." caused a NULL dereference in ll_setxattr() because "value" was not checked for NULL. This command now resets the directory to the default striping.
- Bugzilla: 22319
Severity: normal
Description: skip statahead for NFSCLIENT
- Bugzilla: 22352
Severity: normal
Description: Kernel update for SLES9 2.6.5-7.322.
- Bugzilla: 22194
Severity: normal
Description: lfs quota output cleanup
Details: Suppress standard output in error cases
- Bugzilla: 22235
Severity: normal
Description: llapi_uuid_match() prints bogus error message on upgraded filesystem
Details:
1. Increase the "lfs df" column width to handle TB-sized devices cleanly
2. Allow matching OST names without trailing _UUID
3. Allow negating the "--obd" option to "lfs find"
4. Remove duplicate code in mntdf() iterating over MDTs/OSTs, and handle errors
- Bugzilla: 22241
Severity: normal
Description: call sync instead of fsync on local cancel to reduce stack usage
Details: sync_on_lock_cancel is needed for recovery when async journal is enabled, but we only need to make sure that metadata blocks have hit the journal, so a filesystem sync should be enough and should consume less stack (just create an empty handle and commit it).
- Bugzilla: 21686
Severity: normal
Description: simplify client disconnect code on server side
Details: This patch was reverted because we were chasing some regression. It is now safe to re-apply.
- Bugzilla: 22035
Severity: normal
Description: workaround patch
Details: disable the per-thread data (current->journal_info) containing the lock info during I/O to work around the issue in the short term
- Bugzilla: 22194
Severity: normal
Description: Print a dash in empty lfs quota grace columns
Details: Polish lfs quota output for easier processing with awk/sed
- Bugzilla: 21938
Severity: normal
Description: rq_invalid_rqset should be a bitfield
- Bugzilla: 19933
Severity: normal
Description: control DCACHE_LUSTRE_INVALID flag with MDS_INODELOCK_LOOKUP lock
Details: "DCACHE_LUSTRE_INVALID" is controlled by the "MDS_INODELOCK_LOOKUP" lock, which corresponds to "IT_LOOKUP"; do not skip invalidation for other intents.
- Bugzilla: 20997
Severity: normal
Description: Cannot send after transport shutdown
Details: Clear imp_vbr_failed flag upon eviction
- Bugzilla: 21938
Severity: normal
Description: use req->rq_set itself during recovery
Details: during recovery, use req->rq_set itself to replay the request instead of ptlrpcd_recovery_pc
- Bugzilla: 22069
Severity: normal
Description: introduce server major version for b1_8 and b2_0 quota utils interoperability
- Bugzilla: 21983
Severity: normal
Description: Use CFS_ALLOC_IO instead of _STD in llap_from_page_with_lockh
Details: During an ll_readahead under ll_readpage, we have seen OBD_SLAB_ALLOC hang under ldlm_pools_shrink when trying to lock a page that is already locked by the readahead code.
Using CFS_ALLOC_IO instead of CFS_ALLOC_STD will prevent ldlm_pools_shrink from actually freeing slab, so the call path that blocks indefinitely can never happen.
- Bugzilla: 22177
Severity: normal
Description: inc nlink by 2 instead of 1 in mds_orphan_add_link()
Details: Fix regression introduced by 19640. ext3_inc_count() can reset nlink to 1 when the directory is indexed and inode->i_nlink == 2. Work around the problem by incrementing nlink by 2 instead of 1.
- Bugzilla: 22095
Severity: normal
Description: MDS operations hang when issued with lfs setstripe on a degraded OST
Details: Change the locking order in mds_lookup()
- Bugzilla: 17258
Severity: normal
Description: fix error with make rpms after configure --disable-tests
Details: If one configures lustre with "--disable-tests", a subsequent "make rpms" fails because it still tries to package the lustre-tests RPM. Fixing this provided the opportunity to fix another wart, namely the subst'ing of the configure arguments into lustre.spec. Now they are passed as a value with "--define 'configure_args ...'" when calling rpmbuild.
- Bugzilla: 21726
Severity: normal
Description: stop waiting for next replay transno if shutting down
Details: if the system is shutting down, wake up the service thread blocked waiting for the next replay transno during recovery, so that all references held by queued requests can be dropped and the device can be stopped.
- Bugzilla: 21816
Severity: normal
Description: return approximate block/inode usage when OSTs are down
Details: Really return approximate block/inode usage when OSTs are down. The old version erroneously skipped oqctl copying on error which prevented this from working properly.
- Bugzilla: 20989
Severity: normal
Description: lov_merge_lvb()) ASSERTION(spin_is_locked(&lsm->lsm_lock)) failed
Details: Protect lli->lli_smd pointer updates with lli->lli_lock.
- Bugzilla: 21815
Severity: normal
Description: Avoid operating on lustre-hash internal structures directly.
- Bugzilla: 22097
Severity: normal
Description: mount.lustre fails to pass some options to mount()
- Bugzilla: 18649
Severity: normal
Description: set wait_recovery_complete() MAX value to max recovery time estimated
- Bugzilla: 21380
Severity: normal
Description: make dist seems to exclude the "darwin" bits
Details: Include all of the darwin bits in the distribution tarball created with make dist.
- Bugzilla: 21911
Severity: normal
Description: fix for double release of ibc_lock in o2iblnd
Details: Re-acquire ibc_lock in kiblnd_post_tx_locked(). Add an extra reference to conn before calling kiblnd_post_tx_locked() to prevent conn from disappearing inside kiblnd_post_tx_locked().
- Bugzilla: 17952
Severity: normal
Description: allow relative pathnames
Details: This patch allows one to give relative pathnames to --with-linux and friends.
- Bugzilla: 19336
Severity: normal
Description: post landing cleanups
Details: Remove the generic find_linux_devel_paths(): now that the rhel5 and sles method files each have their own version of this method, remove the hacky version that tried to work for both from lbuild. Remove a block of now-redundant code. Remove the comments from the target files describing what happened with this bug.
Align the sles10 and sles11 target files:
- include the rpmfix specifier in the sles10 file
- remove the EXTRA_VERSION_DELIMETER from the sles10 file
- change the TARGET_DELIMETER to FLAVOR_DELIMETER in the sles11 file
- Some whitespace cleanups.
- Bugzilla: 20433
Severity: normal
Description: decrease the usage of memory on clients.
Details: 1. On clients, recycle unused dentries and inodes.
2. Delete the code related to ll_deathrow (att 6215 in bug 1443). It is no longer useful.
- Bugzilla: 21137
Severity: major
Description: ext4 extent allocation is slower than in ext3
Details: Increase the default value of MB_DEFAULT_ORDER2_REQS to 8, enlarge ext4 preallocation table for 2048 4K blocks extents creation.
- Bugzilla: 22074
Severity: normal
Description: incorrect triggering of synchronous IO
Details: The OSC can mistakenly fall back to synchronous IO when the max_dirty_mb limit is reached and no write requests have yet been issued. This can occur when the dirty pages are spread over many files all of which are below the optimal request size.
- Bugzilla: 20383
Severity: normal
Description: fix errant m4 "dnl" usage
Details: Some dnl() usage seems to have been causing some errors in the resulting configure script.
- Bugzilla: 21829
Severity: normal
Description: fix broken llobdstat and add a counter parameter
Details: Need to make sure we limit the search for OBD stats files to the obdfilter subdirectory of "/proc/fs/lustre".
Add a counter argument to limit the number of items returned when using the interval parameter.
Fix lots of whitespace atrocities as well as better format some of the code.
- Bugzilla: 13520
Severity: normal
Description: PTLRPC_PAUSE_REQ checking should ignore PING.
- Bugzilla: 20355
Severity: normal
Description: Add $(PTHREAD_LIBS) to lctl and lfs build
Details: $(PTHREAD_LIBS) is needed to compile lctl and lfs for BG/P
- Bugzilla: 21919
Severity: normal
Description: Optimize quota_ctl operations by sending requests in parallel
Details: Based on a patch from Joseph Herring (LLNL).
Send MDS->OST quota_ctl requests in parallel, do not resend.
Compiled from two attachments in the ticket.
- Bugzilla: 18030
Severity: normal
Description: deadlock fix
Details: start the transaction earlier in llog_lvfs_destroy so that the transaction start and the inode mutex lock nest properly.
- Bugzilla: 21264
Severity: normal
Description: work around dd bus error
Details: A workaround for a buggy coreutils/gettext combination. Suppressing the dd transfer statistics keeps dd from calling the GNU gettext library, which avoids the crash.
- Bugzilla: 15057
Severity: minor
Description: fix file ownerships in lustre-modules RPM
Details: The files in the lustre-modules RPM were not being set with a correct owner and were therefore just using what was on the filesystem.
- Bugzilla: 21665
Severity: normal
Description: a small fix for "lfs osts"
Details: Actually, we don't want to traverse the directory tree, so return a positive value from sem_init to terminate the traversal before it starts.
- Bugzilla: 21882
Severity: normal
Description: handle SLV==1 on client side
Details: Initialize ldlm pool SLV to 0 on client side to handle SLV==1 obtained from server correctly
- Bugzilla: 21882
Severity: normal
Description: lru resize SLV can get stuck
Details: calculate SLV with greater precision so that small changes are not lost to integer math truncation; round up SLV only if the number of granted locks is less than the limit, so it does not get stuck at this SLV
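The actual ldlm pool SLV formula is not given in this entry, so the sketch below only illustrates the general effect of integer truncation and how fixed-point arithmetic preserves small changes. The function names and the 16.16 scaling are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FP_SHIFT 16     /* assumed number of fractional bits (16.16 fixed point) */

/* Low precision: the ratio num/den is truncated to a whole number first,
 * so any ratio between 1 and 2 collapses to 1 and the value never moves. */
uint64_t apply_ratio_int(uint64_t v, uint64_t num, uint64_t den)
{
        assert(den != 0);
        return v * (num / den);
}

/* Higher precision: keep FP_SHIFT fractional bits in the ratio.
 * Inputs are assumed small enough that v * ratio does not overflow. */
uint64_t apply_ratio_fixed(uint64_t v, uint64_t num, uint64_t den)
{
        uint64_t ratio;

        assert(den != 0);
        ratio = (num << FP_SHIFT) / den;
        return (v * ratio) >> FP_SHIFT;
}
```

For a 1% increase (num = 101, den = 100) applied to 1000, the truncating version returns 1000 unchanged, while the fixed-point version returns 1009.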
- Bugzilla: 21666
Severity: normal
Description: prevent use of OFED source dir instead of headers
Details: Try to determine whether the user is pointing configure at the OFED source directory instead of the devel/headers directory; if so, error out of configure and display an informative warning.
- Bugzilla: 19553
Severity: normal
Description: Ignore broken cancel_dirty_page() in OFED 1.4.1
Details: OFED 1.4.1 had a broken implementation of cancel_dirty_page for SLES10. This patch detects that and ignores the function if found.
- Bugzilla: 19336
Severity: normal
Description: Get rid of the EXTRA_VERSION_DELIMETER shenanigans
Details: We used to carry around a bunch of baggage in order to specify what kind of delimiter to put between the version and the "extra version". In fact, this should always be "-".
This patch includes some support for a build system developer to force an uncached rebuild of all products.
- Bugzilla: 21961
Severity: normal
Description: (17914) ignore trailing -mdc when determining index number
- Bugzilla: 21966
Severity: normal
Description: avoid divide-by-zero in lprocfs_rd_import()
- Bugzilla: 21953
Severity: normal
Description: use separate failover counter for each facet
- Bugzilla: 21147
Severity: normal
Description: call build_lqs only from generic_quota_on
- Bugzilla: 21259
Severity: normal
Description: "lfs check" is only allowed for root.
Details: Code cleanup around obd_class_*() functions and sanity test for non-root lfs check
- Bugzilla: 21632
Severity: normal
Description: Kernel update to OEL5.4 2.6.18-164.11.1.0.1.el5.
- Bugzilla: 21686
Severity: normal
Description: fail the request if its obd_device is stopping
Details: in ldlm_handle_enqueue, the request should be failed if its obd_device has been marked as "fail" (obd_fail=1), which is set during umount.
- Bugzilla: 21815
Severity: normal
Description: lustre_hash_rehash_key() should use lh_read_unlock()
Details: lh_read_lock() is a no-op if rehash is disabled, so lh_read_unlock() should be used in this function. This should not have any consequence, but it is better to fix it.
- Bugzilla: 21815
Severity: normal
Description: move assertion under write lock
- Bugzilla: 21815
Severity: normal
Description: print more debug info in lustre_hash_exit when assertion fails
- Bugzilla: 19405
Severity: normal
Description: do not flag a request as rq_replay for non replayable imports
- Bugzilla: 21906
Severity: normal
Description: LBUG doesn't print stack trace on sles9 because show_stack not exported
Changes from v1.8.1.1 to v1.8.2
Support for networks:
- socklnd - any kernel supported by Lustre™
- qswlnd - Qsnet kernel modules 5.20 and later
- openiblnd - IbGold 1.8.2
- o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, 1.4.1 and 1.4.2
- viblnd - Voltaire ibhost 3.4.5 and later
- ciblnd - Topspin 3.2.0
- iiblnd - Infiniserv 3.3 + PathBits patch
- gmlnd - GM 2.1.22 and later
- mxlnd - MX 1.2.10 or later
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Support for kernels:
- 2.6.16.60-0.42.8 (SLES 10)
- 2.6.27.39-0.3.1 (SLES11, i686 & x86_64 only)
- 2.6.18-164.11.1.el5 (RHEL 5)
- 2.6.18-164.6.1.0.1.el5 (OEL 5)
Client support for unpatched kernels: (see Patchless Client)
- 2.6.16 - 2.6.30 vanilla (kernel.org)
Recommended e2fsprogs version: 1.41.6.sun1
The async journal commit feature (bug 19128) and the cancel lock before replay feature (bug 16774) are disabled by default.
- Bugzilla: 21459
Severity: minor
Description: should update lp_alive for non-router peers.
- Bugzilla: 15332
Severity: enhancement
Description: LNet router shuffler.
- Bugzilla: 15332
Severity: enhancement
Description: LNet fine grain routing support.
- Bugzilla: 20171
Severity: normal
Description: router checker stops working when system wall clock goes backward
Details: use monotonic timing source instead of system wall clock time.
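A user-space sketch of the idea using POSIX clock_gettime() (the in-kernel router checker uses kernel time helpers instead, which are not shown here): intervals measured on CLOCK_MONOTONIC do not jump when the wall clock is set backward, so a "time since last check" computation cannot go negative. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds elapsed since *start, both read from CLOCK_MONOTONIC.
 * Unlike CLOCK_REALTIME, this clock does not jump when the system
 * date is set (e.g. with settimeofday()). */
double seconds_since(const struct timespec *start)
{
        struct timespec now;
        int rc = clock_gettime(CLOCK_MONOTONIC, &now);

        assert(rc == 0);
        (void)rc;       /* avoid an unused warning when NDEBUG is set */
        return (double)(now.tv_sec - start->tv_sec) +
               (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}
```

The start timestamp must also come from CLOCK_MONOTONIC; mixing it with a wall-clock reading would reintroduce the problem.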
- Bugzilla: 18460
Severity: enhancement
Description: avoid asymmetrical router failures
- Bugzilla: 19735
Severity: enhancement
Description: multiple-instance support for kptllnd
- Bugzilla: 20897
Severity: normal
Description: ksocknal_close_conn_locked connection race
Details: A race was possible when ksocknal_create_conn calls ksocknal_close_conn_locked for already closed conn.
- Bugzilla: 13065
Severity: enhancement
Description: port router pinger to userspace
- Bugzilla: 17546
Severity: normal
Description: kptllnd HELLO protocol deadlock
Details: kptllnd HELLO protocol doesn't run to completion in finite time
- Bugzilla: 18075
Severity: normal
Description: LNet selftest fixes and enhancements
- Bugzilla: 19156
Severity: enhancement
Description: allow a test node to be a member of multiple test groups
- Bugzilla: 18654
Severity: enhancement
Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution
Details: an update from the upstream developer Scott Atchley.
- Bugzilla: 21632
Severity: enhancement
Description: Update RHEL5.4 kernel to 2.6.18-164.11.1.el5 and OEL5.4 kernel to 2.6.18-164.11.1.0.1.el5.
Severity: enhancement
Description: Update SLES11 kernel to 2.6.27.39-0.3.1.
- Bugzilla: 20758
Severity: enhancement
Description: Update supported SLES10 kernel to 2.6.16.60-0.42.8.
- Bugzilla: 20773
Severity: enhancement
Description: Update kernel to RHEL5.4 2.6.18-164.6.1.el5 and OEL5 2.6.18-164.6.1.0.1.el5 (both with in-kernel OFED enabled).
- Bugzilla: 16312
Severity: enhancement
Description: Build kernels (RHEL5, OEL5 and SLES10/11) using the vendor's own kernel spec file.
- Bugzilla: 19808
Severity: enhancement
Description: Vanilla kernel 2.6.30 patchless client support.
- Bugzilla: 20892
Severity: major
Frequency: rare
Description: bad entry in directory xxx: inode out of bounds
Details: fix locking issue in the rename path which could race with any other operations updating the same directory.
- Bugzilla: 20722
Severity: enhancement
Description: Make watchdog timer messages clearer and more descriptive.
- Bugzilla: 21489
Severity: normal
Description: cp -p command does not preserve the dates and timestamp
Details: mtime could be spoiled by a write callback
- Bugzilla: 21513
Severity: normal
Description: Clear imp_force_reconnect correctly in ptlrpc_connect_interpret()
- Bugzilla: 21259
Severity: normal
Description: Allow non-root access for "lfs check".
Details: Added a check in obd_class_ioctl() for OBD_IOC_PING_TARGET.
- Bugzilla: 19763
Severity: enhancement
Description: quotacheck performance/scaling issues
Details: reduce quotacheck time on empty filesystem by skipping uninit group.
- Bugzilla: 20200
Severity: enhancement
Description: Enhancement for lfs(1) command to use numeric uid/gid.
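The entry does not say which lfs subcommands are affected. Assuming the enhancement means lfs accepts a numeric uid where a user name was previously required, a minimal, hypothetical C sketch of such argument parsing could look like this (the function name is illustrative, not the lfs source):

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical parser: accept a numeric uid ("500") or a user name.
 * Purely numeric strings are used as-is, without a passwd lookup, so
 * IDs with no local account still work. Returns 0 on success, -1 if
 * the argument is neither a valid number nor a known user name. */
int parse_uid_arg(const char *arg, uid_t *uid)
{
        char *end;
        unsigned long val;
        struct passwd *pw;

        assert(arg != NULL && uid != NULL);

        if (isdigit((unsigned char)arg[0])) {
                errno = 0;
                val = strtoul(arg, &end, 10);
                /* whole string must be digits and fit in uid_t */
                if (errno == 0 && *end == '\0' && val == (uid_t)val) {
                        *uid = (uid_t)val;
                        return 0;
                }
        }

        pw = getpwnam(arg);
        if (pw == NULL)
                return -1;
        *uid = pw->pw_uid;
        return 0;
}
```

The same pattern applies to gids with getgrnam().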
- Bugzilla: 19325
Severity: enhancement
Description: Adjust locks' extents on their first enqueue, so that at the time they get granted, there is no need for another pass through the queues since they are already shaped into the proper forms.
- Bugzilla: 20020
Severity: normal
Description: Fix mds_shrink_intent_reply()/mds_intent_policy() to pass correct arguments and prevent LBUG() in lustre_shrink_reply_v2().
- Bugzilla: 19689
Severity: normal
Description: Change tunefs.lustre and mkfs.lustre --mountfsoptions so that exactly the specified mount options are used. Leaving off any "mandatory" mount options is an error. Leaving off any default mount options causes a warning, but is allowed. Change errors=remount-ro from mandatory to default. Sanitize the mount string before storing it. Update man pages accordingly.
- Bugzilla: 20302
Severity: normal
Description: mds_getattr() should return 0 even if mds_fid2entry() fails with -ENOENT. Also fix ptlrpc_expire_one_request() to print a signed time difference.
- Bugzilla: 19662
Severity: enhancement
Description: Remove set_info(KEY_UNLINKED) from MDS/OSC
- Bugzilla: 16774
Severity: enhancement
Description: Clients can replay thousands of unused locks during recovery
Details: Don't replay unused locks (only read locks for now) during recovery. This feature is disabled by default and can be enabled by running the following command on the clients: lctl set_param ldlm.cancel_unused_locks_before_replay=1
- Bugzilla: 19526
Severity: normal
Description: can't stat file in some situation.
Details: improve initialization of OSC data when a target is added to the MDS, and add the ability to resend a getattr request that is too big if the client doesn't have info about the OST.
- Bugzilla: 19566
Severity: normal
Description: Prevent inconsistencies between linux and lustre mount structures.
Details: Wait indefinitely in server_wait_finished() until mnt_count drops. Make the sleep interruptible.
- Bugzilla: 18539
Severity: enhancement
Description: Communicate OST degraded/readonly state via statfs to MDS
Details: Flags in the statfs returned from OSTs indicate whether the OST is in a degraded RAID state, or if the filesystem has turned read-only after a filesystem error is detected.
- Bugzilla: 20122
Severity: normal
Frequency: rare
Description: don't panic if EPROTO was hit when reading symlink
Details: correctly handling request reference in error cases.
- Bugzilla: 17545
Severity: normal
Frequency: common
Description: open sometimes returns ENOENT instead of EACCES
Details: Permission checking should be part of the open phase of mds_open, not the lookup phase, so the server should set the DISP_OPEN_OPEN disposition before starting the permission check. Also, there is no need to revalidate the dentry if the client already has the LOOKUP lock.
- Bugzilla: 19854
Severity: normal
Frequency: on servers with multiple network interfaces
Description: enable client interface failover
Details: When a client reconnects from another NID, properly update the export NID hash position and the ldlm reverse import.
- Bugzilla: 18801
Severity: enhancement
Description: implemented direct I/O with arbitrary (nonaligned) memory addresses and file offsets.
- Bugzilla: 18948
Severity: enhancement
Description: added more recovery timeout options.
- Bugzilla: 16267
Severity: enhancement
Description: added llapi_file_open, llapi_file_create, llapi_file_get_stripe man pages.
- Bugzilla: 19529
Severity: normal
Frequency: only on systems with clients writing to an OST on the same node
Description: Avoid deadlock for local client writes
Details: Use new OBD_BRW_MEMALLOC flag to notify OST about writes in the memory freeing context. This allows OST threads to set the PF_MEMALLOC flag on task structures in order to allocate memory from reserved pools and complete IO. Use GFP_HIGHUSER for OST allocations for non-local client writes, so that the OST threads generate memory pressure and allow inactive pages to be reclaimed.
- Bugzilla: 18380
Severity: normal
Frequency: rare
Description: lock ordering violation between &cli->cl_sem and _lprocfs_lock
Details: move ldlm namespace creation into the setup phase to avoid grabbing _lprocfs_lock with cli_sem held
- Bugzilla: 18624
Severity: normal
Frequency: only during format of test systems
Description: Unable to run several mkfs.lustre on loop devices at the same time
Details: mkfs.lustre returns error 256 when formatting loop devices concurrently. The solution is to handle the error properly.
- Bugzilla: 18357
Severity: enhancement
Description: implement an async create (obd_async_create) method for osc, to avoid waiting too long for new OST objects while holding an ldlm lock.
- Bugzilla: 18674
Severity: normal
Frequency: occasionally during network problems
Description: client not allowed to reconnect to OST because of active request
Details: abort bulk requests received by the OST once the client has timed out, since the client will resend the request anyway. The client also now retries reconnecting to the same server if a connect request failed with -EBUSY or -EAGAIN.
- Bugzilla: 18382
Severity: normal
Frequency: rare, when a widely striped file is used and one OST is down.
Description: don't return an error if only a subset of the objects for a file was created.
Details: lov_update_create_set() uses set->set_success as the index for created objects, so if some requests fail, the hole is at the end of the array and qos_shrink_lsm can be used to allocate a correct lsm.
- Bugzilla: 20978
Severity: normal
Description: Slow stale export processing during normal start up
Details: The global mgc lock prevents OST setup from running in parallel. Replace the global lock with a per-config_llog_data semaphore.
- Bugzilla: 19128
Severity: normal
Description: Out of order replies might be lost on replay
Details: In ptlrpc_retain_replayable_request, if we cannot find a retained request with a tid smaller than the one currently being added, add it to the start, not the end, of the list.
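The list discipline described above can be sketched as follows. The struct and function names are hypothetical and this is not the ptlrpc code; it only shows the ordering rule: walk backward from the tail, insert after the last entry with a smaller tid, and fall back to the head (not the tail) when no such entry exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical doubly linked node standing in for a retained request. */
struct req {
        uint64_t    tid;
        struct req *prev, *next;
};

struct req_list {
        struct req *head, *tail;
};

/* Insert r so the list stays sorted by ascending tid. */
void retain_sorted(struct req_list *l, struct req *r)
{
        struct req *p = l->tail;

        /* Walk backward past every entry whose tid is not smaller. */
        while (p != NULL && p->tid >= r->tid)
                p = p->prev;

        if (p == NULL) {                /* no smaller tid: insert at head */
                r->prev = NULL;
                r->next = l->head;
                if (l->head != NULL)
                        l->head->prev = r;
                else
                        l->tail = r;
                l->head = r;
        } else {                        /* insert right after p */
                r->prev = p;
                r->next = p->next;
                if (p->next != NULL)
                        p->next->prev = r;
                else
                        l->tail = r;
                p->next = r;
        }
}
```

Appending at the tail in the "no smaller tid" case would place the oldest request last, which is exactly the ordering error the fix addresses.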
- Bugzilla: 19557
Severity: normal
Description: BUG: soft lockup - CPU#1 stuck for 10s! [ll_mdt_07:4523]
Details: add cond_resched() calls to avoid hogging the CPU for too long in the hash code. Also make lustre_hash_for_each_empty() more efficient.
- Bugzilla: 17682
Severity: enhancement
Description: Performance improvements for debug messages with D_RPCTRACE, D_LDLM, D_QUOTA options.
- Bugzilla: 20989
Severity: normal
Frequency: only with NFS export
Description: (lov_merge.c:74:lov_merge_lvb()) ASSERTION(spin_is_locked(&lsm->lsm_lock)) failed (SR 71691004)
Details: Fix a race in the nfs export code by populating inode info while the new inode is still locked
- Bugzilla: 11680
Severity: enhancement
Description: add a new file in procfs called force_lbug. Writing to this file triggers an LBUG. For testing purposes only.
- Bugzilla: 18213
Severity: normal
Description: OOM killer causes node hang
Details: really interrupt the sleep in osc_enter_cache on signals
- Bugzilla: 18630
Severity: normal
Description: LustreError: 9153:0:(quota_context.c:622:dqacq_completion()) LBUG
Details: fix race during quota release on the slave.
- Bugzilla: 18690
Severity: enhancement
Description: smaller hash bucket sizes, cleanups
Details: increase hash table sizes and enable rehashing for pools, uuid, nid & per-nid stats.
- Bugzilla: 19673
Severity: enhancement
Description: Add ldiskfs maxdirsize mount option
Details: add max_dir size mount option
- Bugzilla: 20139
Severity: normal
Description: panic in ll_statahead_thread
Details: prevent the parent thread from being killed before its child
- Bugzilla: 20301
Severity: normal
Frequency: only with 16TB device
Description: unable to perform "mount -t lustre" of 16TB OST device
Details: Mounting 16TB LUNs failed due to three bugs in mkfs.lustre.
- Bugzilla: 20456
Severity: normal
Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0) failed
Details: unregistering should be zero if no RPC is in flight.
- Bugzilla: 20607
Severity: normal
Description: hyperion: Oops during metabench
Details: Correct the refcount of lov_request_set
- Bugzilla: 20617
Severity: enhancement
Description: Add mptlinux and nxge drivers to Lustre builds
- Bugzilla: 20722
Severity: enhancement
Description: Fix watchdog timer message to be more clear
Details: Make watchdog timer messages more clear and descriptive.
- Bugzilla: 21396
Severity: normal
Description: LNET soft lockups in socknal_cd thread
Details: don't hog the CPU while actively connecting if another connd is accepting connection requests from the same peer
- Bugzilla: 21411
Severity: normal
Description: recovery-small test_17 hang
Details: Land several AT improvements & fixes.
- Bugzilla: 21420
Severity: normal
Description: MDS panic and hanging client processes
Details: Replace exp_ops_stats with exp_nid_stats->nid_stats
- Bugzilla: 21471
Severity: normal
Description: OSS stuck in recovery.
Details: fix race during recovery. class_unlink_export, class_set_export_delayed and target_queue_last_replay_reply may race while increasing/decreasing obd_recoverable_clients and obd_delayed_clients, causing recovery to wait forever.
- Bugzilla: 21547
Severity: enhancement
Description: add cascading_rw.c to lustre/tests
- Bugzilla: 21565
Severity: normal
Description: filter_last_id() NULL deref
Details: lprocfs_filter_rd_last_id() should check for the fully setup obd device, before proceeding further.
- Bugzilla: 21571
Severity: enhancement
Description: Loadgen improvements
Details: stacksize and locking fixes for loadgen
- Bugzilla: 21656
Severity: normal
Description: Quiet CERROR("dirty %d > system dirty_max %d\n"
Details: The atomic_read() and the atomic_inc() are not covered by a lock, so they may race harmlessly and trip this CERROR() unless a small fudge factor (+1) is added.
- Bugzilla: 21800
Severity: enhancement
Description: shrink_slab: nr=-9223362083340912175
Details: fix spurious message from shrink_slab reporting negative nr
- Bugzilla: 21681
Severity: normal
Description: Quiet bogus previously committed transno error
Details: suppress the "server went back in time" error message which is always printed even in the common case after a client eviction
- Bugzilla: 20065
Severity: enhancement
Description: Parallel statfs() calls result in client eviction
Details: cache statfs data for 1s.
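A minimal sketch of a one-second statfs cache, assuming whole-second timestamps; the structure and helper names are hypothetical and real statfs data has many more fields. Parallel callers within the same second reuse the cached result instead of each sending a new request.

```c
#include <assert.h>
#include <time.h>

#define STATFS_CACHE_MAX_AGE 1  /* seconds, as in the change description */

/* Hypothetical cache entry with two sample fields. */
struct statfs_cache {
        unsigned long long blocks_free;
        unsigned long long files_free;
        time_t             fetched_at;  /* time the data was last fetched */
        int                valid;       /* 0 until the first fetch */
};

/* Store freshly fetched values in the cache. */
void statfs_cache_store(struct statfs_cache *c, unsigned long long blocks_free,
                        unsigned long long files_free, time_t now)
{
        assert(c != NULL);
        c->blocks_free = blocks_free;
        c->files_free = files_free;
        c->fetched_at = now;
        c->valid = 1;
}

/* Return 1 if the cached data may be used at time now, 0 if a new
 * statfs request should be sent. A clock that moved backward
 * (now < fetched_at) is treated as stale. */
int statfs_cache_fresh(const struct statfs_cache *c, time_t now)
{
        assert(c != NULL);
        return c->valid && now >= c->fetched_at &&
               now - c->fetched_at < STATFS_CACHE_MAX_AGE;
}
```

With second granularity, "1s" here means the data is reused only within the same second in which it was fetched.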
- Bugzilla: 21574
Severity: normal
Description: parallel-scale test_compilebench: @@@@@@ FAIL: compilebench failed: 1
Details: fix several issues in the pinger code that caused clients not to ping servers for too long, resulting in evictions.
- Bugzilla: 21564
Severity: normal
Description: e2fsck should warn when MMP update interval is extended
Details: print mmp_check_interval and make it possible to abort mount operation in case it takes too long.
- Bugzilla: 21595
Severity: normal
Description: mdsrate-create-large.sh, BUG: soft lockup - CPU#0 stuck for 10s!
Details: fix bug in the RHEL5's jbd2 callback patch.
- Bugzilla: 21828
Severity: normal
Description: drop number of active requests when queued for recovery
Details: Now that we take a reference on the original request instead of making a copy of it for recovery, we need to drop the number of active requests; otherwise the queued requests will prevent all request processing once they exceed (srv->srv_threads_running - 1).
- Bugzilla: 21826
Severity: enhancement
Description: refuse to invalidate operational quota files when they are in use
Details: an attempt to invalidate operational quota files on the quota master is not actually permitted by VFS (returning -EPERM), but we should not depend on that and should return the error earlier.
- Bugzilla: 21406
Severity: normal
Description: Applications stuck in jbd2_log_wait_commit during exit
Details: fix deadlock between kjournald2 trying to acquire the page lock owned by an ost_io thread waiting for journal commit.
Changes from v1.8.1 to v1.8.1.1
Support for networks:
- socklnd - any kernel supported by Lustre™
- qswlnd - Qsnet kernel modules 5.20 and later
- openiblnd - IbGold 1.8.2
- o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3 and 1.4.1
- viblnd - Voltaire ibhost 3.4.5 and later
- ciblnd - Topspin 3.2.0
- iiblnd - Infiniserv 3.3 + PathBits patch
- gmlnd - GM 2.1.22 and later
- mxlnd - MX 1.2.1 or later
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Support for kernels:
- 2.6.16.60-0.42.4 (SLES 10)
- 2.6.27.29-0.1 (SLES11, i686 & x86_64 only)
- 2.6.18-128.7.1.el5 (RHEL 5)
Client support for unpatched kernels: (see Patchless Client)
- 2.6.16 - 2.6.27 vanilla (kernel.org)
Recommended e2fsprogs version: 1.41.6.sun1
File join has been disabled in this release, refer to bugzilla 16929
NFS export is disabled when the stack size is < 8192, since NFSv4 export of a Lustre file system with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630
ext4 support for RHEL5 is experimental and thus should not be used in production.
- Bugzilla: 20539
Severity: enhancement
Description: Add OEL5 support.
- Bugzilla: 19848
Severity: enhancement
Description: Update kernel to SLES11 2.6.27.29-0.1.
- Bugzilla: 20560
Severity: major
Description: File checksum failures with OST read cache on
Details: Disable page poisoning when the bulk transfer has to be aborted because the client got evicted.
- Bugzilla: 19557
Severity: normal
Description: Don't allow the OSC next ID to step backward when it is assigned.
Details: a race between next ID allocation and the ll_sync thread could set a wrong OSC next ID and kill valid OST objects.
- Bugzilla: 20400
Severity: enhancement
Description: Update kernel to RHEL5 2.6.18-128.7.1.el5.
- Bugzilla: 20758
Severity: enhancement
Description: Update kernel to SLES10 SP2 2.6.16.60-0.42.4.
- Bugzilla: 20533
Severity: normal
Description: Changes in raid5-large-io-rhel5.patch to calculate sectors properly
- Bugzilla: 20533
Severity: normal
Description: Increase the default BLK_DEF_MAX_SECTORS value for RHEL5 and SLES11
- Bugzilla: 20482
Severity: normal
Description: Do not send statfs() requests to OSTs disabled by administrator.
Details: Check in lov_prep_statfs_set() for non-NULL ltd_exp.
- Bugzilla: 20482
Severity: normal
Description: Error handling in osc_statfs_interpret() has been improved.
Details: Check in osc_statfs_interpret() for EBADR.
- Bugzilla: 20146
Severity: normal
Description: Do not update ctime for the deleted inode.
Details: Check in mds_reint_unlink() before calling fsfilt_setattr().
- Bugzilla: 20146
Severity: normal
Description: Increase the size of the LDLM resource hash.
Details: Bump RES_HASH_BITS up to 12.
- Bugzilla: 19934
Severity: normal
Description: correctly send lsm on open replay
Details: The MDS trusts the LSM size on open replay, but the client can set the wrong size for the LSM buffer.
- Bugzilla: 20321
Severity: normal
Description: Deadlock between filter_destroy() and filter_commitrw_write().
Details: filter_destroy() does not hold the DLM lock over the whole operation. If the DLM lock is dropped, filter_commitrw() can go through, causing the deadlock between page lock and i_mutex. The i_alloc_sem should also be held in filter_destroy() while truncating the file.
- Bugzilla: 20008
Severity: normal
Description: truncate starts GFP_FS allocation under transaction causing deadlock
Details: ldiskfs_truncate calls grab_cache_page, which may start a page allocation under an open transaction. This may lead to calling prune_icache with consequent Lustre re-entrance.
- Bugzilla: 20318
Severity: normal
Frequency: only when down/upgrading the MDS to 1.6/1.8 while 1.8 clients are still up and when the OST pool feature is used
Description: interop testing got LBUG when run dd with OST pool :LustreError: 30032:0:(llite_lib.c:1913:ll_replace_lsm()) LBUG
Details: down/upgrading the MDS to a version that doesn't/does support OST pool can cause clients to crash because the lsm has changed behind their back.
- Bugzilla: 20550
Severity: normal
Description: missing tree_status on 1.8.1 RPM build
Details: 'make rpms' failed because the tree_status file was missing.
- Bugzilla: 19551
Severity: normal
Description: continuing LustreError "mds adjust qunit failed!"
Details: don't print message on the console when ->adjust_qunit fails.
- Bugzilla: 18618
Severity: normal
Description: don't increase ldlm timeout if previous client was evicted
Details: if a client doesn't respond to a blocking callback within the adaptive ldlm enqueue timeout, don't adjust the adaptive estimate when the lock is next granted.
- Bugzilla: 20518
Severity: normal
Description: ost is being unmounted w/o all writes to last_rcvd landing on disk. affects recovery negatively.
Details: make sure all exports have been properly destroyed by the zombie thread before stopping the target.
- Bugzilla: 20205
Severity: normal
Description: Performance degradation with O_DIRECT between 1.6 & 1.8.1 b190
Details: disable write barrier for ext4/SLES11.
- Bugzilla: 18571
Severity: normal
Description: Kernel panic - not syncing: Out of memory and no killable processes... on OSS when iozone
Details: fix memory leak in the journal checksum patch.
- Bugzilla: 18793
Severity: normal
Description: group quota "too many blocks" OSS crashes
Details: we should keep the same uid/gid for lquota_chkquota() and lquota_pending_commit()
- Bugzilla: 18630
Severity: normal
Description: LustreError: 9153:0:(quota_context.c:622:dqacq_completion()) LBUG
Details: don't LBUG on release quota error. Just a workaround until the problem is understood.
Changes from v1.8.0.1 to v1.8.1
Support for networks:
- socklnd - any kernel supported by Lustre
- qswlnd - Qsnet kernel modules 5.20 and later
- openiblnd - IbGold 1.8.2
- o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3 and 1.4.1
- viblnd - Voltaire ibhost 3.4.5 and later
- ciblnd - Topspin 3.2.0
- iiblnd - Infiniserv 3.3 + PathBits patch
- gmlnd - GM 2.1.22 and later
- mxlnd - MX 1.2.1 or later
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Support for kernels:
- 2.6.16.60-0.39.3 (SLES 10)
- 2.6.27.23-0.1 (SLES11, i686 & x86_64 only)
- 2.6.18-128.1.14.el5 (RHEL 5)
Client support for unpatched kernels: (see Patchless Client)
- 2.6.16 - 2.6.27 vanilla (kernel.org)
Recommended e2fsprogs version: 1.41.6.sun1
File join has been disabled in this release, refer to bugzilla 16929
NFS export is disabled when the stack size is < 8192, since NFSv4 export of a Lustre filesystem with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630
ext4 support for RHEL5 is experimental and thus should not be used in production.
- Bugzilla: 18102
Severity: normal
Description: router_proc.c is rewritten to use sysctl-interface for parameters residing in /proc/sys/lnet
- Bugzilla: 18075
Severity: normal
Description: LNet selftest fixes and enhancements
- Bugzilla: 18654
Severity: enhancement
Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution
Details: an update from the upstream developer Scott Atchley.
- Bugzilla: 15332
Severity: enhancement
Description: add a new LND option to control peer buffer credits on routers
- Bugzilla: 18844
Severity: normal
Description: Fixing deadlock in usocklnd
Details: A deadlock was possible in usocklnd due to a race condition while tearing a connection down. The problem resulted from the erroneous assumption that lnet_finalize() could be called while holding some LND-level locks.
Severity: major
Description: Protocol V2 of o2iblnd
Details: o2iblnd V2 has several new features:
- map-on-demand: map-on-demand is disabled by default. It can be enabled with the module parameter "map_on_demand=@value@", where @value@ must be >= 0 and < 256; 0 disables map-on-demand and any other valid value enables it.
- o2iblnd will create an FMR or physical MR for RDMA if the number of fragments in the RD is > @value@.
- Enabling map-on-demand takes less memory per new connection, but a little more CPU for RDMA.
- iWARP: to support iWARP, please enable map-on-demand; 32 and 64 are the recommended values. iWARP will probably fail for values >= 128.
- OOB NOOP message: to resolve a deadlock on the router.
- tunable peer_credits_hiw (high water mark for returning credits): the default value of peer_credits_hiw is (peer_credits - 1); the user can set it between peer_credits/2 and (peer_credits - 1). A lower value is recommended for high-latency networks.
- tunable message queue size: it always equals peer_credits; a higher value is recommended for high-latency networks.
- It is compatible with earlier versions of o2iblnd.
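The documented ranges for map_on_demand and peer_credits_hiw can be written as a small validation sketch. The function names are hypothetical, and this is not o2iblnd's own parameter handling. Treating a peer_credits_hiw request of 0 as "not set" is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* map_on_demand: 0 disables it; 1..255 enable it (documented range 0..255). */
static bool map_on_demand_valid(int value)
{
    return value >= 0 && value < 256;
}

/* True when map-on-demand is enabled and the RD has more fragments than
 * the threshold, i.e. when an FMR or physical MR would be created. */
static bool map_on_demand_applies(int map_on_demand, int nfrags)
{
    assert(map_on_demand_valid(map_on_demand));
    return map_on_demand != 0 && nfrags > map_on_demand;
}

/* Resolve peer_credits_hiw for a given peer_credits. A request of 0 means
 * "not set" (an assumption of this sketch) and yields the documented default
 * of peer_credits - 1; otherwise the request must lie in
 * [peer_credits / 2, peer_credits - 1]. Returns -1 if out of range. */
static int resolve_peer_credits_hiw(int peer_credits, int requested)
{
    if (requested == 0)
        return peer_credits - 1;
    if (requested < peer_credits / 2 || requested > peer_credits - 1)
        return -1;
    return requested;
}
```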
- Bugzilla: 18414
Severity: normal
Description: Fixing 'running out of ports' issue
Details: Add a delay before next reconnect attempt in ksocklnd in the case of lost race. Limit the frequency of query-requests in lnet. Improved handling of 'dead peer' notifications in lnet.
- Bugzilla: 16034
Severity: normal
Description: Change ptllnd timeout and watchdog timers
Details: Add ptltrace_on_nal_failed and bump ptllnd timeout to match Portals wire timeout.
- Bugzilla: 16186
Severity: normal
Description: One down Lustre FS hangs ALL mounted Lustre filesystems
Details: Shared routing enhancements - peer health detection.
- Bugzilla: 11245
Severity: minor
Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off
Details: See comment 46 in bug 11245 for details - it's indeed a bug introduced by the original 11245 fix.
- Bugzilla: 15984
Severity: minor
Description: uptllnd credit overflow fix
Details: kptl_msg_t::ptlm_credits could be overflowed by uptllnd since it is only a __u8.
- Bugzilla: 14634
Severity: major
Description: socklnd protocol version 3
Details: With the current protocol V2, connections on a router can be blocked and unable to receive any incoming messages when no router buffers are left, so a ZC-ACK can't be handled (the LNet message can't be finalized), causing a deadlock on the router. Protocol V3 has a dedicated connection for emergency messages such as ZC-ACK to the router; messages on this dedicated connection don't need any credits, so they are never blocked. V3 can also send a keepalive ping at a specified period for router health checking.
- Bugzilla: 18192
Severity: minor
Frequency: in recovery
Description: don't mix llog inodes with normal.
Details: allocate inodes for log in last inode group
- Bugzilla: 20321
Severity: normal
Description: Deadlock between filter_destroy() and filter_commitrw_write().
Details: filter_destroy() does not hold the DLM lock over the whole operation. If the DLM lock is dropped, filter_commitrw() can go through, causing the deadlock between page lock and i_mutex.
- Bugzilla: 19847
Severity: enhancement
Description: Update
- Bugzilla: 20020
Severity: normal
Frequency: with 1.8 server and 1.6 clients
Description: correctly shrink the reply to avoid sending too big a message to the client.
Details: The 1.8 MDS allocates too big a buffer for LOV EA data, and this causes problems when sending the reply to a 1.6 client.
- Bugzilla: 19917
Severity: normal
Description: Repeated atomic allocation failures.
Details: Use GFP_HIGHUSER | __GFP_NOMEMALLOC flags for memory allocations to generate memory pressure and allow reclaiming of inactive pages, while not allowing the emergency pools to be exhausted. For local clients the use of GFP_NOFS will be introduced in 1.8.2.
Severity: enhancement
Description: Update kernel to RHEL5 2.6.18-128.1.14.el5.
Severity: enhancement
Description: Add support for SLES11 2.6.27.23-0.1.
- Bugzilla: 14250
Severity: enhancement
Description: Update client support to vanilla kernels up to 2.6.27.
- Bugzilla: 19212
Severity: enhancement
Description: Update kernel to SLES10 SP2 2.6.16.60-0.37.
- Bugzilla: 15981
Severity: enhancement
Description: Compile with -Werror by default for i686 and x86_64.
- Bugzilla: 19528
Severity: normal
Description: resolve race between obd_disconnect and class_disconnect_exports
Details: if obd_disconnect is called on an already disconnected export, it forgets to release one reference and the osc module can't be unloaded.
- Bugzilla: 19293
Severity: enhancement
Description: move AT tunable parameters for more consistent usage
Details: add AT tunables under /proc/sys/lustre, add to conf_param parsing
- Bugzilla: 19223
Severity: normal
Description: correctly skip time estimate if in recovery
Details: rq_send_state isn't a bitmask, so using bitwise operations on it is forbidden.
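With a sequentially numbered state, bitwise tests go wrong because state & X can be non-zero for a state other than X. The enum below is illustrative only. Its names and values are hypothetical and are not Lustre's import-state definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, sequentially numbered import states (not a bitmask). */
enum imp_state {
    IMP_REPLAY  = 5,   /* 0b0101 */
    IMP_RECOVER = 8,   /* 0b1000 */
    IMP_FULL    = 9,   /* 0b1001 */
};

/* Wrong: treats the enum as a bit set; FULL (9) & REPLAY (5) == 1. */
static bool in_replay_buggy(enum imp_state s)
{
    return (s & IMP_REPLAY) != 0;
}

/* Right: compare for equality (or switch over the states). */
static bool in_replay(enum imp_state s)
{
    return s == IMP_REPLAY;
}
```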
- Bugzilla: 18399
Severity: normal
Description: OSS DeadLock
Details: Use trylock to prevent a deadlock when shrinking the icache.
- Bugzilla: 18688
Severity: enhancement
Description: Allow tuning service thread via /proc
Details: For each service a new /proc/fs/lustre/{service}/*/thread_{min,max,started} entry is created that can be used to set min/max thread counts, and get the current number of running threads.
- Bugzilla: 18798
Severity: enhancement
Description: Add state history info file, enhance import info file
Details: Track import connection state changes in a new osc/mdc proc file; add overview-type data to the osc/mdc import proc file.
- Bugzilla: 18645
Severity: normal
Description: Reduce small size read RPC
Details: Set a read-ahead limit for every file, and only do read-ahead when the available read-ahead pages are bigger than 1M, to avoid small read RPCs.
- Bugzilla: 18204
Severity: normal
Description: free_entry erroneously used groups_free instead of put_group_info
- Bugzilla: 17817
Severity: enhancement
Description: Make read-ahead stripe size aligned.
- Bugzilla: 17536
Severity: enhancement
Description: MDS create should not wait for statfs RPC while holding DLM lock.
- Bugzilla: 17310
Severity: normal
Frequency: rare, connect and disconnect target at same time
Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0
Details: don't call obd_disconnect under lov_lock. This is a long-running operation and can block ptlrpcd, which answers connect requests.
- Bugzilla: 16839
Severity: normal
Frequency: starting the MDS on an uncleanly shut down MDS device
Description: ll_sync thread keeps waiting for mds<>ost recovery to finish
Details: waiting for mds<>ost recovery to finish produces random bugs due to a race between two ll_sync threads for one LOV target. Send the ACTIVATE event only if the connect has really finished and the import is in the FULL state.
- Bugzilla: 18049
Severity: normal
Frequency: starting the MDS on an uncleanly shut down MDS device
Description: aborting recovery hang on MDS
Details: don't throttle destroy RPCs for the MDT.
- Bugzilla: 18016
Severity: low
Description: Slow reads beyond 8TB offsets.
Details: Page index integer overflow in ll_read_ahead_page
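The 8TB threshold matches what a signed 32-bit page index allows with 4 KiB pages. The largest such index is 2^31 - 1, and 2^31 * 4096 bytes = 2^43 bytes = 8 TiB. That reading is an inference from the numbers, not a quote of the ll_read_ahead_page code. Below is a minimal sketch that keeps the index in 64 bits; the 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12   /* assumed 4 KiB pages */

/* Page index of a byte offset, computed in 64 bits. */
static uint64_t page_index(uint64_t offset)
{
    return offset >> PAGE_SHIFT_4K;
}

/* Would this index overflow a signed 32-bit int? */
static bool index_overflows_int32(uint64_t idx)
{
    return idx > (uint64_t)INT32_MAX;
}

/* Byte offset of a page index; the shift is done on a 64-bit value. */
static uint64_t page_offset(uint64_t idx)
{
    return idx << PAGE_SHIFT_4K;
}
```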
- Bugzilla: 18304
Severity: normal
Description: MSG_CONNECT_INITIAL is not set on the initial MDS->OST connect.
Details: MSG_CONNECT_INITIAL is not set on the initial MDS->OST connect. As a consequence, the patch from bug 18224 is not operational and the MDS export cannot be reused on the OSTs until it gets evicted.
- Bugzilla: 17895
Severity: major
Frequency: rare, only if using MMP with Linux RAID
Description: MMP doesn't work with Linux RAID
Details: While using HA for Lustre servers with Linux RAID, it is possible that MMP will not detect multiple mounts. To make this work we need to unplug the device queue in RAID when the MMP block is being written. Also while reading the MMP block, we should read it from disk and not the cached one.
- Bugzilla: 17895
Severity: minor
Frequency: rare, during recovery
Description: Assertion failure in ldlm_lock_put
Details: Do not put cancelled locks into replay list, hold references on locks in replay list
- Bugzilla: 18577
Severity: normal
Description: 1.6.5 mdsrate performance is slower than 1.4.11/12 (MDS is not cpu bound!)
Details: create_count always drops to the min value (=32) because grow_count is being changed before the precreate RPC completes.
- Bugzilla: 19184
Severity: normal
Frequency: Only in RHEL5 when mounting multiple ext3 filesystems simultaneously
Description: "kmem_cache_create: duplicate cache jbd_4k" error message
Details: add proper locking for creation of jbd_4k slab cache
- Bugzilla: 19058
Severity: normal
Description: MMP check in ext3_remount() fails without displaying any error
Details: When multiple mount protection fails during remount, proper error should be returned
- Bugzilla: 15010
Severity: Low
Description: Rare Client crash on resend if the file was deleted.
Details: When file is opened, but open reply is lost and file is subsequently deleted before resend, resend processing logic breaks trying to open the file again, should not try to open.
- Bugzilla: 17569
Severity: high
Description: add check for >8TB ldiskfs filesystems
Details: ext3-based ldiskfs does not support greater than 8TB LUNs. Don't allow >8TB ldiskfs filesystems to be mounted without force_over_8tb mount option
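Below is a sketch of the mount-time check. It assumes the limit is exactly 8 TiB and that force_over_8tb has already been parsed from the mount options. The function name and the -EFBIG return value are hypothetical choices, not ldiskfs's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define LDISKFS_8TB (UINT64_C(8) << 40)   /* 8 TiB in bytes */

/* Hypothetical check: return 0 if the mount may proceed, or -EFBIG if the
 * filesystem is larger than 8 TiB and force_over_8tb was not given. */
static int check_8tb_limit(uint64_t blocks, uint32_t block_size,
                           bool force_over_8tb)
{
    assert(block_size != 0);
    if (blocks > LDISKFS_8TB / block_size && !force_over_8tb)
        return -EFBIG;
    return 0;
}
```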
- Bugzilla: 20011
Severity: normal
Description: Client locked up when running multiple instances of an app. on multiple mount points
Details: ll_shrink_cache() can sleep while holding the ll_sb_lock. Convert ll_sb_lock to a read/write semaphore to fix the problem.
- Bugzilla: 19559
Severity: normal
Description: Cannot access an NFS-mounted Lustre filesystem
Details: An NFS client cannot access the Lustre filesystem NFS-mounted from a Lustre-client exporting the Lustre filesystem via NFS.
- Bugzilla: 20139
Severity: normal
Description: panic in ll_statahead_thread
Details: grab dentry reference in parent process.
Changes from v1.8.0 to v1.8.0.1
Support for networks:
- socklnd - any kernel supported by Lustre
- qswlnd - Qsnet kernel modules 5.20 and later
- openiblnd - IbGold 1.8.2
- o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3 and 1.4.1
- viblnd - Voltaire ibhost 3.4.5 and later
- ciblnd - Topspin 3.2.0
- iiblnd - Infiniserv 3.3 + PathBits patch
- gmlnd - GM 2.1.22 and later
- mxlnd - MX 1.2.1 or later
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Support for kernels:
- 2.6.16.60-0.37 (SLES 10)
- 2.6.18-128.1.6.el5 (RHEL 5)
- 2.6.22.14 vanilla (kernel.org)
Client support for unpatched kernels: (see Patchless Client)
- 2.6.16 - 2.6.22 vanilla (kernel.org)
Recommended e2fsprogs version: 1.40.11-sun1
File join has been disabled in this release, refer to bugzilla 16929
A new Lustre ADIO driver is available for MPICH2-1.0.7.
NFS export is disabled when the stack size is < 8192, since NFSv4 export of a Lustre filesystem with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630
- Bugzilla: 19520
Severity: major
Description: Handle new CM events in OFED 1.4
- Bugzilla: 17671
Severity: enhancement
Description: Update OFED release to 1.4.1 RC4
- Bugzilla: 19212
Severity: enhancement
Description: Update kernel to SLES10 SP2 2.6.16.60-0.37.
- Bugzilla: 19024
Severity: enhancement
Description: Update to RHEL5.3 kernel-2.6.18-128.1.6.el5.
- Bugzilla: 17671
Severity: enhancement
Description: Add support for OFED 1.4.1.
- Bugzilla: 19731
Severity: enhancement
Description: build ofed 1.4.1 with mlx4_en (Mellanox ConnectX drivers in 10GbE mode) enabled
- Bugzilla: 19553
Severity: major (SLES10/OFED 1.4.1 only)
Description: BUG: soft lockup - CPU#7 stuck for 10s! [ll_imp_inval:18451]
Details: ll_imp_inval can sleep on waiting for a semaphore while holding a spinlock. Convert lco_lock to a semaphore to address the problem.
- Bugzilla: 18518
Severity: major, only with big OST
Description: Very poor metadata performance on Infiniband lustre configuration
Details: OST object precreation becomes very slow on big OSTs. This is due to the ialloc patch spending too much time scanning groups.
- Bugzilla: 18192
Severity: normal
Frequency: during recovery
Description: don't mix llog inodes with normal.
Details: allocate inodes for log in last inode group
- Bugzilla: 19495
Severity: major
Frequency: rare
Description: fix an lqs reference that is not put in some situations
Details: This patch fixes two cases:
1. quota_check_common() checks quota for both user and group, but returns only one result via "pending". In most cases the pendings should be the same, but that is not always the case.
2. If quotaoff runs between lquota_chkquota() and lquota_pending_commit(), the same thing can happen. This is why the following lines were removed:
- if (!ll_sb_any_quota_active(qctxt->lqc_sb))
- RETURN(0);
- Bugzilla: 18775
Severity: enhancement
Description: improve lctl set/get_param
Details: handle bad options, support more than one argument, and add a '-F' option to append the indicator to the parameters.
Changes from v1.6.7.1 to v1.8.0
Support for networks:
- socklnd - any kernel supported by Lustre
- qswlnd - Qsnet kernel modules 5.20 and later
- openiblnd - IbGold 1.8.2
- o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3.1
- viblnd - Voltaire ibhost 3.4.5 and later
- ciblnd - Topspin 3.2.0
- iiblnd - Infiniserv 3.3 + PathBits patch
- gmlnd - GM 2.1.22 and later
- mxlnd - MX 1.2.1 or later
- ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x
Support for kernels:
- 2.6.16.60-0.31 (SLES 10)
- 2.6.18-92.1.17.el5 (RHEL 5)
- 2.6.22.14 vanilla (kernel.org)
Client support for unpatched kernels: (see Patchless Client)
- 2.6.16 - 2.6.22 vanilla (kernel.org)
Recommended e2fsprogs version: 1.40.11-sun1
File join has been disabled in this release, refer to bugzilla 16929
A new Lustre ADIO driver is available for MPICH2-1.0.7.
NFS export is disabled when the stack size is < 8192, since NFSv4 export of a Lustre filesystem with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630
- Bugzilla: 16114
Severity: minor
Description: minor fixes and cleanups
Details: use EXT_UNSET_BLOCK to avoid confusion with EXT_MAX_BLOCK. Initialize 'ix' variable in extents patch to stop compiler warning.
- Bugzilla: 17942
Severity: feature
Description: update FIEMAP ioctl to match upstream kernel version
Details: the FIEMAP block-mapping ioctl had a prototype version in ldiskfs 3.0.7 but this release updates it to match the interface in the upstream kernel, with a new ioctl number.
- Bugzilla: 18173
Severity: normal
Frequency: only if MMP is active and detects filesystem is in use
Description: if MMP startup fails, an oops is triggered
Details: if ldiskfs mounting doesn't succeed the error handling doesn't clean up the MMP data correctly, causing an oops.
- Bugzilla: 12182
Severity: enhancement
Description: Caching OSS
Details: introduce data caching on the OSS. The OSS now relies on the linux kernel page cache to keep recently accessed data in memory. It is worth noting that all write requests are still flushed synchronously as in lustre 1.6.
- Bugzilla: 10609
Severity: enhancement
Description: version based recovery
Details: introduce finer-grained recovery that can detect transaction dependencies and deal with transaction gaps caused by clients failing at the same time as the server.
- Bugzilla: 3055
Severity: enhancement
Description: Enable adaptive timeouts by default
Details: The Lustre timeout value in /proc/sys/lustre/timeout is now managed dynamically based on server load and should not need to be tuned manually based on cluster size. This allows Lustre to work under a wider variety of system sizes and loads, without unnecessarily causing lengthy recovery times.
- Bugzilla: 15899
Severity: enhancement
Description: Add OST Pools support
Details: File striping can now be set to use an arbitrary pool of OSTs
- Bugzilla: 17974
Severity: enhancement
Description: add lazystatfs mount option to allow statfs(2) to skip down OSTs
Details: allow statfs requests to skip disconnected OSTs and hide the error in this case.
- Bugzilla: 16839
Severity: normal
Frequency: rare, on llog test 6
Description: don't allow connect to already connected import
Details: allowing a connect to an already connected import hides connection problems.
- Bugzilla: 17310
Severity: normal
Frequency: rare, connect and disconnect target at same time
Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0
Details: don't call obd_disconnect under lov_lock. This is a long-running operation and can block ptlrpcd, which answers connect requests.
- Bugzilla: 18896
Severity: normal
Frequency: rare, on failed llog setup
Description: don't leak obd reference on failed llog setup
Details: on failed llog setup, the MGC forgot to call class_destroy_import for the client import; move the import destruction to a more generic place.
- Bugzilla: 18902
Severity: normal
Frequency: rare
Description: allow killing a process that waits for a statahead result
Details: for some reasons 'ls' can get stuck waiting for a result from statahead; in this case there needs to be a way to kill the process.
- Bugzilla: 18154
Severity: normal
Frequency: rare
Description: don't lose wakeup for imp_recovery_waitq
Details: recover_import_no_retry or invalidate_import and import_close can both sleep on imp_recovery_waitq, but only one wakeup was sent to the wait queue.
- Bugzilla: 18773
Severity: normal
Frequency: rare, at shutdown
Description: panic at umount
Details: llap_shrinker can race with removing the super block from the list, and this produces a panic on access to an already freed pointer.
- Bugzilla: 18238
Severity: normal
Frequency: rare
Description: panic in mds_open
Details: don't confuse mds_finish_transno() with PTR_ERR(-ENOENT)
- Bugzilla: 17972
Severity: normal
Frequency: rare
Description: stuck in cache_remove_extent() or panic when accessing an already freed lock.
Details: release the lock reference only after adding the page to the pages list.
- Bugzilla: 16839
Severity: normal
Frequency: starting the MDS on an uncleanly shut down MDS device
Description: ll_sync thread keeps waiting for mds<>ost recovery to finish
Details: waiting for mds<>ost recovery to finish produces random bugs due to a race between two ll_sync threads for one LOV target. Send the ACTIVATE event only if the connect has really finished and the import is in the FULL state.
- Bugzilla: 17636
Severity: normal
Frequency: always with a long access ACL
Description: MDS can't pack a reply with a long ACL.
Details: the MDS doesn't control the size of the ACL, but it is limited by the reint/getattr reply buffer.
- Bugzilla: 18049
Severity: normal
Frequency: starting the MDS on an uncleanly shut down MDS device
Description: aborting recovery hang on MDS
Details: don't throttle destroy RPCs for the MDT.
- Bugzilla: 18018
Severity: major
Frequency: on remount
Description: external journal device not working after the remount
Details: clear dev_rdonly flag for external journal devices in blkdev_put()
- Bugzilla: 17802
Severity: minor
Frequency: rare
Description: shutdown vs evict race
Details: race between client_disconnect_export and a connect request. If the client is evicted at this time, we start the invalidate thread without a reference to the import, and the import can be freed at the same time.
- Bugzilla: 16693
Severity: minor
Frequency: always
Description: shrink LOV EAs before replying
Details: correctly adjust LOV EA buffer for reply.
- Bugzilla: 16081
Severity: normal
Frequency: rare
Description: don't skip an OST target if it is assigned to the file
Details: Drop slow OSCs if we can, but not for the requested start idx. This means: if an OSC is slow and it is not the requested start OST, then it can be skipped; otherwise skip it only if it is inactive/recovering/out-of-space.
- Bugzilla: 17201
Severity: enhancement
Description: Update to RHEL5 kernel-2.6.18-92.1.17.el5.
- Bugzilla: 17458
Severity: enhancement
Description: Update to SLES10 SP2 kernel-2.6.16.60-0.31.
- Bugzilla: 16492
Severity: normal
Frequency: rare, needs ACLs on the inode.
Description: client can't handle OST addition correctly
Details: if an OST was added after the client connected to the MDS, the client can hit lnet_try_match_md ... too big messages for widely striped files. In this case the client needs to handle config events about adding a LOV target and update its max EA size at that event.
- Bugzilla: 16578
Severity: normal
Frequency: Create a symlink file with a very long name
Description: ldlm_cancel_pack()) ASSERTION(max >= dlm->lock_count + count)
Details: If there is no extra space in the request for early cancels, ldlm_req_handles_avail() returns 0 instead of a negative value.
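The fix amounts to clamping a possibly negative space calculation at zero, so callers never add a negative handle count. This is a sketch with hypothetical parameters. The real function derives these sizes from the request buffer layout.

```c
#include <assert.h>

/* Number of extra lock handles that fit in a request buffer, given the
 * buffer size, the bytes already used, and the size of one handle
 * (hypothetical parameters). Returns 0, never a negative count, when
 * there is no spare room. */
static int handles_avail(int bufsize, int used, int handle_size)
{
    int spare;

    assert(handle_size > 0);
    spare = bufsize - used;
    if (spare <= 0)
        return 0;
    return spare / handle_size;
}
```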
- Bugzilla: 16492
Severity: major
Frequency: rare
Description: mds is deadlocked
Details: in rare cases, an inode in the catalog can have an i_no lower than its parent's i_no. This produces the wrong locking order during open, and a parallel unlink can deadlock the open. mds_open needs to grab locks in resource ID order, not in parent -> child order.
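Ordering by resource ID gives every thread the same global lock order. An open and a parallel unlink that need the same two locks can then no longer take them in opposite orders. This minimal sketch only computes the order, and struct res_id is a hypothetical stand-in for an MDS lock resource name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical resource identifier (e.g. derived from an inode number). */
struct res_id {
    uint64_t id;
};

/* Given the parent and child resources, return them in the order they must
 * be locked: ascending ID, regardless of their parent/child roles. */
static void lock_order(const struct res_id *parent, const struct res_id *child,
                       const struct res_id **first,
                       const struct res_id **second)
{
    assert(parent->id != child->id);
    if (parent->id < child->id) {
        *first = parent;
        *second = child;
    } else {
        *first = child;
        *second = parent;
    }
}
```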
- Bugzilla: 1819
Severity: enhancement
Description: Add /proc entry for import status
Details: The mdc, osc, and mgc import directories now have an import directory that contains useful import data for debugging connection problems.
- Bugzilla: 15966
Severity: enhancement
Description: Re-disable certain /proc logging
Details: Enable and disable client's offset_stats, extents_stats and extents_stats_per_process stats logging on the fly.
- Bugzilla: 16303
Severity: major
Frequency: Only on FC kernels 2.6.22+
Description: oops in statahead
Details: Do not drop reference count for the dentry from VFS when lookup, VFS will do that by itself.
- Bugzilla: 16643
Severity: enhancement
Description: Generic /proc file permissions
Details: Set /proc file permissions in a more generic way to enable non-root users to operate on some /proc files.
- Bugzilla: 16561
Severity: major
Description: Hitting mdc_commit_close() ASSERTION
Details: Properly handle request reference release in ll_release_openhandle().
- Bugzilla: 15975
Severity: normal
Description: only patchless client
Details: add workaround for race between add/remove dentry from hash
- Bugzilla: 16845
Severity: enhancement
Description: Allow OST glimpses to return PW locks
- Bugzilla: 16717
Severity: minor
Description: LBUG when llog conf file is full
Details: When llog bitmap is full, ENOSPC should be returned for plain log.
- Bugzilla: 16907
Severity: normal
Description: Prevent import from entering FULL state when server in recovery
- Bugzilla: 16750
Severity: major
Description: service mount cannot take device name with ":"
Details: Only when device name contains ":/" will mount treat it as client mount.
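The rule can be sketched with strstr(). A Lustre client mount source has the form <mgsnid>:/<fsname>. A server device path, by contrast, may contain a bare ":", as in /dev/disk/by-path names. The function name is hypothetical, and this is not mount.lustre's actual parsing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: only a source containing ":/" is treated as a client
 * mount; a server device path that merely contains ":" is not. */
static bool is_client_mount(const char *source)
{
    assert(source != NULL);
    return strstr(source, ":/") != NULL;
}
```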
- Bugzilla: 15927
Severity: normal
Frequency: rare
Description: replace ptlrpcd with the statahead thread to interpret the async statahead RPC callback
- Bugzilla: 16611
Severity: normal
Frequency: on recovery
Description: I/O failures after umount during fail back
Details: if the client reconnects to a restarted server, it needs to join recovery instead of finding that the server handle has changed and performing a self-eviction that cancels all locks.
- Bugzilla: 15825
Severity: normal
Description: Kernel BUG tries to release flock
Details: Lustre does not destroy a flock lock before the last reference goes away. So always drop flock locks when the client is evicted, and perform the unlock regardless of whether communication with the MDS succeeds.
- Bugzilla: 16566
Severity: enhancement
Description: Upcall on Lustre log has been dumped
Details: Allow for a user mode script to be called once a Lustre log has been dumped. It passes the filename of the dumped log to the script, the location of the script can be specified via /proc/sys/lnet/debug_log_upcall.
- Bugzilla: 16583
Severity: minor
Frequency: rare
Description: avoid messages about idr_remove called for id that is not allocated
Details: Move the assignment of s_dev for clustered NFS to the end of initialization, to avoid problems with error handling.
- Bugzilla: 16109
Severity: minor
Frequency: rare
Description: avoid Already found the key in hash [CONN_UNUSED_HASH] messages
Details: When a connection is reused it is not moved from CONN_UNUSED_HASH into CONN_USED_HASH, and this produces a warning when the connection is put into the unused hash again.
- Bugzilla: 15139
Severity: normal
Frequency: rare
Description: avoid ASSERTION(client_stat->nid_exp_ref_count == 0) failed
Details: release the reference to stats when the client disconnects, not when the export is destroyed, to avoid races when the client is destroyed after the main OST export.
- Bugzilla: 16679
Severity: normal
Description: more cleanup in mds_lov
Details: add a workaround to get a valid OST count and avoid warnings about dropping too big messages; don't init the llog catalog under a semaphore, which can block on reconnect and break normal replay; fix access to a wrong pointer.
- Bugzilla: 16573
Severity: enhancement
Description: Export bytes_read/bytes_write count on OSC/OST.
- Bugzilla: 16237
Severity: normal
Description: Early reply size mismatch, MGC loses connection
Details: Apply the MGS_CONNECT_SUPPORTED mask at reconnect time so the connect flags are properly negotiated.
- Bugzilla: 16006
Severity: normal
Description: Properly propagate oinfo flags from lov to osc for statfs
Details: Restore the missing copy of oi_flags to LOV requests.
- Bugzilla: 16317
Severity: normal
Description: exports in /proc are broken
Details: recreate /proc entries for clients when they reconnect.
- Bugzilla: 16581
Severity: enhancement
Description: Add man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8)
Details: included man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8)
- Bugzilla: 16208
Severity: enhancement
Description: Implement lustre ll_show_options method.
- Bugzilla: 16080
Severity: normal
Description: don't fail open with -ERANGE
Details: If a client connects before the MDS knows the real OST count, getting the LOV EA can fail because the MDS does not allocate a large enough buffer for it.
- Bugzilla: 15576
Severity: normal
Description: Resolve device initialization race
Details: Prevent the proc handler from accessing devices that have been added to the obd_devs array but not yet initialized.
- Bugzilla: 16091
Severity: enhancement
Description: configure's --enable-quota should check the kernel .config for CONFIG_QUOTA
Details: configure is terminated if --enable-quota is passed but the kernel has no quota support.
- Bugzilla: 16318
Severity: normal
Frequency: rare, on PPC clients
Description: Don't swab OST objects in the reply for a directory, because directories have no OST objects.
Details: Similar to bug 14856, but in a different function.
- Bugzilla: 15754
Severity: enhancement
Description: lfs quota tool enhancement
Details: Added unit specifier support for setquota, default to the current uid/gid for quota reports, short quota stats by default, non-positional parameters for setquota, and an llapi_quotactl manual page.
- Bugzilla: 15625
Severity: enhancement
Description: *optional* service tags registration
Details: If the "service tags" package is installed on a Lustre node, a local-node service tag will be created when the filesystem is mounted. See http://inventory.sun.com/ for more information about the Service Tags asset management system.
- Bugzilla: 16037
Severity: normal
Description: Client runs out of low memory
Details: Consider only lowmem when counting initial number of llap pages
- Bugzilla: 15210
Severity: normal
Frequency: occasional
Description: Add a refcount for OSC callbacks to avoid a panic on shutdown
- Bugzilla: 12653
Severity: normal
Frequency: testing only
Description: sanity test 65a fails if stripecount of -1 is set
Details: handle -1 striping on filesystem in ll_dirstripe_verify
- Bugzilla: 16014
Severity: normal
Frequency: only in unusual configurations
Description: Kernel panic when finding the OST index.
Details: lov_obd panics if some OSTs have sparse indexes.
- Bugzilla: 15924
Severity: major
Frequency: rarely, if filesystem is mounted with -o flock
Description: do not process already freed flock
Details: A flock can be freed by another thread before it reaches ldlm_flock_completion_ast.
- Bugzilla: 14480
Severity: normal
Frequency: rarely, if filesystem is mounted with -o flock
Description: LBUG during stress test
Details: Properly lock accesses to the flock deadlock detection list.
- Bugzilla: [1]
Severity: minor
Frequency: rarely, if binaries are being run from Lustre
Description: oops in page fault handler
Details: The kernel page fault handler can return two special 'pages' in the error case; don't try to dereference NOPAGE_SIGBUS or NOPAGE_OOM.
- Bugzilla: 15716
Severity: minor
Frequency: rarely, during shutdown
Description: Timeout when invalidating an import.
Details: ptlrpcd_check calls obd_zombie_impexp_cull and waits for a request that should be handled by ptlrpcd itself. This produces a long wait, ptlrpc_invalidate_import fails with -ETIMEDOUT, and as a result an LASSERT is hit.
- Bugzilla: 14742
Severity: normal
Frequency: rarely
Description: ASSERTION(CheckWriteback(page,cmd)) failed
Details: Incorrectly clearing the PG_Writeback bit in ll_ap_completion can produce a false-positive assertion.
- Bugzilla: 15779
Severity: normal
Frequency: only with broken builds/installations
Description: no LBUG if lquota.ko and fsfilt_ldiskfs.ko are different versions
Details: Just return an error to the user and print a console error message.
- Bugzilla: 14134
Severity: enhancement
Description: Allow the MGS and MDT services to be started separately
Details: Add a 'nomgs' option to mount.lustre to allow starting an MDT with a co-located MGS without starting the MGS; this complements the 'nosvc' mount option.
- Bugzilla: 14856
Severity: normal
Frequency: always, on big-endian systems
Description: cleanup in ptlrpc code, related to PPC platform
Details: Store magic in native order to avoid panics during recovery on PPC nodes and to prevent this error in the future. Also fix the possibility of swabbing data twice, and fix returning LOV striping to userspace.
- Bugzilla: 15756
Severity: normal
Frequency: rarely, if replay get lost on server
Description: Server incorrectly drops resent replays, leading to recovery failure.
Details: Do not drop replays based on message flags; instead, check the per-export recovery request queue for a duplicate transno.
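The duplicate check can be modelled as a lookup by transno in a per-export queue. This is a hypothetical, self-contained sketch, not the Lustre code: `struct sketch_replay_queue`, `sketch_queue_replay()` and the fixed-size array are illustrative stand-ins for the real per-export recovery request queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-export recovery request queue. */
#define SKETCH_QUEUE_MAX 64

struct sketch_replay_queue {
	uint64_t transno[SKETCH_QUEUE_MAX]; /* transnos of queued replays */
	size_t   count;                     /* number of valid entries */
};

enum sketch_replay_result {
	SKETCH_REPLAY_QUEUED,    /* new transno: keep it for replay */
	SKETCH_REPLAY_DUPLICATE, /* transno already queued: drop the resend */
	SKETCH_REPLAY_FULL       /* no room left in this sketch's array */
};

/* Decide whether to keep a replay by looking for its transno in the
 * export's queue, instead of trusting flags carried in the message. */
static enum sketch_replay_result
sketch_queue_replay(struct sketch_replay_queue *q, uint64_t transno)
{
	size_t i;

	assert(q != NULL);
	for (i = 0; i < q->count; i++)
		if (q->transno[i] == transno)
			return SKETCH_REPLAY_DUPLICATE;

	if (q->count == SKETCH_QUEUE_MAX)
		return SKETCH_REPLAY_FULL;

	q->transno[q->count++] = transno;
	return SKETCH_REPLAY_QUEUED;
}
```

With this approach a resend whose original was lost on the server is still queued (its transno is not present yet), while a second copy of an already queued replay is dropped, whatever flags the message carries.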
- Bugzilla: 14835
Severity: normal
Frequency: after recovery
Description: Too many objects precreated after orphan deletion.
Details: Orphan deletion sets last_id == next_id in the oscc, which triggers growth in the count of precreated objects. Set the LOW flag to skip increasing the precreate count.
- Bugzilla: 15139
Severity: normal
Frequency: rare, when clearing nid stats
Description: ASSERTION(client_stat->nid_exp_ref_count == 0)
Details: When clearing nid stats, the code sometimes tries to destroy a live entry, which produces a panic in free.
- Bugzilla: 15575
Severity: major
Frequency: occasionally since 1.6.4
Description: Stack overflow during MDS log replay
Details: Ease stack pressure by running llog_process in a separate thread.
- Bugzilla: 13380
Severity: minor
Frequency: very rare
Description: MDT cannot be unmounted, reporting "Mount still busy"
Details: Mountpoint references were being leaked during open reply reconstruction after an MDS restart. Drop mountpoint reference in reconstruct_open() and free dentry reference also.
- Bugzilla: 15443
Severity: normal
Frequency: rare
Description: Wait until I/O is finished before starting new I/O during lock cancel.
Details: The VM protocol requires old I/O to finish before new I/O starts, so the lock cancel callback must wait until PG_writeback is cleared before checking the dirty flag and calling writepages.
- Bugzilla: 12888
Severity: normal
Frequency: rare
Description: mds_mfd_close() ASSERTION(rc == 0)
Details: In mds_mfd_close(), the inode's writecount change must be protected by its orphan write semaphore to prevent possible races.
- Bugzilla: 14645
Severity: minor
Frequency: rare, when shutting down an OST
Description: Don't hit a livelock when unmounting an OST.
Details: shrink_dcache_parent can loop for a long time destroying dentries; use shrink_dcache_sb instead.
- Bugzilla: 14949
Severity: minor
Frequency: only when echo_client is used
Description: Don't panic when using echo_client
Details: The echo client passes NULL as the client nid pointer, which produces a NULL pointer dereference.
- Bugzilla: 15278
Severity: normal
Frequency: Always on 32-bit PowerPC systems
Description: fix build on PPC32
Details: Compiling code with the -m64 flag produces a wrong object file for PPC32.
- Bugzilla: 15574
Severity: normal
Frequency: rare
Description: MDS LBUG: ASSERTION(!IS_ERR(dchild))
Details: In reconstruct_* functions, LASSERTs on both the data supplied by a client and the data on disk are dangerous and incorrect. Replace them with client eviction.
- Bugzilla: 15346
Severity: enhancement
Description: skiplist implementation simplification
Details: Skiplists are used to group compatible locks on the granted list. They were implemented by tracking the first and last lock of each lock group; the patch changes this to use doubly linked lists.
- Bugzilla: 15933
Severity: normal
Description: delete compatibility for 32bit qdata
Details: As planned, starting with b1_8 lquota no longer supports 32-bit qunit. This means b1_4 servers and b1_8 servers cannot be used together if quotas are needed.
- Bugzilla: 14693
Severity: normal
Frequency: only with administrator action
Description: mount failure if config log has invalid conf_param setting
Details: If administrator specified an incorrect configuration parameter with "lctl conf_param" this would cause an error during future client mounts. Instead, ignore the bad configuration parameter.
- Bugzilla: 15932
Severity: normal
Frequency: blocks per group < blocksize*8 and uninit_groups is enabled
Description: ldiskfs error: XXX blocks in bitmap, YYY in gd
Details: If blocks per group is less than blocksize*8, set rest of the bitmap to 1.
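The padding step can be sketched as follows. A block bitmap occupies one block, so it has blocksize*8 bits, but only the first blocks_per_group bits describe real blocks; the remaining bits are set to 1 so they are never counted as free. With uninit_groups the bitmap of an uninitialized group is presumably built in memory, which is where such padding would apply. The helper names are hypothetical; the bit order follows the little-endian layout of ext2/ext3 on-disk bitmaps.

```c
#include <assert.h>
#include <stdint.h>

/* Set bit 'nr' in ext2/ext3 on-disk bitmap order: bit nr lives in byte
 * nr / 8 at position nr % 8 (little-endian bit order). */
static void sketch_set_bit(uint32_t nr, uint8_t *bitmap)
{
	bitmap[nr / 8] |= (uint8_t)(1u << (nr % 8));
}

/* Hypothetical helper: mark bits [blocks_per_group, blocksize * 8) as in
 * use so the unused tail of the bitmap block never looks like free
 * space (the "XXX blocks in bitmap, YYY in gd" mismatch). */
static void sketch_pad_block_bitmap(uint8_t *bitmap, uint32_t blocksize,
				    uint32_t blocks_per_group)
{
	uint32_t nbits = blocksize * 8;
	uint32_t i;

	assert(blocks_per_group <= nbits);
	for (i = blocks_per_group; i < nbits; i++)
		sketch_set_bit(i, bitmap);
}
```

For example, with 1024-byte blocks and 8003 blocks per group, bits 8003 to 8191 are set: byte 1000 becomes 0xF8 and bytes 1001 to 1023 become 0xFF.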
- Bugzilla: 16172
Severity: major
Frequency: Applications doing stride reads on Lustre
Description: Read performance drops significantly if the application does stride reads.
Details: Because stride_start_offset was missing from stride read-ahead, clients read many unused pages during read-ahead, and read performance drops.
- Bugzilla: 15953
Severity: normal
Description: more ldlm soft lockups
Details: In ldlm_resource_add_lock(), the call to ldlm_resource_dump() starves other threads of the resource lock for a long time when the waiting queue is long, so change the debug level from D_OTHER to the less frequently used D_INFO.
- Bugzilla: 13128
Severity: enhancement
Description: add -gid, -group, -uid, -user options to lfs find
- Bugzilla: 15284
Severity: enhancement
Description: ll_recover_lost_found_objs - recover objects in lost+found
Details: OST corruption and subsequent e2fsck can leave objects in the lost+found directory. Using the "ll_recover_lost_found_objs" tool, these objects can be retrieved and data can be salvaged by using the object ID saved in the fid EA on each object.
- Bugzilla: 15758
Severity: minor
Frequency: rare
Description: This bug _only_ happens when the inode quota limit is very low (less than 12), so that the inode quota unit is 1 at initialization.
Details: If the remaining quota equals 1, this signals that the quota limit is now in effect, so the minimum quota qunit should be 2.
- Bugzilla: 15950
Severity: normal
Description: Hung threads in invalidate_inode_pages2_range
Details: The direct IO path doesn't call check_rpcs to submit a new RPC once one is completed. As a result, some RPCs are stuck in the queue and are never sent.
- Bugzilla: 15684
Severity: normal
Description: Procfs and llog threads sometimes access a destroyed import.
Details: Synchronize import destruction with the procfs and llog threads using the import refcount and a semaphore.
- Bugzilla: 15674
Severity: major
Description: mds fails to respond, threads stuck in ldlm_completion_ast
Details: Sort source/child resource pair after updating child resource.
- Bugzilla: 16226
Severity: major
Frequency: rare
Description: kernel BUG at ldiskfs2_ext_new_extent_cb
Details: If insertion of an extent fails, discard the inode preallocation and free the data blocks; otherwise it can lead to duplicate blocks.
- Bugzilla: 16199
Severity: normal
Description: don't always update ctime in ext3_xattr_set_handle()
Details: The current xattr code updates the inode ctime in ext3_xattr_set_handle(). In some cases the ctime should not be updated; for example, 2.0->1.8 compatibility requires deleting an xattr without updating the ctime.
- Bugzilla: 15058
Severity: normal
Description: add quota statistics
Details: 1. sort out quota proc entries and proc code. 2. add quota statistics
- Bugzilla: 16125
Severity: normal
Frequency: often
Description: quotas are not honored with O_DIRECT
Details: All writes with the O_DIRECT flag used grants, which led to this problem. OBD_BRW_SYNC is now used to guard against this.
Severity: major
Frequency: rare
Description: Assertion in iopen_connect_dentry in 1.6.3
Details: Looking up an inode via iopen with the wrong generation number can populate the dcache with a disconnected dentry while the inode number is in the process of being reallocated. This causes an assertion failure in iopen, since the inode's dentry list contains both a connected and a disconnected dentry.
- Bugzilla: 16496
Severity: normal
Description: assertion failure in ldlm_handle2lock()
Details: fix a race between class_handle_unhash() and class_handle2object() introduced in lustre 1.6.5 by bug 13622.
- Bugzilla: 11817
Severity: enhancement
Description: superblock lock contention with many SMP cores on one client
Details: Several client filesystem locks were highly contended on SMP NUMA systems with 8 or more cores. Per-CPU data structures and more efficient locking were implemented to reduce contention.
- Bugzilla: 12755
Severity: minor
Frequency: rare
Description: Kernel BUG: sd_iostats_bump: unexpected disk index
Details: remove the limit of 256 scsi disks in the sd_iostat patch
- Bugzilla: 16494
Severity: minor
Frequency: rare
Description: oops in sd_iostats_seq_show()
Details: unloading/reloading the scsi low level driver triggers a kernel bug when trying to access the sd iostat file.
- Bugzilla: 16404
Severity: major
Frequency: rare
Description: Kernel panics during QLogic driver reload
Details: REQ_BLOCK_PC requests are not handled properly in the sd iostat patch, causing memory corruption.
- Bugzilla: 16140
Severity: minor
Frequency: rare
Description: journal_dev option does not work in b1_6
Details: Pass the mount option during pre-mount.
- Bugzilla: 10555
Severity: enhancement
Frequency:
Description: Add a FIEMAP(FIle Extent MAP) ioctl for ldiskfs
Details: FIEMAP ioctl will allow an application to efficiently fetch the extent information of a file. It can be used to map logical blocks in a file to physical blocks in the block device.
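A minimal sketch of how an application might query extents, assuming the upstream Linux FS_IOC_FIEMAP interface from <linux/fiemap.h> (struct fiemap, FIEMAP_FLAG_SYNC, FIEMAP_MAX_OFFSET); the 1.8-era ldiskfs ioctl may differ in detail. `sketch_fiemap_size()` and `sketch_count_extents()` are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fiemap.h>  /* struct fiemap, struct fiemap_extent */
#include <linux/fs.h>      /* FS_IOC_FIEMAP */

/* Bytes needed for a struct fiemap with room for n extent records. */
static size_t sketch_fiemap_size(unsigned int n)
{
	return sizeof(struct fiemap) + (size_t)n * sizeof(struct fiemap_extent);
}

/* Map the whole file behind fd.  Returns the number of mapped extents
 * (capped at max_extents when max_extents > 0) or a negative errno.
 * With max_extents == 0 the kernel only reports how many extents exist. */
static int sketch_count_extents(int fd, unsigned int max_extents)
{
	struct fiemap *fm = calloc(1, sketch_fiemap_size(max_extents));
	int rc;

	if (fm == NULL)
		return -ENOMEM;

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;  /* map to end of file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;    /* flush dirty data first */
	fm->fm_extent_count = max_extents;  /* size of fm_extents[] */

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
		rc = -errno;
	else
		rc = (int)fm->fm_mapped_extents;

	free(fm);
	return rc;
}
```

A common pattern is to call `sketch_count_extents(fd, 0)` first to learn the extent count, then allocate that many records for a second call that reads fm_logical, fm_physical and fm_length from each entry.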
- Bugzilla: 16972
Severity: normal
Frequency: only with adaptive timeout enabled
Description: DEBUG_REQ() bad paging request
Details: ptlrpc_at_recv_early_reply() should not modify req->rq_repmsg because it can be accessed by reply_in_callback() without the rq_lock held.
- Bugzilla: 16813
Severity: normal
Frequency: only on Cray X2
Description: X2 build failures
Details: fix build failures on Cray X2.
- Bugzilla: 2066
Severity: normal
Description: xid & resent requests
Details: Initialize RPC XID from clock at startup (randomly if clock is bad).
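Seeding the first XID from the clock means a restarted client starts above the XIDs it used before, so the server does not mistake new requests for resends. The sketch below illustrates the idea; the early-2004 sanity threshold, the 20-bit shift and the forced bit 61 in the random fallback are illustrative choices for this sketch, not a statement of what ptlrpc does, and `sketch_init_xid()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative threshold: 2^30 seconds after the epoch is early January
 * 2004.  A clock reading earlier than this is treated as unset ("bad"). */
#define SKETCH_CLOCK_SANE_SEC (1ULL << 30)

/* Pick the first RPC XID after startup.
 *  now_sec     - current wall-clock time in seconds
 *  random_bits - 64 random bits supplied by the caller (e.g. from a
 *                kernel RNG); used only when the clock looks bad */
static uint64_t sketch_init_xid(uint64_t now_sec, uint64_t random_bits)
{
	uint64_t xid;

	if (now_sec < SKETCH_CLOCK_SANE_SEC) {
		/* Bad clock: start from random bits so two boots with the
		 * same bogus time do not reuse the same XIDs.  The result
		 * lies in [2^61, 2^62), so it is never zero. */
		xid = (random_bits >> 2) | (1ULL << 61);
	} else {
		/* Sane clock: a later boot starts above an earlier boot's
		 * XIDs unless that boot averaged more than 2^20 RPCs per
		 * second since it started. */
		xid = now_sec << 20;
	}
	assert(xid != 0);
	return xid;
}
```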
- Bugzilla: 14840
Severity: major
Description: quota recovery deadlock during mds failover
Details: This patch includes att18982, att18236 and att18237 from bz14840. It solves two problems: 1. OSTs hang when the MDS fails over with quotas on; 2. a watchdog storm occurs when OST threads wait for MDS recovery.
- Bugzilla: 16695
Severity: normal
Description: kernel panic on racer
Details: Do not access dchild->d_inode when IS_ERR(dchild) is true.
- Bugzilla: 14095
Severity: enhancement
Description: Add lustre_start utility to start or stop multiple Lustre servers from a CSV file.
- Bugzilla: 17024
Severity: major
Description: Lustre GPF in {:ptlrpc:ptlrpc_server_free_request+373}
Details: In case of memory pressure, list_del() can be called twice on req->rq_history_list, causing a kernel oops.
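In the Linux list API, list_del() leaves poisoned pointers in the removed entry, so a second list_del() on the same entry dereferences them and faults. A common defence is to unlink with list_del_init() and check list_empty() first, which makes a repeated removal harmless. The sketch below reimplements that pattern in a self-contained way (all `sketch_*` names are hypothetical); it illustrates the technique and does not claim to be the actual patch for this bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list in the style of <linux/list.h>. */
struct sketch_list {
	struct sketch_list *prev, *next;
};

static void sketch_list_init(struct sketch_list *h)
{
	h->prev = h->next = h;
}

/* A self-linked node is on no list. */
static bool sketch_list_empty(const struct sketch_list *h)
{
	return h->next == h;
}

static void sketch_list_add_tail(struct sketch_list *n, struct sketch_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink n and self-link it, so it no longer points into the list. */
static void sketch_list_del_init(struct sketch_list *n)
{
	assert(n->prev != NULL && n->next != NULL);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	sketch_list_init(n);
}

/* Guarded removal: safe even if two code paths both try to take the
 * request off its history list. */
static void sketch_history_unlink(struct sketch_list *n)
{
	if (!sketch_list_empty(n))
		sketch_list_del_init(n);
}
```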
- Bugzilla: 17026
Severity: normal
Description: kptllnd_peer_check_sends()) ASSERTION(!in_interrupt()) failed
Details: Fix a stack overflow in the distributed lock manager by deferring export eviction after a failed AST to the elt thread instead of handling it in the DLM interpret routine.
- Bugzilla: 12800
Severity: enhancement
Description: More exported tunables for mballoc
Details: Add support for tunable preallocation window and new tunables for large/small requests
- Bugzilla: 16680
Severity: normal
Description: Detect corruption of block bitmap and checking for preallocations
Details: Checks validity of on-disk block bitmap. Also it does better checking of number of applied preallocations. When corruption is found, it turns filesystem readonly to prevent further corruptions.
- Bugzilla: 16438
Severity: normal
Frequency: only for big-endian servers
Description: Check if big-endian system while mounting fs with extents feature
Details: Mounting a filesystem with extents feature will fail on big-endian systems since ext3-based ldiskfs is not supported on big-endian systems. Can be overridden with "bigendian_extents" mount option.
- Bugzilla: 16860
Severity: normal
Description: Excessive recovery window
Details: With AT enabled, the recovery window can be excessively long (6000+ seconds). To address this problem, we no longer use OBD_RECOVERY_FACTOR when extending the recovery window (the connect timeout no longer depends on the service time, it is set to INITIAL_CONNECT_TIMEOUT now) and clients report the old service time via pb_service_time.
- Bugzilla: 16522
Severity: normal
Description: Watchdog triggered on MDS failover
Details: enable OBD_CONNECT_MDT flag when connecting from the MDS so that the OSTs know that the MDS "UUID" can be reused for the same export from a different NID, so we do not need to wait for the export to be evicted.
- Bugzilla: 16919
Severity: enhancement
Description: Don't sync journal after every i/o
Details: Implement write RPC replay to allow server replies for write RPCs before data is on disk. However, this feature is disabled by default since some issues leading to data corruption have been found during recovery (e.g. bug 19128). This feature can be enabled by running the following command on the OSSs: lctl set_param obdfilter.*.sync_journal=0
- Bugzilla: 18016
Severity: low
Description: Slow reads at offsets beyond 8 TB.
Details: Page index integer overflow in ll_read_ahead_page
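Why the threshold sits at 8 TB: with 4 KiB pages, the page index is offset / 2^12, and 2^43 bytes (8 TiB) / 2^12 = 2^31 pages, the first value that does not fit in a signed 32-bit int. The sketch below assumes 4 KiB pages and assumes the overflowing index was a signed 32-bit int; the changelog does not name the type involved, and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT = 12), as on x86 and x86_64. */
#define SKETCH_PAGE_SHIFT 12

/* 2^31 pages * 2^12 bytes per page = 2^43 bytes = 8 TiB: the first
 * byte offset whose page index does not fit in an int32_t. */
#define SKETCH_OVERFLOW_OFFSET (1ULL << 43)

/* Buggy pattern: page index squeezed into a signed 32-bit int.  The
 * out-of-range conversion is implementation-defined in C; GCC and
 * Clang wrap it to a negative value. */
static int32_t sketch_index_int32(uint64_t offset)
{
	return (int32_t)(offset >> SKETCH_PAGE_SHIFT);
}

/* Fixed pattern: keep the page index in a 64-bit type. */
static uint64_t sketch_index_u64(uint64_t offset)
{
	uint64_t idx = offset >> SKETCH_PAGE_SHIFT;

	assert((idx << SKETCH_PAGE_SHIFT) <= offset);
	return idx;
}
```

The last page below 8 TiB still has index INT32_MAX, while the first page at 8 TiB has index 2^31, which the 32-bit variant turns negative.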
- Bugzilla: 17895
Severity: major
Frequency: rare, only if using MMP with Linux RAID
Description: MMP doesn't work with Linux RAID
Details: When using HA for Lustre servers with Linux RAID, MMP may fail to detect multiple mounts. To make this work, the device queue in RAID must be unplugged when the MMP block is written. Also, when reading the MMP block, it must be read from disk rather than from the cache.
- Bugzilla: 17895
Severity: minor
Frequency: rare, during recovery
Description: Assertion failure in ldlm_lock_put
Details: Do not put cancelled locks into replay list, hold references on locks in replay list
- Bugzilla: 18695
Severity: critical
Description: Lustre detected file system corruption with inode out of bounds
Details: Don't update i_size on MDS_CLOSE for directories; doing so causes directory corruption on the MDT.
- Bugzilla: 19223
Severity: normal
Description: client doesn't try to reconnect
Details: correctly skip time estimate if in recovery