Change Log 1.8

(Updated: Aug 2010)

=Changes from v1.8.3 to v1.8.4=

Support for networks:
 * socklnd  - any kernel supported by Lustre
 * qswlnd   - Qsnet kernel modules 5.20 and later
 * openiblnd - IbGold 1.8.2
 * o2iblnd  - OFED 1.3, 1.4.1, 1.4.2 and 1.5.1
 * viblnd   - Voltaire ibhost 3.4.5 and later
 * ciblnd   - Topspin 3.2.0
 * iiblnd   - Infiniserv 3.3 + PathBits patch
 * gmlnd    - GM 2.1.22 and later
 * mxlnd    - MX 1.2.10 or later
 * ptllnd   - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x

Server support for kernels:
 * 2.6.16.60-0.42.8 (SLES 10)
 * 2.6.27.39-0.3.1 (SLES11)
 * 2.6.18-194.3.1.el5 (RHEL 5)
 * 2.6.18-194.3.1.0.1.el5 (OEL 5)

Client support for unpatched kernels: (see Patchless Client)
 * 2.6.16 - 2.6.30 vanilla (kernel.org)

Recommended e2fsprogs version:
 * 1.41.10-sun2

The async journal commit feature (bug 19128) and the cancel lock before replay feature (bug 16774) are disabled by default.

Severity: normal Description: Reduce group prealloc size and skip groups with little free space.
 * Bugzilla: 18456

Severity: normal Description: Fix issue with proc_remove.
 * Bugzilla: 22237

Severity: normal Description: Disable delayed allocation by default for ext4-based ldiskfs on RHEL5.5
 * Bugzilla: 23368

Severity: normal Description: A mount failure can corrupt the slab. This is a bug in the latest RHEL5.5 kernel and only ext4-based ldiskfs is impacted.
 * Bugzilla: 23368

Severity: normal Description: With peer health detection, o2iblnd makes only one attempt to reconnect, which is not enough with nodes running Lustre 1.6 because of a protocol version mismatch. Fix o2iblnd to retry one more time.
 * Bugzilla: 23076

Severity: normal Description: add mount option to disable mb_cache since it can cause slowdown.
 * Bugzilla: 22771

Severity: enhancement Description: Quiet some LNET messages
 * Bugzilla: 16909

Severity: enhancement Description: Add OFED 1.5.1 support
 * Bugzilla: 22787

Severity: enhancement Description: The peer health code lacked some important debugging info in lnd_query code paths. We've added necessary debug prints, not just for bug 21678, but also for future troubleshooting.
 * Bugzilla: 21678

Severity: enhancement Description: Update RHEL5.5 kernel to 2.6.18-194.3.1.el5 and OEL5.5 kernel to 2.6.18-194.3.1.0.1.el5.
 * Bugzilla: 22514

Severity: enhancement Description: Use the in-kernel OFED stack for RHEL5 and OEL5.
 * Bugzilla: 22514

Severity: enhancement Description: Add "lfs_migrate" script from manual into lustre/scripts and RPMs Details: lfs_migrate does a "poor man's" migration of files from their current OST layout to a new OST layout as chosen by the MDS.
 * Bugzilla: 22481

Severity: normal Description: (mds_orphan_add_link) error linking orphan to PENDING Details: quota limits might disallow linking orphans to PENDING when unlinking a file. Temporarily raise the threads' privileges when processing unlinks.
 * Bugzilla: 22679

Severity: enhancement Description: add conf_param -d option to remove permanent settings. Details: Add the ability to remove permanent lctl conf_param settings. (Previously conf_param settings could only be changed, not removed.) This also provides a method to change failover nid locations. Improve lctl man page.
 * Bugzilla: 15253

Severity: enhancement Description: add list_param to b1_8 and add "-R" option to list params recursively
 * Bugzilla: 22455

Severity: enhancement Description: lfs quota output is not very convenient for awk/sed parsing Details: Some cells in the lfs quota output table could be either empty or non-empty, which made the table hard to parse with scripts. A dash is now printed instead of a space wherever there is no data.
 * Bugzilla: 22194
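
Why the dash placeholder helps can be seen in a short sketch. The column names and the sample row below are hypothetical and are not real lfs quota output; the only point is that whitespace splitting yields a fixed number of fields once empty cells hold a dash.

```python
# Hypothetical column layout for illustration; real `lfs quota` output differs.
COLUMNS = ["filesystem", "kbytes", "quota", "limit", "grace"]


def parse_row(line):
    """Split a whitespace-separated row, mapping '-' placeholders to None."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        # Without placeholders an empty cell silently shifts the columns,
        # so a wrong field count is the only hint that something is off.
        raise ValueError("unexpected field count: %d" % len(fields))
    return {name: (None if value == "-" else value)
            for name, value in zip(COLUMNS, fields)}


# With a dash in the empty grace column the row always has five fields.
row = parse_row("/mnt/lustre  1024  2048  4096  -")
print(row["limit"])  # 4096
print(row["grace"])  # None
```

A row whose empty cell is left blank would split into four fields, and a script could not tell which column was missing.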

Severity: enhancement Description: fix obdfilter-survey script to work properly with remote OSSs
 * Bugzilla: 15685

Severity: enhancement Description: add new OBDFILTER_SURVEY test suite
 * Bugzilla: 22402

Severity: enhancement Description: add new multiple mount protection (MMP) test suite
 * Bugzilla: 20326

Severity: enhancement Description: add support for async journal commit in echo client
 * Bugzilla: 21647

Severity: enhancement Description: allow userland programs to include  from standard include directories
 * Bugzilla: 21244

Severity: enhancement Description: The prune-icache-use-trylock patch is no longer needed now that the patch from bug 20008 has landed.
 * Bugzilla: 18399

Severity: normal Description: The shrink grant feature is still active on the client although the connect flag is not set.
 * Bugzilla: 22755

Severity: normal Description: Don't leak grant space if the write failed with quota exceeded.
 * Bugzilla: 22755

Severity: normal Description: Don't consume grant space twice on a recoverable resend.
 * Bugzilla: 22755

Severity: normal Description: a race condition could lead to SIGBUS being sent to an application using mmapped files from Lustre Details: The truncate_complete_page implementation for the patchless client could arbitrarily unset the PG_Uptodate flag for a page being evicted from the page cache. An uptodate check right after a readpage call in filemap_fault could then fail as though the page read had been unsuccessful.
 * Bugzilla: 22610

Severity: normal Description: dlm lock slab shrinking is not efficient Details: The dlm_locks slab can grow significantly and consume a lot of memory on the server. Set a hard limit on grant_plan.
 * Bugzilla: 22476

Severity: normal Description: Lustre does not do 1MB IOs to HW RAID Details: Bump MAX_PHYS/HW_SEGMENTS and SG_ALL to 256 in the RHEL5 kernel. This is what we do already for SLES kernels.
 * Bugzilla: 22850

Severity: normal Description: bump maximum number of phys/hw segments in the SLES11 kernel until s/g chaining works properly.
 * Bugzilla: 22223

Severity: normal Description: LSI Fusion MPT driver hacks to improve performance Details: Set CONFIG_FUSION_MAX_SGE to 256 for RHEL5
 * Bugzilla: 17086

Severity: enhancement Description: increase default md stripe_cache_size to 16k
 * Bugzilla: 22509

Severity: normal Description: don't handle security.capability xattr Details: CONFIG_SECURITY_FILE_CAPABILITIES is enabled by default on SLES11. This results in additional getxattr calls, causing VBR test failures as well as a performance drop when writing.
 * Bugzilla: 15587
 * Bugzilla: 21439

Severity: normal Description: obdfilter-survey is no longer working Details: revert patch from bug 20355 to resolve an issue with lctl --threads not working correctly with $(PTHREAD_LIBS) being linked to lctl.
 * Bugzilla: 22749

Severity: normal Description: ll_shrink_cache does not handle __GFP_FS properly
 * Bugzilla: 22786

Severity: normal Description: lfs getstripe shows wrong info for directories Details: Set correct LOVEA default values for filesystem-wide.
 * Bugzilla: 19102

Severity: normal Description: FSX checksum false positives due to mmap IO Details: Use OBD_FL_MMAP flag for IOs on a memory mapped file. Do not print checksum errors if the flag is set on a request.
 * Bugzilla: 11742

Severity: normal Description: file operations after eviction have successful return values Details: use vfs ->flush callback to return any pending async errors on file close.
 * Bugzilla: 22360

Severity: normal Description: mdsrate fails to write after 1.3+M files opened Details: decrease memory usage on clients by recycling dentries and inodes.
 * Bugzilla: 20433

Severity: normal Description: obdfilter-survey gives unreasonably high numbers Details: Wait for all threads to complete when running test_brw.
 * Bugzilla: 17382

Severity: normal Description: do not set the Lustre device read-only when the server unmounts, and keep the client records of recoverable clients
 * Bugzilla: 22299

Severity: normal Description: move sync_on_lock_cancel tunable to the obdfilter layer Details: move the tunable to trigger a journal flush on lock cancel from the ost layer to the obdfilter layer. This tunable is useful when using the async journal commit feature.
 * Bugzilla: 22241

Severity: normal Description: exp->exp_nid_stats == NULL in filter_tally Details: fix race with per-nid stats by delaying procfs cleanup until exp_refcount == 0
 * Bugzilla: 21871

Severity: normal Description: extent lock cancellation on client can keep the cpu busy for too long.
 * Bugzilla: 21556

Severity: normal Description: Do not fail OST activation when a llog is not found, just issue an error message.
 * Bugzilla: 22658

Severity: normal Description: Don't enable extents by default for MDT.
 * Bugzilla: 22911

Severity: normal Description: Protect bitfield access to ptlrpc_request's rq_flags, since the AT code can access it concurrently while sending early replies.
 * Bugzilla: 21877

Severity: normal Description: Disable lockless truncate by default since it is sometimes flawed and causes the write_disjoint test to fail.
 * Bugzilla: 23175

Severity: normal Description: OSSs which don't have the patch from bug 20278 can trigger an LBUG on 1.8 clients.
 * Bugzilla: 23139

Severity: enhancement Description: don't print message to the console when we have not managed to cancel all locks.
 * Bugzilla: 21528

Severity: normal Description: The MDS fails to synchronize OSTs which registered with the MGS after the MDT. The problem is that OBD_NOTIFY_CREATE events are raised too early and thus discarded by the MDT stack. The fix consists of issuing OBD_NOTIFY_CREATE event in the lov layer.
 * Bugzilla: 23305

Severity: normal Description: Fix race when the ping evictor and a service thread execute target_recovery_check_and_stop concurrently.
 * Bugzilla: 23192

Severity: normal Description: quota broadcast can trigger an LBUG on the MDT if there are inactive OSCs.
 * Bugzilla: 23196

Severity: enhancement Description: Resetting the lov_objid values to last_id reported by the OST during orphan recovery is incorrect and can cause the same objects to be allocated twice.
 * Bugzilla: 17485

Severity: enhancement Description: "weak-modules" support Details: Implement "weak-modules" support which enables kernel modules to be used with any kernel that implements the same kABI. To achieve this, modules are now installed in /lib/modules/$(uname -r)/updates/kernel on all distributions.
 * Bugzilla: 21452

Severity: enhancement Description: add writeconf as mount option
 * Bugzilla: 22464

Severity: enhancement Description: produce debuginfo packages for SLES.
 * Bugzilla: 22846

Severity: enhancement Description: add failover nidlist to the import proc file.
 * Bugzilla: 15253

Severity: enhancement Description: fix LUSTRE_SEQ_MAX_WIDTH for interoperability between 1.8 clients and 2.0 servers.
 * Bugzilla: 20563

Severity: enhancement Description: lfs find -s does not work correctly because of a bug in find_value_cmp.
 * Bugzilla: 22938

Severity: normal Description: ll_read_ahead_page must validate the dlm lock before using it.
 * Bugzilla: 22309

Severity: normal Description: Prevent failover nids from registering with MGS first.
 * Bugzilla: 22656

Severity: normal Description: fix lock inversion in ll_setattr_raw.
 * Bugzilla: 11063

Severity: normal Description: object allocation is not balanced across OSTs. Details: osc_precreate should return 0, if there are enough objects left.
 * Bugzilla: 22884

=Changes from v1.8.2 to v1.8.3=

Support for networks:
 * socklnd - any kernel supported by Lustre™
 * qswlnd - Qsnet kernel modules 5.20 and later
 * openiblnd - IbGold 1.8.2
 * o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, 1.4.1 and 1.4.2
 * viblnd - Voltaire ibhost 3.4.5 and later
 * ciblnd - Topspin 3.2.0
 * iiblnd - Infiniserv 3.3 + PathBits patch
 * gmlnd - GM 2.1.22 and later
 * mxlnd - MX 1.2.10 or later
 * ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x

Support for kernels:
 * 2.6.16.60-0.42.8 (SLES 10)
 * 2.6.27.39-0.3.1 (SLES11, i686 & x86_64 only)
 * 2.6.18-164.11.1.el5 (RHEL 5)
 * 2.6.18-164.11.1.0.1.el5 (OEL 5)

Client support for unpatched kernels: (see Patchless Client)
 * 2.6.16 - 2.6.30 vanilla (kernel.org)

Recommended e2fsprogs version: 1.41.10-sun2

The async journal commit feature (bug 19128) and the cancel lock before replay feature (bug 16774) are disabled by default.

Severity: normal Description: fix for a race condition in the Linux quota implementation Details: Access to dq_flags (struct dquot) is not properly locked, which could lead to inconsistencies when it is modified with non-atomic bit operations like __set_bit in do_set_dqblk. This patch replaces non-atomic __set_bit calls with atomic set_bit calls.
 * Bugzilla: 22363
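
The lost update behind this kind of race can be replayed deterministically. The sketch below is not kernel code: the flag names are hypothetical, and a lock stands in for the atomicity that set_bit provides. It shows how two plain read-modify-write updates on the same flags word can overwrite each other.

```python
import threading

FLAG_A = 1 << 0  # hypothetical flag bits, for illustration only
FLAG_B = 1 << 1


def interleaved_non_atomic():
    """Replay one bad interleaving of two plain read-modify-write updates."""
    flags = 0
    a = flags       # thread A loads the flags word
    b = flags       # thread B loads the same, now stale, value
    a |= FLAG_A     # A sets its bit in its private copy
    b |= FLAG_B     # B sets its bit in its private copy
    flags = a       # A stores its copy
    flags = b       # B stores its copy and wipes out A's bit
    return flags


class AtomicFlags:
    """Flags word whose set_bit is one indivisible step (emulated with a lock)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def set_bit(self, mask):
        with self._lock:
            self._value |= mask

    @property
    def value(self):
        return self._value


lost = interleaved_non_atomic()
print(bin(lost))  # 0b10: FLAG_A was lost

flags = AtomicFlags()
workers = [threading.Thread(target=flags.set_bit, args=(mask,))
           for mask in (FLAG_A, FLAG_B)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(bin(flags.value))  # 0b11: both bits survive
```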

Severity: normal Description: initialize the child_res_id for OPEN lock Details: in mds_open, initialize the child_res_id before enqueuing the OPEN lock for the child inode, to avoid sending the wrong ldlm_res_id to the client.
 * Bugzilla: 22307

Severity: normal Description: lst: check # of remaining RPCs before aborting Details: lstcon_rpc_trans_postwait calls lstcon_rpc_trans_abort only when the transaction has timed out, so if "end_session" interrupts the wait on the transaction, we can hit the assertion failure ASSERTION(crpc->crp_stamp != 0).
 * Bugzilla: 22556

Severity: normal Description: Suppress "changing the import ..." warning. Details: This warning will always be printed when the MDT reconnects to an OST after the MDT is restarted. There is nothing wrong here and more importantly there is nothing the admin should do or care about so I'm moving the warning to D_HA.
 * Bugzilla: 16909

Severity: normal Description: Use INFO/WARN instead of WARN/ERROR for the slow messages. Details: We should use INFO/WARN instead of WARN/ERROR for the slow messages. Not only is there no real error here but it fixes an annoying quirk of the message formatting. With the old levels you would see the messages formatted differently based on the time.
 * Bugzilla: 16909

Severity: normal Description: Fix a computation on an unsigned variable whose result may be less than 0.
 * Bugzilla: 22385
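
The general pitfall can be illustrated outside the kernel. The sketch below emulates 32-bit unsigned arithmetic with a mask; the helper names are hypothetical and do not come from the Lustre source. A subtraction that would be negative wraps to a large positive value, so a later "less than 0" check never fires.

```python
MASK32 = 0xFFFFFFFF


def u32_sub(a, b):
    """Subtract the way a 32-bit unsigned word does: wrap modulo 2**32."""
    return (a - b) & MASK32


def safe_remaining(limit, used):
    """Compare before subtracting so the result can never wrap."""
    return limit - used if limit > used else 0


print(u32_sub(3, 5))         # 4294967294, not -2
print(u32_sub(3, 5) < 0)     # False: the negative check is dead code
print(safe_remaining(3, 5))  # 0
```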

Severity: major Description: allow multiple instances of the same nid in NID hash Details: Case of multiple separate clients from the same NID (as with liblustre) is legitimate and so we should allow multiple instances of the same NID in nid hash.
 * Bugzilla: 22252

Severity: normal Description: rely on pings to issue reconnects Details: Don't wake up pinger on reconnect failures and rely on regular pings to trigger the next reconnection. Please note that the pinger already uses a smaller interval if the import is disconnected.
 * Bugzilla: 22423

Severity: normal Description: print more debug info for timed-out ZC-req Details: 1. Output more information for timed-out ZC-req and partially received connections. 2. Close the connection for a timed-out ZC-req. 3. Always send ZC_ACK on a non-blocking connection (BULK_IN).
 * Bugzilla: 20615

Severity: normal Description: remove lock acquisition while holding a spinlock Details: in ras_update, "lov_get_info" could be called while increasing the readahead window. It tries to take the mutex "lov_lock" while the spinlock "ras_lock" is held, which causes a system lockup.
 * Bugzilla: 22307

Severity: normal Description: ASSERTION(cli->cl_avail_grant >= 0) failed Details: This patch tries to address several issues: 1. osc_init_grant: calculate avail_grant according to recovery status. 2. osc_reconnect: the requested grant should include cl_dirty. 3. filter_grant: besides server reboot, we should also grant the requested amount in case of a normal reconnect. 4. Round the grant amount up instead of down; otherwise the client could still end up with dirty > granted.
 * Bugzilla: 20278
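
The rounding point (item 4) is plain arithmetic. In the sketch below the 4096-byte unit and the dirty byte count are assumed values chosen for illustration, not figures from the patch: rounding down can leave the grant below the dirty amount, while rounding up always covers it.

```python
def round_down(amount, unit):
    """Largest multiple of unit that is <= amount."""
    return (amount // unit) * unit


def round_up(amount, unit):
    """Smallest multiple of unit that is >= amount."""
    return ((amount + unit - 1) // unit) * unit


UNIT = 4096    # assumed granularity, illustrative only
dirty = 10000  # bytes the client already holds dirty (made-up figure)

print(round_down(dirty, UNIT))  # 8192: smaller than dirty
print(round_up(dirty, UNIT))    # 12288: covers dirty
```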

Severity: normal Description: Use CNETERR in specific places in the portal's LNET driver
 * Bugzilla: 20805

Severity: normal Description: include last created object in precreate slow case
 * Bugzilla: 22108

Severity: normal Description: don't do rep-ack if nothing was created Details: mds_open currently always puts a lock into a rep-ack regardless of whether something was created. This is pointless and only creates needless contention. In fact the entire idea was to do this for real creates as a recovery protection.
 * Bugzilla: 20373

Severity: normal Description: Spurious error messages from smp_processor_id on preemptible kernel Details: Disable preemption by grabbing the lock in fs_trace_get_tcd first. The function fs_trace_get_tcd was moved up.
 * Bugzilla: 22409

Severity: normal Description: 2.6.31-fc12 patchless client support.
 * Bugzilla: 21500

Severity: normal Description: give the BUILD_TESTS love to ldiskfs as well Details: Because ldiskfs re-uses so (too?) much of the lustre auto* goop we need to stub the BUILD_TESTS assignment into its autoMakefile.am, even though it's completely unused/unneeded there.
 * Bugzilla: 17258

Severity: normal Description: interval_erase fix Details: interval_erase calls update_maxhigh properly when child == NULL
 * Bugzilla: 22181

Severity: normal Description: Adding WIRE_ATTR attribute to LNET types Details: LST nodes on different platforms might not communicate well due to the lack of WIRE_ATTR attribute in some LNET structures traversing network. The patch fixes the problem by adding WIRE_ATTR where needed.
 * Bugzilla: 21945

Severity: normal Description: replace server_major_version with connect_flags for quota utils interoperability
 * Bugzilla: 22069

Severity: normal Description: do_div arguments not cross-platform compatible
 * Bugzilla: 22233

Severity: normal Description: fix error message in mds_mfd_close Details: Fix error messages in mds_mfd_close since it is now legitimate to have i_nlink = 1 for dirs in /PENDING.
 * Bugzilla: 22177

Severity: normal Description: "lfs df" does not print stats for all mountpoints Details: Print all mounted lustre filesystems with "lfs df"
 * Bugzilla: 22327

Severity: normal Description: debug_mb not correctly initialized on newer kernels (2.6.31) Details: Fixed the debug_mb initialization problem for kernel 2.6.31
 * Bugzilla: 21957

Severity: normal Description: support relative paths in llapi_search_fsname Details: Use realpath to obtain the absolute pathname.
 * Bugzilla: 19919
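
The same idea can be shown with Python's os.path.realpath, which resolves a relative path (and any symlinks) to an absolute one. This is a sketch of the technique only, not of llapi_search_fsname; the helper name and the sample path are hypothetical.

```python
import os


def absolute_path(path):
    """Resolve a possibly relative path, and any symlinks, to an absolute path."""
    return os.path.realpath(path)


# "." resolves to the current working directory as an absolute path.
print(absolute_path("."))

# A relative path becomes absolute even if it does not exist yet.
print(os.path.isabs(absolute_path("some/relative/dir")))  # True
```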

Severity: normal Description: fix for truncated reply buffer Details: the reply buffer could be referenced by reply_in_callback after being released
 * Bugzilla: 21486

Severity: normal Description: Add quiet -q option to lfs quota
 * Bugzilla: 22194

Severity: normal Description: hash MEs on RDMA portal Details: The RDMA portal can have a very long ME list on the client side, which triggers soft lockups because of long list searches. Hashing the MEs on the RDMA portal resolves this problem.
 * Bugzilla: 21619
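
The cost difference between a flat list and hashed buckets can be sketched as follows. This is a simplified model with hypothetical class names: it matches on exact match bits only and ignores the other matching rules a real ME lookup applies.

```python
from collections import defaultdict


class ListTable:
    """All entries on one list: a lookup may walk every entry."""

    def __init__(self):
        self.entries = []
        self.steps = 0  # number of entries examined so far

    def attach(self, match_bits, buf):
        self.entries.append((match_bits, buf))

    def match(self, match_bits):
        for bits, buf in self.entries:
            self.steps += 1
            if bits == match_bits:
                return buf
        return None


class HashedTable:
    """Entries spread over buckets: a lookup walks one short bucket."""

    def __init__(self, nbuckets=64):
        self.nbuckets = nbuckets
        self.buckets = defaultdict(list)
        self.steps = 0

    def attach(self, match_bits, buf):
        self.buckets[match_bits % self.nbuckets].append((match_bits, buf))

    def match(self, match_bits):
        for bits, buf in self.buckets[match_bits % self.nbuckets]:
            self.steps += 1
            if bits == match_bits:
                return buf
        return None


flat, hashed = ListTable(), HashedTable()
for bits in range(10000):
    flat.attach(bits, "buf%d" % bits)
    hashed.attach(bits, "buf%d" % bits)

found_flat = flat.match(9999)
found_hashed = hashed.match(9999)
print(flat.steps, hashed.steps)  # the hashed lookup examines far fewer entries
```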

Severity: normal Description: udev rule to set /dev/obd perms 666 Details: Provide Udev rules file for Lustre, so that /dev/obd permissions are now 666.
 * Bugzilla: 21259

Severity: normal Description: lustre.lov error when backing up symlinks with extended attributes Details: Improved logic in ll_listxattr
 * Bugzilla: 22301

Severity: normal Description: properly handle null value for setfattr -n lustre.lov Details: Running "setfattr -n trusted.lov ." causes a NULL dereference in ll_setxattr because "value" is not checked for NULL. This command now resets to the default striping when executed against a directory.
 * Bugzilla: 22187

Severity: normal Description: skip statahead for NFSCLIENT
 * Bugzilla: 22319

Severity: normal Description: Kernel update for SLES9 2.6.5-7.322.
 * Bugzilla: 22352

Severity: normal Description: lfs quota output cleanup Details: Suppress standard output in error cases
 * Bugzilla: 22194

Severity: normal Description: llapi_uuid_match prints bogus error message on upgraded filesystem Details: 1. Increase the "lfs df" column width to handle TB sized devices cleanly 2. Allow matching OST names without trailing _UUID 3. Allow negating the "--obd" option to "lfs find" 4. Remove duplicate code in mntdf iterating over MDTs/OSTs. Handle errors
 * Bugzilla: 22235
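
Item 2 above (matching OST names without the trailing _UUID) amounts to normalizing both names before comparing them. The sketch below illustrates that idea with hypothetical helper names and sample target names; it is not the llapi_uuid_match implementation.

```python
SUFFIX = "_UUID"


def strip_uuid_suffix(name):
    """Drop one trailing _UUID, if present."""
    return name[:-len(SUFFIX)] if name.endswith(SUFFIX) else name


def names_match(a, b):
    """Compare two target names, ignoring an optional trailing _UUID."""
    return strip_uuid_suffix(a) == strip_uuid_suffix(b)


print(names_match("lustre-OST0000", "lustre-OST0000_UUID"))  # True
print(names_match("lustre-OST0000", "lustre-OST0001_UUID"))  # False
```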

Severity: normal Description: call sync instead of fsync on local cancel to reduce stack usage Details: sync_on_lock_cancel is needed for recovery when async journal is enabled, but we actually just need to make sure that metadata blocks have hit the journal, so doing a fs sync should be enough and should consume less stack (just create an empty handle and commit it).
 * Bugzilla: 22241

Severity: normal Description: simplify client disconnect code on server side Details: This patch was reverted because we were chasing some regression. It is now safe to re-apply.
 * Bugzilla: 21686

Severity: normal Description: workaround patch Details: disable the per-thread data (current->journal_info) containing the lock info during I/O to work around the issue in the short term
 * Bugzilla: 22035

Severity: normal Description: Print a dash in empty lfs quota grace columns Details: Polish lfs quota output for easier processing with awk/sed
 * Bugzilla: 22194

Severity: normal Description: rq_invalid_rqset should be a bitfield
 * Bugzilla: 21938

Severity: normal Description: control DCACHE_LUSTRE_INVALID flag with MDS_INODELOCK_LOOKUP lock Details: "DCACHE_LUSTRE_INVALID" is controlled by the "MDS_INODELOCK_LOOKUP" lock, which corresponds to "IT_LOOKUP"; do not skip invalidation for other intents.
 * Bugzilla: 19933

Severity: normal Description: Cannot send after transport shutdown Details: Clear imp_vbr_failed flag upon eviction
 * Bugzilla: 20997

Severity: normal Description: use req->rq_set itself during recovery Details: during recovery, use req->rq_set itself to replay the request instead of ptlrpcd_recovery_pc
 * Bugzilla: 21938

Severity: normal Description: introduce server major version for b1_8 and b2_0 quota utils interoperability
 * Bugzilla: 22069

Severity: normal Description: Use CFS_ALLOC_IO instead of _STD in llap_from_page_with_lockh Details: During an ll_readahead under ll_readpage, we have seen the OBD_SLAB_ALLOC hang under ldlm_pools_shrink when trying to lock a page that is already locked by the readahead code. Using CFS_ALLOC_IO instead of CFS_ALLOC_STD will prevent ldlm_pools_shrink from actually freeing slab, so the call path that blocks indefinitely can never happen.
 * Bugzilla: 21983

Severity: normal Description: inc nlink by 2 instead of 1 in mds_orphan_add_link Details: Fix regression introduced by 19640. ext3_inc_count can reset nlink to 1 when the directory is indexed and inode->i_nlink == 2. Work around the problem by incrementing nlink by 2 instead of 1.
 * Bugzilla: 22177

Severity: normal Description: MDS operations hang when issued with lfs setstripe on a degraded OST Details: Change the locking order in mds_lookup
 * Bugzilla: 22095

Severity: normal Description: fix error with make rpms after configure --disable-tests Details: If one configures lustre with "--disable-tests" a subsequent "make rpms" will fail as it would still try to package up the lustre-tests RPM. Fixing this provided the opportunity to fix another wart, that being the subst'ing the configure arguments into the lustre.spec. Now they are passed as value with "--define 'configure_args ...'" when calling rpmbuild.
 * Bugzilla: 17258

Severity: normal Description: stop waiting for next replay transno on shutdown Details: if the system is shutting down, wake up the service thread blocked waiting for the next replay transno during recovery, so that all the references held by queued requests can be dropped and the device can be stopped.
 * Bugzilla: 21726

Severity: normal Description: return approximate block/inode usage when OSTs are down Details: Really return approximate block/inode usage when OSTs are down. The old version erroneously skipped oqctl copying on error which prevented this from working properly.
 * Bugzilla: 21816

Severity: normal Description: lov_merge_lvb) ASSERTION(spin_is_locked(&lsm->lsm_lock)) failed Details: Protect lli->lli_smd pointer updates with lli->lli_lock.
 * Bugzilla: 20989

Severity: normal Description: Avoid operating lustre-hash internal structures directly.
 * Bugzilla: 21815

Severity: normal Description: mount.lustre fails to pass some options to mount
 * Bugzilla: 22097

Severity: normal Description: set wait_recovery_complete MAX value to max recovery time estimated
 * Bugzilla: 18649

Severity: normal Description: make dist seems to exclude the "darwin" bits Details: Include all of the darwin bits in the distribution tarball created with make dist.
 * Bugzilla: 21380

Severity: normal Description: fix for double release of ibc_lock in o2iblnd Details: Re-acquire ibc_lock in kiblnd_post_tx_locked. Add extra reference to conn before calling kiblnd_post_tx_locked to avoid scenario when conn disappears inside kiblnd_post_tx_locked.
 * Bugzilla: 21911

Severity: normal Description: allow relative pathnames Details: This patch allows one to give relative pathnames to --with-linux and friends.
 * Bugzilla: 17952

Severity: normal Description: post landing cleanups Details: Remove generic find_linux_devel_paths - now that both the rhel5 and sles method files have their own particular version of this method, remove this hacky-trying-to-work-for-both versions from lbuild. Remove a block of what is now redundant code. Remove the comments from the target files describing what happened with this bug. Align the sles10 and sles11 target files: - include the rpmfix specifier in the sles10 file - remove the EXTRA_VERSION_DELIMETER from the sles10 file - change the TARGET_DELIMETER to FLAVOR_DELIMETER in the sles11 file - Some whitespace cleanups.
 * Bugzilla: 19336

Severity: normal Description: decrease memory usage on clients. Details: 1. On clients, recycle unused dentries and inodes. 2. Delete the code related to ll_deathrow (att 6215 in bug 1443). It is useless now.
 * Bugzilla: 20433

Severity: major Description: ext4 extent allocation is slower than in ext3 Details: Increase the default value of MB_DEFAULT_ORDER2_REQS to 8, enlarge ext4 preallocation table for 2048 4K blocks extents creation.
 * Bugzilla: 21137

Severity: normal Description: incorrect triggering of synchronous IO Details: The OSC can mistakenly fall back to synchronous IO when the max_dirty_mb limit is reached and no write requests have yet been issued. This can occur when the dirty pages are spread over many files all of which are below the optimal request size.
 * Bugzilla: 22074

Severity: normal Description: fix errant m4 "dnl" usage Details: Some dnl usage seems to have been causing some errors in the resulting configure script.
 * Bugzilla: 20383

Severity: normal Description: fix broken llobdstat and add a counter parameter Details: Need to make sure we limit the search for OBD stats files to the obdfilter subdirectory of "/proc/fs/lustre". Add a counter argument to limit the number of items returned when using the interval parameter. Fix lots of whitespace atrocities as well as better format some of the code.
 * Bugzilla: 21829

Severity: normal Description: PTLRPC_PAUSE_REQ checking should ignore PING.
 * Bugzilla: 13520

Severity: normal Description: Add $(PTHREAD_LIBS) to lctl and lfs build Details: $(PTHREAD_LIBS) is needed to compile lctl and lfs for BG/P
 * Bugzilla: 20355

Severity: normal Description: Optimize quota_ctl operations by sending requests in parallel Details: Based on a patch from Joseph Herring (LLNL). Send MDS->OST quota_ctl requests in parallel, do not resend. Compiled from two attachments in the ticket.
 * Bugzilla: 21919

Severity: normal Description: deadlock fix Details: start the transaction earlier in llog_lvfs_destroy to get transaction start and inode mutex lock nested properly.
 * Bugzilla: 18030

Severity: normal Description: workaround dd bus error Details: Workaround for a buggy coreutils/gettext combination. Suppressing the dd transfer statistics keeps dd from calling the GNU gettext library and so avoids the crash.
 * Bugzilla: 21264

Severity: minor Description: fix file ownerships in lustre-modules RPM Details: The files in the lustre-modules RPM were not being set with a correct owner and were therefore just using what was on the filesystem.
 * Bugzilla: 15057

Severity: normal Description: a small fix for "lfs osts" Details: Actually, we don't want to traverse the directory tree, so return a positive value from sem_init to terminate the traversal before it starts.
 * Bugzilla: 21665

Severity: normal Description: handle SLV==1 on client side Details: Initialize ldlm pool SLV to 0 on client side to handle SLV==1 obtained from server correctly
 * Bugzilla: 21882

Severity: normal Description: lru resize SLV can get stuck Details: calculate SLV with greater precision so that small changes are not lost to integer math truncation; round up SLV only if the number of granted locks is less than the limit, so as not to get stuck with this SLV
 * Bugzilla: 21882

Severity: normal Description: prevent use of OFED source dir instead of headers Details: Try to determine if the user is pointing configure at the OFED source directory instead of the devel/headers directory. If so, error out of configure with an informative warning.
 * Bugzilla: 21666

Severity: normal Description: Ignore broken cancel_dirty_page in OFED 1.4.1 Details: OFED 1.4.1 had a broken implementation of cancel_dirty_page for SLES10. This patch detects that and ignores the function if found.
 * Bugzilla: 19553

Severity: normal Description: Get rid of the EXTRA_VERSION_DELIMETER shenanigans Details: We used to carry around a bunch of baggage in order to specify what kind of delimiter to put between the version and "extra version". The truth of the matter is that this should always be "-". This patch includes some support for a build system developer to force an uncached rebuild of all products.
 * Bugzilla: 19336

Severity: normal Description: (17914) ignore trailing -mdc when determining index number
 * Bugzilla: 21961
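
A sketch of index parsing that tolerates the trailing -mdc is shown below. The name shape it assumes (a filesystem name, then MDT or OST, then four hexadecimal digits, then an optional -mdc) is an assumption made for illustration, and the helper is hypothetical rather than taken from the Lustre scripts.

```python
import re

# Assumed name shape: <fsname>-<MDT|OST><4 hex digits>[-mdc]
TARGET_RE = re.compile(
    r"^(?P<fs>.+)-(?P<kind>MDT|OST)(?P<index>[0-9a-fA-F]{4})(?:-mdc)?$"
)


def target_index(name):
    """Return the numeric index encoded in a target name."""
    m = TARGET_RE.match(name)
    if m is None:
        raise ValueError("unrecognized target name: %r" % name)
    return int(m.group("index"), 16)


print(target_index("lustre-MDT0000"))      # 0
print(target_index("lustre-MDT0000-mdc"))  # 0
print(target_index("lustre-OST000a"))      # 10
```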

Severity: normal Description: avoid divide-by-zero in lprocfs_rd_import
 * Bugzilla: 21966

Severity: normal Description: use separate failover counter for each facet
 * Bugzilla: 21953

Severity: normal Description: call build_lqs only from generic_quota_on
 * Bugzilla: 21147

Severity: normal Description: "lfs check" is only allowed for root. Details: Code cleanup around obd_class_* functions and sanity test for non-root lfs check
 * Bugzilla: 21259

Severity: normal Description: Kernel update to OEL5.4 2.6.18-164.11.1.0.1.el5.
 * Bugzilla: 21632

Severity: normal Description: fail the request if its obd_device is stopping Details: in ldlm_handle_enqueue, the request should be failed if its obd_device has been marked as "fail" (obd_fail=1), which is set during umount.
 * Bugzilla: 21686

Severity: normal Description: lustre_hash_rehash_key should use lh_read_unlock Details: lh_read_lock is no-op if rehash is disabled, so we should use lh_read_unlock in this function. This should not have any consequence, but better to fix it.
 * Bugzilla: 21815

Severity: normal Description: move assertion under write lock
 * Bugzilla: 21815

Severity: normal Description: print more debug info in lustre_hash_exit when assertion fails
 * Bugzilla: 21815

Severity: normal Description: do not flag a request as rq_replay for non replayable imports
 * Bugzilla: 19405

Severity: normal Description: LBUG doesn't print stack trace on sles9 because show_stack not exported
 * Bugzilla: 21906

=Changes from v1.8.1.1 to v1.8.2=

Support for networks:
 * socklnd - any kernel supported by Lustre™
 * qswlnd - Qsnet kernel modules 5.20 and later
 * openiblnd - IbGold 1.8.2
 * o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3, 1.4.1 and 1.4.2
 * viblnd - Voltaire ibhost 3.4.5 and later
 * ciblnd - Topspin 3.2.0
 * iiblnd - Infiniserv 3.3 + PathBits patch
 * gmlnd - GM 2.1.22 and later
 * mxlnd - MX 1.2.10 or later
 * ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x

Support for kernels:
 * 2.6.16.60-0.42.8 (SLES 10)
 * 2.6.27.39-0.3.1 (SLES11, i686 & x86_64 only)
 * 2.6.18-164.11.1.el5 (RHEL 5)
 * 2.6.18-164.6.1.0.1.el5 (OEL 5)

Client support for unpatched kernels: (see Patchless Client)
 * 2.6.16 - 2.6.30 vanilla (kernel.org)

Recommended e2fsprogs version: 1.41.6.sun1

The async journal commit feature (bug 19128) and the cancel lock before replay feature (bug 16774) are disabled by default.

Severity: minor Description: should update lp_alive for non-router peers.
 * Bugzilla: 21459

Severity: enhancement Description: LNet router shuffler.
 * Bugzilla: 15332

Severity: enhancement Description: LNet fine grain routing support.
 * Bugzilla: 15332

Severity: normal Description: router checker stops working when system wall clock goes backward Details: use monotonic timing source instead of system wall clock time.
 * Bugzilla: 20171
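
The idea behind this fix can be sketched in a few lines. The sketch below is illustrative Python, not LNet code: the class name `RouterChecker` and its methods are hypothetical, and the only point it makes is that elapsed time is measured against a monotonic source, so stepping the wall clock backward cannot stall or confuse the liveness check.

```python
import time


class RouterChecker:
    """Hypothetical liveness tracker; an illustration, not the LNet code."""

    def __init__(self, timeout, clock=time.monotonic):
        # time.monotonic never goes backward, unlike time.time (wall clock).
        self.timeout = timeout
        self.clock = clock
        self.last_alive = clock()

    def note_alive(self):
        """Record that the router answered a ping just now."""
        self.last_alive = self.clock()

    def is_alive(self):
        """True while the last answer is no older than the timeout."""
        return self.clock() - self.last_alive <= self.timeout
```

The clock is injectable only so the behaviour can be exercised with a fake time source; in normal use the default `time.monotonic` is what matters.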

Severity: enhancement Description: avoid asymmetrical router failures
 * Bugzilla: 18460

Severity: enhancement Description: multiple-instance support for kptllnd
 * Bugzilla: 19735

Severity: normal Description: ksocknal_close_conn_locked connection race Details: A race was possible when ksocknal_create_conn calls ksocknal_close_conn_locked for already closed conn.
 * Bugzilla: 20897

Severity: enhancement Description: port router pinger to userspace
 * Bugzilla: 13065

Severity: normal Description: kptllnd HELLO protocol deadlock Details: kptllnd HELLO protocol doesn't run to completion in finite time
 * Bugzilla: 17546

Severity: normal Description: LNet selftest fixes and enhancements
 * Bugzilla: 18075

Severity: enhancement Description: allow a test node to be a member of multiple test groups
 * Bugzilla: 19156

Severity: enhancement Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution Details: an update from the upstream developer Scott Atchley.
 * Bugzilla: 18654

Severity: enhancement Description: Update RHEL5.4 kernel to 2.6.18-164.11.1.el5 and OEL5.4 kernel to 2.6.18-164.11.1.0.1.el5.
 * Bugzilla: 21632

Severity: enhancement Description: Update SLES11 kernel to 2.6.27.39-0.3.1.
 * Bugzilla: 21511
 * Bugzilla: 19848

Severity: enhancement Description: Update supported SLES10 kernel to 2.6.16.60-0.42.8.
 * Bugzilla: 20758

Severity: enhancement Description: Update kernel to RHEL5.4 2.6.18-164.6.1.el5 and OEL5 2.6.18-164.6.1.0.1.el5(Both in-kernel OFED enabled).
 * Bugzilla: 20773

Severity: enhancement Description: Build kernels (RHEL5, OEL5 and SLES10/11) using the vendor's own kernel spec file.
 * Bugzilla: 16312

Severity: enhancement Description: Vanilla kernel 2.6.30 patchless client support.
 * Bugzilla: 19808

Severity: major Frequency: rare Description: bad entry in directory xxx: inode out of bounds Details: fix locking issue in the rename path which could race with any other operations updating the same directory.
 * Bugzilla: 20892

Severity: enhancement Description: Make watchdog timer messages clearer and more descriptive.
 * Bugzilla: 20722

Severity: normal Description: cp -p command does not preserve the dates and timestamp Details: mtime could be spoiled by a write callback
 * Bugzilla: 21489

Severity: normal Description: Clear imp_force_reconnect correctly in ptlrpc_connect_interpret
 * Bugzilla: 21513

Severity: normal Description: Allow non-root access for "lfs check". Details: Added a check in obd_class_ioctl for OBD_IOC_PING_TARGET.
 * Bugzilla: 21259

Severity: enhancement Description: quotacheck performance/scaling issues Details: reduce quotacheck time on empty filesystem by skipping uninit group.
 * Bugzilla: 19763

Severity: enhancement Description: Enhancement for lfs(1) command to use numeric uid/gid.
 * Bugzilla: 20200

Severity: enhancement Description: Adjust locks' extents on their first enqueue, so that at the time they get granted, there is no need for another pass through the queues since they are already shaped into the proper forms.
 * Bugzilla: 19325

Severity: normal Description: Fix mds_shrink_intent_reply/mds_intent_policy to pass correct arguments and prevent LBUG in lustre_shrink_reply_v2.
 * Bugzilla: 20020

Severity: normal Description: Change tunefs.lustre and mkfs.lustre --mountfsoptions so that exactly the specified mount options are used. Leaving off any "mandatory" mount options is an error. Leaving off any default mount options causes a warning, but is allowed. Change errors=remount-ro from mandatory to default. Sanitize the mount string before storing it. Update man pages accordingly.
 * Bugzilla: 19689
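
The rule described above (missing mandatory options are an error, missing default options only a warning) can be sketched as follows. This is illustrative Python, not the mkfs.lustre implementation: the function names are hypothetical, the contents of `MANDATORY` and `DEFAULT` are placeholders except for errors=remount-ro, which the entry itself names as a default, and the `sanitize` step is only one guess at what sanitizing the mount string involves.

```python
def sanitize(specified):
    """Strip whitespace and drop empty or repeated options, keeping order."""
    seen, out = set(), []
    for opt in specified.split(","):
        opt = opt.strip()
        if opt and opt not in seen:
            seen.add(opt)
            out.append(opt)
    return ",".join(out)


def check_mount_options(specified, mandatory, default):
    """Return (errors, warnings): missing mandatory and missing default options."""
    given = set(sanitize(specified).split(",")) - {""}
    errors = sorted(set(mandatory) - given)    # leaving these off is an error
    warnings = sorted(set(default) - given)    # leaving these off only warns
    return errors, warnings


# Placeholder option sets, for illustration only.
MANDATORY = {"user_xattr"}
DEFAULT = {"errors=remount-ro"}
```

With these placeholder sets, `check_mount_options("user_xattr", MANDATORY, DEFAULT)` reports no errors and one warning for the omitted default option.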

Severity: normal Description: mds_getattr should return 0, even if mds_fid2entry fails with -ENOENT. Also fix in ptlrpc_expire_one_request to print signed time difference.
 * Bugzilla: 20302

Severity: enhancement Description: Remove set_info(KEY_UNLINKED) from MDS/OSC
 * Bugzilla: 19662

Severity: enhancement Description: Clients can replay thousands of unused locks during recovery Details: Don't replay unused locks (only read locks for now) during recovery. This feature is disabled by default and can be enabled by running the following command on the clients: lctl set_param ldlm.cancel_unused_locks_before_replay=1
 * Bugzilla: 16774

Severity: normal Description: can't stat file in some situations. Details: improve initialization of osc data when a target is added to the MDS, and add the ability to resend a too-big getattr request if the client has no info about the OST.
 * Bugzilla: 19526

Severity: normal Description: Prevent inconsistencies between linux and lustre mount structures. Details: Wait indefinitely in server_wait_finished until mnt_count drops. Make the sleep interruptible.
 * Bugzilla: 19566

Severity: enhancement Description: Communicate OST degraded/readonly state via statfs to MDS Details: Flags in the statfs returned from OSTs indicate whether the OST is in a degraded RAID state, or if the filesystem has turned read-only after a filesystem error is detected.
 * Bugzilla: 18539

Severity: normal Frequency: rare Description: don't panic if EPROTO was hit when reading symlink Details: correctly handling request reference in error cases.
 * Bugzilla: 20122

Severity: normal Frequency: common Description: open sometimes returns ENOENT instead of EACCES Details: the permission check should be part of the open part of mds_open, not the lookup part, so the server should set the DISP_OPEN_OPEN disposition before starting the permission check. Also, there is no need to revalidate the dentry if the client already has a LOOKUP lock.
 * Bugzilla: 17545

Severity: normal Frequency: on servers with multiple network interfaces Description: enable client interface failover Details: When a client reconnects from another NID, properly update export nid hash position and ldlm reverse import.
 * Bugzilla: 19854

Severity: enhancement Description: implemented direct I/O with arbitrary (nonaligned) memory addresses and file offsets.
 * Bugzilla: 18801

Severity: enhancement Description: added more recovery timeout options.
 * Bugzilla: 18948

Severity: enhancement Description: added llapi_file_open, llapi_file_create, llapi_file_get_stripe man pages.
 * Bugzilla: 16267

Severity: normal Frequency: only on systems with clients writing to an OST on the same node Description: Avoid deadlock for local client writes Details: Use new OBD_BRW_MEMALLOC flag to notify OST about writes in the memory freeing context. This allows OST threads to set the PF_MEMALLOC flag on task structures in order to allocate memory from reserved pools and complete IO. Use GFP_HIGHUSER for OST allocations for non-local client writes, so that the OST threads generate memory pressure and allow inactive pages to be reclaimed.
 * Bugzilla: 19529

Severity: normal Frequency: rare Description: lock ordering violation between &cli->cl_sem and _lprocfs_lock Details: move ldlm namespace creation to the setup phase to avoid grabbing _lprocfs_lock with cli_sem held
 * Bugzilla: 18380

Severity: normal Frequency: only during format of test systems Description: Unable to run several mkfs.lustre on loop devices at the same time Details: mkfs.lustre returns error 256 when formatting loop devices concurrently. The solution is to properly handle the error.
 * Bugzilla: 18624

Severity: enhancement Description: implement async create (obd_async_create) method for osc, to avoid waiting too long for new OST objects while holding an ldlm lock.
 * Bugzilla: 18357

Severity: normal Frequency: occasionally during network problems Description: client not allowed to reconnect to OST because of active request Details: abort bulk requests received by the OST once the client has timed out since the client will resend the request anyway. The client also now retries connecting to the same server if a connect request failed with -EBUSY or -EAGAIN.
 * Bugzilla: 18674

Severity: normal Frequency: rare, if a widely striped file is used and one OST is down. Description: don't return error if we created a subset of objects for file. Details: lov_update_create_set uses set->set_success as index for created objects, so if some requests failed, they will leave a hole at the end of the array and we can use qos_shrink_lsm to allocate a correct lsm.
 * Bugzilla: 18382

Severity: normal Description: Slow stale export processing during normal start up Details: The global mgc lock prevents OST setup to be run in parallel. Replace the global lock with a per-config_llog_data semaphore.
 * Bugzilla: 20978

Severity: normal Description: Out of order replies might be lost on replay Details: In ptlrpc_retain_replayable_request, if we cannot find a retained request with a tid smaller than the one currently being added, add it to the start, not the end, of the list.
 * Bugzilla: 19128
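
The ordering rule in this fix can be illustrated with a small sketch. This is plain Python over a list of transaction numbers, not the ptlrpc code; the function name `retain_replayable` is hypothetical, and scanning from the tail is an assumption made for the example.

```python
def retain_replayable(replay_list, transno):
    """Insert transno into replay_list, which is kept in ascending order."""
    # Walk backward from the tail looking for a retained entry that is smaller.
    for i in range(len(replay_list) - 1, -1, -1):
        if replay_list[i] < transno:
            replay_list.insert(i + 1, transno)
            return
    # No retained entry is smaller, so the new one belongs at the start.
    # Appending at the end here would break the ordering that replay relies on.
    replay_list.insert(0, transno)
```

For example, adding 3 to a list holding 5, 7, 9 places it first, which is the case the fix addresses.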

Severity: normal Description: BUG: soft lockup - CPU#1 stuck for 10s! [ll_mdt_07:4523] Details: add cond_resched calls to avoid hogging the cpu for too long in the hash code. Make also lustre_hash_for_each_empty more efficient.
 * Bugzilla: 19557

Severity: enhancement Description: Performance improvements for debug messages with D_RPCTRACE, D_LDLM, D_QUOTA options.
 * Bugzilla: 17682

Severity: normal Frequency: only with NFS export Description: (lov_merge.c:74:lov_merge_lvb) ASSERTION(spin_is_locked(&lsm->lsm_lock)) failed (SR 71691004) Details: Fix a race in the nfs export code by populating inode info while the new inode is still locked
 * Bugzilla: 20989

Severity: enhancement Description: add a new file in procfs called force_lbug. Writing to this file triggers an LBUG. For test purposes only.
 * Bugzilla: 11680

Severity: normal Description: OOM killer causes node hang Details: really interrupt the sleep in osc_enter_cache on signals
 * Bugzilla: 18213

Severity: normal Description: LustreError: 9153:0:(quota_context.c:622:dqacq_completion) LBUG Details: fix race during quota release on the slave.
 * Bugzilla: 18630

Severity: enhancement Description: smaller hash bucket sizes, cleanups Details: increase hash table sizes and enabled rehashing for pools, uuid, nid & per-nid stats.
 * Bugzilla: 18690

Severity: enhancement Description: Add ldiskfs maxdirsize mount option Details: add max_dir size mount option
 * Bugzilla: 19673

Severity: normal Description: panic in ll_statahead_thread Details: prevent the parent thread from being killed before its child
 * Bugzilla: 20139

Severity: normal Frequency: only with 16TB device Description: unable to perform "mount -t lustre" of 16TB OST device Details: Mounting 16TB LUNs failed due to three bugs in mkfs.lustre.
 * Bugzilla: 20301

Severity: normal Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0) failed Details: unregistering should be zero if no RPC inflight.
 * Bugzilla: 20456

Severity: normal Description: hyperion: Oops during metabench Details: Correct the refcount of lov_request_set
 * Bugzilla: 20607

Severity: enhancement Description: Add mptlinux and nxge drivers to Lustre builds
 * Bugzilla: 20617

Severity: enhancement Description: Fix watchdog timer message to be more clear Details: Make watchdog timer messages more clear and descriptive.
 * Bugzilla: 20722

Severity: normal Description: LNET soft lockups in socknal_cd thread Details: don't hog the CPU for active connecting if another connd is accepting a connection request from the same peer
 * Bugzilla: 21396

Severity: normal Description: recovery-small test_17 hang Details: Land several AT improvements & fixes.
 * Bugzilla: 21411

Severity: normal Description: MDS panic and hanging client processes Details: Replace exp_ops_stats with exp_nid_stats->nid_stats
 * Bugzilla: 21420

Severity: normal Description: OSS stuck in recovery. Details: fix race during recovery. class_unlink_export, class_set_export_delayed and target_queue_last_replay_reply may race while increasing/decreasing obd_recoverable_clients and obd_delayed_clients, causing recovery to wait forever.
 * Bugzilla: 21471

Severity: enhancement Description: add cascading_rw.c to lustre/tests
 * Bugzilla: 21547

Severity: normal Description: filter_last_id NULL deref Details: lprocfs_filter_rd_last_id should check for the fully setup obd device, before proceeding further.
 * Bugzilla: 21565

Severity: enhancement Description: Loadgen improvements Details: stacksize and locking fixes for loadgen
 * Bugzilla: 21571

Severity: normal Description: Quiet CERROR("dirty %d > system dirty_max %d\n" Details: The atomic_read allowing the atomic_inc are not covered by a lock. Thus they may safely race and trip this CERROR unless we add in a small fudge factor (+1).
 * Bugzilla: 21656

Severity: enhancement Description: shrink_slab: nr=-9223362083340912175 Details: fix spurious message from shrink_slab reporting negative nr
 * Bugzilla: 21800

Severity: normal Description: Quiet bogus previously committed transno error Details: suppress the "server went back in time" error message which is always printed even in the common case after a client eviction
 * Bugzilla: 21681

Severity: enhancement Description: Parallel statfs calls result in client eviction Details: cache statfs data for 1s.
 * Bugzilla: 20065
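
Caching statfs data for one second can be pictured with the sketch below. It is illustrative Python only: the class `CachedStatfs` and the `fetch` callback are hypothetical stand-ins for whatever issues the real statfs RPC, and the one-second default simply mirrors the value given in this entry.

```python
import time


class CachedStatfs:
    """Hypothetical cache: reuse a statfs result for up to max_age seconds."""

    def __init__(self, fetch, max_age=1.0, clock=time.monotonic):
        self.fetch = fetch        # callable that performs the expensive query
        self.max_age = max_age
        self.clock = clock
        self._value = None
        self._stamp = None        # time of the last real fetch

    def get(self):
        now = self.clock()
        # Refetch only when nothing is cached or the cached copy is too old.
        if self._stamp is None or now - self._stamp > self.max_age:
            self._value = self.fetch()
            self._stamp = now
        return self._value
```

Many callers asking within the same second then share one result instead of each triggering a request.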

Severity: normal Description: parallel-scale test_compilebench: @@@@@@ FAIL: compilebench failed: 1 Details: fix several issues in pinger code causing clients not to ping servers for too long, resulting in evictions.
 * Bugzilla: 21574

Severity: normal Description: e2fsck should warn when MMP update interval is extended Details: print mmp_check_interval and make it possible to abort mount operation in case it takes too long.
 * Bugzilla: 21564

Severity: normal Description: mdsrate-create-large.sh, BUG: soft lockup - CPU#0 stuck for 10s! Details: fix bug in the RHEL5's jbd2 callback patch.
 * Bugzilla: 21595

Severity: normal Description: drop number of active requests when queued for recovery Details: Now that we take a reference on the original request instead of making a copy of it for recovery, we need to drop the number of active requests, or the queued requests will prevent all request processing when they exceed (srv->srv_threads_running - 1).
 * Bugzilla: 21828

Severity: enhancement Description: refuse to invalidate operational quota files when they are in use Details: an attempt to invalidate operational quota files on the quota master is not actually permitted by VFS (returning -EPERM), but we should not depend on that and should return the error earlier.
 * Bugzilla: 21826

Severity: normal Description: Applications stuck in jbd2_log_wait_commit during exit Details: fix deadlock between kjournald2 trying to acquire the page lock owned by an ost_io thread waiting for journal commit.
 * Bugzilla: 21406

=Changes from v1.8.1 to v1.8.1.1= Support for networks:
 * socklnd - any kernel supported by Lustre™
 * qswlnd - Qsnet kernel modules 5.20 and later
 * openiblnd - IbGold 1.8.2
 * o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3 and 1.4.1
 * viblnd - Voltaire ibhost 3.4.5 and later
 * ciblnd - Topspin 3.2.0
 * iiblnd - Infiniserv 3.3 + PathBits patch
 * gmlnd - GM 2.1.22 and later
 * mxlnd - MX 1.2.1 or later
 * ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x

Support for kernels:
 * 2.6.16.60-0.42.4 (SLES 10)
 * 2.6.27.29-0.1 (SLES11, i686 & x86_64 only)
 * 2.6.18-128.7.1.el5 (RHEL 5)

Client support for unpatched kernels: (see Patchless Client)
 * 2.6.16 - 2.6.27 vanilla (kernel.org)

Recommended e2fsprogs version: 1.41.6.sun1

File join has been disabled in this release, refer to bugzilla 16929

NFS export is disabled when stack size < 8192, since the NFSv4 export of a Lustre file system with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630

ext4 support for RHEL5 is experimental and thus should not be used in production.

Severity: enhancement Description: Add OEL5 support.
 * Bugzilla: 20539

Severity: enhancement Description: Update kernel to SLES11 2.6.27.29-0.1.
 * Bugzilla: 19848

Severity: major Description: File checksum failures with OST read cache on Details: Disable page poisoning when the bulk transfer has to be aborted because the client got evicted.
 * Bugzilla: 20560

Severity: normal Description: Don't allow a backward step when assigning the osc next id. Details: a race between allocation of the next id and the ll_sync thread can set a wrong osc next id and destroy valid OST objects.
 * Bugzilla: 19557

Severity: enhancement Description: Update kernel to RHEL5 2.6.18-128.7.1.el5.
 * Bugzilla: 20400

Severity: enhancement Description: Update kernel to SLES10 SP2 2.6.16.60-0.42.4.
 * Bugzilla: 20758

Severity: normal Description: Changes in raid5-large-io-rhel5.patch to calculate sectors properly
 * Bugzilla: 20533

Severity: normal Description: Increase the default BLK_DEF_MAX_SECTORS value for RHEL5 and SLES11
 * Bugzilla: 20533

Severity: normal Description: Do not send statfs requests to OSTs disabled by administrator. Details: Check in lov_prep_statfs_set for non-NULL ltd_exp.
 * Bugzilla: 20482

Severity: normal Description: Error handling in osc_statfs_interpret has been improved. Details: Check in osc_statfs_interpret for EBADR.
 * Bugzilla: 20482

Severity: normal Description: Do not update ctime for the deleted inode. Details: Check in mds_reint_unlink before calling fsfilt_setattr.
 * Bugzilla: 20146

Severity: normal Description: Increase of the size of the LDLM resource hash. Details: Bump up RES_HASH_BITS=12.
 * Bugzilla: 20146

Severity: normal Description: correctly send lsm on open replay Details: the MDS trusts the LSM size on open replay, but the client can set a wrong lsm buffer size.
 * Bugzilla: 19934

Severity: normal Description: Deadlock between filter_destroy and filter_commitrw_write. Details: filter_destroy does not hold the DLM lock over the whole operation. If the DLM lock is dropped, filter_commitrw can go through, causing the deadlock between page lock and i_mutex. The i_alloc_sem should also be held in filter_destroy while truncating the file.
 * Bugzilla: 20321

Severity: normal Description: truncate starts GFP_FS allocation under transaction causing deadlock Details: ldiskfs_truncate calls grab_cache_page which may start page allocation under an open transaction. This may lead to calling prune_icache with consequent lustre reentrance.
 * Bugzilla: 20008

Severity: normal Frequency: only when down/upgrading the MDS to 1.6/1.8 while 1.8 clients are still up and when the OST pool feature is used Description: interop testing got LBUG when run dd with OST pool :LustreError: 30032:0:(llite_lib.c:1913:ll_replace_lsm) LBUG Details: down/upgrading the MDS to a version that doesn't/does support OST pool can cause clients to crash because the lsm has changed behind their back.
 * Bugzilla: 20318

Severity: normal Description: missing tree_status on 1.8.1 RPM build Details: make rpms failed because the tree_status file is missing.
 * Bugzilla: 20550

Severity: normal Description: continuing LustreError "mds adjust qunit failed!" Details: don't print message on the console when ->adjust_qunit fails.
 * Bugzilla: 19551

Severity: normal Description: don't increase ldlm timeout if previous client was evicted Details: if a client doesn't respond to a blocking callback within the adaptive ldlm enqueue timeout, don't adjust the adaptive estimate when the lock is next granted.
 * Bugzilla: 18618

Severity: normal Description: ost is being unmounted w/o all writes to last_rcvd landing on disk. affects recovery negatively. Details: make sure all exports have been properly destroyed by the zombie thread processed before stopping the target.
 * Bugzilla: 20518

Severity: normal Description: Performance degradation with O_DIRECT between 1.6 & 1.8.1 b190 Details: disable write barrier for ext4/SLES11.
 * Bugzilla: 20205

Severity: normal Description: Kernel panic - not syncing: Out of memory and no killable processes... on OSS when iozone Details: fix memory leak in the journal checksum patch.
 * Bugzilla: 18571

Severity: normal Description: group quota "too many blocks" OSS crashes Details: we should keep the same uid/gid for lquota_chkquota and lquota_pending_commit
 * Bugzilla: 18793

Severity: normal Description: LustreError: 9153:0:(quota_context.c:622:dqacq_completion) LBUG Details: don't LBUG on release quota error. Just a workaround until the problem is understood.
 * Bugzilla: 18630

=Changes from v1.8.0.1 to v1.8.1= Support for networks:
 * socklnd - any kernel supported by Lustre
 * qswlnd - Qsnet kernel modules 5.20 and later
 * openiblnd - IbGold 1.8.2
 * o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3 and 1.4.1
 * viblnd - Voltaire ibhost 3.4.5 and later
 * ciblnd - Topspin 3.2.0
 * iiblnd - Infiniserv 3.3 + PathBits patch
 * gmlnd - GM 2.1.22 and later
 * mxlnd - MX 1.2.1 or later
 * ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x

Support for kernels:
 * 2.6.16.60-0.39.3 (SLES 10)
 * 2.6.27.23-0.1 (SLES11, i686 & x86_64 only)
 * 2.6.18-128.1.14.el5 (RHEL 5)

Client support for unpatched kernels: (see Patchless Client)
 * 2.6.16 - 2.6.27 vanilla (kernel.org)

Recommended e2fsprogs version: 1.41.6.sun1

File join has been disabled in this release, refer to bugzilla 16929

NFS export is disabled when stack size < 8192, since the NFSv4 export of a Lustre filesystem with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630

ext4 support for RHEL5 is experimental and thus should not be used in production.

Severity: normal Description: router_proc.c is rewritten to use sysctl-interface for parameters residing in /proc/sys/lnet
 * Bugzilla: 18102

Severity: normal Description: LNet selftest fixes and enhancements
 * Bugzilla: 18075

Severity: enhancement Description: MXLND: eliminate hosts file, use arp for peer nic_id resolution Details: an update from the upstream developer Scott Atchley.
 * Bugzilla: 18654

Severity: enhancement Description: add a new LND option to control peer buffer credits on routers
 * Bugzilla: 15332

Severity: normal Description: Fixing deadlock in usocklnd Details: A deadlock was possible in usocklnd due to race condition while tearing connection down. The problem resulted from erroneous assumption that lnet_finalize could have been called holding some lnd-level locks.
 * Bugzilla: 18844

Severity: major Description: Protocol V2 of o2iblnd Details: o2iblnd V2 has several new features:
 * Bugzilla: 13621
 * Bugzilla: 15983
 * map-on-demand: disabled by default; it can be enabled with the modparam "map_on_demand=@value@", where @value@ must be >= 0 and < 256. 0 disables map-on-demand; any other valid value enables it.
 * o2iblnd will create an FMR or physical MR for RDMA if fragments of RD > @value@.
 * Enabling map-on-demand takes less memory for each new connection, but a little more CPU for RDMA.
 * iWARP: to support iWARP, please enable map-on-demand; 32 and 64 are recommended values. iWARP will probably fail for values >= 128.
 * OOB NOOP message: to resolve deadlock on router.
 * tunable peer_credits_hiw (high water mark to return credits): the default value of peer_credits_hiw is (peer_credits - 1); the user can change it between peer_credits/2 and (peer_credits - 1). A lower value is recommended for high-latency networks.
 * tunable message queue size: it always equals peer_credits; a higher value is recommended for high-latency networks.
 * It is compatible with earlier versions of o2iblnd.
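
The ranges listed above can be summarised as a small validation sketch. This is illustrative Python, not the o2iblnd module code: the function name is hypothetical, peer_credits/2 is assumed to mean integer division, and rejecting an out-of-range value (rather than clamping it) is a choice made for the example, not a statement about how the module behaves.

```python
def validate_o2iblnd_tunables(peer_credits, peer_credits_hiw=None, map_on_demand=0):
    """Check the tunables against the ranges given in the release note."""
    # map_on_demand: 0 disables the feature, 1..255 enables it.
    if not 0 <= map_on_demand < 256:
        raise ValueError("map_on_demand must be >= 0 and < 256")

    # peer_credits_hiw defaults to peer_credits - 1.
    if peer_credits_hiw is None:
        peer_credits_hiw = peer_credits - 1

    # Allowed range: peer_credits/2 .. peer_credits - 1 (integer division assumed).
    low, high = peer_credits // 2, peer_credits - 1
    if not low <= peer_credits_hiw <= high:
        raise ValueError("peer_credits_hiw must be between %d and %d" % (low, high))

    return {
        "map_on_demand": map_on_demand,
        "peer_credits_hiw": peer_credits_hiw,
        "queue_size": peer_credits,  # the message queue size always equals peer_credits
    }
```

For instance, with peer_credits set to 8 and nothing else given, the sketch yields a high-water mark of 7 and a queue size of 8.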

Severity: normal Description: Fixing 'running out of ports' issue Details: Add a delay before next reconnect attempt in ksocklnd in the case of lost race. Limit the frequency of query-requests in lnet. Improved handling of 'dead peer' notifications in lnet.
 * Bugzilla: 18414

Severity: normal Description: Change ptllnd timeout and watchdog timers Details: Add ptltrace_on_nal_failed and bump ptllnd timeout to match Portals wire timeout.
 * Bugzilla: 16034

Severity: normal Description: One down Lustre FS hangs ALL mounted Lustre filesystems Details: Shared routing enhancements - peer health detection.
 * Bugzilla: 16186

Severity: minor Description: IB path MTU mistakenly set to 1st path MTU when ib_mtu is off Details: See comment 46 in bug 11245 for details - it's indeed a bug introduced by the original 11245 fix.
 * Bugzilla: 11245

Severity: minor Description: uptllnd credit overflow fix Details: kptl_msg_t::ptlm_credits could be overflown by uptllnd since it is only a __u8.
 * Bugzilla: 15984

Severity: major Description: socklnd protocol version 3 Details: With the current protocol V2, connections on a router can be blocked and unable to receive any incoming messages when there are no more router buffers, so ZC-ACK can't be handled (the LNet message can't be finalized), which causes a deadlock on the router. Protocol V3 has a dedicated connection for emergency messages like ZC-ACK to the router; messages on this dedicated connection don't need any credit, so they will never be blocked. Also, V3 can send a keepalive ping at a specified interval for router health checking.
 * Bugzilla: 14634

Severity: minor Frequency: in recovery Description: don't mix llog inodes with normal. Details: allocate inodes for log in last inode group
 * Bugzilla: 18192

Severity: normal Description: Deadlock between filter_destroy and filter_commitrw_write. Details: filter_destroy does not hold the DLM lock over the whole operation. If the DLM lock is dropped, filter_commitrw can go through, causing the deadlock between page lock and i_mutex.
 * Bugzilla: 20321

Severity: enhancement Description: Update
 * Bugzilla: 19847

Severity: normal Frequency: with 1.8 server and 1.6 clients Description: correctly shrink the reply to avoid sending too big a message to the client. Details: the 1.8 MDS allocates too big a buffer for LOV EA data, which causes problems when sending this reply to a 1.6 client.
 * Bugzilla: 20020

Severity: normal Description: Repeated atomic allocation failures. Details: Use GFP_HIGHUSER | __GFP_NOMEMALLOC flags for memory allocations to generate memory pressure and allow reclaiming of inactive pages. At the same time, do not allow emergency pools to be exhausted. For local clients the use of GFP_NOFS will be introduced in 1.8.2
 * Bugzilla: 19917

Severity: enhancement Description: Update kernel to RHEL5 2.6.18-128.1.14.el5.
 * Bugzilla: 19846
 * Bugzilla: 18289

Severity: enhancement Description: Add support for SLES11 2.6.27.23-0.1.
 * Bugzilla: 19625
 * Bugzilla: 16893
 * Bugzilla: 18668
 * Bugzilla: 19848

Severity: enhancement Description: Update client support to vanilla kernels up to 2.6.27.
 * Bugzilla: 14250

Severity: enhancement Description: Update kernel to SLES10 SP2 2.6.16.60-0.37.
 * Bugzilla: 19212

Severity: enhancement Description: Compile with -Werror by default for i686 and x86_64.
 * Bugzilla: 15981

Severity: normal Description: resolve race between obd_disconnect and class_disconnect_exports Details: if obd_disconnect is called on an already disconnected export, it forgets to release one reference and the osc module can't be unloaded.
 * Bugzilla: 19528

Severity: enhancement Description: move AT tunable parameters for more consistent usage Details: add AT tunables under /proc/sys/lustre, add to conf_param parsing
 * Bugzilla: 19293

Severity: normal Description: correctly skip time estimate if in recovery Details: rq_send_state isn't a bitmask, so using bitwise ops on it is forbidden.
 * Bugzilla: 19223

Severity: normal Description: OSS DeadLock Details: Use trylock to prevent deadlock when shrinking the icache.
 * Bugzilla: 18399

Severity: enhancement Description: Allow tuning service thread via /proc Details: For each service a new /proc/fs/lustre/{service}/*/thread_{min,max,started} entry is created that can be used to set min/max thread counts, and get the current number of running threads.
 * Bugzilla: 18688

Severity: enhancement Description: Add state history info file, enhance import info file Details: Track import connection state changes in a new osc/mdc proc file; add overview-type data to the osc/mdc import proc file.
 * Bugzilla: 18798

Severity: normal Description: Reduce small size read RPC Details: Set a read-ahead limit for every file and only do read-ahead when the available read-ahead pages are bigger than 1M to avoid small size read RPCs.
 * Bugzilla: 18645

Severity: normal Description: free_entry erroneously used groups_free instead of put_group_info
 * Bugzilla: 18204

Severity: enhancement Description: Make read-ahead stripe size aligned.
 * Bugzilla: 17817

Severity: enhancement Description: MDS create should not wait for statfs RPC while holding DLM lock.
 * Bugzilla: 17536

Severity: normal Frequency: rare, connect and disconnect target at same time Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0 Details: don't call obd_disconnect under lov_lock. This is a long-running operation and can block ptlrpcd, which answers the connect request.
 * Bugzilla: 17310

Severity: normal Frequency: start MDS on an uncleanly shut down MDS device Description: ll_sync thread stays waiting for mds<>ost recovery to finish Details: waiting for mds<>ost recovery to finish produces random bugs due to a race between two ll_sync threads for one lov target. Send the ACTIVATE event only if the connect has really finished and the import is in the FULL state.
 * Bugzilla: 16839

Severity: normal Frequency: start MDS on an uncleanly shut down MDS device Description: aborting recovery hang on MDS Details: don't throttle destroy RPCs for the MDT.
 * Bugzilla: 18049

Severity: low Description: Slow reads beyond 8TB offsets. Details: Page index integer overflow in ll_read_ahead_page
 * Bugzilla: 18016
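
A short worked calculation shows why 8TB is the threshold. The sketch assumes 4 KiB pages and a signed 32-bit page index; the entry itself only says "page index integer overflow", so both the page size and the variable width are assumptions of this example, and the helper names are hypothetical.

```python
PAGE_SHIFT = 12            # assumed 4 KiB pages (2**12 bytes)
INT32_MAX = 2**31 - 1      # largest value a signed 32-bit index can hold


def page_index(offset):
    """Page number containing the given byte offset."""
    return offset >> PAGE_SHIFT


def as_int32(value):
    """Reinterpret value as a signed 32-bit integer, wrapping on overflow."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


EIGHT_TIB = 8 * 2**40      # 2**43 bytes

# The last byte below 8 TiB still fits; the first byte at 8 TiB wraps negative.
last_ok = as_int32(page_index(EIGHT_TIB - 1))
first_bad = as_int32(page_index(EIGHT_TIB))
```

Under these assumptions 2**43 bytes divided by 2**12 bytes per page gives page 2**31, one past the largest signed 32-bit value, so `first_bad` comes out negative while `last_ok` equals `INT32_MAX`.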

Severity: normal Description: MSG_CONNECT_INITIAL is not set on the initial MDS->OST connect. Details: MSG_CONNECT_INITIAL is not set on the initial MDS->OST connect. As a consequence, the patch from bug 18224 is not operational and the MDS export cannot be reused on the OSTs until it gets evicted.
 * Bugzilla: 18304

Severity: major Frequency: rare, only if using MMP with Linux RAID Description: MMP doesn't work with Linux RAID Details: While using HA for Lustre servers with Linux RAID, it is possible that MMP will not detect multiple mounts. To make this work we need to unplug the device queue in RAID when the MMP block is being written. Also while reading the MMP block, we should read it from disk and not the cached one.
 * Bugzilla: 17895

Severity: minor Frequency: rare, during recovery Description: Assertion failure in ldlm_lock_put Details: Do not put cancelled locks into replay list, hold references on locks in replay list
 * Bugzilla: 17895

Severity: normal Description: 1.6.5 mdsrate performance is slower than 1.4.11/12 (MDS is not cpu bound!) Details: create_count always drops to the min value (=32) because grow_count is being changed before the precreate RPC completes.
 * Bugzilla: 18577

Severity: normal Frequency: Only in RHEL5 when mounting multiple ext3 filesystems simultaneously Description: "kmem_cache_create: duplicate cache jbd_4k" error message Details: add proper locking for creation of jbd_4k slab cache
 * Bugzilla: 19184

Severity: normal Description: MMP check in ext3_remount fails without displaying any error Details: When multiple mount protection fails during remount, proper error should be returned
 * Bugzilla: 19058

Severity: Low Description: Rare Client crash on resend if the file was deleted. Details: When file is opened, but open reply is lost and file is subsequently deleted before resend, resend processing logic breaks trying to open the file again, should not try to open.
 * Bugzilla: 15010

Severity: high Description: add check for >8TB ldiskfs filesystems Details: ext3-based ldiskfs does not support greater than 8TB LUNs. Don't allow >8TB ldiskfs filesystems to be mounted without force_over_8tb mount option
 * Bugzilla: 17569
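
The mount-time check described above can be sketched as follows. This is only an illustration, not the ldiskfs code: the function name, the use of a Python exception, and the reading of "8TB" as 8 * 2**40 bytes are assumptions; only the `force_over_8tb` option name comes from the entry.

```python
# Illustrative sketch only; names other than force_over_8tb are hypothetical.
EIGHT_TB = 8 * 2**40  # assumption: "8TB" read as 8 * 2**40 bytes


def check_mount(size_bytes, options):
    """Refuse to mount filesystems larger than 8TB unless forced."""
    if size_bytes > EIGHT_TB and "force_over_8tb" not in options:
        raise ValueError("filesystem larger than 8TB; use force_over_8tb to override")
    return True
```

With this sketch, a 9TB device is rejected by default and accepted only when `force_over_8tb` is among the mount options.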

Severity: normal Description: Client locked up when running multiple instances of an app. on multiple mount points Details: ll_shrink_cache can sleep while holding the ll_sb_lock. Convert ll_sb_lock to a read/write semaphore to fix the problem.
 * Bugzilla: 20011

Severity: normal Description: Cannot access an NFS-mounted Lustre filesystem Details: An NFS client cannot access a Lustre filesystem that is exported via NFS from a Lustre client.
 * Bugzilla: 19559

Severity: normal Description: panic in ll_statahead_thread Details: grab dentry reference in parent process.
 * Bugzilla: 20139

=Changes from v1.8.0 to v1.8.0.1= Support for networks:
 * socklnd - any kernel supported by Lustre
 * qswlnd - Qsnet kernel modules 5.20 and later
 * openiblnd - IbGold 1.8.2
 * o2iblnd - OFED 1.1, 1.2.0, 1.2.5, 1.3 and 1.4.1
 * viblnd - Voltaire ibhost 3.4.5 and later
 * ciblnd - Topspin 3.2.0
 * iiblnd - Infiniserv 3.3 + PathBits patch
 * gmlnd - GM 2.1.22 and later
 * mxlnd - MX 1.2.1 or later
 * ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x

Support for kernels:
 * 2.6.16.60-0.37 (SLES 10)
 * 2.6.18-128.1.6.el5 (RHEL 5)
 * 2.6.22.14 vanilla (kernel.org)

Client support for unpatched kernels: (see Patchless Client)
 * 2.6.16 - 2.6.22 vanilla (kernel.org)

Recommended e2fsprogs version: 1.40.11-sun1

File join has been disabled in this release, refer to bugzilla 16929

A new Lustre ADIO driver is available for MPICH2-1.0.7.

NFS export is disabled when the stack size is < 8192, since the NFSv4 export of a Lustre filesystem with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630

Severity: major Description: Handle new CM events in OFED 1.4
 * Bugzilla: 19520

Severity: enhancement Description: Update OFED release to 1.4.1 RC4
 * Bugzilla: 17671

Severity: enhancement Description: Update kernel to SLES10 SP2 2.6.16.60-0.37.
 * Bugzilla: 19212

Severity: enhancement Description: Update to RHEL5.3 kernel-2.6.18-128.1.6.el5.
 * Bugzilla: 19024

Severity: enhancement Description: Add support for OFED 1.4.1.
 * Bugzilla: 17671

Severity: enhancement Description: build ofed 1.4.1 with mlx4_en (Mellanox ConnectX drivers in 10GbE mode) enabled
 * Bugzilla: 19731

Severity: major (SLES10/OFED 1.4.1 only) Description: BUG: soft lockup - CPU#7 stuck for 10s! [ll_imp_inval:18451] Details: ll_imp_inval can sleep waiting for a semaphore while holding a spinlock. Convert lco_lock to a semaphore to address the problem.
 * Bugzilla: 19553

Severity: major, only with big OST Description: Very poor metadata performance on Infiniband lustre configuration Details: OST object precreation becomes very slow on big OSTs. This is due to the ialloc patch spending too much time scanning groups.
 * Bugzilla: 18518

Severity: normal Frequency: during recovery Description: don't mix llog inodes with normal. Details: allocate inodes for log in last inode group
 * Bugzilla: 18192

Severity: major Frequency: rare Description: fix lqs' reference which won't be put in some situations Details: This patch fixes: 1. quota_check_common checks quota for both user and group, but only sends one return via "pending". In most cases the pendings are the same, but that is not always the case. 2. If quotaoff runs between lquota_chkquota and lquota_pending_commit, the same thing happens. That is why this check is removed: if (!ll_sb_any_quota_active(qctxt->lqc_sb)) RETURN(0);
 * Bugzilla: 19495

Severity: enhancement Description: improve lctl set/get_param Details: handle bad options, support more than one argument, add '-F' option to append the indicator to the parameters.
 * Bugzilla: 18775

=Changes from v1.6.7.1 to v1.8.0= Support for networks:
 * socklnd - any kernel supported by Lustre
 * qswlnd - Qsnet kernel modules 5.20 and later
 * openiblnd - IbGold 1.8.2
 * o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3.1
 * viblnd - Voltaire ibhost 3.4.5 and later
 * ciblnd - Topspin 3.2.0
 * iiblnd - Infiniserv 3.3 + PathBits patch
 * gmlnd - GM 2.1.22 and later
 * mxlnd - MX 1.2.1 or later
 * ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x

Support for kernels:
 * 2.6.16.60-0.31 (SLES 10)
 * 2.6.18-92.1.17.el5 (RHEL 5)
 * 2.6.22.14 vanilla (kernel.org)

Client support for unpatched kernels: (see Patchless Client)
 * 2.6.16 - 2.6.22 vanilla (kernel.org)

Recommended e2fsprogs version: 1.40.11-sun1

File join has been disabled in this release, refer to bugzilla 16929

A new Lustre ADIO driver is available for MPICH2-1.0.7.

NFS export is disabled when the stack size is < 8192, since the NFSv4 export of a Lustre filesystem with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630

Severity: minor Description: minor fixes and cleanups Details: use EXT_UNSET_BLOCK to avoid confusion with EXT_MAX_BLOCK. Initialize 'ix' variable in extents patch to stop compiler warning.
 * Bugzilla: 16114

Severity: feature Description: update FIEMAP ioctl to match upstream kernel version Details: the FIEMAP block-mapping ioctl had a prototype version in ldiskfs 3.0.7 but this release updates it to match the interface in the upstream kernel, with a new ioctl number.
 * Bugzilla: 17942

Severity: normal Frequency: only if MMP is active and detects filesystem is in use Description: if MMP startup fails, an oops is triggered Details: if ldiskfs mounting doesn't succeed the error handling doesn't clean up the MMP data correctly, causing an oops.
 * Bugzilla: 18173

Severity: enhancement Description: Caching OSS Details: introduce data caching on the OSS. The OSS now relies on the linux kernel page cache to keep recently accessed data in memory. It is worth noting that all write requests are still flushed synchronously as in lustre 1.6.
 * Bugzilla: 12182

Severity: enhancement Description: version based recovery Details: introduce finer grained recovery that can detect transaction dependencies and deal with transaction gaps caused by clients failing at the same time as the server.
 * Bugzilla: 10609

Severity: enhancement Description: Enable adaptive timeouts by default Details: The Lustre timeout value in /proc/sys/lustre/timeout is now managed dynamically based on server load and should not need to be tuned manually based on cluster size. This allows Lustre to work under a wider variety of system sizes and loads, without unnecessarily causing lengthy recovery times.
 * Bugzilla: 3055

Severity: enhancement Description: Add OST Pools support Details: File striping can now be set to use an arbitrary pool of OSTs
 * Bugzilla: 15899

Severity: enhancement Description: add lazystatfs mount option to allow statfs(2) to skip down OSTs Details: allow skipping disconnected OSTs when sending statfs requests, and hide the error in this case.
 * Bugzilla: 17974

Severity: normal Frequency: rare, on llog test 6 Description: don't allow connect to already connected import Details: allowing a connect to an already connected import hides connection problems.
 * Bugzilla: 16839

Severity: normal Frequency: rare, connect and disconnect target at same time Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0) Details: don't call obd_disconnect under lov_lock. It is a long-running operation and can block ptlrpcd, which answers connect requests.
 * Bugzilla: 17310

Severity: normal Frequency: rare, on failed llog setup Description: don't leak obd reference on failed llog setup Details: on a failed llog setup, mgc forgot to call class_destroy_import for the client import. Move the import destruction to a more generic place.
 * Bugzilla: 18896

Severity: normal Frequency: rare Description: allow killing a process that waits for a statahead result Details: for some reasons 'ls' can get stuck waiting for a result from statahead; in this case there needs to be a way to kill the process.
 * Bugzilla: 18902

Severity: normal Frequency: rare Description: don't lose wakeup for imp_recovery_waitq Details: recover_import_no_retry or invalidate_import and import_close can both sleep on imp_recovery_waitq, but only one wakeup was sent to the sleep queue.
 * Bugzilla: 18154

Severity: normal Frequency: rare, at shutdown Description: panic at umount Details: llap_shrinker can race with removing the super block from the list, which produces a panic on access to an already freed pointer
 * Bugzilla: 18773

Severity: normal Frequency: rare Description: panic in mds_open Details: don't confuse mds_finish_transno with PTR_ERR(-ENOENT)
 * Bugzilla: 18238

Severity: normal Frequency: rare Description: stuck in cache_remove_extent or panic when accessing an already freed lock. Details: release the lock reference only after adding the page to the pages list.
 * Bugzilla: 17972

Severity: normal Frequency: start MDS on uncleanly shut down MDS device Description: ll_sync thread stays waiting for mds<>ost recovery to finish Details: waiting for mds<>ost recovery to finish produces random bugs due to a race between two ll_sync threads for one lov target. Send the ACTIVATE event only if the connect has really finished and the import is in FULL state.
 * Bugzilla: 16839

Severity: normal Frequency: always with long access acl Description: mds can't pack reply with long acl. Details: the mds doesn't control the size of ACLs, but they are limited by the reint/getattr reply buffer.
 * Bugzilla: 17636

Severity: normal Frequency: start MDS on uncleanly shut down MDS device Description: aborting recovery hangs on MDS Details: don't throttle destroy RPCs for the MDT.
 * Bugzilla: 18049

Severity: major Frequency: on remount Description: external journal device not working after the remount Details: clear dev_rdonly flag for external journal devices in blkdev_put
 * Bugzilla: 18018

Severity: minor Frequency: rare Description: shutdown vs evict race Details: client_disconnect_export vs connect request race. If the client is evicted at this time, the invalidate thread is started without a reference to the import, and the import can be freed at the same time.
 * Bugzilla: 17802

Severity: minor Frequency: always Description: shrink LOV EAs before replying Details: correctly adjust LOV EA buffer for reply.
 * Bugzilla: 16693

Severity: normal Frequency: rare Description: don't skip ost target if they assigned to file Details: Drop slow OSCs if we can, but not for requested start idx. This means "if OSC is slow and it is not the requested start OST, then it can be skipped, otherwise skip it only if it is inactive/recovering/out-of-space".
 * Bugzilla: 16081
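
The skip rule quoted in this entry can be written as a small predicate. This is a sketch of the stated rule only, not the lov allocator code; the `Osc` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Osc:
    # Hypothetical stand-in for an OSC target; field names are illustrative.
    index: int
    slow: bool = False
    inactive: bool = False
    recovering: bool = False
    out_of_space: bool = False


def can_skip(osc, requested_start_idx):
    """Apply the rule from the entry above to one OSC."""
    # An unusable target may always be skipped, even the requested start OST.
    if osc.inactive or osc.recovering or osc.out_of_space:
        return True
    # A merely slow target may be skipped only if it is not the requested start.
    return osc.slow and osc.index != requested_start_idx
```

The point of the rule is that an explicitly requested start index is honored even when that OSC is slow, and is passed over only when the target cannot be used at all.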

Severity: enhancement Description: Update to RHEL5 kernel-2.6.18-92.1.17.el5.
 * Bugzilla: 17201

Severity: enhancement Description: Update to SLES10 SP2 kernel-2.6.16.60-0.31.
 * Bugzilla: 17458

Severity: normal Frequency: rare, need acl's on inode. Description: client can't handle OST addition correctly Details: if an OST was added after the client connected to the mds, the client can hit lnet_try_match_md ... too big messages for wide striped files. In this case the client needs to handle config events about adding a lov target and update the client max ea size at that event.
 * Bugzilla: 16492

Severity: normal Frequency: Create a symlink file with a very long name Description: ldlm_cancel_pack) ASSERTION(max >= dlm->lock_count + count) Details: If there is no extra space in the request for early cancels, ldlm_req_handles_avail returns 0 instead of a negative value.
 * Bugzilla: 16578

Severity: major Frequency: rare Description: mds is deadlocked Details: in rare cases, an inode in the catalog can have an i_no less than its parent's i_no. This produces the wrong locking order during open, and a parallel unlink can block the open. mds_open needs to grab locks in resource id order, not in parent -> child order.
 * Bugzilla: 16492
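
The ordering idea behind this fix, taking both locks in a single global order by resource id rather than parent first, can be illustrated with ordinary mutexes. This is a generic sketch, not mds_open; the `Resource` class and `lock_pair` helper are hypothetical, and the two resources are assumed to have distinct ids.

```python
import threading
from contextlib import contextmanager


class Resource:
    # Hypothetical resource with a numeric id and its own lock.
    def __init__(self, res_id):
        self.res_id = res_id
        self.lock = threading.Lock()


@contextmanager
def lock_pair(parent, child):
    """Lock two distinct resources in ascending id order, whatever their roles."""
    first, second = sorted((parent, child), key=lambda r: r.res_id)
    with first.lock:
        with second.lock:
            # Yield the acquisition order so callers can inspect it.
            yield [first.res_id, second.res_id]
```

Because every thread sorts by id before locking, two threads that name the same pair in opposite parent/child roles still acquire the locks in the same order and cannot deadlock on each other.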

Severity: enhancement Description: Add /proc entry for import status Details: The mdc, osc, and mgc import directories now have an import directory that contains useful import data for debugging connection problems.
 * Bugzilla: 1819

Severity: enhancement Description: Re-disable certain /proc logging Details: Enable and disable client's offset_stats, extents_stats and extents_stats_per_process stats logging on the fly.
 * Bugzilla: 15966

Severity: major Frequency: Only on FC kernels 2.6.22+ Description: oops in statahead Details: Do not drop reference count for the dentry from VFS when lookup, VFS will do that by itself.
 * Bugzilla: 16303

Severity: enhancement Description: Generic /proc file permissions Details: Set /proc file permissions in a more generic way to enable non-root users to operate on some /proc files.
 * Bugzilla: 16643

Severity: major Description: Hitting mdc_commit_close ASSERTION Details: Properly handle request reference release in ll_release_openhandle.
 * Bugzilla: 16561

Severity: normal Description: only patchless client Details: add workaround for race between add/remove dentry from hash
 * Bugzilla: 15975

Severity: enhancement Description: Allow OST glimpses to return PW locks
 * Bugzilla: 16845

Severity: minor Description: LBUG when llog conf file is full Details: When llog bitmap is full, ENOSPC should be returned for plain log.
 * Bugzilla: 16717

Severity: normal Description: Prevent import from entering FULL state when server in recovery
 * Bugzilla: 16907

Severity: major Description: service mount cannot take device name with ":" Details: Only when device name contains ":/" will mount treat it as client mount.
 * Bugzilla: 16750
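
A minimal sketch of the distinction described in this entry, assuming the decision reduces to a substring test. The function name and the example device strings are hypothetical, and the real mount helper may apply further checks.

```python
def is_client_mount(device):
    """Treat the device as a client mount only if it contains ':/'."""
    # A bare ':' (for example inside a block device name) does not qualify.
    return ":/" in device
```

Under this rule a hypothetical server device such as `/dev/mapper/vg:lv` is treated as a service mount, while `mgsnode@tcp0:/lustre` is treated as a client mount.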

Severity: normal Frequency: rare Description: replace ptlrpcd with the statahead thread to interpret the async statahead RPC callback
 * Bugzilla: 15927

Severity: normal Frequency: on recovery Description: I/O failures after umount during fail back Details: if a client reconnects to a restarted server, it needs to join recovery instead of finding that the server handle has changed and evicting itself, cancelling all locks.
 * Bugzilla: 16611

Severity: normal Description: Kernel BUG tries to release flock Details: Lustre does not destroy flock lock before last reference goes away. So always drop flock locks when client is evicted and perform unlock regardless of whether communication with the MDS succeeds.
 * Bugzilla: 15825

Severity: enhancement Description: Upcall when a Lustre log has been dumped Details: Allow for a user mode script to be called once a Lustre log has been dumped. It passes the filename of the dumped log to the script. The location of the script can be specified via /proc/sys/lnet/debug_log_upcall.
 * Bugzilla: 16566

Severity: minor Frequency: rare Description: avoid messages about idr_remove called for id that is not allocated Details: Move the assignment of s_dev for clustered nfs to the end of initialization, to avoid problems with error handling.
 * Bugzilla: 16583

Severity: minor Frequency: rare Description: avoid Already found the key in hash [CONN_UNUSED_HASH] messages Details: When a connection is reused it is not moved from CONN_UNUSED_HASH into CONN_USED_HASH, and this produces a warning when the connection is put in the unused hash again.
 * Bugzilla: 16109

Severity: normal Frequency: rare Description: avoid ASSERTION(client_stat->nid_exp_ref_count == 0) failed Details: release the reference to stats when the client disconnects, not when the export is destroyed, to avoid races when the client is destroyed after the main ost export.
 * Bugzilla: 15139

Severity: normal Description: more cleanup in mds_lov Details: add a workaround to get a valid ost count to avoid warnings about dropping too big messages, don't init llog cat under a semaphore which can be blocked on reconnect and break normal replay, fix access to wrong pointer.
 * Bugzilla: 16679

Severity: enhancement Description: Export bytes_read/bytes_write count on OSC/OST.
 * Bugzilla: 16573

Severity: normal Description: Early reply size mismatch, MGC loses connection Details: Apply the MGS_CONNECT_SUPPORTED mask at reconnect time so the connect flags are properly negotiated.
 * Bugzilla: 16237

Severity: normal Description: Properly propagate oinfo flags from lov to osc for statfs Details: restore missing copy oi_flags to lov requests.
 * Bugzilla: 16006

Severity: normal Description: exports in /proc are broken Details: recreate /proc entries for clients when they reconnect.
 * Bugzilla: 16317

Severity: enhancement Description: Add man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8) Details: included man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8)
 * Bugzilla: 16581

Severity: enhancement Description: Implement lustre ll_show_options method.
 * Bugzilla: 16208

Severity: normal Description: don't fail open with -ERANGE Details: if a client connects before the mds knows the real ost count, getting the LOV EA can fail because the mds does not allocate a large enough buffer for the LOV EA.
 * Bugzilla: 16080

Severity: normal Description: Resolve device initialization race Details: Prevent proc handler from accessing devices added to the obd_devs array but not yet initialized.
 * Bugzilla: 15576

Severity: enhancement Description: configure's --enable-quota should check the kernel .config for CONFIG_QUOTA Details: configure is terminated if --enable-quota is passed but no quota support is in kernel
 * Bugzilla: 16091

Severity: normal Frequency: rare, on PPC clients Description: don't swab ost objects in a response about a directory, because they do not exist. Details: similar to bug 14856, but in a different function.
 * Bugzilla: 16318

Severity: enhancement Description: lfs quota tool enhancement Details: added units specifiers support for setquota, default to current uid/gid for quota report, short quota stats by default, nonpositional parameters for setquota, added llapi_quotactl manual page.
 * Bugzilla: 15754

Severity: enhancement Description: *optional* service tags registration Details: if the "service tags" package is installed on a Lustre node, a local-node service tag will be created when the filesystem is mounted. See http://inventory.sun.com/ for more information about the Service Tags asset management system.
 * Bugzilla: 15625

Severity: normal Description: Client runs out of low memory Details: Consider only lowmem when counting initial number of llap pages
 * Bugzilla: 16037

Severity: normal Frequency: occasional Description: add refcount for osc callbacks to avoid panic on shutdown
 * Bugzilla: 15210

Severity: normal Frequency: testing only Description: sanity test 65a fails if stripecount of -1 is set Details: handle -1 striping on filesystem in ll_dirstripe_verify
 * Bugzilla: 12653

Severity: normal Frequency: only in unusual configurations Description: Kernel panic with find ost index. Details: lov_obd panics if some OSTs have sparse indexes.
 * Bugzilla: 16014

Severity: major Frequency: rarely, if filesystem is mounted with -o flock Description: do not process already freed flock Details: flock can possibly be freed by another thread before it reaches ldlm_flock_completion_ast.
 * Bugzilla: 15924

Severity: normal Frequency: rarely, if filesystem is mounted with -o flock Description: LBUG during stress test Details: Properly lock accesses to the flock deadlock detection list.
 * Bugzilla: 14480

Severity: minor Frequency: rarely, if binaries are being run from Lustre Description: oops in page fault handler Details: the kernel page fault handler can return two special 'pages' in the error case; don't try to dereference NOPAGE_SIGBUS and NOPAGE_OOM.
 * Bugzilla: 

Severity: minor Frequency: rarely, during shutdown Description: timeout with invalidate import. Details: ptlrpcd_check calls obd_zombie_impexp_cull and waits for a request which should be handled by ptlrpcd. This produces a long wait and -ETIMEOUT in ptlrpc_invalidate_import, and as a result an LASSERT.
 * Bugzilla: 15716

Severity: normal Frequency: rarely Description: ASSERTION(CheckWriteback(page,cmd)) failed Details: incorrectly clearing the PG_writeback bit in ll_ap_completion can produce a false positive assertion.
 * Bugzilla: 14742

Severity: normal Frequency: only with broken builds/installations Description: no LBUG if lquota.ko and fsfilt_ldiskfs.ko are different versions Details: just return an error to a user, put a console error message
 * Bugzilla: 15779

Severity: enhancement Description: enable MGS and MDT services to start separately Details: add a 'nomgs' option in mount.lustre to start an MDT with a co-located MGS without starting the MGS, which complements the 'nosvc' mount option.
 * Bugzilla: 14134

Severity: normal Frequency: always, on big-endian systems Description: cleanup in ptlrpc code, related to PPC platform Details: store magic in native order to avoid panics in recovery on PPC nodes and prevent this error in the future. Also fix possible double swabbing of data. Fix getting lov striping to userspace.
 * Bugzilla: 14856

Severity: normal Frequency: rarely, if replay gets lost on server Description: server incorrectly drops resent replays, leading to recovery failure. Details: do not drop replays according to msg flags; instead, check the per-export recovery request queue for a duplicate transno.
 * Bugzilla: 15756
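
The duplicate check described above can be pictured as a per-export queue keyed by transaction number. This is a simplified sketch, not the server recovery code; the class and method names are hypothetical.

```python
class ExportRecoveryQueue:
    # Hypothetical per-export queue of replay requests, keyed by transno.
    def __init__(self):
        self._seen = set()
        self.queue = []

    def add_replay(self, transno, request):
        """Queue a replay unless one with the same transno is already queued."""
        if transno in self._seen:
            return False  # duplicate of a queued replay: drop it
        self._seen.add(transno)
        self.queue.append((transno, request))
        return True
```

The decision depends only on what is already queued for that export, so a resent replay whose original was lost is still accepted, whatever its message flags say.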

Severity: normal Frequency: after recovery Description: precreate too many objects after del orphan. Details: del orphan sets last_id == next_id in oscc, and this triggers growth of the count of precreated objects. Set the LOW flag to skip increasing the count of precreated objects.
 * Bugzilla: 14835

Severity: normal Frequency: rare, on clear nid stats Description: ASSERTION(client_stat->nid_exp_ref_count == 0) Details: cleaning nid stats sometimes tries to destroy a live entry, and this produces a panic in free.
 * Bugzilla: 15139

Severity: major Frequency: occasionally since 1.6.4 Description: Stack overflow during MDS log replay Details: ease stack pressure by using a separate thread to handle llog_process.
 * Bugzilla: 15575

Severity: minor Frequency: very rare Description: MDT cannot be unmounted, reporting "Mount still busy" Details: Mountpoint references were being leaked during open reply reconstruction after an MDS restart. Drop mountpoint reference in reconstruct_open and free dentry reference also.
 * Bugzilla: 13380

Severity: normal Frequency: rare Description: wait until IO is finished before starting new IO during lock cancel. Details: the VM protocol wants old IO finished before new IO starts. In this case we need to wait until PG_writeback is cleared before checking the dirty flag and calling writepages in the lock cancel callback.
 * Bugzilla: 15443

Severity: normal Frequency: rare Description: mds_mfd_close ASSERTION(rc == 0) Details: In mds_mfd_close, we need to protect the inode's writecount change within its orphan write semaphore to prevent possible races.
 * Bugzilla: 12888

Severity: minor Frequency: rare, on shutdown ost Description: don't hit livelock on OST umount. Details: shrink_dcache_parent can loop for a long time destroying dentries; use shrink_dcache_sb instead.
 * Bugzilla: 14645

Severity: minor Frequency: only when echo_client is used Description: don't panic when using echo_client Details: echo client passes NULL as the client nid pointer, and this produces a NULL pointer dereference.
 * Bugzilla: 14949

Severity: normal Frequency: Always on 32-bit PowerPC systems Description: fix build on PPC32 Details: compiling code with the -m64 flag produces a wrong object file for PPC32.
 * Bugzilla: 15278

Severity: normal Frequency: rare Description: MDS LBUG: ASSERTION(!IS_ERR(dchild)) Details: In reconstruct_* functions, LASSERTs on both the data supplied by a client and the data on disk are dangerous and incorrect. Replace them with client eviction.
 * Bugzilla: 15574

Severity: enhancement Description: skiplist implementation simplification Details: skiplists are used to group compatible locks on the granted list. They were implemented by tracking the first and last lock of each lock group; the patch changes that to use doubly linked lists.
 * Bugzilla: 15346

Severity: normal Description: delete compatibility for 32bit qdata Details: as planned, when lustre is beyond b1_8, lquota won't support 32bit qunit. That means servers of b1_4 and servers of b1_8 can't be used together if users want to use quota.
 * Bugzilla: 15933

Severity: normal Frequency: only with administrator action Description: mount failure if config log has invalid conf_param setting Details: If administrator specified an incorrect configuration parameter with "lctl conf_param" this would cause an error during future client mounts. Instead, ignore the bad configuration parameter.
 * Bugzilla: 14693

Severity: normal Frequency: blocks per group < blocksize*8 and uninit_groups is enabled Description: ldiskfs error: XXX blocks in bitmap, YYY in gd Details: If blocks per group is less than blocksize*8, set rest of the bitmap to 1.
 * Bugzilla: 15932
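
To illustrate the fix: a block bitmap occupies one filesystem block, so it holds blocksize*8 bits, and any bits beyond blocks per group must be marked in use so they are never counted as free. The sketch below assumes the usual little-endian bit numbering within each byte; the function name is hypothetical and this is not the ldiskfs code.

```python
def pad_block_bitmap(bitmap, blocks_per_group):
    """Set every bit past blocks_per_group to 1 (in use), in place."""
    total_bits = len(bitmap) * 8  # blocksize * 8 for a one-block bitmap
    for bit in range(blocks_per_group, total_bits):
        # Assumed numbering: bit n lives in byte n // 8 at position n % 8.
        bitmap[bit // 8] |= 1 << (bit % 8)


def count_free(bitmap):
    """Count zero bits, i.e. blocks the bitmap reports as free."""
    return sum(8 - bin(byte).count("1") for byte in bitmap)
```

With a toy 4-byte bitmap (32 bits) and 20 blocks per group, padding leaves exactly 20 free bits, so the bitmap's free count can agree with the group descriptor.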

Severity: major Frequency: Application does stride read on lustre Description: The read performance will drop a lot if the application does stride read. Details: Because the stride_start_offset is missing in stride read-ahead, clients read a lot of unused pages in read-ahead, and read performance drops.
 * Bugzilla: 16172

Severity: normal Description: more ldlm soft lockups Details: In ldlm_resource_add_lock, the call to ldlm_resource_dump starves other threads of the resource lock for a long time in case of a long waiting queue, so change the debug level from D_OTHER to the less frequently used D_INFO.
 * Bugzilla: 15953

Severity: enhancement Description: add -gid, -group, -uid, -user options to lfs find
 * Bugzilla: 13128

Severity: enhancement Description: ll_recover_lost_found_objs - recover objects in lost+found Details: OST corruption and subsequent e2fsck can leave objects in the lost+found directory. Using the "ll_recover_lost_found_objs" tool, these objects can be retrieved and data can be salvaged by using the object ID saved in the fid EA on each object.
 * Bugzilla: 15284

Severity: minor Frequency: rare Description: this bug _only_ happens when inode quota limitation is very low (less than 12), so that inode quota unit is 1 at initialization. Details: if remaining quota equals 1, it is a sign that quota is effective now. So the least quota qunit should be 2.
 * Bugzilla: 15758

Severity: normal Description: Hung threads in invalidate_inode_pages2_range Details: The direct IO path doesn't call check_rpcs to submit a new RPC once one is completed. As a result, some RPCs are stuck in the queue and are never sent.
 * Bugzilla: 15950

Severity: normal Description: Procfs and llog threads access destroyed import sometimes. Details: Sync the import destroy process with procfs and llog threads by the import refcount and semaphore.
 * Bugzilla: 15684

Severity: major Description: mds fails to respond, threads stuck in ldlm_completion_ast Details: Sort source/child resource pair after updating child resource.
 * Bugzilla: 15674

Severity: major Frequency: rare Description: kernel BUG at ldiskfs2_ext_new_extent_cb Details: If insertion of an extent fails, then discard the inode preallocation and free data blocks else it can lead to duplicate blocks.
 * Bugzilla: 16226

Severity: normal Description: don't always update ctime in ext3_xattr_set_handle Details: Current xattr code updates inode ctime in ext3_xattr_set_handle. In some cases the ctime should not be updated, for example for 2.0->1.8 compatibility it is necessary to delete an xattr and it should not update the ctime.
 * Bugzilla: 16199

Severity: normal Description: add quota statistics Details: 1. sort out quota proc entries and proc code. 2. add quota statistics
 * Bugzilla: 15058

Severity: normal Frequency: often Description: quotas are not honored with O_DIRECT Details: all writes with the flag O_DIRECT will use grants which leads to this problem. Now using OBD_BRW_SYNC to guard this.
 * Bugzilla: 16125

Severity: major Frequency: rare Description: Assertion in iopen_connect_dentry in 1.6.3 Details: looking up an inode via iopen with the wrong generation number can populate the dcache with a disconnected dentry while the inode number is in the process of being reallocated. This causes an assertion failure in iopen since the inode's dentry list contains both a connected and disconnected dentry.
 * Bugzilla: 15713
 * Bugzilla: 16362

Severity: normal Description: assertion failure in ldlm_handle2lock Details: fix a race between class_handle_unhash and class_handle2object introduced in lustre 1.6.5 by bug 13622.
 * Bugzilla: 16496

Severity: enhancement Description: superblock lock contention with many SMP cores on one client Details: several client filesystem locks were highly contended on SMP NUMA systems with 8 or more cores. Per-CPU data structures and more efficient locking were implemented to reduce contention.
 * Bugzilla: 11817

Severity: minor Frequency: rare Description: Kernel BUG: sd_iostats_bump: unexpected disk index Details: remove the limit of 256 scsi disks in the sd_iostat patch
 * Bugzilla: 12755

Severity: minor Frequency: rare Description: oops in sd_iostats_seq_show Details: unloading/reloading the scsi low level driver triggers a kernel bug when trying to access the sd iostat file.
 * Bugzilla: 16494

Severity: major Frequency: rare Description: Kernel panics during QLogic driver reload Details: REQ_BLOCK_PC requests are not handled properly in the sd iostat patch, causing memory corruption.
 * Bugzilla: 16404

Severity: minor Frequency: rare Description: journal_dev option does not work in b1_6 Details: pass mount option during pre-mount.
 * Bugzilla: 16140

Severity: enhancement Description: Add a FIEMAP (FIle Extent MAP) ioctl for ldiskfs Details: FIEMAP ioctl will allow an application to efficiently fetch the extent information of a file. It can be used to map logical blocks in a file to physical blocks in the block device.
 * Bugzilla: 10555

Severity: normal Frequency: only with adaptive timeout enabled Description: DEBUG_REQ bad paging request Details: ptlrpc_at_recv_early_reply should not modify req->rq_repmsg because it can be accessed by reply_in_callback without the rq_lock held.
 * Bugzilla: 16972

Severity: normal Frequency: only on Cray X2 Description: X2 build failures Details: fix build failures on Cray X2.
 * Bugzilla: 16813

Severity: normal Description: xid & resent requests Details: Initialize RPC XID from clock at startup (randomly if clock is bad).
 * Bugzilla: 2066
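
One way to picture this initialization: seed the starting XID from wall-clock seconds so that a rebooted client does not reuse recent XIDs, and fall back to a random value when the clock is obviously wrong. The threshold date, the 20-bit shift, and the 62-bit random width below are assumptions chosen for illustration, not values taken from this entry.

```python
import random

# Assumptions for illustration only: the cutoff and shift are hypothetical.
CLOCK_SANE_AFTER = 1072915200  # 2004-01-01 00:00:00 UTC, in seconds
XID_SHIFT = 20


def initial_xid(now_seconds):
    """Pick a starting XID from the clock, or randomly if the clock is bad."""
    if now_seconds > CLOCK_SANE_AFTER:
        # Shifting leaves room for many XIDs per second before overlap.
        return now_seconds << XID_SHIFT
    return random.getrandbits(62)
```

A clock-derived seed keeps XIDs increasing across restarts, which matters when the server matches resent requests by XID.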

Severity: major Description: quota recovery deadlock during mds failover Details: This patch includes att18982, att18236, att18237 in bz14840. Solve the problems: 1. fix osts hang when mds does failover with quotaon 2. prevent watchdog storm when osts threads wait for the recovery of mds
 * Bugzilla: 14840

Severity: normal Description: kernel panic on racer Details: Do not access dchild->d_inode when IS_ERR(dchild) is true.
 * Bugzilla: 16695

Severity: enhancement Description: Add lustre_start utility to start or stop multiple Lustre servers from a CSV file.
 * Bugzilla: 14095

Severity: major Description: Lustre GPF in {:ptlrpc:ptlrpc_server_free_request+373} Details: In case of memory pressure, list_del can be called twice on req->rq_history_list, causing a kernel oops.
 * Bugzilla: 17024

Severity: normal Description: kptllnd_peer_check_sends) ASSERTION(!in_interrupt) failed Details: fix stack overflow in the distributed lock manager by deferring export eviction after a failed ast to the elt thread instead of handling it in the dlm interpret routine.
 * Bugzilla: 17026

Severity: enhancement Description: More exported tunables for mballoc Details: Add support for tunable preallocation window and new tunables for large/small requests
 * Bugzilla: 12800

Severity: normal Description: Detect corruption of block bitmap and checking for preallocations Details: Checks validity of on-disk block bitmap. Also it does better checking of number of applied preallocations. When corruption is found, it turns filesystem readonly to prevent further corruptions.
 * Bugzilla: 16680

Severity: normal Frequency: only for big-endian servers Description: Check if big-endian system while mounting fs with extents feature Details: Mounting a filesystem with extents feature will fail on big-endian systems since ext3-based ldiskfs is not supported on big-endian systems. Can be overridden with "bigendian_extents" mount option.
 * Bugzilla: 16438

Severity: normal Description: Excessive recovery window Details: With AT enabled, the recovery window can be excessively long (6000+ seconds). To address this, OBD_RECOVERY_FACTOR is no longer used when extending the recovery window: the connect timeout no longer depends on the service time and is now set to INITIAL_CONNECT_TIMEOUT, and clients report the old service time via pb_service_time.
 * Bugzilla: 16860

Severity: normal Description: Watchdog triggered on MDS failover Details: Enable the OBD_CONNECT_MDT flag when connecting from the MDS so that the OSTs know the MDS "UUID" can be reused for the same export from a different NID. This removes the need to wait for the export to be evicted.
 * Bugzilla: 16522

Severity: enhancement Description: Don't sync journal after every i/o Details: Implement write RPC replay to allow server replies for write RPCs before data is on disk. However, this feature is disabled by default because some issues leading to data corruption were found during recovery (e.g. bug 19128). It can be enabled by running the following command on the OSSs: lctl set_param obdfilter.*.sync_journal=0
 * Bugzilla: 16919

Severity: low Description: Slow reads beyond 8 TB offsets. Details: Page index integer overflow in ll_read_ahead_page.
 * Bugzilla: 18016
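
The 8 TB boundary is consistent with a page index that no longer fits in a signed 32-bit integer. The sketch below works through the arithmetic, assuming 4 KiB pages, a 32-bit `int`, and 8 TB read as 2^43 bytes. The entry does not state which variable overflowed, and the helper names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed page size: 4 KiB, i.e. a shift of 12 bits. */
#define ASSUMED_PAGE_SHIFT 12

/* Page index of a byte offset, computed in 64 bits so it cannot wrap. */
uint64_t page_index(uint64_t offset)
{
    return offset >> ASSUMED_PAGE_SHIFT;
}

/* True if the page index of offset can be stored in a signed int. */
int index_fits_in_int(uint64_t offset)
{
    return page_index(offset) <= (uint64_t)INT_MAX;
}
```

With these assumptions, an offset of 2^43 bytes maps to page index 2^43 / 2^12 = 2^31 = 2147483648, one more than the largest 32-bit signed value (2147483647). Every offset below 2^43 still fits, which matches the problem appearing only beyond 8 TB.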

Severity: major Frequency: rare, only if using MMP with Linux RAID Description: MMP doesn't work with Linux RAID Details: While using HA for Lustre servers with Linux RAID, it is possible that MMP will not detect multiple mounts. To make this work we need to unplug the device queue in RAID when the MMP block is being written. Also while reading the MMP block, we should read it from disk and not the cached one.
 * Bugzilla: 17895

Severity: minor Frequency: rare, during recovery Description: Assertion failure in ldlm_lock_put Details: Do not put cancelled locks into the replay list; hold references on locks in the replay list.
 * Bugzilla: 17895

Severity: critical Description: Lustre detected file system corruption with inode out of bounds Details: Do not update i_size on MDS_CLOSE for directories, since doing so causes directory corruption on the MDT.
 * Bugzilla: 18695

Severity: normal Description: client doesn't try to reconnect Details: correctly skip time estimate if in recovery
 * Bugzilla: 19223