Change Log 1.8

=Changes from v1.6.7.1 to v1.8.0=

Support for networks:
 * socklnd - any kernel supported by Lustre
 * qswlnd - Qsnet kernel modules 5.20 and later
 * openiblnd - IbGold 1.8.2
 * o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3
 * viblnd - Voltaire ibhost 3.4.5 and later
 * ciblnd - Topspin 3.2.0
 * iiblnd - Infiniserv 3.3 + PathBits patch
 * gmlnd - GM 2.1.22 and later
 * mxlnd - MX 1.2.1 or later
 * ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x

Support for kernels:
 * 2.6.16.60-0.31 (SLES 10)
 * 2.6.18-92.1.17.el5 (RHEL 5)
 * 2.6.22.14 vanilla (kernel.org)

Client support for unpatched kernels (see Patchless_Client):
 * 2.6.16 - 2.6.22 vanilla (kernel.org)

Recommended e2fsprogs version: 1.40.11-sun1

File join has been disabled in this release; refer to bugzilla 16929.

A new Lustre ADIO driver is available for MPICH2-1.0.7.

'''NFS export is disabled when the stack size is < 8192, since an NFSv4 export of a Lustre filesystem with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630.'''

Severity: minor Description: minor fixes and cleanups Details: use EXT_UNSET_BLOCK to avoid confusion with EXT_MAX_BLOCK. Initialize 'ix' variable in extents patch to stop compiler warning.
 * Bugzilla: 16114

Severity: feature Description: update FIEMAP ioctl to match upstream kernel version Details: the FIEMAP block-mapping ioctl had a prototype version in ldiskfs 3.0.7 but this release updates it to match the interface in the upstream kernel, with a new ioctl number.
 * Bugzilla: 17942

Severity: normal Frequency: only if MMP is active and detects filesystem is in use Description: if MMP startup fails, an oops is triggered Details: if ldiskfs mounting doesn't succeed the error handling doesn't clean up the MMP data correctly, causing an oops.
 * Bugzilla: 18173

Severity: enhancement Description: Caching OSS Details: introduce data caching on the OSS. The OSS now relies on the Linux kernel page cache to keep recently accessed data in memory. Note that all write requests are still flushed synchronously, as in Lustre 1.6.
 * Bugzilla: 12182

Severity: enhancement Description: version based recovery Details: introduce finer-grained recovery that can detect transaction dependencies and deal with transaction gaps caused by clients failing at the same time as the server.
 * Bugzilla: 10609

Severity: enhancement Description: Enable adaptive timeouts by default Details: The Lustre timeout value in /proc/sys/lustre/timeout is now managed dynamically based on server load and should not need to be tuned manually based on cluster size. This allows Lustre to work under a wider variety of system sizes and loads, without unnecessarily causing lengthy recovery times.
 * Bugzilla: 3055

Severity: enhancement Description: Add OST Pools support Details: File striping can now be set to use an arbitrary pool of OSTs
 * Bugzilla: 15899

Severity: enhancement Description: add lazystatfs mount option to allow statfs(2) to skip down OSTs Details: allow statfs requests to skip disconnected OSTs and hide the resulting error.
 * Bugzilla: 17974

Severity: normal Frequency: rare, on llog test 6 Description: don't allow connect to already connected import Details: allowing a connect to an already connected import hides connection problems.
 * Bugzilla: 16839

Severity: normal Frequency: rare, connect and disconnect target at same time Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0) Details: don't call obd_disconnect under lov_lock. It is a long-running operation and can block ptlrpcd, which answers connect requests.
 * Bugzilla: 17310

Severity: normal Frequency: rare, on failed llog setup Description: don't leak obd reference on failed llog setup Details: on failed llog setup the mgc forgot to call class_destroy_import for the client import; move the import destruction to a more generic place.
 * Bugzilla: 18896

Severity: normal Frequency: rare Description: allow killing a process which waits for a statahead result Details: for some reason 'ls' can get stuck waiting for a result from statahead; in this case there needs to be a way to kill the process.
 * Bugzilla: 18902

Severity: normal Frequency: rare Description: don't lose wakeup for imp_recovery_waitq Details: recover_import_no_retry or invalidate_import and import_close can both sleep on imp_recovery_waitq, but only one wakeup was sent to the wait queue.
 * Bugzilla: 18154

Severity: normal Frequency: rare, at shutdown Description: panic at umount Details: llap_shrinker can race with removal of the super block from the list, which produces a panic on access to an already freed pointer.
 * Bugzilla: 18773

Severity: normal Frequency: rare Description: panic in mds_open Details: don't confuse mds_finish_transno with PTR_ERR(-ENOENT)
 * Bugzilla: 18238

Severity: normal Frequency: rare Description: stuck in cache_remove_extent or panic on access to an already freed lock. Details: release the lock reference only after adding the page to the pages list.
 * Bugzilla: 17972

Severity: normal Frequency: when starting the MDS on an uncleanly shut down MDS device Description: ll_sync thread stays waiting for mds<>ost recovery to finish Details: waiting for mds<>ost recovery to finish produces random bugs due to a race between two ll_sync threads for one lov target. Send the ACTIVATE event only if the connect has really finished and the import is in the FULL state.
 * Bugzilla: 16839

Severity: normal Frequency: always with long access acl Description: mds can't pack reply with long acl. Details: the mds does not control the size of ACLs, but they are limited by the reint/getattr reply buffer.
 * Bugzilla: 17636

Severity: normal Frequency: when starting the MDS on an uncleanly shut down MDS device Description: aborting recovery hangs on the MDS Details: don't throttle destroy RPCs for the MDT.
 * Bugzilla: 18049

Severity: major Frequency: on remount Description: external journal device not working after the remount Details: clear dev_rdonly flag for external journal devices in blkdev_put
 * Bugzilla: 18018

Severity: minor Frequency: rare Description: shutdown vs evict race Details: client_disconnect_export vs connect request race. If the client is evicted at this time, the invalidate thread is started without a reference to the import, and the import can be freed at the same time.
 * Bugzilla: 17802

Severity: minor Frequency: always Description: shrink LOV EAs before replying Details: correctly adjust LOV EA buffer for reply.
 * Bugzilla: 16693

Severity: normal Frequency: rare Description: don't skip an OST target if it is assigned to the file Details: Drop slow OSCs if we can, but not for the requested start idx. This means "if the OSC is slow and it is not the requested start OST, then it can be skipped; otherwise skip it only if it is inactive/recovering/out-of-space".
 * Bugzilla: 16081
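
The skip rule quoted in the entry above can be sketched as a small predicate. This is only an illustration in Python, not the Lustre implementation: the `OscState` record, its fields, and the `can_skip` name are hypothetical, and the three "unusable" conditions (inactive, recovering, out of space) are collapsed into a single flag.

```python
from dataclasses import dataclass


@dataclass
class OscState:
    """Hypothetical summary of one OSC as seen by the allocator."""
    index: int
    slow: bool = False
    usable: bool = True  # False if inactive, recovering, or out of space


def can_skip(osc: OscState, requested_start_idx: int) -> bool:
    """Return True if this OSC may be skipped when choosing stripes."""
    if not osc.usable:
        # Inactive/recovering/out-of-space targets can always be skipped.
        return True
    # A slow OSC may be skipped unless it is the requested start index.
    return osc.slow and osc.index != requested_start_idx
```

With this rule a slow OSC at the requested start index is still used, while the same slow OSC elsewhere in the stripe order may be dropped.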

Severity: enhancement Description: Update to RHEL5 kernel-2.6.18-92.1.17.el5.
 * Bugzilla: 17201

Severity: enhancement Description: Update to SLES10 SP2 kernel-2.6.16.60-0.31.
 * Bugzilla: 17458

Severity: normal Frequency: rare, needs ACLs on the inode. Description: client can't handle OST addition correctly Details: if an OST was added after the client connected to the mds, the client can hit lnet_try_match_md ... too-big messages for widely striped files. In this case the client needs to handle config events about an added lov target and update the client max EA size on that event.
 * Bugzilla: 16492

Severity: normal Frequency: Create a symlink file with a very long name Description: ldlm_cancel_pack) ASSERTION(max >= dlm->lock_count + count) Details: If there is no extra space in the request for early cancels, ldlm_req_handles_avail returns 0 instead of a negative value.
 * Bugzilla: 16578

Severity: major Frequency: rare Description: mds is deadlocked Details: in rare cases an inode in a directory can have an i_no lower than its parent's i_no. This produces the wrong locking order during open, and a parallel unlink can deadlock with the open. mds_open needs to grab locks in resource id order, not in parent -> child order.
 * Bugzilla: 16492
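
The general technique behind the fix above, taking locks in a single global order (here, by resource id) so that two threads can never wait on each other in a cycle, can be sketched in Python. This is a generic illustration, not the mds_open code; `lock_in_id_order` and the integer resource ids are hypothetical.

```python
import threading
from contextlib import contextmanager


@contextmanager
def lock_in_id_order(resources):
    """Acquire locks sorted by resource id, release in reverse order.

    `resources` is an iterable of (resource_id, lock) pairs. Sorting by id
    means every caller takes the same locks in the same order, regardless
    of which one is the "parent" and which the "child".
    """
    ordered = sorted(resources, key=lambda pair: pair[0])
    acquired = []
    try:
        for _, lock in ordered:
            lock.acquire()
            acquired.append(lock)
        yield [res_id for res_id, _ in ordered]
    finally:
        for lock in reversed(acquired):
            lock.release()


# Usage: the parent has the higher id here, yet the child is locked first.
parent = (20, threading.Lock())
child = (10, threading.Lock())
with lock_in_id_order([parent, child]) as order:
    print(order)
```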

Severity: enhancement Description: Add /proc entry for import status Details: The mdc, osc, and mgc import directories now have an import directory that contains useful import data for debugging connection problems.
 * Bugzilla: 1819

Severity: enhancement Description: Re-disable certain /proc logging Details: Enable and disable client's offset_stats, extents_stats and extents_stats_per_process stats logging on the fly.
 * Bugzilla: 15966

Severity: major Frequency: Only on FC kernels 2.6.22+ Description: oops in statahead Details: Do not drop the reference count for the dentry from the VFS during lookup; the VFS will do that by itself.
 * Bugzilla: 16303

Severity: enhancement Description: Generic /proc file permissions Details: Set /proc file permissions in a more generic way to enable non-root users to operate on some /proc files.
 * Bugzilla: 16643

Severity: major Description: Hitting mdc_commit_close ASSERTION Details: Properly handle request reference release in ll_release_openhandle.
 * Bugzilla: 16561

Severity: normal Description: only patchless client Details: add workaround for race between add/remove dentry from hash
 * Bugzilla: 15975

Severity: enhancement Description: Allow OST glimpses to return PW locks
 * Bugzilla: 16845

Severity: minor Description: LBUG when llog conf file is full Details: When llog bitmap is full, ENOSPC should be returned for plain log.
 * Bugzilla: 16717

Severity: normal Description: Prevent import from entering FULL state when server in recovery
 * Bugzilla: 16907

Severity: major Description: service mount cannot take device name with ":" Details: Only when device name contains ":/" will mount treat it as client mount.
 * Bugzilla: 16750
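
The rule in the entry above can be sketched as a one-line check. This is an illustrative Python version, not the mount.lustre source; the function name and the example device strings are hypothetical.

```python
def is_client_mount(device: str) -> bool:
    """Treat the mount as a client mount only if the name contains ':/'."""
    return ":/" in device


# A server block device whose name merely contains ':' is not a client mount.
print(is_client_mount("/dev/disk/by-path/pci-0000:03:00.0-scsi-0"))  # False
print(is_client_mount("mgsnode@tcp0:/testfs"))                       # True
```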

Severity: normal Frequency: rare Description: replace ptlrpcd with the statahead thread to interpret the async statahead RPC callback
 * Bugzilla: 15927

Severity: normal Frequency: on recovery Description: I/O failures after umount during fail back Details: if the client reconnects to a restarted server, it needs to join recovery instead of finding that the server handle has changed and evicting itself, cancelling all locks.
 * Bugzilla: 16611

Severity: normal Description: Kernel BUG tries to release flock Details: Lustre does not destroy the flock lock before the last reference goes away. So always drop flock locks when the client is evicted, and perform the unlock regardless of whether communication with the MDS succeeds.
 * Bugzilla: 15825

Severity: enhancement Description: Upcall when a Lustre log has been dumped Details: Allow a user mode script to be called once a Lustre log has been dumped. The filename of the dumped log is passed to the script. The location of the script can be specified via /proc/sys/lnet/debug_log_upcall.
 * Bugzilla: 16566

Severity: minor Frequency: rare Description: avoid messages about idr_remove called for id that is not allocated Details: Move the assignment of s_dev for clustered nfs to the end of initialization, to avoid problems with error handling.
 * Bugzilla: 16583

Severity: minor Frequency: rare Description: avoid "Already found the key in hash [CONN_UNUSED_HASH]" messages Details: When a connection is reused it is not moved from CONN_UNUSED_HASH into CONN_USED_HASH, and this produces a warning when the connection is put into the unused hash again.
 * Bugzilla: 16109

Severity: normal Frequency: rare Description: avoid ASSERTION(client_stat->nid_exp_ref_count == 0) failed Details: release the reference to the stats when the client is disconnected, not when the export is destroyed, to avoid races when the client is destroyed after the main ost export.
 * Bugzilla: 15139

Severity: normal Description: more cleanup in mds_lov Details: add a workaround to get a valid ost count, to avoid warnings about dropping too-big messages; don't init the llog catalog under a semaphore, which can block on reconnect and break normal replay; fix access to a wrong pointer.
 * Bugzilla: 16679

Severity: enhancement Description: Export bytes_read/bytes_write count on OSC/OST.
 * Bugzilla: 16573

Severity: normal Description: Early reply size mismatch, MGC loses connection Details: Apply the MGS_CONNECT_SUPPORTED mask at reconnect time so the connect flags are properly negotiated.
 * Bugzilla: 16237

Severity: normal Description: Properly propagate oinfo flags from lov to osc for statfs Details: restore missing copy oi_flags to lov requests.
 * Bugzilla: 16006

Severity: normal Description: exports in /proc are broken Details: recreate /proc entries for clients when they reconnect.
 * Bugzilla: 16317

Severity: enhancement Description: Add man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8) Details: included man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8)
 * Bugzilla: 16581

Severity: enhancement Description: Implement lustre ll_show_options method.
 * Bugzilla: 16208

Severity: normal Description: don't fail open with -ERANGE Details: if a client connects before the mds knows the real ost count, getting the LOV EA can fail because the mds does not allocate a large enough buffer for the LOV EA.
 * Bugzilla: 16080

Severity: normal Description: Resolve device initialization race Details: Prevent the proc handler from accessing devices that have been added to the obd_devs array but are not yet initialized.
 * Bugzilla: 15576

Severity: enhancement Description: configure's --enable-quota should check the kernel .config for CONFIG_QUOTA Details: configure is terminated if --enable-quota is passed but no quota support is in kernel
 * Bugzilla: 16091

Severity: normal Frequency: rare, on PPC clients Description: don't swab ost objects in a response about a directory, because they do not exist. Details: similar to bug 14856, but in a different function.
 * Bugzilla: 16318

Severity: enhancement Description: lfs quota tool enhancement Details: added units specifiers support for setquota, default to current uid/gid for quota report, short quota stats by default, nonpositional parameters for setquota, added llapi_quotactl manual page.
 * Bugzilla: 15754

Severity: enhancement Description: *optional* service tags registration Details: if the "service tags" package is installed on a Lustre node, a local-node service tag will be created when the filesystem is mounted. See http://inventory.sun.com/ for more information about the Service Tags asset management system.
 * Bugzilla: 15625

Severity: normal Description: Client runs out of low memory Details: Consider only lowmem when counting initial number of llap pages
 * Bugzilla: 16037

Severity: normal Frequency: occasional Description: add refcount for osc callbacks, so avoid panic on shutdown
 * Bugzilla: 15210

Severity: normal Frequency: testing only Description: sanity test 65a fails if stripecount of -1 is set Details: handle -1 striping on filesystem in ll_dirstripe_verify
 * Bugzilla: 12653

Severity: normal Frequency: only in unusual configurations Description: Kernel panic when finding ost index. Details: lov_obd panics if some OSTs have sparse indexes.
 * Bugzilla: 16014

Severity: major Frequency: rarely, if filesystem is mounted with -o flock Description: do not process already freed flock Details: a flock can be freed by another thread before it reaches ldlm_flock_completion_ast.
 * Bugzilla: 15924

Severity: normal Frequency: rarely, if filesystem is mounted with -o flock Description: LBUG during stress test Details: Accesses to the flock deadlock detection list need to be properly locked.
 * Bugzilla: 14480

Severity: minor Frequency: rarely, if binaries are being run from Lustre Description: oops in page fault handler Details: the kernel page fault handler can return two special 'pages' in the error case; don't try to dereference NOPAGE_SIGBUS and NOPAGE_OOM.
 * Bugzilla: 

Severity: minor Frequency: rarely, during shutdown Description: timeout with invalidate import. Details: ptlrpcd_check calls obd_zombie_impexp_cull and waits for a request which should be handled by ptlrpcd. This produces a long wait and -ETIMEOUT in ptlrpc_invalidate_import, and as a result an LASSERT.
 * Bugzilla: 15716

Severity: normal Frequency: rarely Description: ASSERTION(CheckWriteback(page,cmd)) failed Details: incorrectly clearing the PG_Writeback bit in ll_ap_completion can produce a false positive assertion.
 * Bugzilla: 14742

Severity: normal Frequency: only with broken builds/installations Description: no LBUG if lquota.ko and fsfilt_ldiskfs.ko are different versions Details: just return an error to a user, put a console error message
 * Bugzilla: 15779

Severity: enhancement Description: enable MGS and MDT services start separately Details: add a 'nomgs' option in mount.lustre to enable start a MDT with a co-located MGS without starting the MGS, which is a complement to 'nosvc' mount option.
 * Bugzilla: 14134

Severity: normal Frequency: always, on big-endian systems Description: cleanup in ptlrpc code, related to PPC platform Details: store magic in native order to avoid panics in recovery on PPC nodes and prevent this error in future. Also fix a possible double swab of data. Fix getting lov striping to userspace.
 * Bugzilla: 14856

Severity: normal Frequency: rarely, if a replay gets lost on the server Description: server incorrectly drops resent replays, leading to recovery failure. Details: do not drop a replay according to msg flags; instead, check the per-export recovery request queue for a duplicate transno.
 * Bugzilla: 15756
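
The duplicate check described above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the server code: the `ExportRecoveryQueue` class and its method are hypothetical, and a real recovery queue would also keep requests ordered by transno.

```python
class ExportRecoveryQueue:
    """Hypothetical per-export queue of replay requests keyed by transno."""

    def __init__(self):
        self._by_transno = {}

    def add_replay(self, transno, request):
        """Queue a replay; return False if this transno is already queued.

        The decision is based on what is actually in the queue, not on a
        'resent' flag carried by the message.
        """
        if transno in self._by_transno:
            return False
        self._by_transno[transno] = request
        return True
```

A resent replay whose original never arrived is therefore accepted, while a true duplicate of a queued transno is dropped.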

Severity: normal Frequency: after recovery Description: too many objects precreated after del orphan. Details: del orphan sets last_id == next_id in the oscc, and this triggers growth in the count of precreated objects. Set the LOW flag to skip increasing the count of precreated objects.
 * Bugzilla: 14835

Severity: normal Frequency: rare, on clear nid stats Description: ASSERTION(client_stat->nid_exp_ref_count == 0) Details: clearing nid stats sometimes tries to destroy a live entry, and this produces a panic in free.
 * Bugzilla: 15139

Severity: major Frequency: occasionally since 1.6.4 Description: Stack overflow during MDS log replay Details: ease stack pressure by using a thread dealing llog_process.
 * Bugzilla: 15575

Severity: minor Frequency: very rare Description: MDT cannot be unmounted, reporting "Mount still busy" Details: Mountpoint references were being leaked during open reply reconstruction after an MDS restart. Drop mountpoint reference in reconstruct_open and free dentry reference also.
 * Bugzilla: 13380

Severity: normal Frequency: rare Description: wait until IO has finished before starting new IO during lock cancel. Details: the VM protocol wants old IO finished before new IO starts, so wait until PG_writeback is cleared before checking the dirty flag and calling writepages in the lock cancel callback.
 * Bugzilla: 15443

Severity: normal Frequency: rare Description: mds_mfd_close ASSERTION(rc == 0) Details: In mds_mfd_close, we need protect inode's writecount change within its orphan write semaphore to prevent possible races.
 * Bugzilla: 12888

Severity: minor Frequency: rare, on ost shutdown Description: don't hit a livelock on ost umount. Details: shrink_dcache_parent can spend a long time in its loop destroying dentries; use shrink_dcache_sb instead.
 * Bugzilla: 14645

Severity: minor Frequency: only when echo_client is used Description: don't panic when using echo_client Details: the echo client passes NULL as the client nid pointer, and this produces a NULL pointer dereference.
 * Bugzilla: 14949

Severity: normal Frequency: Always on 32-bit PowerPC systems Description: fix build on PPC32 Details: compiling code with the -m64 flag produces a wrong object file for PPC32.
 * Bugzilla: 15278

Severity: normal Frequency: rare Description: MDS LBUG: ASSERTION(!IS_ERR(dchild)) Details: In reconstruct_* functions, LASSERTs on both the data supplied by a client, and the data on disk are dangerous and incorrect. Change them with client eviction.
 * Bugzilla: 15574

Severity: enhancement Description: skiplist implementation simplification Details: skiplists are used to group compatible locks on the granted list. This was implemented by tracking the first and last lock of each lock group; the patch changes that to use doubly linked lists.
 * Bugzilla: 15346

Severity: normal Description: delete compatibility for 32bit qdata Details: as planned, when lustre is beyond b1_8, lquota won't support 32bit qunit. That means servers of b1_4 and servers of b1_8 can't be used together if users want to use quota.
 * Bugzilla: 15933

Severity: normal Frequency: only with administrator action Description: mount failure if config log has invalid conf_param setting Details: If administrator specified an incorrect configuration parameter with "lctl conf_param" this would cause an error during future client mounts. Instead, ignore the bad configuration parameter.
 * Bugzilla: 14693

Severity: normal Frequency: blocks per group < blocksize*8 and uninit_groups is enabled Description: ldiskfs error: XXX blocks in bitmap, YYY in gd Details: If blocks per group is less than blocksize*8, set rest of the bitmap to 1.
 * Bugzilla: 15932
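
The padding step described above can be sketched in Python. This is an illustration of the idea only, not the ldiskfs patch: it assumes a bitmap that occupies one block (so it holds blocksize*8 bits) with little-endian bit numbering within each byte, and the function name is hypothetical.

```python
def pad_block_bitmap(bitmap: bytearray, blocks_per_group: int) -> None:
    """Mark every bit past blocks_per_group as in use (set to 1).

    `bitmap` is one block long, so it holds len(bitmap) * 8 bits. Bits for
    blocks that do not exist in the group must read as allocated, otherwise
    a count of free bits would disagree with the group descriptor.
    """
    total_bits = len(bitmap) * 8
    for bit in range(blocks_per_group, total_bits):
        bitmap[bit // 8] |= 1 << (bit % 8)


# Toy example: a 4-byte "block" (32 bits) describing a 20-block group.
bm = bytearray(4)
pad_block_bitmap(bm, 20)
print(bm.hex())  # 0000f0ff
```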

Severity: major Frequency: when an application does stride reads on Lustre Description: Read performance drops significantly if the application does stride reads. Details: Because stride_start_offset is missing in stride read-ahead, clients read many unused pages during read-ahead, and read performance drops.
 * Bugzilla: 16172

Severity: normal Description: more ldlm soft lockups Details: In ldlm_resource_add_lock, call to ldlm_resource_dump starve other threads from the resource lock for a long time in case of long waiting queue, so change the debug level from D_OTHER to the less frequently used D_INFO.
 * Bugzilla: 15953

Severity: enhancement Description: add -gid, -group, -uid, -user options to lfs find
 * Bugzilla: 13128

Severity: enhancement Description: ll_recover_lost_found_objs - recover objects in lost+found Details: OST corruption and subsequent e2fsck can leave objects in the lost+found directory. Using the "ll_recover_lost_found_objs" tool, these objects can be retrieved and data can be salvaged by using the object ID saved in the fid EA on each object.
 * Bugzilla: 15284

Severity: minor Frequency: rare Description: this bug _only_ happens when the inode quota limit is very low (less than 12), so that the inode quota unit is 1 at initialization. Details: a remaining quota equal to 1 is a sign that quota is effective now, so the smallest quota qunit should be 2.
 * Bugzilla: 15758

Severity: normal Description: Hung threads in invalidate_inode_pages2_range Details: The direct IO path doesn't call check_rpcs to submit a new RPC once one is completed. As a result, some RPCs are stuck in the queue and are never sent.
 * Bugzilla: 15950

Severity: normal Description: Procfs and llog threads sometimes access a destroyed import. Details: Synchronize the import destruction process with procfs and llog threads using the import refcount and semaphore.
 * Bugzilla: 15684

Severity: major Description: mds fails to respond, threads stuck in ldlm_completion_ast Details: Sort source/child resource pair after updating child resource.
 * Bugzilla: 15674

Severity: major Frequency: rare Description: kernel BUG at ldiskfs2_ext_new_extent_cb Details: If insertion of an extent fails, then discard the inode preallocation and free data blocks else it can lead to duplicate blocks.
 * Bugzilla: 16226

Severity: normal Description: don't always update ctime in ext3_xattr_set_handle Details: Current xattr code updates the inode ctime in ext3_xattr_set_handle. In some cases the ctime should not be updated; for example, for 2.0->1.8 compatibility it is necessary to delete an xattr without updating the ctime.
 * Bugzilla: 16199

Severity: normal Description: add quota statistics Details: 1. sort out quota proc entries and proc code. 2. add quota statistics
 * Bugzilla: 15058

Severity: normal Frequency: often Description: quotas are not honored with O_DIRECT Details: all writes with the flag O_DIRECT will use grants which leads to this problem. Now using OBD_BRW_SYNC to guard this.
 * Bugzilla: 16125

Severity: major Frequency: rare Description: Assertion in iopen_connect_dentry in 1.6.3 Details: looking up an inode via iopen with the wrong generation number can populate the dcache with a disconnected dentry while the inode number is in the process of being reallocated. This causes an assertion failure in iopen since the inode's dentry list contains both a connected and a disconnected dentry.
 * Bugzilla: 15713
 * Bugzilla: 16362

Severity: normal Description: assertion failure in ldlm_handle2lock Details: fix a race between class_handle_unhash and class_handle2object introduced in lustre 1.6.5 by bug 13622.
 * Bugzilla: 16496

Severity: enhancement Description: superblock lock contention with many SMP cores on one client Details: several client filesystem locks were highly contended on SMP NUMA systems with 8 or more cores. Per-CPU datastructure and more efficient locking implemented to reduce contention.
 * Bugzilla: 11817

Severity: minor Frequency: rare Description: Kernel BUG: sd_iostats_bump: unexpected disk index Details: remove the limit of 256 scsi disks in the sd_iostat patch
 * Bugzilla: 12755

Severity: minor Frequency: rare Description: oops in sd_iostats_seq_show Details: unloading/reloading the scsi low level driver triggers a kernel bug when trying to access the sd iostat file.
 * Bugzilla: 16494

Severity: major Frequency: rare Description: Kernel panics during QLogic driver reload Details: REQ_BLOCK_PC requests are not handled properly in the sd iostat patch, causing memory corruption.
 * Bugzilla: 16404

Severity: minor Frequency: rare Description: journal_dev option does not work in b1_6 Details: pass mount option during pre-mount.
 * Bugzilla: 16140

Severity: enhancement Description: Add a FIEMAP (FIle Extent MAP) ioctl for ldiskfs Details: The FIEMAP ioctl allows an application to efficiently fetch the extent information of a file. It can be used to map logical blocks in a file to physical blocks on the block device.
 * Bugzilla: 10555

Severity: normal Frequency: only with adaptive timeout enabled Description: DEBUG_REQ bad paging request Details: ptlrpc_at_recv_early_reply should not modify req->rq_repmsg because it can be accessed by reply_in_callback without the rq_lock held.
 * Bugzilla: 16972

Severity: normal Frequency: only on Cray X2 Description: X2 build failures Details: fix build failures on Cray X2.
 * Bugzilla: 16813

Severity: normal Description: xid & resent requests Details: Initialize RPC XID from clock at startup (randomly if clock is bad).
 * Bugzilla: 2066

Severity: major Description: quota recovery deadlock during mds failover Details: This patch includes att18982, att18236, att18237 in bz14840. Solve the problems: 1. fix osts hang when mds does failover with quotaon 2. prevent watchdog storm when osts threads wait for the recovery of mds
 * Bugzilla: 14840

Severity: normal Description: kernel panic on racer Details: Do not access dchild->d_inode when IS_ERR(dchild) is true.
 * Bugzilla: 16695

Severity: enhancement Description: Add lustre_start utility to start or stop multiple Lustre servers from a CSV file.
 * Bugzilla: 14095

Severity: major Description: Lustre GPF in {:ptlrpc:ptlrpc_server_free_request+373} Details: In case of memory pressure, list_del can be called twice on req->rq_history_list, causing a kernel oops.
 * Bugzilla: 17024

Severity: normal Description: kptllnd_peer_check_sends) ASSERTION(!in_interrupt) failed Details: fix a stack overflow in the distributed lock manager by deferring export eviction after a failed ast to the elt thread instead of handling it in the dlm interpret routine.
 * Bugzilla: 17026

Severity: enhancement Description: More exported tunables for mballoc Details: Add support for tunable preallocation window and new tunables for large/small requests
 * Bugzilla: 12800

Severity: normal Description: Detect corruption of block bitmap and checking for preallocations Details: Checks validity of on-disk block bitmap. Also it does better checking of number of applied preallocations. When corruption is found, it turns filesystem readonly to prevent further corruptions.
 * Bugzilla: 16680

Severity: normal Frequency: only for big-endian servers Description: Check if big-endian system while mounting fs with extents feature Details: Mounting a filesystem with extents feature will fail on big-endian systems since ext3-based ldiskfs is not supported on big-endian systems. Can be overridden with "bigendian_extents" mount option.
 * Bugzilla: 16438

Severity: normal Description: Excessive recovery window Details: With AT enabled, the recovery window can be excessively long (6000+ seconds). To address this problem, we no longer use OBD_RECOVERY_FACTOR when extending the recovery window (the connect timeout no longer depends on the service time, it is set to INITIAL_CONNECT_TIMEOUT now) and clients report the old service time via pb_service_time.
 * Bugzilla: 16860

Severity: normal Description: Watchdog triggered on MDS failover Details: enable OBD_CONNECT_MDT flag when connecting from the MDS so that the OSTs know that the MDS "UUID" can be reused for the same export from a different NID, so we do not need to wait for the export to be evicted.
 * Bugzilla: 16522

Severity: enhancement Description: Don't sync journal after every i/o Details: Implement write RPC replay to allow server replies for write RPCs before data is on disk. However, this feature is disabled by default since some issues leading to data corruptions have been found during recovery (e.g. bug 19128). This feature can be enabled by running the following command on the OSSs: lctl set_param obdfilter.*.sync_journal=0
 * Bugzilla: 16919

Severity: low Description: Slow reads beyond 8Tb offsets. Details: Page index integer overflow in ll_read_ahead_page
 * Bugzilla: 18016
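
One way an 8 TB boundary can arise from a page index overflow is worked through below. The change log does not state the integer width or page size involved, so this Python sketch assumes a signed 32-bit page index and 4 KiB pages; `page_index_as_int32` is a hypothetical helper that emulates the wraparound.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def page_index_as_int32(offset: int) -> int:
    """Page index of a byte offset, truncated to a signed 32-bit integer."""
    index = (offset // PAGE_SIZE) & 0xFFFFFFFF
    return index - (1 << 32) if index >= (1 << 31) else index


eight_tib = 8 * 2**40
# Last page below 8 TiB still fits: index 2**31 - 1.
print(page_index_as_int32(eight_tib - PAGE_SIZE))  # 2147483647
# First page at 8 TiB wraps negative: 2**43 / 2**12 = 2**31.
print(page_index_as_int32(eight_tib))              # -2147483648
```

Under these assumptions the first page at the 8 TiB offset has index 2^31, which no longer fits in a signed 32-bit value, so index comparisons in read-ahead would misbehave from that point on.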

Severity: major Frequency: rare, only if using MMP with Linux RAID Description: MMP doesn't work with Linux RAID Details: While using HA for Lustre servers with Linux RAID, it is possible that MMP will not detect multiple mounts. To make this work we need to unplug the device queue in RAID when the MMP block is being written. Also while reading the MMP block, we should read it from disk and not the cached one.
 * Bugzilla: 17895

Severity: minor Frequency: rare, during recovery Description: Assertion failure in ldlm_lock_put Details: Do not put cancelled locks into replay list, hold references on locks in replay list
 * Bugzilla: 17895

Severity: critical Description: Lustre detected file system corruption with inode out of bounds Details: don't update i_size on MDS_CLOSE for directories. This causes directory corruptions on the MDT.
 * Bugzilla: 18695

Severity: normal Description: client doesn't try to reconnect Details: correctly skip time estimate if in recovery
 * Bugzilla: 19223