WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Change Log 1.8

=Changes from v1.6.7.1 to v1.8.0=
 +
'''Support for networks:'''<br>
 +
''' socklnd - any kernel supported by Lustre'''<br>
 +
''' qswlnd - Qsnet kernel modules 5.20 and later'''<br>
 +
''' openiblnd - IbGold 1.8.2'''<br>
 +
''' o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3.'''<br>
 +
''' viblnd - Voltaire ibhost 3.4.5 and later'''<br>
 +
''' ciblnd - Topspin 3.2.0'''<br>
 +
''' iiblnd - Infiniserv 3.3 + PathBits patch'''<br>
 +
''' gmlnd - GM 2.1.22 and later'''<br>
 +
''' mxlnd - MX 1.2.1 or later'''<br>
 +
''' ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x'''<br>
 +
 
 +
'''Support for kernels:'''<br>
 +
''' 2.6.16.60-0.31 (SLES 10)'''<br>
 +
''' 2.6.18-92.1.17.el5 (RHEL 5)'''<br>
 +
''' 2.6.22.14 vanilla (kernel.org)'''<br>
 +
 
 +
'''Client support for unpatched kernels: (see [[Patchless_Client]])'''<br>
 +
''' 2.6.16 - 2.6.22 vanilla (kernel.org)'''<br>
 +
 
 +
'''Recommended e2fsprogs version: 1.40.11-sun1'''
 +
 
 +
'''File join has been disabled in this release, refer to bugzilla [https://bugzilla.lustre.org/show_bug.cgi?id=16929 16929]'''
 +
 
 +
'''A new Lustre ADIO driver is available for MPICH2-1.0.7.'''
 +
 
 +
'''NFS export is disabled when the stack size is less than 8192, since NFSv4 export of a Lustre filesystem with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla [https://bugzilla.lustre.org/show_bug.cgi?id=17630 17630]'''
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16114 16114]'''
 +
Severity: minor<br>
 +
Description: minor fixes and cleanups<br>
 +
Details: use EXT_UNSET_BLOCK to avoid confusion with EXT_MAX_BLOCK.  Initialize 'ix' variable in extents patch to stop compiler warning.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17942 17942]'''
 +
Severity: feature<br>
 +
Description: update FIEMAP ioctl to match upstream kernel version<br>
 +
Details: the FIEMAP block-mapping ioctl had a prototype version in ldiskfs 3.0.7 but this release updates it to match the interface in the upstream kernel, with a new ioctl number.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18173 18173]'''
 +
Severity: normal<br>
 +
Frequency: only if MMP is active and detects filesystem is in use<br>
 +
Description: if MMP startup fails, an oops is triggered<br>
 +
Details: if ldiskfs mounting doesn't succeed the error handling doesn't clean up the MMP data correctly, causing an oops.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=12182 12182]'''
 +
Severity: enhancement<br>
 +
Description: Caching OSS<br>
 +
Details: introduce data caching on the OSS. The OSS now relies on the linux kernel page cache to keep recently accessed data in memory. It is worth noting that all write requests are still flushed synchronously as in lustre 1.6.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=10609 10609]'''
 +
Severity: enhancement<br>
 +
Description: version based recovery<br>
 +
Details: introduce finer-grained recovery that can detect transaction dependencies and deal with transaction gaps caused by clients failing at the same time as the server.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=3055 3055]'''
 +
Severity: enhancement<br>
 +
Description: Enable adaptive timeouts by default<br>
 +
Details: The Lustre timeout value in /proc/sys/lustre/timeout is now managed dynamically based on server load and should not need to be tuned manually based on cluster size. This allows Lustre to work under a wider variety of system sizes and loads, without unnecessarily causing lengthy recovery times.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15899 15899]'''
 +
Severity: enhancement<br>
 +
Description: Add OST Pools support<br>
 +
Details: File striping can now be set to use an arbitrary pool of OSTs
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17974 17974]'''
 +
Severity: enhancement<br>
 +
Description: add lazystatfs mount option to allow statfs(2) to skip down OSTs<br>
 +
Details: allow the statfs request to skip disconnected OSTs and hide the resulting error.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16839 16839]'''
 +
Severity: normal<br>
 +
Frequency: rare, on llog test 6<br>
 +
Description: don't allow a connect to an already connected import<br>
 +
Details: allowing a connect to an already connected import hides connection problems.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17310 17310]'''
 +
Severity: normal<br>
 +
Frequency: rare, connect and disconnect target at same time<br>
 +
Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0)<br>
 +
Details: don't call obd_disconnect under lov_lock. It is a long-running operation and can block ptlrpcd, which answers connect requests.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18896 18896]'''
 +
Severity: normal<br>
 +
Frequency: rare, on failed llog setup<br>
 +
Description: don't leak obd reference on failed llog setup<br>
 +
Details: on failed llog setup, mgc forgets to call class_destroy_import for the client import; move the import destruction to a more generic place.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18902 18902]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: allow killing a process that is waiting for a statahead result<br>
 +
Details: for some reason 'ls' can get stuck waiting for a result from statahead; in this case there needs to be a way to kill the process.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18154 18154]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: don't lose wakeup for imp_recovery_waitq<br>
 +
Details: recover_import_no_retry or invalidate_import and import_close can both sleep on imp_recovery_waitq, but only one wakeup was sent to the sleep queue.
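
The lost-wakeup pattern in this entry can be illustrated outside the kernel. The following Python sketch is not Lustre code: `wake_all_waiters` and its `state` dictionary are hypothetical names, and `threading.Condition` stands in for the kernel wait queue. Assuming the fix amounts to waking every sleeper, it shows several waiters on one condition being released by a single broadcast.

```python
import threading


def wake_all_waiters(n_waiters=2):
    """Start n_waiters threads that sleep on one condition, then wake them all.

    Returns the number of waiters that actually woke up.
    """
    cond = threading.Condition()
    state = {"recovered": False, "woken": 0}

    def waiter():
        with cond:
            # wait_for re-checks the predicate, so a waiter that arrives
            # after the broadcast still proceeds instead of sleeping forever.
            cond.wait_for(lambda: state["recovered"])
            state["woken"] += 1

    threads = [threading.Thread(target=waiter) for _ in range(n_waiters)]
    for t in threads:
        t.start()

    with cond:
        state["recovered"] = True
        # notify_all() wakes every sleeper; notify() would wake only one
        # of the threads already waiting.
        cond.notify_all()

    for t in threads:
        t.join()
    return state["woken"]
```

With a single `notify()` in place of `notify_all()`, a second thread that was already waiting could stay blocked, which mirrors the single wakeup described in the entry.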
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18773 18773]'''
 +
Severity: normal<br>
 +
Frequency: rare, at shutdown<br>
 +
Description: panic at umount<br>
 +
Details: llap_shrinker can race with removal of the super block from the list, which produces a panic on access to an already freed pointer.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18238 18238]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: panic in mds_open<br>
 +
Details: don't confuse mds_finish_transno() with PTR_ERR(-ENOENT)
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17972 17972]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: stuck in cache_remove_extent() or panic when accessing an already freed lock.<br>
 +
Details: release the lock reference only after adding the page to the pages list.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16839 16839]'''
 +
Severity: normal<br>
 +
Frequency: when starting the MDS on an uncleanly shut down MDS device<br>
 +
Description: ll_sync thread stays waiting for mds<>ost recovery to finish<br>
 +
Details: waiting for mds<>ost recovery to finish produces random bugs due to a race between two ll_sync threads for one lov target. Send the ACTIVATE event only if the connect has really finished and the import is in the FULL state.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17636 17636]'''
 +
Severity: normal<br>
 +
Frequency: always with a long access ACL<br>
 +
Description: MDS can't pack a reply with a long ACL.<br>
 +
Details: the MDS doesn't control the size of ACLs, but they are limited by the reint/getattr reply buffer.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18049 18049]'''
 +
Severity: normal<br>
 +
Frequency: when starting the MDS on an uncleanly shut down MDS device<br>
 +
Description: aborting recovery hangs on the MDS<br>
 +
Details: don't throttle destroy RPCs for the MDT.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18018 18018]'''
 +
Severity: major<br>
 +
Frequency: on remount<br>
 +
Description: external journal device not working after the remount<br>
 +
Details: clear dev_rdonly flag for external journal devices in blkdev_put()
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17802 17802]'''
 +
Severity: minor<br>
 +
Frequency: rare<br>
 +
Description: shutdown vs evict race<br>
 +
Details: client_disconnect_export vs connect request race. If the client is evicted at this time, the invalidate thread is started without a reference to the import, and the import can be freed at the same time.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16693 16693]'''
 +
Severity: minor<br>
 +
Frequency: always<br>
 +
Description: shrink LOV EAs before replying<br>
 +
Details: correctly adjust LOV EA buffer for reply.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16081 16081]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: don't skip an OST target if it is assigned to the file<br>
 +
Details: Drop slow OSCs if we can, but not for the requested start idx. This means "if an OSC is slow and it is not the requested start OST, then it can be skipped, otherwise skip it only if it is inactive/recovering/out-of-space."
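
The skip rule quoted above can be written as a small predicate. This is a minimal Python sketch rather than the Lustre allocator itself: the `OscState` class, its fields, and `can_skip` are hypothetical names, and the three conditions "inactive/recovering/out-of-space" are folded into one `unusable` flag for brevity.

```python
from dataclasses import dataclass


@dataclass
class OscState:
    index: int
    slow: bool = False
    # Hypothetical flag: True if the target is inactive, recovering,
    # or out of space.
    unusable: bool = False


def can_skip(osc, requested_start_idx):
    """Return True if this OSC may be skipped during object allocation."""
    if osc.unusable:
        # Unusable targets are always skipped, even the requested start.
        return True
    # A slow target is skipped only when it is not the requested start OST.
    return osc.slow and osc.index != requested_start_idx
```

Passing `None` as `requested_start_idx` models a file with no explicit start OST, in which case any slow target can be skipped.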
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17201 17201]'''
 +
Severity: enhancement<br>
 +
Description: Update to RHEL5 kernel-2.6.18-92.1.17.el5.<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17458 17458]'''
 +
Severity: enhancement<br>
 +
Description: Update to SLES10 SP2 kernel-2.6.16.60-0.31.<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16492 16492]'''
 +
Severity: normal<br>
 +
Frequency: rare, need acl's on inode.<br>
 +
Description: client can't handle OST addition correctly<br>
 +
Details: if an OST was added after the client connected to the MDS, the client can hit lnet_try_match_md ... too big messages for widely striped files. In this case the client needs to be taught to handle config events about adding a lov target and to update the client max EA size on that event.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16578 16578]'''
 +
Severity: normal<br>
 +
Frequency: Create a symlink file with a very long name<br>
 +
Description: ldlm_cancel_pack()) ASSERTION(max >= dlm->lock_count + count)<br>
 +
Details: If there is no extra space in the request for early cancels, ldlm_req_handles_avail() returns 0 instead of a negative value.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16492 16492]'''
 +
Severity: major<br>
 +
Frequency: rare<br>
 +
Description: mds is deadlocked<br>
 +
Details: in rare cases, an inode in a directory can have an i_no lower than its parent's i_no. This produces the wrong locking order during open, and a parallel unlink can deadlock with the open. mds_open needs to grab locks in resource id order, not in parent -> child order.
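
Taking locks in a fixed global order is the usual way to avoid this kind of deadlock. The sketch below illustrates the idea in Python and is not the mds_open code: `Resource`, `lock_pair`, and `unlock_pair` are hypothetical names, and it assumes the two resources have distinct ids.

```python
import threading


class Resource:
    """A lockable object identified by a numeric resource id."""

    def __init__(self, res_id):
        self.res_id = res_id
        self.lock = threading.Lock()


def lock_pair(parent, child):
    """Lock both resources in ascending res_id order.

    Returns the resources in the order they were locked, so that every
    caller agrees on the order regardless of which one is the parent.
    """
    first, second = sorted((parent, child), key=lambda r: r.res_id)
    first.lock.acquire()
    second.lock.acquire()
    return first, second


def unlock_pair(first, second):
    """Release the locks in reverse acquisition order."""
    second.lock.release()
    first.lock.release()
```

Because an open and a parallel unlink would both sort by resource id, neither can hold one lock while waiting for the other in the opposite order.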
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=1819 1819]'''
 +
Severity: enhancement<br>
 +
Description: Add /proc entry for import status<br>
 +
Details: The mdc, osc, and mgc /proc directories now have an import entry that contains useful import data for debugging connection problems.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15966 15966]'''
 +
Severity: enhancement<br>
 +
Description: Re-disable certain /proc logging<br>
 +
Details: Enable and disable client's offset_stats, extents_stats and extents_stats_per_process stats logging on the fly.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16303 16303]'''
 +
Severity: major<br>
 +
Frequency: Only on FC kernels 2.6.22+<br>
 +
Description: oops in statahead<br>
 +
Details: Do not drop the reference count for the dentry from VFS during lookup; VFS will do that by itself.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16643 16643]'''
 +
Severity: enhancement<br>
 +
Description: Generic /proc file permissions<br>
 +
Details: Set /proc file permissions in a more generic way to enable non-root users to operate on some /proc files.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16561 16561]'''
 +
Severity: major<br>
 +
Description: Hitting mdc_commit_close() ASSERTION<br>
 +
Details: Properly handle request reference release in ll_release_openhandle().
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15975 15975]'''
 +
Severity: normal<br>
 +
Description: only patchless client<br>
 +
Details: add workaround for race between add/remove dentry from hash
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16845 16845]'''
 +
Severity: enhancement<br>
 +
Description: Allow OST glimpses to return PW locks<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16717 16717]'''
 +
Severity: minor<br>
 +
Description: LBUG when llog conf file is full<br>
 +
Details: When llog bitmap is full, ENOSPC should be returned for plain log.
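
As an illustration of the behavior described, here is a minimal Python sketch of a bitmap allocator that reports ENOSPC when no slot is free instead of asserting. It is not the llog implementation: `alloc_log_index` and the list-of-booleans bitmap are hypothetical stand-ins.

```python
import errno


def alloc_log_index(bitmap):
    """Claim the first free slot in the bitmap and return its index.

    bitmap is a mutable list of booleans (True means "in use").
    Raises OSError with errno ENOSPC when every slot is taken.
    """
    for index, used in enumerate(bitmap):
        if not used:
            bitmap[index] = True
            return index
    raise OSError(errno.ENOSPC, "log bitmap is full")
```

Returning an error lets the caller decide how to recover, whereas an assertion (an LBUG in Lustre terms) stops the thread.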
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16907 16907]'''
 +
Severity: normal<br>
 +
Description: Prevent import from entering FULL state when server in recovery<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16750 16750]'''
 +
Severity: major<br>
 +
Description: service mount cannot take device name with ":"<br>
 +
Details: Only when device name contains ":/" will mount treat it as client mount.
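
The rule stated above fits in one line of Python. This is a sketch of the stated rule only, not the mount.lustre source; `is_client_mount` is a hypothetical name and the device strings in the usage note are made-up examples.

```python
def is_client_mount(device):
    """Return True when the device string looks like a client mount target.

    Assumption: the only test applied is the one in the entry above, i.e.
    the substring ":/" marks a client mount. A bare ":" does not.
    """
    return ":/" in device
```

Under this rule a string such as `mgsnode@tcp0:/testfs` is treated as a client mount, while a server device whose name merely contains a colon, such as `/dev/mapper/vg:ost0`, is not.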
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15927 15927]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: replace ptlrpcd with the statahead thread to interpret the async statahead RPC callback<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16611 16611]'''
 +
Severity: normal<br>
 +
Frequency: on recovery<br>
 +
Description: I/O failures after umount during fail back<br>
 +
Details: if the client reconnects to a restarted server, it needs to join recovery instead of finding that the server handle has changed and processing a self-eviction that cancels all locks.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15825 15825]'''
 +
Severity: normal<br>
 +
Description: Kernel BUG tries to release flock<br>
 +
Details: Lustre does not destroy the flock lock before the last reference goes away. So always drop flock locks when the client is evicted and perform the unlock regardless of whether communication with the MDS succeeds.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16566 16566]'''
 +
Severity: enhancement<br>
 +
Description: Upcall when a Lustre log has been dumped<br>
 +
Details: Allow for a user mode script to be called once a Lustre log has been dumped. It passes the filename of the dumped log to the script; the location of the script can be specified via /proc/sys/lnet/debug_log_upcall.
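
A user mode script for this upcall could look like the following Python sketch. The entry only says that the filename of the dumped log is passed to the script; that it arrives as the first command-line argument is an assumption here, and `handle_dump` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def handle_dump(argv):
    """Build a one-line summary for the dumped log named in argv[1].

    Assumption: the dumped log's filename is the first argument.
    """
    if len(argv) < 2:
        return "usage: upcall <dumped-log-file>"
    log = Path(argv[1])
    # Report a size of 0 if the file has already been moved or removed.
    size = log.stat().st_size if log.exists() else 0
    return f"lustre log dumped: {log} ({size} bytes)"


if __name__ == "__main__":
    print(handle_dump(sys.argv))
```

A real script would more likely archive or compress the file than print a summary; the point is only that the script receives the path and can act on it.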
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16583 16583]'''
 +
Severity: minor<br>
 +
Frequency: rare<br>
 +
Description: avoid messages about idr_remove called for id that is not allocated<br>
 +
Details: Move the s_dev assignment for clustered NFS to the end of initialization to avoid problems with error handling.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16109 16109]'''
 +
Severity: minor<br>
 +
Frequency: rare<br>
 +
Description: avoid Already found the key in hash [CONN_UNUSED_HASH] messages<br>
 +
Details: When a connection is reused it is not moved from CONN_UNUSED_HASH into CONN_USED_HASH, which produces a warning when the connection is put in the unused hash again.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15139 15139]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: avoid ASSERTION(client_stat->nid_exp_ref_count == 0) failed<br>
 +
Details: release the reference to stats when the client disconnects, not when the export is destroyed, to avoid races when the client is destroyed after the main OST export.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16679 16679]'''
 +
Severity: normal<br>
 +
Description: more cleanup in mds_lov<br>
 +
Details: add a workaround to get a valid OST count to avoid warnings about dropping too-big messages; do not init llog cat under a semaphore, which can block on reconnect and break normal replay; fix access to a wrong pointer.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16573 16573]'''
 +
Severity: enhancement<br>
 +
Description: Export bytes_read/bytes_write count on OSC/OST.<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16237 16237]'''
 +
Severity: normal<br>
 +
Description: Early reply size mismatch, MGC loses connection<br>
 +
Details: Apply the MGS_CONNECT_SUPPORTED mask at reconnect time so the connect flags are properly negotiated.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16006 16006]'''
 +
Severity: normal<br>
 +
Description: Properly propagate oinfo flags from lov to osc for statfs<br>
 +
Details: restore the missing copy of oi_flags to lov requests.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16317 16317]'''
 +
Severity: normal<br>
 +
Description: exports in /proc are broken<br>
 +
Details: recreate /proc entries for clients when they reconnect.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16581 16581]'''
 +
Severity: enhancement<br>
 +
Description: Add man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8)<br>
 +
Details: included man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8)
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16208 16208]'''
 +
Severity: enhancement<br>
 +
Description: Implement lustre ll_show_options method.<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16080 16080]'''
 +
Severity: normal<br>
 +
Description: don't fail open with -ERANGE<br>
 +
Details: if a client connects before the MDS knows the real OST count, getting the LOV EA can fail because the MDS does not allocate a large enough buffer for the LOV EA.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15576 15576]'''
 +
Severity: normal<br>
 +
Description: Resolve device initialization race<br>
 +
Details: Prevent the proc handler from accessing devices that have been added to the obd_devs array but are not yet initialized.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16091 16091]'''
 +
Severity: enhancement<br>
 +
Description: configure's --enable-quota should check the kernel .config for CONFIG_QUOTA<br>
 +
Details: configure is terminated if --enable-quota is passed but no quota support is in kernel
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16318 16318]'''
 +
Severity: normal<br>
 +
Frequency: rare, on PPC clients<br>
 +
Description: don't swab OST objects in a response about a directory, because they do not exist.<br>
 +
Details: similar to bug 14856, but in a different function.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15754 15754]'''
 +
Severity: enhancement<br>
 +
Description: lfs quota tool enhancement<br>
 +
Details: added units specifiers support for setquota, default to current uid/gid for quota report, short quota stats by default, nonpositional parameters for setquota, added llapi_quotactl manual page.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15625 15625]'''
 +
Severity: enhancement<br>
 +
Description: *optional* service tags registration<br>
 +
Details: if the "service tags" package is installed on a Lustre node, a local-node service tag will be created when the filesystem is mounted.  See http://inventory.sun.com/ for more information about the Service Tags asset management system.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16037 16037]'''
 +
Severity: normal<br>
 +
Description: Client runs out of low memory<br>
 +
Details: Consider only lowmem when counting initial number of llap pages
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15210 15210]'''
 +
Severity: normal<br>
 +
Frequency: occasional<br>
 +
Description: add refcount for osc callbacks, so avoid panic on shutdown<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=12653 12653]'''
 +
Severity: normal<br>
 +
Frequency: testing only<br>
 +
Description: sanity test 65a fails if stripecount of -1 is set<br>
 +
Details: handle -1 striping on filesystem in ll_dirstripe_verify
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16014 16014]'''
 +
Severity: normal<br>
 +
Frequency: only in unusual configurations<br>
 +
Description: Kernel panic with find ost index.<br>
 +
Details: lov_obd panics if some OSTs have sparse indexes.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15924 15924]'''
 +
Severity: major<br>
 +
Frequency: rarely, if filesystem is mounted with -o flock<br>
 +
Description: do not process already freed flock<br>
 +
Details: flock can possibly be freed by another thread before it reaches ldlm_flock_completion_ast.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14480 14480]'''
 +
Severity: normal<br>
 +
Frequency: rarely, if filesystem is mounted with -o flock<br>
 +
Description: LBUG during stress test<br>
 +
Details: Properly lock accesses to the flock deadlock detection list.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15837 15837]'''
 +
Severity: minor<br>
 +
Frequency: rarely, if binaries are being run from Lustre<br>
 +
Description: oops in page fault handler<br>
 +
Details: the kernel page fault handler can return two special 'pages' in the error case; don't try to dereference NOPAGE_SIGBUS and NOPAGE_OOM.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15716 15716]'''
 +
Severity: minor<br>
 +
Frequency: rarely, during shutdown<br>
 +
Description: timeout with invalidate import.<br>
 +
Details: ptlrpcd_check calls obd_zombie_impexp_cull and waits for a request that should be handled by ptlrpcd. This produces a long wait and -ETIMEOUT in ptlrpc_invalidate_import, and as a result a LASSERT.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14742 14742]'''
 +
Severity: normal<br>
 +
Frequency: rarely<br>
 +
Description: ASSERTION(CheckWriteback(page,cmd)) failed<br>
 +
Details: incorrectly clearing the PG_writeback bit in ll_ap_completion can produce a false positive assertion.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15779 15779]'''
 +
Severity: normal<br>
 +
Frequency: only with broken builds/installations<br>
 +
Description: no LBUG if lquota.ko and fsfilt_ldiskfs.ko are different versions<br>
 +
Details: just return an error to the user and print a console error message.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14134 14134]'''
 +
Severity: enhancement<br>
 +
Description: enable MGS and MDT services start separately<br>
 +
Details: add a 'nomgs' option in mount.lustre to enable starting an MDT with a co-located MGS without starting the MGS, which complements the 'nosvc' mount option.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14856 14856]'''
 +
Severity: normal<br>
 +
Frequency: always, on big-endian systems<br>
 +
Description: cleanup in ptlrpc code, related to PPC platform<br>
 +
Details: store the magic in native order to avoid panics in recovery on PPC nodes and prevent this error in the future. Also fix the possibility of swabbing data twice. Fix getting lov striping to userspace.
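
Storing a magic number in the sender's native byte order lets the receiver work out whether the rest of the message needs byte swapping. The Python sketch below illustrates that general technique, not the ptlrpc code: `MAGIC` is a made-up constant, and `swab32` and `needs_swab` are hypothetical helpers.

```python
MAGIC = 0x11223344  # made-up magic value, not a Lustre constant


def swab32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def needs_swab(magic_as_read):
    """Decide whether a message came from a host of opposite endianness.

    magic_as_read is the magic field interpreted in the receiver's
    native byte order.
    """
    if magic_as_read == MAGIC:
        return False  # same endianness, use fields as they are
    if magic_as_read == swab32(MAGIC):
        return True  # opposite endianness, swab each field exactly once
    raise ValueError(f"bad magic {magic_as_read:#x}")
```

Making this decision once per message and recording it is one way to avoid swabbing the same data twice.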
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15756 15756]'''
 +
Severity: normal<br>
 +
Frequency: rarely, if a replay gets lost on the server<br>
 +
Description: server incorrectly drops resent replays, leading to recovery failure.<br>
 +
Details: do not drop replays according to msg flags; instead check the per-export recovery request queue for a duplicate transno.
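
The duplicate check described here can be sketched as a per-export set of queued transaction numbers. This Python example is illustrative only: `ExportReplayQueue` and its methods are hypothetical names, and the real queue holds requests rather than bare integers.

```python
class ExportReplayQueue:
    """Tracks replay requests queued for one export, keyed by transno."""

    def __init__(self):
        self._queued = set()

    def enqueue(self, transno):
        """Queue a replay; return False if this transno is already queued.

        The decision depends only on what is in the queue, not on any
        "resent" flag carried by the message.
        """
        if transno in self._queued:
            return False
        self._queued.add(transno)
        return True
```

A resent replay whose original was lost is accepted because its transno is not in the queue, while a true duplicate is rejected.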
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14835 14835]'''
 +
Severity: normal<br>
 +
Frequency: after recovery<br>
 +
Description: precreates too many objects after del orphan.<br>
 +
Details: del orphan sets last_id == next_id in oscc, and this triggers growth in the count of precreated objects. Set the LOW flag to skip increasing the count of precreated objects.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15139 15139]'''
 +
Severity: normal<br>
 +
Frequency: rare, on clear nid stats<br>
 +
Description: ASSERTION(client_stat->nid_exp_ref_count == 0)<br>
 +
Details: clearing nid stats sometimes tries to destroy a live entry, which produces a panic in free.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15575 15575]'''
 +
Severity: major<br>
 +
Frequency: occasionally since 1.6.4<br>
 +
Description: Stack overflow during MDS log replay<br>
 +
Details: ease stack pressure by using a separate thread to run llog_process.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=13380 13380]'''
 +
Severity: minor<br>
 +
Frequency: very rare<br>
 +
Description: MDT cannot be unmounted, reporting "Mount still busy"<br>
 +
Details: Mountpoint references were being leaked during open reply reconstruction after an MDS restart.  Drop mountpoint reference in reconstruct_open() and free dentry reference also.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15443 15443]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: wait until IO has finished before starting new IO during lock cancel.<br>
 +
Details: the VM protocol wants old IO to finish before new IO starts; in this case wait until PG_writeback is cleared before checking the dirty flag and calling writepages in the lock cancel callback.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=12888 12888]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: mds_mfd_close() ASSERTION(rc == 0)<br>
 +
Details: In mds_mfd_close(), we need to protect the inode's writecount change within its orphan write semaphore to prevent possible races.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14645 14645]'''
 +
Severity: minor<br>
 +
Frequency: rare, on shutdown ost<br>
 +
Description: don't hit a livelock when unmounting an OST.<br>
 +
Details: shrink_dcache_parent can spend a long time looping while destroying dentries; use shrink_dcache_sb instead.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14949 14949]'''
 +
Severity: minor<br>
 +
Frequency: only when echo_client is used<br>
 +
Description: don't panic when using echo_client<br>
 +
Details: echo client passes NULL as the client nid pointer, which produces a NULL pointer dereference.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15278 15278]'''
 +
Severity: normal<br>
 +
Frequency: Always on 32-bit PowerPC systems<br>
 +
Description: fix build on PPC32<br>
 +
Details: compiling code with the -m64 flag produces a wrong object file for PPC32.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15574 15574]'''
 +
Severity: normal<br>
 +
Frequency: rare<br>
 +
Description: MDS LBUG: ASSERTION(!IS_ERR(dchild))<br>
 +
Details: In reconstruct_* functions, LASSERTs on both the data supplied by a client and the data on disk are dangerous and incorrect. Replace them with client eviction.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15346 15346]'''
 +
Severity: enhancement<br>
 +
Description: skiplist implementation simplification<br>
 +
Details: skiplists are used to group compatible locks on the granted list. This was implemented by tracking the first and last lock of each lock group; the patch changes that to use doubly linked lists.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15933 15933]'''
 +
Severity: normal<br>
 +
Description: delete compatibility for 32bit qdata<br>
 +
Details: as planned, when lustre is beyond b1_8, lquota won't support 32bit qunit. That means servers of b1_4 and servers of b1_8 can't be used together if users want to use quota.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14693 14693]'''
 +
Severity: normal<br>
 +
Frequency: only with administrator action<br>
 +
Description: mount failure if config log has invalid conf_param setting<br>
 +
Details: If administrator specified an incorrect configuration parameter with "lctl conf_param" this would cause an error during future client mounts.  Instead, ignore the bad configuration parameter.
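
The "ignore the bad parameter" behavior can be sketched as a loop that records failures instead of aborting. This is a hypothetical Python illustration, not the config log processing code: `apply_conf_params`, the `handlers` mapping, and the parameter names in the usage note are all made up.

```python
def apply_conf_params(records, handlers):
    """Apply (name, value) records, skipping any that cannot be applied.

    handlers maps a parameter name to a callable that applies its value
    and raises ValueError on a bad value. Returns the names that were
    skipped, so the caller can log them and carry on with the mount.
    """
    skipped = []
    for name, value in records:
        handler = handlers.get(name)
        if handler is None:
            skipped.append(name)  # unknown parameter: ignore it
            continue
        try:
            handler(value)
        except ValueError:
            skipped.append(name)  # invalid value: ignore it
    return skipped
```

For example, with a single handler registered for a made-up `timeout` parameter, a record list containing `("bogus", "1")` returns `["bogus"]` and still applies every valid record.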
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15932 15932]'''
 +
Severity: normal<br>
 +
Frequency: blocks per group < blocksize*8 and uninit_groups is enabled<br>
 +
Description: ldiskfs error: XXX blocks in bitmap, YYY in gd<br>
 +
Details: If blocks per group is less than blocksize*8, set rest of the bitmap to 1.
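
A block bitmap occupies one block, so it has blocksize*8 bits even when the group holds fewer blocks. The Python sketch below shows the padding step described above, marking the unused tail as allocated. It is not ldiskfs code: `pad_block_bitmap` is a hypothetical name, and the bit layout (bit i in byte i // 8 at position i % 8) is an assumption.

```python
def pad_block_bitmap(bitmap, blocks_per_group):
    """Set every bit at index >= blocks_per_group to 1, in place.

    bitmap is a bytearray of blocksize bytes. Bit i is assumed to live
    in byte i // 8 at bit position i % 8.
    """
    total_bits = len(bitmap) * 8
    for bit in range(blocks_per_group, total_bits):
        bitmap[bit // 8] |= 1 << (bit % 8)
```

With the tail bits set, a count of free bits in the bitmap no longer includes positions that do not correspond to real blocks.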
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16172 16172]'''
 +
Severity: major<br>
 +
Frequency: when an application does stride reads on Lustre<br>
 +
Description: The read performance will drop a lot if the application does stride read.<br>
 +
Details: Because stride_start_offset is missing in stride read-ahead, clients read a lot of unused pages in read-ahead, and read performance drops.
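
To see why the start offset matters, consider a reader that touches a fixed number of pages at regular intervals. The following Python sketch is a simplified model and not the Lustre read-ahead code: `stride_pages` and all parameter names except the idea of a stride start offset are hypothetical.

```python
def stride_pages(window_start, window_end, stride_start, stride_length, stride_period):
    """List the page indexes in [window_start, window_end) hit by a stride read.

    The pattern begins at page stride_start, reads stride_length pages,
    and repeats every stride_period pages.
    """
    return [
        page
        for page in range(window_start, window_end)
        if page >= stride_start
        and (page - stride_start) % stride_period < stride_length
    ]
```

For a pattern that starts at page 2, reads 2 pages, and repeats every 5, the pages needed in a 12-page window are 2, 3, 7 and 8. Computing the same window as if the pattern started at page 0 selects 0, 1, 5, 6, 10 and 11, none of which the application will read.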
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15953 15953]'''
 +
Severity: normal<br>
 +
Description: more ldlm soft lockups<br>
 +
Details: In ldlm_resource_add_lock(), the call to ldlm_resource_dump() starves other threads of the resource lock for a long time when the waiting queue is long, so change the debug level from D_OTHER to the less frequently used D_INFO.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=13128 13128]'''
 +
Severity: enhancement<br>
 +
Description: add -gid, -group, -uid, -user options to lfs find<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15284 15284]'''
 +
Severity: enhancement<br>
 +
Description: ll_recover_lost_found_objs - recover objects in lost+found<br>
 +
Details: OST corruption and subsequent e2fsck can leave objects in the lost+found directory.  Using the "ll_recover_lost_found_objs" tool, these objects can be retrieved and data can be salvaged by using the object ID saved in the fid EA on each object.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15758 15758]'''
 +
Severity: minor<br>
 +
Frequency: rare<br>
 +
Description: this bug _only_ happens when inode quota limitation is very low (less than 12), so that inode quota unit is 1 at initialization.<br>
 +
Details: if the remaining quota equals 1, it is a sign that quota is now in effect. So the minimum quota qunit should be 2.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15950 15950]'''
 +
Severity: normal<br>
 +
Description: Hung threads in invalidate_inode_pages2_range<br>
 +
Details: The direct IO path doesn't call check_rpcs to submit a new RPC once one is completed. As a result, some RPCs are stuck in the queue  and are never sent.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15684 15684]'''
 +
Severity: normal<br>
 +
Description: Procfs and llog threads sometimes access a destroyed import.<br>
 +
Details: Synchronize import destruction with procfs and llog threads using the import refcount and a semaphore.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15674 15674]'''
 +
Severity: major<br>
 +
Description: mds fails to respond, threads stuck in ldlm_completion_ast<br>
 +
Details: Sort source/child resource pair after updating child resource.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16226 16226]'''
 +
Severity: major<br>
 +
Frequency: rare<br>
 +
Description: kernel BUG at ldiskfs2_ext_new_extent_cb<br>
 +
Details: If insertion of an extent fails, discard the inode preallocation and free the data blocks; otherwise it can lead to duplicate blocks.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16199 16199]'''
 +
Severity: normal<br>
 +
Description: don't always update ctime in ext3_xattr_set_handle()<br>
 +
Details: The current xattr code always updates the inode ctime in ext3_xattr_set_handle(). In some cases the ctime should not be updated; for example, 2.0->1.8 compatibility requires deleting an xattr without updating the ctime.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15058 15058]'''
 +
Severity: normal<br>
 +
Description: add quota statistics<br>
 +
Details: 1. sort out quota proc entries and proc code. 2. add quota statistics
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16125 16125]'''
 +
Severity: normal<br>
 +
Frequency: often<br>
 +
Description: quotas are not honored with O_DIRECT<br>
 +
Details: all writes with the O_DIRECT flag use grants, which leads to this problem. OBD_BRW_SYNC is now used to guard against this.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=15713 15713]'''
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16362 16362]'''
 +
Severity: major<br>
 +
Frequency: rare<br>
 +
Description: Assertion in iopen_connect_dentry in 1.6.3<br>
 +
Details: looking up an inode via iopen with the wrong generation number can populate the dcache with a disconnected dentry while the inode number is in the process of being reallocated. This causes an assertion failure in iopen since the inode's dentry list contains both a connected and disconnected dentry.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16496 16496]'''
 +
Severity: normal<br>
 +
Description: assertion failure in ldlm_handle2lock()<br>
 +
Details: fix a race between class_handle_unhash() and class_handle2object() introduced in lustre 1.6.5 by bug 13622.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=11817 11817]'''
 +
Severity: enhancement<br>
 +
Description: superblock lock contention with many SMP cores on one client<br>
 +
Details: several client filesystem locks were highly contended on SMP NUMA systems with 8 or more cores.  Per-CPU data structures and more efficient locking were implemented to reduce contention.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=12755 12755]'''
 +
Severity: minor<br>
 +
Frequency: rare<br>
 +
Description: Kernel BUG: sd_iostats_bump: unexpected disk index<br>
 +
Details: remove the limit of 256 scsi disks in the sd_iostat patch
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16494 16494]'''
 +
Severity: minor<br>
 +
Frequency: rare<br>
 +
Description: oops in sd_iostats_seq_show()<br>
 +
Details: unloading/reloading the scsi low level driver triggers a kernel bug when trying to access the sd iostat file.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16404 16404]'''
 +
Severity: major<br>
 +
Frequency: rare<br>
 +
Description: Kernel panics during QLogic driver reload<br>
 +
Details: REQ_BLOCK_PC requests are not handled properly in the sd iostat patch, causing memory corruption.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16140 16140]'''
 +
Severity: minor<br>
 +
Frequency: rare<br>
 +
Description: journal_dev option does not work in b1_6<br>
 +
Details: pass mount option during pre-mount.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=10555 10555]'''
 +
Severity: enhancement<br>
 +
Frequency: <br>
 +
Description: Add a FIEMAP (FIle Extent MAP) ioctl for ldiskfs<br>
 +
Details: FIEMAP ioctl will allow an application to efficiently fetch the extent information of a file. It can be used to map logical blocks in a file to physical blocks in the block device.
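
As a rough illustration of the logical-to-physical mapping described above, the sketch below resolves a file offset to a device offset from a list of extents. It does not call the ioctl; the `Extent` record, the `logical_to_physical` helper and the sample values are all hypothetical and only stand in for the kind of data an extent map returns.

```python
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    """Hypothetical extent record: one contiguous run of bytes."""
    logical: int   # byte offset within the file
    physical: int  # byte offset on the block device
    length: int    # length of the run in bytes


def logical_to_physical(extents: Sequence[Extent], offset: int) -> Optional[int]:
    """Return the device offset backing a file offset, or None for a hole."""
    for ext in extents:
        if ext.logical <= offset < ext.logical + ext.length:
            return ext.physical + (offset - ext.logical)
    return None


# Illustrative data only: two extents with a hole between them.
example_extents = [
    Extent(logical=0, physical=1_048_576, length=8192),
    Extent(logical=16384, physical=4_194_304, length=4096),
]
```

An offset that falls between the two sample extents is reported as a hole rather than mapped to a device location.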
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16972 16972]'''
 +
Severity: normal<br>
 +
Frequency: only with adaptive timeout enabled<br>
 +
Description: DEBUG_REQ() bad paging request<br>
 +
Details: ptlrpc_at_recv_early_reply() should not modify req->rq_repmsg because it can be accessed by reply_in_callback() without the rq_lock held.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16813 16813]'''
 +
Severity: normal<br>
 +
Frequency: only on Cray X2<br>
 +
Description: X2 build failures<br>
 +
Details: fix build failures on Cray X2.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=2066 2066]'''
 +
Severity: normal<br>
 +
Description: xid & resent requests<br>
 +
Details: Initialize RPC XID from clock at startup (randomly if clock is bad).
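
A minimal sketch of the idea, assuming a starting XID derived from the wall clock is meant to keep XIDs issued after a restart ahead of those issued before it. The cutoff date, the shift width and the function name are assumptions made for illustration; the change log gives none of them.

```python
import os
import time

# Assumed values for illustration only; not taken from the Lustre source.
CLOCK_SANITY_EPOCH = 1_072_915_200  # 2004-01-01 UTC, hypothetical "clock is bad" cutoff
XID_SHIFT = 20                       # hypothetical room for XIDs issued within one second


def initial_xid(now=None, rand_bytes=os.urandom):
    """Pick a starting XID from the clock, or randomly if the clock looks unset."""
    if now is None:
        now = int(time.time())
    if now < CLOCK_SANITY_EPOCH:
        # Clock is implausibly old: fall back to a random 64-bit starting point.
        return int.from_bytes(rand_bytes(8), "little")
    # Clock looks sane: later startups yield strictly larger starting XIDs.
    return (now << XID_SHIFT) & 0xFFFFFFFFFFFFFFFF
```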
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14840 14840]'''
 +
Severity: major<br>
 +
Description: quota recovery deadlock during mds failover<br>
 +
Details: This patch includes att18982, att18236 and att18237 from bz14840. It solves two problems: 1. OSTs hang when the MDS fails over with quotaon; 2. a watchdog storm occurs while OST threads wait for MDS recovery.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16695 16695]'''
 +
Severity: normal<br>
 +
Description: kernel panic on racer<br>
 +
Details: Do not access dchild->d_inode when IS_ERR(dchild) is true.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=14095 14095]'''
 +
Severity: enhancement<br>
 +
Description: Add lustre_start utility to start or stop multiple Lustre servers from a CSV file.<br>
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17024 17024]'''
 +
Severity: major<br>
 +
Description: Lustre GPF in {:ptlrpc:ptlrpc_server_free_request+373}<br>
 +
Details: In case of memory pressure, list_del() can be called twice on req->rq_history_list, causing a kernel oops.
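
Why a repeated list_del() is harmful can be shown with a small model of a circular doubly linked list. This is a sketch in Python, not the kernel code: `ListHead`, `list_del` and `list_del_init` only mimic the shape of the kernel helpers, and the change log does not say which approach the actual fix took. In the kernel, list_del() poisons the entry's pointers, so a second call faults instead of silently corrupting the list as this model does.

```python
class ListHead:
    """Minimal circular doubly linked list node, loosely modelled on list_head."""

    def __init__(self, name=""):
        self.name = name
        self.prev = self
        self.next = self


def list_add_tail(new, head):
    """Insert new just before head, i.e. at the tail of the list."""
    new.prev, new.next = head.prev, head
    head.prev.next = new
    head.prev = new


def list_del(entry):
    """Unlink using the entry's own pointers, which may be stale on a repeat call."""
    entry.prev.next = entry.next
    entry.next.prev = entry.prev


def list_del_init(entry):
    """Unlink and reset the entry to an empty list, so a repeat call is harmless."""
    list_del(entry)
    entry.prev = entry.next = entry


def names(head):
    """Walk the list from head and collect the node names."""
    out, node = [], head.next
    while node is not head:
        out.append(node.name)
        node = node.next
    return out


def demo(delete):
    """Remove b, then c, then b a second time, and report what the list holds."""
    head, a, b, c = ListHead("head"), ListHead("a"), ListHead("b"), ListHead("c")
    for node in (a, b, c):
        list_add_tail(node, head)
    delete(b)  # first removal of b
    delete(c)  # legitimate removal of c
    delete(b)  # second removal of b, through stale pointers
    return names(head)
```

With the plain delete, the second removal of `b` relinks the already removed `c` back into the list; with the re-initialising variant the second call is a no-op.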
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17026 17026]'''
 +
Severity: normal<br>
 +
Description: kptllnd_peer_check_sends()) ASSERTION(!in_interrupt()) failed<br>
 +
Details: fix stack overflow in the distributed lock manager by deferring export eviction after a failed ast to the elt thread instead of handling it in the dlm interpret routine.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=12800 12800]'''
 +
Severity: enhancement<br>
 +
Description: More exported tunables for mballoc<br>
 +
Details: Add support for tunable preallocation window and new tunables for large/small requests
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16680 16680]'''
 +
Severity: normal<br>
 +
Description: Detect corruption of block bitmap and checking for preallocations<br>
 +
Details: Checks the validity of the on-disk block bitmap and does better checking of the number of applied preallocations. When corruption is found, the filesystem is turned read-only to prevent further corruption.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16438 16438]'''
 +
Severity: normal<br>
 +
Frequency: only for big-endian servers<br>
 +
Description: Check if big-endian system while mounting fs with extents feature<br>
 +
Details: Mounting a filesystem with extents feature will fail on big-endian systems since ext3-based ldiskfs is not supported on big-endian systems.  Can be overridden with "bigendian_extents" mount option.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16860 16860]'''
 +
Severity: normal<br>
 +
Description: Excessive recovery window<br>
 +
Details: With AT enabled, the recovery window can be excessively long (6000+ seconds). To address this problem, we no longer use OBD_RECOVERY_FACTOR when extending the recovery window (the connect timeout no longer depends on the service time, it is set to INITIAL_CONNECT_TIMEOUT now) and clients report the old service time via pb_service_time.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16522 16522]'''
 +
Severity: normal<br>
 +
Description: Watchdog triggered on MDS failover<br>
 +
Details: enable OBD_CONNECT_MDT flag when connecting from the MDS so that the OSTs know that the MDS "UUID" can be reused for the same export from a different NID, so we do not need to wait for the export to be evicted.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=16919 16919]'''
 +
Severity: enhancement<br>
 +
Description: Don't sync journal after every i/o<br>
 +
Details: Implement write RPC replay to allow server replies for write RPCs before data is on disk. However, this feature is disabled by default since some issues leading to data corruptions have been found during recovery (e.g. bug 19128). This feature can be enabled by running the following command on the OSSs: lctl set_param obdfilter.*.sync_journal=0
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18016 18016]'''
 +
Severity: low<br>
 +
Description: Slow reads beyond 8Tb offsets.<br>
 +
Details: Page index integer overflow in ll_read_ahead_page
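
The arithmetic behind the 8Tb boundary can be checked with a short sketch. It assumes 4 KiB pages and a signed 32-bit page index; the change log states neither, but under those assumptions the first page index that no longer fits is the one at exactly 8 TiB.

```python
import ctypes

PAGE_SHIFT = 12  # assumption: 4 KiB pages
EIGHT_TIB = 8 * 2**40


def page_index_32(offset):
    """Page index as it would land in a signed 32-bit integer (wraps silently)."""
    return ctypes.c_int32(offset >> PAGE_SHIFT).value


def page_index_64(offset):
    """Page index kept in an unsigned 64-bit integer."""
    return ctypes.c_uint64(offset >> PAGE_SHIFT).value
```

Just below 8 TiB the 32-bit index is still the largest positive value; at 8 TiB it wraps to a negative number, while the 64-bit index stays correct.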
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17895 17895]'''
 +
Severity: major<br>
 +
Frequency: rare, only if using MMP with Linux RAID<br>
 +
Description: MMP doesn't work with Linux RAID<br>
 +
Details: While using HA for Lustre servers with Linux RAID, it is possible that MMP will not detect multiple mounts. To make this work we need to unplug the device queue in RAID when the MMP block is being written. Also while reading the MMP block, we should read it from disk and not the cached one.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=17895 17895]'''
 +
Severity: minor<br>
 +
Frequency: rare, during recovery<br>
 +
Description: Assertion failure in ldlm_lock_put<br>
 +
Details: Do not put cancelled locks into replay list, hold references on locks in replay list
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=18695 18695]'''
 +
Severity: critical<br>
 +
Description: Lustre detected file system corruption with inode out of bounds<br>
 +
Details: don't update i_size on MDS_CLOSE for directories. Updating it causes directory corruption on the MDT.
 +
 
 +
*'''Bugzilla: [https://bugzilla.lustre.org/show_bug.cgi?id=19223 19223]'''
 +
Severity: normal<br>
 +
Description: client doesn't try to reconnect<br>
 +
Details: correctly skip time estimate if in recovery

Revision as of 05:52, 6 May 2009

Changes from v1.6.7.1 to v1.8.0

Support for networks:
socklnd - any kernel supported by Lustre
qswlnd - Qsnet kernel modules 5.20 and later
openiblnd - IbGold 1.8.2
o2iblnd - OFED 1.1, 1.2.0, 1.2.5, and 1.3.
viblnd - Voltaire ibhost 3.4.5 and later
ciblnd - Topspin 3.2.0
iiblnd - Infiniserv 3.3 + PathBits patch
gmlnd - GM 2.1.22 and later
mxlnd - MX 1.2.1 or later
ptllnd - Portals 3.3 / UNICOS/lc 1.5.x, 2.0.x

Support for kernels:
2.6.16.60-0.31 (SLES 10)
2.6.18-92.1.17.el5 (RHEL 5)
2.6.22.14 vanilla (kernel.org)

Client support for unpatched kernels: (see Patchless_Client)
2.6.16 - 2.6.22 vanilla (kernel.org)

Recommended e2fsprogs version: 1.40.11-sun1

File join has been disabled in this release, refer to bugzilla 16929

A new Lustre ADIO driver is available for MPICH2-1.0.7.

NFS export is disabled when the stack size is < 8192, since the NFSv4 export of a Lustre filesystem with a 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630

Severity: minor
Description: minor fixes and cleanups
Details: use EXT_UNSET_BLOCK to avoid confusion with EXT_MAX_BLOCK. Initialize 'ix' variable in extents patch to stop compiler warning.

Severity: feature
Description: update FIEMAP ioctl to match upstream kernel version
Details: the FIEMAP block-mapping ioctl had a prototype version in ldiskfs 3.0.7 but this release updates it to match the interface in the upstream kernel, with a new ioctl number.

Severity: normal
Frequency: only if MMP is active and detects filesystem is in use
Description: if MMP startup fails, an oops is triggered
Details: if ldiskfs mounting doesn't succeed the error handling doesn't clean up the MMP data correctly, causing an oops.

Severity: enhancement
Description: Caching OSS
Details: introduce data caching on the OSS. The OSS now relies on the linux kernel page cache to keep recently accessed data in memory. It is worth noting that all write requests are still flushed synchronously as in lustre 1.6.

Severity: enhancement
Description: version based recovery
Details: introduce finer-grained recovery that can detect transaction dependencies and deal with transaction gaps caused by clients failing at the same time as the server.

Severity: enhancement
Description: Enable adaptive timeouts by default
Details: The Lustre timeout value in /proc/sys/lustre/timeout is now managed dynamically based on server load and should not need to be tuned manually based on cluster size. This allows Lustre to work under a wider variety of system sizes and loads, without unnecessarily causing lengthy recovery times.

Severity: enhancement
Description: Add OST Pools support
Details: File striping can now be set to use an arbitrary pool of OSTs

Severity: enhancement
Description: add lazystatfs mount option to allow statfs(2) to skip down OSTs
Details: allow statfs requests to skip disconnected OSTs, and hide the resulting error in this case.

Severity: normal
Frequency: rare, on llog test 6
Description: don't allow connect to already connected import
Details: allowing a connect to an already connected import hides connection problems.

Severity: normal
Frequency: rare, connect and disconnect target at same time
Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0
Details: don't call obd_disconnect under lov_lock. It is a long-running operation and can block ptlrpcd, which answers connect requests.

Severity: normal
Frequency: rare, on failed llog setup
Description: don't leak obd reference on failed llog setup
Details: on failed llog setup, mgc forgets to call class_destroy_import for the client import; move the import destruction to a more generic place.

Severity: normal
Frequency: rare
Description: allow killing a process that is waiting for a statahead result
Details: for some reason 'ls' can get stuck waiting for a result from statahead; in this case there needs to be a way to kill the process.

Severity: normal
Frequency: rare
Description: don't lose wakeup for imp_recovery_waitq
Details: recover_import_no_retry or invalidate_import and import_close can both sleep on imp_recovery_waitq, but only one wakeup was sent to the wait queue.

Severity: normal
Frequency: rare, at shutdown
Description: panic at umount
Details: llap_shrinker can race with removal of the super block from the list, which produces a panic on access to an already freed pointer.

Severity: normal
Frequency: rare
Description: panic in mds_open
Details: don't confuse mds_finish_transno() with PTR_ERR(-ENOENT)

Severity: normal
Frequency: rare
Description: stuck in cache_remove_extent() or panic when accessing an already freed lock.
Details: release the lock reference only after adding the page to the pages list.

Severity: normal
Frequency: when starting the MDS on an MDS device that was not shut down cleanly
Description: ll_sync thread stays waiting for mds<>ost recovery to finish
Details: waiting for mds<>ost recovery to finish produces random bugs due to a race between two ll_sync threads for one lov target. Send the ACTIVATE event only if the connect has really finished and the import is in the FULL state.

Severity: normal
Frequency: always with long access acl
Description: mds can't pack a reply with a long acl.
Details: the mds does not control the size of acls, but they are limited by the reint/getattr reply buffer.

Severity: normal
Frequency: when starting the MDS on an MDS device that was not shut down cleanly
Description: aborting recovery hangs on MDS
Details: don't throttle destroy RPCs for the MDT.

Severity: major
Frequency: on remount
Description: external journal device not working after the remount
Details: clear dev_rdonly flag for external journal devices in blkdev_put()

Severity: minor
Frequency: rare
Description: shutdown vs evict race
Details: client_disconnect_export vs connect request race. If the client is evicted at this time, the invalidate thread is started without a reference to the import, and the import can be freed at the same time.

Severity: minor
Frequency: always
Description: shrink LOV EAs before replying
Details: correctly adjust LOV EA buffer for reply.

Severity: normal
Frequency: rare
Description: don't skip ost targets if they are assigned to a file
Details: Drop slow OSCs if we can, but not for the requested start idx. This means "if an OSC is slow and it is not the requested start OST, then it can be skipped; otherwise skip it only if it is inactive/recovering/out-of-space".

Severity: enhancement
Description: Update to RHEL5 kernel-2.6.18-92.1.17.el5.

Severity: enhancement
Description: Update to SLES10 SP2 kernel-2.6.16.60-0.31.

Severity: normal
Frequency: rare, needs acls on the inode.
Description: client can't handle ost addition correctly
Details: if an ost was added after the client connected to the mds, the client can hit lnet_try_match_md ... too big messages for widely striped files. The client needs to handle config events about adding a lov target and update its max ea size at that event.

Severity: normal
Frequency: Create a symlink file with a very long name
Description: ldlm_cancel_pack()) ASSERTION(max >= dlm->lock_count + count)
Details: If there is no extra space in the request for early cancels, ldlm_req_handles_avail() returns 0 instead of a negative value.

Severity: major
Frequency: rare
Description: mds is deadlocked
Details: in rare cases, an inode in a directory can have an inode number lower than its parent's. This produces the wrong lock order during open, and a parallel unlink can deadlock with the open. Teach mds_open to take locks in resource id order, not in parent -> child order.
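
The ordering rule can be sketched as follows. This is an illustration with Python threading locks, not the MDS code; `Resource`, `lock_order` and `with_both_locked` are hypothetical names, and the sketch assumes two distinct resources. Taking locks in ascending id order means two threads that need the same pair always acquire them in the same sequence, whichever one is the parent.

```python
import threading


class Resource:
    """Hypothetical lockable resource identified by an integer id."""

    def __init__(self, res_id):
        self.res_id = res_id
        self.lock = threading.Lock()


def lock_order(parent, child):
    """Return the two resources in ascending id order, regardless of their roles."""
    return tuple(sorted((parent, child), key=lambda r: r.res_id))


def with_both_locked(parent, child, action):
    """Run action(parent, child) with both locks held, acquired in id order."""
    first, second = lock_order(parent, child)
    with first.lock:
        with second.lock:
            return action(parent, child)
```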

Severity: enhancement
Description: Add /proc entry for import status
Details: The mdc, osc, and mgc import directories now have an import directory that contains useful import data for debugging connection problems.

Severity: enhancement
Description: Re-disable certain /proc logging
Details: Enable and disable client's offset_stats, extents_stats and extents_stats_per_process stats logging on the fly.

Severity: major
Frequency: Only on FC kernels 2.6.22+
Description: oops in statahead
Details: Do not drop the reference count on the dentry passed in from the VFS during lookup; the VFS does that itself.

Severity: enhancement
Description: Generic /proc file permissions
Details: Set /proc file permissions in a more generic way so that non-root users can operate on some /proc files.

Severity: major
Description: Hitting mdc_commit_close() ASSERTION
Details: Properly handle request reference release in ll_release_openhandle().

Severity: normal
Description: only patchless client
Details: add workaround for race between add/remove dentry from hash

Severity: enhancement
Description: Allow OST glimpses to return PW locks

Severity: minor
Description: LBUG when llog conf file is full
Details: When llog bitmap is full, ENOSPC should be returned for plain log.

Severity: normal
Description: Prevent import from entering FULL state when server in recovery

Severity: major
Description: service mount cannot take device name with ":"
Details: Only when device name contains ":/" will mount treat it as client mount.
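
The rule amounts to a substring test, sketched below. The function name and both sample device strings are made up for illustration; the real check lives in the mount utility and may involve more than this.

```python
def is_client_mount(device: str) -> bool:
    """Treat the device as a client mount spec only if it contains ':/'."""
    return ":/" in device


# Hypothetical examples: a client spec, and a block device path that merely contains ':'.
CLIENT_SPEC = "mgsnode@tcp0:/testfs"
SERVER_DEVICE = "/dev/disk/by-path/ip-10.0.0.1:3260-lun-0"
```

Under this rule the second string is treated as a service mount even though it contains a colon.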

Severity: normal
Frequency: rare
Description: replace ptlrpcd with the statahead thread to interpret the async statahead RPC callback

Severity: normal
Frequency: on recovery
Description: I/O failures after umount during fail back
Details: if a client reconnects to a restarted server, it needs to join recovery instead of finding that the server handle has changed and evicting itself, cancelling all locks.

Severity: normal
Description: Kernel BUG tries to release flock
Details: Lustre does not destroy a flock lock before its last reference goes away, so always drop flock locks when the client is evicted and perform the unlock regardless of whether communication with the MDS succeeds.

Severity: enhancement
Description: Upcall when a Lustre log has been dumped
Details: Allow for a user mode script to be called once a Lustre log has been dumped. It passes the filename of the dumped log to the script. The location of the script can be specified via /proc/sys/lnet/debug_log_upcall.
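
A minimal sketch of what such an upcall script might do: copy the dumped log somewhere for safekeeping. It assumes the filename arrives as the first command-line argument, which the change log does not spell out, and the archive directory is a hypothetical path.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical destination directory; not something Lustre defines.
ARCHIVE_DIR = Path("/var/log/lustre-dumps")


def archive_dump(dump_path, archive_dir=ARCHIVE_DIR):
    """Copy a dumped debug log into archive_dir and return the new path."""
    src = Path(dump_path)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return dest


def main(argv, archive_dir=ARCHIVE_DIR):
    """Entry point; assumes the dumped log's filename is argv[1]."""
    if len(argv) != 2:
        print("usage: upcall <dumped-log-file>", file=sys.stderr)
        return 2
    print(archive_dump(argv[1], archive_dir))
    return 0
```

To try it as an upcall, one would call `sys.exit(main(sys.argv))` under a `__main__` guard, make the script executable, and write its path to /proc/sys/lnet/debug_log_upcall.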

Severity: minor
Frequency: rare
Description: avoid messages about idr_remove called for id that is not allocated
Details: Move the s_dev assignment for clustered nfs to the end of initialization, to avoid problems with error handling.

Severity: minor
Frequency: rare
Description: avoid Already found the key in hash [CONN_UNUSED_HASH] messages
Details: When a connection is reused it is not moved from CONN_UNUSED_HASH into CONN_USED_HASH, which produces a warning when the connection is put into the unused hash again.

Severity: normal
Frequency: rare
Description: avoid ASSERTION(client_stat->nid_exp_ref_count == 0) failed
Details: release the reference to stats when the client disconnects, not when the export is destroyed, to avoid races when the client is destroyed after the main ost export.

Severity: normal
Description: more cleanup in mds_lov
Details: add a workaround to get a valid ost count and avoid warnings about dropping too-big messages; do not init llog cat under a semaphore, which can block on reconnect and break normal replay; fix access to a wrong pointer.

Severity: enhancement
Description: Export bytes_read/bytes_write count on OSC/OST.

Severity: normal
Description: Early reply size mismatch, MGC loses connection
Details: Apply the MGS_CONNECT_SUPPORTED mask at reconnect time so the connect flags are properly negotiated.

Severity: normal
Description: Properly propagate oinfo flags from lov to osc for statfs
Details: restore the missing copy of oi_flags to lov requests.

Severity: normal
Description: exports in /proc are broken
Details: recreate /proc entries for clients when they reconnect.

Severity: enhancement
Description: Add man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8)
Details: included man pages for llobdstat(8), llstat(8), plot-llstat(8), l_getgroups(8), lst(8), routerstat(8)

Severity: enhancement
Description: Implement lustre ll_show_options method.

Severity: normal
Description: don't fail open with -ERANGE
Details: if a client connects before the mds knows the real ost count, getting the LOV EA can fail because the mds does not allocate a large enough buffer for the LOV EA.

Severity: normal
Description: Resolve device initialization race
Details: Prevent the proc handler from accessing devices that have been added to the obd_devs array but are not yet initialized.

Severity: enhancement
Description: configure's --enable-quota should check the kernel .config for CONFIG_QUOTA
Details: configure is terminated if --enable-quota is passed but no quota support is in kernel
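
The check can be pictured as a scan of the kernel .config text, sketched here in Python rather than the shell and m4 that configure actually uses. Both function names are hypothetical, and the helper treats `=m` as enabled only so it can be reused for other options.

```python
def kernel_has_option(config_text: str, option: str = "CONFIG_QUOTA") -> bool:
    """Return True if the option is set to y or m in the given kernel .config text."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments such as '# CONFIG_QUOTA is not set'
        key, _, value = line.partition("=")
        if key == option:
            return value in ("y", "m")
    return False


def check_enable_quota(enable_quota: bool, config_text: str) -> None:
    """Abort when quota is requested but the kernel config lacks CONFIG_QUOTA."""
    if enable_quota and not kernel_has_option(config_text):
        raise SystemExit("--enable-quota given, but kernel .config lacks CONFIG_QUOTA")
```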

Severity: normal
Frequency: rare, on PPC clients
Description: don't swab ost objects in a response about a directory, because they do not exist.
Details: similar to bug 14856, but in a different function.

Severity: enhancement
Description: lfs quota tool enhancement
Details: added units specifiers support for setquota, default to current uid/gid for quota report, short quota stats by default, nonpositional parameters for setquota, added llapi_quotactl manual page.

Severity: enhancement
Description: *optional* service tags registration
Details: if the "service tags" package is installed on a Lustre node, a local-node service tag will be created when the filesystem is mounted. See http://inventory.sun.com/ for more information about the Service Tags asset management system.

Severity: normal
Description: Client runs out of low memory
Details: Consider only lowmem when counting initial number of llap pages

Severity: normal
Frequency: occasional
Description: add refcount for osc callbacks, so avoid panic on shutdown

Severity: normal
Frequency: testing only
Description: sanity test 65a fails if stripecount of -1 is set
Details: handle -1 striping on filesystem in ll_dirstripe_verify

Severity: normal
Frequency: only in unusual configurations
Description: Kernel panic with find ost index.
Details: lov_obd panics if some OSTs have sparse indexes.

Severity: major
Frequency: rarely, if filesystem is mounted with -o flock
Description: do not process already freed flock
Details: flock can possibly be freed by another thread before it reaches ldlm_flock_completion_ast.

Severity: normal
Frequency: rarely, if filesystem is mounted with -o flock
Description: LBUG during stress test
Details: Accesses to the flock deadlock detection list need to be properly locked.

Severity: minor
Frequency: rarely, if binaries are being run from Lustre
Description: oops in page fault handler
Details: the kernel page fault handler can return two special 'pages' in the error case; don't try to dereference NOPAGE_SIGBUS and NOPAGE_OOM.

Severity: minor
Frequency: rarely, during shutdown
Description: timeout with invalidate import.
Details: ptlrpcd_check calls obd_zombie_impexp_cull and waits for a request that should be handled by ptlrpcd itself. This produces a long wait and -ETIMEOUT in ptlrpc_invalidate_import, and as a result an LASSERT.

Severity: normal
Frequency: rarely
Description: ASSERTION(CheckWriteback(page,cmd)) failed
Details: incorrectly clearing the PG_writeback bit in ll_ap_completion can produce a false positive assertion.

Severity: normal
Frequency: only with broken builds/installations
Description: no LBUG if lquota.ko and fsfilt_ldiskfs.ko are different versions
Details: just return an error to a user, put a console error message

Severity: enhancement
Description: allow MGS and MDT services to be started separately
Details: add a 'nomgs' option to mount.lustre to allow starting an MDT with a co-located MGS without starting the MGS; this complements the 'nosvc' mount option.

Severity: normal
Frequency: always, on big-endian systems
Description: cleanup in ptlrpc code, related to PPC platform
Details: store the magic in native order to avoid panics in recovery on PPC nodes and to prevent this error in future. Also fix the possibility of swabbing data twice, and fix getting lov striping to userspace.

Severity: normal
Frequency: rarely, if a replay gets lost on the server
Description: server incorrectly drops resent replays, leading to recovery failure.
Details: do not drop replays based on msg flags; instead, check the per-export recovery request queue for a duplicate transno.
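
The duplicate check can be sketched as a queue keyed by transaction number. `ExportRecoveryQueue` and its methods are hypothetical names, and the real server-side structures are certainly more involved; the point is only that a resend is recognised by its transno already being queued, not by a flag on the message.

```python
class ExportRecoveryQueue:
    """Hypothetical per-export queue of replayed requests, keyed by transno."""

    def __init__(self):
        self._by_transno = {}

    def queue_replay(self, transno, request):
        """Queue a replay; return False if this transno is already queued (a resend)."""
        if transno in self._by_transno:
            return False
        self._by_transno[transno] = request
        return True

    def pending(self):
        """Return queued requests in transaction-number order."""
        return [self._by_transno[t] for t in sorted(self._by_transno)]
```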

Severity: normal
Frequency: after recovery
Description: precreates too many objects after del orphan.
Details: del orphan sets last_id == next_id in oscc, and this triggers growth in the count of precreated objects. Set the LOW flag to skip increasing the count of precreated objects.

Severity: normal
Frequency: rare, when clearing nid stats
Description: ASSERTION(client_stat->nid_exp_ref_count == 0)
Details: when clearing nid stats, a live entry is sometimes destroyed, which produces a panic in free.

Severity: major
Frequency: occasionally since 1.6.4
Description: Stack overflow during MDS log replay
Details: ease stack pressure by using a separate thread to run llog_process.

Severity: minor
Frequency: very rare
Description: MDT cannot be unmounted, reporting "Mount still busy"
Details: Mountpoint references were being leaked during open reply reconstruction after an MDS restart. Drop mountpoint reference in reconstruct_open() and free dentry reference also.

Severity: normal
Frequency: rare
Description: wait until old IO has finished before starting new IO during lock cancel.
Details: the VM protocol wants old IO to finish before new IO starts, so wait until PG_writeback is cleared before checking the dirty flag and calling writepages in the lock cancel callback.

Severity: normal
Frequency: rare
Description: mds_mfd_close() ASSERTION(rc == 0)
Details: In mds_mfd_close(), we need to protect the change to the inode's writecount with its orphan write semaphore to prevent possible races.

Severity: minor
Frequency: rare, on ost shutdown
Description: don't hit a livelock when unmounting an ost.
Details: shrink_dcache_parent can loop for a long time destroying dentries; use shrink_dcache_sb instead.

Severity: minor
Frequency: only when echo_client is used
Description: don't panic when using echo_client
Details: echo client passes NULL as the client nid pointer, which produces a NULL pointer dereference.

Severity: normal
Frequency: Always on 32-bit PowerPC systems
Description: fix build on PPC32
Details: compiling code with the -m64 flag produces the wrong object file for PPC32.

Severity: normal
Frequency: rare
Description: MDS LBUG: ASSERTION(!IS_ERR(dchild))
Details: In reconstruct_* functions, LASSERTs on both the data supplied by a client and the data on disk are dangerous and incorrect. Replace them with client eviction.

Severity: enhancement
Description: skiplist implementation simplification
Details: skiplists are used to group compatible locks on the granted list. They were implemented by tracking the first and last lock of each lock group; the patch changes that to use doubly linked lists.

Severity: normal
Description: delete compatibility for 32bit qdata
Details: as planned, when lustre is beyond b1_8, lquota won't support 32bit qunit. That means servers of b1_4 and servers of b1_8 can't be used together if users want to use quota.

Severity: normal
Frequency: only with administrator action
Description: mount failure if config log has invalid conf_param setting
Details: If administrator specified an incorrect configuration parameter with "lctl conf_param" this would cause an error during future client mounts. Instead, ignore the bad configuration parameter.

Severity: normal
Frequency: blocks per group < blocksize*8 and uninit_groups is enabled
Description: ldiskfs error: XXX blocks in bitmap, YYY in gd
Details: If blocks per group is less than blocksize*8, set rest of the bitmap to 1.
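
A small sketch of the padding step, assuming the little-endian bit order that ext2/ext3 bitmaps use. The helper names are hypothetical and the bitmap here is only 4 bytes long to keep the numbers readable. Presumably the mismatch in the error message arises because unpadded tail bits are counted as free blocks that the group descriptor does not know about.

```python
def pad_block_bitmap(bitmap: bytearray, blocks_per_group: int) -> bytearray:
    """Mark every bit at or beyond blocks_per_group as in use (1)."""
    total_bits = len(bitmap) * 8  # a bitmap block of blocksize bytes holds blocksize*8 bits
    for bit in range(blocks_per_group, total_bits):
        bitmap[bit // 8] |= 1 << (bit % 8)
    return bitmap


def free_blocks(bitmap: bytes) -> int:
    """Count zero bits, i.e. blocks the bitmap claims are free."""
    return sum(8 - bin(byte).count("1") for byte in bitmap)
```

With a 32-bit bitmap and 20 blocks per group, the unpadded bitmap reports 32 free blocks and the padded one reports 20.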

Severity: major
Frequency: when an application does strided reads on Lustre
Description: Read performance drops a lot if the application does strided reads.
Details: Because stride_start_offset is missing from stride read-ahead, clients read many unused pages during read-ahead, and read performance drops.
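
The effect of a missing start offset can be illustrated with a toy model of strided read-ahead. The function and its parameters are hypothetical and measured in pages; this is not the Lustre read-ahead code, only a sketch of why the offset matters.

```python
def stride_readahead_pages(window_start, window_end, stride_start, stride_pages, chunk_pages):
    """Pick the pages in [window_start, window_end) that fall inside a strided pattern.

    The pattern is chunk_pages consecutive pages every stride_pages pages,
    beginning at page stride_start.
    """
    pages = []
    for page in range(window_start, window_end):
        if page >= stride_start and (page - stride_start) % stride_pages < chunk_pages:
            pages.append(page)
    return pages


# Application reads 2 pages out of every 8, starting at page 3.
with_offset = stride_readahead_pages(0, 24, stride_start=3, stride_pages=8, chunk_pages=2)
# Same pattern, but with the start offset ignored (treated as 0).
without_offset = stride_readahead_pages(0, 24, stride_start=0, stride_pages=8, chunk_pages=2)
```

In this example none of the pages chosen without the offset are ones the application will read, so every read-ahead page is wasted.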

Severity: normal
Description: more ldlm soft lockups
Details: In ldlm_resource_add_lock(), the call to ldlm_resource_dump() starves other threads of the resource lock for a long time when the waiting queue is long, so change the debug level from D_OTHER to the less frequently used D_INFO.

Severity: enhancement
Description: add -gid, -group, -uid, -user options to lfs find

Severity: enhancement
Description: ll_recover_lost_found_objs - recover objects in lost+found
Details: OST corruption and subsequent e2fsck can leave objects in the lost+found directory. Using the "ll_recover_lost_found_objs" tool, these objects can be retrieved and data can be salvaged by using the object ID saved in the fid EA on each object.

Severity: minor
Frequency: rare
Description: this bug _only_ happens when inode quota limitation is very low (less than 12), so that inode quota unit is 1 at initialization.
Details: if the remaining quota equals 1, that is taken as a sign that quota is now in effect, so the smallest quota qunit should be 2.

Severity: normal
Description: Hung threads in invalidate_inode_pages2_range
Details: The direct IO path doesn't call check_rpcs to submit a new RPC once one is completed. As a result, some RPCs are stuck in the queue and are never sent.

Severity: normal
Description: Procfs and llog threads sometimes access a destroyed import.
Details: Synchronize import destruction with the procfs and llog threads using the import refcount and a semaphore.

Severity: major
Description: mds fails to respond, threads stuck in ldlm_completion_ast
Details: Sort source/child resource pair after updating child resource.

Severity: major
Frequency: rare
Description: kernel BUG at ldiskfs2_ext_new_extent_cb
Details: If insertion of an extent fails, discard the inode preallocation and free the data blocks; otherwise it can lead to duplicate blocks.

Severity: normal
Description: don't always update ctime in ext3_xattr_set_handle()
Details: The current xattr code always updates the inode ctime in ext3_xattr_set_handle(). In some cases the ctime should not be updated; for example, for 2.0->1.8 compatibility it is necessary to delete an xattr without updating the ctime.

Severity: normal
Description: add quota statistics
Details: 1. sort out quota proc entries and proc code. 2. add quota statistics

Severity: normal
Frequency: often
Description: quotas are not honored with O_DIRECT
Details: all writes with the O_DIRECT flag use grants, which leads to this problem. OBD_BRW_SYNC is now used to guard against this.

Severity: major
Frequency: rare
Description: Assertion in iopen_connect_dentry in 1.6.3
Details: looking up an inode via iopen with the wrong generation number can populate the dcache with a disconnected dentry while the inode number is in the process of being reallocated. This causes an assertion failure in iopen since the inode's dentry list contains both a connected and a disconnected dentry.

Severity: normal
Description: assertion failure in ldlm_handle2lock()
Details: fix a race between class_handle_unhash() and class_handle2object() introduced in lustre 1.6.5 by bug 13622.

Severity: enhancement
Description: superblock lock contention with many SMP cores on one client
Details: several client filesystem locks were highly contended on SMP NUMA systems with 8 or more cores. Per-CPU data structures and more efficient locking were implemented to reduce contention.

Severity: minor
Frequency: rare
Description: Kernel BUG: sd_iostats_bump: unexpected disk index
Details: remove the limit of 256 scsi disks in the sd_iostat patch

Severity: minor
Frequency: rare
Description: oops in sd_iostats_seq_show()
Details: unloading/reloading the scsi low level driver triggers a kernel bug when trying to access the sd iostat file.

Severity: major
Frequency: rare
Description: Kernel panics during QLogic driver reload
Details: REQ_BLOCK_PC requests are not handled properly in the sd iostat patch, causing memory corruption.

Severity: minor
Frequency: rare
Description: journal_dev option does not work in b1_6
Details: pass mount option during pre-mount.

Severity: enhancement
Frequency:
Description: Add a FIEMAP (FIle Extent MAP) ioctl for ldiskfs
Details: FIEMAP ioctl will allow an application to efficiently fetch the extent information of a file. It can be used to map logical blocks in a file to physical blocks in the block device.
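
The logical-to-physical mapping described above can be illustrated with a small lookup over an extent list. This is a sketch under assumptions: the `Extent` fields mirror the `fe_logical`, `fe_physical`, and `fe_length` members of the Linux `struct fiemap_extent`, but the extent values are made up and the helper `logical_to_physical` is hypothetical. It does not issue the ioctl itself.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    logical: int   # byte offset within the file
    physical: int  # byte offset on the block device
    length: int    # extent length in bytes


def logical_to_physical(extents: Sequence[Extent], offset: int) -> Optional[int]:
    """Map a file offset to a device offset, or None if it falls in a hole.

    `extents` must be sorted by logical offset and must not overlap.
    """
    starts = [e.logical for e in extents]
    i = bisect_right(starts, offset) - 1  # last extent starting at or before offset
    if i < 0:
        return None
    ext = extents[i]
    if offset >= ext.logical + ext.length:
        return None  # past the end of that extent: a hole
    return ext.physical + (offset - ext.logical)


# Made-up extent map: two extents separated by a hole.
extents = [
    Extent(logical=0, physical=1048576, length=8192),
    Extent(logical=16384, physical=4194304, length=4096),
]
```

A tool that has fetched the extent list once can answer many such lookups without further calls into the kernel, which is the efficiency the entry refers to.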

Severity: normal
Frequency: only with adaptive timeout enabled
Description: DEBUG_REQ() bad paging request
Details: ptlrpc_at_recv_early_reply() should not modify req->rq_repmsg because it can be accessed by reply_in_callback() without the rq_lock held.

Severity: normal
Frequency: only on Cray X2
Description: X2 build failures
Details: fix build failures on Cray X2.

Severity: normal
Description: xid & resent requests
Details: Initialize RPC XID from clock at startup (randomly if clock is bad).
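
One way to picture the approach is the sketch below. It is an illustration only: the sanity threshold, the 20-bit shift, and the 62-bit random fallback are assumptions chosen for the example, and `initial_xid` is a hypothetical name, not the Lustre function.

```python
import secrets

# Assumed sanity threshold: 2004-01-01 00:00:00 UTC. A clock reading
# earlier than this is treated as unset or bad.
CLOCK_SANE_AFTER = 1072915200


def initial_xid(now_sec: int, random_bits=secrets.randbits) -> int:
    """Pick the first RPC XID to use after startup.

    With a sane clock, the XID is derived from the current time so that a
    node restarted later begins with a larger XID than it used before.
    With a bad clock, fall back to a random starting point.
    """
    if now_sec < CLOCK_SANE_AFTER:
        return random_bits(62)
    # Assumed layout: seconds in the high bits, leaving room for about
    # one million XIDs per second before the ranges could overlap.
    return now_sec << 20
```

Seeding from the clock means a resent request from before a restart is unlikely to collide with an XID issued after it, which appears to be the motivation for this change.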

Severity: major
Description: quota recovery deadlock during mds failover
Details: This patch includes att18982, att18236, and att18237 from bz14840. It solves two problems: 1. OSTs hang when the MDS fails over with quotaon. 2. A watchdog storm occurs when OST threads wait for MDS recovery.

Severity: normal
Description: kernel panic on racer
Details: Do not access dchild->d_inode when IS_ERR(dchild) is true.

Severity: enhancement
Description: Add lustre_start utility to start or stop multiple Lustre servers from a CSV file.

Severity: major
Description: Lustre GPF in {:ptlrpc:ptlrpc_server_free_request+373}
Details: In case of memory pressure, list_del() can be called twice on req->rq_history_list, causing a kernel oops.

Severity: normal
Description: kptllnd_peer_check_sends()) ASSERTION(!in_interrupt()) failed
Details: fix a stack overflow in the distributed lock manager by deferring export eviction after a failed AST to the elt thread instead of handling it in the dlm interpret routine.

Severity: enhancement
Description: More exported tunables for mballoc
Details: Add support for tunable preallocation window and new tunables for large/small requests

Severity: normal
Description: Detect corruption of block bitmap and checking for preallocations
Details: Checks the validity of the on-disk block bitmap and improves checking of the number of applied preallocations. When corruption is found, the filesystem is made read-only to prevent further corruption.

Severity: normal
Frequency: only for big-endian servers
Description: Check if big-endian system while mounting fs with extents feature
Details: Mounting a filesystem with the extents feature will fail on big-endian systems, since ext3-based ldiskfs is not supported on big-endian systems. This can be overridden with the "bigendian_extents" mount option.

Severity: normal
Description: Excessive recovery window
Details: With AT enabled, the recovery window can be excessively long (6000+ seconds). To address this problem, we no longer use OBD_RECOVERY_FACTOR when extending the recovery window (the connect timeout no longer depends on the service time, it is set to INITIAL_CONNECT_TIMEOUT now) and clients report the old service time via pb_service_time.

Severity: normal
Description: Watchdog triggered on MDS failover
Details: enable OBD_CONNECT_MDT flag when connecting from the MDS so that the OSTs know that the MDS "UUID" can be reused for the same export from a different NID, so we do not need to wait for the export to be evicted.

Severity: enhancement
Description: Don't sync journal after every i/o
Details: Implement write RPC replay to allow server replies for write RPCs before data is on disk. However, this feature is disabled by default since some issues leading to data corruptions have been found during recovery (e.g. bug 19128). This feature can be enabled by running the following command on the OSSs: lctl set_param obdfilter.*.sync_journal=0

Severity: low
Description: Slow reads beyond 8TB offsets.
Details: Page index integer overflow in ll_read_ahead_page
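
The 8 TB boundary is consistent with a page index that wraps at 2^31. The arithmetic can be checked with the sketch below, which assumes 4 KiB pages and a signed 32-bit index variable; the entry does not state either, so both are assumptions, and the function names are hypothetical.

```python
import ctypes

PAGE_SHIFT = 12  # assumed 4 KiB pages


def page_index_32(offset: int) -> int:
    """Page index as it would be stored in a signed 32-bit integer."""
    return ctypes.c_int32(offset >> PAGE_SHIFT).value


def page_index_64(offset: int) -> int:
    """Page index with enough bits to hold the full value."""
    return offset >> PAGE_SHIFT


eight_tib = 8 * 2**40  # 2**43 bytes

last_ok = page_index_32(eight_tib - 4096)  # last page below the boundary
wrapped = page_index_32(eight_tib)         # first page at the boundary
full = page_index_64(eight_tib)
```

Under these assumptions the index of the last page below 8 TiB is 2^31 - 1, the largest value a signed 32-bit integer can hold, and the next page wraps to a negative number, so read-ahead would no longer recognize the page as part of its window.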

Severity: major
Frequency: rare, only if using MMP with Linux RAID
Description: MMP doesn't work with Linux RAID
Details: When using HA for Lustre servers with Linux RAID, it is possible that MMP will not detect multiple mounts. To make this work, the device queue in RAID must be unplugged when the MMP block is being written. Also, when reading the MMP block, it should be read from disk rather than from the cache.

Severity: minor
Frequency: rare, during recovery
Description: Assertion failure in ldlm_lock_put
Details: Do not put cancelled locks into replay list, hold references on locks in replay list

Severity: critical
Description: Lustre detected file system corruption with inode out of bounds
Details: don't update i_size on MDS_CLOSE for directories, since doing so causes directory corruption on the MDT.

Severity: normal
Description: client doesn't try to reconnect
Details: correctly skip time estimate if in recovery
