WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Lustre Project List

From Obsolete Lustre Wiki
Revision as of 14:23, 7 December 2010 by Adilger (List of Lustre Features and Projects: add WIP status to a few projects)

List of Lustre Features and Projects

Below is a list of Lustre features and projects that are just waiting for someone to start working on them. They are listed roughly in order of increasing complexity, but the actual difficulty depends heavily on the coding skills of the developer and their familiarity with the Lustre code base.

After you have chosen a project, or if you are having trouble deciding what to work on, please contact the lustre-devel mailing list to discuss your project with the Lustre developers. This ensures that your work is in line with other plans and projects for Lustre, and that nobody else is already working on the same thing.

Feature | Complexity | Required skills | Tracking bug | Brief description
ioctl() number cleanups | 1 | kernel | 20731 | Clean up the Linux ioctl (IOC) numbering to properly use the "size" field, so that ioctls work correctly with mixed 32-bit and 64-bit kernel/userspace. Attention must be paid to maintaining userspace compatibility for a number of releases, so the old ioctl() numbers cannot simply be removed.
Over 2TB objects | 3 | RPC, OST | 20128 (work in progress) | Support objects larger than 2TB. Currently the client assumes that the largest possible object size is 2TB; instead, this limit should be returned by the OST at connect time.
Improve testing efficiency | 3 | shell, test | 23051 | Improve the performance, efficiency, and coverage of the acceptance-small.sh test scripts. As a basic first step, printing the duration of each test script in the acceptance-small.sh test summary would show where the testing time is being spent. More advanced work includes improved test scheduling and dynamic cluster configuration, to allow more efficient utilization of the available test nodes. Virtual machines could also be used for functional tests instead of real nodes.
Config save/edit/restore | 3 | MGS, llog, config | 17094 | Add the ability to back up, edit, and restore the client/MDS/OSS config llog files after a writeconf. One reason is config recovery if the config llog becomes corrupted. Another is that all of the filesystem tunable parameters (including all of the OST pool definitions) are stored in the config llog and are lost when a writeconf is done. Being able to dump the config log to a plain text file, edit it, and then restore it would make administration considerably easier.
Kernel patch removal | 3 | MDS, OST | 21524 | Remove the Lustre kernel patches so that Lustre servers can be ported to new kernels more easily, and can be built against vendor kernels without changing the vendor kernel RPMs. There are a number of different patches; for each one, either use equivalent functionality that already exists in the kernel, or get the patch accepted upstream. See also ldiskfs patch cleanup.
mdd-survey tools for performance analysis | 3 | obdfilter-survey, mdd, benchmarking | 21658 | Add a low-level metadata unit test that measures the performance of the metadata stack without connected clients, similar to and/or integrated with obdfilter-survey (echo client, echo server).
fallocate() API | 3 | VFS, OST | 15064 | Add a client interface and RPC to allow space reservation for objects on OSTs; sys_fallocate() has been available on clients since RHEL5.4, and in ext4-based ldiskfs.
Allow 100k open files on a single client | 4 | client, MDS | 24217 (work in progress) | Allow 100k open files per client. Fix the client so that it does not store committed open RPCs in the resend list, but instead reopens files from their file handles during recovery (see Simplified Interoperability), to avoid O(n) behaviour when adding new RPCs to the client's list of RPCs kept for recovery. Fix the MDS to store the "mfd" structures in a hash table instead of a linked list, to avoid O(n) behaviour when searching for an open file handle. For debugging, it would be useful to have a /proc entry on the MDS showing the open FIDs for each client export.
Error message improvements | 4 | core, operations | (none) | Review and improve the Lustre error messages so that they are more useful. A larger project is to change the core Lustre error message handling to generate better-structured messages that can be parsed and managed more easily.
Client under memory pressure | 4 | client, VFS, MM | (none) | Fix the client to work well under memory pressure: avoid deadlocks during allocation, and remain able to continue processing RPCs, shrink caches, and free memory. This is a prerequisite for Swap on Lustre.
Large Readdir RPCs | 4 | MDS, RPC | 17833 (work in progress) | Read directory pages from the client in large chunks instead of the current page-at-a-time reads. This will improve readdir performance somewhat and reduce load on the MDS; the improvement is expected to be significant over high-latency WAN links.
Finish large EA handling for ldiskfs | 4 | ldiskfs | 24268 | Finish the large EA (extended attribute) handling in ldiskfs, and get this code accepted upstream.
Over 16TB ldiskfs filesystems | 4 | ldiskfs, obdfilter | 20063 | Support single OSTs larger than 16TB. This is largely supported by newer ext4 filesystems (e.g. RHEL5.4, RHEL6), but thorough testing and some bug fixing may be needed in obdfilter (1.8, 2.0) or OFD (2.x), and other work may be needed in the client (all versions).
Client subdirectory mounts | 4 | VFS, MDS | 15276 | Allow a client to mount a subdirectory of a filesystem instead of its root.
Online OST replacement | 4 | OST | 24128 (work in progress) | Allow a new OST to replace a previous OST at the same index, in case of hardware replacement or unrecoverable filesystem corruption.
Implement a distributed snapshot mechanism | 5 | MDS, OST, RPC | 14124 | Implement a distributed snapshot mechanism, initially with only loosely synchronized operations (possibly ordered between MDS and OSS), or by blocking the whole filesystem while a consistent snapshot is created. After the snapshot has been created, modify the fsname of the snapshot MDT and OSTs so that the snapshot can be mounted separately.
Improve QOS Round-Robin object allocator | 5 | MDS, LOV | 18547 | Improve the LOV QOS allocator to always do weighted round-robin allocation, instead of degrading into weighted random allocation once OST free space becomes imbalanced. This evens out allocations continuously, avoids badly skewed OST allocations when QOS becomes active, and allows weighting to be added for factors such as current load, OST RAID rebuild, etc.
ldiskfs patch cleanup | 5 | ext4, OST, MDT | 21635 | A number of the ldiskfs patches should be cleaned up, or possibly removed entirely, so that ongoing patch updates against new kernels are simplified.
All RPCs pass a lock handle | 5 | DLM, RPC | 22849 | For protocol correctness and improved performance, it would be desirable for all RPCs done with a client lock held to send the lock handle along with the request. For OST requests this means that all read, write, and truncate operations (unless "lockless") should include a lock handle. This allows the OST to validate that the request comes from a client that holds the correct locks, and allows lockh->lock->object lookups that avoid OI or inode lookups in most cases.
Readdir Object Statahead | 5 | VFS, DLM | 18526 | Enhance the current statahead to do object glimpses asynchronously once inode statahead has returned the layout information. The preferred solution is readdir+ or SOM, but this could help in the short term, would still be useful for open files, and does not affect the network protocol, so it could be removed once those features are available. It could potentially be extended to do object readahead, instead of only a size glimpse, if a lookup-stat-read pattern is detected.
Imperative recovery | 6 | recovery, RPC | 18767 | Reduce recovery time by having the server notify clients after recovery has completed, instead of waiting for each client to time out an RPC before it begins recovery.
Simplified Interoperability | 6 | RPC, VFS | 18496 | Clean up client state before a server upgrade to minimize or eliminate the need for message format interoperability. The client only needs to track open files; all other state (locks, cached pages, etc.) can be dropped and re-fetched from the server as needed. Change client recovery to reopen files from open file handles instead of from saved RPCs.
Enhanced OST Pools Support | 6 | MDS, LOV | (none) | Improve OST pools support to allow mandatory OST pool enforcement (i.e. ACLs that allow only specific users to access certain pools, including the default "all OSTs" pool) and more complex policy specification (e.g. selecting a fallback pool on ENOSPC). Allow default initial file placement policies (e.g. server pool, stripe width) to be defined based on cluster membership (NID, UID, GID) and file parameters (name, extension, etc.).
Replay Signatures | 6 | RPC, recovery | 18547 | Allow the MDS/OSS to determine whether a client can legitimately replay an RPC, by digitally signing the RPC at processing time and verifying the signature at replay time.
Network Request Scheduler (NRS) | 6 | RPC, OST, benchmarking | 13634 | Order IO (and possibly metadata) requests by client, file offset, priority, etc., to improve overall back-end efficiency and/or provide QOS to clients. Dynamically change the number of RPCs in flight for each client to balance the RPC traffic at the server. Previous research by Sun shows that this can significantly improve overall performance.
Lustre Block Device | 6 | VFS, LOV | 5498 | The Lustre object lloop driver exports a block device to userspace, bypassing the filesystem. The code partly works and is included in 1.6.4 and later, but it has correctness issues and potential performance problems, and it needs to be ported to newer kernels.
Client PAGE_SIZE < server PAGE_SIZE | 6 | RPC, LNET | 686 | Support smaller page sizes on the client than on the server. This applies to exotic server hardware such as PPC/ia64/SPARC.
Swap on Lustre | 7 | VFS, VM | 5498 | Depends on the Lustre Block Device. It has problems when working under memory pressure, which make it mostly useless until those problems are fixed (see Client under memory pressure).
Directory readdir+ | 7 | VFS, MDS | 17845 | Bulk metadata readdir/stat interface to speed up "ls -l" operations. Send back the requested inode attributes for all directory entries as part of the extended dirent data, and integrate with any proposed client API for this. Needs Large Readdir RPCs to be efficient over the wire, since more data will be returned for every entry.
OST Space Management (Basic) | 7 | HSM, MDS, LOV | 13107 | Simple migration capability: transparently migrate objects/files between OSTs (blocking application writes, or aborting migration on contention); evacuate OSTs and move their file data to other OSTs; add a new OST and balance data onto it. The OST doesn't really need to understand this, only the MDS and the client (for the LOV EA rewrite). The HSM project implements layout lock support and a policy engine for automatic space management. This also needs an ioctl that transparently changes an MDS inode to point to the migrated object(s) instead of the original object(s), and then schedules the old object(s) for destruction.
Small file IO aggregation | 7 | CLIO, OST | 944 | Small file IO aggregation (multi-object RPCs), most likely for writes first, and possibly later for reads in conjunction with statahead.
Version Based Recovery for delayed clients | 8 | recovery | 10609 | Complete the VBR implementation to handle delayed client recovery/reconnection. This is needed for disconnected network operation and better fault tolerance.
Client-side data encryption | 9 | VM, security | 5286 | Encrypt files and directories (or possibly just filenames) on the client before sending them to the server. This avoids sending unencrypted data over the network, and avoids ever having the data in plaintext on the server (as would happen if data were decrypted on arrival from the network and then separately encrypted on disk).
Ptlrpc layer rewrite | 9 | recovery, RPC | 5286 | Rewrite the Lustre RPC code to clean it up and simplify RPC handling.
Local object zero-copy IO | 9 | VFS, DLM, OST | (none) | Efficient data IO between a client and a local OST object; an optimization to support local clients (clients on the same node as the OST). Likely implemented as a fast-path connection between the OSC and the local OFD/OSD. The read cache should be kept in the OSD instead of at the client VFS level, so that it can be shared among all users of this OST.
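The ioctl() number cleanups entry depends on the "size" field that the Linux _IOW()/_IOR() macros encode into each ioctl number. The following is a minimal sketch, assuming the standard <linux/ioctl.h> macros; the structures, the 'x' type byte, command number 1, and example_check_size() are hypothetical and are not Lustre's actual ioctl definitions. It shows why a structure with long or pointer fields produces an ABI-dependent ioctl number, and how fixed-width fields keep the number identical for 32-bit and 64-bit userspace.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical argument struct whose size depends on the ABI:
 * 'long' and pointers are 4 bytes for 32-bit userspace and 8 bytes
 * for a 64-bit kernel, so sizeof() (and the encoded size) differs. */
struct example_args_unstable {
	long  count;
	void *buf;
};

/* Hypothetical replacement using fixed-width fields; the user pointer
 * is carried as a 64-bit integer, so the size is 16 on both ABIs. */
struct example_args_stable {
	uint64_t count;
	uint64_t buf;
};

/* _IOW() packs direction, type, number and sizeof(struct) into one value. */
#define EXAMPLE_IOC_UNSTABLE _IOW('x', 1, struct example_args_unstable)
#define EXAMPLE_IOC_STABLE   _IOW('x', 1, struct example_args_stable)

/* A handler can reject a command whose encoded size does not match the
 * structure it expects, instead of misreading the caller's buffer. */
static int example_check_size(unsigned int cmd, size_t expected)
{
	return _IOC_SIZE(cmd) == expected ? 0 : -EINVAL;
}
```

Compiled for 32-bit x86 (e.g. with gcc -m32), sizeof(struct example_args_unstable) is 8 rather than 16, so the same source yields a different EXAMPLE_IOC_UNSTABLE value than a 64-bit kernel expects, while EXAMPLE_IOC_STABLE is the same on both.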
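The Improve QOS Round-Robin object allocator entry asks for allocation that stays weighted round-robin instead of falling back to weighted random selection. One well-known technique with that property is smooth weighted round-robin (the scheme nginx uses for upstream selection); this is a swapped-in illustration, not the LOV allocator. The struct ost_slot type and swrr_pick() are hypothetical, and how weights would be derived from OST free space, load, or RAID rebuild state is left open.

```c
#include <stddef.h>

/* Hypothetical per-OST allocation state. */
struct ost_slot {
	int weight;	/* e.g. derived from free space; must be >= 0 */
	int current;	/* running credit, starts at 0 */
};

/* Smooth weighted round-robin: every call adds each OST's weight to its
 * credit, picks the OST with the largest credit, then charges the winner
 * the total weight. Over a cycle of sum(weights) calls each OST is picked
 * 'weight' times, and picks are interleaved rather than clustered.
 * Returns the chosen index, or -1 if every weight is zero. */
static int swrr_pick(struct ost_slot *osts, size_t n)
{
	int total = 0;
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		osts[i].current += osts[i].weight;
		total += osts[i].weight;
		/* strict '>' means ties go to the lowest index */
		if (osts[i].weight > 0 &&
		    (best < 0 || osts[i].current > osts[best].current))
			best = (int)i;
	}
	if (best >= 0)
		osts[best].current -= total;
	return best;
}
```

With weights 5, 1 and 1, seven consecutive calls pick OSTs 0, 0, 1, 0, 2, 0, 0, after which all of the credits are back at zero.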
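For the Allow 100k open files on a single client entry, the MDS-side change is to find an open file handle by hashing it rather than by walking a linked list. The following sketch assumes 64-bit handle cookies and a fixed 256-bucket chained hash table; struct mfd_entry, struct mfd_table and the mfd_*() functions are hypothetical and much simpler than the real MDS structures, which would likely also need locking and resizing.

```c
#include <stddef.h>
#include <stdint.h>

#define MFD_HASH_BITS 8
#define MFD_HASH_SIZE (1u << MFD_HASH_BITS)

/* Hypothetical open-file-handle record. */
struct mfd_entry {
	uint64_t          cookie;	/* handle value the client sends back */
	struct mfd_entry *next;		/* bucket chain */
};

struct mfd_table {
	struct mfd_entry *buckets[MFD_HASH_SIZE];
};

/* Fibonacci hashing: multiply by 2^64/phi and keep the top bits. */
static unsigned int mfd_hash(uint64_t cookie)
{
	return (unsigned int)((cookie * 0x9E3779B97F4A7C15ULL) >>
			      (64 - MFD_HASH_BITS));
}

static void mfd_insert(struct mfd_table *t, struct mfd_entry *e)
{
	unsigned int b = mfd_hash(e->cookie);

	e->next = t->buckets[b];
	t->buckets[b] = e;
}

/* Only one bucket's chain is scanned, so lookup cost stays roughly
 * constant as the number of open files grows (expected O(1)). */
static struct mfd_entry *mfd_lookup(const struct mfd_table *t, uint64_t cookie)
{
	for (struct mfd_entry *e = t->buckets[mfd_hash(cookie)]; e; e = e->next)
		if (e->cookie == cookie)
			return e;
	return NULL;
}

/* Unlink and return the entry for 'cookie', or NULL if it is absent. */
static struct mfd_entry *mfd_remove(struct mfd_table *t, uint64_t cookie)
{
	struct mfd_entry **pp = &t->buckets[mfd_hash(cookie)];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->cookie == cookie) {
			struct mfd_entry *e = *pp;

			*pp = e->next;
			return e;
		}
	}
	return NULL;
}
```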
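The Network Request Scheduler (NRS) entry proposes reordering queued IO requests, for example by file offset. The minimal sketch below, assuming a hypothetical struct io_req and one batch of queued requests at a time, sorts a batch by object and then by offset, and counts how often consecutive requests are not contiguous as a rough proxy for back-end seeks. A real scheduler would also need fairness between clients and a bound on queueing delay, which this omits.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical queued IO request. */
struct io_req {
	uint64_t object_id;
	uint64_t offset;	/* byte offset within the object */
	uint32_t length;
};

/* Order by object, then by ascending offset within each object. */
static int io_req_cmp(const void *a, const void *b)
{
	const struct io_req *x = a, *y = b;

	if (x->object_id != y->object_id)
		return x->object_id < y->object_id ? -1 : 1;
	if (x->offset != y->offset)
		return x->offset < y->offset ? -1 : 1;
	return 0;
}

/* Reorder one batch of pending requests in place before dispatch. */
static void nrs_order_batch(struct io_req *reqs, size_t n)
{
	qsort(reqs, n, sizeof(*reqs), io_req_cmp);
}

/* Count places where a request does not start where the previous one
 * ended in the same object. */
static size_t count_breaks(const struct io_req *reqs, size_t n)
{
	size_t breaks = 0;

	for (size_t i = 1; i < n; i++)
		if (reqs[i].object_id != reqs[i - 1].object_id ||
		    reqs[i].offset != reqs[i - 1].offset + reqs[i - 1].length)
			breaks++;
	return breaks;
}
```

For five interleaved 1 MiB requests to two objects, count_breaks() drops from 4 to 1 after nrs_order_batch(), since each object's requests become one contiguous run.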