WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.
Lustre Project List
Revision as of 14:12, 5 October 2010
List of Lustre Features and Projects
Below is a list of Lustre features and projects that are just waiting for someone to start working on them. They are listed roughly in order of increasing complexity, although the actual difficulty depends heavily on the coding skills of the developer and their familiarity with the Lustre code base.
After you have chosen a project, or if you are having trouble deciding what to work on, please contact the lustre-devel mailing list to discuss your project with the Lustre developers. This ensures that your work is in line with other plans and projects for Lustre, and that nobody else is already working on the same thing.
|Feature||Complexity||Required skills||Tracking Bug||Brief Description|
|ioctl() number cleanups||1||kernel||20731||Clean up the Linux ioctl numbering to use the "size" field properly, so that mixed 32- and 64-bit kernel/userspace ioctls work correctly. Attention needs to be paid to maintaining userspace compatibility for a number of releases, so the old ioctl() numbers cannot simply be removed.|
|Over 2TB objects||3||RPC, OST||20128||Support objects larger than 2TB in size. Currently the client assumes that the largest possible object size is 2TB, but this limit should be returned from the OST at connect time.|
|Improve testing efficiency||3||shell, test||23051||Improve the performance, efficiency, and coverage of the acceptance-small.sh test scripts. As a basic first step, printing the duration of each test script in the acceptance-small.sh test summary would show where the testing time is being spent. More advanced work includes improved test scheduling and dynamic cluster configuration, to make more efficient use of the available test nodes. Virtual machines could also be used instead of real nodes for functional tests.|
|Config save/edit/restore||3||MGS, llog, config||17094||Add the ability to back up, edit, and restore the client/MDS/OSS config llog files after a writeconf. One reason is config recovery if the config llog becomes corrupted. Another is that all of the filesystem tunable parameters (including all of the OST pool definitions) are stored in the config llog and are lost when a writeconf is done. Being able to dump the config log to a plain text file, edit it, and then restore it would make administration considerably easier.|
|kernel patch removal||3||MDS, OST||21524||Remove the Lustre kernel patches so that Lustre servers can be ported to new kernels more easily, and built against vendor kernels without changing the vendor kernel RPMs. There are a number of different patches; each one needs either to be replaced by equivalent functionality that already exists in the kernel, or to be accepted upstream. See also ldiskfs patch cleanup.|
|mdd-survey tools for performance analysis||3||obdfilter-survey, mdd, benchmarking||21658||Add a low-level metadata unit test to measure the performance of the metadata stack without connected clients, similar to, or integrated with, obdfilter-survey (echo client, echo server).|
|fallocate() API||3||VFS, OST||15064||Add a client interface and RPC to allow space reservation for objects on OSTs; sys_fallocate() has been available on clients since RHEL5.4, and in ext4-based ldiskfs.|
|Allow 100k open files on a single client||4||client, MDS||||Allow 100k open files per client. Fix client to not store committed open RPCs in the resend list but instead reopen files from the file handles upon recovery (see Simplified Interop) to avoid O(n) behaviour when adding new RPCs to the RPCs-for-recovery list on the client. Fix MDS to store "mfd" in a hash table instead of a linked list to avoid O(n) behaviour when searching for an open file handle. For debugging it would be useful to have a /proc entry on the MDS showing the open FIDs for each client export.|
|Error message improvements||4||core, operations||||Review and improve the Lustre error messages to be more useful. A larger project is to change the core Lustre error message handling to generate better structured error messages so that they can be parsed/managed more easily.|
|Client under memory pressure||4||client, VFS, MM||||Fix the client to work well under memory pressure: avoid deadlocks during allocation, and keep processing RPCs while reducing caches and freeing memory. This is a prerequisite for swap-on-Lustre.|
|Large Readdir RPCs||4||MDS, RPC||17833||Read directory pages in large chunks instead of the current page-at-a-time reads from the client. This will improve readdir performance somewhat, and reduce load on the MDS. The improvement is expected to be significant over high-latency WAN links.|
|Over 16TB ldiskfs filesystems||4||ldiskfs, obdfilter||20063||Support single OSTs larger than 16TB. This is largely supported in newer ext4 filesystems (e.g. RHEL5.4, RHEL6), but thorough testing and some bug fixing may be needed in obdfilter (1.8, 2.0) or OFD (2.x), and other work may be needed in the client (all versions).|
|Client subdirectory mounts||4||VFS, MDS||15276||Mount a subdirectory of a filesystem from the client instead of the root.|
|Implement a distributed snapshot mechanism||5||MDS, OST, RPC||14124||Implement a distributed snapshot mechanism, initially either with only loosely synchronized operations (possibly ordered between MDS and OSS), or by blocking the whole filesystem while a consistent snapshot is created. After the snapshot has been created, modify the fsname of the MDT and OSTs so that it can be mounted separately.|
|Improve QOS Round-Robin object allocator||5||MDS, LOV||18547||Improve the LOV QOS allocator to always do weighted round-robin allocation, instead of degrading into weighted random allocations once the OST free space becomes imbalanced. This evens out allocations continuously, avoids severe OST allocation imbalances when QOS becomes active, and allows adding weighting for factors such as current load, OST RAID rebuild, etc.|
|ldiskfs patch cleanup||5||ext4, OST, MDT||21635||A number of the ldiskfs patches should be cleaned up, or possibly removed entirely, so that ongoing patch updates against new kernels are simplified.|
|All RPCs pass a lock handle||5||DLM, RPC||22849||For protocol correctness and improved performance, it would be desirable for all RPCs that are done with a client lock held to send the lock handle along with the request. For OST requests this means that all read, write, and truncate operations (unless "lockless") should include a lock handle. This allows the OST to validate that the request comes from a client holding the correct locks, and allows lockh->lock->object lookups that avoid OI or inode lookups in most cases.|
|Readdir Object Statahead||5||VFS, DLM||18526||Enhance the current statahead to glimpse objects asynchronously once inode statahead has returned the layout information. The preferred solution is readdir+ or SOM, but this could help in the short term, would still be useful for open files, and does not affect the network protocol, so it could be removed when those features are available. It could potentially be extended to do object readahead instead of simply a size glimpse if a lookup-stat-read pattern is detected.|
|Imperative recovery||6||recovery, RPC||18767||Reduce recovery time by having the server notify clients after recovery has completed instead of waiting for the client to time out the RPC before it begins recovery.|
|Simplified Interoperability||6||RPC, VFS||18496||Clean up client state before server upgrade to minimize or eliminate the need to have message format interoperability. The client only needs to track open files, and all other state (locks, cached pages, etc) can be dropped and re-fetched as needed from the server. Change client recovery to re-open files from open file handles instead of from saved RPCs.|
|Enhanced OST Pools Support||6||MDS, LOV||||Improve OST pools support to allow mandatory OST enforcement (i.e. only allow specific users to access certain pools, including the default "all OSTs" pool) and more complex policy specification (e.g. select a fallback pool on ENOSPC). Allow default initial file placement policies (e.g., server pool, stripe width) to be defined based on cluster membership (NID, UID, GID).|
|Replay Signatures||6||RPC, recovery||18547||Allow MDS/OSS to determine if client can legitimately replay an RPC, by digitally signing it at processing time and verifying the signature at replay time.|
|Network Request Scheduler (NRS)||6||RPC, OST, benchmarking||13634||Order IO (and possibly metadata) requests by client, file offset, priority, etc., in order to improve overall back-end efficiency and/or provide QOS to clients. Dynamically change the number of RPCs in flight for each client to balance the RPC traffic at the server. Previous research done by Sun shows this can significantly improve overall performance.|
|Lustre Block Device||6||VFS, LOV||5498||The Lustre object lloop driver exports a block device to userspace, bypassing the filesystem. The code partly works and is part of 1.6.4+, but has correctness issues and potential performance problems. It needs to be ported to newer kernels.|
|Client PAGE_SIZE < server PAGE_SIZE||6||RPC, LNET||686||Support smaller page sizes on the client than on the server. This applies to exotic server hardware such as PPC, ia64, and SPARC.|
|Swap on Lustre||7||VFS, VM||5498||Depends on the Lustre block device. Swap has problems under memory pressure, which makes it mostly useless until those problems are fixed.|
|Directory readdir+||7||VFS, MDS||17845||Bulk metadata readdir/stat interface to speed up "ls -l" operations. Send back requested inode attributes for all directory entries as part of the extended dirent data. Integrate with any proposed API for this on the client. Needs Large Readdir RPCs to be efficient over the wire, since more data will be returned for every entry.|
|OST Space Management (Basic)||7||HSM, MDS, LOV||13107||Simple migration capability: transparently migrate objects/files between OSTs (blocking application writes, or aborting migration during contention); evacuate OSTs and move file data to other OSTs; add a new OST and balance data onto it. The OST doesn't really need to understand this, only the MDS (for LOV EA rewrite) and client (LOV EA rewrite). The HSM project implements layout lock support and a policy engine for automatic space management. This needs an ioctl that transparently changes an MDS inode to point to the migrated object(s) instead of the original object(s), and then schedules the old object(s) for destruction.|
|Small file IO aggregation||7||CLIO, OST||944||Small file IO aggregation (multi-object RPCs), most likely for writes first, and possibly later for reads in conjunction with statahead.|
|Version Based Recovery for delayed clients||8||recovery||10609||Complete the VBR implementation to handle delayed client recovery/reconnection. This is needed for disconnected network operation and better fault tolerance.|
|Client-side data encryption||9||VM, security||5286||Encrypt files and directories (or possibly just filenames) on the client before sending them to the server. This avoids sending unencrypted data over the network, and avoids ever having the data in plaintext on the server (as there would be if data were decrypted on arrival from the network and re-encrypted for storage on disk).|
|Ptlrpc layer rewrite||9||recovery, RPC||5286||Rewrite the Lustre RPC code to clean up the code and simplify RPC handling.|
|local object zero-copy IO||9||VFS, DLM, OST||||Efficient data IO between a client and a local OST object; optimization to support local clients. Likely implemented as a fast-path connection between the OSC and the local OFD/OSD. Read cache should be kept on the OSD instead of at the client VFS level, so that the cache can be shared among all users of this OST.|
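For the "ioctl() number cleanups" entry, the following Python sketch mirrors how the Linux _IOC()/_IOW() macros pack an ioctl command number, and shows why an argument struct whose size differs between 32- and 64-bit userspace produces two different command numbers. It assumes the asm-generic bit layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction) used by x86 and ARM; PowerPC, MIPS, and SPARC use different widths. The argument struct and its sizes are hypothetical.

```python
# asm-generic ioctl layout (x86, ARM, ...); other architectures differ.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS       # 8
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS   # 16
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS    # 30

IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2

def ioc(direction: int, ioc_type: int, nr: int, size: int) -> int:
    """Build a command number the way the kernel's _IOC() macro does."""
    if size >= 1 << IOC_SIZEBITS:
        raise ValueError("argument too large for the 14-bit size field")
    return ((direction << IOC_DIRSHIFT) | (ioc_type << IOC_TYPESHIFT)
            | (nr << IOC_NRSHIFT) | (size << IOC_SIZESHIFT))

def iow(ioc_type: int, nr: int, size: int) -> int:
    """Equivalent of _IOW(type, nr, T) where size == sizeof(T)."""
    return ioc(IOC_WRITE, ioc_type, nr, size)

def ioc_size(cmd: int) -> int:
    """Equivalent of _IOC_SIZE(cmd)."""
    return (cmd >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1)

# Hypothetical argument: struct { __u32 flags; long value; }
# 8 bytes on i386 (4-byte long, 4-byte alignment), 16 bytes on x86_64.
SIZE_ILP32, SIZE_LP64 = 8, 16
cmd32 = iow(ord("f"), 1, SIZE_ILP32)
cmd64 = iow(ord("f"), 1, SIZE_LP64)
print(hex(cmd32), hex(cmd64))  # 0x40086601 0x40106601
```

Declaring ioctl arguments with fixed-width types such as __u32/__u64 (with explicit padding) gives the same size, and therefore the same command number, for both ABIs; as the entry notes, the old numbers would still need to be accepted for some releases.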
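For the "Improve testing efficiency" entry, the basic step of reporting per-script durations can be sketched as below. The real framework is shell (acceptance-small.sh); this Python version only illustrates the bookkeeping, and the two callables are hypothetical stand-ins for running scripts such as sanity.sh and recovery-small.sh.

```python
import time
from typing import Callable, Dict, List, Tuple

Result = Tuple[str, bool, float]  # (script name, passed, seconds)

def run_suite(scripts: Dict[str, Callable[[], bool]]) -> List[Result]:
    """Run each test script in order, timing it with a monotonic clock."""
    results: List[Result] = []
    for name, run in scripts.items():
        start = time.monotonic()
        passed = run()
        results.append((name, passed, time.monotonic() - start))
    return results

def summary(results: List[Result]) -> str:
    """Format a summary with the slowest scripts first and their share of the total."""
    total = sum(seconds for _, _, seconds in results)
    lines = []
    for name, passed, seconds in sorted(results, key=lambda r: r[2], reverse=True):
        share = 100.0 * seconds / total if total else 0.0
        status = "PASS" if passed else "FAIL"
        lines.append(f"{name:<16} {status} {seconds:8.2f}s {share:5.1f}%")
    lines.append(f"{'total':<21} {total:8.2f}s")
    return "\n".join(lines)

# Hypothetical stand-ins for real test scripts.
def fake_sanity() -> bool:
    return True

def fake_recovery_small() -> bool:
    time.sleep(0.05)
    return False

results = run_suite({"sanity": fake_sanity, "recovery-small": fake_recovery_small})
print(summary(results))
```

Sorting by duration puts the scripts worth optimizing (or splitting across nodes) at the top of the report.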
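For the "fallocate() API" entry, this is the kind of space-reservation call an application would make; the project is to carry such a request from the Lustre client to the OSTs. The sketch runs against a local filesystem through Python's os.posix_fallocate() (available on Linux and other Unix systems, but not on every platform), and the temporary file path is a stand-in.

```python
import os
import tempfile
from typing import Tuple

def reserve_space(path: str, length: int) -> Tuple[int, int]:
    """Reserve `length` bytes for `path`; return (st_size, bytes allocated)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Allocates blocks for [0, length) and extends the file size if needed;
        # raises OSError (e.g. ENOSPC) if the space cannot be reserved.
        os.posix_fallocate(fd, 0, length)
        st = os.fstat(fd)
        return st.st_size, st.st_blocks * 512   # Linux counts st_blocks in 512-byte units
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmpdir:
    size, allocated = reserve_space(os.path.join(tmpdir, "reserved.dat"), 1 << 20)
    print(size, allocated)
```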
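For the "Allow 100k open files on a single client" entry, the MDS-side change is to find open file handles by cookie in a hash table instead of walking a linked list. A Python dict gives the same average O(1) behaviour; OpenHandle and Export are hypothetical stand-ins for the MDS "mfd" and the client export, and the FIDs are made up.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class OpenHandle:
    """Hypothetical stand-in for an MDS open file handle ("mfd")."""
    cookie: int
    fid: str
    flags: int

_cookies = itertools.count(1)   # cookies are unique per server in this sketch

class Export:
    """Per-client export keeping its open handles in a hash table keyed by cookie."""

    def __init__(self, client_uuid: str) -> None:
        self.client_uuid = client_uuid
        self.handles: Dict[int, OpenHandle] = {}

    def open(self, fid: str, flags: int) -> int:
        cookie = next(_cookies)
        self.handles[cookie] = OpenHandle(cookie, fid, flags)  # O(1) average insert
        return cookie

    def find(self, cookie: int) -> OpenHandle:
        return self.handles[cookie]                            # O(1) average lookup

    def close(self, cookie: int) -> None:
        del self.handles[cookie]

    def open_fids(self) -> List[str]:
        """The kind of listing a per-export /proc entry could show for debugging."""
        return [h.fid for h in self.handles.values()]

exp = Export("client-uuid-1")
cookies = [exp.open(f"[0x200000400:{i:#x}:0x0]", 0) for i in range(1, 100_001)]
exp.close(cookies[0])
print(len(exp.handles), exp.find(cookies[-1]).fid)  # 99999 [0x200000400:0x186a0:0x0]
```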
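For the "Large Readdir RPCs" entry, a back-of-the-envelope model shows why larger readdir RPCs matter most over high-latency links. It assumes 4 KiB pages, RPCs issued one after another, and latency-bound transfers (bandwidth is ignored); the 8 MiB directory, 1 MiB RPC size, and 50 ms round-trip time are hypothetical.

```python
def readdir_rpcs(dir_bytes: int, page_size: int = 4096, pages_per_rpc: int = 1) -> int:
    """Number of readdir RPCs needed to read a directory of `dir_bytes`."""
    pages = -(-dir_bytes // page_size)          # ceiling division
    return -(-pages // pages_per_rpc)

def readdir_latency(dir_bytes: int, rtt_s: float, **kwargs: int) -> float:
    """Lower bound on readdir time when each RPC costs one round trip."""
    return readdir_rpcs(dir_bytes, **kwargs) * rtt_s

DIR_BYTES = 8 << 20      # hypothetical 8 MiB directory
RTT = 0.050              # hypothetical 50 ms WAN round trip
page_at_a_time = readdir_latency(DIR_BYTES, RTT)                    # 2048 RPCs
one_mib_rpcs = readdir_latency(DIR_BYTES, RTT, pages_per_rpc=256)   # 8 RPCs
print(f"{page_at_a_time:.1f}s vs {one_mib_rpcs:.1f}s")  # 102.4s vs 0.4s
```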
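For the "Over 16TB ldiskfs filesystems" entry, the 16TB figure follows from 32-bit block numbers with 4 KiB blocks, while ext4's 64bit feature widens block numbers to 48 bits. The arithmetic is:

```python
def max_fs_bytes(block_number_bits: int, block_size: int = 4096) -> int:
    """Largest addressable filesystem size for a given block-number width."""
    return (1 << block_number_bits) * block_size

TiB, EiB = 1 << 40, 1 << 60
print(max_fs_bytes(32) // TiB, "TiB")   # 16 TiB: 2**32 blocks * 2**12 bytes
print(max_fs_bytes(48) // EiB, "EiB")   # 1 EiB: 2**48 blocks * 2**12 bytes
```

These are on-disk format limits only; as the entry notes, obdfilter/OFD and the client may impose limits of their own.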
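For the "Improve QOS Round-Robin object allocator" entry, one well-known way to keep allocation weighted yet deterministic is smooth weighted round-robin, sketched below in place of whatever scheme LOV would actually adopt. The integer weights (for example, free space in some unit, scaled down for an OST under RAID rebuild) and the OST names are hypothetical.

```python
from typing import Dict

class SmoothWeightedRR:
    """Smooth weighted round-robin: each OST is picked in proportion to its
    weight, and picks of heavy OSTs are interleaved rather than bunched."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("need at least one OST, all with positive weight")
        self.weights = dict(weights)
        self.current = {ost: 0 for ost in weights}

    def set_weight(self, ost: str, weight: int) -> None:
        """Re-weight (or add) an OST between allocations: free space, load, rebuild."""
        if weight <= 0:
            raise ValueError("weight must be positive")
        self.weights[ost] = weight
        self.current.setdefault(ost, 0)

    def next(self) -> str:
        total = sum(self.weights.values())
        for ost, weight in self.weights.items():
            self.current[ost] += weight
        best = max(self.current, key=self.current.__getitem__)  # first max wins ties
        self.current[best] -= total
        return best

alloc = SmoothWeightedRR({"OST0000": 5, "OST0001": 1, "OST0002": 1})
picks = [alloc.next() for _ in range(7)]
print(picks)
# ['OST0000', 'OST0000', 'OST0001', 'OST0000', 'OST0002', 'OST0000', 'OST0000']
```

After every sum-of-weights picks, each OST has been chosen exactly as many times as its weight, so allocation stays proportional without ever falling back to random selection.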
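For the "All RPCs pass a lock handle" entry, the sketch below shows the OST-side check: the handle in the request is looked up directly, the request is validated against the lock, and the lock already names the object, so no separate OI or inode lookup is needed. The ExtentLock fields, the simplified PR/PW mode check (other DLM modes omitted), and OstLockTable are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class ExtentLock:
    cookie: int      # the lock handle the client sends with each RPC
    client: str      # client that holds the lock
    object_id: int   # object the lock covers
    start: int       # first locked byte
    end: int         # last locked byte (inclusive)
    mode: str        # "PR" (read) or "PW" (write); other modes omitted

class OstLockTable:
    def __init__(self) -> None:
        self.by_cookie: Dict[int, ExtentLock] = {}

    def grant(self, lock: ExtentLock) -> None:
        self.by_cookie[lock.cookie] = lock

    def check_io(self, cookie: int, client: str, offset: int, length: int,
                 write: bool) -> Optional[int]:
        """Return the object id the I/O may touch, or None if it must be rejected."""
        lock = self.by_cookie.get(cookie)
        if lock is None or lock.client != client:
            return None                      # stale handle or another client's lock
        if write and lock.mode != "PW":
            return None                      # writes need a write lock
        if length <= 0 or offset < lock.start or offset + length - 1 > lock.end:
            return None                      # I/O outside the locked extent
        return lock.object_id                # lockh -> lock -> object

table = OstLockTable()
table.grant(ExtentLock(cookie=0xABC, client="c1", object_id=42,
                       start=0, end=(1 << 20) - 1, mode="PW"))
print(table.check_io(0xABC, "c1", 4096, 4096, write=True))   # 42
print(table.check_io(0xABC, "c2", 4096, 4096, write=True))   # None
```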
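For the "Readdir Object Statahead" entry, once statahead has the layout of each file, the per-object size glimpses can be issued concurrently and combined into a file size. The sketch assumes plain RAID-0 striping (object i holds stripes i, i+n, i+2n, ...) and uses a thread pool plus a hypothetical glimpse callable in place of real DLM glimpse callbacks; the tiny 4-byte stripe size only keeps the example readable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

def file_size_from_objects(object_sizes: List[int], stripe_size: int) -> int:
    """Combine per-object sizes of a RAID-0 striped file into the file size."""
    n = len(object_sizes)
    size = 0
    for i, obj_size in enumerate(object_sizes):
        if obj_size == 0:
            continue                                # object holds no data yet
        stripe_in_obj, within = divmod(obj_size - 1, stripe_size)
        file_stripe = stripe_in_obj * n + i         # stripe index within the file
        size = max(size, file_stripe * stripe_size + within + 1)
    return size

Layout = Tuple[List[str], int]   # (object names, stripe size)

def statahead_sizes(layouts: Dict[str, Layout], glimpse: Callable[[str], int],
                    workers: int = 8) -> Dict[str, int]:
    """Glimpse every object of every entry in parallel, then compute file sizes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {name: [pool.submit(glimpse, obj) for obj in objs]
                   for name, (objs, _) in layouts.items()}
        return {name: file_size_from_objects([f.result() for f in futs], layouts[name][1])
                for name, futs in pending.items()}

object_sizes = {"obj0": 6, "obj1": 4, "obj2": 0}        # hypothetical glimpse replies
layouts = {"a.dat": (["obj0", "obj1"], 4), "b.dat": (["obj2"], 4)}
print(statahead_sizes(layouts, object_sizes.__getitem__))  # {'a.dat': 10, 'b.dat': 0}
```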
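For the "Replay Signatures" entry, because the same server both signs a request at processing time and checks it at replay time, a keyed MAC can stand in for a digital signature. The sketch uses HMAC-SHA256 from the Python standard library; which fields are covered (here the request XID, transaction number, and request body) and the key handling are assumptions. In practice the key would have to survive a server restart and be shared with a failover partner.

```python
import hashlib
import hmac
import os

SERVER_KEY = os.urandom(32)   # hypothetical server secret, never sent to clients

def _message(xid: int, transno: int, body: bytes) -> bytes:
    # Fixed-width prefix so different (xid, transno, body) triples cannot collide.
    return xid.to_bytes(8, "little") + transno.to_bytes(8, "little") + body

def sign_at_processing(xid: int, transno: int, body: bytes) -> bytes:
    """Tag returned to the client with the reply and kept for possible replay."""
    return hmac.new(SERVER_KEY, _message(xid, transno, body), hashlib.sha256).digest()

def replay_is_legitimate(xid: int, transno: int, body: bytes, tag: bytes) -> bool:
    """At replay, recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_at_processing(xid, transno, body), tag)

body = b"setattr fid=[0x200000400:0x1:0x0] mode=0644"   # hypothetical request body
tag = sign_at_processing(0x1234, 77, body)
print(replay_is_legitimate(0x1234, 77, body, tag))          # True
print(replay_is_legitimate(0x1234, 78, body, tag))          # False
```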
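For the "Network Request Scheduler (NRS)" entry, one possible policy is sketched below: clients are served in round-robin order for fairness, and each client's queued requests are served in (object, offset) order so the back end sees more sequential I/O. The class and request labels are hypothetical, and dynamic tuning of RPCs in flight is not modelled.

```python
import heapq
from collections import OrderedDict
from typing import List, Tuple

Request = Tuple[int, int, str]   # (object id, file offset, request label)

class ClientRoundRobinScheduler:
    def __init__(self) -> None:
        # client -> min-heap of that client's pending requests, in rotation order
        self.queues: "OrderedDict[str, List[Request]]" = OrderedDict()

    def enqueue(self, client: str, obj: int, offset: int, label: str) -> None:
        heapq.heappush(self.queues.setdefault(client, []), (obj, offset, label))

    def dequeue(self) -> str:
        if not self.queues:
            raise IndexError("no queued requests")
        client, heap = next(iter(self.queues.items()))   # client at head of rotation
        _, _, label = heapq.heappop(heap)                # its lowest (object, offset)
        if heap:
            self.queues.move_to_end(client)              # back of the rotation
        else:
            del self.queues[client]
        return label

nrs = ClientRoundRobinScheduler()
MiB = 1 << 20
for offset in (3 * MiB, 1 * MiB, 2 * MiB):
    nrs.enqueue("client-A", 7, offset, f"A@{offset // MiB}M")
nrs.enqueue("client-B", 9, 0, "B@0M")
order = [nrs.dequeue() for _ in range(4)]
print(order)   # ['A@1M', 'B@0M', 'A@2M', 'A@3M']
```

Client B's single request is not stuck behind all of client A's, while A's writes still reach the OST in ascending offset order.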
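For the "Small file IO aggregation" entry, the client-side half of a multi-object write RPC is deciding which small writes can share one request. The greedy packer below assumes hypothetical per-RPC limits (1 MiB of data, 16 distinct objects); a single write larger than the byte limit still gets an RPC of its own and would be split by the normal bulk path.

```python
from typing import List, Set, Tuple

Write = Tuple[int, int, bytes]   # (object id, offset, data)

def pack_writes(writes: List[Write], max_rpc_bytes: int = 1 << 20,
                max_objects: int = 16) -> List[List[Write]]:
    """Greedily group small writes, in order, into multi-object RPCs."""
    rpcs: List[List[Write]] = []
    current: List[Write] = []
    current_bytes = 0
    objects: Set[int] = set()
    for write in writes:
        obj, _, data = write
        too_big = current_bytes + len(data) > max_rpc_bytes
        too_many = obj not in objects and len(objects) == max_objects
        if current and (too_big or too_many):
            rpcs.append(current)                 # close the RPC being built
            current, current_bytes, objects = [], 0, set()
        current.append(write)
        current_bytes += len(data)
        objects.add(obj)
    if current:
        rpcs.append(current)
    return rpcs

# 100 hypothetical small files, one 4 KiB write each, to distinct objects.
writes = [(obj, 0, bytes(4096)) for obj in range(100)]
rpcs = pack_writes(writes)
print(len(writes), "writes ->", len(rpcs), "RPCs")   # 100 writes -> 7 RPCs
```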