WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Lustre Project List

Revision as of 14:36, 9 September 2010

List of Lustre Features and Projects

Below is a list of Lustre features and projects that are just waiting for someone to start working on them. They are listed roughly in order of increasing complexity, but this is highly dependent upon the coding skills of the developer and their familiarity with the Lustre code base.

After you have chosen a project, or if you are having trouble deciding what to work on, please contact the lustre-devel mailing list to discuss your project with the Lustre developers. This will ensure that your work is in line with other plans/projects for Lustre, and that nobody else is already working on the same thing.

{|
!Feature
!Complexity
!Required skills
!Tracking Bug
!Brief Description
|-
|ioctl() number cleanups
|1
|kernel
|[https://bugzilla.lustre.org/show_bug.cgi?id=20731 20731]
|<small>Clean up the Linux IOC numbering to properly use the "size" field so that mixed 32- and 64-bit kernel/userspace ioctls work correctly. Attention needs to be paid to maintaining userspace compatibility for a number of releases, so the old ioctl() numbers cannot simply be removed. (A sketch of size-encoded ioctl numbering appears below the table.)</small>
|-
|Improve testing efficiency
|3
|shell, test
|
|<small>Improve the performance, efficiency, and coverage of the acceptance-small.sh test scripts. As a basic step, printing the duration of each test script in the acceptance-small.sh test summary would show where the testing time is being spent. More advanced work includes improved test scheduling and dynamic cluster configuration to allow more efficient utilization of available test nodes. Virtual machines could be used for functional tests instead of real nodes.</small>
|-
|Config save/edit/restore
|3
|MGS, llog, config
|[https://bugzilla.lustre.org/show_bug.cgi?id=17094 17094]
|<small>Need to be able to backup/edit/restore the client/MDS/OSS config llog files after a writeconf. One reason is config recovery if the config llog becomes corrupted. Another is that all of the filesystem tunable parameters (including all of the OST pool definitions) are stored in the config llog and are lost if a writeconf is done. Being able to dump the config log to a plain-text file, edit it, and then restore it would make administration considerably easier.</small>
|-
|mdd-survey tools for performance analysis
|3
|obdfilter-survey, mdd, benchmarking
|[https://bugzilla.lustre.org/show_bug.cgi?id=21658 21658]
|<small>Add a low-level metadata unit test to allow measuring the performance of the metadata stack without having connected clients, similar to and/or integrated with the obdfilter-survey (echo client, echo server).</small>
|-
|Allow 100k open files on a single client
|4
|client, MDS
|
|<small>Allow 100k open files per client. Fix the client to not store committed open RPCs in the resend list, but instead reopen files from the file handles upon recovery (see Simplified Interoperability) to avoid O(n) behaviour when adding new RPCs to the RPCs-for-recovery list on the client. Fix the MDS to store "mfd" in a hash table instead of a linked list to avoid O(n) behaviour when searching for an open file handle. (A hash-lookup sketch appears below the table.) For debugging it would be useful to have a /proc entry on the MDS showing the open FIDs for each client export.</small>
|-
|Error message improvements
|4
|core, operations
|
|<small>Review and improve the Lustre error messages to be more useful. A larger project is to change the core Lustre error message handling to generate better-structured error messages so that they can be parsed/managed more easily.</small>
|-
|Client under memory pressure
|4
|client, VFS, MM
|
|<small>Fix the client to work well under memory pressure, so that it avoids deadlocks during allocation and is able to continue processing RPCs, reduce caches, and free memory. This is a prerequisite for Swap on Lustre.</small>
|-
|Readdir with large read RPCs
|4
|MDS, RPC
|[https://bugzilla.lustre.org/show_bug.cgi?id=17833 17833]
|<small>Read directory pages in large chunks instead of the current page-at-a-time reads from the client. This will improve readdir performance somewhat and reduce load on the MDS; the improvement is expected to be significant over high-latency WAN links.</small>
|-
|32TB ldiskfs filesystems
|4
|ldiskfs, obdfilter
|[https://bugzilla.lustre.org/show_bug.cgi?id=20063 20063]
|<small>Support single OST sizes larger than 16TB. This is largely supported in newer ext4 filesystems (e.g. RHEL5.4, RHEL6), but thorough testing and some bug-fixing work may be needed in obdfilter (1.8, 2.0) or OFD (2.x), and other work may be needed in the client (all versions).</small>
|-
|Client subdirectory mounts
|4
|VFS, MDS
|[https://bugzilla.lustre.org/show_bug.cgi?id=15267 15267]
|<small>Mount a subdirectory of a filesystem from the client instead of the root.</small>
|-
|Implement a distributed snapshot mechanism
|5
|MDS, OST, RPC
|[https://bugzilla.lustre.org/show_bug.cgi?id=14124 14124]
|<small>Implement a distributed snapshot mechanism, initially with only loosely synchronized operations (possibly ordered between MDS and OSS), or blocking the whole filesystem while a consistent snapshot is created. After the snapshot has been created, modify the fsname of the MDT and OSTs so that it can be mounted separately.</small>
|-
|Improve QOS Round-Robin object allocator
|5
|MDS, LOV
|[https://bugzilla.lustre.org/show_bug.cgi?id=18547 18547]
|<small>Improve the LOV QOS allocator to always do weighted round-robin allocation, instead of degrading into weighted random allocations once the OST free space becomes imbalanced. This evens out allocations continuously, avoids crazy/bad OST allocation imbalances when QOS becomes active, and allows adding weighting for things like current load, OST RAID rebuild, etc. (A weighted round-robin sketch appears below the table.)</small>
|-
|All RPCs pass a lock handle
|5
|DLM, RPC
|[https://bugzilla.lustre.org/show_bug.cgi?id=22849 22849]
|<small>For protocol correctness and improved performance, it would be desirable for all RPCs that are done with a client lock held to send the lock handle along with the request. For OST requests this means all read, write, and truncate operations (unless "lockless") should include a lock handle. This allows the OST to validate that the request is being done by a client that holds the correct locks, and allows lockh->lock->object lookups to avoid OI or inode lookups in most cases.</small>
|-
|Readdir Object Statahead
|5
|VFS, DLM
|[https://bugzilla.lustre.org/show_bug.cgi?id=18526 18526]
|<small>Enhance the current statahead to glimpse objects asynchronously once the inode statahead has returned the layout information. The preferred solution is readdir+ or SOM, but this could help in the short term; it would still be useful for open files, and since it does not affect the network protocol it could be removed once those features are available.</small>
|-
|Imperative recovery
|6
|recovery, RPC
|[https://bugzilla.lustre.org/show_bug.cgi?id=18767 18767]
|<small>Reduce recovery time by having the server notify clients after recovery has completed, instead of waiting for the client to time out the RPC before it begins recovery.</small>
|-
|Simplified Interoperability
|6
|RPC, VFS
|[https://bugzilla.lustre.org/show_bug.cgi?id=18496 18496]
|<small>Clean up client state before a server upgrade to minimize or eliminate the need for message-format interoperability. The client only needs to track open files; all other state (locks, cached pages, etc.) can be dropped and re-fetched as needed from the server. Change client recovery to re-open files from open file handles instead of from saved RPCs.</small>
|-
|Enhanced OST Pools Support
|6
|MDS, LOV
|
|<small>Improve OST pools support to allow mandatory OST enforcement (i.e. only allow specific users to access certain pools, including the default "all OSTs" pool) and more complex policy specification (e.g. selecting a fallback pool on ENOSPC). Allow default initial file placement policies (e.g. server pool, stripe width) to be defined based on cluster membership (NID, UID, GID).</small>
|-
|Replay Signatures
|6
|RPC, recovery
|[https://bugzilla.lustre.org/show_bug.cgi?id=18547 18547]
|<small>Allow the MDS/OSS to determine whether a client can legitimately replay an RPC, by digitally signing the RPC at processing time and verifying the signature at replay time.</small>
|-
|Network Request Scheduler (NRS)
|6
|RPC, OST, benchmarking
|[https://bugzilla.lustre.org/show_bug.cgi?id=13634 13634]
|<small>Order IO (and possibly metadata) requests by client, file offset, priority, etc. in order to improve overall back-end efficiency and/or provide QOS to clients. Dynamically change the number of RPCs in flight for each client to balance the RPC traffic at the server. Previous research done by Sun shows this can significantly improve overall performance. (A request-ordering sketch appears below the table.)</small>
|-
|Lustre Block Device
|6
|VFS, LOV
|[https://bugzilla.lustre.org/show_bug.cgi?id=5498 5498]
|<small>The Lustre object lloop driver exports a block device to userspace, bypassing the filesystem. The code partly works and is part of 1.6.4+, but it has correctness issues and potential performance problems, and it needs to be ported to newer kernels.</small>
|-
|Swap on Lustre
|7
|VFS, VM
|[https://bugzilla.lustre.org/show_bug.cgi?id=5498 5498]
|<small>Depends on the Lustre Block Device. It has problems when working under memory pressure, which makes it mostly useless until those problems are fixed (see "Client under memory pressure" above).</small>
|-
|OST Space Management (Basic)
|7
|HSM, MDS, LOV
|[https://bugzilla.lustre.org/show_bug.cgi?id=13107 13107]
|<small>Simple migration capability: transparently migrate objects/files between OSTs (blocking application writes, or aborting the migration during contention); evacuate OSTs and move file data to other OSTs; add a new OST and balance data onto it. The OST doesn't really need to understand this, only the MDS and the client (for the LOV EA rewrite). The HSM project implements the layout lock support and policy engine for automatic space management. An ioctl is also needed that transparently changes an MDS inode to point at the migrated object(s) instead of the original object(s) and then schedules the old object(s) for destruction.</small>
|}
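
Illustrative Code Sketches

The sketches below are illustrations only, not Lustre source code; every structure, function, and constant name in them is invented for the example.

For the ioctl() number cleanups project, this is a minimal sketch of size-encoded ioctl numbering using the standard Linux _IOC/_IOWR macros, assuming a hypothetical demo_fid_info argument structure:

<pre>
/*
 * Hypothetical example of "size-encoded" ioctl numbering using the
 * standard Linux _IOR/_IOW/_IOWR macros.  These macros pack the
 * direction, a magic byte, a command number, and sizeof(arg) into the
 * ioctl number, so a userspace binary passing a differently sized
 * struct to the kernel is rejected instead of misinterpreted.
 */
#include <linux/ioctl.h>
#include <stdint.h>

/* Fixed-size types keep the struct layout identical for 32-bit and
 * 64-bit userspace (no long, no pointers). */
struct demo_fid_info {
	uint64_t seq;	/* FID sequence */
	uint32_t oid;	/* FID object id */
	uint32_t ver;	/* FID version */
};

#define DEMO_IOC_MAGIC	'f'

/* New style: sizeof(struct demo_fid_info) is encoded in the number. */
#define DEMO_IOC_GETFID		_IOWR(DEMO_IOC_MAGIC, 1, struct demo_fid_info)

/* Old style to be cleaned up: size field left as 0, so the kernel
 * cannot verify the user buffer size and 32/64-bit mixes break. */
#define DEMO_IOC_GETFID_OLD	_IOC(_IOC_READ|_IOC_WRITE, DEMO_IOC_MAGIC, 1, 0)
</pre>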
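
For the "Allow 100k open files on a single client" project, this is a rough userspace sketch, again with invented names, of keeping the MDS open-file handles ("mfd") in a hash table keyed by the handle cookie, so a lookup is expected O(1) instead of an O(n) walk of a 100k-entry list:

<pre>
/*
 * Illustrative sketch: open file handles hashed by cookie instead of
 * kept on a single linked list.  Not the actual MDS data structures.
 */
#include <stdint.h>
#include <stdlib.h>

#define MFD_HASH_BITS	10
#define MFD_HASH_SIZE	(1U << MFD_HASH_BITS)

struct mfd {			/* one open file handle on the MDS */
	uint64_t	 cookie;	/* handle cookie given to the client */
	struct mfd	*next;		/* hash chain linkage */
};

static struct mfd *mfd_hash[MFD_HASH_SIZE];

static unsigned int mfd_hash_fn(uint64_t cookie)
{
	/* Fibonacci hashing: multiply by 2^64/phi, keep the top bits. */
	return (cookie * 0x9e3779b97f4a7c15ULL) >> (64 - MFD_HASH_BITS);
}

void mfd_insert(struct mfd *m)
{
	unsigned int i = mfd_hash_fn(m->cookie);

	m->next = mfd_hash[i];
	mfd_hash[i] = m;
}

/* Expected O(1): only handles that collide in this bucket are
 * scanned, instead of every open file on the export. */
struct mfd *mfd_lookup(uint64_t cookie)
{
	struct mfd *m;

	for (m = mfd_hash[mfd_hash_fn(cookie)]; m != NULL; m = m->next)
		if (m->cookie == cookie)
			return m;
	return NULL;
}
</pre>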
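
For the "Improve QOS Round-Robin object allocator" project, this toy program shows one way weighted allocation can stay deterministic: a "smooth" weighted round-robin, where the weight is assumed (for illustration) to be derived from per-OST free space. Allocations stay proportional to weight and evenly interleaved, rather than degrading into weighted random selection:

<pre>
/*
 * Toy "smooth" weighted round-robin over OSTs.  Illustrative only;
 * the real LOV QOS allocator tracks much more state.
 */
#include <stdio.h>

struct ost {
	const char *name;
	long weight;	/* e.g. derived from free space */
	long current;	/* running round-robin counter */
};

static struct ost *wrr_pick(struct ost *osts, int n)
{
	struct ost *best = NULL;
	long total = 0;
	int i;

	for (i = 0; i < n; i++) {
		osts[i].current += osts[i].weight;
		total += osts[i].weight;
		if (best == NULL || osts[i].current > best->current)
			best = &osts[i];
	}
	best->current -= total;	/* push the winner back down */
	return best;
}

int main(void)
{
	/* OST0 has twice the free space of OST1/OST2, so it should
	 * receive about half of the object allocations. */
	struct ost osts[] = {
		{ "OST0", 2, 0 }, { "OST1", 1, 0 }, { "OST2", 1, 0 },
	};
	int i;

	for (i = 0; i < 8; i++)
		printf("%s ", wrr_pick(osts, 3)->name);
	/* Prints: OST0 OST1 OST2 OST0 OST0 OST1 OST2 OST0 */
	printf("\n");
	return 0;
}
</pre>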
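
For the Network Request Scheduler project, this fragment sketches the core ordering idea: sort one batch of queued IO requests by priority, then by client, then by file offset, so the back end sees mostly sequential IO per client. All names and fields are invented for illustration:

<pre>
/*
 * Illustrative NRS-style batch ordering: priority first (QOS), then
 * group by client, then ascending file offset (seek order).
 */
#include <stdint.h>
#include <stdlib.h>

struct io_req {
	uint32_t client_id;	/* which client export sent it */
	uint64_t offset;	/* file offset of the IO */
	int	 priority;	/* higher values dispatch first */
};

static int req_cmp(const void *a, const void *b)
{
	const struct io_req *x = a, *y = b;

	if (x->priority != y->priority)
		return y->priority - x->priority;
	if (x->client_id != y->client_id)
		return x->client_id < y->client_id ? -1 : 1;
	if (x->offset != y->offset)
		return x->offset < y->offset ? -1 : 1;
	return 0;
}

/* Order one batch of queued requests before dispatching to disk. */
void nrs_order_batch(struct io_req *reqs, size_t n)
{
	qsort(reqs, n, sizeof(*reqs), req_cmp);
}
</pre>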