WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Architecture - ZFS large dnodes


Definitions

dnode
DMU node; 512 bytes in the original ZFS implementation, including a "bonus buffer" in which users can store extra data
EA
extended attribute, generally of limited size (4kB or less); not to be confused with the ZFS 'xattr', which is more like a named byte stream of arbitrary size
znode
ZPL-format dnode, uses bonus buffer to store POSIX data for ZFS
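
To make the sizes above concrete, the following sketch computes how much bonus buffer space a dnode of a given size would offer. The 64-byte fixed header and 128-byte block pointer reflect the classic 512-byte on-disk layout, which leaves up to 320 bytes of bonus buffer when a single block pointer is used. Treating larger dnodes as multiples of 512 bytes with the same fixed overhead is an assumption made for illustration, and the helper name `bonus_space` is hypothetical.

```python
# Sizes of the classic on-disk dnode layout, in bytes.
DNODE_CORE_SIZE = 64      # fixed dnode header fields
BLKPTR_SIZE = 128         # one block pointer
LEGACY_DNODE_SIZE = 512   # original dnode size


def bonus_space(dnode_size, nblkptr=1):
    """Return the bytes left for the bonus buffer (hypothetical helper).

    Assumption: a larger dnode is a multiple of 512 bytes and keeps the
    same header and block pointer overhead, so all of the extra space
    becomes bonus buffer.
    """
    if dnode_size < LEGACY_DNODE_SIZE or dnode_size % LEGACY_DNODE_SIZE:
        raise ValueError("dnode size must be a positive multiple of 512")
    if nblkptr < 1:
        raise ValueError("a dnode needs at least one block pointer")
    space = dnode_size - DNODE_CORE_SIZE - nblkptr * BLKPTR_SIZE
    if space < 0:
        raise ValueError("block pointers do not fit in this dnode size")
    return space


if __name__ == "__main__":
    for size in (512, 1024, 2048, 4096):
        print(size, bonus_space(size))
```

Under these assumptions, doubling the dnode to 1024 bytes more than doubles the bonus space, because the fixed overhead is paid only once.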

Use Cases

id: large_dnode
quality attribute: performance
summary: increased dnode size to allow more data in the inode

EA in large dnode

Scenario: Storage of EA in dnode
Business Goals: Fast access to Lustre EA values
Relevant QAs: Performance
Details:
Stimulus: EA needs to be stored in a ZFS-format dnode
Stimulus source: Lustre OSD (MDT/OST) storing EA data to object
Environment: EA being stored on a specific znode within a transaction
Artifact: EA is stored in the znode
Response: time to store EA data
Response measure: time is not significantly more than a znode update without an EA, and much less than storing the data in a ZFS xattr
Questions:
Issues: None.

Implementation constraints

The DMU must work with both larger and original-size dnodes in the same pool. There is currently no requirement for dynamic dnode sizes within a single filesystem.

ZFS must be able to work on a filesystem with large dnodes, even if it cannot initially access the extra dnode space.

The 'zfs' tool must allow the dnode size to be specified for a filesystem at creation time.

The EA is stored in a microzap (as described in ZFS_Microzap) to allow efficient access and retrieval, and to provide flexibility.

Questions and Issues

How do we handle the case where EAs overflow the available space in the dnode? Strong consideration should be given to allowing EAs to (also?) be stored in an external block or in a protected EA xattr file.
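
One way to frame the overflow question is as a placement policy: keep the EA in the dnode when it fits, and fall back to external storage otherwise. The sketch below is a hypothetical illustration only. The names `place_ea`, `IN_DNODE`, and `EXTERNAL`, and the per-entry overhead value, are assumptions and not part of any ZFS or Lustre interface.

```python
IN_DNODE = "in_dnode"
EXTERNAL = "external"   # external block or protected EA xattr file


def place_ea(free_bonus_bytes, name, value, entry_overhead=16):
    """Decide where an EA would live; return (placement, bytes_needed).

    entry_overhead is an assumed per-entry bookkeeping cost in bytes,
    chosen only for illustration.
    """
    needed = len(name.encode("utf-8")) + len(value) + entry_overhead
    if needed <= free_bonus_bytes:
        return IN_DNODE, needed
    return EXTERNAL, needed


if __name__ == "__main__":
    # A small EA fits in the free bonus space; a 4 kB EA does not.
    print(place_ea(320, "trusted.lov", b"\0" * 56))
    print(place_ea(320, "trusted.lov", b"\0" * 4096))
```

A policy like this keeps the common case (small EAs) on the fast path described in the use case, while still allowing the full 4kB EA size at the cost of an extra block or xattr access.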

References

http://www.opensolaris.org/jive/thread.jspa?threadID=39817&tstart=15