Architecture - CROW

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Definitions

CROW (CReate On Write): a technique that optimizes create performance by deferring actual OSS object creation until the first modifying event (write or setattr) occurs.

CROW Architecture

MDS_LOV	Allocate object FIDs and store them into LOV EA without object creation RPC to OST
obdfilter	Implement object creation during first write/setattr request
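
The division of labor in the table above can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not Lustre code: the class names `ToyMDS` and `ToyOST`, the tuple-shaped FIDs, and all method names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyOST:
    """Hypothetical stand-in for obdfilter: objects appear on first modification."""
    objects: dict = field(default_factory=dict)  # fid -> {"data": ..., "attrs": ...}

    def _get_or_create(self, fid):
        # CROW: the object is created lazily by the first write or setattr.
        return self.objects.setdefault(fid, {"data": bytearray(), "attrs": {}})

    def write(self, fid, offset, data):
        buf = self._get_or_create(fid)["data"]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any hole
        buf[offset:offset + len(data)] = data
        return len(data)

    def setattr(self, fid, **attrs):
        self._get_or_create(fid)["attrs"].update(attrs)


@dataclass
class ToyMDS:
    """Hypothetical stand-in for MDS_LOV: allocates FIDs, sends no create RPC."""
    lov_ea: dict = field(default_factory=dict)  # file name -> [(ost index, fid)]
    _next_oid: count = field(default_factory=lambda: count(1))

    def create(self, name, stripe_count):
        # Only the layout is recorded; no OST is contacted here.
        stripes = [(idx, (idx, next(self._next_oid))) for idx in range(stripe_count)]
        self.lov_ea[name] = stripes
        return stripes
```

In this model, calling `ToyMDS().create("file", 2)` returns two stripe entries while every `ToyOST.objects` dictionary stays empty; an object shows up only after the first `write` or `setattr` against its FID.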

Use Cases

Summary

id	quality attribute	summary
object_creation	performance, usability	Save only object FIDs in the EA during the create operation on the MDS. Create objects on OSTs at the first write/setattr. Do not use the precreation mechanism.
wrong_creation	usability	Attempts to re-create just-destroyed objects are properly recognized, and the objects are not re-created.
objects	usability	The MD object can exist with stripe info but without the OSS objects themselves.
fid	usability	Uniform FIDs are used to identify objects. MD FIDs and OSS FIDs are cross-referenced using the EA.
cache	availability, usability	Client allocates both MD and OSS FIDs before doing create request to MDS.
recovery	availability, usability	Recoverable through re-creation of lost objects and/or deletion of orphaned objects in case of single-point failures.
cmd	usability	CMD is supported.
tests	testability	sanity and recovery tests.
quota	security, usability	compatible with quota.
scalability	availability, usability	more scalable than pre-creation mechanism
inodes_reservation	usability	OST should have enough inodes for all allocated FIDs

wrong_creation

Scenario:		OST objects that have been unlinked are re-created by a late setattr.
Business Goals:		Avoid re-creation of objects on OST
Relevant QA's:		Usability & availability
details	Stimulus:	OST objects are unlinked, then a late setattr or write arrives and creates the objects again.
	Stimulus source:	Object destroy and setattr come to the OST from different nodes, so a setattr can arrive late after a network or server failure and the subsequent recovery.
	Environment:	MDS, OST
	Artifact:	There is no way to know whether the object was already destroyed or just not created yet.
	Response:	Determine the state of a non-existent OST object (not yet created, or already destroyed), or serialize setattr against unlink cluster-wide.
	Response measure:	Objects are not re-created.
Questions:		No.
Issues:		There is no clear understanding yet of how to achieve the goal. The key may be the MDS FID of the object: if it exists, the OST objects have not been created yet; otherwise they were already destroyed. Alternatively, serialize setattr vs. unlink on the MDS? (bzzz)
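
One of the ideas raised under Issues, using the MD FID as the key, can be sketched as follows. The page itself says the approach is not settled, so this is only an illustration: the function names, the `live_md_fids` set, and the assumption that the OST can somehow learn whether the MD FID still exists are all hypothetical.

```python
from enum import Enum


class ObjectState(Enum):
    EXISTS = "exists"
    NOT_YET_CREATED = "not yet created"
    DESTROYED = "destroyed"


def classify(ost_objects, live_md_fids, ost_fid, md_fid):
    """Classify an OST object, using the parent MD FID as the key."""
    if ost_fid in ost_objects:
        return ObjectState.EXISTS
    if md_fid in live_md_fids:
        # The file still exists on the MDS, so creation was only deferred.
        return ObjectState.NOT_YET_CREATED
    # The MD FID is gone: the file was unlinked and the object destroyed.
    return ObjectState.DESTROYED


def handle_setattr(ost_objects, live_md_fids, ost_fid, md_fid, attrs):
    """Apply a setattr, creating the object only if it was never created."""
    state = classify(ost_objects, live_md_fids, ost_fid, md_fid)
    if state is ObjectState.DESTROYED:
        return False  # late setattr after unlink: do not re-create
    ost_objects.setdefault(ost_fid, {}).update(attrs)
    return True
```

The sketch leaves open how the OST would obtain `live_md_fids` without races, which is exactly the open question; the alternative mentioned above, serializing setattr against unlink, is not modeled here.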

cache

Scenario:		OST object FIDs allocated on the MDS can become available later than the client needs them.
Business Goals:		Caching MDS and disconnected operations should work
Relevant QA's:		Usability
details	Stimulus:	Although the creation of OST objects is postponed, their FIDs should be allocated during create and saved on the MDS in the LOV EA. Doing the allocation on the MDS can cause problems.
	Stimulus source:	Caching MDS, disconnected operations
	Environment:	Client
	Artifact:	A caching MD or disconnected operation can send requests to the MDS with a delay, doing the MDS's job in the meantime. Meanwhile, the client needs the OST object FIDs without delay in order to work with the OSTs.
	Response:	Allocate OST FIDs on the client during create and pass them to the MDS along with the create request in the LOV EA data.
	Response measure:	Client has valid LOV EA with OST object FIDs right after create operation.
Questions:		If the clients are doing OST object creation before notifying the MDS then there is no way for the MDS/OST to clean up orphan objects if the client crashes before sending LOV EA to the MDS. Possibly the MDS would need to track the last-used objid for each sequence, and clients need to flush files+LOV_EAs to the MDS in objid order?
Issues:		No.
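
The Response and the open question above can be sketched together: the client allocates OST FIDs itself, and the MDS records the last-used objid per sequence so that objects beyond it can be treated as orphan candidates. This is a minimal sketch assuming FIDs are plain `(sequence, objid)` tuples; `ClientFidAllocator` and `MdsSequenceTracker` are hypothetical names, not Lustre interfaces.

```python
class ClientFidAllocator:
    """Hypothetical client-side allocator drawing object ids from one sequence."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.next_objid = 1

    def allocate(self, stripe_count):
        # FIDs are available immediately, without waiting for the MDS.
        fids = [(self.sequence, self.next_objid + i) for i in range(stripe_count)]
        self.next_objid += stripe_count
        return fids


class MdsSequenceTracker:
    """Hypothetical MDS-side record of the last objid seen per sequence."""

    def __init__(self):
        self.last_used = {}  # sequence -> highest objid received in a LOV EA

    def create(self, name, lov_ea):
        # Clients are assumed to flush files and LOV EAs in objid order.
        for seq, objid in lov_ea:
            if objid <= self.last_used.get(seq, 0):
                raise ValueError("LOV EA flushed out of objid order")
            self.last_used[seq] = objid

    def orphan_candidates(self, seq, objids_on_ost):
        # Objects beyond the last objid the MDS saw were never reported to it.
        last = self.last_used.get(seq, 0)
        return sorted(o for o in objids_on_ost if o > last)
```

If a client crashes after writing to objects 3 and 4 of a sequence but before sending their LOV EA, `orphan_candidates` reports them because the MDS only ever saw objid 2 for that sequence.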

inodes_reservation

Scenario:		FIDs are allocated during create(), but inodes on the OST are created later, so there may be no free inodes left for already allocated FIDs.
Business Goals:		Every object FID should have an inode.
Relevant QA's:		Usability
details	Stimulus:	The object FIDs are allocated and stored in LOV EA on MDS but there are no free inodes at the moment of write/setattr on OST.
	Stimulus source:	Applications
	Environment:	OST
	Artifact:	FID allocation happens earlier than inode allocation on the OST.
	Response:	Reserve OST inodes for future FID allocations.
	Response measure:	Every allocated object FID gets an inode on the OST, or the allocation fails.
Questions:		The DMU does not have a (practical) inode count limit like ldiskfs does, but will return ENOSPC when there is no free space left to create a new inode. This is equivalent to ENOSPC due to no free space for data, so maybe no reservation is needed in this case.
Issues:		No.
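
The reservation idea in the Response can be sketched as a simple counter that fails FID allocation up front when no inode can be guaranteed. This is an illustration only: `InodeReservation` and its methods are hypothetical, and the sketch assumes an ldiskfs-style fixed inode count rather than the DMU case discussed under Questions.

```python
import errno
import os


class InodeReservation:
    """Hypothetical per-OST counter that reserves an inode for each allocated FID."""

    def __init__(self, free_inodes):
        self.free = free_inodes  # inodes not yet consumed by real objects
        self.reserved = 0        # inodes promised to allocated FIDs

    def allocate_fid(self):
        # Fail the allocation now rather than at the first write/setattr.
        if self.reserved >= self.free:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.reserved += 1

    def create_object(self):
        # Consume a reservation when the deferred creation finally happens.
        if self.reserved == 0:
            raise RuntimeError("no reservation for this object")
        self.reserved -= 1
        self.free -= 1
```

With two free inodes, a third `allocate_fid()` call raises `OSError` with `ENOSPC`, which matches the response measure: an allocated FID either gets an inode or the allocation fails.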

Implementation Constraints

1. Use existing API and protocols

WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.
