WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.
Architecture - MDS striping format
Revision as of 16:01, 18 January 2010
Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
Striping Description
In a Lustre file system, metadata describing where data is stored on object storage targets (OSTs) is defined in extended attributes (EAs) on the metadata server (MDS). This information, called the “striping EA”, is described in detail below. Also described is a set of APIs provided with Lustre that allow modules and applications to manipulate the striping EA.
This architecture page is available as a .pdf file at this link: Managing Lustre Data Striping
Striping Extended Attributes
In a Lustre file system, metadata and data are stored separately, on the metadata server (MDS) and on object storage targets (OSTs), respectively. When accessing a file, the client obtains data location information from the MDS. The location information indicates how the file is striped across the OSTs. Since this information is stored in the extended attributes of each inode on the MDS, it is called the “striping EA.” The status of the striping EA may be in-disk, in-memory (kernel mode inside Lustre), or in-application (striping EA in a user-level application). Each status corresponds to a different format.
Lustre provides a set of APIs for other modules or applications to use to manipulate a striping EA. Below are a few examples showing how the striping EA is used by other Lustre modules.
Use Case
id | quality attribute | summary |
---|---|---|
create-file | usability | The client creates a file. |
unlink-file | usability | The client unlinks a file. |
lfs-setstripe | usability | The client creates a file with a specified stripe EA. |
MPI-LIB | usability | The MPI opens or creates a file with a specified stripe EA. |
copy-file | usability | Copy files from Lustre to another filesystem (QFS, pNFS or GPFS), while retaining the same striping information. |
Quality Attribute Scenarios
- create-file
Scenario: | Client creates a new file. | |
Business Goals: | Ensure that the basic POSIX function works. | |
Relevant QAs: | Usability | |
Details | Stimulus: | Create a file |
Stimulus source: | Client application | |
Environment: | Lustre-mounted client | |
Striping API usages: | The client sends a “create” request to the MDS. The MDS calls the striping API to distribute the “create” request to the OSTs to create the data objects. The striping information is then returned to the MDS. The MDS calls the striping API again to convert the striping information to the appropriate disk format and places it into the EA of the metadata object. |
- unlink-file
Scenario: | Client unlinks a file. | |
Business Goals: | Ensure that the basic POSIX function works. | |
Relevant QAs: | Usability | |
Details | Stimulus: | Unlink a file |
Stimulus source: | Client application | |
Environment: | Lustre-mounted client | |
Striping API usages: | A client sends an unlink request to the MDS. The MDS unlinks the metadata object and logs the action in the unlink log. The client then calls the striping API to locate the object on the OST and sends the unlink request to the OST. After the data objects of the OST are removed, the callback mechanism tells the MDS to remove the unlink log. |
- lfs-setstripe
Scenario: | Client opens/creates a file with a specified striping EA. | |
Business Goals: | Tune striping to meet user requirements. | |
Relevant QAs: | Usability | |
Details | Stimulus: | Execute lfs setstripe. |
Stimulus source: | lfs setstripe and lfs getstripe utilities. Lustre also provides several lfs utilities to end users to set or get the striping information for a regular file or directory. | |
Environment: | Lustre-mounted client | |
Striping API usages: | In the current Lustre release, the striping EA of a regular file can only be set when it is opened or written the first time. So executing lfs-setstripe implies opening or creating the file with a specific striping EA.
Note: Limits for stripe settings are:
|
- MPI-LIB
Scenario: | Client opens/creates a file with a specified striping EA in MPI-LIB | |
Business Goals: | Enable MPI-LIB (Lustre ADIO driver) to execute lfs-setstripe directly. | |
Relevant QAs: | Usability | |
Details | Stimulus: | Use MPI_open/create with stripe hints to open or create a file |
Stimulus source: | MPI-LIB + Lustre ADIO driver | |
Environment: | Lustre-mounted client and MPI environment | |
Striping API usages: | The MPI uses the striping API only in MPI_Open (in the Lustre ADIO driver), where it may be necessary to open/create a file with a certain striping EA. The MPI programmer can set the striping EA using a hint. Below is an example showing how IOR is used to set a striping EA.
The setting process is almost the same as for lfs-setstripe, but with one difference. In MPI, the ioctl system call is used directly to set the striping EA, instead of using an API from the Lustre user API lib, to avoid linking the unnecessary lib when building the MPI + Lustre ADIO driver. |
- copy-file
Scenario: | Copy files from Lustre to another filesystem (QFS, pNFS or GPFS). | |
Business Goals: | Copying files between Lustre and other filesystems (QFS, pNFS and GPFS), while retaining striping information without manual user intervention. | |
Relevant QAs: | Usability | |
Details | Stimulus: | Copy files from the Lustre file system to another file system but keep the same striping pattern. |
Stimulus source: | Copy filesystem tool, GNU tar (gtar) is used to specify user-level Lustre striping. | |
Environment: | Lustre filesystem. Striping information for the Lustre and QFS filesystems is similar enough that the user-level tool, GNU tar (gtar) can convert one to the other. | |
Striping API usages: | Lustre provides an updated version of the GNU tar (gtar) backup tool that enables a complete Lustre file system to be restored with the same striping pattern as before. Gtar can also be used in the copy process.
For example, when file A is copied, gtar first calls the Lustre user-level striping API to extract the striping EA of file A from the MDS (in-application format). Then gtar starts to copy file A to the other file system (e.g. QFS). Gtar creates a file on the target file system (possibly by using mknod) and sets the striping EA on that file. Since the striping format for these two file systems is very similar, gtar should not change the striping EA or should make only minor modifications. Finally, gtar copies file A to the target file system according to the defined striping EA format. |
Striping Format
The striping EA status designates three striping EA formats:
- In-disk format (lov_mds_md) – Used when the striping EA is stored in disk.
- In-memory format (lov_stripe_md) – Used when the striping EA is being read out from the disk and unpacked.
- User format (lov_user_md) – Used when the striping EA is retrieved by the application and ready to output to the end user.
Independent of the format, all striping EAs consist primarily of two parts:
- Public – Applies to all the OSTs on which the file is located. Indicates how the file is striped over the OSTs.
- Private – An array in which each array item corresponds to one OST. Each array item specifies the OST index and data object ID within it.
When mapping a file offset to the corresponding offset within an OST object, Lustre computes the OST array index from the file offset, stripe size and stripe count. It then looks up that entry in the private OST array to obtain the OST index and object ID.
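The mapping described above can be sketched for the RAID-0 pattern as follows. This is a minimal user-space illustration, not the Lustre implementation: the type `stripe_loc` and the function `map_file_offset` are hypothetical names introduced here, and the sketch assumes that stripes are laid out round-robin over the entries of the private OST array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which entry of the private OST array holds the
 * byte, and where the byte sits inside that data object. */
struct stripe_loc {
    uint32_t array_idx;   /* index into the private OST array */
    uint64_t obj_off;     /* offset inside the data object */
};

/* Map a file offset to (array index, object offset), assuming RAID-0. */
static struct stripe_loc map_file_offset(uint64_t file_off,
                                         uint32_t stripe_size,
                                         uint32_t stripe_count)
{
    assert(stripe_size > 0 && stripe_count > 0);

    uint64_t stripe_width = (uint64_t)stripe_size * stripe_count;
    uint64_t full_rounds  = file_off / stripe_width;  /* complete passes over all objects */
    uint64_t in_round     = file_off % stripe_width;  /* position inside the current pass */

    struct stripe_loc loc;
    loc.array_idx = (uint32_t)(in_round / stripe_size);
    loc.obj_off   = full_rounds * stripe_size + in_round % stripe_size;
    return loc;
}
```

With a 1 MiB stripe size and a stripe count of 4, file offset 5 MiB + 10 falls in the second pass over the objects, so it maps to array entry 1 at object offset 1 MiB + 10. The OST index and object ID would then be read from that array entry.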
Striping Disk format
Two striping disk formats are available: normal striping format for a normal file and joined striping format for a joined file.
Normal Striping EA formats
The two parts of the normal striping EA, lov_mds_md (public) and lov_ost_data (OST private) are described below.
struct lov_mds_md {
    /* LOV_MDS_MD */
    __u32 lmm_magic;
    __u32 lmm_pattern;
    __u64 lmm_object_id;
    __u64 lmm_object_gr;
    __u32 lmm_stripe_size;
    __u32 lmm_stripe_count;
    /* LOV_OST_DATA */
    struct lov_ost_data lmm_objects[0];
};
ID | Description |
LOV_MDS_MD | Striping information |
LOV_OST_DATA[ ] | Location information for the objects. Each OST for this object corresponds to an entry in the array. |
LOV_MDS_MD
name | size | description |
lmm_magic | 32 bits | Normal file (0x0BD10BD0) |
lmm_pattern | 32 bits | Stripe pattern: RAID-0, RAID-1 or other network striping pattern. Only the RAID-0 pattern is currently supported. |
lmm_object_id | 64 bits | Object ID on the MDS, which is the inode number (ino) of the object on the MDS. |
lmm_object_gr | 64 bits | For a directory, the object group number is used to determine if the striping EA for the directory is the default striping EA or a striping EA specified by lfs setstripe. For a file, the object group number is currently unused, but, in future releases, it will be used to identify groups of objects in a cluster metadata (CMD) environment. |
lmm_stripe_size | 32 bits | Stripe size: Number of bytes stored on each OST before moving to next OST. |
lmm_stripe_count | 32 bits | Stripe count: Number of stripes in the file. |
LOV_OST_DATA
struct lov_ost_data_v1 {
    __u64 l_object_id;
    __u64 l_object_gr;
    __u32 l_ost_gen;
    __u32 l_ost_idx;
};
Name | Size | Description |
l_object_id | 64 bits | The object ID on the OST. |
l_object_gr | 64 bits | The object group number (same as lmm_object_gr in LOV_MDS_MD_FORMAT_ID). |
l_ost_gen | 32 bits | Generation of l_ost_idx. |
l_ost_idx | 32 bits | OST index in the logical object volume (LOV) in the MDS server, which is handled by the management server (MGS) in the current version of Lustre. |
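Because the public part is followed by one private entry per data object, the size of a normal striping EA grows linearly with the stripe count. The sketch below illustrates this with local copies of the two structures, using fixed-width C types in place of `__u32` and `__u64`. The helper `lov_ea_size` is a hypothetical name, and the sizes in the checks assume the usual ABIs, where neither structure needs padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the layouts shown above. */
struct lov_ost_data_v1 {
    uint64_t l_object_id;
    uint64_t l_object_gr;
    uint32_t l_ost_gen;
    uint32_t l_ost_idx;
};

struct lov_mds_md {
    uint32_t lmm_magic;
    uint32_t lmm_pattern;
    uint64_t lmm_object_id;
    uint64_t lmm_object_gr;
    uint32_t lmm_stripe_size;
    uint32_t lmm_stripe_count;
    struct lov_ost_data_v1 lmm_objects[];  /* one entry per data object */
};

/* The field sizes in the tables add up to 32 and 24 bytes. */
static_assert(sizeof(struct lov_mds_md) == 32, "unexpected padding in header");
static_assert(sizeof(struct lov_ost_data_v1) == 24, "unexpected padding in entry");

/* Hypothetical helper: bytes needed for an EA describing stripe_count objects. */
static size_t lov_ea_size(uint32_t stripe_count)
{
    return sizeof(struct lov_mds_md) +
           (size_t)stripe_count * sizeof(struct lov_ost_data_v1);
}
```

Under these assumptions, a file striped over 4 objects needs 32 + 4 * 24 = 128 bytes of EA space.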
Joined Striping EA format
A joined file is made up of several normal files, each with its own extent and corresponding striping EA.
Joined File Stripe Format
For a joined file, the striping disk formats include:
- Joined striping information (LOV_MDS_JOINED_MD).
- Striping extent information (MDS_EXTENT_DESCRIPTION). This information is stored in the log file for which the llog_log_id is defined in the joined striping EA.
struct lov_mds_md_join {
    /* LOV_MDS_JOINED_MD */
    struct lov_mds_md lmmj_md;
    /* MDS_EXTENT_DESCRIPTION */
    struct llog_logid lmmj_array_id;
    __u32 lmmj_extent_count;
};
ID | Field | Description |
LOV_MDS_JOINED_MD | lmmj_md | Striping information. The format is the same as LOV_MDS_MD. |
LOV_MDS_JOINED_MD | lmmj_extent_count | The number of normal files in the joined file. |
JOINED_LOG_ID | lmmj_array_id | ID of the log file containing the striping extent information. |
LOV_MDS_JOINED_MD
name | size | description |
lmm_magic | 32 bits | Joined file (0x0BD20BD0) |
lmm_pattern | 32 bits | Stripe pattern. In the current version of Lustre, every normal file in a joined file must use the same pattern. |
lmm_object_id | 64 bits | Object ID on the MDS, which is the inode number (ino) of the object on the MDS. |
lmm_object_gr | 64 bits | For a directory, the object group number is used to determine if the striping EA for the directory is the default striping EA or a striping EA specified by lfs setstripe. For a file, the object group number is currently unused, but, in future releases, it will be used to identify groups of objects in a cluster metadata (CMD) environment. |
lmm_stripe_count | 32 bits | Total stripe count of each normal file in the joined file. |
lmm_stripe_size | 32 bits | Not used currently. |
lmmj_extent_count | 32 bits | The number of normal files in the joined file. |
MDS_EXTENT_DESCRIPTION
For each joined file, extent striping information is stored in a log file, which is referred to by llog_logid.
struct llog_logid {
    __u64 lgl_oid;
    __u64 lgl_ogr;
    __u32 lgl_ogen;
};
JOINED_LOG_ID
Name | Size | Description |
lgl_oid | 64 bits | Log ID of the object. |
lgl_ogr | 64 bits | Log group of the object. |
lgl_ogen | 32 bits | Log generation of the object. |
JOINED File LOG Formats
The joined log file is composed of joined log records. Each joined record includes a log header, a joined_record and a log tail.
struct mds_extent_desc {
    __u64 med_start;
    __u64 med_len;
    struct lov_mds_md med_lmm;
};

struct llog_rec_hdr {
    __u32 lrh_len;
    __u32 lrh_index;
    __u32 lrh_type;
    __u32 padding;
};

struct llog_rec_tail {
    __u32 lrt_len;
    __u32 lrt_index;
};

struct llog_array_rec {
    struct llog_rec_hdr lmr_hdr;
    struct mds_extent_desc lmr_med;
    struct llog_rec_tail lmr_tail;
};
Group | Name | Size | Description |
log_header | lrh_len | 32 bits | Log record length. |
log_header | lrh_index | 32 bits | Log record index. |
log_header | lrh_type | 32 bits | Log record type. |
log_header | padding | 32 bits | Record padding for alignment. |
joined record | med_start | 64 bits | Offset of the extent for the normal file in the joined file. |
joined record | med_len | 64 bits | Length of the extent for the normal file in the joined file. |
joined record | med_lmm | size of LOV_MDS_MD | Striping information for each normal file (same as LOV_MDS_MD). |
log_tail | lrt_len | 32 bits | Log record length. The value is the same as for lrh_len. |
log_tail | lrt_index | 32 bits | Log record index. The value is the same as for lrh_index. |
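Since each joined record carries the start and length of one extent, an offset in a joined file can be resolved to a normal file by scanning the extents. The following is a minimal sketch of that lookup, not code from Lustre: `extent_bounds` is a reduced, hypothetical view of `mds_extent_desc` that keeps only `med_start` and `med_len`, and `find_extent` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical view of mds_extent_desc: only the extent bounds. */
struct extent_bounds {
    uint64_t med_start;  /* offset of the extent in the joined file */
    uint64_t med_len;    /* length of the extent */
};

/* Return the index of the extent that contains file_off, or -1 if none does. */
static int find_extent(const struct extent_bounds *ext, size_t count,
                       uint64_t file_off)
{
    assert(ext != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        /* Subtract first so that med_start + med_len cannot overflow. */
        if (file_off >= ext[i].med_start &&
            file_off - ext[i].med_start < ext[i].med_len)
            return (int)i;
    }
    return -1;
}
```

Once the extent is found, the offset relative to `med_start` would be mapped through that extent's own striping information (`med_lmm`) in the same way as for a normal file.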
Striping memory format
In-memory striping MD also includes general striping information and private information for each OST.
struct lov_oinfo {
    __u64 loi_id;
    __u64 loi_gr;
    int loi_ost_idx;
    int loi_ost_gen;

    /* used by the osc to keep track of what objects to build into rpcs */
    struct loi_oap_pages loi_read_lop;
    struct loi_oap_pages loi_write_lop;
    /* _cli_ is poorly named, it should be _ready_ */
    struct list_head loi_cli_item;
    struct list_head loi_write_item;
    struct list_head loi_read_item;

    unsigned loi_kms_valid:1;
    __u64 loi_kms;
    struct ost_lvb loi_lvb;
    struct osc_async_rc loi_ar;
};

struct lov_stripe_md {
    /* General striping information */
    spinlock_t lsm_lock;
    void *lsm_lock_owner;

    struct {
        __u64 lw_object_id;
        __u64 lw_object_gr;
        __u64 lw_maxbytes;

        __u32 lw_magic;
        __u32 lw_stripe_size;
        __u32 lw_pattern;
        unsigned lw_stripe_count;
    } lsm_wire;

    /* Private OST array */
    struct lov_array_info *lsm_array;
    struct lov_oinfo *lsm_oinfo[0];
};
Name | Size | Description |
lsm_lock | size of spinlock_t | lsm lock to protect each item of the striping EA. |
lsm_lock_owner | size of void* | Owner of the lsm_lock, for debugging purposes. |
lsm striping information | | |
lw_object_id | 64 bits | LOV object ID (same as lmm_object_id). |
lw_object_gr | 64 bits | LOV object group number (same as lmm_object_gr). |
lw_maxbytes | 64 bits | Maximum possible file size. |
lw_magic | 32 bits | lsm magic number (same as lmm_magic). |
lw_stripe_size | 32 bits | Size of the stripe (same as lmm_stripe_size). |
lw_pattern | 32 bits | Pattern of the stripe (same as lmm_pattern). |
lw_stripe_count | size of unsigned | Stripe count (same as lmm_stripe_count). |
OST array information | | |
lsm_array | size of pointer | Pointer to an lsm array, only for a joined file. |
loi_id | 64 bits | Data object ID (same as l_object_id). |
loi_gr | 64 bits | Data object group (same as l_object_gr). |
loi_ost_idx | size of int | OST index of the data object. |
loi_ost_gen | size of int | OST generation of the data object. |
loi_read_lop | size of struct loi_oap_pages | List of pending read pages for the file for this object storage client (OSC). |
loi_write_lop | size of struct loi_oap_pages | List of pending write pages for the file for this OSC. |
loi_cli_item | size of struct list_head | List of objects ready to read/write for this OSC. |
loi_read_item | size of struct list_head | List of objects to be read for this OSC. |
loi_write_item | size of struct list_head | List of objects to be written for this OSC. |
loi_kms | 64 bits | Known minimum size of the data object. |
loi_kms_valid | 1 bit | Valid flag for the known minimum size. |
loi_lvb | size of struct ost_lvb | Lock value block. Used to capture data object status information (size, time, etc.) communicated between the filter and OSC. The Lustre client system (llite) and LOV (llite/lov) merge the acquired information into a complete set of information about the file. |
loi_ar | size of struct osc_async_rc | Used to propagate asynchronous writeback errors back up to the application. If an asynchronous write fails, an error code is recorded and used later when an application executes an fsync operation. |
Striping user format
The striping user format is used when the striping EA is retrieved by a user-level application (for example, with lfs getstripe/setstripe).
struct lov_user_ost_data_v1 {
    __u64 l_object_id;
    __u64 l_object_gr;
    __u32 l_ost_gen;
    __u32 l_ost_idx;
};

struct lov_user_md {
    __u32 lmm_magic;
    __u32 lmm_pattern;
    __u64 lmm_object_id;
    __u64 lmm_object_gr;
    __u32 lmm_stripe_size;
    __u16 lmm_stripe_count;
    __u16 lmm_stripe_offset;
    struct lov_user_ost_data_v1 lmm_objects[0];
};
The user format differs in the following ways from the in-disk format:
- The user format has an lmm_stripe_offset field, which the in-disk format does not have. lmm_stripe_offset is used by setstripe to pass the stripe index parameter to Lustre when setting a stripe.
- For the user format, lmm_stripe_count has only 16 bits, while for the in-disk format, lmm_stripe_count has 32 bits. So in the current Lustre release, the maximum stripe count is 65532.
Striping API
Lustre provides a set of APIs to handle the striping EAs. The five types of APIs are listed below according to their functionality:
- Set/get APIs. Used to set or get a striping EA to or from storage.
- Pack/unpack APIs. Because striping EAs are stored in packed format on disk, pack/unpack APIs are provided to pack and unpack striping EAs after a get or set striping EA API is used.
- Allocate/free APIs. Used to allocate and free striping EAs in memory.
- Striping location APIs. Since location information for data objects is stored in striping EAs, APIs are provided to access the striping EAs and return data object location information. These APIs are also used to select the OST where the data object is to be created.
- lfs APIs. User-level APIs used by applications (lfs utilities) to handle striping EAs.
The set/get APIs operate on striping EAs in in-disk format. The pack/unpack APIs operate on striping EAs in both in-disk and in-memory formats. The other APIs operate on striping EAs in in-memory format.
Get/Set striping EA API
fsfilt_set/get_md
- int fsfilt_set_md(struct obd_device *obd, struct inode *inode, void *handle, void *md, int size, const char *name)
- int fsfilt_get_md(struct obd_device *obd, struct inode *inode, void *md, int size, const char *name)
- Parameters
- obd
- Device of the object.
- inode
- MDS object.
- handle
- Journal handle for setting striping EA.
- md
- Buffer of the striping EA.
- size
- Size of the striping EA.
- name
- Name (LOV) of the striping EA
- Return
- fsfilt_set_md
- 0 means success. A negative error number means an error.
- fsfilt_get_md
- 0 means success. A positive return value is the number of bytes that need to be added to the buffer to make it large enough to contain the striping EA. A negative error number means an error. Note: If the striping EA does not exist, get_md still returns 0.
- Description
These two APIs are used by MDS to get/set a striping EA.
Pack/Unpack Striping EA API
obd_packmd
- int obd_packmd(struct obd_export *exp, struct lov_mds_md **disk_tgt, struct lov_stripe_md *mem_src)
- Parameters
- exp
- Export of the device.
- disk_tgt
- Disk structure for the striping EA.
- mem_src
- In-memory structure for the striping EA.
- Return
- If disk_tgt is NULL, the striping size of the in-memory structure (*mem_src) is returned.
- If both disk_tgt and mem_src are NULL, the maximum possible striping size is returned.
- If disk_tgt is not NULL and mem_src is NULL, *disk_tgt is freed.
- If *disk_tgt is NULL, an in-disk structure is allocated.
- Description
- This API packs the striping EA from in-memory format to an in-disk description.
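The packing step can be pictured with the sketch below, which copies each in-memory field to the in-disk field that the tables above describe as its counterpart. It is a simplified user-space illustration under several assumptions: `disk_md`, `disk_obj`, `mem_md`, `mem_obj` and `sketch_packmd` are hypothetical stand-ins (the in-memory object array is stored inline rather than as pointers), only the size-query and allocate-on-demand cases from the Return list are modeled, and any byte-order conversion a real implementation may perform is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for lov_ost_data, lov_mds_md, lov_oinfo, lov_stripe_md. */
struct disk_obj { uint64_t l_object_id, l_object_gr; uint32_t l_ost_gen, l_ost_idx; };
struct disk_md {
    uint32_t lmm_magic, lmm_pattern;
    uint64_t lmm_object_id, lmm_object_gr;
    uint32_t lmm_stripe_size, lmm_stripe_count;
    struct disk_obj lmm_objects[];
};
struct mem_obj { uint64_t loi_id, loi_gr; int loi_ost_idx, loi_ost_gen; };
struct mem_md {
    uint64_t lw_object_id, lw_object_gr;
    uint32_t lw_magic, lw_stripe_size, lw_pattern;
    unsigned lw_stripe_count;
    struct mem_obj oinfo[];
};

/* Returns the packed size in bytes, or -1 if allocation fails. */
static int sketch_packmd(struct disk_md **disk_tgt, const struct mem_md *mem_src)
{
    assert(mem_src != NULL);
    size_t size = sizeof(struct disk_md) +
                  (size_t)mem_src->lw_stripe_count * sizeof(struct disk_obj);

    if (disk_tgt == NULL)        /* size query only */
        return (int)size;

    if (*disk_tgt == NULL) {     /* allocate the in-disk structure on demand */
        *disk_tgt = calloc(1, size);
        if (*disk_tgt == NULL)
            return -1;
    }

    struct disk_md *lmm = *disk_tgt;
    lmm->lmm_magic        = mem_src->lw_magic;
    lmm->lmm_pattern      = mem_src->lw_pattern;
    lmm->lmm_object_id    = mem_src->lw_object_id;
    lmm->lmm_object_gr    = mem_src->lw_object_gr;
    lmm->lmm_stripe_size  = mem_src->lw_stripe_size;
    lmm->lmm_stripe_count = mem_src->lw_stripe_count;

    for (unsigned i = 0; i < mem_src->lw_stripe_count; i++) {
        lmm->lmm_objects[i].l_object_id = mem_src->oinfo[i].loi_id;
        lmm->lmm_objects[i].l_object_gr = mem_src->oinfo[i].loi_gr;
        lmm->lmm_objects[i].l_ost_gen   = (uint32_t)mem_src->oinfo[i].loi_ost_gen;
        lmm->lmm_objects[i].l_ost_idx   = (uint32_t)mem_src->oinfo[i].loi_ost_idx;
    }
    return (int)size;
}
```

A caller would typically pass a pointer to a NULL `disk_md` pointer, let the function allocate the buffer, and release it with `free()` once the EA has been written.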
obd_unpackmd
- int obd_unpackmd(struct obd_export *exp, struct lov_stripe_md **mem_tgt, struct lov_mds_md *disk_src, int disk_len)
- Parameters
- exp
- Export of the device
- mem_tgt
- In-memory structure for the striping EA
- disk_src
- Disk structure for the striping EA
- disk_len
- Length of disk_src.
- Return
- Positive value indicates the size of the unpacked striping EA.
- 0 is returned when the API tries to free the disk_src.
- Negative value indicates an error.
- Description
- This API unpacks the striping EA from an in-disk format (disk_src) to an in-memory description (mem_tgt). When mem_tgt is NULL, the API will free disk_src.
Allocation/Free
obd_size_diskmd
- int obd_size_diskmd(struct obd_export *exp, struct lov_stripe_md *mem_src)
- Parameters
- exp
- Export of the device.
- mem_src
- In-memory structure for the striping EA.
- Return
- If mem_src is not NULL, striping size pointed to by mem_src is returned.
- If mem_src is NULL, the maximum striping size is returned.
- Description
- This API returns the real size of the striping EA.
obd_alloc_diskmd
- int obd_alloc_diskmd(struct obd_export *exp, struct lov_mds_md **disk_tgt)
- Parameters
- exp
- Export of the device.
- disk_tgt
- Allocated in-disk-formatted striping EA.
- Return
- 0 means success. A negative number means an error.
- Description
- This API returns the in-disk-formatted striping EA pointed to by disk_tgt. It allocates the maximum striping EA size, which typically equals the maximum data object count of the file * size of struct lov_ost_data.
obd_free_diskmd
- int obd_free_diskmd(struct obd_export *exp, struct lov_mds_md **disk_tgt)
- Parameters
- exp
- Export of the device.
- disk_tgt
- Allocated in-disk-formatted striping EA.
- Return
- 0 means success. A negative number means an error.
- Description
- This API frees the in-disk-formatted striping EA referenced by *disk_tgt.
obd_alloc_memmd
- int obd_alloc_memmd(struct obd_export *exp, struct lov_stripe_md **mem_tgt)
- Parameters
- exp
- Export of the device.
- mem_tgt
- Allocated in-memory-formatted striping EA
- Return
- 0 means success. A negative number means an error.
- Description
- This API returns the in-memory-striping EA pointed to by mem_tgt. It allocates the maximum striping EA size.
obd_free_memmd
- int obd_free_memmd(struct obd_export *exp, struct lov_stripe_md **mem_tgt)
- Parameters
- exp
- Export of the device.
- mem_tgt
- In-memory-formatted striping EA memory to be freed.
- Return
- 0 means success. A negative number means an error.
- Description
- This API frees the in-memory-formatted striping EA referenced by *mem_tgt.
Striping Location APIs
lov_stripe_size
obd_size lov_stripe_size(struct lov_stripe_md *lsm, obd_size ost_size, int stripeno)
- Parameters
- lsm
- In-memory striping EA.
- ost_size
- Size of a single data object in an OST.
- stripeno
- Stripe number of the data object.
- Return
- The computed file size.
- Description
- This API computes the file size given stripeno and the OST size, where stripeno and the OST size are associated with the OST where the end of the file is located.
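The calculation can be sketched as follows for the RAID-0 pattern. This is an illustration rather than the Lustre source: the function name `file_size_from_object` is hypothetical, and it takes the stripe size and stripe count as plain arguments instead of reading them from a `struct lov_stripe_md`.

```c
#include <assert.h>
#include <stdint.h>

/* File size implied by a data object of ost_size bytes at stripe number
 * stripeno, assuming that object holds the last byte of a RAID-0 file. */
static uint64_t file_size_from_object(uint64_t ost_size, uint32_t stripeno,
                                      uint32_t stripe_size, uint32_t stripe_count)
{
    assert(stripe_size > 0 && stripeno < stripe_count);

    if (ost_size == 0)
        return 0;

    uint64_t stripe_width = (uint64_t)stripe_size * stripe_count;
    uint64_t full_chunks  = ost_size / stripe_size;  /* whole stripe-size chunks in the object */
    uint64_t partial      = ost_size % stripe_size;  /* bytes in a trailing partial chunk */

    if (partial != 0)
        return full_chunks * stripe_width +
               (uint64_t)stripeno * stripe_size + partial;

    /* The object ends exactly on a chunk boundary, so its last chunk is full. */
    return (full_chunks - 1) * stripe_width +
           ((uint64_t)stripeno + 1) * stripe_size;
}
```

For example, with a 1 MiB stripe size and 4 stripes, an object of 1 MiB + 10 bytes at stripe number 1 implies a file size of 5 MiB + 10 bytes: one full pass over all four objects, one full stripe on stripe 0, and the partial chunk on stripe 1.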
lov_stripe_offset
int lov_stripe_offset(struct lov_stripe_md *lsm, obd_off lov_off, int stripeno, obd_off *obd_off)
- Parameters
- lsm
- In-memory striping EA.
- lov_off
- Logical file offset.
- stripeno
- Stripe number of the data object.
- obd_off
- Offset within the OST object indicated by stripeno that is nearest to the logical file offset (lov_off).
- Return
- 0 means the OST indicated by stripeno is exactly the same OST as the offset (lov_off) indicated.
- -1 means the index of the OST indicated by stripeno is less than the index of the OST indicated by the offset (lov_off).
- 1 means the index of the OST indicated by stripeno is larger than the index of the OST indicated by the offset (lov_off).
- Description
This API is used to check whether an extent intersects with an OST.
lov_stripe_number
int lov_stripe_number(struct lov_stripe_md *lsm, obd_off lov_off)
- Parameters
- lsm
- In-memory striping EA
- lov_off
- Logical file offset
- Return
- The stripe number to which lov_off belongs.
- Description
This API computes which stripe number lov_off belongs to.
lfs API
llapi_file_get_stripe
int llapi_file_get_stripe(const char *path, struct lov_user_md *lum)
- Parameters
- path
- Path of the file.
- lum
- Striping information returned to the caller.
- Return
- 0 means success. A negative number means an error.
- Description
This API returns striping information to the caller to be used by the application.
llapi_file_open
int llapi_file_open(const char *name, int flags, int mode, unsigned long stripe_size, int stripe_offset, int stripe_count, int stripe_pattern)
- Parameters
- name
- File name
- flags
- Open flags
- mode
- Open mode
- stripe_size
- Stripe size of the file
- stripe_offset
- Stripe offset(stripe_index) of the file
- stripe_count
- Stripe count of the file
- stripe_pattern
- Stripe pattern of the file
- Return
- A non-negative file descriptor means success. A negative number means an error.
- Description
This API opens/creates a file with specified striping parameters.
Future developments
With the currently implemented striping disk format, ->obd_unpackmd() must have an end-to-end understanding of all possible combinations of layouts, i.e., the format is basically flat rather than hierarchical.
To facilitate development of new layouts, the striping disk format will be adjusted so that higher layers (e.g., struct lov_mds_md) can be parsed without knowing the details of the lower layer (in this case, struct lov_ost_data) representation.
A straightforward way to do this is to precede each layout descriptor with the standard header:
struct md_layout_descriptor_header {
    __u16 mldh_magic;
    __u16 mldh_length;
};
where ->mldh_magic identifies the layout type and is used to determine the ->obd_unpackmd() method to be called to parse the descriptor; and ->mldh_length is the total descriptor length, which is used by the upper layer to pass over lower layer descriptors without understanding details of their representation.
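To illustrate how such a header lets a parser step over descriptors it does not understand, the sketch below walks a buffer of descriptors using only mldh_length. It is a hedged illustration of the proposal, not an existing Lustre interface: the function `count_descriptors` is hypothetical, the descriptors are assumed to be stored back to back in host byte order, and mldh_length is assumed to include the header itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct md_layout_descriptor_header {
    uint16_t mldh_magic;
    uint16_t mldh_length;   /* total descriptor length, header included (assumed) */
};

/* Count the descriptors in buf whose magic equals wanted, skipping all others
 * by length alone. Returns -1 if a header is truncated or a length is invalid. */
static int count_descriptors(const unsigned char *buf, size_t len, uint16_t wanted)
{
    assert(buf != NULL || len == 0);

    size_t pos = 0;
    int found = 0;

    while (pos < len) {
        struct md_layout_descriptor_header hdr;

        if (len - pos < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + pos, sizeof(hdr));   /* avoids unaligned access */

        if (hdr.mldh_length < sizeof(hdr) || hdr.mldh_length > len - pos)
            return -1;
        if (hdr.mldh_magic == wanted)
            found++;
        pos += hdr.mldh_length;   /* step over the body without parsing it */
    }
    return found;
}
```

In a full design, the branch that matches mldh_magic would dispatch to the unpack method registered for that layout type instead of incrementing a counter.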
Care must be taken, however, to avoid introducing too much redundant information to the on-disk EA for the most common uses.
Glossary
- ADIO
- Abstract-Device Interface for I/O. The ADIO driver is an abstract-device interface for parallel I/O that is used by the MPI to implement its I/O library.
- CMD
- Cluster metadata
- EA
- Extended attribute
- llite
- Lustre client system
- LOV
- Logical object volume
- MDS
- Metadata server
- MGS
- Management server
- MPI
- Message Passing Interface
- OSC
- Object storage client
- OST
- Object storage target