Friday 20 March 2015

Introduction to Universal Disk Format (UDF)

What is UDF?

Universal Disk Format (UDF) is a file system specification defined by OSTA. One objective of UDF is to replace the ISO9660 file system on optical media (CDs, DVDs, etc.). It is also a good file system to replace FAT on removable media.


Why UDF?

Any removable media (CD, DVD, flash drive, external hard drive, etc.) needs a file system format. Ideally, this format should have these characteristics:
  • Can be understood by different platforms. This makes it possible to copy files between Windows, Mac, and Unix systems. FAT and ISO9660 are two formats that most systems understand. However, they have many limitations.
  • Its specification is open. ISO9660 is an open standard, while FAT belongs to Microsoft.
  • Has rich features (preferably a superset of all common file systems) so that information is not lost when files are copied to this file system.
  • Can support different kinds of physical media. Optical media is very different from hard drives. Some media is write-once (CD-R, DVD-R, DVD+R, BD-R), some needs defect management (CD-RW, DVD-RW, DVD+RW, BD-RE, etc.), and some needs to be expanded sequentially before it can be overwritten (most RW media).
  • Its format should be as simple as possible. This is important when the format is implemented in embedded devices (DVD players, camcorders, cameras, etc.). Complex data structures such as B-trees are not good candidates for this purpose.
  • Its format should evolve in a compatible way so that old media can still be accessed by new systems.
UDF is the only file system that meets all these criteria, since it was designed for information exchange.
  • UDF is an open standard.
  • The design and evolution of UDF keeps compatibility in mind.
  • UDF natively supports many modern file systems features:
    • Large partition size (maximum 2TB with 512B block size, or 8TB with 2KB block size)
    • 64-bit file size
    • Extended attributes (e.g., named streams, or forks) without size limitation
    • Long file names (maximum 254 bytes, any character can appear in the name)
    • Unicode encoding of file names
    • Sparse files
    • Hard links
    • Symbolic links
    • Metadata checksum
    • Metadata redundancy (optional in UDF 2.50 or later in metadata partition)
    • Defect management (for media that does not manage defects internally, such as CD-RW, DVD-RW, and DVD+RW)
  • UDF defines how different platforms interact with each other. For example, it defines how to store Mac Finder Info and Resource Fork, NTFS ACL, UNIX ACL, OS/2 EA, etc. It also requires platforms to preserve the information that they don't understand.
  • UDF is a truly universal file system. It can be used on all kinds of optical media, including read-only (CD-ROM, DVD-ROM, BD-ROM (Blu-ray Disc Read-Only)), write-once (CD-R, DVD-R, DVD+R, BD-R), and rewritable (CD-RW, DVD-RW, DVD+RW, DVD-RAM, CD-MRW, DVD+MRW, BD-RE), and of course on block devices (hard drives). Even write-once media appears as a big overwritable floppy under UDF.
Drawbacks of the current revision of UDF as of 2.60:
  • Limited partition size. A 32-bit block number limits the partition size to 2TB with 512-byte sectors. Although this is not a problem for current optical media, it may become one later.
  • Does not provide a fast crash recovery mechanism. As media size increases, crash recovery becomes more and more important. A full disc scan before mount becomes less feasible on slow optical media with tens of gigabytes of space. Although an implementation may use a journal to protect metadata integrity, this does not guarantee interoperability between platforms, since journaling is not part of the standard.
  • Does not support compressed/encrypted files and directories. As devices get bigger and bigger, compression becomes less important. However, encryption may become more compelling since UDF is mainly used on removable media.
  • Becomes more and more complex. UDF 2.50 adds the metadata partition to improve performance. File system metadata are clustered within the metadata partition so that they can be accessed quickly. Optionally, the metadata partition can be mirrored to provide better robustness, at a significant cost in performance. This adds non-trivial complexity to the file system. Does the benefit of the metadata partition warrant its complexity? If a UDF implementation organizes metadata properly, it may achieve performance similar to (or better than) what the metadata partition provides. Unfortunately, in UDF 2.50 or later, the metadata partition is mandatory on overwritable media like CD-RW and hard drives. UDF 2.60 even requires the metadata partition on write-once media using a pseudo-overwrite partition. A UDF implementation that wants to avoid the complexity of the metadata partition should use UDF 2.00/2.01.
  • Is not as popular as FAT and ISO9660 now. As more and more systems implement UDF, this problem will go away.

 

UDF is an evolving standard. The major features of each revision are summarized in the following table.

Summary of UDF Revision History

Revision | Date Published | Major New Features
1.02 | 08/1996 | First revision, suitable for read-only media
1.50 | 02/1997 | Support for write-once media and defect management on media
2.00 | 04/1998 | Support for named streams
2.01 | 03/2000 | Fixes minor errors
2.50 | 04/2003 | Metadata partition for better performance
2.60 | 03/2005 | Pseudo-overwrite partition

Six UDF revisions have been published: 1.02, 1.50, 2.00, 2.01, 2.50, and 2.60. Revisions 2.00 and 2.01 are very similar, as are revisions 2.50 and 2.60. So there are four generations of UDF: 1.02, 1.50, 2.00/2.01, and 2.50/2.60. They are discussed in more detail below.
  • UDF 1.02 is the first UDF revision. It is the standard used by DVD-Video movies. It is suitable for read-only and hard-drive-like media.
  • UDF 1.50 adds the virtual partition and the sparable partition. The virtual partition allows write-once media (CD-R, DVD-R and DVD+R) to appear as overwritable media. A write-once disc appears as an overwritable floppy (but hundreds or thousands of times larger), except that its available space keeps decreasing as you use it. Even removing files cannot reclaim space. The sparable partition performs defect management on the media, similar to what the firmware does on modern hard drives. This is needed because overwritable media such as CD-RW, DVD-RW, and DVD+RW can only be overwritten a limited number of times (several thousand) before they fail. A sparable partition makes a disc with many defects appear as a good one with a contiguous logical space.
  • UDF 2.00 adds named streams to files and directories, as well as system streams to the logical volume. Named streams can be used to implement extended attributes of other file systems, such as the resource fork and ACL in Mac OS X, and the ACL in NTFS. At the same time, the format of the mapping table for the virtual partition changes.
  • UDF 2.01 fixes a few minor errors in 2.00 and does not introduce major features.
  • UDF 2.50 brings in the metadata partition and raises the complexity of UDF to a new level. The metadata partition contains all metadata, such as directories and the blocks that manage file space allocation. Its objective is to improve file system performance by aggregating metadata together. The metadata partition can optionally be mirrored in software so that two copies of the metadata are maintained. Mirroring improves the robustness of the file system at a cost in performance, and makes the file system even more complex. This revision is the standard for high-definition optical discs (HD DVD and Blu-ray).
  • UDF 2.60 adds support for the pseudo-overwritable partition, for drives that support pseudo-overwrite mode on write-once media. Pseudo-overwrite means the drive manages a logical-to-physical address mapping (similar to the virtual partition), so the file system can simply treat the partition as overwritable. Although this is intended to reduce file system complexity, UDF acknowledges that some drives may not support the pseudo-overwritable partition, and the file system must then use a virtual partition to manage such media. So for a long time, while both types of drives co-exist, the file system must be able to handle both, and will ironically become even more complex.

 Structure of the UDF Standard

The UDF Standard contains two sets of specifications, ECMA-167 and UDF.
  • ECMA (European Computer Manufacturers Association) is a standards body that develops standards for computing technology. ECMA-167, also published as ISO/IEC 13346, is a volume and file structure standard. It defines a general volume and file format mainly targeting optical media (write-once and rewritable media). ECMA-167 is designed as a general framework: it leaves enough choices and undecided details that need to be filled in by another standard. ECMA-167 has a second edition (ECMA-167/2) and a third edition (ECMA-167/3). ECMA-167/3 added named streams, so UDF revision 1.x is based on ECMA-167/2 while UDF revision 2.x is based on ECMA-167/3. ECMA-167/3 is almost identical to ECMA-167/2 except for named stream support, a higher revision number, and a few other minor places. These differences are discussed in detail in ECMA-167/3; search for the term "167/2".
  • UDF is the standard defined by OSTA (Optical Storage Technology Association). UDF is based on the ECMA-167 framework, filling in all the necessary details and clarifying ambiguous pieces.
Because UDF consists of ECMA-167 and UDF, you need to have both standards in hand and read them side by side. Both are written in the style of a reference book, and learning from a reference book is not fun. It is like learning a language by reading its dictionary from A to Z and assembling the grammar by connecting all the fragments in the dictionary. Reading two standards is twice as bad: you need to learn two new languages, A and B, from two dictionaries, while dictionary B is written in language A. Another side effect of reading a standard is that it makes people fall asleep fairly quickly :-)
The learning process should be iterative. Start by reading ECMA-167 to get a feel for it, read the corresponding sections of UDF to get a better feel, and go back. At some point, grab a UDF disc, dump its structure, and read what is on it to verify your understanding. You don't need to finish reading ECMA-167 and fully understand it before reading UDF, because many details in ECMA-167 are not used in UDF.
The UDF tutorial in the following section explains what UDF looks like. I don't assume you know other file systems such as the Unix file system, but if you do, UDF will be easier to understand.


UDF Specification

1. Highlights of the UDF Format

Compared with the Unix file system (UFS/FFS/ext2), UDF's main structures are highlighted below.
  • Where UDF is similar to FFS
    • An inode is used to represent a file or a directory. UDF's term for an inode is File Entry. In UDF 2.x, there is also the Extended File Entry, which works the same way as a File Entry except that it supports named streams. When we mention File Entry below, we mean File Entries in general, including both File Entries and Extended File Entries.
    • A directory is a special file that contains many variable-size directory entries. Each entry has a variable-size file name and the address of the File Entry (i.e., inode) of the file or directory. The structure of the directory is linear, so a linear search is required to look up a file by name.
    • Hard links are implemented by letting more than one directory entry point to the same File Entry (i.e., inode). Each File Entry maintains a link count, but does not record which directory entries point to it.
    • A symbolic link is a special file containing a path.
    • A bitmap is used to manage the free space of the file system, although read-only and write-once media do not use bitmaps.
  • Where UDF is different from FFS
    • Each File Entry (i.e., inode) consumes one disk block (512B on most hard drives, and 2KB on most optical media). A File Entry is identified by its block address. Unlike FFS, there is no limit on the number of File Entries in the file system.
    • Small files (and directories) can be stored in the File Entry block itself, similar to the embedded files in NTFS.
    • An Extended File Entry can point to another File Entry, called a named stream directory, which may contain an unlimited number of named streams.
    • Disk space allocated to a file or directory is managed by extents. Sparse files are supported by marking an extent as a hole (not recorded and not allocated). For files so fragmented that their extents cannot all be stored in the File Entry block, more disk blocks can be allocated to store the extent information. These blocks are linked together as a singly linked list.
    • The file structure of UDF is built not on the raw device but on partitions. UDF has the most complex partition management among existing file systems. Although the different partition types (type 1, sparable, virtual, metadata, and pseudo-overwritable) have different underlying physical properties, they provide an almost uniform interface to the layer above (the file and directory structures): each partition is a contiguous logical space consisting of blocks with logical numbers from 0 to n-1, where n is the partition size.

2. UDF Volume Structure and Mount Procedure

The volume structure is transparent to the file and directory structure. It provides a framework so that different formats may co-exist on the same media. This part of the standard is the most abstract and dry to read. It defines many terms that a UDF file system implementer rarely needs to care about in file system operations. This part is most interesting for writers of the volume mount module (which identifies a volume as UDF, e.g., the mount_udf command) and the media formatting module (which lets other systems identify the volume as UDF, e.g., the newfs_udf command).
To explain the volume structure, we step through the mount procedure to see how a UDF volume is recognized. Mounting always happens before UDF media can be used by the host. This usually happens automatically when removable UDF media is attached to the system, or it can be done manually with a command (say, the mount command on Unix-like systems).
The mount procedure can be separated into two parts: volume recognition and file system verification.
Volume recognition is the first step, which makes sure this is a UDF volume. It only tells that this is UDF media, not where the file system metadata is. A quick format utility can simply erase a UDF volume by erasing its recognition sequence. The volume recognition procedure looks for the Volume Recognition Sequence (VRS) from a base address (UDF's term is Volume Recognition Space). For most media, the base address is the start of the media. For multi-session optical media (CD-R, DVD-R, DVD+R, BD-R, etc.), the base address is the start of the last data session. The VRS consists of the following three contiguous sectors, stored starting 32KB after the base address:
  • Beginning Extended Area Descriptor (BEA)
  • Volume Sequence Descriptor (VSD) with id "NSR02" or "NSR03"
  • Terminating Extended Area Descriptor (TEA)
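The recognition check can be sketched in a few lines of Python. This is a minimal sketch, assuming 2KB sectors and a volume image already read into memory; the function name `has_udf_vrs` is hypothetical. Each descriptor in the sequence carries a 5-byte identifier at byte offset 1, which is all the check looks at. On hybrid discs, ISO9660 descriptors ("CD001") may precede BEA01, so the sketch tolerates them.

```python
SECTOR_SIZE = 2048          # assumption: 2 KB sectors, typical of optical media
VRS_OFFSET = 32 * 1024      # the VRS starts 32 KB after the base address


def has_udf_vrs(image: bytes, base: int = 0) -> bool:
    """Return True if BEA01, then NSR02/NSR03, then TEA01 are found."""
    offset = base + VRS_OFFSET
    seen_bea = seen_nsr = False
    while offset + SECTOR_SIZE <= len(image):
        # Volume Structure Descriptor: 1-byte type, 5-byte identifier, 1-byte version.
        ident = image[offset + 1:offset + 6]
        if ident == b"BEA01":
            seen_bea = True
        elif ident in (b"NSR02", b"NSR03") and seen_bea:
            seen_nsr = True
        elif ident == b"TEA01":
            return seen_bea and seen_nsr
        elif ident not in (b"CD001", b"CDW02", b"BOOT2"):
            return False  # an unrecognized descriptor ends the sequence
        offset += SECTOR_SIZE
    return False
```

A freshly zeroed image fails at the first sector after 32KB, which is why erasing these three sectors is enough for a quick format to make the volume unrecognizable.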
After volume recognition, the mounter must find the UDF metadata to make sure the volume is valid and its revision can be handled by the system. UDF metadata structures are called descriptors in the standard. All descriptors start on sector boundaries. Most descriptors are smaller than a sector (the bitmap and sparing table are two exceptions). Some descriptors contain pointers (i.e., addresses) to other descriptors, so descriptors are chained together in a certain order. A mounter may perform the following steps to make sure the UDF media is mountable:
  • Anchor Volume Descriptor Pointer (AVDP): find the AVDP at sector address 256 (copies may also be recorded at sector N-256 and/or at the last sector N of the volume). The AVDP contains the start address and size of the main Volume Descriptor Sequence (VDS) and of a reserve VDS. The reserve VDS duplicates the data in the main VDS to increase reliability. Only the main VDS needs to be read by the mounter.
  • VDS contains many descriptors until it is terminated by a Terminating Descriptor (TD). The following descriptors are the most crucial to the mounter: Partition Descriptor (PD) and Logical Volume Descriptor (LVD).
    • Partition Descriptor (PD): a PD states the start and size of a partition. All files and directories are stored in partitions. Most UDF media have one PD. UDF also allows two PDs, one describing a read-only partition and the other describing an overwritable partition.
    • Logical Volume Descriptor (LVD): the LVD specifies the name of the volume through the Logical Volume Identifier, defines all the physical and logical partitions through the Partition Map, and indicates the location of the root directory through the File Set. Each partition defined by the partition map has a Partition Reference Number, i.e., the zero-based index of the partition in the partition map. Any sector in a partition can be addressed by the Partition Reference Number and a logical address within the partition. UDF supports many different types of partitions; the details are discussed in section 3. The Integrity Sequence Extent of the LVD contains the address of the Logical Volume Integrity Descriptor (LVID). The LVID records the last time the media was written. When the LVID's Integrity Type is Close, the UDF file system is in a consistent state; an Open LVID indicates the volume may not have been cleanly unmounted. An exception is write-once media using a virtual partition, where writing the Virtual Allocation Table (VAT) File Entry (FE) replaces the function of the LVID.
  • Some operating systems (e.g., Mac OS X) require a valid root directory when a file system is mounted, so the mounter can optionally verify that the root directory is valid before mounting. The partition map in the LVD has enough information about the partitions on the media. The LVD also points to the File Set Descriptor, which records the address of the root directory FE. The mounter can read this FE to make sure the volume is valid.
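The first mount steps can be sketched as follows. This is a hedged Python sketch assuming 2KB sectors and a volume image in memory; it verifies only the tag checksum (not the descriptor CRC), ignores the reserve VDS and any Volume Descriptor Pointers, and the helper names (`read_tag`, `find_main_vds`, `walk_main_vds`) are hypothetical. The tag identifiers (2 for AVDP, 5 for PD, 6 for LVD, 8 for TD) and the 16-byte descriptor tag layout follow ECMA-167.

```python
import struct

SECTOR = 2048  # assumption: 2 KB logical sectors

# Tag identifiers defined in ECMA-167 part 3
TAG_AVDP, TAG_PD, TAG_LVD, TAG_TD = 2, 5, 6, 8


def read_tag(sector: bytes):
    """Return the tag identifier, or None if the tag checksum does not match."""
    (tag_id,) = struct.unpack_from("<H", sector, 0)
    # The tag checksum (byte 4) is the modulo-256 sum of bytes 0-3 and 5-15.
    checksum = (sum(sector[0:4]) + sum(sector[5:16])) & 0xFF
    return tag_id if checksum == sector[4] else None


def find_main_vds(image: bytes):
    """Read the AVDP at sector 256 and return (location, length) of the main VDS."""
    avdp = image[256 * SECTOR:257 * SECTOR]
    if read_tag(avdp) != TAG_AVDP:
        raise ValueError("no valid AVDP at sector 256")
    # Main VDS extent_ad at byte 16: 32-bit length in bytes, then 32-bit location.
    length, location = struct.unpack_from("<II", avdp, 16)
    return location, length


def walk_main_vds(image: bytes):
    """Return the sector numbers of PDs and LVDs found before the Terminating Descriptor."""
    location, length = find_main_vds(image)
    found = {TAG_PD: [], TAG_LVD: []}
    for n in range(length // SECTOR):
        lsn = location + n
        tag = read_tag(image[lsn * SECTOR:(lsn + 1) * SECTOR])
        if tag == TAG_TD:
            break
        if tag in found:
            found[tag].append(lsn)
    return found
```

A real mounter would fall back to the reserve VDS, or to the AVDP copies near the end of the volume, when a checksum fails here.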


3. UDF Partition Structure

UDF defines five different types of partitions. A partition provides a uniform interface to the file system layer while hiding the different underlying physical properties. Each partition has a partition reference number, which is its zero-based index in the Partition Map of the LVD. Blocks in a partition are addressed by block numbers ranging from 0 to N-1, where N is the size of the partition. The size of a partition need not be fixed: it may increase (for the Virtual Partition, Metadata Partition, and Pseudo-Overwrite Partition) or decrease (for the Metadata Partition).


3.1 Type 1 Partition

This is the simplest partition. A type 1 partition has a start address S and size N. A logical block number A in the partition is converted to the media physical address (in UDF's terms, the logical sector address) S+A. On certain optical media, the start and size of the partition must be aligned to the packet size (such as 32KB). These special requirements are defined in the appendices of the UDF standard. Free space in the partition is managed by the Unallocated Space Bitmap Descriptor, which contains one bit for each block of the partition. If the bit is set (1), the corresponding block is free; if it is clear (0), the block is allocated. This is the opposite of the FFS/UFS convention, and it follows from the name: the UDF bitmap records unallocated space.
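The type 1 translation and the inverted bitmap polarity fit in a few lines. This is a minimal sketch with hypothetical function names; it assumes the bitmap's bits are numbered least-significant-bit first (bit 0 of byte 0 is block 0), which you should confirm against ECMA-167 before relying on it.

```python
def type1_to_lsn(start: int, size: int, lbn: int) -> int:
    """Translate block number lbn of a type 1 partition to a logical sector number (S + A)."""
    if not 0 <= lbn < size:
        raise ValueError("block outside partition")
    return start + lbn


def is_free(bitmap: bytes, lbn: int) -> bool:
    """In the Unallocated Space Bitmap, a SET bit means the block is free."""
    return bool(bitmap[lbn // 8] >> (lbn % 8) & 1)


def mark_allocated(bitmap: bytearray, lbn: int) -> None:
    """Allocating a block CLEARS its bit (the opposite of FFS/UFS)."""
    bitmap[lbn // 8] &= ~(1 << (lbn % 8)) & 0xFF
```

With a partition starting at sector 1000, block 67 maps to sector 1067; allocating block 3 in an all-free bitmap turns its first byte from 0xFF into 0xF7.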

3.2 Sparable Partition

Sparable partitions are used on overwritable media that fail after a certain number of overwrites (several thousand), such as CD-RW. In a file system, the places that are overwritten most frequently are often important metadata areas, e.g., bitmaps. A sparable partition allows a failed area to be remapped to another, good part of the media so that the failed area appears good to the upper level. A sparable partition is similar to a type 1 partition in the sense that it has a start address and size. In addition, it defines 2 to 4 sparing tables that point to reserved spare areas on the media. All sparing tables point to the same spare areas, so if one sparing table fails, another can be used instead. The unit of overwrite on such media is the packet; for example, the packet size for CD-RW is 32 sectors of 2KB. If one sector in a packet fails, the whole packet is considered failed. When this happens, the content of the packet is written to a spare area, and its new address is recorded in the sparing table. When a logical address in the sparable partition is translated to a physical address, the sparing table is always consulted. If the logical address is not found in the sparing table, the translation is the same as for a type 1 partition. Otherwise, the new address in the spare area recorded in the sparing table is returned. Thus, the sparing table acts as an exception table in address translation. This mechanism guarantees that the logical address does not change when its original packet fails.
We use an example to explain how the sparing table and address translation work. To make it more intuitive, we assume the packet size is 10 sectors, although on real optical media the packet size is always a power of two. Assume the partition starts at physical address 1000 and has 8000 sectors. We have two spare areas, starting at 500 and 9000 respectively. Each spare area is 50 sectors, so each holds 5 packets. Since every sparing table has the same content, we only show the first one. Before the media has any defects, the sparing table looks like this:
Original Logical Address | Mapped Physical Address
available | 500
available | 510
available | 520
available | 530
available | 540
available | 9000
available | 9010
available | 9020
available | 9030
available | 9040

UDF uses 0xFFFFFFFF as the original address to indicate that a spare packet is available (shown as "available" above). Since there are no defects, address translation is the same as for a type 1 partition, so logical address 67 is translated to physical address 1067. Assume that after some use, the system finds that the packet containing block 93 fails when written. It then writes this packet to the spare packet at physical address 500 and updates the sparing table:
Original Logical Address | Mapped Physical Address
90 | 500
available | 510
available | 520
available | 530
available | 540
available | 9000
available | 9010
available | 9020
available | 9030
available | 9040

Now the logical address translation is the same as before except for logical address 90-99. For example, logical address 67 still has the physical address 1067, but the logical address 97 now has physical address 507. As indicated in the example, the unit of sparing is a packet. The sparing table records the address of the first block of the packet.
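The sparing-table lookup from the example above can be written as a small Python function. This sketch assumes the sparing table has already been parsed into (original packet address, mapped physical address) pairs; the names are hypothetical, and it scans the table linearly for clarity rather than speed.

```python
AVAILABLE = 0xFFFFFFFF  # original address of an unused spare packet


def sparable_to_lsn(start: int, packet_size: int, sparing_table, lbn: int) -> int:
    """Translate lbn in a sparable partition, treating the sparing table as an exception table."""
    packet = lbn - lbn % packet_size        # first block of the packet holding lbn
    for original, mapped in sparing_table:
        if original == packet:              # the packet failed and was relocated
            return mapped + (lbn - packet)
    return start + lbn                      # no exception: same as a type 1 partition


# The sparing table after block 93's packet failed, using the example numbers.
TABLE = [(90, 500)] + [(AVAILABLE, a) for a in (510, 520, 530, 540,
                                               9000, 9010, 9020, 9030, 9040)]
```

With this table, `sparable_to_lsn(1000, 10, TABLE, 67)` still gives 1067, while `sparable_to_lsn(1000, 10, TABLE, 97)` gives 507, matching the text.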
In this example, the spare areas are outside the partition. They can also be inside the partition, in which case the partition must mark the space occupied by the spare areas as unavailable for regular space allocation.
For sparable partitions, the partition must start on a packet boundary, and its size must be an integral multiple of the packet size.

3.3 Virtual Partition

The virtual partition is used on write-once media. Only three types of metadata are stored in it: the File Set Descriptor, File Entries (including Extended File Entries), and Allocation Extent Descriptors. If file data is embedded in a File Entry, that data is also stored in the virtual partition. The virtual partition makes write-once media appear as overwritable media. It is layered on top of a type 1 partition. A Virtual Allocation Table (VAT) maps logical addresses in the virtual partition to logical addresses in the underlying type 1 partition.
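A VAT lookup is just an array index, which the sketch below shows together with the "overwrite" trick on write-once media. It assumes the VAT entries have already been separated from the rest of the VAT file (the UDF 2.x header or the UDF 1.50 trailer), and the function names are hypothetical. Each entry is a 32-bit little-endian block number in the underlying type 1 partition, with 0xFFFFFFFF marking an unused virtual block.

```python
import struct

UNUSED = 0xFFFFFFFF  # VAT entry value for a virtual block that is not in use


def vat_lookup(vat_entries: bytes, virtual_lbn: int) -> int:
    """Map a virtual-partition block to a block in the underlying type 1 partition."""
    if not 0 <= virtual_lbn < len(vat_entries) // 4:
        raise ValueError("virtual block beyond end of VAT")
    (lbn,) = struct.unpack_from("<I", vat_entries, 4 * virtual_lbn)
    if lbn == UNUSED:
        raise ValueError("virtual block is not in use")
    return lbn


def vat_rewrite(vat_entries: bytearray, virtual_lbn: int, new_lbn: int) -> None:
    """'Overwrite' on write-once media: the new copy was written to a fresh block,
    so only the VAT entry changes and the virtual address stays the same."""
    if not 0 <= virtual_lbn < len(vat_entries) // 4:
        raise ValueError("virtual block beyond end of VAT")
    struct.pack_into("<I", vat_entries, 4 * virtual_lbn, new_lbn)
```

Because File Entries refer to each other through virtual addresses, rewriting one File Entry only requires a new VAT entry, not a cascade of updates to every structure that points to it.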


3.4 Metadata Partition

The metadata partition is used to cluster the metadata of the media together for better performance. Metadata includes File Entries, allocation descriptors, and directories, but not named streams or extended attributes. The metadata partition lies on top of an underlying partition, which can be a type 1 partition, a sparable partition, or a pseudo-overwrite partition. The metadata partition consists of 3 files: the Metadata File, the Metadata Mirror File, and the Metadata Bitmap File. The Metadata File and Metadata Mirror File hold the metadata (File Entries and Allocation Extent Descriptors). They may optionally hold duplicated data, i.e., each piece of metadata has two copies on the media. To simplify the following discussion, we assume that the Metadata Mirror File does not duplicate the Metadata File content.
All data in the metadata partition is stored in the Metadata File. A logical block number in the metadata partition is the block offset within the Metadata File. Since some space in the Metadata File may be unused, the Metadata Bitmap File keeps track of free space in the Metadata File. The metadata for the Metadata File, Metadata Mirror File, and Metadata Bitmap File is stored in the underlying type 1 (or sparable or pseudo-overwrite) partition. These are the only metadata not stored in the metadata partition. The data of the Metadata File and Metadata Mirror File must be aligned to the media ECC block size or packet size, whichever is larger, and their sizes must be multiples of that same unit.
We use an example to explain how the metadata partition works. We assume the ECC block size and packet size are 10, although UDF requires them to be larger than 32, and the size on real media is always a power of two. Assume the underlying partition is a type 1 partition that starts at physical address 1000 and has 8000 sectors. The Metadata File (i.e., the metadata partition) has two extents: the first starts at logical address 100 and has 300 sectors; the second starts at 2000 and has 500 sectors. Therefore, the metadata partition is 800 sectors long. Logical address 5 in the metadata partition means block offset 5 in the Metadata File, which translates to logical address 105 in the type 1 partition, or physical address 1105 on the media. More examples of address translation are shown in the following table.
Metadata Partition Logical Address | Type 1 Partition Logical Address | Physical Media Address
0 | 100 | 1100
134 | 234 | 1234
300 | 2000 | 3000
700 | 2400 | 3400
750 | 2450 | 3450
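The metadata partition translation is a walk over the Metadata File's extents. This is a minimal sketch assuming the extents have already been read from the Metadata File's File Entry as (start LBN, length in blocks) pairs in file order; the function name is hypothetical. It reproduces the rows of the table above.

```python
# Metadata File extents from the example: (start LBN in the type 1 partition, length)
EXTENTS = [(100, 300), (2000, 500)]
PARTITION_START = 1000  # the underlying type 1 partition starts at physical address 1000


def metadata_to_type1(extents, meta_lbn: int) -> int:
    """Treat meta_lbn as a block offset into the Metadata File and find its extent."""
    offset = meta_lbn
    for start, length in extents:
        if offset < length:
            return start + offset
        offset -= length          # skip past this extent
    raise ValueError("block beyond end of the Metadata File")
```

For example, block 700 skips the first extent (300 blocks) and lands at offset 400 of the second, i.e., type 1 block 2400 and physical address 1000 + 2400 = 3400.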
The content of the Metadata Bitmap File is an Unallocated Space Bitmap Descriptor. As with the bitmap in a type 1 or sparable partition, it has one bit for each block in the partition (here, the metadata partition).

3.5 Pseudo-Overwrite Partition

The pseudo-overwrite (POW) partition is used for next-generation write-once media (e.g., Blu-ray Disc Recordable, or BD-R) on next-generation intelligent drives. These drives manage address translation internally (what the virtual partition did before) to make the partition appear overwritable even though the physical media is write-once. When a POW partition is used, the metadata partition shall also be used for metadata, in the hope that clustering metadata achieves better performance. However, on write-once media, even when data is logically clustered in one partition, it may be physically far apart on the media. Because a longer physical distance often implies poorer performance, whether the metadata partition improves performance here is questionable. On media that supports a POW partition, the media can be divided into several tracks. Each track has a Next Writable Address (NWA). A new block can be written at the NWA of any track, and an existing block can be overwritten. The NWA of any track can change at any time, so the NWA must be queried before any new block is written.

3.6 Partition Descriptor and Partition Map

There are two ways to address a block on the media: the physical address (Logical Sector Number, or LSN) and the logical address (Logical Block Number, or LBN). The physical address is used to address metadata outside of partitions (such as the Logical Volume Descriptor). The logical address is used to address any block within a partition. Since there can be more than one partition in a UDF volume, a Partition Reference Number (PartRef) is needed in addition to the LBN to address a block. Before explaining how the PartRef is determined, we first look at how partitions are described. Partitions on a UDF volume are described by one or more Partition Descriptors (PDs) and a partition map with one or more entries. The partition map is stored in the Logical Volume Descriptor. A PD describes the physical properties of a partition. The most relevant information in a PD is its partition number, partition start location, and length. It also tells whether the partition is read-only, write-once, rewritable, or overwritable. The following table illustrates the basic information of two PDs. To reduce confusion, the partition numbers are intentionally chosen so that they do not overlap with PartRef values, although in real UDF volumes partition numbers often start from 0.

PD | Partition Number | Partition Start LSN | Partition Length | Access Type
PD 1 | 7 | 600 | 400 | read-only
PD 2 | 4 | 1000 | 8000 | overwritable

A partition map has a number of entries describing the logical properties of the partitions. Each partition map entry has a partition number indicating which PD it refers to. There are two types of partition maps: type 1 and type 2. A type 1 partition is simply the partition described by the PD with the corresponding partition number. A type 2 partition can be a sparable partition, a virtual partition, a metadata partition, or a pseudo-overwrite partition. The following table gives a possible partition map for the two PDs above. The 0-based index of each map entry is called the Partition Reference Number. A UDF file system can write the partition map entries in any order, which changes the PartRefs accordingly.

PartRef | Partition Number | Partition Map Type | Partition Type | Other Parameters
0 | 4 | 2 | Sparable Partition | Sparing Table Locations: 500 and 9000
1 | 7 | 1 | (type 1) | none
2 | 4 | 2 | Metadata Partition | Metadata File FE Location: 0; Metadata Mirror File FE Location: 7000; Metadata Bitmap File FE Location: 1

This partition map indicates that there are three partitions. The first partition (whose PartRef is 0) is a sparable partition backed by the overwritable partition described by the second PD. The second partition (whose PartRef is 1) is a type 1 read-only partition backed by the read-only partition described by the first PD. The third partition is the metadata partition residing in the sparable partition whose PartRef is 0, because both partition map entries have the same partition number 4. In this multi-partition scenario, each logical block is identified by the PartRef and the logical block number. For example, the 10th block in the metadata partition is identified by (PartRef=2, LBN=9), and the first block in the read-only partition is identified by (PartRef=1, LBN=0).
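Resolving a (PartRef, LBN) pair against the example PDs and partition map might look like the sketch below. Everything here is hypothetical: the dictionaries stand in for parsed descriptors, the Metadata File extents are borrowed from the metadata partition example, and the sparable partition is assumed to have no defects yet, so it maps like a type 1 partition.

```python
# Hypothetical parsed Partition Descriptors, keyed by partition number.
PDS = {7: {"start": 600, "length": 400},
       4: {"start": 1000, "length": 8000}}

# Hypothetical parsed partition map; the list index is the PartRef.
PARTITION_MAP = [
    {"number": 4, "kind": "sparable"},
    {"number": 7, "kind": "type1"},
    # Metadata File extents (start LBN, length) borrowed from the earlier example.
    {"number": 4, "kind": "metadata", "extents": [(100, 300), (2000, 500)]},
]


def underlying_ref(number: int) -> int:
    """PartRef of the non-metadata map entry that shares this partition number."""
    for ref, entry in enumerate(PARTITION_MAP):
        if entry["number"] == number and entry["kind"] != "metadata":
            return ref
    raise LookupError(f"no physical partition with number {number}")


def resolve(part_ref: int, lbn: int) -> int:
    """Translate a (PartRef, LBN) pair into a Logical Sector Number."""
    entry = PARTITION_MAP[part_ref]
    if entry["kind"] == "metadata":
        # Metadata LBN = block offset in the Metadata File; walk its extents.
        offset = lbn
        for start, length in entry["extents"]:
            if offset < length:
                lbn = start + offset
                break
            offset -= length
        else:
            raise ValueError("block beyond end of the Metadata File")
        entry = PARTITION_MAP[underlying_ref(entry["number"])]
    pd = PDS[entry["number"]]
    if not 0 <= lbn < pd["length"]:
        raise ValueError("block outside partition")
    # Assumption: no sparing-table exceptions yet, so the sparable
    # partition translates like a type 1 partition.
    return pd["start"] + lbn
```

Under these assumptions, (PartRef=2, LBN=9) lands at LBN 109 of PartRef 0, i.e., LSN 1109, while (PartRef=1, LBN=0) is LSN 600.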

4. UDF File and Directory Structure

  • File Entry and Extended File Entry

No matter how the underlying partition structures are defined, the file and directory structures of all UDF volumes are the same. The main metadata structure describing a file or directory is called an Information Control Block (ICB). An ICB occupies at most one block and is recorded either as a File Entry (FE) or as an Extended File Entry (EFE). Besides FE/EFE, the Allocation Extent Descriptor (AED) is used for very fragmented files, to hold the allocation descriptors that do not fit in the FE/EFE block. EFE was introduced in UDF 2.00 to represent files with named streams. EFE is very similar to FE, except for a few additional fields.
The word "File" in FE/EFE is broader than a regular file in a conventional file system: it represents a stream of bytes with some attributes, with file offsets running from 0 to the end of the file. A FE/EFE may represent a file, a directory, a logical space holding extended attributes, a stream directory, a named stream, a symbolic link, a special device node, or even the whole metadata partition and the metadata bitmap. We still use the word "file" for such a stream of bytes.
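Which kind of "file" an FE/EFE represents is recorded in its ICB Tag. The sketch below decodes the 20-byte ICB Tag, assuming the ECMA-167 layout as I recall it (strategy type at offset 4, file type at offset 11, flags at offset 18, all little-endian) and a subset of the file type values, including the UDF-specific metadata file types. `parse_icb_tag` and `FILE_TYPES` are illustrative names only.

```python
import struct

# Subset of ICB Tag file type values (ECMA-167, plus UDF additions 248 and 250-252).
FILE_TYPES = {
    4: "directory",
    5: "regular file",
    6: "block device",
    7: "character device",
    8: "extended attributes",
    9: "FIFO",
    10: "socket",
    12: "symbolic link",
    13: "stream directory",
    248: "virtual allocation table",
    250: "metadata file",
    251: "metadata mirror file",
    252: "metadata bitmap file",
}


def parse_icb_tag(icb_tag: bytes) -> dict:
    """Decode the 20-byte ICB Tag that follows the 16-byte Descriptor Tag of an FE/EFE."""
    if len(icb_tag) < 20:
        raise ValueError("an ICB Tag is 20 bytes long")
    strategy = struct.unpack_from("<H", icb_tag, 4)[0]
    file_type = icb_tag[11]
    flags = struct.unpack_from("<H", icb_tag, 18)[0]
    return {
        "strategy": strategy,
        "file_type": FILE_TYPES.get(file_type, f"other ({file_type})"),
        "flags": flags,
    }
```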
  • Extent-based Space Allocation

FE/EFE uses extent-based space allocation to indicate which blocks belong to the file. There are four formats of allocation descriptors, indicated by the lowest 3 bits of the Flags field in the ICB Tag of the FE/EFE (only values 0 to 3 are defined). Only three of them are used in UDF; the Extended Allocation Descriptors are not used. The three formats are:
  • Short Allocation Descriptors. The file data are in the same partition as the allocation descriptors. Each allocation descriptor records the starting logical block number and the length of the extent.
  • Long Allocation Descriptors. The file data may be in a different partition from the allocation descriptors. In addition to the information recorded in a short allocation descriptor, the partition reference number is also recorded to indicate the partition in which the extent resides.
  • Embedded Data. There are no allocation descriptors; the file data is embedded in the FE/EFE block itself, in the space where the allocation descriptors would otherwise be.
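The three formats can be sketched as follows. This is a minimal illustration, assuming short descriptors are 8 bytes (extent length, logical block number) and long descriptors are 16 bytes (extent length, 6-byte logical block address made of LBN and PartRef, then 6 bytes of implementation use), all little-endian, and that a zero extent length ends the list. `ad_format` and `parse_ads` are hypothetical names.

```python
import struct

AD_FORMATS = {0: "short", 1: "long", 2: "extended", 3: "embedded"}


def ad_format(icb_flags: int) -> str:
    """The lowest 3 bits of the ICB Tag flags select the allocation descriptor format."""
    return AD_FORMATS.get(icb_flags & 0x7, "reserved")


def parse_ads(area: bytes, fmt: str, own_part_ref: int) -> list:
    """Return (part_ref, lbn, raw_length) triples from an allocation descriptor area."""
    extents = []
    if fmt == "short":
        # 8 bytes each: extent length, LBN (same partition as the FE/EFE)
        for off in range(0, len(area) - 7, 8):
            length, lbn = struct.unpack_from("<II", area, off)
            if length == 0:
                break  # a zero length terminates the descriptor list
            extents.append((own_part_ref, lbn, length))
    elif fmt == "long":
        # 16 bytes each: extent length, LBN, PartRef, 6 bytes of implementation use
        for off in range(0, len(area) - 15, 16):
            length, lbn, part_ref = struct.unpack_from("<IIH", area, off)
            if length == 0:
                break
            extents.append((part_ref, lbn, length))
    else:
        raise ValueError(f"{fmt} allocation descriptors are not parsed here")
    return extents
```

The raw length returned here still carries the extent type in its top bits; for embedded data there is nothing to parse, since the bytes in the descriptor area are the file content.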
Each extent has a type, indicated by the highest 2 bits of its 32-bit extent length field; the remaining 30 bits hold the length in bytes. The normal type (0) is recorded-and-allocated. The type not-recorded-not-allocated (2) is used to represent holes in sparse files. The type not-recorded-allocated (1) is used to represent pre-allocated space that has not been initialized yet. A fourth value (3) marks an extent that points to the next extent of allocation descriptors (an AED). The size of each extent must be an integral multiple of the block size, except for the last extent of the file.
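Splitting the extent length field is simple bit arithmetic. The sketch below shows it, together with a check of the rule that only the last extent may be a partial block; `split_extent_length` and `check_extents` are illustrative names, and the check skips pointers to the next AED since those do not hold file data.

```python
EXTENT_TYPES = {
    0: "recorded and allocated",
    1: "allocated but not recorded",
    2: "not recorded and not allocated",
    3: "next extent of allocation descriptors",
}


def split_extent_length(raw: int) -> tuple:
    """Top 2 bits of the 32-bit field: extent type; low 30 bits: length in bytes."""
    return EXTENT_TYPES[raw >> 30], raw & 0x3FFFFFFF


def check_extents(raw_lengths: list, block_size: int) -> bool:
    """Every data extent except the last must be a whole number of blocks."""
    data = [length for kind, length in map(split_extent_length, raw_lengths)
            if kind != "next extent of allocation descriptors"]
    return all(length % block_size == 0 for length in data[:-1])
```

For example, the raw value 0x80001000 decodes to a 4096-byte hole (type 2), and 0x40000800 to 2048 bytes of pre-allocated, uninitialized space (type 1).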
  • Directory Structure

A directory is like a file, but the file type in its ICB Tag is set to 4 (directory). The directory entries are variable-size entries stored linearly in the file, each described by a File Identification Descriptor (FID). The first FID must have an empty name, and its file characteristics must have the parent bit set. This is equivalent to the ".." entry in an FFS/UFS directory; unlike FFS, however, UDF has no "." entry to represent the directory itself. The FID holds the file name (called the File Identifier in UDF), the address of the FE/EFE of the file, and an optional variable-size space for implementation use. When a file is deleted from a directory, the deleted bit is set in the file characteristics of its FID. The space left by deleted FIDs can be freely reused by new entries that fit in it.
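Since FIDs are variable-size and packed one after another, reading a directory is a linear walk. This sketch assumes the FID layout as I understand it: a 16-byte Descriptor Tag, file characteristics at offset 18 (hidden 0x01, directory 0x02, deleted 0x04, parent 0x08), identifier length at 19, the file's ICB as a long allocation descriptor at 20, implementation use length at 36, then implementation use and the identifier, padded to a multiple of 4 bytes. `walk_fids` and `live_names` are hypothetical helpers; names are returned raw (still CS0-encoded).

```python
import struct

FID_HIDDEN, FID_DIRECTORY, FID_DELETED, FID_PARENT = 0x01, 0x02, 0x04, 0x08


def fid_length(l_fi: int, l_iu: int) -> int:
    """38 fixed bytes + implementation use + identifier, padded to a multiple of 4."""
    return (38 + l_iu + l_fi + 3) // 4 * 4


def walk_fids(dir_data: bytes):
    """Yield (characteristics, raw_identifier, icb_lbn) for each FID in directory data."""
    off = 0
    while off + 38 <= len(dir_data):
        chars = dir_data[off + 18]
        l_fi = dir_data[off + 19]
        # ICB long_ad at offset 20: extent length (4 bytes), then LBN (4), PartRef (2)
        icb_lbn = struct.unpack_from("<I", dir_data, off + 24)[0]
        l_iu = struct.unpack_from("<H", dir_data, off + 36)[0]
        name_start = off + 38 + l_iu
        yield chars, dir_data[name_start:name_start + l_fi], icb_lbn
        off += fid_length(l_fi, l_iu)


def live_names(dir_data: bytes) -> list:
    """Names of entries that are neither deleted nor the parent entry."""
    return [name for chars, name, _ in walk_fids(dir_data)
            if not chars & (FID_DELETED | FID_PARENT)]
```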
  • Free Space Management (Space Bitmap)

UDF uses a space bitmap to manage free space, like many file systems. The bitmap is described by a Space Bitmap Descriptor (SBD), one of the few descriptors that can be larger than a block. An SBD consists of a Descriptor Tag, two length fields giving the number of bits and the number of bytes in the bitmap, and the free space bitmap itself. The bitmap must be stored contiguously on the volume. Because it is a free space bitmap, a 1 bit means the block is available for allocation and a 0 bit means the block has been allocated. This is the opposite of the convention used in many other file systems (such as ext2/ext3).
The file content of the Metadata Bitmap File is also an SBD. Its allocation on the underlying type 1 or sparable partition may be fragmented.
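The inverted bit convention is easy to get wrong, so here is a small sketch. It assumes the SBD layout of a 16-byte Descriptor Tag followed by the 32-bit number of bits and number of bytes, then the bitmap, with block n mapped to bit n % 8 of byte n // 8. The first-fit `allocate` is a naive illustration, not how any particular UDF driver allocates.

```python
import struct


def parse_sbd(sbd: bytes) -> tuple:
    """Descriptor Tag (16 bytes), number of bits, number of bytes, then the bitmap."""
    nbits, nbytes = struct.unpack_from("<II", sbd, 16)
    return nbits, sbd[24:24 + nbytes]


def is_free(bitmap: bytes, lbn: int) -> bool:
    """UDF convention: a SET bit means the block is FREE."""
    return bool(bitmap[lbn // 8] >> (lbn % 8) & 1)


def allocate(bitmap: bytearray, count: int, nbits: int) -> list:
    """Naive first-fit allocation; clears bits (0 = allocated) and rolls back on failure."""
    got = []
    for lbn in range(nbits):
        if len(got) == count:
            break
        if is_free(bitmap, lbn):
            bitmap[lbn // 8] &= ~(1 << (lbn % 8)) & 0xFF
            got.append(lbn)
    if len(got) < count:
        for lbn in got:  # undo the partial allocation
            bitmap[lbn // 8] |= 1 << (lbn % 8)
        raise OSError("not enough free blocks")
    return got
```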
  • Extended Attributes

Extended Attributes (EAs) are used to store additional file attributes, such as Finder Info and resource fork on Mac, and ACL on NTFS. EAs can be stored in two different places: embedded EA space and external EA space.
The embedded EA space is the spare space in the file entry block after the fixed fields in a FE/EFE and before the allocation descriptors. It is fast to access but only small EAs can be stored here.
The external EA space is described by a special file entry (the EA File Entry) that is pointed to by the file entry of the main file or directory. The EAs are stored in the logical space described by the EA File Entry. Each EA has a header and a variable-size body. In the external EA space, all EAs are concatenated together, and each EA header must start at a block boundary.
There are three types of EAs: standard EAs, implementation use EAs and application use EAs. Examples of standard EAs are the file times EA (backup time, creation time, etc.) and the device specification EA for device nodes. Examples of implementation use EAs are Macintosh Finder Info and Resource Forks. Application use EAs are defined and used by applications. Every EA type has a special EA called the unallocated space EA, used to fill unused space left by other EAs or for padding.
In both embedded and external EA space, EAs are always grouped together based on their types. The standard EAs are stored first, followed by the implementation use EAs if any, and then followed by the application use EAs if any.
Because all EAs are stored together in one logical space, if an EA in the middle of the external EA space grows, all EAs after it must be shifted. This makes space allocation for the external EA space complex. It is not a problem on read-only media, but it makes supporting EAs difficult on writable media. Fortunately, UDF 2.00 introduced named streams, which do not have these problems. Most implementation use EAs defined in UDF 1.x are stored as named streams in UDF 2.x and later.
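The grouping rule can be checked by walking the EAs and classifying each one. This sketch assumes the generic ECMA-167 EA header as I recall it (attribute type as a 32-bit value, 1-byte subtype, 3 reserved bytes, then the 32-bit total attribute length), that implementation use EAs have type 2048 and application use EAs type 65536, and that `space` starts right after the Extended Attribute Header Descriptor. All function names are hypothetical.

```python
import struct

IMPL_USE_TYPE = 2048
APP_USE_TYPE = 65536


def ea_class(attr_type: int) -> int:
    """0 = standard, 1 = implementation use, 2 = application use."""
    if attr_type == APP_USE_TYPE:
        return 2
    if attr_type == IMPL_USE_TYPE:
        return 1
    return 0


def walk_eas(space: bytes):
    """Yield (type, subtype, total_length) for EAs packed back to back."""
    off = 0
    while off + 12 <= len(space):
        attr_type, subtype, length = struct.unpack_from("<IB3xI", space, off)
        if length < 12:
            break  # malformed entry or end of recorded EAs
        yield attr_type, subtype, length
        off += length


def grouped_correctly(space: bytes) -> bool:
    """Standard EAs first, then implementation use, then application use."""
    classes = [ea_class(t) for t, _, _ in walk_eas(space)]
    return classes == sorted(classes)
```

The walk also shows why growing an EA is expensive: every later EA is located only by summing the lengths before it, so a size change shifts all of them.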
  • Named Streams

Named streams were introduced in UDF 2.00. The concept of a stream is similar to a fork on the Macintosh and a stream in NTFS. Every file or directory stores its data in the main stream, and it can have an arbitrary number of additional named streams, each identified by its name. If a file or directory has named streams, an Extended File Entry (EFE) must be used for it; the EFE contains the address of the file entry of a stream directory. The content of a stream directory has the same format as a normal directory: it starts with a parent entry with an empty name, followed by variable-size directory entries. Each directory entry in a stream directory points to the file entry of a named stream, which describes the logical space storing that stream.
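The relationship between a file, its stream directory and its named streams can be modelled in memory. This is a deliberately simplified, hypothetical model: the stream directory is a dict instead of a list of FIDs, and the entry type used for each named stream is a simplification. It only illustrates the two rules above: named streams require an EFE, and each stream is reached by name through the stream directory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical in-memory stand-in for an FE/EFE."""
    extended: bool                         # True for an EFE, False for an FE
    data: bytes = b""                      # the main stream
    stream_dir: Optional[dict] = None      # stream directory: name -> Entry of the stream


def add_stream(entry: Entry, name: str, data: bytes) -> None:
    if not entry.extended:
        raise ValueError("named streams require an Extended File Entry")
    if entry.stream_dir is None:
        entry.stream_dir = {}  # created lazily, like allocating the stream directory
    entry.stream_dir[name] = Entry(extended=False, data=data)


def read_stream(entry: Entry, name: Optional[str] = None) -> bytes:
    """name=None reads the main stream; otherwise look it up in the stream directory."""
    if name is None:
        return entry.data
    if entry.stream_dir is None or name not in entry.stream_dir:
        raise KeyError(name)
    return entry.stream_dir[name].data
```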

  • Some UDF Terminologies

UDF defines far more terminology than most conventional hard-drive-based file systems. One reason is that ECMA-167 attempts to define a framework for many possible file systems instead of a specific one. Another reason is that UDF defines not only the file system structure, but also the volume and partition structure. Some UDF (or ECMA-167) terms are very abstract for conventional file system implementers. Fortunately, many of them are not relevant to UDF. Some terms that may cause confusion are briefly discussed below.
  • Boot Block: can be ignored in UDF.
  • Physical Sector: can be treated the same as logical sector in UDF.
  • Logical Sector: a sector on the media, addressed using the physical address.
  • Logical Block: a sector within a partition, addressed using the logical block number for this partition (from 0 to n-1, where n is the partition size). The size of a logical block equals the size of a logical sector, which is 2KB for most optical media. Some UDF implementations hard-code 2KB and cannot support other sector sizes. UDF does not define the sector size for hard drives; however, since UDF expects the physical sector size to be used, most existing hard drives would need to use 512B as the sector size.
  • Identifier: means name. For example, Logical Volume Identifier is the name of the volume, File Identifier is the file name.
  • Volume Recognition Space: for most media, VRS starts from the start of the media (logical sector 0); for multi-session optical media, VRS starts from the start of the last data session.
  • Volume Space: all the usable space on the media.
  • Volume Set: can be treated as a logical volume, since a volume set only contains one logical volume in UDF.
  • Logical Volume: this can be treated as all the UDF data structures on the media. It includes some metadata structures outside of partitions, and data in the partition.
  • Partition: a partition is where the file system data (files, directories, etc.) resides. Only metadata can exist outside of a partition. A Logical Volume has one or more partitions for this purpose. Supporting multiple partitions is a unique property not seen in other file systems. For now, you can consider a partition as a contiguous logical space, where logical blocks are numbered from 0 to n-1, where n is the partition size. More details of the partition will be discussed in UDF Partition Structure.
  • File Set: a file set means a directory tree starting from the root directory. In UDF, each logical volume has only one file set. The only exception is that on write-once media, multiple file-sets may be recorded to represent multiple archive images on the media. See UDF 2.3.2 for more details.
  • Descriptor: a descriptor is an on-disk structure holding a piece of UDF metadata. A descriptor usually does not exceed the size of a sector, but there are a few exceptions (such as the space bitmap and sparing table descriptors). All descriptors start with a 16-byte header called the Descriptor Tag.
  • Descriptor Tag: the 16-byte header of every descriptor. It mainly contains a tag identifier giving the type of the descriptor, a CRC of the descriptor (or of part of the descriptor, for long descriptors), and a checksum of the tag itself.
  • ICB (Information Control Block): some types of descriptors describe "files". These descriptors always have a 20-byte ICB Tag following the Descriptor Tag. Here a "file" means anything that manages a logical stream of bytes: a regular user file, a directory, a named stream, an extended attribute space, etc. A descriptor with an ICB Tag is similar to an inode in FFS/UFS/ext3. ECMA-167 defines several complex ICB hierarchy strategies, which UDF does not need. In practice UDF uses the simplest one, strategy 4, in which a single ICB descriptor describes the file and there is no ICB hierarchy at all (UDF also permits strategy 4096, but only on write-once media).
  • Character Sets and Encoding: ECMA-167 defines 9 character sets, CS0-CS8. UDF only uses CS0, restricted to OSTA Compressed Unicode, so all other character set definitions can be ignored. A string uses either 16-bit characters or 8-bit characters (the compressed form); 8-bit characters can only be used if the high 8 bits of every Unicode character in the string are 0. The first byte of a string (the compression ID, 8 or 16) tells whether 8-bit or 16-bit characters are used. 16-bit Unicode strings use big-endian byte ordering.
  • Record Structure: this can be ignored in UDF, since all files in UDF are treated as byte streams and no record structures are defined.
  • Byte-order: all on-disk structures of UDF use little-endian, i.e., the byte-order used by Intel x86 processors. There is one exception: for strings (volume names, file and directory names, etc), if 16-bit unicode characters are used, big-endian byte-order is used.
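Both byte orders show up as soon as you read real descriptors. The sketch below parses a Descriptor Tag (little-endian, like all structures) and decodes a CS0 string (big-endian when 16-bit). It assumes the tag layout as I understand it: tag identifier, descriptor version, tag checksum, a reserved byte, tag serial number, descriptor CRC, CRC length and tag location, where the tag checksum is the byte sum modulo 256 of all tag bytes except byte 4. The CRC itself is not verified here, and only compression IDs 8 and 16 are handled. `parse_tag` and `decode_cs0` are illustrative names.

```python
import struct


def parse_tag(tag: bytes) -> dict:
    """Parse a 16-byte little-endian Descriptor Tag; raise if the tag checksum is wrong."""
    tag_id, version, checksum, serial, crc, crc_len, location = struct.unpack_from(
        "<HHBxHHHI", tag, 0)
    # Tag checksum: sum (mod 256) of bytes 0-3 and 5-15, i.e. every byte except itself.
    expected = (sum(tag[0:4]) + sum(tag[5:16])) & 0xFF
    if checksum != expected:
        raise ValueError("bad tag checksum")
    return {"tag_id": tag_id, "version": version, "serial": serial,
            "crc": crc, "crc_length": crc_len, "location": location}


def decode_cs0(d: bytes) -> str:
    """OSTA Compressed Unicode: the first byte is the compression ID (8 or 16)."""
    if not d:
        return ""
    comp, body = d[0], d[1:]
    if comp == 8:
        return body.decode("latin-1")    # each byte is a code point below 256
    if comp == 16:
        return body.decode("utf-16-be")  # 16-bit strings are big-endian
    raise ValueError(f"unsupported compression ID {comp}")
```

For instance, `decode_cs0(b"\x08abc")` and `decode_cs0(b"\x10\x00a\x00b\x00c")` both yield "abc": the same name in the compressed and 16-bit forms.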