Implementation of extended attributes on the FAT file system
This was originally written because of all the queries about the EA DATA. SF file, which was a frequent subject of discussion. I have tried to explain what this file does, why it exists, and what one should and should not do with it. Various people gave me extra information; particular thanks to Dean Gibson (73427.2072@compuserve.com), who figured out the format of the EA DATA. SF file and put me right on a few points. Some of the following information is due to Dean.
In the following, all numbers are decimal unless followed by an H (in which case they are hexadecimal).
All versions of OS/2 (except versions 1.0 and 1.1) support the concept of 'extended attributes' (EAs) on files. These are used for all kinds of things, and can be very small or quite large (the limit is 64KB per file at present). EAs might represent a file type, a file classification, an icon type, some free text... practically anything. One important use is for the storage of instance data for some classes of Workplace Shell objects.
EAs are supported directly by the High Performance File System (HPFS). They are stored in an efficient manner; most of the time a small EA (typically one of less than several hundred bytes) takes no additional space at all.
For backwards compatibility, the old DOS (File Allocation Table, or FAT) file system needs to support EAs too. Doing this while keeping the file system consistent for DOS, in case DOS is booted instead of OS/2 on the same machine, needs some trickery.
FAT directory entries have ten spare bytes in them, starting at offset 0CH (immediately after the 11-byte filename and the attribute byte); these are normally zero. They exist because the original directory entry layout was modelled on the CP/M file system, where these bytes (among others) described the location of the disk extents making up the file; DOS does not use them for that purpose. OS/2 uses two of these spare bytes (at offsets 14H and 15H within the directory entry) to head a chain of disk allocation units (clusters) which hold the EAs for that file. This caused interesting problems with, for example, early versions of the Norton Utilities, which flagged such a directory entry as having an 'illegal' format!
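To make the offsets concrete, here is a minimal Python sketch that unpacks one 32-byte FAT directory entry and pulls out the word at 14H-15H. The field layout is the standard DOS one; the function name parse_dirent and its dictionary keys are my own invention, and decoding the name as code page 437 is an assumption. It only reads the raw word; what that value actually means is covered in the format details further on.

```python
import struct

# Standard 32-byte DOS directory entry, little-endian, no padding:
#   00H name (8)   08H extension (3)   0BH attribute byte
#   0CH-13H spare bytes (8 of the 10)  14H-15H word used by OS/2 for EAs
#   16H time   18H date   1AH first cluster of file   1CH file size (4)
DIRENT = struct.Struct("<8s3sB8sHHHHI")


def parse_dirent(raw: bytes) -> dict:
    """Split one raw directory entry into its fields (hypothetical helper)."""
    if len(raw) != DIRENT.size:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    name, ext, attr, spare, ea_word, time, date, cluster, size = DIRENT.unpack(raw)
    return {
        "name": name.decode("cp437").rstrip(" "),
        "ext": ext.decode("cp437").rstrip(" "),
        "attr": attr,
        "spare": spare,        # normally all zero
        "ea_word": ea_word,    # zero means no EAs are attached
        "time": time,
        "date": date,
        "start_cluster": cluster,
        "size": size,
    }


# Example: README.TXT, archive bit set, EA word 0005H, data at cluster 2.
entry = DIRENT.pack(b"README  ", b"TXT", 0x20, bytes(8), 0x0005, 0, 0, 2, 1234)
info = parse_dirent(entry)
print(info["name"], info["ext"], hex(info["ea_word"]), info["start_cluster"])
```

A program that insists the spare bytes are all zero would reject this entry as soon as the EA word is non-zero, which is exactly the sort of check that upset early disk utilities.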
So, effectively, an OS/2 FAT directory entry can head two chains of clusters: one for the file itself (as usual) and one for the EAs attached to the file. The latter listhead is often null (indicated by zeros).
All this would be fine until you ran CHKDSK under DOS. It would find all these clusters holding the EAs and, because they would appear not to belong to any file, would collect them up and mark them as 'lost' clusters to be added to the free list. Disaster would ensue the next time OS/2 looked at the file (well, eventually anyway), because the chances are that by then the clusters making up the EAs would have been allocated to another file. To prevent this, the file named EA DATA. SF (the EA datafile) is used. This file is never meant to be read directly, and indeed it should never normally be backed up as a file. Its directory entry heads a chain of clusters (as usual), but these clusters are the same ones that hold all the EAs on that file system. In other words, there are two references to every EA cluster: one via the file's directory entry and one via the EA datafile. This makes the disk appear consistent under DOS; all of the clusters used on the disk belong to a valid file, and DOS does not see the second reference because it ignores the EA listhead in the directory entry.
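The effect of that second reference can be shown with a toy model of the lost-cluster check: walk every chain that starts in a directory entry DOS can see, and report any allocated cluster that was never reached. This is a simplified sketch, not CHKDSK's actual algorithm; the FAT is modelled as a Python dict from cluster number to next cluster (None marking the end of a chain), and the function name lost_clusters is hypothetical.

```python
def lost_clusters(fat: dict, chain_heads: list) -> set:
    """Allocated clusters that no directory entry's chain reaches.

    fat maps each allocated cluster to the next cluster in its chain,
    or to None at the end of a chain (a toy stand-in for real FAT values).
    chain_heads holds the first cluster from each directory entry that
    DOS can see; DOS ignores the EA word, so EA chains are not included.
    """
    reachable = set()
    for head in chain_heads:
        cluster = head
        # Stop at the end of the chain, or at a cluster already visited.
        while cluster is not None and cluster not in reachable:
            reachable.add(cluster)
            cluster = fat[cluster]
    return set(fat) - reachable


# A file occupying clusters 2->3, with its EAs in clusters 10->11.
fat = {2: 3, 3: None, 10: 11, 11: None}
print(sorted(lost_clusters(fat, [2])))  # the EA clusters look lost

# Add an EA datafile whose chain starts at 9 and runs on through 10 and 11.
fat_with_eadata = dict(fat)
fat_with_eadata[9] = 10
print(sorted(lost_clusters(fat_with_eadata, [2, 9])))  # nothing is lost
```

Cluster 9 here simply stands in for the EA datafile's own first cluster; the point is only that once that directory entry exists, every EA cluster is reachable by a route DOS understands.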
Microsoft have said that the EA datafile is position dependent and should not be manipulated or deleted; to make this hard, it has a strange name with spaces in it (which defeats a lot of software), and it is marked read-only, system and hidden. Observation has shown this not to be strictly true; it seems that you can back up and restore the file without any damage. (Of course, the EA datafile must correspond to the files on the disk; if you attempted to restore it on its own without also restoring the various files that reference it, you would have problems.) The snag is that restored files won't generally have the entire directory entry restored, so the head of the EA cluster chain (in offsets 14H and 15H) will be lost (set to zero).
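As an illustration of why the name trips up software, the sketch below builds the 11-byte name field as it would sit in a directory entry, assuming the usual space-padded 8.3 layout (eight bytes of name, three of extension, no dot stored), and checks the read-only, hidden and system bits of the attribute byte. The bit values are the standard FAT ones; the helper names to_83_field and looks_like_ea_datafile are my own.

```python
READ_ONLY, HIDDEN, SYSTEM = 0x01, 0x02, 0x04


def to_83_field(filename: str) -> bytes:
    """Convert 'NAME.EXT' to the 11-byte space-padded directory field."""
    base, _, ext = filename.partition(".")
    if len(base) > 8 or len(ext) > 3:
        raise ValueError("not a valid 8.3 name")
    return base.ljust(8).encode("ascii") + ext.ljust(3).encode("ascii")


def looks_like_ea_datafile(name_field: bytes, attr: int) -> bool:
    """True if the entry has the EA datafile's name and R/H/S attributes."""
    wanted = READ_ONLY | HIDDEN | SYSTEM
    return name_field == to_83_field("EA DATA. SF") and (attr & wanted) == wanted


field = to_83_field("EA DATA. SF")
print(field)  # spaces are embedded in the name, not only trailing padding
print(looks_like_ea_datafile(field, 0x07))
```

Software that treats the first space in the field as the end of the name, or that refuses names with embedded spaces, will not be able to refer to this file at all.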
Notice the implication for backup under OS/2. A proper, EA-aware backup program should not back up the EA datafile; it simply reads the EAs for each file as that file is backed up, and restores them the same way, using the relevant system calls. So the fact that OS/2 locks the EA datafile open is actually a benefit of sorts: it stops the file being backed up when its contents will never be needed, and in any case such a backup would be nearly useless unless the directory entries were also restored in their entirety.
The EA datafile is created when the first EA is attached to any file on the disk; try it out with a diskette. It also takes one cluster (the first one) for some kind of internal housekeeping information. I suspected that this cluster is some kind of map similar to the FAT, chaining together the clusters relating to one file within the EA datafile; if so, it would probably expand if you had a lot of EAs on your disk. Dean Gibson figured out a lot more about the format of the file; the details are given later.
EAs are removed from the EA datafile when the file to which they are attached is deleted, but only if the deletion takes place under OS/2 (including DOS sessions). If the file is deleted under vanilla DOS, the EA datafile retains the now orphaned EA clusters; they can be reclaimed by running CHKDSK under OS/2.
All this, of course, plays havoc with defragmenters. They have to work round all of the scattered, immobile clusters making up the EA datafile. Yes, it's a kludge, but quite a good one, given the constraint that it has to look OK under normal DOS as well as provide the functionality under OS/2.
Most of this information came from Dean Gibson; many thanks, Dean! I have made the occasional addition.
The actual EA DATA. SF file format is as follows. All references to 'words' mean 16-bit quantities.
Given a non-zero 16-bit EA pointer 'X' in a FAT system directory entry (in offsets 14H and 15H):
In order to keep the EA DATA. SF file logically contiguous when table B is expanded into a new cluster or when an EA is deleted, the FAT cluster chain for EA DATA. SF is altered, and values in table A and/or segments of table B are changed to reflect this.
The first word of the EA sector is for identification and contains the ASCII characters 'EA'; the next word is the relative sector number of this sector (a consistency check); the next two words are zero; the next twelve bytes contain the target file name (no path); the next word has an as yet undeciphered meaning; the next two words are zero; and these are followed by the EA data for the target file, which therefore starts 26 (1AH) bytes into the sector. The first word of the EA data is the length of the EA data in bytes, including the count word.
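A minimal Python sketch of reading this header follows. The offsets are simply the field sizes above added up; the file name is assumed to be NUL-padded, the undeciphered word is returned untouched, words are assumed to be stored low byte first as elsewhere on FAT, and the function name parse_ea_sector is hypothetical. It checks the 'EA' signature and the relative sector number, which is what those two fields appear to be for.

```python
import struct

# EA sector header, all words little-endian:
#   00H 'EA'   02H relative sector number   04H two zero words
#   08H 12-byte target file name   14H unknown word   16H two zero words
#   1AH EA data, starting with its own length word (count includes itself)
EA_HEADER = struct.Struct("<2sH4s12sH4s")


def parse_ea_sector(buf: bytes, expected_sector: int) -> dict:
    """Decode the header at the start of buf and return the EA data.

    buf starts at the EA sector and may run on into following sectors
    if the EA data is longer than one sector.
    """
    sig, rel_sector, _zero1, name, unknown, _zero2 = EA_HEADER.unpack_from(buf, 0)
    if sig != b"EA":
        raise ValueError("missing 'EA' signature")
    if rel_sector != expected_sector:
        raise ValueError("relative sector number does not match position")
    start = EA_HEADER.size  # 26, i.e. 1AH
    (length,) = struct.unpack_from("<H", buf, start)
    if length < 2:
        raise ValueError("EA data length must include its own count word")
    data = buf[start:start + length]
    if len(data) != length:
        raise ValueError("EA data runs past the end of the buffer")
    return {
        "rel_sector": rel_sector,
        "file_name": name.rstrip(b"\0 ").decode("cp437"),
        "unknown": unknown,
        "ea_data": data,  # includes the leading length word
    }


header = EA_HEADER.pack(b"EA", 3, bytes(4), b"README.TXT", 0, bytes(4))
ea_data = struct.pack("<H", 7) + b"ABCDE"
sector = (header + ea_data).ljust(512, b"\0")
result = parse_ea_sector(sector, 3)
print(result["file_name"], result["ea_data"])
```

Passing the expected sector number in from the caller mirrors the consistency check: a reader that has located the sector via the tables can confirm it landed where it expected.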