04/28/99
KDump - Ken's Dump Utility
- What it is
- Dump Image Format
- The Data File
- The Index File
- The Warehouse
- The Dump Process
- The Restore Process
- The Kdump Command
- The Krest Command
- The Recover Utility
What it is
Ken's Dump Utility is designed to overcome some of the shortcomings
associated with the standard Berkeley dump utility as it was used
on the Uniform Access computers. To wit:
- UIDs greater than 65,535 were not supported.
- There was no good way to get an index of what dump images
contained a particular file or files.
- Users could not initiate the restore process themselves.
- A restore of a file near the end of a 9 gigabyte dump image
required reading the entire dump image across the network.
- There was no good way to restore a single directory and its
contents from an incremental dump image. If not all of its files
had changed since the previous lower level dump, you needed to go
to the parent dump(s) as well, but doing so also restored files
that had been deleted between the full and the incremental.
Ken's Dump Utility is not designed to do a system level dump of
the root partition. That should be done with mksysb or the
equivalent. Kdump will work best with user filesystems, particularly
ones that require periodic restores of selected files or directories.
The alternatives:
- IBM's ADSM product
- Rumored to not scale well to 60,000 users.
- ADSM maintains a number of versions of each file rather than
point-in-time snapshots. If a file changes every day, you have
to keep 7 versions of it to be able to restore things to "what
it looked like last week". For source code, you quite often want
to restore things to "what it looked like six months ago" -- you
don't want to keep 180 copies of each file.
- Legato's Networker product
- Rumored to have high per-client costs.
- STK silos and tape drives don't appear to be supported. STK
says yes, but Legato didn't return calls.
Dump Image Format
Each dump image consists of two "files": the data file and the index
file. The index file contains the directory information from the
dumped filesystem, including the basic ownership and permissions of
each object, its size and last modification time, and a pointer into
the data file. The data file contains the actual byte stream of each
file plus any ancillary information such as ACLs.
The index file is generated in memory by the kdump utility as the filesystem
is being dumped and is sent to the storage warehouse after the data file
is fully transferred. The restore process first reads in the index file
and then accesses the necessary portions of the data file. For this reason
the warehouse has to store the index file on a random access file store
or at least on separate tapes. Additionally the index file could be kept
on the local system for ease of access. All integer fields in the index
file are stored in network order using the ntohl() and ntohs() macros, so
dumps written on an architecture of one byte order can be read on an
architecture of the other.
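As an illustration of the byte order handling, a 32-bit field such as a
timestamp would be converted with htonl() on the way into the index and
ntohl() on the way out; the function names below are illustrative, not
taken from the kdump source.

    #include <arpa/inet.h>   /* htonl(), ntohl() */
    #include <stdint.h>
    #include <time.h>

    /* Writing: convert a host-order timestamp to network order before
     * it goes into the index buffer. */
    uint32_t pack_timestamp(time_t t)
    {
        return htonl((uint32_t)t);
    }

    /* Reading: convert back to host order, whichever byte-order
     * architecture produced the dump. */
    time_t unpack_timestamp(uint32_t wire)
    {
        return (time_t)ntohl(wire);
    }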
The Data File
The data file is divided up into packets. Each packet consists of a
header followed by a number of data bytes as specified in the header.
Data Packet: pkt_t structure
Field | Length | Comment |
pkt_type | int 16 | Packet type |
pkt_fmt | int 16 | Compression and encryption method |
pkt_len | int 32 | Length of data field |
Data | variable | File or ACL data |
The packet types currently defined are:
- PK_DATA: Normal data
- PK_EOD: End of data / last data packet
- PK_ACL: ACL data
- PK_SKIP: Skip sparse area
The packet formats currently defined are:
- PK_F_NORMAL: Normal uncompressed/unencrypted data
- PK_F_IBMACL: IBM binary format ACL
- PK_F_LSCCRYPT: Data encrypted with LSC key
Each packet is separately compressed or encrypted based on whatever
method the dumping utility decides would be best. There is no
practical limit on the total size of the data file (all references
use 64 bit pointers -- roughly 9 exabytes with signed offsets), but
each packet can be no larger than 2 gigabytes. A file can span
multiple packets. In practice a packet will likely be limited to a
megabyte or two so that it can fit into a memory buffer on the
dumping system. The restoring system should not count on holding an
entire packet in memory, and should be prepared for the maximum
2 gigabyte packet.
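A minimal C declaration of the packet header described above might look
like the sketch below. The field widths follow the table; the constant
values are assumptions, since the document doesn't give them.

    #include <stdint.h>

    /* Packet types (values are illustrative). */
    enum {
        PK_DATA = 0,   /* normal data               */
        PK_EOD  = 1,   /* end of data / last packet */
        PK_ACL  = 2,   /* ACL data                  */
        PK_SKIP = 3    /* skip sparse area          */
    };

    /* Packet formats (values are illustrative). */
    enum {
        PK_F_NORMAL   = 0,   /* uncompressed/unencrypted    */
        PK_F_IBMACL   = 1,   /* IBM binary format ACL       */
        PK_F_LSCCRYPT = 2    /* data encrypted with LSC key */
    };

    /* Data packet header; the data bytes follow immediately after
     * pkt_len. */
    typedef struct pkt {
        int16_t pkt_type;   /* PK_DATA, PK_EOD, PK_ACL or PK_SKIP     */
        int16_t pkt_fmt;    /* compression and encryption method      */
        int32_t pkt_len;    /* length of the data field, at most 2 GB */
        /* unsigned char data[pkt_len] follows */
    } pkt_t;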
The Index File
The index file starts out with a header:
Index Header: kdh_t structure
Field | Length | Comment |
kdh_version | int 32 | Dump version signature |
kdh_root | kdd_t | Root directory offset and size |
kdh_dtime | int 32 | Timestamp of dump image |
kdh_ptime | int 32 | Timestamp of parent image |
kdh_parent | 80 char | Name of parent dump |
kdh_mount | 80 char | Original mount point |
kdh_key | 32 char | Encryption key name |
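As a sketch, the header and the embedded root-directory record could be
declared as below. The kdd_t record is described with the other entry
types further on; the "0,1" suffix used in the tables is read here as a
pair of 32-bit words holding the halves of a 64-bit value, which is an
assumption on my part.

    #include <stdint.h>

    /* Directory record (kdd_t), embedded in the header for the root
     * directory and used for KDET_DIR entries below. */
    typedef struct kdd {
        int32_t kdd_aoff0;   /* data file offset to ACL, high word */
        int32_t kdd_aoff1;   /* data file offset to ACL, low word  */
        int32_t kdd_off;     /* index file offset to the directory */
        int32_t kdd_len;     /* length of the directory            */
    } kdd_t;

    /* Index file header; all integer fields in network byte order. */
    typedef struct kdh {
        int32_t kdh_version;     /* dump version signature         */
        kdd_t   kdh_root;        /* root directory offset and size */
        int32_t kdh_dtime;       /* timestamp of this dump image   */
        int32_t kdh_ptime;       /* timestamp of the parent image  */
        char    kdh_parent[80];  /* name of the parent dump        */
        char    kdh_mount[80];   /* original mount point           */
        char    kdh_key[32];     /* encryption key name            */
    } kdh_t;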
The header is followed by a sequence of directories. The root directory
comes first, followed by the rest of the directories in depth-first order.
Each directory consists of a sequence of directory entries followed by
a variable length data block. The data block contains the filenames
along with other ancillary information.
Directory Entry: kde_t structure
Field | Length | Comment |
kde_type | int 16 | Entry type |
kde_len | int 16 | Length of entry |
kde_name | int 32 | Offset (from top of directory) to name |
kde_owner | int 32 | Owner UID |
kde_group | int 32 | Owner GID |
kde_mode | int 32 | File protection modes |
kde_mtime | int 32 | File modification time |
kde_ctime | int 32 | File control time |
kde_flags | int 32 | Sundry and various flags |
kde_data | variable | Variable length data field |
The kde_data field depends on the basic type of the entry. It is one of the
following:
Normal Directory (KDET_DIR): kdd_t structure
Field | Length | Comment |
kdd_aoff0,1 | int 64 | Data file offset to ACL |
kdd_off | int 32 | Index file offset to directory |
kdd_len | int 32 | Length of directory |
Normal File (KDET_FILE): kdf_t structure
Field | Length | Comment |
kdf_inode | int 32 | Inode number |
kdf_len0,1 | int 64 | Length of file |
kdf_off0,1 | int 64 | Offset to data |
Symbolic Link (KDET_SLINK): kds_t structure
Field | Length | Comment |
kds_len | int 32 | Length of symbolic link text |
kds_off0,1 | int 64 | Offset to data |
The last directory entry is followed by a terminator record with a kde_type
field value of KDET_LAST. This terminator record is only 4 bytes long, just
enough to hold the kde_type field, and is followed by the file names as a
sequence of zero-byte-terminated character strings.
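Putting the entry layout together, a reader of the index might walk one
directory block along the lines of the sketch below: advance from entry
to entry by kde_len, stop at the KDET_LAST terminator, and find each
name at kde_name bytes from the top of the directory. The structure
follows the tables above, but the constant values, the flag bit and the
byte order handling are assumptions.

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Entry types and flag bits; the actual values are assumptions. */
    enum { KDET_LAST = 0, KDET_DIR = 1, KDET_FILE = 2, KDET_SLINK = 3 };
    #define KDEF_PRIOR 0x0001   /* file lives on a parent (lower level) dump */

    /* Fixed part of a directory entry; the variable kde_data field
     * follows it. */
    typedef struct kde {
        int16_t kde_type;    /* KDET_DIR, KDET_FILE, KDET_SLINK, KDET_LAST */
        int16_t kde_len;     /* length of this entry                       */
        int32_t kde_name;    /* offset from top of directory to the name   */
        int32_t kde_owner;   /* owner UID                                  */
        int32_t kde_group;   /* owner GID                                  */
        int32_t kde_mode;    /* file protection modes                      */
        int32_t kde_mtime;   /* file modification time                     */
        int32_t kde_ctime;   /* file control time                          */
        int32_t kde_flags;   /* sundry and various flags                   */
    } kde_t;

    /* List the entries of one in-memory directory block.  "dir" points
     * at the first kde_t; kde_name offsets are relative to the top of
     * the directory, so the names block is addressed from "dir" too. */
    static void list_directory(const unsigned char *dir)
    {
        const unsigned char *p = dir;

        for (;;) {
            const kde_t *e = (const kde_t *)(const void *)p;

            if (ntohs((uint16_t)e->kde_type) == KDET_LAST)
                break;                       /* 4-byte terminator record */

            const char *name = (const char *)dir + ntohl((uint32_t)e->kde_name);
            int prior = ntohl((uint32_t)e->kde_flags) & KDEF_PRIOR;

            printf("%s%s\n", name, prior ? " (on a parent dump)" : "");

            p += ntohs((uint16_t)e->kde_len);   /* next entry */
        }
    }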
The Warehouse
All the dump images are piped across the network to the warehouse
dæmon and stored on whatever media is appropriate. The warehouse
dæmon supports several basic functions:
- Create new image
- Access existing image
- Open data stream
- Open index stream
- Position and read N bytes
- Close current stream
- Commit image
The dump process creates a new image, opens the data stream, dumps the
data file information, closes the data stream, opens the index stream,
dumps the index file information, closes the index stream and then commits
the image.
The restore process accesses the image, opens the index stream, reads
a hunk of the index, presents a list of files to the operator to
select, reads additional hunks of the index as required to fetch the
entries that the operator references, closes the index stream, opens
the data stream, positions and reads each of the selected files and
then closes the data stream. Multiple adjacent selected files are
retrieved with a single position request by specifying a length of the
read that encompasses more than one file. Position and read requests
will be queued up by the warehouse dæmon so they can all be sent
prior to retrieving any of the actual data in order to better utilize
the intervening network.
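The single position request that covers several adjacent files amounts
to coalescing the selected files' (offset, length) pairs, roughly as in
the sketch below; the type and function names are illustrative, not
from the krest source.

    #include <stddef.h>
    #include <stdint.h>

    /* One "position and read" request against the dump image data file. */
    typedef struct {
        int64_t off;   /* offset into the data file */
        int64_t len;   /* number of bytes to read   */
    } read_req_t;

    /* Merge requests for adjacent regions of the data file.  "reqs"
     * must already be sorted by offset, which index order gives for a
     * mostly sequential dump.  Returns the number of requests left. */
    size_t coalesce_reads(read_req_t *reqs, size_t n)
    {
        size_t out = 0;

        for (size_t i = 0; i < n; i++) {
            if (out > 0 &&
                reqs[out - 1].off + reqs[out - 1].len == reqs[i].off) {
                /* This file starts where the previous read ends, so
                 * extend the previous request instead of adding one. */
                reqs[out - 1].len += reqs[i].len;
            } else {
                reqs[out++] = reqs[i];
            }
        }
        return out;
    }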
A suitable warehouse dæmon and its API are described in
Ken's Archive Reference Manual.
The Dump Process
The dump process starts at the root of the particular filesystem (or a
subdirectory if that's desired) and recursively goes through the
directory structure, building the information for the index file in
memory as it goes. All files found that have a ctime later than
the time of the last lower level dump are sent to the warehouse as
they're encountered. If this is a zero level dump then that will be
all files. Files with an earlier ctime are not dumped, but are
put into the index image with the KDEF_PRIOR flag set in their
kde_flags fields. After the entire filesystem has been
processed, the data stream is closed and the index is flushed out to
the warehouse and optionally to a local disk.
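The per-file decision reduces to comparing the file's ctime against the
timestamp of the last lower level dump, roughly as below; the flag
value and the function name are illustrative.

    #include <sys/stat.h>
    #include <stdint.h>
    #include <time.h>

    #define KDEF_PRIOR 0x0001   /* illustrative value: file is on a parent dump */

    /* Decide whether a file's data goes into this dump's data stream.
     * prev_dump is the timestamp of the last lower level dump, or 0
     * for a level zero dump, so a full dump sends everything. */
    static int classify_file(const struct stat *st, time_t prev_dump,
                             int32_t *kde_flags)
    {
        if (st->st_ctime > prev_dump)
            return 1;                /* dump the file's data now         */
        *kde_flags |= KDEF_PRIOR;    /* index only; data is on a parent dump */
        return 0;
    }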
Files with multiple hard links require special processing. When such a
file is first encountered, its inode number is saved in a temporary
list. When it is encountered a second time the file is not dumped a
second time. Instead, the pointer in the index entry,
kdf_off0,1, for the second instance is merely copied from the
first instance. The index entry for both of these files is given a
special flag indicating that there was more than one hard link
involved. When the restoring process restores these entries, it needs
to remember where the first instance is and then link to it rather than
rereading the data file if both instances happen to be restored. If
only one instance is restored during a partial restore, it won't be
linked to anything. C'est la vie.
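A sketch of that bookkeeping: the first time a multi-link inode is
seen, remember where its data went; on later encounters, reuse that
offset and let the caller flag both entries. The names, the flag and
the fixed-size table are all illustrative.

    #include <stddef.h>
    #include <stdint.h>

    /* Temporary list of multi-link inodes that have already been dumped. */
    typedef struct {
        int32_t inode;   /* inode number of the first instance         */
        int64_t off;     /* data file offset written for that instance */
    } linkent_t;

    static linkent_t seen[4096];
    static size_t    nseen;

    /* If this inode was already dumped, fill *off with the first
     * instance's data offset and return 1 so the caller copies it into
     * kdf_off0,1 instead of dumping the data again.  Otherwise record
     * the inode and the offset just used, and return 0. */
    static int hardlink_lookup(int32_t inode, int64_t data_off, int64_t *off)
    {
        for (size_t i = 0; i < nseen; i++) {
            if (seen[i].inode == inode) {
                *off = seen[i].off;
                return 1;
            }
        }
        if (nseen < sizeof(seen) / sizeof(seen[0]))
            seen[nseen++] = (linkent_t){ inode, data_off };
        return 0;
    }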
The Restore Process
To restore a file or group of files, one must first specify the dump
image to be referenced. That dump image would most likely be the most
recent one created prior to the file being destroyed. If the file is
actually on a previous fuller dump, the restore process will be
able to figure that out. The latest dump image index file has an entry
for all the files that were present at the time of the dump whether
they were dumped or not. The files that appear on parent dumps are
indicated by the KDEF_PRIOR flag in their kde_flags word.
Once the dump image is specified, the individual file or files to be
restored need to be specified. This can be done with a GUI point and
click or a command line interface. The restore process needs to
read the dump image index file to get the directories for the root down
to the individually specified files or directories. Application bits in
the kde_flags word are used to indicate which files need to
be restored and which ones can be ignored. These are temporary flags
that the restore process sets on its memory resident copy of the index
file. Only that portion of the index file necessary to find the selected
files needs to be loaded into memory.
Once all the files to be restored have been selected, the restore
process makes two sequential passes through the index file to get the
individual files that need to be restored. The first pass generates
"position and read" requests that are sent to the warehouse server.
The second pass creates the files and populates them from the data file
stream. The restore utility is designed to allow these two passes to
be done concurrently in separate threads on systems that support
threads.
Going through the index entries sequentially gets the files from the
dump image data file in a mostly sequential manner. If a file with
more than one hard link is dumped from one directory that is not being
restored and then referenced from a directory that is being restored,
the restore process may have to skip back to an earlier point in the
dump image data file to retrieve the file. If the warehouse doesn't
support random access to the data files, the restore process will have
to do a prescan to find all the instances of this in order to access
the data file sequentially. HPSS does allow random access, so the
first cut of the restore process does not implement this prescan for
warehouses that lack random access.
If the specified dump image was an incremental and there are files that
need to be restored from parent dumps, those files need to be found on
the parent dumps. The control time on the dumped file will indicate
which dump image we need to go back to so we may be able to skip
intervening incremental dumps that we know won't have the file.
Unfortunately the file may have had a different name when it was
originally dumped. To find it, the restore process needs to find a
file with a matching inode number. First we look in the last directory
we looked in on any previous search. If an entire directory got
renamed there may be several of these in a row. Next the directory
where the file existed at the time of the later dump is searched
(assuming it exists on the parent dump). If we still haven't found it
we do a sequential recursive search through the whole filesystem. On a
selective restore we may actually end up reading more of the parent
dump image index file than the later one. Using the earliest dump
image possible will be most efficient.
If the warehouse doesn't support access to multiple dump image data
files at the same time (e.g. because of limited staging space or a
limited number of tape drives with no staging space) the restore
process may have to simply mark the files required on parent dumps and
grab them on another pair of passes through the index data.
The Kdump Command
The syntax of the kdump utility is:
Usage: kdump [-options parameters]
-b: Internal buffer size [4194304]
-c: Comment to go with dump [NONE]
-d: Dump dates file [/etc/kdump.dat]
-e: Encryption level [Default]
-f: Index file to create [NONE].
-i: Interactive status mode [YES].
-k: Encryption key [default].
-l: Dump level (0=full, 1=incremental, 2=...) [0].
-m: Mount point of filesystem to dump [NONE].
-n: Name of dump image [host.fs.year.month.day.hour.level].
-p: Port on warehouse server.
-s: Name of the warehouse server.
-u: Update dump dates file [NO].
-v: Verbose mode time interval [10 minutes].
-x: Exception list file [e_list]
- -b buffer_size
The default buffer size for building packets prior to sending
them to the warehouse is 4194304 bytes (4 megabytes). This can be
changed with the -b option. It also determines the maximum
packet size in the final data file image.
- -c comment
Each dump image can have an associated comment as a text string.
The maximum length of a comment depends on various conditions,
but it can always be at least 80 characters long.
- -d /etc/kdump.dat
The /etc/kdump.dat file parallels the /etc/dumpdates file that
the Berkeley dump utility uses. It merely holds the timestamp
for the last dump at a particular level for each mount point
so we can do incremental dumps.
- -e level
The encryption level specified can be some combination of
index, private and public, or one of default, all or none.
Specifying private will cause all the private files (those
without world read permission) to be encrypted in the data file.
Public will encrypt all files whether they're world readable or
not, and index will encrypt the whole index file, preventing any
network sniffers from finding out what files were dumped. If the
keyword default is used or no -e option is specified, kdump will
use the default encryption level as specified in the
/usr/local/lib/karc/conf
file. If there is no encrypt directive in the configuration
file, the default is none.
- -f local_file
If desired, the index file can be written to a file on the local
system. Ken's Archive warehouse can also write the index file
to a local disk that can be NFS exported to trusted clients.
- -i
By default, if the dump is done interactively, the name of the
current directory will be printed out as it is dumped. The -i
option will toggle the display of these messages.
- -k key_name
The named key is used to encrypt files before being sent to the
warehouse. The name of the key is also stored in the index
header so the restore utility knows which key to use when it
extracts the files. The key must be available to both the dumping
and restoring systems. If unspecified, the key named on the
encrypt directive of the karc configuration file is used.
- -l level
A level zero dump is a full dump and dumps everything. A level
one dump dumps everything modified since the timestamp for the
previous level zero dump. A level two dump dumps everything
modified since the last level one dump, etc. The timestamps for
the lower numbered dumps are retrieved from the file specified
with the -d option.
- -m mount_point
A mount point must be specified; the -m flag itself is optional.
- -n name
The name is used to retrieve the dump image later. The default
naming convention allows one to determine all the necessary info
in order to retrieve particular files later on.
- -p port
The port to use on the warehouse server can be specified to override
the default port. This is used as a debugging aid.
- -s host
The name of the server to receive the dump image can be specified
to override the default server.
- -u
By default the dumpdates file will not be updated. Specifying the
-u option will update the file for reference on future higher
level dumps.
- -v minutes
Status messages indicating the number of files and the number of
bytes sent to the warehouse are sent to stderr periodically. The
-v option can be used to alter the default 10 minute period.
- -x file
The exception list specifies those files or directories that should
not be dumped by kdump. Normally this file is kept at the top level
of each filesystem that might contain files that should not be dumped.
The Krest Command
The krest command provides a screen mode curses menu to select
individual files to be retrieved from a dump image. The
syntax of the krest utility is:
Usage: krest [-option parameters] name
-c: Initial current directory [filesystem root].
-d: Index directory [*].
-g: Group [*].
-h: Host [*].
-k: Key [*].
-l: Line mode [no (screen)].
-m: Mount point [NONE].
-o: Overwrite existing files [no].
-u: Alternate user
-v: Verbose mode [no].
Only the name of the most recent dump image needs to be specified. The
krest utility will figure out the parent dump(s) and access them as
needed.
- -c directory
By default the krest utility will start out displaying the
root directory for the dump image. The -c option can
be used to specify a subdirectory of this. The screen mode
version of krest will not access any directories above the
initial directory.
- -d directory
The specified directory is checked first to find the index
files prior to attempting to download them from the warehouse.
- -g group
Specifies the Karc group. See:
Ken's Archive Reference Manual.
- -h host
Specifies the Karc host. The default asterisk uses the
first token (up to a period) of the dump image name.
- -k key
Specifies the Karc key. See:
Ken's Archive Reference Manual.
- -l
By default the krest utility will present menus via the
curses screen mode package. The -l option
can be used to select line mode. In line mode a
subset of the Berkeley restore utility's commands are
supported.
- add file
Add a file or directory to the extraction list.
- cd dir
Change the current working directory.
- extract
Restore the files on the extraction list.
- ls
Print out the contents of the current working directory.
- quit
Exit the utility.
- -m mount_point
The mount point can be specified to produce a warning message if
it doesn't match the mount point recorded in the dump image.
Normally this is not specified; it is instead taken from the
specified dump image and then verified when referencing subsequent
fuller dump images.
- -o
Overwrite existing files. By default the restore will abort
if a file or directory of the same name already exists in the
restore path. The overwrite flag can be used to destroy
conflicting files.
- -u user
The superuser can specify an alternate user on the krest command.
If the user is specified, krest will set the effective UID to that
user. As a result, all extracted files will be owned by that user
and any attempts to create files will be limited to that user's
disk quota.
- -v
The verbose flag causes extra information to be displayed on
output as each file is restored.
The Recover Utility
The krest utility will extract files from a particular dump image. The
problem is figuring out which dump image to access. That's not something
that the normal user is going to be able to figure out without some help.
The recover utility provides that help.
The Bdump Utility takes care of scheduling
and pruning dumps. It also maintains a list of dump images on the
local filesystems that get dumped. This list of dump images is accessed
by the recover utility. Recover will prompt the user through a sequence
of menus to select the appropriate dump image and then invoke
krest on that dump image. For a detailed description of recover, see
The Recover User's Guide.
Ken Lowe
Email -- ken@u.washington.edu
Web -- http://staff.washington.edu/krl/