04/28/99
KDump - Ken's Dump Utility
- What it is
- Dump Image Format
- The Data File
- The Index File
- The Warehouse
- The Dump Process
- The Restore Process
- The Kdump Command
- The Krest Command
- The Recover Utility
What it is
Ken's Dump Utility is designed to overcome some of the shortcomings
associated with the standard Berkeley dump utility as it was used
on the Uniform Access computers. To wit:
- UIDs greater than 65,535 were not supported.
- There was no good way to get an index of what dump images
contained a particular file or files.
- Users could not initiate the restore process themselves.
- A restore of a file near the end of a 9 gigabyte dump image
required reading the entire dump image across the network.
- There was no good way to restore a single directory and its
contents from an incremental dump image. If not all of its files
had changed since the previous lower level dump, you needed to go
to the parent dump(s) as well, but doing so also restored files
that had been deleted between the full and the incremental.
Ken's Dump Utility is not designed to do a system level dump of
the root partition. That should be done with mksysb or the
equivalent. Kdump will work best with user filesystems, particularly
ones that require periodic restores of selected files or directories.
The alternatives:
- IBM's ADSM product
- Rumored to not scale well to 60,000 users.
- ADSM maintains a number of versions of each file rather than
point-in-time snapshots. If a file changes every day, you have
to keep 7 versions of it to be able to restore things to "what
it looked like last week". For source code, you quite often want
to restore things to "what it looked like six months ago" -- you
don't want to keep 180 copies of each file.
- Legato's Networker product
- Rumored to have high per-client costs.
- STK silos and tape drives don't appear to be supported. STK
says yes, but Legato didn't return calls.
Dump Image Format
Each dump image consists of two "files": the data file and the index
file. The index file contains the directory information from the
dumped filesystem, including the basic ownership and permissions of
each object, its size and last modification time, and a pointer into
the data file. The data file contains the actual byte stream of each
file plus any ancillary information such as ACLs.
The index file is generated in memory by the kdump utility as the filesystem
is being dumped and is sent to the storage warehouse after the data file
is fully transferred. The restore process first reads in the index file
and then accesses the necessary portions of the data file. For this reason
the warehouse has to store the index file on a random access file store
or at least on separate tapes. Additionally the index file could be kept
on the local system for ease of access. All integer fields in the index
file are stored in network order using the ntohl() and ntohs() macros, so
dumps written on an architecture of one byte order can be read on an
architecture of the other.
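As an illustration of the byte order handling, a 32-bit field such as a
timestamp would be converted with htonl() on the way into the index and
ntohl() on the way out; the function names below are illustrative, not
taken from the kdump source.

    #include <arpa/inet.h>   /* htonl(), ntohl() */
    #include <stdint.h>
    #include <time.h>

    /* Writing: convert a host-order timestamp to network order before
     * it goes into the index buffer. */
    uint32_t pack_timestamp(time_t t)
    {
        return htonl((uint32_t)t);
    }

    /* Reading: convert back to host order, whichever byte-order
     * architecture produced the dump. */
    time_t unpack_timestamp(uint32_t wire)
    {
        return (time_t)ntohl(wire);
    }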
The Data File
The data file is divided up into packets. Each packet consists of a
header followed by a number of data bytes as specified in the header.
Data Packet: pkt_t structure
Field | Length | Comment |
pkt_type | int 16 | Packet type |
pkt_fmt | int 16 | Compression and encryption method |
pkt_len | int 32 | Length of data field |
Data | variable | File or ACL data |
The packet types currently defined are:
- PK_DATA: Normal data
- PK_EOD: End of data / last data packet
- PK_ACL: ACL data
- PK_SKIP: Skip sparse area
The packet formats currently defined are:
- PK_F_NORMAL: Normal uncompressed/unencrypted data
- PK_F_IBMACL: IBM binary format ACL
- PK_F_LSCCRYPT: Data encrypted with LSC key
Each packet is separately compressed or encrypted based on whatever
method the dumping utility decides would be best. There is no
practical limit on the total size of the data file (all references
use 64 bit pointers -- roughly 9 exabytes with signed offsets), but
each packet can be no larger than 2 gigabytes. A file can span
multiple packets. In practice a packet will likely be limited to a
megabyte or two so that it can fit into a memory buffer on the
dumping system. The restoring system should not count on holding an
entire packet in memory, and should be prepared for the maximum
2 gigabyte packet.
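A minimal C declaration of the packet header described above might look
like the sketch below. The field widths follow the table; the constant
values are assumptions, since the document doesn't give them.

    #include <stdint.h>

    /* Packet types (values are illustrative). */
    enum {
        PK_DATA = 0,   /* normal data               */
        PK_EOD  = 1,   /* end of data / last packet */
        PK_ACL  = 2,   /* ACL data                  */
        PK_SKIP = 3    /* skip sparse area          */
    };

    /* Packet formats (values are illustrative). */
    enum {
        PK_F_NORMAL   = 0,   /* uncompressed/unencrypted    */
        PK_F_IBMACL   = 1,   /* IBM binary format ACL       */
        PK_F_LSCCRYPT = 2    /* data encrypted with LSC key */
    };

    /* Data packet header; the data bytes follow immediately after
     * pkt_len. */
    typedef struct pkt {
        int16_t pkt_type;   /* PK_DATA, PK_EOD, PK_ACL or PK_SKIP     */
        int16_t pkt_fmt;    /* compression and encryption method      */
        int32_t pkt_len;    /* length of the data field, at most 2 GB */
        /* unsigned char data[pkt_len] follows */
    } pkt_t;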
The Index File
The index file starts out with a header:
Index Header: kdh_t structure
Field | Length | Comment |
kdh_version | int 32 | Dump version signature |
kdh_root | kdd_t | Root directory offset and size |
kdh_dtime | int 32 | Timestamp of dump image |
kdh_ptime | int 32 | Timestamp of parent image |
kdh_parent | 80 char | Name of parent dump |
kdh_mount | 80 char | Original mount point |
kdh_key | 32 char | Encryption key name |
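As a sketch, the header and the embedded root-directory record could be
declared as below. The kdd_t record is described with the other entry
types further on; the "0,1" suffix used in the tables is read here as a
pair of 32-bit words holding the halves of a 64-bit value, which is an
assumption on my part.

    #include <stdint.h>

    /* Directory record (kdd_t), embedded in the header for the root
     * directory and used for KDET_DIR entries below. */
    typedef struct kdd {
        int32_t kdd_aoff0;   /* data file offset to ACL, high word */
        int32_t kdd_aoff1;   /* data file offset to ACL, low word  */
        int32_t kdd_off;     /* index file offset to the directory */
        int32_t kdd_len;     /* length of the directory            */
    } kdd_t;

    /* Index file header; all integer fields in network byte order. */
    typedef struct kdh {
        int32_t kdh_version;     /* dump version signature         */
        kdd_t   kdh_root;        /* root directory offset and size */
        int32_t kdh_dtime;       /* timestamp of this dump image   */
        int32_t kdh_ptime;       /* timestamp of the parent image  */
        char    kdh_parent[80];  /* name of the parent dump        */
        char    kdh_mount[80];   /* original mount point           */
        char    kdh_key[32];     /* encryption key name            */
    } kdh_t;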
The header is followed by a sequence of directories. The root directory
comes first, followed by the rest of the directories in depth-first order.
Each directory consists of a sequence of directory entries followed by
a variable length data block. The data block contains the filenames
along with other ancillary information.
Directory Entry: kde_t structure
Field | Length | Comment |
kde_type | int 16 | Entry type |
kde_len | int 16 | Length of entry |
kde_name | int 32 | Offset (from top of directory) to name |
kde_owner | int 32 | Owner UID |
kde_group | int 32 | Owner GID |
kde_mode | int 32 | File protection modes |
kde_mtime | int 32 | File modification time |
kde_ctime | int 32 | File control time |
kde_flags | int 32 | Sundry and various flags |
kde_data | variable | Variable length data field |
The kde_data field depends on the basic type of the entry. It is one of the
following:
Normal Directory (KDET_DIR): kdd_t structure
Field | Length | Comment |
kdd_aoff0,1 | int 64 | Data file offset to ACL |
kdd_off | int 32 | Index file offset to directory |
kdd_len | int 32 | Length of directory |
Normal File (KDET_FILE): kdf_t structure
Field | Length | Comment |
kdf_inode | int 32 | Inode number |
kdf_len0,1 | int 64 | Length of file |
kdf_off0,1 | int 64 | Offset to data |
Symbolic Link (KDET_SLINK): kds_t structure
Field | Length | Comment |
kds_len | int 32 | Length of symbolic link text |
kds_off0,1 | int 64 | Offset to data |
The last directory entry is followed by a terminator record with a kde_type
field value of KDET_LAST. This terminator record is only 4 bytes long, just
enough to hold the kde_type field, and is followed by the file names as a
sequence of zero-byte-terminated character strings.
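Putting the entry layout together, a reader of the index might walk one
directory block along the lines of the sketch below: advance from entry
to entry by kde_len, stop at the KDET_LAST terminator, and find each
name at kde_name bytes from the top of the directory. The structure
follows the tables above, but the constant values, the flag bit and the
byte order handling are assumptions.

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Entry types and flag bits; the actual values are assumptions. */
    enum { KDET_LAST = 0, KDET_DIR = 1, KDET_FILE = 2, KDET_SLINK = 3 };
    #define KDEF_PRIOR 0x0001   /* file lives on a parent (lower level) dump */

    /* Fixed part of a directory entry; the variable kde_data field
     * follows it. */
    typedef struct kde {
        int16_t kde_type;    /* KDET_DIR, KDET_FILE, KDET_SLINK, KDET_LAST */
        int16_t kde_len;     /* length of this entry                       */
        int32_t kde_name;    /* offset from top of directory to the name   */
        int32_t kde_owner;   /* owner UID                                  */
        int32_t kde_group;   /* owner GID                                  */
        int32_t kde_mode;    /* file protection modes                      */
        int32_t kde_mtime;   /* file modification time                     */
        int32_t kde_ctime;   /* file control time                          */
        int32_t kde_flags;   /* sundry and various flags                   */
    } kde_t;

    /* List the entries of one in-memory directory block.  "dir" points
     * at the first kde_t; kde_name offsets are relative to the top of
     * the directory, so the names block is addressed from "dir" too. */
    static void list_directory(const unsigned char *dir)
    {
        const unsigned char *p = dir;

        for (;;) {
            const kde_t *e = (const kde_t *)(const void *)p;

            if (ntohs((uint16_t)e->kde_type) == KDET_LAST)
                break;                       /* 4-byte terminator record */

            const char *name = (const char *)dir + ntohl((uint32_t)e->kde_name);
            int prior = ntohl((uint32_t)e->kde_flags) & KDEF_PRIOR;

            printf("%s%s\n", name, prior ? " (on a parent dump)" : "");

            p += ntohs((uint16_t)e->kde_len);   /* next entry */
        }
    }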
The Warehouse
All the dump images are piped across the network to the warehouse
dæmon and stored on whatever media is appropriate. The warehouse
dæmon supports several basic functions:
- Create new image
- Access existing image
- Open data stream
- Open index stream
- Position and read N bytes
- Close current stream
- Commit image
The dump process creates a new image, opens the data stream, dumps the
data file information, closes the data stream, opens the index stream,
dumps the index file information, closes the index stream and then commits
the image.
The restore process accesses the image, opens the index stream, reads
a hunk of the index, presents a list of files to the operator to
select, reads additional hunks of the index as required to fetch the
entries that the operator references, closes the index stream, opens
the data stream, positions and reads each of the selected files and
then closes the data stream. Multiple adjacent selected files are
retrieved with a single position request by specifying a length of the
read that encompasses more than one file. Position and read requests
will be queued up by the warehouse dæmon so they can all be sent
prior to retrieving any of the actual data in order to better utilize
the intervening network.
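The single position request that covers several adjacent files amounts
to coalescing the selected files' (offset, length) pairs, roughly as in
the sketch below; the type and function names are illustrative, not
from the krest source.

    #include <stddef.h>
    #include <stdint.h>

    /* One "position and read" request against the dump image data file. */
    typedef struct {
        int64_t off;   /* offset into the data file */
        int64_t len;   /* number of bytes to read   */
    } read_req_t;

    /* Merge requests for adjacent regions of the data file.  "reqs"
     * must already be sorted by offset, which index order gives for a
     * mostly sequential dump.  Returns the number of requests left. */
    size_t coalesce_reads(read_req_t *reqs, size_t n)
    {
        size_t out = 0;

        for (size_t i = 0; i < n; i++) {
            if (out > 0 &&
                reqs[out - 1].off + reqs[out - 1].len == reqs[i].off) {
                /* This file starts where the previous read ends, so
                 * extend the previous request instead of adding one. */
                reqs[out - 1].len += reqs[i].len;
            } else {
                reqs[out++] = reqs[i];
            }
        }
        return out;
    }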
A suitable warehouse dæmon and its API are described in
Ken's Archive Reference Manual.
The Dump Process
The dump process starts at the root of the particular filesystem (or a
subdirectory if that's desired) and recursively goes through the
directory structure, building the information for the index file in
memory as it goes. All files found that have a ctime later than
the time of the last lower level dump are sent to the warehouse as
they're encountered. If this is a zero level dump then that will be
all files. Files with an earlier ctime are not dumped, but are
put into the index image with the KDEF_PRIOR flag set in their
kde_flags fields. After the entire filesystem has been
processed, the data stream is closed and the index is flushed out to
the warehouse and optionally to a local disk.
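The per-file decision reduces to comparing the file's ctime against the
timestamp of the last lower level dump, roughly as below; the flag
value and the function name are illustrative.

    #include <sys/stat.h>
    #include <stdint.h>
    #include <time.h>

    #define KDEF_PRIOR 0x0001   /* illustrative value: file is on a parent dump */

    /* Decide whether a file's data goes into this dump's data stream.
     * prev_dump is the timestamp of the last lower level dump, or 0
     * for a level zero dump, so a full dump sends everything. */
    static int classify_file(const struct stat *st, time_t prev_dump,
                             int32_t *kde_flags)
    {
        if (st->st_ctime > prev_dump)
            return 1;                /* dump the file's data now         */
        *kde_flags |= KDEF_PRIOR;    /* index only; data is on a parent dump */
        return 0;
    }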
Files with multiple hard links require special processing. When such a
file is first encountered, its inode number is saved in a temporary
list. When it is encountered a second time the file is not dumped a
second time. Instead, the pointer in the index entry,
kdf_off0,1, for the second instance is merely copied from the
first instance. The index entry for both of these files is given a
special flag indicating that there was more than one hard link
involved. When the restoring process restores these entries, it needs
to remember where the first instance is and then link to it rather than
rereading the data file if both instances happen to be restored. If
only one instance is restored during a partial restore, it won't be
linked to anything. C'est la vie.
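A sketch of that bookkeeping: the first time a multi-link inode is
seen, remember where its data went; on later encounters, reuse that
offset and let the caller flag both entries. The names, the flag and
the fixed-size table are all illustrative.

    #include <stddef.h>
    #include <stdint.h>

    /* Temporary list of multi-link inodes that have already been dumped. */
    typedef struct {
        int32_t inode;   /* inode number of the first instance         */
        int64_t off;     /* data file offset written for that instance */
    } linkent_t;

    static linkent_t seen[4096];
    static size_t    nseen;

    /* If this inode was already dumped, fill *off with the first
     * instance's data offset and return 1 so the caller copies it into
     * kdf_off0,1 instead of dumping the data again.  Otherwise record
     * the inode and the offset just used, and return 0. */
    static int hardlink_lookup(int32_t inode, int64_t data_off, int64_t *off)
    {
        for (size_t i = 0; i < nseen; i++) {
            if (seen[i].inode == inode) {
                *off = seen[i].off;
                return 1;
            }
        }
        if (nseen < sizeof(seen) / sizeof(seen[0]))
            seen[nseen++] = (linkent_t){ inode, data_off };
        return 0;
    }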
The Restore Process
To restore a file or group of files, one must first specify the dump
image to be referenced. That dump image would most likely be the most
recent one created prior to the file being destroyed. If the file is
actually on a previous fuller dump, the restore process will be
able to figure that out. The latest dump image index file has an entry
for all the files that were present at the time of the dump whether
they were dumped or not. The files that appear on parent dumps are
indicated by the KDEF_PRIOR flag in their kde_flags word.
Once the dump image is specified, the individual file or files to be
restored need to be specified. This can be done with a GUI point and
click or a command line interface. The restore process needs to
read the dump image index file to get the directories for the root down
to the individually specified files or directories. Application bits in
the kde_flags word are used to indicate which files need to
be restored and which ones can be ignored. These are temporary flags
that the restore process sets on its memory resident copy of the index
file. Only that portion of the index file necessary to find the selected
files needs to be loaded into memory.
Once all the files to be restored have been selected, the restore
process makes two sequential passes through the index file to get the
individual files that need to be restored. The first pass generates
"position and read" requests that are sent to the warehouse server.
The second pass creates the files and populates them from the data file
stream. The restore utility is designed to allow these two passes to
be done concurrently in separate threads on systems that support
threads.
Going through the index entries sequentially gets the files from the
dump image data file in a mostly sequential manner. If a file with
more than one hard link is dumped from one directory that is not being
restored and then referenced from a directory that is being restored,
the restore process may have to skip back to an earlier point in the
dump image data file to retrieve the file. If the warehouse doesn't
support random access to the data files, the restore process will have
to do a prescan to find all the instances of this in order to access
the data file sequentially. HPSS does allow random access, so the
first cut of the restore process does not implement this prescan for
warehouses that lack random access.
If the specified dump image was an incremental and there are files that
need to be restored from parent dumps, those files need to be found on
the parent dumps. The control time on the dumped file will indicate
which dump image we need to go back to so we may be able to skip
intervening incremental dumps that we know won't have the file.
Unfortunately the file may have had a different name when it was
originally dumped. To find it, the restore process needs to find a
file with a matching inode number. First we look in the last directory
we looked in on any previous search. If an entire directory got
renamed there may be several of these in a row. Next the directory
where the file existed at the time of the later dump is searched
(assuming it exists on the parent dump). If we still haven't found it
we do a sequential recursive search through the whole filesystem. On a
selective restore we may actually end up reading more of the parent
dump image index file than the later one. Using the earliest dump
image possible will be most efficient.
If the warehouse doesn't support access to multiple dump image data
files at the same time (e.g. because of limited staging space or a
limited number of tape drives with no staging space) the restore
process may have to simply mark the files required on parent dumps and
grab them on another pair of passes through the index data.
The Kdump Command
The syntax of the kdump utility is:
Usage: kdump [-options parameters]
-b: Internal buffer size [4194304]
-c: Comment to go with dump [NONE]
-d: Dump dates file [/etc/kdump.dat]
-e: Encryption level [Default]
-f: Index file to create [NONE].
-i: Interactive status mode [YES].
-k: Encryption key [default].
-l: Dump level (0=full, 1=incremental, 2=...) [0].
-m: Mount point of filesystem to dump [NONE].
-n: Name of dump image [host.fs.year.month.day.hour.level].
-p: Port on warehouse server.
-s: Name of the warehouse server.
-u: Update dump dates file [NO].
-v: Verbose mode time interval [10 minutes].
-x: Exception list file [e_list]
- -b buffer_size
The default buffer size for building packets prior to sending
them to the warehouse is 4194304 bytes (4 megabytes). This can be
changed with the -b option. It also determines the maximum
packet size in the final data file image.
- -c comment
Each dump image can have an associated comment as a text string.
The maximum length of a comment depends on various conditions,
but it can always be at least 80 characters long.
- -d /etc/kdump.dat
The /etc/kdump.dat file parallels the /etc/dumpdates file that
the Berkeley dump utility uses. It merely holds the timestamp
for the last dump at a particular level for each mount point
so we can do incremental dumps.
- -e level
The encryption level specified can be some combination of
index, private and public, or one of default, all or none.
Specifying private will cause all the private files (those
without world read permission) to be encrypted in the data file.
Public will encrypt all files whether they're world readable or
not, and index will encrypt the whole index file, preventing any
network sniffers from finding out what files were dumped. If the
keyword default is used or no -e option is specified, kdump will
use the default encryption level as specified in the
/usr/local/lib/karc/conf
file. If there is no encrypt directive in the configuration
file, the default is none.
- -f local_file
If desired, the index file can be written to a file on the local
system. Ken's Archive warehouse can also write the index file
to a local disk that can be NFS exported to trusted clients.
- -i
By default, if the dump is done interactively, the name of the
current directory will be printed out as it is dumped. The -i
option will toggle the display of these messages.
- -k key_name
The named key is used to encrypt files before being sent to the
warehouse. The name of the key is also stored in the index
header so the restore utility knows which key to use when it
extracts the files. The key must be available to both the dumping
and restoring systems. If unspecified, the key named on the
encrypt directive of the karc configuration file is used.
- -l level
A level zero dump is a full dump and dumps everything. A level
one dump dumps everything modified since the timestamp for the
previous level zero dump. A level two dump dumps everything
modified since the last level one dump, etc. The timestamps for
the lower numbered dumps are retrieved from the file specified
with the -d option.
- -m mount_point
A mount point must be specified; the -m flag itself is optional.
- -n name
The name is used to retrieve the dump image later. The default
naming convention allows one to determine all the necessary info
in order to retrieve particular files later on.
- -p port
The port to use on the warehouse server can be specified to override
the default port. This is used as a debugging aid.
- -s host
The name of the server to receive the dump image can be specified
to override the default server.
- -u
By default the dumpdates file will not be updated. Specifying the
-u option will update the file for reference on future higher
level dumps.
- -v minutes
Status messages indicating the number of files and the number of
bytes sent to the warehouse are sent to stderr periodically. The
-v option can be used to alter the default 10 minute period.
- -x file
The exception list specifies those files or directories that should
not be dumped by kdump. Normally this file is kept at the top level
of each filesystem that might contain files that should not be dumped.
The Krest Command
The krest command provides a screen mode curses menu to select
individual files to be retrieved from a dump image. The
syntax of the krest utility is:
Usage: krest [-option parameters] name
-c: Initial current directory [filesystem root].
-d: Index directory [*].
-g: Group [*].
-h: Host [*].
-k: Key [*].
-l: Line mode [no (screen)].
-m: Mount point [NONE].
-o: Overwrite existing files [no].
-u: Alternate user
-v: Verbose mode [no].
Only the name of the most recent dump image needs to be specified. The
krest utility will figure out the parent dump(s) and access them as
needed.
- -c directory
By default the krest utility will start out displaying the
root directory for the dump image. The -c option can
be used to specify a subdirectory of this. The screen mode
version of krest will not access any directories above the
initial directory.
- -d directory
The specified directory is checked first to find the index
files prior to attempting to download them from the warehouse.
- -g group
Specifies the Karc group. See:
Ken's Archive Reference Manual.
- -h host
Specifies the Karc host. The default asterisk uses the
first token (up to a period) of the dump image name.
- -k key
Specifies the Karc key. See:
Ken's Archive Reference Manual.
- -l
By default the krest utility will present menus via the
curses screen mode package. The -l option
can be used to select line mode. In line mode a
subset of the Berkeley restore utility's commands are
supported.
- add file
Add a file or directory to the extraction list.
- cd dir
Change the current working directory.
- extract
Restore the files on the extraction list.
- ls
Print out the contents of the current working directory.
- quit
Exit the utility.
- -m mount_point
The mount point can be specified to produce a warning message if
it doesn't match the mount point recorded in the dump image.
Normally this is not specified; it is instead taken from the
specified dump image and then verified when referencing subsequent
fuller dump images.
- -o
Overwrite existing files. By default the restore will abort
if a file or directory of the same name already exists in the
restore path. The overwrite flag can be used to destroy
conflicting files.
- -u user
The superuser can specify an alternate user on the krest command.
If the user is specified, krest will set the effective UID to that
user. As a result, all extracted files will be owned by that user
and any attempts to create files will be limited to that user's
disk quota.
- -v
The verbose flag causes extra information to be displayed on
output as each file is restored.
The Recover Utility
The krest utility will extract files from a particular dump image. The
problem is figuring out which dump image to access. That's not something
that the normal user is going to be able to figure out without some help.
The recover utility provides that help.
The Bdump Utility takes care of scheduling
and pruning dumps. It also maintains a list of dump images on the
local filesystems that get dumped. This list of dump images is accessed
by the recover utility. Recover will prompt the user through a sequence
of menus to select the appropriate dump image and then invoke
krest on that dump image. For a detailed description of recover, see
The Recover User's Guide.
Ken Lowe
Email -- ken@u.washington.edu
Web -- http://staff.washington.edu/krl/