The underlying mechanism is a general-purpose, fault-tolerant transaction engine. This is the same basic engine that is used to support our passwd file synchronization and email server aliasing.
The actual database is a simple DBM database with the name acting as the key and the value acting as the content. This is backed up by the transaction logs that the transaction engine maintains. The transaction logs can be replayed at any time to rebuild the database.
knsX_group_member
which encodes the following information:
0: | Level zero is unprivileged and can only access unprivileged commands.
2: | Level 2 is required to retrieve values.
3: | Level 3 is required to change values.
4: | Level 4 is required to delete existing names.
5: | Level 5 is required to create new names.
A: | Level 10 is required to checkpoint the database.
C: | Level 12 is required to checksum the database.
F: | The master servers all run at level 15.
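The level table above amounts to a lookup from a class of command to a minimum level. The following is a minimal sketch in C, assuming that the X in a name of the form knsX_group_member is a single hexadecimal digit that directly follows the "kns" prefix and is followed by an underscore (the A, C and F entries suggest hexadecimal). The enum and both function names are hypothetical and are not part of KNS.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical operation classes; the values are the minimum levels
 * listed in the table. */
enum kns_op {
    OP_UNPRIV     = 0,
    OP_GET        = 2,
    OP_SET        = 3,
    OP_DELETE     = 4,
    OP_CREATE     = 5,
    OP_CHECKPOINT = 10,
    OP_CHECKSUM   = 12
};

/* Extract the level from a name such as "kns3_group_member".
 * Assumes the level is the single hex digit after "kns", followed by '_'.
 * Returns -1 if the name does not have that shape. */
int level_from_group(const char *name)
{
    unsigned char c;

    if (name == NULL || strncmp(name, "kns", 3) != 0)
        return -1;
    c = (unsigned char)name[3];
    if (!isxdigit(c) || name[4] != '_')
        return -1;
    if (isdigit(c))
        return c - '0';
    return toupper(c) - 'A' + 10;
}

/* Nonzero if a caller at the given level may perform the operation. */
int level_allows(int level, enum kns_op op)
{
    return level >= (int)op;
}
```

With this sketch, level_from_group("knsA_group_member") yields 10, which level_allows accepts for OP_CHECKPOINT but not for OP_CHECKSUM.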
Clients that want to update the database connect to the master KNS server. Each incoming command that affects the database is first validated as a legitimate request and then assigned the next transaction number by the master. The transaction is passed to each of the chiefs, then processed by the master itself, and then written to the master's log. Each chief (and slave) in turn passes the command to its own slaves, processes it, and logs it. If the master encounters an error, such as a full disk, after the transaction number has been assigned and passed to one or more of the chiefs, the master will bail out. One of the chiefs that received the command will then hold the highest sequence number, and that chief becomes the new master. When the failed server has recovered and reconnects to the new master as a chief, the new master passes the failed transaction back to it, and the server attempts to reprocess it.
When a server comes up for the first time, or after being down for a while, it reads its configuration file to get the list of potential master servers and its log file to determine its last sequence number. It then sends an "I am here" datagram to all the known servers, asking for a suitable server to connect to. If it gets no answer and it is a potential master itself, it increments a counter and sends its query again. If, while waiting for a response, it receives an "I am here" query from another server with a higher sequence number, it resets its counter back down. Eventually one server will increment its counter to three and declare itself the master.
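The counter-based election described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the struct, its field names and both function names are hypothetical, and "reset its counter back down" is taken to mean a reset to zero, which the text does not spell out.

```c
#include <stdbool.h>

#define ELECTION_THRESHOLD 3    /* from the text: the counter reaches three */

struct election {
    unsigned long last_seq;     /* last sequence number found in our log */
    bool potential_master;      /* listed as a potential master in the config */
    int  counter;               /* "I am here" rounds that got no answer */
    bool is_master;
};

/* Called when a query round ends without any reply.
 * Returns true once this server has declared itself master. */
bool election_no_answer(struct election *e)
{
    if (!e->potential_master || e->is_master)
        return e->is_master;
    if (++e->counter >= ELECTION_THRESHOLD)
        e->is_master = true;
    return e->is_master;
}

/* Called when another server's "I am here" query arrives while waiting. */
void election_peer_query(struct election *e, unsigned long peer_seq)
{
    /* Assumption: a peer that is further ahead resets us to zero. */
    if (!e->is_master && peer_seq > e->last_seq)
        e->counter = 0;
}
```

The effect is that a server that keeps hearing from a peer with a higher sequence number never reaches the threshold, so the most up-to-date potential master tends to win.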
When the master server receives an "I am here" query, it replies with an "Attach to me" response unless it already has too many clients, in which case it passes the query down to one of its subordinate chiefs or slaves to respond. When the server connects, it requests all transactions since its last transaction. Each transaction specifies the ordinal of the server that originally made the transaction and the server that made the previous transaction. If a transaction is out of sequence or has a mismatching previous-transaction server, the receiving server goes into limbo, not accepting any new clients and not processing any new transactions, until things can be corrected by hand. This ensures that no transactions are missed and that all the connected servers are on the same page.
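The consistency check on incoming transactions can be sketched as follows. The struct layouts and names are hypothetical, and the sketch assumes that transaction numbers increase by exactly one; the actual wire and log formats are not described here.

```c
#include <stdbool.h>

struct txn {
    unsigned long seq;      /* transaction number assigned by the master */
    int origin;             /* ordinal of the server that made this transaction */
    int prev_origin;        /* ordinal of the server that made the previous one */
};

struct replica {
    unsigned long last_seq; /* last transaction applied locally */
    int last_origin;        /* ordinal of the server that made it */
    bool limbo;             /* set once a mismatch has been seen */
};

/* Apply one transaction if it extends the local history exactly.
 * On any mismatch the replica enters limbo and refuses further work. */
bool replica_apply(struct replica *r, const struct txn *t)
{
    if (r->limbo)
        return false;
    if (t->seq != r->last_seq + 1 || t->prev_origin != r->last_origin) {
        r->limbo = true;    /* stays set until corrected by hand */
        return false;
    }
    r->last_seq = t->seq;
    r->last_origin = t->origin;
    return true;
}
```

Checking the previous originator as well as the sequence number is what lets a server notice that two histories with the same numbering have diverged.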
If the network becomes partitioned, multiple servers may declare themselves master and continue to process new transactions. If the master server does not have all of its chiefs connected, it will periodically send out "Hey, are you there" queries to the missing server(s). When the network reattaches itself, the two master servers will end up sending each other such a message. The one that processed fewer transactions will yield the master status and attempt to connect to the other master. It may decide to put itself into limbo if the transaction history doesn't match up properly. In this case, the operator will be notified and someone will have to come in and do damage control by hand. The likely scenario is to cut the limboed server's log off at the discrepancy, get it connected, and then reapply the transactions that followed that point. As this situation seldom, if ever, comes up, there is no automated procedure for recovering from it.
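The yield rule for two masters that rediscover each other can be written as a single comparison. The names below are hypothetical, and the text does not say what happens when both sides processed the same number of transactions, so the sketch reports that case separately rather than guessing a tie-break.

```c
enum merge_action {
    MERGE_KEEP_MASTER,   /* we processed more: stay master */
    MERGE_YIELD,         /* we processed fewer: connect to the other master */
    MERGE_UNSPECIFIED    /* equal counts: not covered by the description */
};

/* Decide what this master does on hearing from another master. */
enum merge_action merge_decide(unsigned long my_count, unsigned long peer_count)
{
    if (my_count < peer_count)
        return MERGE_YIELD;
    if (my_count > peer_count)
        return MERGE_KEEP_MASTER;
    return MERGE_UNSPECIFIED;
}
```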
base file and start a new log file. To rebuild the database, a server will replay its base file followed by the current log file. The checkpoint shortens the process by allowing the server to skip all the changes that were superseded by subsequent changes.
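The rebuild step, replaying the base file and then the current log, can be illustrated with an in-memory table. This is a sketch only: the record layout, the 'S' and 'D' operation codes, the fixed sizes and all names are hypothetical, and a real server would read records from its files and store them through DBM.

```c
#include <string.h>

#define MAX_NAMES 64
#define MAX_LEN   32

/* Hypothetical record: op is 'S' (set a value) or 'D' (delete a name). */
struct record { char op; char key[MAX_LEN]; char val[MAX_LEN]; };

struct db {
    int  n;
    char key[MAX_NAMES][MAX_LEN];
    char val[MAX_NAMES][MAX_LEN];
};

static int db_find(const struct db *d, const char *key)
{
    for (int i = 0; i < d->n; i++)
        if (strcmp(d->key[i], key) == 0)
            return i;
    return -1;
}

/* Return the stored value, or NULL if the name does not exist. */
const char *db_get(const struct db *d, const char *key)
{
    int i = db_find(d, key);
    return i < 0 ? NULL : d->val[i];
}

static void db_apply(struct db *d, const struct record *r)
{
    int i = db_find(d, r->key);

    if (r->op == 'D') {
        if (i >= 0) {                 /* move the last entry into the hole */
            d->n--;
            if (i != d->n) {
                strcpy(d->key[i], d->key[d->n]);
                strcpy(d->val[i], d->val[d->n]);
            }
        }
        return;
    }
    if (i < 0) {
        if (d->n == MAX_NAMES)
            return;                   /* sketch: drop the record when full */
        i = d->n++;
        strcpy(d->key[i], r->key);
    }
    strcpy(d->val[i], r->val);        /* later records supersede earlier ones */
}

/* Rebuild from scratch: base records first, then the current log. */
void db_rebuild(struct db *d,
                const struct record *base, int nbase,
                const struct record *log, int nlog)
{
    d->n = 0;
    for (int i = 0; i < nbase; i++)
        db_apply(d, &base[i]);
    for (int i = 0; i < nlog; i++)
        db_apply(d, &log[i]);
}
```

Because only the last write to each name survives a replay, a checkpoint that writes out the current state as the new base loses nothing and makes the next rebuild shorter.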
kns -c 'command'
The specified command is sent to the server. If no -c option is specified, or a single dash is specified for the command, all lines on stdin are processed as separate commands until an error, end of file, or an end/exit/quit command is encountered.
extern char *kns_errmsg;
extern int kns_errno;
void kns_close(void);
int kns_cmd(char *msg);
int kns_conf(char *dir);
void kns_deinit(void);
int kns_init(char *dir);
int kns_open(void);
int kns_rc(int rc);
Once initialized, the kns_cmd routine is called with an ASCII command to be processed by the server. kns_cmd will call kns_open if necessary to get connected to the current master server. The return value from kns_cmd indicates success or failure:
KNS_RETRY | -2 | Didn't work, but may later
KNS_ERROR | -1 | Unsuccessful return value
KNS_OKAY | 0 | Successful return value
KNS_MORE | 1 | More data available
In addition, the kns_errno variable will contain an extended error status as defined by the KNS_E_xxx constants in the kns.h include file, and kns_errmsg will point to a suitable message to be displayed as part of an error message as well as any response from the server. Note that these two global variables (and others) mean that the KNS API is not officially "thread safe". Only one thread, normally the main thread, should be making calls to the KNS library.
The kns_deinit routine can be called followed by a second call to kns_init to force a rereading of the configuration for processes that may continue to run for extended periods during which the configuration file is expected to change. The kns_close routine is called internally by kns_deinit and is not expected to be referenced by client applications directly.
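A client's use of these routines might look like the sketch below. The return-code values are taken from the table above; everything else is an assumption made for illustration. In particular, the stub definitions exist only so that the sketch compiles without the KNS library (a real client would include kns.h and link against the library instead), kns_init is assumed to return KNS_OKAY on success, and run_one is a hypothetical helper name. The command text and configuration directory are left to the caller, since neither is specified here.

```c
#include <stdio.h>

/* Values from the return-code table; the real definitions are in kns.h. */
#define KNS_RETRY (-2)
#define KNS_ERROR (-1)
#define KNS_OKAY  0
#define KNS_MORE  1

/* Stand-in stubs, NOT the real library: they only let the sketch compile. */
char *kns_errmsg = "";
int   kns_errno  = 0;
int   kns_init(char *dir) { (void)dir; return KNS_OKAY; }
void  kns_deinit(void) { }
int   kns_cmd(char *msg) { (void)msg; kns_errmsg = "stub reply"; return KNS_OKAY; }

/* Hypothetical helper: initialize, send one command, report, clean up. */
int run_one(char *dir, char *cmd)
{
    int rc;

    /* Assumption: kns_init returns KNS_OKAY on success. */
    if (kns_init(dir) != KNS_OKAY) {
        fprintf(stderr, "kns_init failed (%d): %s\n", kns_errno, kns_errmsg);
        return KNS_ERROR;
    }

    rc = kns_cmd(cmd);          /* connects to the master if necessary */
    switch (rc) {
    case KNS_OKAY:
    case KNS_MORE:
        /* kns_errmsg also carries any response from the server. */
        printf("reply: %s\n", kns_errmsg);
        break;
    case KNS_RETRY:
        fprintf(stderr, "try again later: %s\n", kns_errmsg);
        break;
    default:
        fprintf(stderr, "failed (%d): %s\n", kns_errno, kns_errmsg);
        break;
    }

    kns_deinit();               /* closes the connection as well */
    return rc;
}
```

Because the library reports status through global variables, a helper like this should be called from one thread only.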
The kns_rc routine is used internally to set the kns_errno and kns_errmsg variables and return the actual reply code. It also throttles requests that result in KNS_RETRY responses down to one every 10 seconds to reduce the overhead of processes spinning their wheels. This routine is not intended to be called directly by the client.
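How kns_rc implements its throttle is not described, but one way to get "one every 10 seconds" is to compute how long to wait from the time of the previous KNS_RETRY. The function below is a hypothetical sketch of that calculation, not the library's code; the caller would sleep for the returned number of seconds before returning KNS_RETRY.

```c
#include <time.h>

#define RETRY_INTERVAL 10   /* seconds between KNS_RETRY results, from the text */

/* Seconds still to wait so that two retries are at least
 * RETRY_INTERVAL seconds apart. Returns 0 if enough time has passed
 * or if the clock appears to have gone backwards. */
unsigned retry_delay(time_t last_retry, time_t now)
{
    double elapsed = difftime(now, last_retry);

    if (elapsed < 0 || elapsed >= RETRY_INTERVAL)
        return 0;
    return (unsigned)(RETRY_INTERVAL - elapsed);
}
```

Keeping the calculation separate from the actual sleep makes it easy to check without waiting on a real clock.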