The Foraging Search Algorithm

Background

We assume an agent searching a vast, possibly infinite, space in which objects are distributed by some unknown dynamic process.  An agent is capable of observing a local environment which contains a set of hetrogenous objects and a set of paths to some next local environments.  Objects are recognized by sets of features that distinuish types.  There are four basic categories of objects with respect to an agent (as in how they effect the agent).  The first category contains objects which are critical to the well-being of the agent, e.g., food or water to an animal.  These are called resources.  The second category contain objects which may threaten the well-being of the agent or its success in finding a resource.  These are called threats.  Together, the first and second categories contain objects which are inherently recognized and trigger affective states within the agent (approach or flee).  The third category contains objects which, while not actual resources (or threats) in themselves, are, in some greater scheme, causally linked with resource (or threat) objects such that the occurance of such an object is typically encountered just prior to encounter with a resource (or threat).  That is to say, in the normal activities of the agent, the encounter with a 'cue' object predicts the near-future encounter with an affective object.  These cue objects have a semantic relationship with such objects.  All other objects are neutral.

Initially an agent knows of only resources, threats and neutral objects.  Resources are assumed to be sparsely distributed in the set of local environments.  Furthermore, object content and path structures are assumed to be dynamically altered on many different time scales.  The agent effects this dynamic in that it consumes resources it finds (the resource may be regenerated at a later time).  Thus, the agent will always be in a state of ignorance with respect to the actual location and timing of resources.  However, we also assume that the dynamics of the space operate in accordance with law-like or propensities of nature such that there is some consistency to the operation of the world of the agent.  Therefore, an agent that can learn the causal relations between initially neutral objects and resource or threat objects can exploit the occurance of the cues to find and consume the resources and avoid threats.  An agent that can learn a chain of causal relations can effect a more directed search and improve its chances of finding resources.

Foraging search is an attempt to emulate animals that solve the above problem in hunting for food, water, mates and other resources.  The algorithm depends on the agent's ability to partition the set of neutral objects into the cue objects and still neutral objects and further partition the former into resource and threat cues.  One must assume that there are sufficiently many objects that are causally related to the resources such that the agent which successfully learns, even a subset of these objects, improves its chances of success in finding resources.

Algorithm
In the below, typical keywords (IF, ELSE, etc.) are set in all caps.  Functions are given by identifiers with initial capital letters.  And variables (broadly interpreted) are in lowercase.

Forage
DO forever
   WHILE hunger > threshold
      WHILE no cues or resources or threats in local environment
         Wander
         select path based on direction generated from Wander
         Decay cue memory traces
         Record local environment - path-associated objects - in short-term trace
         proceed to next local environment
      ENDWHILE no cues
      IF cue = threat-associated  OR threat detected
         TurnAndRun
      ELSE IF cue = resource-associated
         select path leading to cue // or suggested by location of cue
         Record local environment - path associated objects - in short-term trace
      ELSE IF resource found
         Consume resource
         Learn cue chain // reinforce intermediate-term memory
      ELSE
         Decay cue memory traces
      ENDELSE
   ENDWHILE hungry
   Follow pheromone trail home
   IF home Learn cue chains // reinforce long-term memory
   Conduct other activities // internal loop
ENDDO forever

This algorithm assumes that a mission is to be carried out iterratively.  That is, a search for resources must be conducted periodically, conditioned by an internal motivational variable, here indicated as hunger.  It is assumed that consumption of the resource will lower hunger and that the extent of lowering is coupled with some quality parameter of the resourse.

The algorithm is an infinite loop.  As the agent conducts other activites it depletes its stores of energy (or some analogous variable level).  When energy falls (hunger rises) below (above) some threshold, the agent is motivated to leave its home and start searching for resources.

It enters the third loop and stays there as long as it does not detect a cue object in its local environment.  The Wander algorithm is described below.

Wander
   STATIC last_direction, last_deviation, last_sign, cumulative_deviation
 
   sign = exponDist(cumulative_deviation) // sign = +1 (pos) for right of center
                                                        -1 (neg) for left
   IF sign <> last_sign
      cumulative_deviation = 0
      last_sign = sign
   ENDIF
   deviation = exponDist(last_deviation) // low probability of high deviation magnitude
   last_deviation = deviation plus influence of any biases // biases are global variables
   cumulative_deviation = cumulative_deviation + deviation
   last_direction = last_direction + (last_sign * deviation)
 

Wander keeps track of the current direction (which may also be a net difference between two motor speed variables), the last deviation (change in direction or speed difference), a cumulative deviation in a given direction, and the sign of the previous deviations.  As the agent wanders it will tend to curve off to one side or another until the cumulative deviation in one direction (right or left) is high.  As the latter increases, it increases the probability of changing signs and reversing directions.  Overall the chances of a large deviation are small, while those for a small deviation are large.  Thus, the amount of change in direction from one call to wander to the next is usually small and stochastic.  This approximates a pink-noise or 1/f, power law, distribution.  The end result is the drunken-sailor walk as described in [Mobus & Fisher, 1999].  Such a walk approximates a randomized depth-first search.

The drunken-sailor walk can be biased such that the path takes on low amplitude deviations -- becomes closer to a straight line.  Or it may be biased toward the right or toward the left of center.  These biases are handled by global variables that are set by recognition processes.  So if an object on the left is "familiar", e.g. a potential cue, not yet elevated to the status of actual cue, the agent will preferentially be drawn in that direction.  See the discussion of interfacing the central pattern generator-based version of Wander in the MAVRIC robot in [Mobus & Fisher, 1999].

Every potential cue recognizer is linked to every resource or threat associator.  A memory trace is the relative strength of the link when a cue is activating.  Unlike traditional neural networks, the memory trace does not involve just one efficacy or weight value.  Rather, the efficacy is based on short-term, intermediate-term and long-term support terms as given in the adaptrode model [Mobus, 1994].  With each iteration, the strength of cue-resource (or cue-threat) link is decayed according to the adaptrode formulation.  This prevents transitory associations from being retained over longer times.

Detection of a cue associated with a harmful consequence (threat) results in a pre-programed response called TurnAndRun.  The agent executes a stylized escape behavior before returning to the Wander loop.

In the event that a resource cue is detected, the agent preferentially selects the path associated with the cue.  This preference may be stochastic with the probability of selection dependent on the strenght of the adaptrode support.  Once a path is selected the agent records all objects associated with (in the neighborhood of) the selected path.  These objects are potential cues It then follows the selected path and upon arriving at the next local environment .
 

 

Figure 1.  Foraging search in a discrete domain (graph).  See text for description.

Figure 1 depicts a situation in the discrete form of foraging search such as a Web search agent.  Each node represents a local environment (e.g., a Web page).  The agent enters the local environment from another node (to the right of the central node).  It then scans each out-path picking out the objects/features associated with each path.  In the figure these are represented by the "w1" through "w16" objects.  The agent has already stored the objects on the path which led to the central node (w1, w3, w15 and w16) in a short-term memory trace.  Upon scanning the various out-paths, the agent finds two paths that have the w1 and w3 associated.  There is a very weak propensity to follow one of these (over the third out-path).  And, all other things being equal, the agent will tend to take the central path since this represents the smalles deviation from its original direction (the angle of entry of the in-path represents the direction variable).  The Wander algorithm could cause the agent to choose either of the other two paths, but with greatly reduced probability due to the pink-noise distribution used to select paths.

Suppose the agent selects the central path.  The trace representing the w3 association will be mildly reinforced in short-term memory before the agent proceeds along that path to the next node.  The traces for w1, w15 and w16 will be decayed somewhat moreso than that for w3 as a result.  Upon arrival at the node (marked with the "k3" label), the agent scans the environment.  In this case the node is a leaf or terminal node.  It contains one of the search resources, the k3 label.  Since only one of several possible resources is present (and the use of just one item is an indication of a less useful amount of resource, e.g., a low page salience measure for a Web page) the energy value of the node is low, but sufficient to transfer the memory trace of the w3 object into intermediate-term memory through the adaptrode reward reinforcement mechanism.  The w3 object becomes a potential cue for k3 to the agent.

Since the node is a terminal, the agent backtracks to the central node in the figure.  It scans the remaining paths.  The path just taken is eliminated from further consideration since it has been visited and the resource consumed.  Suppose the Wander algorithm selects the upper path, due to the influence of the weak but still present w1 object trace.  It will find a richer set of resources (more energy) and will tend to boost the value of w1's trace, also transferring it into intermediate-term memory.  Now both w1 and w3 have stronger potential cue status.  By being in intermediate-term memory, both will be protected from too much decay.  w15 and w16, on the other hand, will both be so decayed that they can be eliminated from memory altogether.

When the agent found both nodes to contain resources, it "consumed" them.  In the case of a Web search agent, this means returning the URL to the user, for further evaluation.  The latter serves the purpose of providing confirmation to the adaptrode algorithm in the same way that the agent's successful return to its home (as in the case of the MAVRIC robot) provides.  The confirmation of value will cause the memory traces for w1 and w3 to be transfered to long-term memory if both nodes were deemed sufficiently valuable.

Upon return from the second node visited, the agent may or may not choose to take the remaining out-path.  The entry path is also a valid out-path - for backtracking.  By the time the agent has taken several out-paths and returned from terminal nodes (or box canyons) it has significantly altered its direction so that the entry path becomes more probable as an exit path (the entry path, when used for exit, does not require scanning).  Thus there is some likelihood that not all out-paths from a node will be selected, particularly if there are no cues or candidate cues associated with the path.  In this one way, this algorithm differs from traditional (graph) search.  There is no necessary attempt to do an exhaustive search of all reachable nodes.

The philosophy behind this approach is that in a truly vast space of possible nodes, it would be impossible and impractical for the agent to search everywhere.  The possibility exists for the agent to get stuck for a longer than optimal time in a dense portion of the graph that does not contain any resource.  This would cause the agent to loose excessive energy and forget possible cues.  In fact, in one variation of the above algorithm, the agent gets more frustrated with each node it searches that it doesn't find a resource.  As the level of frustration increases, the distribution parameter of the pink-noise generator (mu in an inverse exponential distribution) is adjusted such that the number of out-paths in a node that are likely to be searched is reduced.  The agent then has a stronger bias toward backtracking in order to cover a "broader" territory.  In one sense, the agent starts approaching a more breadth-first kind of search.

In the above scenario for Figure 1, the objects w1 and w3 were learned as possible cues.  The learning, however, given only one instance of a connection between them and the various resource objects, is weak.  And even though the traces are advanced to intermediate- and even long-term memory, the low levels would still allow them to decay over many non-occurances of the conjunction (temporally ordered) of the cues and resources.

What must happen is that the agent encounter additional such co-occurances in some reasonable time (in the case of the Web agent this might mean within the next several thousand page hits).  Each such encounter further reinforces the strength of connection between the cue and the resource recognizer.  Eventually, only those cues which prove to be useful in predicting the discovery of resources will be retained for the very long-term.

As an agent builds a set of reliable cue predictors its search efficiency increases.  That is, it becomes better at finding resources (and avoiding threats) by virtue of using its cue-based way-finding method to locate resources.  This, of course depends upon the richness of unique objects and causal relations between them and the resources (threats).  If there are too few objects which are causally related to the resources, and if the distribution of resources and cues are sparse, then this approach will not prove effective.  However, when one considers the richness of the real natural world then it seems quite reasonable to suspect that foraging search will provide a useful method for agent search.