A Behavioral Approach to Video Game Design


I don't know who first drew a comparison between video games and a "Skinner box." I heard the term "Virtual Skinner Box" several years ago and have since seen the occasional reference to this term on various games design discussion forums. The term has been heavily used in recent years in relation to links between violence and video games, and in relation to video game addiction.

As anyone who has had any exposure to the study of psychology in school will probably know, a Skinner box is a piece of laboratory equipment used to conduct operant conditioning experiments with animals, usually a rat or a pigeon. In the Skinner box there is usually a lever or a key that the animal can manipulate to obtain a reward such as food or water. Psychologists use Skinner boxes to study the effect of various schedules of reward or punishment on the animal's behavior. For instance, the box could be configured to deliver a reward to the animal every time it presses the lever, every hundred times, or on some irregular schedule. Psychologists would then measure how effective a particular reward schedule was in changing the animal's behavior. The Skinner box was invented by the American behavioral psychologist B.F. Skinner, who also coined the term operant conditioning to describe his field of study.

So why use the term "virtual Skinner box" to describe a video game? While the rich environment of a typical video game is far removed from that of a Skinner box, and it seems insulting to compare a game player's behavior to that of a rat, I do understand the sentiment behind the term. Many games require the performance of a repetitive task to achieve some goal, and I distinctly recall feeling not unlike a rat endlessly pressing a lever in the hope of a food pellet when, for instance, I was using the crafting skill in Dark Age of Camelot, or pressing the button repeatedly in video poker.

Given this perceived link between operant conditioning and video game design, what does the research of behavioral psychologists into operant conditioning have to teach us about structuring rewards within a video game? In this article I propose that over a century of study by behavioral psychologists into conditioning does hold important lessons for the game designer, that some popular titles are today implementing operant conditioning techniques, and that the use of such techniques in a game's design can make that game more enjoyable and can increase its longevity. I will further discuss the ethical considerations of using such techniques, especially in light of recent concerns about the addictiveness of video games.


Behaviorism is an approach to psychology that uses scientific techniques to understand behavior. It holds that the sources of behavior are external (in the environment), and not internal (in the mind), and so the pure behaviorist describes behavior with no reference to mental events or internal psychological processes. For instance, a behaviorist will not try to understand or describe an organism's feelings or motivations when studying its behavior, but will look only to observable behavior, and to the external environmental factors associated with that behavior. This pure behaviorist view of psychology is no longer popular; it is viewed as too restrictive an approach to gaining an understanding of behavior. Conditioning, however, remains a valid technique in understanding behavior. In this article on the use of conditioning in video games, I do not follow a pure behavioral approach and may occasionally discuss what feelings or motivations cause players to behave in the ways they do.

Section I

Basic Principles of Operant Conditioning

What is Operant Conditioning?

The basic principle of operant conditioning is simply that the frequency of a behavior will increase if it is rewarded, and that it will decrease if it is punished. For instance, a hungry rat in a Skinner box will at first act in a manner that is natural to a hungry rat; e.g., running around the cage, squeaking, trying to escape, etc. If, while it is performing these activities, one response (in this case, pressing a lever) leads to the reward of securing food, the rat will gradually learn that pressing the lever leads to food. The behavior will be repeated and thus learned. The behavior that results in the reward becomes especially important to the rat. The same process can be applied to an action that allows the rat to escape from or avoid unpleasant stimuli.

Another principle of operant conditioning is that once a behavior is learned, the frequency of the reward can be reduced. For the behavior to be learned, it may be necessary at first to reinforce every occurrence of the behavior. Once learned, the reinforcements can be provided on an intermittent basis, and over time it is possible to reduce the frequency of rewards and still maintain the behavior. For instance, the number of times the lever has to be pressed to achieve a reward can gradually be increased from each time, to every ten times, to every hundred times, and so on, or the lever may need to be pressed repeatedly for a set period of time to achieve a reward. Behavioral psychologists have spent much time experimenting on what effect various schedules of reward have on behavior. These reward schedules are of particular importance to the video game designer and are discussed in detail below.
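As a rough illustration of this gradual thinning of rewards, the sketch below steps the press requirement up from every press, to every ten, to every hundred. The function name, the default ratios, and the idea of holding each ratio for three rewards before stepping up are assumptions made for the example, not figures taken from the research.

```python
def thinned_ratios(ratios=(1, 10, 100), rewards_per_stage=3):
    """Yield the number of presses required for each successive reward.

    Each ratio is held for `rewards_per_stage` rewards (an illustrative
    choice) before the requirement steps up to the next ratio.
    """
    for ratio in ratios:
        for _ in range(rewards_per_stage):
            yield ratio


# Presses demanded for each reward in turn, and the total effort involved
requirements = list(thinned_ratios())
total_presses = sum(requirements)
```

With the defaults, nine rewards are spread over a few hundred presses, most of them in the final stage.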

A further principle of operant conditioning is that it is possible to condition an individual to perform behaviors outside of their usual repertoire. If a behavior is particularly complex, for instance, if it is an action that requires multiple steps or takes much skill to perform, it may be impossible to directly reinforce that behavior. Instead, it is possible to reinforce behaviors that approximate the desired behavior and through step-by-step reinforcement of successive approximations, gradually produce the desired response. This principle is known as "behavior shaping." For example, a video game may implement various levels of difficulty, each successive level requiring the player to perform a more complex set of actions to succeed.

Scheduling Rewards

The basic principle of operant conditioning, that you can increase the frequency of a behavior by rewarding it, is fairly simple, and is one that many game designers will have already discovered from trial-and-error or through common sense. The study of operant conditioning becomes more interesting when we look at how reward systems (or "reinforcers," to use the psychological term) can be structured to produce the greatest effect on a behavior. There is extensive research on how reinforcers can be most effectively scheduled.

There are three types of schedules for reinforcers — continuous, extinction, and intermittent. With a continuous schedule, the behavior is reinforced each time it is performed. Extinction schedules are the opposite of continuous schedules in that no instance of the given behavior will be reinforced. Between these two extremes lie intermittent schedules, where only some of the instances of a behavior are reinforced.

Intermittent schedules include:

  • Ratio Schedules: In a ratio schedule, the behavior must be performed X times before it is reinforced. X can be a fixed or variable number.
  • Interval Schedules: In an interval schedule, the first response occurring after a set interval of time has elapsed is reinforced.
  • Interval Schedules with Limited Hold: This schedule is like an interval schedule, except that in order to be reinforced the response must occur within a set period at the end of the interval.
  • Duration Schedules: In a duration schedule, the behavior must be performed continuously throughout the interval to be reinforced.

All of these schedules can also be fixed or variable. In a fixed schedule, the reinforcement will occur after a set period of time, or after a fixed number of responses. In a variable schedule, the time or number of responses will vary around a particular number; for instance, a reinforcement will be given every ten to twenty times the behavior is performed. If we treat continuous and extinction schedules as just two extremes of a fixed ratio schedule, we are left with eight basic reinforcement schedules:

  1. Fixed Ratio (FR) - A reinforcer is given after a specified number of correct responses.
  2. Variable Ratio (VR) - A reinforcer is given after a number of correct responses that varies around a particular average.
  3. Fixed Interval (FI) - The first response after a fixed time interval is reinforced.
  4. Variable Interval (VI) - The first response after a variable time interval is reinforced.
  5. Fixed Interval Limited Hold (FI-LH) - The first response after a fixed interval of time is reinforced, providing the response occurs within a set period at the end of the interval.
  6. Variable Interval Limited Hold (VI-LH) - The first response after a variable interval of time is reinforced, providing the response occurs within a set period at the end of the interval.
  7. Fixed Duration (FD) - To be reinforced, the behavior must occur continuously throughout a fixed time interval.
  8. Variable Duration (VD) - To be reinforced, the behavior must occur continuously throughout a variable time interval.
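
To make the difference between these schedules concrete, here is a minimal sketch of three of them (FR, VR, and FI) as a game might track them. The class and method names are hypothetical, the variable ratio is assumed to be drawn uniformly from a range such as ten to twenty, and time is passed in explicitly in seconds. The limited hold and duration variants are left out for brevity.

```python
import random


class FixedRatio:
    """FR: reinforce every n-th response."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR: reinforce after a number of responses that varies in [low, high]."""

    def __init__(self, low, high, rng=None):
        self.low, self.high = low, high
        self.rng = rng or random.Random()
        self.count = 0
        # Assumption: the requirement is drawn uniformly from the range
        self.target = self.rng.randint(low, high)

    def respond(self):
        self.count += 1
        if self.count >= self.target:
            self.count = 0
            self.target = self.rng.randint(self.low, self.high)
            return True
        return False


class FixedInterval:
    """FI: reinforce the first response once `interval` seconds have passed."""

    def __init__(self, interval):
        self.interval = interval
        self.start = 0.0  # time at which the current interval began

    def respond(self, now):
        if now - self.start < self.interval:
            return False  # too early: the response goes unrewarded
        self.start = now  # the next interval begins at this reward
        return True
```

A variable interval schedule could follow the same pattern as `VariableRatio`, redrawing the interval length after each reward.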

Key Terms

Ratio Strain: If the ratio or interval is increased too rapidly, the responses may deteriorate as if the behavior were on an extinction schedule. This deterioration is often referred to as ratio strain.

Resistance to Extinction (RTE): As the interval or ratio is increased, the investment required to achieve a reward will eventually exceed the reward. At this point the frequency of a behavior will decrease. This is known as extinction. Some reinforcement schedules allow greater increases in intervals or ratios than others, and are said to have a high resistance to extinction.

Post-reinforcement pause: Fixed schedules produce a dramatic drop-off in responding immediately after reinforcement. This is known as the post-reinforcement pause. The length of the pause depends on the size of the ratio or interval: the higher it is, the longer the pause. These pauses are eliminated or are much smaller in variable schedules.

Each of these intermittent schedules has its own characteristic behavior patterns, making it suitable for different applications. Typically, Variable schedules of reinforcement are more effective at producing responses than Fixed schedules, and Ratio schedules are more effective at producing responses than Interval schedules. Of all the schedules, the Variable Ratio schedule is able to generate the highest level of responses over the longest period.

Does Conditioning Work on Humans?

Writers of fiction were quick to grasp the potential of conditioning on human society. In Anthony Burgess's novel A Clockwork Orange, aversion therapy is used to 'cure' the protagonist Alex of his brutal sociopathic behavior. Aldous Huxley's Brave New World imagines a dystopian society where science has learned to shape and control human emotions through conditioning.

Despite these fictional examples, the question remains whether these techniques — which have largely been used to conduct experiments on animals — work on humans. The answer is a qualified yes. Clinical psychologists and psychiatrists do frequently use operant conditioning techniques on humans. For instance, the technique is often used in the treatment of autism. A number of experiments have shown that humans show the same patterns of responding that other animals show when exposed to the basic schedules of reinforcement. The qualification comes due to the ability of humans to reason and verbalize rules that may prevent them from showing the same behavior as animals. For example, humans may alter their behavior based on what they think the experimenter wants to see, rather than in response to the actual reinforcement schedule.

Another difference with humans is the ability to explain to them what type of schedule of reinforcement they are on. A number of studies indicate that people perform more efficiently on various schedules if they have specific rules to follow regarding the schedule in effect. This openness contrasts with many video games, where the mechanics of the reward system remain opaque to the player. If the intent of the game's reward system is to increase the frequency of a given behavior, clearly explaining to the player how the reward system works will make the reward system more effective.



General Principles

A reward is anything that increases the frequency of a behavior. This reward can be the presentation of a positive event following a response, or the removal of an aversive event. Likewise, punishment is something that decreases the frequency of a response and can take the form of the presentation of an aversive event or the removal of a positive event. As mentioned earlier, psychologists tend to refer to anything that increases the frequency of a response as a reinforcer. Hence, rewards are referred to as positive reinforcers, and the removal of an aversive event is referred to as a negative reinforcer.

Outcome of Conditioning

                     Increase in Behavior          Decrease in Behavior
Positive Stimulus    Add Stimulus                  Remove Stimulus
                     (positive reinforcement)
Negative Stimulus    Remove Stimulus               Add Stimulus
                     (negative reinforcement)
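
The four cells of this table can also be written as a small lookup, which may be a handy way to check how a given game event is expected to act on behavior. This is only a sketch: the function name and the string labels are illustrative, and both "decrease" cells are labeled simply as punishment, following the definition given above.

```python
def conditioning_outcome(stimulus, action):
    """Return (effect on behavior, name) for adding or removing a stimulus.

    stimulus: "positive" (pleasant) or "negative" (aversive)
    action:   "add" or "remove"
    """
    table = {
        ("positive", "add"): ("increase", "positive reinforcement"),
        ("positive", "remove"): ("decrease", "punishment"),
        ("negative", "remove"): ("increase", "negative reinforcement"),
        ("negative", "add"): ("decrease", "punishment"),
    }
    return table[(stimulus, action)]
```

For example, `conditioning_outcome("negative", "remove")` describes an event that takes something unpleasant away from the player, which the table classes as negative reinforcement.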

Primary Vs. Secondary Reinforcers

The major factor in determining whether a behavior will be conditioned or not is the nature of the consequences that result from that behavior. If the consequence of a behavior is not one that is recognized by the subject as being a reinforcer, the behavior will not be reinforced. One set of consequences that are clearly reinforcers are those that satisfy some biological need. Food is an obvious example of such a reinforcer. To a hungry person, food will always be reinforcing. Reinforcers that meet a biological need or drive are known as Primary or Unconditioned reinforcers. Primary reinforcers include food, water, and avoidance of pain.

There are, however, many other consequences that people find reinforcing even though they do not satisfy a biological need. For instance, people are not born with any innate drive to earn money, yet through life experience, we learn to treat money as a reinforcer. These other reinforcers are referred to as Secondary or Conditioned reinforcers. Secondary reinforcers are learned through continued pairing with other existing reinforcers. For instance, we learn to treat money as a reward because it allows us to obtain other reinforcements, e.g. food. The process by which the range of reinforcers is expanded is known as Classical Conditioning (see sidebar).

Some conditioned reinforcers are especially effective as they can be paired with many types of reinforcers. These are called generalized reinforcers. Money, tokens, approval, and affection are generalized reinforcers since they can be associated with a variety of other events that are themselves reinforcing. For instance, money can be exchanged for many other events that are reinforcing, such as snacks, toys, and video games.

How Does Operant Conditioning Compare to Classical Conditioning?

If you have had any exposure to the field of psychology, you have also probably heard of Pavlov's experiments with dogs. In these experiments, Pavlov rang a bell just before presenting food to some dogs. He found that after several repeated pairings of the bell with the presentation of food, the dogs began to drool in response to the bell in the same manner that they did to the presentation of food. This was a demonstration of classical conditioning, a process whereby a neutral or conditioned stimulus (the CS; in Pavlov's experiment, the sound of the bell) is repeatedly presented with an unconditioned stimulus (the US; in Pavlov's experiment, the presentation of food). After repeated pairings of the CS and US, the CS comes to produce the behavioral response, i.e. drooling, by itself. This response is known as a conditioned response (CR) because it occurs in response to the CS.

Classical conditioning, also known as respondent conditioning, can be contrasted with operant conditioning in that with classical conditioning the stimulus (the CS) leads to the response (the CR), whereas in operant conditioning it is the response that leads to the stimulus. Another contrast is that classical conditioning deals with involuntary behaviors such as salivation, whereas operant conditioning deals with voluntary behaviors, e.g., pressing a lever.

Classical conditioning can be used effectively within a video game. The repeated pairing of eerie music with dangerous situations within the game will rapidly condition the player to react to the music alone. This tool is frequently used in horror movies where scary music is used to build suspense.

Concurrent Reinforcement Schedules

In life we are often presented with multiple reinforcement schedules, and our actions at any one time are the result of choosing between alternatives. Psychologists have tried to understand how organisms choose between multiple reinforcement schedules and have found a remarkable consistency in how we choose. They have found that organisms choose one reinforcement schedule over another in direct proportion to the frequency and magnitude of the reinforcers each schedule provides, and in inverse proportion to the delay before reinforcement. For instance, if a pigeon receives one pellet of food for tapping a blue key five times, but two pellets for tapping a red key five times, the pigeon will tap the red key twice as often as the blue. Likewise, if one schedule provides reinforcement twice as often as another, organisms will select that schedule over the other in a 2:1 ratio. The same goes for delays in reinforcement. If one schedule provides a reinforcement after a two-second delay, while another has a four-second delay, the animal will prefer the first over the second by the same 2:1 ratio. This relationship is known as the "matching law," which holds that the relative rate of response on an alternative is approximately equal to the relative rate, magnitude, and immediacy of reinforcement provided for responding to that alternative.
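
The pigeon and delay examples can be worked through with a short sketch. It assumes that rate, magnitude, and immediacy (taken here as one over the delay) combine by simple multiplication into a single value for each alternative; that combination rule, along with the function and parameter names, is an assumption made for illustration and not something the examples above establish.

```python
def reinforcement_value(rate=1.0, magnitude=1.0, delay=1.0):
    """Combine rate, magnitude and immediacy (1 / delay) into one value.

    Assumption: the three factors simply multiply together.
    """
    return rate * magnitude / delay


def predicted_shares(alternatives):
    """Return the share of responses predicted for each alternative."""
    values = {
        name: reinforcement_value(**params)
        for name, params in alternatives.items()
    }
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


# Two pellets on the red key versus one on the blue key
keys = predicted_shares({"blue": {"magnitude": 1}, "red": {"magnitude": 2}})

# A two-second delay versus a four-second delay
delays = predicted_shares({"first": {"delay": 2}, "second": {"delay": 4}})
```

In both cases the favored alternative receives two thirds of the responses, which is the 2:1 preference described above.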

Studies of animal feeding behavior in their natural environment have produced results consistent with the matching law. Optimal foraging theory says that feeding behavior is sensitive to the relation between the amount of energy that is expended in finding, securing, and consuming food and the amount of energy or nutrition that the food provides. The net return of energy is determined by the size, quality, scarcity, and work involved in subduing prey. When given a choice between different foods, animals will select in direct proportion to the net return of energy of the various food choices. The foraging behavior of animals as diverse as bees, owls, and rodents has been accounted for with extreme accuracy through optimal foraging theory.


Copyright Sean Butcher, 2004