Learning and Reinforcement
Basic Models of Learning
- How do organizations offer appropriate rewards in a timely fashion?
Learning may be defined, for our purposes, as a relatively permanent change in behavior that occurs as a result of experience. That is, a person is said to have learned something when she consistently exhibits a new behavior over time. Several aspects of this definition are noteworthy.
First, learning involves a change in an attitude or behavior. This change does not necessarily have to be an improvement, however; it can include such things as learning bad habits or forming prejudices. Second, for learning to occur, the change that takes place must be relatively permanent, so changes in behavior that result from fatigue or from temporary adaptation to a unique situation would not be considered examples of learning. Third, learning typically involves some form of practice or experience. For example, the change that results from physical maturation, as when a baby develops the physical strength to walk, is not in itself considered learning. Fourth, this practice or experience must be reinforced over time for learning to take place. Where reinforcement does not follow practice or experience, the behavior will eventually diminish and disappear (“extinction”). Finally, learning is an inferred process; we cannot observe learning directly. Instead, we must infer the existence of learning from observing changes in overt behavior.
We can best understand the learning process by looking at four stages in the development of research on learning (see Figure). Scientific interest in learning dates from the early experiments of Pavlov and others around the turn of the twentieth century. The focus of this research was on stimulus-response relationships and the environmental determinants of observable behaviors. This was followed by the discovery of the law of effect, experiments in operant conditioning, and, finally, the formulation of social learning theory.
Classical conditioning is the process whereby a stimulus-response (S-R) bond is developed between a conditioned stimulus and a conditioned response through the repeated linking of a conditioned stimulus with an unconditioned stimulus. This process is shown in (Figure). Pavlov’s experiments are the classic illustration. Pavlov was initially interested in the digestive processes of dogs but noticed that the dogs started to salivate at the first signal of approaching food. On the basis of this discovery, he shifted his attention to the question of whether animals could be trained to associate previously unconnected factors. Specifically, using the dogs as subjects, he examined the extent to which the dogs could learn to associate the ringing of a bell with food, so that the bell alone would trigger salivation. The experiment began with unlearned, or unconditioned, stimulus-response relationships. When a dog was presented with meat (unconditioned stimulus), the dog salivated (unconditioned response). No learning was necessary here, as this relationship represented a natural physiological process.
Next, Pavlov paired the unconditioned stimulus (meat) with a neutral one (the ringing of a bell). Normally, the ringing of the bell by itself would not be expected to elicit salivation. However, over time, a learned linkage developed for the dog between the bell and meat, ultimately resulting in an S-R bond between the conditioned stimulus (the bell) and the response (salivation) without the presence of the unconditioned stimulus (the meat). Evidence emerged that learning had occurred and that this learning resulted from conditioning the dogs to associate two normally unrelated objects, the bell and the meat.
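To make the acquisition and extinction phases concrete, the following Python sketch represents the strength of the bell-meat association as a number between 0 and 1. It is a toy model under stated assumptions, not Pavlov’s procedure. The update rule (each trial closes a fixed fraction of the gap to a target, which is essentially the single-stimulus form of the Rescorla-Wagner rule, a model this chapter does not discuss), the learning rate, and the response threshold are all hypothetical choices made for illustration.

```python
# Toy model of classical conditioning (illustrative assumptions only).
# V is the association strength between the conditioned stimulus (bell)
# and the unconditioned stimulus (meat). Each trial moves V a fixed
# fraction of the way toward a target: 1.0 when the bell is paired with
# meat, 0.0 when the bell is presented alone (extinction).

LEARNING_RATE = 0.3       # hypothetical share of the remaining gap closed per trial
RESPONSE_THRESHOLD = 0.5  # hypothetical strength at which the bell alone elicits salivation


def update(strength: float, paired_with_meat: bool) -> float:
    """Return the association strength after one trial."""
    target = 1.0 if paired_with_meat else 0.0
    return strength + LEARNING_RATE * (target - strength)


def run_trials(strength: float, trials: int, paired_with_meat: bool) -> list[float]:
    """Run several identical trials and return the strength after each one."""
    history = []
    for _ in range(trials):
        strength = update(strength, paired_with_meat)
        history.append(strength)
    return history


def salivates_to_bell(strength: float) -> bool:
    """The bell alone produces the conditioned response once V is high enough."""
    return strength >= RESPONSE_THRESHOLD


if __name__ == "__main__":
    acquisition = run_trials(0.0, trials=10, paired_with_meat=True)
    extinction = run_trials(acquisition[-1], trials=10, paired_with_meat=False)
    for i, v in enumerate(acquisition + extinction, start=1):
        phase = "bell + meat" if i <= 10 else "bell alone"
        print(f"trial {i:2d} ({phase}): V={v:.2f} salivates={salivates_to_bell(v)}")
```

With these made-up parameters, the bell alone begins to elicit salivation after only a couple of paired trials, and the response fades again within a couple of trials once the bell is repeatedly presented without meat, which is the extinction described earlier in this section.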
Although Pavlov’s experiments are widely cited as evidence of the existence of classical conditioning, it is necessary from the perspective of organizational behavior to ask how this process relates to people at work. Ivancevich, Szilagyi, and Wallace provide one such work-related example of classical conditioning:
An illustration of classical conditioning in a work setting would be an airplane pilot learning how to use a newly installed warning system. In this case the behavior to be learned is to respond to a warning light that indicates that the plane has dropped below a critical altitude on an assigned glide path. The proper response is to increase the plane’s altitude. The pilot already knows how to appropriately respond to the trainer’s warning to increase altitude (in this case we would say the trainer’s warning is an unconditioned stimulus and the corrective action of increasing altitude is an unconditioned response). The training session consists of the trainer warning the pilot to increase altitude every time the warning light goes on. Through repeated pairings of the warning light with the trainer’s warning, the pilot eventually learns to adjust the plane’s altitude in response to the warning light even though the trainer is not present. Again, the unit of learning is a new S-R connection, or habit.
Although classical conditioning clearly has applications to work situations, particularly in the area of training and development, it has been criticized as explaining only a limited part of total human learning. Psychologist B. F. Skinner argues that classical conditioning focuses on respondent, or reflexive, behaviors; that is, it concentrates on explaining largely involuntary responses that result from stimuli.
More complex learning cannot be explained solely by classical conditioning. As an alternative explanation, Skinner and others have proposed the operant conditioning model of learning.
The major focus of operant conditioning is on the effects of reinforcements, or rewards, on desired behaviors. One of the first psychologists to examine such processes was J. B. Watson, a contemporary of Pavlov, who argued that behavior is largely influenced by the rewards one receives as a result of actions.
This notion is best summarized in Thorndike’s law of effect. This law states that of several responses made to the same situation, those that are accompanied or closely followed by satisfaction (reinforcement) will be more likely to occur; those that are accompanied or closely followed by discomfort (punishment) will be less likely to occur.
In other words, it posits that behavior that leads to positive or pleasurable outcomes tends to be repeated, whereas behavior that leads to negative outcomes or punishment tends to be avoided. In this manner, individuals learn appropriate, acceptable responses to their environment. If we repeatedly dock the pay of an employee who is habitually tardy, we would expect that employee to learn to arrive early enough to receive a full day’s pay.
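The law of effect can be sketched as a simple weighting scheme. In this hypothetical Python example, each possible response to a situation has a weight, the chance of choosing a response is its share of the total weight, and consequences nudge the weights up (satisfaction) or down (discomfort). The step size, the minimum weight, and the starting weights are invented for illustration; the law itself states only the direction of the change, not its size.

```python
# Toy illustration of Thorndike's law of effect (hypothetical numbers).

STEP = 0.5         # hypothetical size of each adjustment
MIN_WEIGHT = 0.05  # keeps every response possible, if unlikely


def probabilities(weights: dict[str, float]) -> dict[str, float]:
    """Probability of each response = its share of the total weight."""
    total = sum(weights.values())
    return {response: w / total for response, w in weights.items()}


def apply_consequence(weights: dict[str, float], response: str, outcome: int) -> dict[str, float]:
    """Return new weights after a consequence.

    outcome is +1 for satisfaction (reinforcement) and -1 for discomfort
    (punishment). The input dictionary is left unchanged.
    """
    updated = dict(weights)
    updated[response] = max(MIN_WEIGHT, updated[response] + STEP * outcome)
    return updated


if __name__ == "__main__":
    # The habitually tardy employee starts out favoring arriving late.
    weights = {"arrive late": 2.0, "arrive on time": 1.0}
    print("start", {r: round(p, 2) for r, p in probabilities(weights).items()})
    for week in range(1, 4):
        weights = apply_consequence(weights, "arrive late", -1)     # pay docked
        weights = apply_consequence(weights, "arrive on time", +1)  # full day's pay
        print("week", week, {r: round(p, 2) for r, p in probabilities(weights).items()})
```

In the tardy-employee scenario, docking pay for lateness and paying in full for punctuality shifts the modeled probability of arriving on time from one-third to above four-fifths after three weeks.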
A basic operant model of learning is presented in (Figure). Three concepts are central to this model:
Drive. A drive is an internal state of disequilibrium; it is a felt need. It is generally believed that drive increases with the strength of deprivation. A drive, or desire, to learn must be present for learning to take place. For example, not currently being able to afford the house you want is likely to lead to a drive for more money to buy your desired house. Living in a run-down shack is likely to increase this drive compared to living in a nice apartment.
Habit. A habit is the experienced bond or connection between stimulus and response. For example, if a person learns over time that eating satisfies hunger, a strong stimulus-response (hunger-eating) bond will develop. Habits thus determine the behaviors, or courses of action, we choose.
Reinforcement or reward. This represents the feedback individuals receive as a result of action. For example, if as a salesperson you are given a bonus for greater sales and plan to use the money to buy the house you have always wanted, this will reinforce the behaviors that you believed led to greater sales, such as smiling at customers, repeating their name during the presentation, and so on.
A stimulus activates an individual’s motivation through its impact on drive and habit. The stronger the drive and habit (S-R bond), the stronger the motivation to behave in a certain way. As a result of this behavior, two things happen. First, the individual receives feedback that reduces the original drive. Second, the individual strengthens his or her belief in the validity of the S-R bond to the extent that it proved successful. That is, if one’s response to the stimulus satisfied one’s drive or need, the individual would come to believe more strongly in the appropriateness of the particular S-R connection and would respond in the same way under similar circumstances.
An example will clarify this point. Several recent attempts to train chronically unemployed workers have used a daily pay system instead of weekly or monthly systems. The primary reason for this is that the workers, who do not have a history of working, can more quickly see the relationship between coming to work and receiving pay. An S-R bond develops more quickly because of the frequency of the reinforcement, or reward.
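The drive-habit-reinforcement model and the daily-pay example can be tied together in a small Python sketch. The text says only that stronger drive and habit produce stronger motivation; multiplying the two, as Clark Hull’s drive theory did, is our assumption, as are the numeric gain and reduction values and the starting habit strength. The sketch also ignores the way drive builds up again over time (bills keep arriving), so it is a rough picture of the mechanism rather than a model of real workers.

```python
from dataclasses import dataclass

# Hypothetical illustration values; the text gives no numbers.
HABIT_GAIN = 0.1       # strengthening of the S-R bond per reinforced response
DRIVE_REDUCTION = 0.5  # share of the current drive satisfied by each reward


@dataclass
class Learner:
    drive: float  # felt need: 0.0 (satisfied) to 1.0 (strong deprivation)
    habit: float  # strength of the S-R bond: 0.0 to 1.0

    def motivation(self) -> float:
        # Assumption: motivation rises with both drive and habit; a product
        # is one simple way to express "the stronger both, the stronger the motivation".
        return self.drive * self.habit

    def reinforce(self) -> None:
        """Feedback after a successful response: drive falls, the S-R bond strengthens."""
        self.drive *= 1 - DRIVE_REDUCTION
        self.habit = min(1.0, self.habit + HABIT_GAIN)


def habit_after(days_worked: int, pay_every: int, start_habit: float = 0.2) -> float:
    """Habit strength after coming to work on `days_worked` days, paid every `pay_every` days."""
    worker = Learner(drive=1.0, habit=start_habit)
    for day in range(1, days_worked + 1):
        if day % pay_every == 0:  # pay is the reinforcement for coming to work
            worker.reinforce()
    return worker.habit


if __name__ == "__main__":
    for label, period in [("daily pay", 1), ("weekly pay", 5)]:
        print(f"{label}: habit strength after 5 workdays = {habit_after(5, period):.2f}")
```

Under these assumptions, five days of attendance paid daily yield five reinforcements and a habit strength of about 0.7, while the same five days paid weekly yield one reinforcement and a habit strength of about 0.3. This mirrors the text’s point that the S-R bond develops faster when reinforcement is frequent.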
Operant versus Classical Conditioning
Operant conditioning differs from classical conditioning in two important ways. First, the two approaches differ in what is believed to cause changes in behavior. In classical conditioning, changes in behavior are thought to arise through changes in stimuli, that is, a transfer from an unconditioned stimulus to a conditioned stimulus. In operant conditioning, on the other hand, changes in behavior are thought to result from the consequences of previous behavior. When behavior has not been rewarded or has been punished, we would not expect it to be repeated.
Second, the two approaches differ in the role and frequency of rewards. In classical conditioning, the unconditioned stimulus, acting as a sort of reward, is administered during every trial. In contrast, in operant conditioning the reward results only when individuals choose the correct response. That is, in operant conditioning, individuals must correctly operate on their environment before a reward is received. The response is instrumental in obtaining the desired reward.
Social Learning Theory
The last model of learning we should examine is noted psychologist Albert Bandura’s social learning theory. Social learning theory is defined as the process of molding behavior through the reciprocal interaction of a person’s cognitions, behavior, and environment.
This is done through a process that Bandura calls reciprocal determinism. This concept implies that people control their own environment (for example, by quitting one’s job) as much as the environment controls people (for example, being laid off). Thus, learning is seen as a more active, interactive process in which the learner has at least some control.
Social learning theory shares many of the same roots as operant conditioning. Like Skinner, Bandura argues that behavior is at least in part controlled by environmental cues and consequences, and Bandura uses observable behavior (as opposed to attitudes, feelings, etc.) as the primary unit of analysis. However, unlike operant conditioning, social learning theory posits that cognitive or mental processes affect our response to the environmental cues.
Social learning theory has four central elements: attention, retention, reproduction, and incentives. Before someone can learn something, they must notice or pay attention to the thing that is to be learned. For example, you probably would not learn much as a student in any class unless you paid attention to information conveyed by the text or instructor. Retention is the process by which what you have noticed is encoded into your memory. Reproduction involves the translation of what was recorded in your mind into overt actions or behaviors. Obviously, the higher the level of attention and the greater the retention, the better the reproduction of what was learned. Finally, incentives can influence all three processes. For example, if you are rewarded (say, praised) for paying attention, you will pay more attention. If you are rewarded for remembering what you studied (say, good grades), you will retain more. If you are rewarded for reproducing what you learned (say, a promotion for effectively motivating your subordinates), you will produce that behavior more.
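One way to see how the four elements fit together is to treat attention, retention, and reproduction as successive stages, each passing on only part of what it receives, with incentives raising the level of whichever stage is rewarded. The Python sketch below does exactly that; the multiplicative chain and all numbers are hypothetical simplifications, since the text gives no formula.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One stage of social learning (attention, retention, or reproduction)."""
    base: float             # hypothetical level with no incentive, 0.0 to 1.0
    incentive_boost: float  # hypothetical increase when this stage is rewarded

    def level(self, rewarded: bool) -> float:
        return min(1.0, self.base + (self.incentive_boost if rewarded else 0.0))


def reproduced_share(attention: Stage, retention: Stage, reproduction: Stage,
                     rewarded: set[str]) -> float:
    """Share of an observed behavior that the learner ends up performing.

    Assumption: the stages multiply, because you can retain only what you
    noticed and reproduce only what you retained.
    """
    return (attention.level("attention" in rewarded)
            * retention.level("retention" in rewarded)
            * reproduction.level("reproduction" in rewarded))


if __name__ == "__main__":
    attention = Stage(base=0.6, incentive_boost=0.3)
    retention = Stage(base=0.5, incentive_boost=0.3)
    reproduction = Stage(base=0.7, incentive_boost=0.2)
    for rewarded in [set(), {"attention"}, {"attention", "retention", "reproduction"}]:
        share = reproduced_share(attention, retention, reproduction, rewarded)
        print(f"rewarded {sorted(rewarded)}: {share:.2f}")
```

Because the stages multiply in this sketch, a weak link anywhere (for example, a trainee who pays little attention) limits what is finally reproduced, no matter how strong the other stages are.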
Central to this theory is the concept of vicarious learning. Vicarious learning is learning that takes place through observing and imitating role models. That is, we observe and analyze what another person does and the resulting consequences. As a result, we learn without having to experience the phenomenon firsthand. Thus, if we see a fellow employee being disciplined or fired for being disruptive in the workplace, we might learn not to be disruptive ourselves. If we see that gifts are usually given with the right hand in the Middle East, we might give gifts in that manner ourselves.
A model of social learning processes is shown in (Figure). As can be seen, three factors—the person, the environment, and the behavior—interact through such processes as vicarious learning, symbolic representations, and self-control to cause actual learned behaviors.
Major Influences on Learning
On the basis of this work, we can summarize several general factors that enhance learning. An individual’s desire to learn, background knowledge of a subject, and the length of the learning period are some of the components of a learning environment. Filley, House, and Kerr identify five major influences on learning effectiveness: motivation of the learner, knowledge of results, prior learning, whole versus part learning, and distribution of practice.
Substantial research, drawn largely from the behavioral science and psychology literature, indicates that learning effectiveness increases considerably when individuals have high motivation to learn. We sometimes encounter students who work day and night to complete a term paper that interests them, whereas writing an uninteresting term paper may be postponed until the last possible minute. Maximum transfer of knowledge is achieved when a student or employee is motivated to learn by a high need to know.
Considerable evidence also demonstrates that we can facilitate learning by providing individuals with feedback on their performance. A knowledge of results serves a gyroscopic function, showing individuals where they are correct or incorrect and furnishing them with the perspective to improve. Feedback also serves as an important positive reinforcer that can enhance an individual’s willingness or desire to learn. Students who are told by their professor how they performed on an exam and what they could do to improve next time are likely to study harder.
In many cases, prior learning can increase the ability to learn new materials or tasks by providing needed background or foundation materials. In math, multiplication is easier to learn if addition has been mastered. These beneficial effects of prior learning on present learning tend to be greatest when the prior tasks and the present tasks exhibit similar stimulus-response connections. For instance, most of the astronauts selected for the space program have had years of previous experience flying airplanes. It is assumed that their prior experience and developed skill will facilitate learning to fly the highly technical, though somewhat similar, vehicles.
Available evidence suggests that when a task consists of several distinct and unrelated duties, part learning is more effective: each duty should be learned separately. However, when a task consists of several integrated and related parts (such as learning the components of a small machine), whole learning is more appropriate, because it ensures that the major relationships among parts, as well as the proper sequencing of parts, are not overlooked or underemphasized.
The final major influence on learning highlights the advantages and disadvantages of concentrated as opposed to distributed training sessions. Research suggests that distribution of practice—short learning periods at set intervals—is more effective for learning motor skills than for learning verbal or cognitive skills.
Distributed practice also seems to facilitate learning of very difficult, voluminous, or tedious material. It should be noted, however, that concentrated practice appears to work well where insight is required for task completion. Apparently, concentrated effort over short durations provides a more synergistic approach to problem-solving.
Although there is general agreement that these influences are important (and are under the control of management in many cases), they cannot substitute for the lack of an adequate reinforcement system. In fact, reinforcement is widely recognized as the key to effective learning. If managers are concerned with eliciting desired behaviors from their subordinates, a knowledge of reinforcement techniques is essential.
General Motors has learned by experience that it pays not to have managers learn only by experience how to function effectively while working in foreign countries. Managing expatriate assignments in difficult locations was brought to life by the experiences of Richard Pennington, General Motors’ head of global mobility for the EMEA (Europe, Middle East, and Africa) region. He knows from experience some of the things that tend to go well, as well as some of those that don’t, and has learned lessons from moving employees to places like Uzbekistan. This became important when the company took on a new engine manufacturing operation in the capital, Tashkent, as well as an existing manufacturing plant in Andijan. The objectives were the same as for most global mobility projects: to get the right people to the right place at the right time for the right cost. The general approach was Action—Plan—Do—Check. Pennington urged potential relocation candidates not to be overreliant on the Internet and, if possible, to go and see for themselves. “Nothing beats going to a location—particularly a harsh location—yourself,” he says. Pennington also emphasizes the importance of selecting suppliers on the ground carefully, even if you already have a network of existing suppliers. Strong relationships in the host location are of paramount importance. In difficult locations, it is particularly important that the local HR, finance, and legal staff work with you proactively, as making payments at the right time can be critical. Equally, cultural training and language providers are essential.
GM’s cross-cultural training programs involve a wide variety of teaching methods. Factual information may be conveyed through lectures or printed material. More subtle information is learned through role plays, case studies, and simulations.
The research on cross-cultural training suggests that the more involved participants are in the training, the more they learn, and that the more they practice or simulate new behaviors that they need to master in the foreign environment, the more effective they will be in actual situations.
The results for GM have been impressive. Most companies that do not provide cross-cultural training for their employees sent on international assignments experience failure rates of about 25 percent, and each failure or early return costs the company on average $150,000. GM has a failure rate of less than 1 percent. Also, in GM’s case, the training has been extended to the manager’s family and has helped reluctant spouses and children more readily accept, if not embrace, the foreign assignment.
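The figures in the case support a quick back-of-the-envelope comparison. Per 100 international assignments, and assuming (the case does not say so) that a failure costs GM about the same $150,000 average cited for other companies:

```latex
\begin{aligned}
\text{Typical company without training: } & 100 \times 0.25 = 25 \text{ failures}; \quad 25 \times \$150{,}000 = \$3{,}750{,}000 \\
\text{GM: } & 100 \times 0.01 = 1 \text{ failure at most}; \quad 1 \times \$150{,}000 = \$150{,}000 \text{ at most}
\end{aligned}
```

On these assumptions, the gap is at least $3.6 million per 100 assignments, or roughly $36,000 per assignment, which gives a sense of how much such training could cost before it stops paying off. The comparison ignores any other differences between GM and the companies behind the 25 percent figure.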
Sources: F. Furnie, “International assignments: Managing change and complexity,” Relocate Global, September 23, 2015, https://www.relocatemagazine.com/articles/4697international-assignments-managing-change-and-complexity; J. Lublin, “Companies Use Cross-Cultural Training to Help Their Employees Adjust Abroad,” Wall Street Journal, August 4, 2004, p. B1.
- How can learning theory be used to change behaviors?
- Define classical conditioning, and differentiate it from operant conditioning.
- What is social learning theory?
- How do organizations offer appropriate rewards in a timely fashion?
People learn through both direct experience and vicarious experience. What is retained and produced as behavior is a function of the positive and negative consequences either directly experienced by individuals or observed as the result of the actions of others. Managers and trainers often underestimate the power of vicarious learning. Also, keep in mind that reinforcement that has some variability in its application (variable ratio or interval) has the strongest and longest-lasting impact on desired learned behaviors.
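The summary refers to reinforcement schedules without defining them. In standard behaviorist usage, a fixed-ratio schedule rewards every nth response, a variable-ratio schedule rewards after an unpredictable number of responses that averages n, a fixed-interval schedule rewards the first response after a set time has passed, and a variable-interval schedule does the same with an unpredictable wait. The Python sketch below implements the four schedules so their reward patterns can be compared; the uniform random draws, the parameter values, and the one-response-per-time-unit simulation are assumptions for illustration.

```python
import random


class FixedRatio:
    """Reward every n-th response."""

    def __init__(self, n: int):
        self.n, self.count = n, 0

    def respond(self, time: float) -> bool:
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reward after an unpredictable number of responses that averages mean_n."""

    def __init__(self, mean_n: int, rng: random.Random):
        self.mean_n, self.rng, self.count = mean_n, rng, 0
        self.required = self._draw()

    def _draw(self) -> int:
        # Assumption: uniform on 1..(2*mean_n - 1), whose mean is mean_n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, time: float) -> bool:
        self.count += 1
        if self.count == self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reward the first response made once `interval` time units have passed since the last reward."""

    def __init__(self, interval: float):
        self.interval = interval
        self.next_available = interval

    def respond(self, time: float) -> bool:
        if time >= self.next_available:
            self.next_available = time + self.interval
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but the waiting time varies around mean_interval."""

    def __init__(self, mean_interval: float, rng: random.Random):
        self.mean_interval, self.rng = mean_interval, rng
        self.next_available = self._draw()

    def _draw(self) -> float:
        # Assumption: uniform on [0, 2*mean_interval], whose mean is mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, time: float) -> bool:
        if time >= self.next_available:
            self.next_available = time + self._draw()
            return True
        return False


def reward_times(schedule, responses: int) -> list[int]:
    """Simulate one response per time unit and return the times that were rewarded."""
    return [t for t in range(1, responses + 1) if schedule.respond(t)]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is repeatable
    for name, schedule in [("fixed ratio 5", FixedRatio(5)),
                           ("variable ratio 5", VariableRatio(5, rng)),
                           ("fixed interval 5", FixedInterval(5)),
                           ("variable interval 5", VariableInterval(5, rng))]:
        print(f"{name:>20}: {reward_times(schedule, 30)}")
```

With the fixed schedules the rewards arrive at perfectly regular times; with the variable ones the gaps change unpredictably, so the learner cannot tell which response will pay off. The sketch only shows these patterns; it does not model the stronger, longer-lasting learning that the text attributes to variable schedules.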
Learning is a relatively permanent change in behavior that occurs as a result of experience.
Thorndike’s law of effect notes that behavior that is rewarded is likely to be repeated, whereas behavior that is punished is unlikely to be repeated. Operant conditioning can be distinguished from classical conditioning in two ways: (1) it asserts that changes in behavior result from the consequences of previous behaviors instead of changes in stimuli, and (2) it asserts that desired behaviors result only when rewards are tied to correct responses instead of when unconditioned stimuli are administered after every trial.
Social learning is the process of altering behavior through the reciprocal interaction of a person’s cognitions, previous behavior, and environment. This is done through a process of reciprocal determinism.
Vicarious learning is learning that takes place through observation and imitation of others.
Learning is influenced by (1) a motivation to learn, (2) knowledge of results, (3) prior learning, (4) the extent to which the task to be learned is presented as a whole or in parts, and (5) distribution of practice.
- Classical conditioning
- The process whereby a stimulus-response bond is developed between a conditioned stimulus and a conditioned response through the repeated linking of a conditioned stimulus with an unconditioned stimulus.
- Conditioned response
- A learned response to a conditioned stimulus, acquired through the repeated linking of that conditioned stimulus with an unconditioned stimulus (for example, a dog salivating at the sound of a bell).
- Drive
- An internal state of disequilibrium; it is a felt need. It is generally believed that drive increases with the strength of deprivation.
- Habit
- The experienced bond or connection between stimulus and response.
- Law of effect
- States that of several responses made to the same situation, those that are accompanied or closely followed by satisfaction (reinforcement) will be more likely to occur; those that are accompanied or closely followed by discomfort (punishment) will be less likely to occur.
- Operant conditioning
- A model of learning that focuses on the effects of reinforcements, or rewards, on desired behaviors.
- Reciprocal determinism
- This concept implies that people control their own environment as much as the environment controls people.
- Social learning theory
- The process of molding behavior through the reciprocal interaction of a person’s cognitions, behavior, and environment.
- Unconditioned response
- From classical conditioning, a response to an unconditioned stimulus that is naturally evoked by that stimulus.
- Vicarious learning
- Learning that takes place through observing and imitating role models.