Learning Theory

The psychological construct of learning refers to a relatively lasting change in behaviour resulting from experience, whether a single event or repeated exposure.

Non associative learning:

These are simple forms of learning, demonstrated even in lower animals, in which only single events are used in the learning paradigm – no pairing or ‘operation’ on the environment is required.
- Habituation is a non-associative learning in which repeated stimulation leads to a reduction in response over time as the organism ‘learns’ that the stimulus is insignificant.
- Sensitization is an increase in response to a stimulus as a function of repeated presentations of that stimulus. As in habituation, repeated exposure is required to elicit the learning effect, but response rates go up rather than down (i.e. the opposite of the effect seen in habituation).
- Pseudoconditioning (cross-sensitization): the emergence of a response to a previously neutral stimulus simply as a result of exposures to a different but powerful stimulus.
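The opposing effects of habituation and sensitization can be illustrated with a toy simulation (the function name, starting magnitude and rates below are illustrative assumptions, not from this text):

```python
# Toy sketch: repeated presentations of the same stimulus weaken the
# response in habituation and strengthen it in sensitization.
def respond(trials, start=1.0, rate=0.8):
    """Response magnitude on each trial; rate < 1 habituates, rate > 1 sensitizes."""
    magnitudes, r = [], start
    for _ in range(trials):
        magnitudes.append(r)
        r *= rate            # response changes multiplicatively with each exposure
    return magnitudes

habituated = respond(5, rate=0.8)   # declining: 1.0, 0.8, 0.64, ...
sensitized = respond(5, rate=1.2)   # rising:    1.0, 1.2, 1.44, ...
```

The same repetition of exposure drives both effects; only the direction of change differs.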

Associative learning:

Here learning occurs through the association of two events.
- Classic conditioning: learning takes place through repeated temporal association of two events. The learning organism is passive, respondent (i.e. shows an innate, reflexive response such as salivation) but not instrumental (i.e. does not actively operate on its environment).

- Operant conditioning: learning results from consequences of one’s actions – operations. The learning organism actively operates (instrumental) on the environment.

- Social learning theory: combines both classical and operant models of learning, and holds cognitive processes and social interaction to be relevant in human learning.

Classical conditioning is produced by repeatedly pairing a neutral conditioned stimulus (CS, e.g. bell) with an unconditioned stimulus (UCS, e.g. food) that naturally evokes an unconditioned response (UCR, e.g. salivation). Eventually the neutral stimulus alone evokes the desired response (salivation – now called the conditioned response, CR). It is a relatively rapid process and depends upon the nature of the unconditioned stimulus. Pavlov first demonstrated this paradigm in dogs.
The development of the association between the CS and the UCS, resulting in a CR, is called acquisition.
For animals this takes between 3 and 15 pairings; if sufficient emotional involvement is present, acquisition can occur with even one pairing.

Types of conditioning and their pairing procedures:
- Delayed or forward conditioning: CS (bell) presented before UCS (food); the CS + UCS pairing is continued until the UCR (saliva) appears.
- Backward conditioning: UCS (food) presented before CS (bell) – not useful in animals; used in advertising.
- Simultaneous conditioning: UCS and CS presented together – often the case in real-life learning situations.
- Trace conditioning: CS presented and removed before the UCS is presented – conditioning depends on a memory trace.

A delay of less than 0.5 seconds between CS and UCS is proposed to be the optimum for trace conditioning.
Temporal contiguity (the time between stimulus and response) is important for conditioning according to Pavlov. But Rescorla showed that predictability is more important than temporal contiguity in humans: if one can predict a painful tooth extraction on hearing the dentist’s drill, then the noise becomes conditioned to elicit a fear response better than two unconnected, unpredictable events that merely have temporal contiguity. Note that for classical conditioning it is not necessary that the organism understands the association in cognitive terms, but such awareness facilitates the learning.
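Rescorla’s emphasis on predictability was later formalised (with Wagner) in the Rescorla–Wagner model, in which learning on each pairing is driven by the discrepancy between what the CS predicts and what actually occurs. A minimal sketch, assuming a single CS with an illustrative learning rate of 0.3 (these parameter values are assumptions, not from this text):

```python
# Rescorla–Wagner sketch: the associative strength V of a CS moves toward
# the asymptote set by the UCS on each trial, in proportion to the
# prediction error (asymptote - V).
def rw_update(v, learning_rate=0.3, asymptote=1.0):
    """One trial of learning; asymptote=1.0 when the UCS is present, 0.0 when absent."""
    return v + learning_rate * (asymptote - v)

v = 0.0
for _ in range(10):                    # acquisition: repeated CS + UCS pairings
    v = rw_update(v)                   # V climbs toward 1.0

for _ in range(10):                    # extinction: CS presented alone
    v = rw_update(v, asymptote=0.0)    # V falls back toward 0
```

The same update rule captures both acquisition (large early gains that level off) and extinction, consistent with the idea that conditioning tracks how well the CS predicts the UCS rather than mere contiguity.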

Higher-order conditioning refers to the use of an already conditioned stimulus (CS1) as the UCS for the next level of conditioning, eliciting a CR to another stimulus (CS2). In this way second-order and subsequently higher-order conditioning are possible. Animals do not usually respond beyond fourth-order conditioning.

Pavlov’s experiments were extended to human subjects by Watson & Rayner, who produced a ‘phobia’ in an infant known as Little Albert. Albert was exposed to a loud, frightening noise whenever he was shown a white rat; eventually he became fearful of the white rat even when he heard no loud noise. A similar fear response was seen when any furry white object was shown to him. This ‘spread’ of associative learning from one stimulus to other similar stimuli is called stimulus generalisation.

Discrimination is a process diametrically opposite to generalization; in many situations associative learning can be very selective. In such cases, learned responses are made only to specific stimuli and not to other similar stimuli e.g. a child may be afraid of dogs but not all four-legged animals.

Extinction: reduction/disappearance of a learned response when the UCS – CS pairing (or the reinforcer in operant conditioning; see below) is not available anymore. Faster extinction may mean weaker learning. Extinction does not mean loss of learning, but only a suppression of behavioural response. Spontaneous recovery refers to regaining a previously extinguished learned response after a period of time.

Counter conditioning is a form of classical conditioning where a previously conditioned response is replaced by a new response that may be more desirable. Utilised in behavioural therapy - systematic desensitisation, aversion therapy.

Latent inhibition: learning the association between the UCS and CS is delayed if the organism has previously been exposed to isolated presentations of the CS alone.

An organism learns an appropriate behaviour after many trials because the right behaviour is followed by an appropriate (desirable) consequence. This forms the basis of the concept of operant conditioning; the phenomenon is termed the law of effect and is often demonstrated using the trial-and-error learning experiments originally described by Thorndike.

A conditioning that leads to increase in the frequency of behaviour following learning is called reinforcement. A conditioning that leads to decrease in the frequency of behaviour following learning is called punishment. Both reinforcement and punishment can be positive (i.e. something is given) or negative (something is taken away).

- Positive reinforcer: food for pressing a lever (something given).
- Negative reinforcer: cessation of an electric shock on pressing a lever (something taken away).
- Positive punishment: points on your driving licence for speeding (something given).
- Negative punishment: a monetary fine from a parking ticket (something taken away).
- Primary reinforcer: a stimulus affecting biological needs (such as food).
- Secondary reinforcer: a stimulus reinforcing behaviour associated with primary reinforcers (money, praise).

Both positive and negative reinforcement increase the desired response. For example, a ‘star chart’ run on a variable interval schedule – around 2 or 3 stars given per day depending on good behaviour, and none for bad behaviour – is positive reinforcement: something additional is given to increase the desired response.

In a patient with OCD, compulsions provide short-term relief of obsessional anxiety via negative reinforcement. Carrying out compulsive rituals acutely reduces anxiety, which reinforces repeated engagement in the compulsions: the termination of the aversive anxiety cued by obsessions increases the compulsive behaviour that removed it, without addressing the obsessions themselves.

Reinforcement Schedules

A reinforcement schedule refers to how and when behaviour is reinforced – on the basis of the number of responses (ratio) or the time elapsed (interval).

- Continuous (a.k.a. contingency reinforcement): reinforcement every time the positive response occurs, e.g. a food pellet every time a rat presses a lever in an experiment.
- Partial: only some of the positive responses result in reinforcement; the reinforcement is determined by the number of responses (ratio) or by time (interval).
- Fixed interval: reward occurs after a specific period of time regardless of the number of responses, e.g. a monthly salary irrespective of your level of performance!
- Variable interval: reward occurs after a variable (unpredictable) period of time, regardless of the number of responses, e.g. an angler catching fish – the first after 10 minutes, the next after 45, then 5 minutes, etc.
- Fixed ratio: reward occurs after a specific number of responses, e.g. after completing 20 MCQs you give yourself a coffee (or chocolate) break.
- Variable ratio: reward occurs after a random number of responses, e.g. gambling slot machines. Your first win of £20 may occur after 3 tries; the next win may not occur even if you play 30 times, while the third may follow in quick succession after the second.
Important points to note:
  • In fixed schedules, a pause in responding is seen after each reinforcement, as the organism learns that reinforcement will not occur again for some time or some number of responses. The pause is greater for fixed interval schedules than for fixed ratio schedules. When we interpret an operation to be under our control (as in fixed schedules), we learn more quickly.

  • Variable schedules generate a constant rate of response, as the chance of obtaining a reward stays the same at any time and for any instance of behaviour. In general, partial schedules are more resistant to extinction than continuous schedules, though they take longer to learn. Variable ratio schedules are the most resistant to extinction, which may explain why gambling is such a difficult habit to eradicate.

  • Another important determinant of operant conditioning is contingency - learning the probability of an event.
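The contrast between fixed and variable ratio schedules can be sketched as simple decision rules for when a response earns a reward (the function names and parameters below are illustrative assumptions, not from the text):

```python
import random

# Toy reinforcement-schedule sketch: for each successive response,
# decide whether it is rewarded.

def fixed_ratio(n_responses, ratio=20):
    """Reward after every `ratio`-th response (e.g. a break after 20 MCQs)."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]

def variable_ratio(n_responses, mean_ratio=20, seed=0):
    """Reward after an unpredictable number of responses (slot machine):
    each response independently pays off with probability 1/mean_ratio."""
    rng = random.Random(seed)
    return [rng.random() < 1 / mean_ratio for _ in range(n_responses)]

fr = fixed_ratio(100)      # exactly 5 rewards, evenly spaced
vr = variable_ratio(100)   # roughly 5 rewards, unpredictably spaced
```

The fixed schedule makes the next reward perfectly predictable (hence the post-reinforcement pause), while under the variable schedule the very next response is always equally likely to pay off – which is why responding stays constant and extinguishes slowly.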

Premack’s principle (a.k.a. Grandma’s rule): high-frequency behaviour can be used to reinforce low-frequency behaviour, e.g. “eat your greens and you can have dessert”. An existing high-frequency behaviour (eating dessert) is used to reward a low-frequency behaviour (eating greens).

Avoidance learning: an operant conditioning where an organism learns to avoid certain responses or situations. Avoidance is a powerful reinforcer and often difficult to extinguish. A special form of avoidance is escape conditioning seen in agoraphobia where places in which panic occurs are avoided / escaped from leading to a housebound state eventually.

Aversive conditioning: This is an operant conditioning where punishment is used to reduce the frequency of target behaviour e.g. the use of disulfiram (noxious stimuli) to reduce the frequency of drinking alcohol.

Covert reinforcement: In covert reinforcement schedules, the reinforcer is an imagined pleasant event rather than any material pleasure e.g. imagining MRCPsych graduation event to reinforce the behaviour of practicing MCQs. Covert sensitization: The reinforcer is the imagination of unpleasant consequences to reduce the frequency of an undesired behaviour e.g. an alcoholic may be deterred from continuing to spend on alcohol by imagining his wife leaving him, being unable to support himself and ending up broke and homeless.

Flooding: a behavioural technique in which exposure to the feared stimulus takes place for a substantial amount of time, so that the accompanying anxiety response fades away while the stimulus is continuously present, e.g. a man with a phobia of heights standing on top of the Burj Khalifa or the Shard. This leads to the extinction of fear. When a similar technique is attempted with imagined rather than actual exposure, it is called implosion.

Shaping (a.k.a. successive approximation): This is a form of operant conditioning where a desirable behaviour pattern is learnt by the successive reinforcement of behaviours closer to the desired one. Note that shaping is used when the target behaviour is yet to appear (i.e. it is novel and does not exist already).
Dog runs towards the wheel but doesn’t jump → gets a bone. Runs and makes a jump close to the wheel → gets a bone. Runs and jumps through the wheel → gets a bone. Runs and jumps through the wheel on fire → gets a bone. Circus on show – the behaviour has been shaped.

Chaining: this refers to reinforcing a series of related behaviours, each of which provides the cue for the next to obtain a reinforcer. Chaining is used when the target behaviour is already notable in some form but not in the fully formed sequence. An example is teaching a child to write his name. The shape of each individual letter is first taught using reinforcers, and forward chaining can then be used to link the letters in the correct order, finally reinforcing the completed name. Backward chaining starts at the end, e.g. when making cupcakes the child is first taught how to sprinkle over a finished cupcake; the next time, icing the cake and then sprinkling; the next time, placing the prepared cake mixture into cupcake wrappers, then icing, then sprinkling, etc.

Incubation: an emotional response increases in strength with brief but repeated exposure to the stimulus. Rumination on anxiety-provoking stimuli can serve to increase the anxiety via incubation. This is a powerful mechanism that maintains phobic anxiety and PTSD. Stimulus preparedness (Seligman) explains why snake and spider phobias are commoner than ‘shoe phobia’ or ‘watch phobia’. In evolutionary terms, the stimuli that were threatening to hunter-gatherers have become hard-wired into our system, reflexively eliciting immediate responses – and phobias develop more readily to such ‘prepared stimuli’.

Learned helplessness (Seligman): initially put forward as a behavioural model for depression. When confronted with aversive stimuli from which escape is impossible, an animal stops making attempts to escape. This was shown experimentally with a dog on an electrified floor unable to escape. After a while, the dog stopped trying, as if accepting its fate. This paradigm is frequently invoked to explain the dependence seen in victims of domestic abuse.

Reciprocal inhibition (Wolpe): if a stimulus eliciting a desired response and a stimulus eliciting an undesired response are presented together repeatedly, the incompatibility of the two responses leads to a reduction in the frequency of the undesired response. For example, if your dog barks at your friend, hugging her in front of your dog every time it barks will slowly stop the dog barking at her. This is used in relaxation therapy for anxiety and in systematic desensitisation.

Cueing (a.k.a. prompting): specific cues can be used to elicit specific behaviours – e.g. in a classroom a teacher puts her finger on her lips to reduce chatter and elicit the response of silence. The process of unlearning such cue associations is called fading.

Bandura’s social learning theory: Bandura believed that not all learning occurred due to direct reinforcement, and proposed that people could learn simply by observing the behaviour of others and the outcomes. According to behaviourists, learning is defined as a relatively permanent change in behaviour but social learning theorists differentiate actual performance from learning a potential behaviour. Social learning theorists emphasize the role of cognition in learning; awareness and expectations rather than the actual experience of reinforcements or punishments are sufficient to have a major effect on the behaviours that people exhibit.

Cognitive processing during social learning:
  1. Attention to the observed behaviour is the basic element in learning.
  2. Encoding of the observed behaviour into memory as visual images and semantic representations.
  3. Memory permanence via retention and rehearsal.
  4. Motor copying of the behaviour and imitative reproduction.
  5. Motivation to act.
Reciprocal causation: Bandura proposed that behaviour can influence both the environment and the individual; each of these three variables – the person, the behaviour, and the environment – can influence the others. The most commonly discussed experiment illustrating Bandura’s theory is the Bobo doll experiment: children watching a model act aggressively towards a Bobo doll learnt to display the aggression without any reinforcement schedule.

Cognitive learning (Tolman): reinforcement may be necessary for a performance of learned response but not necessary for the learning itself to occur (latent learning). He inferred that rats can make cognitive maps of mazes – called place learning - which consists of cognitive expectations as to what comes next.

Insight learning (Kohler) is diametrically opposite to associative learning and views learning as purely cognitive and not based on S-R mechanism - a sudden idea occurs and the solution is learnt.

Hierarchy of learning: Gagné’s hierarchy of learning (see the table below) describes how simple or basic forms of learning are prerequisites for later, more complex learning. This pattern can also be seen during human development and across the evolutionary hierarchy.

Gagné’s learning hierarchy:
  1. Classical conditioning (signal learning)
  2. Operant conditioning
  3. Chaining
  4. Verbal association
  5. Discrimination learning
  6. Concept learning
  7. Rule learning
  8. Problem solving
