Mayer’s principles for multimedia learning


Richard Mayer is professor of psychology at the University of California, Santa Barbara. In 2001, he set out his principles for multimedia learning, which have become a standardised approach in instructional design methods.

These principles were expanded upon in e-Learning and the Science of Instruction, co-authored by Ruth Colvin Clark. The 4th edition of this text (published in 2016) has been used to make this guide.

Multimedia principle

… or people learn better from words and pictures than from words alone.

In any kind of training, it is customary to use words, either printed or spoken, as the main method of sharing information. Words are quick and cheap – an instructional designer doesn’t need specialist software or expertise to produce them. The question explored in this principle is whether there is a return on investment for supplementing words with pictures (either static or dynamic), and whether people learn more deeply from words and graphics than from words alone.

Research results suggest that words and graphics combined are more effective than words alone, with some provisos:

  • graphics should not be an afterthought: they should be planned alongside the text to maximise understanding
  • decorative graphics do not improve learning

The rationale for combining text with graphics is that “people are more likely to understand material when they can engage in active learning” (Clark & Mayer, 2016. p71). Multimedia presentations that represent material in both words and pictures encourage learners to make connections between the pictorial and verbal representations of the information, making the experience more meaningful and more likely to be committed to long-term memory. By contrast, providing words alone may encourage learners – especially those with less expertise – to engage in shallow learning by not making connections with other knowledge.

There is more to instruction than simply presenting information, and page after page of text is rarely sufficient.

Selecting graphics to support learning

Clark & Mayer (2016) suggest that there are six possible functions of graphics:

Graphic type | Description
Decorative | Visuals added for aesthetic appeal or humor
Representational | Visuals that illustrate the appearance of an object
Organizational | Visuals that show qualitative relationships among content
Relational | Visuals that summarize quantitative relationships
Transformational | Visuals that illustrate changes in time or over space
Interpretive | Visuals that make intangible phenomena visible and concrete


Based on these categories, it is recommended that decorative and representational images be minimised in favour of graphics that help the learner to understand the material presented, or that organise the material in a useful way.

“We favor a knowledge construction view in which learning is seen as a process of active sense-making and teaching is seen as an attempt to foster appropriate cognitive processing in the learner.”

– Clark & Mayer (2016, p76)

What the research says

Students who receive a multimedia lesson consisting of words and pictures consistently perform better on a subsequent transfer test than students who receive words alone. Across the eleven studies cited in Clark & Mayer (2016), the words-and-pictures groups achieved a median percentage gain of 89% over the words-alone groups, with a median effect size greater than 1.
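To make the two measures concrete, here is a minimal sketch of how a percentage gain and an effect size could be calculated for a single study. It assumes that percentage gain means the difference between the group means relative to the words-alone mean, and that effect size means Cohen's d (the difference in means divided by the pooled standard deviation). The function names and the scores are hypothetical, invented for illustration, and are not taken from any of the studies cited.

```python
from math import sqrt
from statistics import mean, variance


def percentage_gain(treatment, control):
    """Gain of the treatment mean over the control mean, as a percentage."""
    return (mean(treatment) - mean(control)) / mean(control) * 100


def cohens_d(treatment, control):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_variance = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


# Hypothetical transfer-test scores, invented for illustration only.
words_and_pictures = [7, 8, 6, 9, 8, 7]
words_alone = [4, 5, 3, 5, 4, 3]

print(f"percentage gain: {percentage_gain(words_and_pictures, words_alone):.1f}%")
print(f"effect size (d): {cohens_d(words_and_pictures, words_alone):.2f}")
```

By the usual rule of thumb, an effect size of 0.8 or more is considered large, so a median above 1 across the cited studies indicates a substantial advantage for words and pictures.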

The multimedia effect “establishes the potential for multimedia lessons to improve human learning” (Clark & Mayer, 2016. p79), and it therefore belongs firmly at the top of this list of principles.


There are some important caveats to the multimedia principle.

Learners aren’t always the best judge

While it is clear from the description above that not all graphics are equally effective, students frequently misjudge the value of these graphics. In a 2012 study, students did not learn better when added illustrations were purely decorative or seductive, though they reported liking the lesson better when it contained any kind of illustration.

Liking != learning

Because learners cannot reliably distinguish between helpful and unhelpful illustrations, instructional designers should use only highly relevant, instructional illustrations, and should even include pointers in the text as to what to look for in them.

There is a diminishing effect

A combination of words and graphics is particularly useful and important for novices, though less useful for expert learners.

Experts are able to create their own mental images as they read a text, drawing on relevant schemas that they have already formed. Providing both words and graphics can actually hinder expert learners. A more advanced group of learners who are experienced in the topic being presented may be able to learn well mainly (or entirely) from text, or mainly from graphics.


A number of studies have failed to find that animations are more effective than a series of static frames depicting the same material. In summing up a study that compared an animation of how lightning storms develop with a series of static illustrations supported by printed text, Clark & Mayer explain this as follows:

“Presumably, the so-called passive medium of illustrations and text actually allowed for active processing because the learners had to mentally animate the changes from one frame to the next, and learners were able to control the order and pace of their processing. In contrast, the so-called active medium of animations and narration may foster passive learning because the learner did not have to mentally animate and could not control the pace and order of the presentation.”

– Clark & Mayer, 2016. pp81-3.

When to use animations

Despite the results above, animations or videos have been shown to work well in tasks that teach complicated manual skills. They worked well in tasks where students made paper flowers or learned to tie knots, for example. In contrast to these examples, explanations of how complex systems work (such as braking systems, or waves in the ocean) have been shown to be just as effective or more effective when presented as static diagrams and text rather than animations.

Animations can also be useful in a time-lapse sense, showing phenomena that are otherwise difficult to visualise, such as seed germination or hummingbirds in flight.

It seems that hands-on procedures can be guided effectively using animated visuals, but conceptual information is more effectively shared with static visuals.

Coherence principle

… or adding extra material can hurt learning.

People learn better when extraneous words, pictures and sounds are excluded rather than included.

“Perhaps our single most important recommendation is to keep the lesson uncluttered. In short, […] you should avoid adding any material that does not support the instructional goal.”

– Clark & Mayer, 2016. p151.

Any media that is not central to the instructional goal of the lesson should be removed, a process that Mayer and Moreno called weeding. Some instructional designers have used background music and exciting or interesting imagery (what Mayer calls seductive details) to reduce dropout rates on e-learning courses, arguing that their inclusion may motivate learners, but this runs counter to the body of research.

“When learners use their limited processing capacity on extraneous material, less capacity is available for making sense of the essential content.”

– Clark & Mayer, 2016. p152.

Ways to apply the coherence principle

Remove extraneous words

Cute stories and interesting pieces of trivia can feel to the instructional designer like harmless additions to a multimedia presentation, but research suggests that they may not produce the desired effects. The rationale for excluding extraneous words is based upon the cognitive theory that assumes that working memory capacity is very limited.

Clark & Mayer (2016, p155) identify three distinct types of extraneous wording used for different purposes:

  • for interest: related to the topic but not relevant to the instructional goal
  • for elaboration: expands upon the key ideas of the lesson
  • for technical depth: adds technical details that go beyond the key ideas of the lesson

They recommend against all three, suggesting that when these additions are more interesting than the fundamental content of a lesson, they can distract learners from the instructional goals. Not only do they fail to help learning; in some cases they can actually hurt it.

Evidence for this can be found in many studies conducted over the last 20 years. Mayer, Heiser and Lonn (2001) found that presenting more information can result in less learning: adding extra narration segments to the lesson distracted students from the core instructional goals. A related study conducted in 2007 found that college students who read the lesson with seductive details “spent less time reading the relevant text, recalled less of the relevant text and showed shallower processing on an essay task as compared to students who read the lightning passage without seductive details” (Clark & Mayer, 2016. p156).

Adding seductive details harms learning by distracting learners from the important information and by disrupting the coherence of the lesson.

Contiguity principle

… or on-screen text should be placed close to the graphics to which it refers.


People learn better when corresponding words and pictures are presented near to each other rather than far from each other on the page or screen. Presenting a graphic followed by explanatory text further down the screen forces the user to scroll up to see the graphic and back down to read the text. This physically separates the text and graphic, which should be treated as two parts of a wider whole. This is referred to as the spatial contiguity principle: related text and graphics should be presented together.

Legends presented alongside charts, with labels linked to corresponding numbers on a diagram, break this principle, forcing the user to shift their attention back and forth from the graphic to the legend. Consider the following example:


In the example above, the bones of the skull are labelled using a legend, with descriptions off to the side of the image. Numbers are used to link the areas identified with the names. This divides learners’ attention and should be avoided.

In this example, labels are provided on top of the graphic, which makes it easier to focus on the content.


Similarly, presenting an animation that is followed by audio narration separates the two in time, resulting in less learning than if the animation and narration were synchronised in time. This is referred to as the temporal contiguity principle: related media should be integrated and presented synchronised in time.

Redundancy principle

People learn better from graphics and narration than from graphics, narration and on-screen text. In other words, learning improves when words are presented as narration alone rather than as narration plus on-screen text.

Signalling principle

People learn better when cues that highlight the organisation of essential information are added.

Segmenting principle

People learn better from a multimedia lesson when it is presented in learner-controlled segments rather than as a continuous unit.

Pre-training principle

People learn better from a multimedia lesson when they already know the names and behaviours of the system components.

Modality principle

People learn better when words are presented as narration rather than on-screen text.

Personalisation principle

People learn better from multimedia lessons when words are spoken in a conversational style rather than a formal style.

Voice principle

People learn better when the narration in multimedia lessons is spoken in a friendly human voice rather than a machine voice.

Image principle

People do not necessarily learn better from a multimedia lesson when the speaker’s image is added to the screen.


Mayer, Heiser and Lonn (2001). Cognitive Constraints on Multimedia Learning: When Presenting More Material Results in Less Understanding. Journal of Educational Psychology, Vol. 93(1), pp187-198.

Clark & Mayer (2016). e-Learning and the Science of Instruction, 4th ed. Hoboken, NJ: Wiley.