Monaco, "The Language of Film: Signs and Syntax"

from: James Monaco, How to Read a Film. New York: Oxford UP, 1981. pp. 121 - 191.

Film is not a language in the sense that English, French, or mathematics is. It is, first of all, impossible to be ungrammatical in film. And it is not necessary to learn a vocabulary. Infants appear to understand television images, for example, months before they begin to develop a facility with spoken language. Even cats watch television. Clearly, it is not necessary to acquire intellectual competence in film in order to appreciate it, at least on the most basic level.

But film is very much like language. People who are highly experienced in film, highly literate visually (or should we say "cinemate"?), see more and hear more than people who seldom go to the movies. An education in the quasi-language of film opens up greater potential meaning for the observer, so it is useful to use the metaphor of language to describe the phenomenon of film. In fact, no extensive scientific investigation of our ability to comprehend artificial sounds and images has as yet been performed, but nevertheless we do know through research that while children are able to recognize objects in pictures long before they are able to read, they are eight or ten years of age before they can comprehend a film image the way most adults do. Moreover, there are cultural differences in perception of images. In one famous 1920s test, anthropologist William Hudson set out to examine whether rural Africans who had had little contact with Western culture perceived depth in two-dimensional images the same way that Europeans do. He found, unequivocally, that they do not. Results varied—there were some individuals who responded in the Western manner to the test—but they were uniform over a broad cultural and sociological range. /122/

Figure 3-1. CONSTRUCTION-TASK FIGURES. Subjects asked to reconstruct these figures in three dimensions using sticks or rods respond in different ways. People from Western cultures, trained in the codes and conventions that artists use to convey three-dimensionality in a two-dimensional drawing, see A as three-dimensional and B as two-dimensional. The operating code for three-dimensionality here insists that the dimension of depth be portrayed along the 45° oblique line. This works well enough in A, but not in B, where the oblique lines are not in the depth plane. Subjects from African cultures tend to see both figures as two-dimensional, since they are not familiar with this Western three-dimensional code. Figures C and D illustrate the models of A constructed by Western and African observers, respectively. (From "Pictorial Perception and Culture," by Jan B. Deregowski. © 1972 by Scientific American, Inc. All rights reserved.)

The conclusions that can be drawn from this seminal experiment and others that have followed are two: first, that every normal human being can perceive and identify a visual image; second, that even the simplest visual images are interpreted differently in different cultures. So we know that images must be "read." There is a process of intellection occurring—not necessarily consciously—when we observe an image, and it follows that we must have learned, at some point, how to do this.

The "ambiguous Trident," a well-known "optical illusion," provides an easy test of this ability.

Figure 3-2. THE AMBIGUOUS TRIDENT. The illusion is intriguing only because we are trained in Western codes of perspective. The psychological effect is powerful: our minds insist that we see the object in space rather than the drawing on a plane.


It's safe to say that the level of visual literacy of anyone reading this book is such that observation of the trident will be confusing to all of us. It would not be for someone not trained in /123/ Western conventions of three-dimensionality. Similarly, the well-known optical illusions in Figures 3-3 and 3-4 demonstrate that the process of perception and comprehension involves the brain: it is a mental experience as well as a physical one.

Figure 3-3. THE NECKER CUBE. Devised in 1832 by L. A. Necker, a Swiss naturalist. The illusion depends, once again, on cultural training.


Whether we "see" the Necker Cube from the top or the bottom or whether we perceive the drawing in Figure 3-4 as either a young girl or an old woman depends not on the physiological function of our eyes but on what the brain does with the information received.

Figure 3-4. "My Wife and My Mother-in-Law," by cartoonist W. E. Hill, was published in Puck in 1915. It has since become a famous example of the phenomenon known as the multistable figure. The young woman's chin is the old woman's nose. The old woman's chin is the young woman's chest.


The word "image," indeed, has two conjoined meanings: an image is an optical pattern; it is also a mental experience, which is why, we can assume, we use the word "imagine" to describe the mental creation of pictures.

So there is a strong element of our ability to observe images, whether still or moving, that depends on learning. This is, interestingly, not true to a significant extent with auditory phenomena. If the machines are sophisticated enough, we can produce recorded sounds that are technically indistinguishable from their originals. The result of this difference in mode of the two systems of perception—visual and auditory— /125/ is that whatever education our ears undergo in order to perceive reality is sufficient to perceive recorded sound, whereas there is a subtle but significant difference between the education necessary for our eyes to perceive (and our brain to understand) recorded images and that which is necessary simply to comprehend the reality that surrounds us. It would serve no purpose to consider phonography as a language, but it is useful to speak of photography (and cinematography) as a language, because a learning process is involved.


Another way to describe this difference between the two senses is in terms of the function of the sensory organs: ears hear whatever is available for them to hear; eyes choose what to see. This is true not only in the conscious sense (choosing to redirect attention from point A to point B or to ignore the sight altogether by closing our eyes), but in the unconscious as well. Since the receptor organs that permit visual acuity are concentrated (and properly arranged) only in the "fovea" of the retina, it's necessary for us to stare directly at an object in order to have a clear image of it.

You can demonstrate this to yourself by staring at the dot in the center of this page. Only the area immediately surrounding it will be clear. The result of this foveated vision is that the eyes must move constantly in order to perceive an object of any size. These semiconscious movements are called "saccades" and take approximately 1/20 second each, just about the interval of persistence of vision, the phenomenon that makes film possible.

The conclusion that can be drawn from the fact of foveated vision is that we do indeed read an image physically as well as mentally and psychologically, just as we read a page. The difference is that we know how to read a page in English, from left to right and top to bottom—but we are seldom conscious of how precisely we read an image.

A complete set of physiological, ethnographic, and psychological experiments might demonstrate that various individuals read images more or less well in three different ways:


Figure 3-5. SACCADE PATTERNS. At left, a drawing of a bust of Queen Nefertiti; at right, a diagram of the eye movements of a subject viewing the bust. Notice that the eye follows regular patterns rather than randomly surveying the image. The subject clearly concentrates on the face and shows little interest in the neck. The ear also seems to be a focus of attention, probably not because it is inherently interesting, but rather because it is located in a prominent place in this profile. The saccadic patterns are not continuous; the recording clearly shows that the eye jerks quickly from point to point (the "notches" in the continuous line), fixing on specific nodes rather than absorbing general information. The recording was made by Alfred L. Yarbus of the Institute for Problems of Information Transmission, Moscow. (From "Eye Movements and Visual Perception," by David Noton and Lawrence Stark, June 1971.)


The irony here is that we know very well that we must learn to read before we can attempt to enjoy or understand literature, but we tend to believe, mistakenly, that anyone can read a film. Anyone can see a film, it's true, even cats. But some people have learned to comprehend visual images—physiologically, ethnographically, and psychologically—with far more sophistication than have others. This evidence confirms the validity of the triangle of perception outlined in Chapter 1, uniting author, work, and observer. The observer is not simply a consumer, but an active, or potentially active, participant in the process.

Film is not a language, but is like a language, and since it is like language, some of the methods that we use to study language might profitably be applied to a study of film. In fact, during the last ten years, /127/

Figure 3-6. THE PONZO ILLUSION. The horizontal lines are of equal length, yet the line at the top appears to be longer than the line at the bottom. The diagonals suggest perspective, so that we interpret the picture in depth and conclude, therefore, that since the "top" line must be "behind" the "bottom" line, further away, it must then be longer.


this approach to film—essentially linguistic—has grown considerably in importance. Since film is not a language, strictly linguistic concepts are misleading. Ever since the beginning of film history, theorists have been fond of comparing film with verbal language (this was partly to justify the serious study of film), but it wasn't until a new, larger category of thought developed in the fifties and early sixties—one that saw written and spoken language as just two among many systems of communication—that the real study of film as a language could proceed. This inclusive category is semiology, the study of systems of signs. Semiologists justified the study of film as language by redefining the concept of written and spoken language. Any system of communication is a "language"; English, French, or Chinese is a "language system." Cinema, therefore, may be a language of a sort, but it is not clearly a language system. As Christian Metz, the well-known film semiologist, pointed out: we understand a film not because we have a knowledge of its system; rather, we achieve an understanding of its system because we understand the film. Put another way, "It is not because the cinema is language that it can tell such fine stories, but rather it has become language because it has told such fine stories" [Metz, Film Language, p. 47].

For semiologists, a sign must consist of two parts: the signifier and the signified. The word "word," for example—the collection of letters or sounds—is a signifier; what it represents is something else again—the "signified." In literature, the relationship between signifier and signified is a main locus of art: the poet is building constructions that, on the one hand, are composed of sounds (signifiers) and, on the other, of meanings (signifieds), and the relationship between the two can be fascinating. In fact, much of the pleasure of poetry lies just here: in the dance between sound and meaning.

But in film, the signifier and the signified are almost identical: the sign /128/ of cinema is a short-circuit sign. A picture of a book is much closer to a book, conceptually, than the word "book" is. It's true that we may have to learn in infancy or early childhood to interpret the picture of a book as meaning a book, but this is a great deal easier than learning to interpret the letters or sounds of the word "book" as what it signifies. A picture bears some direct relationship with what it signifies; a word seldom does.

[Pictographical languages like Chinese and Japanese might be said to fall somewhere in between film and Western languages as sign systems, but only when they are written, not when they are spoken, and only in limited cases. On the other hand, there are some words—"gulp," for example—that are onomatopoeic and therefore bear a direct relationship to what they signify, but only when they are spoken.]

It is the fact of this short-circuit sign that makes the language of film so difficult to discuss. As Metz put it, in a memorable phrase: "A film is difficult to explain because it is easy to understand." It also makes "doing" film quite different from "doing" English (either writing or speaking). We can't modify the signs of cinema the way we can modify the words of language systems. In cinema, an image of a rose is an image of a rose is an image of a rose—nothing more, nothing less. In English, a rose can be a rose, simply, but it can also be modified or confused with similar words: rose, rosy, rosier, rosiest, rise, risen, rows (ruse), arose, roselike, and so forth. The power of language systems is that there is a very great difference between the signifier and the signified; the power of film is that there is not.

Nevertheless, film is like a language. How, then, does it do what it does? Clearly, one person's image of a certain object is not another's. If we both read the word "rose," you may perhaps think of a Peace rose you picked last summer, while I am thinking of the one Laura Westphal gave to me in December 1968. In cinema, however, we both see the same rose, while the filmmaker can choose from an infinite variety of roses and then photograph the one chosen in another infinite variety of ways. The artist's choice in cinema is without limit; the artist's choice in literature is circumscribed, while the reverse is true for the observer. Film does not suggest, in this context: it states. And therein lies its power and the danger it poses to the observer: the reason why it is useful, even vital, to learn to read images well so that the observer can seize some of the power of the medium. The better one reads an image, the more one understands it, the more power one has over it. The reader of a page invents the image, the reader of a film does not, yet both readers must work to interpret the signs they perceive in order to complete the process of intellection. The more work they do, the better the balance between observer and creator in the process; the better the balance, the more vital and resonant the work of art. /129/

The earliest film texts—even many published recently—pursue with shortsighted ardor the crude comparison of film and written/spoken language. The standard theory suggested that the shot was the word of film, the scene its sentence, and the sequence its paragraph. In the sense that these sets of divisions are arranged in ascending order of complexity, the comparison is true enough; but it breaks down under analysis. Assuming for the moment that a word is the smallest convenient unit of meaning, does the shot compare equivalently? Not at all. In the first place, a shot takes time. Within that time span there is a continually various number of images. Does the single image, the frame, then constitute the basic unit of meaning in film? Still the answer is no, since each frame includes a potentially infinite amount of visual information, as does the soundtrack that accompanies it. While we could say that a film shot is something like a sentence, since it makes a statement and is sufficient in itself, the point is that the film does not divide itself into such easily manageable units. While we can define "shot" technically well enough as a single piece of film, what happens if the particular shot is punctuated internally? The camera can move; the scene can change completely in a pan or track. Should we then be talking of one shot or two?

Likewise, scenes, which were defined strictly in French classical /130/ theater as beginning and ending whenever a character entered or left the stage, are more amorphous in film (as they are in theater today). The term scene is useful, no doubt, but not precise. Sequences are certainly longer than scenes, but the "sequence-shot," in which a single shot is coterminous with a sequence, is an important concept and no smaller units within it are sequential.

It would seem that a real science of film would depend on our being able to define the smallest unit of construction. We can do that technically, at least for the image: it is the single frame. But this is certainly not the smallest unit of meaning. The fact is that film, unlike written or spoken language, is not composed of units, as such, but is rather a continuum of meaning. A shot contains as much information as we want to read in it, and whatever units we define within the shot are arbitrary. Therefore, film presents us with a language (of sorts) that:

a) consists of short-circuit signs in which the signifier nearly equals the signified; and

b) depends on a continuous, nondiscrete system in which we can't identify a basic unit and which therefore we can't describe quantitatively.

The result is, as Christian Metz says, that: "An easy art, the cinema is in constant danger of falling victim to this easiness." Film is too intelligible, which is what makes it difficult to analyze. "A film is difficult to explain because it is easy to understand."



Films do, however, manage to communicate meaning. They do this essentially in two different manners: denotatively and connotatively. Like written language, but to a greater degree, a film image or sound has a denotative meaning: it is what it is and we don't have to strive to recognize it. This factor may seem simplistic, but it should never be underestimated: here lies the great strength of film. There is a substantial difference between a description in words (or even in still photographs) of a person or event, and a cinematic record of the same. Because film can give us such a close approximation of reality, it can communicate a precise knowledge that written or spoken language seldom can. Language systems may be much better equipped to deal with the nonconcrete world of ideas and abstractions (imagine this book, for example, on film: without a complete narration it would be incomprehensible), but they are not nearly so capable of conveying precise information about physical realities.

By its very nature, written/spoken language analyzes. To write the /131/