The analysis

Our analysis focuses on the intonation melody and only concerns boundaries that have tonal implications. Only the boundaries of the intonation phrase are therefore transcribed.

Our analysis differs from two earlier systems for the transcription of Dutch Intonation:

  • The IPO grammar of intonation (Collier and 't Hart 1981; 't Hart, Collier and Cohen 1990), which is based on pitch movements. Instead, like ToBI, ToDI is based on pitch targets or tones.
  • The tone-based analysis of Gussenhoven (1988, 1991). ToDI is a less abstract version of that analysis and as a result is easier to apply. It is close to its computer implementation (Gussenhoven & Rietveld 1992), which also formed the inspiration for the synthesis-by-rule program included in the second edition.

This course assumes no familiarity with any of these systems. Its primary aim is to help you become proficient in intonation transcription in a relatively short amount of time.

However, this is not a course in phonetics. It is assumed that you have some background in phonetics and phonology, and are familiar with speech analysis tools used in phonetic research.

The resynthesis rules for the generation of the F0 contour were especially developed for this course by Joop Kerkhoff. The speech files were produced with the PSOLA-resynthesis option in the Praat program (Praat: doing phonetics by computer; Paul Boersma & David Weenink).

New in the second edition:

  • The exercises feature a keyboard-like console with dedicated buttons for each pitch accent and each boundary tone.
  • A synthesis facility which allows the user to synthesize every transcribable contour for every utterance in the exercises.
  • Improved section on appended constructions.
  • The symbol H*+L to describe a steep fall before a gradual rise has been replaced with H*LH (section 1.8).

Literature

M.E. Beckman and G.M. Ayers (1994). Guidelines for ToBI transcription. Version 2.0. Ms

R. Collier and J. 't Hart (1981). Cursus Nederlandse Intonatie. Leuven: Acco.

T. Dutoit, V. Pagel, N. Pierret, F. Bataille, O. van der Vreken (1996). The MBROLA Project: Towards a Set of High-Quality Speech Synthesizers Free of Use for Non-Commercial Purposes. Proc. ICSLP'96, Philadelphia, vol. 3, pp. 1393-1396.

C. Gussenhoven (1988). Adequacy in intonation analysis: The case of Dutch. In H. van der Hulst & N. Smith (eds.) Autosegmental Studies on Pitch Accent. Dordrecht: Foris.

C. Gussenhoven (1991). Tone segments in the intonation of Dutch. In Thomas F. Shannon & Johan P. Snapper (eds.) The Berkeley Conference on Dutch Linguistics 1989. Lanham (MD): University Press of America.

C. Gussenhoven (forthcoming). Transcription of Dutch Intonation. In Sun-Ah Jun (ed.) Prosodic Typology and Transcription: A Unified Approach. Oxford: Oxford University Press.

C. Gussenhoven & T. Rietveld (1992). A target-interpolation model for the intonation of Dutch. In Proceedings ICSLP 92. 1235-1238.

J. 't Hart, R. Collier and A. Cohen (1990). A perceptual study of intonation. Cambridge: Cambridge University Press.

Taalportaal: the linguistics of Dutch, Frisian and Afrikaans online. Dutch - Phonology - Accent and intonation - Intonation

User interface

Audio examples

If you open a link to an audio example (by clicking the left mouse button), the sound file is downloaded and played by a "helper application", as defined by your web browser.

If you want to listen to an audio example repeatedly, it is usually best not to click the audio link again, but to supply the appropriate command to the helper application (mostly a play button). This avoids creating a trail of instances of the helper application.

It may occasionally happen that the picture that is provided with the audio example (the graphic representation of the pitch contour) is incomplete. Use the "refresh" option in your right mouse button menu to correct this.

It is also possible to save sound files to disk, so you can use them in your speech analysis software. Click the right mouse button, and select the appropriate menu option to do this.

All sound files used in this course are 16-bit, 16 kHz Windows Wave files (extension .wav), and can be used with speech analysis programs such as Praat, which is widely used.
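If you save a sound file to disk and want to confirm that it has the format stated above before loading it into your analysis software, a short script can read the file header. The following is only a minimal sketch using Python's standard `wave` module; the function names and the in-memory demonstration file are our own illustrative assumptions and are not part of the course material.

```python
import io
import wave


def describe_wav(source):
    """Return (bits_per_sample, sample_rate_hz, channels) for a WAV file.

    `source` may be a path string or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getsampwidth() * 8, wav.getframerate(), wav.getnchannels()


def matches_course_format(source):
    """True if the file is 16-bit and sampled at 16 kHz, as stated in the text."""
    bits, rate, _channels = describe_wav(source)
    return bits == 16 and rate == 16000


def make_silent_wav(bits=16, rate=16000, seconds=0.1):
    """Build a short silent mono WAV in memory, purely for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(bits // 8)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (bits // 8) * int(rate * seconds))
    buffer.seek(0)
    return buffer
```

For a file saved from the course you would call `matches_course_format("example.wav")`, where `example.wav` is a placeholder for whatever name you saved the file under.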

A copy of Praat (Unix or PC) can be obtained from the Praat home page.

Exercises

The exercises form an essential part of the course. Learning to apply the ToDI system is largely a matter of gaining practical experience. The exercises require a JavaScript compatible browser.

The examples in the exercises are presented as a string of words and a string of boxes, one for each word. Below the example you find a console with boxes, each box representing a symbol, arranged in three groups, Initial boundary symbols, Final boundary symbols and Pitch accent symbols. This is your symbol bank.

Only boxes that are marked with a black dotted line can be filled in by the user. To fill in such a box, click on it, which makes available the appropriate section of the symbol bank, from which you can then make your choice. The symbol will appear in the relevant box in black font. You can correct your choice at any time, including replacing it with 'no symbol'. Boxes with a white dotted line cannot be edited; they are not relevant for the exercise at hand. Boxes that cannot be filled in or in which a symbol has already been selected are given in white font. The full set of symbols in the symbol bank will only gradually become available, in tune with the unfolding of the explanation in the text.

Below the symbol bank, there are controls that allow you (a) to inspect the F0 contour (Contour), (b) to check your transcription (Check), (c) to request the correct transcription (Key) and (d) to inspect the previous transcription you entered for the same utterance, if available (Previous).

At the bottom of the screen, you will find a synthesis facility with which you can synthesize the intonation contour of the transcription given in the boxes on the basis of a diphone speech file. A tempo control allows you to adjust the speed of utterance. Any legitimate transcription will be synthesized. You can compare the original with the synthesized version to see if the intonation patterns are the same, and if they are not, try another. You could take any example and go through all possible transcriptions, just to see what they sound like.

Overview of symbols and conventions

In the ToDI transcription system, an intonation contour is a melodic pattern which occurs in an Intonational Phrase. Each contour consists of a number of tones, H(igh) and L(ow). The notation T is used to refer to either H or L. These tones occur in two locations:

  1. at the edges of Intonational Phrases
  2. at accented syllables

Intonational Phrases are demarcated by %. Not every edge will have a tone. Tones occurring at IP edges are notated %T (initial boundary tone) or T% (final boundary tone). Every accented syllable is marked by a tone or a cluster of tones, called a 'pitch accent'. The first tone of a pitch accent is notated T*. A T* may be followed by further tones to describe the movement from the accented syllable onward.

Here is an example:

All the symbols will be explained in the text. By way of quick reference guide, a list of each element is given, as well as a list of all pitch accents, with approximate indications when they are used.

Symbols

H*, L*     high/low accent
H, L       upward/downward movement after L*/H*
H%, L%     rising/low ending of IP
%H, %L     high/low beginning of IP
%HL        initial falling pitch not marking accent
%          half-completed fall/rise at end of IP
!H*        downstepped H*

The pitch accents, in the order as they appear in the symbol bank:

H*L        High fall from accented syllable
!H*L       Low fall from accented syllable, also called 'downstepped fall'
H*         High level from accented syllable
!H*        Lowered level from accented syllable, also 'downstepped level'
H*LH       Pre-final steep fall followed by a gradual rise towards the next accented syllable, also 'pre-nuclear fall-rise'
L*!HL      Low rise-fall from accented syllable, also 'downstepped delayed peak'
L*H        Rise from low from accented syllable
H*!H       Vocative chant

The melodic shape of the contour is defined by the consecutive contour shapes defined by the boundary tones and the pitch accents.
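To make the notation concrete, here is a minimal sketch of how a transcription of a single IP could be split into its initial boundary tone, its pitch accents and its final boundary symbol. It is an illustration only: the inventory is restricted to the symbols listed above, the space-separated input format and the function name `parse_ip` are our own assumptions, and the initial boundary tone is treated as optional because not every edge has a tone.

```python
INITIAL_BOUNDARIES = {"%L", "%H", "%HL"}
FINAL_BOUNDARIES = {"L%", "H%", "%"}
PITCH_ACCENTS = {"H*L", "!H*L", "H*", "!H*", "H*LH", "L*!HL", "L*H", "H*!H"}


def parse_ip(transcription):
    """Split one IP transcription into (initial, accents, final).

    Symbols are expected to be separated by spaces, e.g. "%L H*L L%".
    Raises ValueError if the final boundary is missing or if a symbol
    is not one of the listed pitch accents.
    """
    symbols = transcription.split()
    initial = None
    if symbols and symbols[0] in INITIAL_BOUNDARIES:
        initial = symbols.pop(0)
    if not symbols or symbols[-1] not in FINAL_BOUNDARIES:
        raise ValueError("missing final boundary symbol (L%, H% or %)")
    final = symbols.pop()
    unknown = [s for s in symbols if s not in PITCH_ACCENTS]
    if unknown:
        raise ValueError(f"not a listed pitch accent: {unknown}")
    return initial, symbols, final
```

For instance, `parse_ip("%L H*L L%")` returns `("%L", ["H*L"], "L%")`: a low beginning, one high fall, and a low ending.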

Some examples

The following examples illustrate ToDI transcriptions of some contours with a single accent in a single IP. (Don't worry about the transcription symbols, which will all be dealt with later in the course.)
Listen to the synthesized utterance transcribed below:

Audio Daar moeten we nog eens overpraten vind ik
%L H*L L%

In the text, we often provide graphic representations of pitch contours, like the one below. To hear them, just click anywhere in the grey area.

Play audioAudio example

In these representations, the ToDI transcription symbols are aligned with the corresponding pitch events in the contour. The beginning of the gloss below the transcription as well as all pitch accented syllables are lined up with the corresponding tone symbols.

Now compare this synthesized utterance with the same utterance in natural speech, spoken with the same intonation contour:

Play audioAudio example

You may also find that the tempo of the human speaker is faster than that of the synthesized voice. In general, computer speech tends to be slower than natural speech. Remember you can always adjust the tempo of the synthesized utterance.

Phrasing, accentuation and melody

An intonation transcription contains three elements:

  1. phrasing
  2. accentuation
  3. melody

We will briefly discuss these three aspects separately. However, do bear in mind that such a separation is artificial, and serves only an expository purpose. Your understanding of the phrasing of utterances is intimately tied up with your understanding of the accentuation and the melodic aspects: in any given utterance the three aspects are always considered together.

Phrasing

Speech is not simply a continuous string of smoothly concatenated spoken words. For one thing, speakers may remain silent for brief periods of time at largely unpredictable points, or utter noises like err or erm. These discontinuities are known as hesitations. For another, speech is "chunked up" in a way that somehow reflects the structure of the sentence or sentences. This is known as the prosodic phrasing. The more noticeable of such prosodic phrases can be heard in the intonation. The discontinuity that separates such intonational phrases, or IPs, may consist of a (brief) pause, of a relatively long syllable before the end of a phrase, or of a melodic feature, or indeed of any combination of these.

There are usually different ways of phrasing a single sentence, i.e., it is not the case that there are rules that uniquely determine a sentence's intonational phrasing. The sentence "En toen ik thuiskwam waren alle vissen dood" may be pronounced as one IP, or as two, in which case the boundary will fall after the subordinate clause, i.e. after "thuiskwam". If an IP-boundary occurs, it may be pronounced in different ways.

Listen to four pronunciations of this sentence. In version 1, it is spoken as a single IP. In version 2, the end of the subordinate clause is marked by a melodic feature, but there is no pause. In version 3, there is a pause as well as a melodic feature. In version 4, there is a hesitation, before the word "vissen". In spite of this discontinuity, version 4 constitutes a single IP.

En toen ik thuiskwam, waren alle vissen dood
Audio Audio Audio Audio
1 2 3 4

When the end of the IP is marked by a melodic feature, a final boundary tone, L% or H% is transcribed, and if there is just a pause, the symbol % is used to transcribe the IP-end. The beginning of an IP is transcribed by %L or %H.

If we do not transcribe tones, we can still transcribe the phrasing, as shown below for the third example above.

Audio  En toen ik thuiskwam,  waren alle vissen dood
       %                   %  %                    %

IPs are grouped together in Utterances. We will not mark Utterance boundaries in this course, because all examples begin and end with an Utterance boundary. The symbol "{" can be used to mark beginnings of Utterances, and "}" to mark their ends.

Accentuation

The accentuation concerns the locations of so-called sentence accents or pitch accents. They occur on one or more words in the IP. The presence of a sentence accent on a word does not necessarily make that word, or the syllable the accent occurs on, sound more prominent or emphatic, although it often has this effect. As the term "pitch accent" suggests, it is a melodic element. An example of a pitch accent is H*L. Here, the star indicates that the H-tone is pronounced on the stressed syllable, while the L-tone is pronounced after it, on the next syllable if there is one. Together with the boundary tones, the tones of the pitch accents determine the intonation contour of the utterance. Thus, you can recognise them, because you recognise the intonation contour which they create.

It is useful to make a distinction between "sentence accent" and "pitch accent". This is because the accent in any accented word can be realised by different "pitch accents", where the choice of pitch accent will determine the intonation pattern. Therefore, the "sentence accent" just refers to the location of the accent, and abstracts away from the particular pitch accent used.

Here are some examples with accented syllables that we think are easily recognised even without knowing the kind of pitch accent that is used. To see if we are right, click the accented words (this does not work with all browsers).

Audio Er moet nog maar 's over gesproken worden

Audio We hebben de hele dag binnen gezeten

Audio Daar gaan de Rijcke, Meulemeester en Versmissen door de derde bocht

The last pitch accent in an IP is often called the nuclear pitch accent, while those before the final pitch accent are called pre-nuclear pitch accents.

Melody

The information provided by the IP-boundaries and the location of the accents is completed by the melodic information, the third and last type of information recorded in an intonation transcription. As observed above, it is specified by tones. There are two tones, H (high) and L (low), which are referred to together as T (tone). The tones are aligned with words in two kinds of locations: the boundaries of the IP and the sentence accents.

A given sentence may be spoken with infinitely many physical pitch contours, but large groups of these will be recognized by listeners as realisations of the same intonation, expressing the same meaning. In this sense, a machine can produce the same intonation pattern as a human being.
Here is an example of an intonation contour produced in four versions.

  • original utterance
  • resynthesized with a stylized pitch contour, using PSOLA
  • synthesized with Fluent Dutch Text-To-Speech, using the standard values of that program
  • imitated by another speaker

Er staan me daar een hoop mensen te wachten. 
Audio Audio Audio Audio
Original  Resynthesis  Text-To-Speech  Repetition

In the same way, we can say that two different sentences have the same intonation pattern. We show this with a fully transcribed example. Recall that the tones that are aligned with the boundaries of the IP are known as boundary tones (%T for initial boundary tone and T% for final boundary tone). The tones that are aligned with the accented syllables are the pitch accents, and written T*. The boundary tone and the pitch accent may consist of just a single tone or of a sequence of two tones.

In this example, the speaker uses the same intonation contour twice. Listen to the whole utterance first, then compare each IP.

Play audio
Play audioAudio example
Play audioAudio example

The aim of the course, then, is to teach you to transcribe intonation contours, and to read transcriptions back. Some people are naturally better at this than others, and you may not find this course at all easy. The idea is that you learn to do this step by step, that is, contour by contour. At any point during this course, you will hopefully feel confident with certain contours, but really have no idea of many others. As you go along, perhaps with frequent revisions of the old material, you increase the number of patterns you can recognise, and ideally, reproduce from memory. The reason why some people's progress is faster than that of others is not clear. In general, people differ in the extent to which they can have overt intuitions about the (sound) structure of language. Syllables are easy, vowels just a little harder, stress feet harder still, and pitch accents possibly even harder than stress feet. But with the right kind and amount of attention, you can recognise them in much the same way that you can learn to identify the vowels of Dutch.

Learning how to do this is a matter of listening, and recognising the pattern. You must learn to listen not just "phonetically", but "linguistically", that is, to interpret the contour in terms of the pitch accents and boundary tones we will present. Looking at a graphic representation of a pitch contour can be instructive in the beginning, for instance if you cannot hear whether the pitch goes up or down, but it can also be very confusing, because the same pitch accent will look very different in different contexts, depending on the number of syllables or the kind of consonants. Always try to determine auditorily which pattern you're dealing with, and then check and see what the contour looks like. There will be a separate section on pitch graphs.

Sameness and difference

In the previous section, you have seen that the same intonation contour can be produced by different speakers, or even be produced in audibly different ways by a machine. That is, in this course, you are not trained to describe all the differences you can hear, but only those that determine the linguistic identity of a contour, in other words, only the contrastive differences.

Very large (but not linguistically contrastive) phonetic differences may exist between the same contours. One source of variation is the pitch span. Listen to the following three pronunciations of the sentence "De deur is helemaal kromgetrokken". The highest pitch and the low end pitch are indicated in Hz.

Play audioAudio example
Play audioAudio example
Play audioAudio example

In general, the wider the pitch span, the more emphatic will be the pronunciation. The point here is that all three pronunciations represent the same sentence, spoken with the same intonation contour. We can change the sentence by replacing "kromgetrokken" by "scheefgetrokken", or we can change the intonation contour by pronouncing the sentence like this:

Play audioAudio example

Variations in pitch register will move a contour up or down in the speaker's pitch range. When women imitate the speech of men, they usually bring their pitch down, and pronounce everything at a lower register (for them), while conversely, men will raise their register when imitating the speech of children or women. Here is an example of a sentence pronounced at two registers by the same speaker.

Play audioAudio example
Play audioAudio example

Manipulations of the pitch span and the pitch register can have very dramatic communicative effects, but this course is not concerned with these aspects of speech production.
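Although the course does not deal with these aspects, it may help to see that span and register are independent of each other. The sketch below expresses the span of a contour as the interval between its low end pitch and its highest pitch, and models a register change as a uniform shift of all values. The use of semitones (12 times the base-2 logarithm of the frequency ratio) is our own choice of unit, since the examples above are labelled in Hz, and the function names are hypothetical.

```python
import math


def semitones(from_hz, to_hz):
    """Interval from one frequency to another, in semitones."""
    return 12 * math.log2(to_hz / from_hz)


def pitch_span(peak_hz, low_end_hz):
    """Span of a contour: distance from its low end pitch up to its peak."""
    return semitones(low_end_hz, peak_hz)


def shift_register(contour_hz, shift_semitones):
    """Move a whole contour up or down; the span stays the same."""
    factor = 2 ** (shift_semitones / 12)
    return [value * factor for value in contour_hz]
```

With hypothetical values, a contour peaking at 200 Hz and ending at 100 Hz has a span of 12 semitones; shifting it up by an octave gives 400 Hz and 200 Hz, with the span unchanged.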

Perhaps unexpectedly, linguistically different contours may be very similar phonetically. Frequently, this situation arises for a pre-nuclear pitch accent in the IP: we may not be certain whether it is there or not. In such cases, it will not make much of a difference whether we transcribe the pitch accent or not, since the result sounds much the same either way. Here is an example, with the doubtful pitch accent placed in brackets. (Don't worry about the transcription symbols, which will all be dealt with later in the course.)

Play audioAudio example

When transcribers come up with different analyses of the same utterance, this either means that they have come upon a doubtful utterance, in which case both transcriptions have something to recommend them, or that one of the transcribers has made an error. Some of those errors are well-known, and easy to identify. We will identify each of these with a numbered label, such as "frequent error #1", at the point in the course where they are best dealt with.

Pitch tracks vs auditory impression

Technically, pitch is the auditory sensation of the periodicity in the speech signal, the sequence of air pressure variations emitted by the speaker and impinging on the listener's eardrum. This periodicity amounts to repetitions of the same, or nearly the same, pattern of vibration. The pattern of vibration determines the sound quality (the vowel quality, say) that we perceive. The periodicity is determined by the rate of vocal cord vibration, the speed with which the vocal cords open and close, or vibrate. Rates of vibration in male speakers average around 125 Hz, while those in female speakers, whose larynxes are much smaller in the front to back dimension than those of men, average around 225 Hz, where "Hz" stands for the number of opening-and-closing actions per second. A speaker can change the rate of vocal cord vibration by making the vocal cords slacker (for lower rates) or tenser (for higher rates).
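Since a rate in Hz counts cycles per second, the duration of one opening-and-closing cycle is simply its reciprocal. The small sketch below does this arithmetic for the two average rates mentioned above; the function name is our own.

```python
def period_ms(rate_hz):
    """Duration of one vibration cycle in milliseconds for a rate in Hz."""
    return 1000.0 / rate_hz


for rate in (125, 225):
    print(f"{rate} Hz -> {period_ms(rate):.2f} ms per cycle")
```

At 125 Hz one cycle lasts 8 ms, and at 225 Hz about 4.44 ms, so a higher pitch corresponds to a shorter repeating pattern in the signal.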

In this course, you will often see pitch tracks of the examples. These are produced by analysis programmes that take the speech signal as input and are sensitive to the periodicity in the signal caused by the vibrations of the vocal cords. Short of measuring these vibrations at the larynx itself with strap-on electrodes on the skin covering the larynx, this is our best estimate of how fast the vocal cords are vibrating. Nevertheless, pitch tracks should be treated with caution, for a number of reasons:

  1. Pitch trackers make mistakes

    When the voice becomes creaky at lower pitches, the algorithm may be confused by peaks in the signal that do not derive from the vibratory action, and doubling errors may occur. Here is an example.

    Play audioAudio example

    Similarly, the algorithm may miss every second periodicity peak, 'believing' these peaks determine the sound quality rather than the periodicity. In such cases halving errors will occur. Here is an example.

    Play audioAudio example

    Doubling errors and halving errors are usually easy to detect; the pitch track shows a sudden change to a value half or double that of the immediately preceding value, and there is no auditory impression that corresponds to this jump. (No, our ears never make such errors in normal speech signals!)

    We hope that we have spotted, and corrected, all the doubling and halving errors that we encountered in the material in this course.

  2. Pitch tracks of voiced and voiceless consonants

    Pitch tracks will show no record during voiceless consonants, while voiced obstruents, particularly plosives like [b,d], may impede the airflow needed to keep the vocal cords vibrating, causing the periodicity to dip down. Because we do not perceive these effects intonationally, the same intonation contour will look rather different depending on the consonants in the utterance. Compare the pitch tracks for "papapa", "dadada" and "mamama", pronounced with the same pitch.

    Play audioAudio example
    Play audioAudio example
  3. Effects of articulation

    The articulation of vowels and consonants may affect the tenseness of the vocal cords, and so interfere with the rate of vocal cord vibration. First, high vowels like [i,u] are pronounced with the tongue high in the mouth, causing the root of the tongue to pull up the forward part of the larynx to which the vocal cords are attached. For this reason, higher vowels will on average be pronounced with higher vibration rates than lower vowels, like [a]. We don't normally hear this type of difference, and it doesn't usually cause any problems when relating the pitch track to our auditory impression of the intonation.
    Second, voiceless consonants like [p,s] are pronounced with an active opening gesture of the vocal cords, to ensure an open glottis. This gesture causes the vocal cords to be a little tenser than they are for the corresponding voiced consonants, with the result that the rate of vocal cord vibration immediately after a voiceless obstruent is on average a little higher than after voiced consonants. This effect is again cancelled out by the listener, but it can be very noticeable in a pitch track. In the following example, the pitch (in the technical, 'auditory' sense) smoothly goes from "niks" to "anders", but the pitch track shows a raised F0 after the [s] of "niks".

    Play audioAudio example

In general, you must train yourself to recognise intonation contours on an auditory basis. The pitch track will serve various functions during this process. In the beginning it may teach you basic things like whether the pitch goes up or down. Later, you will see how consonants may cause gaps in the tracks, or distort them in ways indicated above. And later still, you may learn to see in more detail how the physical appearance of the same intonation contour may vary according to the segmental structure.
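The doubling and halving errors described under point 1 above can be looked for mechanically, because they show up as a jump of about one octave between neighbouring values. The following is a rough sketch only, not the procedure used to prepare the course material: the representation of a pitch track as a list of values per analysis frame, the use of `None` for frames without a pitch record, and the one-semitone tolerance are all our own assumptions.

```python
import math


def find_octave_jumps(track_hz, tolerance_semitones=1.0):
    """Flag frames whose F0 is about double or half the preceding value.

    `track_hz` holds one F0 value per frame; None marks a frame with no
    pitch record (for example during a voiceless consonant). Comparison
    restarts after every gap. Returns (frame_index, kind) tuples, where
    kind is "doubling" or "halving".
    """
    jumps = []
    previous = None
    for index, value in enumerate(track_hz):
        if value is None:
            previous = None  # no immediately preceding value after a gap
            continue
        if previous is not None:
            interval = 12 * math.log2(value / previous)
            if abs(interval - 12) <= tolerance_semitones:
                jumps.append((index, "doubling"))
            elif abs(interval + 12) <= tolerance_semitones:
                jumps.append((index, "halving"))
        previous = value
    return jumps


track = [120, 118, None, 115, 230, 228, 112, 110]  # hypothetical values in Hz
print(find_octave_jumps(track))  # [(4, 'doubling'), (6, 'halving')]
```

Note that the return from an erroneous stretch to the true value is flagged as well, since it is also a jump of an octave. A flagged frame is only a candidate: as stated above, the decisive test is that no auditory impression corresponds to the jump.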

Before we really begin...

We now begin our discussion of the contours of Dutch. Attempt to learn them contour by contour. As you go along, you should aim at gradually increasing the number of contour types you know and reducing the number of unknown contour types. Only expect to be able to transcribe contours that have been explicitly dealt with, and don't make up your own transcriptions for new contours.

We will structure the course so as to deal first with contours with pitch accents that describe a falling pitch, i.e. H*L, and then move on to rising and level contours.

H*L pitch accent

The most neutral way of saying a one-word phrase like "Geschiedenis." is with low "Ge-", high "-schie-" and a fall to low through "-denis". We transcribe this contour by %L, H*L (on the accented syllable), and L%.

Here are a few examples with low initial pitch and one H*L, in a variety of positions, and a variety of peak heights (prominences):

Play audioAudio example
Play audioAudio example
Play audioAudio example

In the next example the first syllable of the utterance is accented:

Play audioAudio example

Pre-nuclear H*L

Frequently, more than one H*L appears in the same phrase.

Here are some examples.

Play audioAudio example
Play audioAudio example

Notice that the way the H*L is realised is different in final position from nonfinal position. While falls will vary in slope, in final position the fall to L is usually steep and in nonfinal position it is usually more gradual, and in fact depends on how long the stretch is between the two accents. In the first example, the stretch "niet altijd op 'n mis" contains six syllables, while in the second the stretch "heeft nog" contains only two.

Here are three more examples. The first two have two accents each and the third has four accents.

Play audioAudio example
Play audioAudio example
Play audioAudio example
Play audioAudio example

Exercise 1A

H*L close to H*L

It may happen that a non-final H*L is so close to the next H*L that it is not very clear whether the non-final H*L actually falls, or whether there is no L between the H*'s. In such cases, the contour may still sound like a sequence of two H*L's, and so we still transcribe H*L H*L L%. In section 1.7 Downstep with spreading we will learn contours with single H*'s. Here is an example with two H*L pitch accents on one word.

Play audioAudio example

Here is an example with two accented words.

Play audioAudio example

Final incomplete fall

In this section, we will distinguish contours with H*L L%, which fall to low pitch, from contours in which the final H*L does not go firmly down to low pitch. Such contours are called 'half-completed falls'. First listen to a H*L L% contour.

Play audioAudio example

It often happens that speakers use a more gradual fall ending at mid pitch. The effect may be that the speaker sounds more tentative, or seems to express that what is said is not terrifically important. Listen to the following example and compare it with the previous one.

Play audioAudio example

This half-completed fall is transcribed H*L %. That is, the final boundary is just marked "%".

Here are some more examples:

Play audioAudio example
Play audioAudio example
Play audioAudio example

In nonfinal IPs, such suspended falls may be used just to indicate non-finality. In the following example, the non-finality of the phrase "waar rook is" is indicated in this way:

Play audioAudio example

Final H% after H*L

In the examples so far, the pitch after the last H*L went down to mid or low at the end of the intonation phrase. It is also possible to pronounce a boundary H%, in which case the last syllable has rising pitch, or is entirely pronounced with high pitch.
Functionally, the occurrence of H*L H% may signal a question, as in the first example, or a reminder, as in the second, or a suggestion, which may have a ring of self-evidence, as in the third.

Play audioAudio example
Play audioAudio example
Play audioAudio example

It is this pattern which is often misanalysed as having an accent on the last word. This is "frequent error #1". Notice that the final syllable of the next utterance is high, but not accented:

Play audioAudio example

An indication of the status of the high pitch is that it remains on the last syllable when the utterance is lengthened with an unaccented word.

Play audioAudio example

Here is another example. The only pitch accent is on "vraag", and the final syllable "doe" has the H%. Again, we can see it is a boundary tone by replacing "doe" with "uitspreek" or "uitgesproken heb" or "uitgesproken heb deze keer", and see how the final high tone always goes to the last syllable, whatever it is.

Play audioAudio example
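The reasoning behind this test can be sketched in a few lines: the pitch accent stays with the accented syllable, whereas the boundary tone is attached to whatever syllable happens to be last in the IP. The sketch below is an illustration under simplifying assumptions: the function name is hypothetical, and the syllable lists are shortened fragments built from the words mentioned above, not the full utterances of the audio examples.

```python
def align(syllables, accent_index, accent="H*L", final_boundary="H%"):
    """Pair each syllable with the transcription symbols it carries.

    The pitch accent is placed on the accented syllable; the final
    boundary tone is placed on the last syllable, however long the IP is.
    """
    labels = [[] for _ in syllables]
    labels[accent_index].append(accent)
    labels[-1].append(final_boundary)
    return list(zip(syllables, labels))


# Lengthening the fragment moves H% along; the accent stays where it is.
print(align(["vraag", "doe"], 0))
print(align(["vraag", "uit", "spreek"], 0))
```

In both cases the accent remains on "vraag", while H% ends up on "doe" in the first fragment and on "spreek" in the second. A true accent on the last word would not move in this way.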

Often, too, the sequence H*L H% is used to signal nonfinality. In this case, the utterance consists of minimally two IPs, as in our next example. Listen to the whole utterance first.

Audio

Play audioAudio example
Play audioAudio example

Here is another example.

Play audioAudio example
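The three ways of ending an IP after a final H*L that have been introduced so far can be summarised as a simple lookup from the pitch reached at the end of the phrase to the boundary symbol. This is merely a memory aid in code form; the labels "low", "mid" and "high" and the function name are our own shorthand, and deciding which label applies remains an auditory judgement.

```python
FINAL_BOUNDARY_BY_END_PITCH = {
    "low": "L%",   # fall completed to fully low pitch
    "mid": "%",    # half-completed fall ending at mid pitch
    "high": "H%",  # last syllable rising, or entirely high
}


def transcribe_final_fall(end_pitch):
    """Return the transcription of a final H*L plus its boundary symbol."""
    if end_pitch not in FINAL_BOUNDARY_BY_END_PITCH:
        options = sorted(FINAL_BOUNDARY_BY_END_PITCH)
        raise ValueError(f"end_pitch must be one of {options}")
    return f"H*L {FINAL_BOUNDARY_BY_END_PITCH[end_pitch]}"
```

So `transcribe_final_fall("mid")` gives `H*L %`, the half-completed fall, and `transcribe_final_fall("high")` gives `H*L H%`.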

Exercise 1B

Downstepped H*

Often, H* is downstepped in falling contours. This means that the height of the tone is distinctly lower than that of a preceding H-tone.
The difference between contours with a downstepped H* and the same contours without the downstep is that the downstepped contour sounds more as if the speaker is not interested in further discussion of the point she makes in the utterance concerned. It thus has a 'final' ring about it.

Often, there are more than two H*L's in an IP, in which case each H* will be lower than the one before. In this contour, the pitch is low before each downstepped !H*, due to L.

A downstepped H* is symbolised !H*.

Here is an example with three H*L's: the second is downstepped relative to the first, and the third relative to the second.

Play audioAudio example

In the following example, the same contour occurs with only two H*L's, the second of which is downstepped.

Play audioAudio example

Compare with the same utterance in which the same syllables are accented, but without downstep:

Play audioAudio example

In sentences with more than two H*L pitch accents, it may happen that only the pre-nuclear H*L's are downstepped: the last H*L goes all the way back up. Here is an example. (This speaker articulates his words very precisely and makes brief pauses between his words.)

Play audioAudio example

It may also happen that only the last H*L's of a series are downstepped, as in the following example.

Play audioAudio example
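In terms of notation, the patterns described above differ only in which H* symbols receive the "!" prefix. The sketch below makes this explicit; the function names and the zero-based positions are our own illustrative conventions, and which accents are in fact downstepped must of course be decided by ear.

```python
def mark_downstep(accents, positions):
    """Return a copy of `accents` with '!' prefixed at the given positions.

    `accents` is a list such as ["H*L", "H*L", "H*L"]; `positions` holds
    the zero-based indices of the accents heard as downstepped.
    """
    marked = []
    for index, accent in enumerate(accents):
        if index in positions and not accent.startswith("!"):
            accent = "!" + accent
        marked.append(accent)
    return marked


def as_transcription(accents, final_boundary="L%"):
    """Join the accents and a final boundary symbol into one string."""
    return " ".join(accents + [final_boundary])


series = ["H*L", "H*L", "H*L"]
print(as_transcription(mark_downstep(series, {1, 2})))  # H*L !H*L !H*L L%
print(as_transcription(mark_downstep(series, {1})))     # H*L !H*L H*L L%
print(as_transcription(mark_downstep(series, {2})))     # H*L H*L !H*L L%
```

The first line corresponds to a series in which each accent is downstepped relative to the one before, the second to downstep on a pre-nuclear accent only, and the third to downstep on the last accent only.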

Although downstepped contours typically end fully low, they may also end in % or H%. Here is an example of H*L !H*L H%.

Play audioAudio example

Another example.

Play audioAudio example

Downstep with spreading

Frequently, downstepped patterns are pronounced somewhat differently.

Listen to the following example.

Play audioAudio example

In this example, the high pitch for each H* is extended until just before the next downstepped H*, causing a terrace-shaped contour, with the beginning of each level of the terrace marking an accented syllable. This is known as spreading of H*. Notice that only non-final H*'s are spread in this type of contour.
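
The terrace shape can be sketched as one idealised pitch level per syllable. This is a hedged illustration, not the course's synthesis rule: the integer levels (0 = fully low), the low prehead, the fall to fully low after the last accent, and the function name `terrace_levels` are all assumptions made for the example.

```python
def terrace_levels(accented):
    """Return one idealised pitch level per syllable for downstep with spreading.

    accented: list of booleans, True where a syllable carries H* or !H*.
    Levels are abstract integers (0 = fully low); they are an assumption.
    """
    n_accents = sum(accented)
    levels = []
    level = 0   # assumed low prehead before the first accent
    seen = 0
    for is_accented in accented:
        if is_accented:
            seen += 1
            # Each accent starts a new terrace, one step below the previous one.
            levels.append(n_accents - seen + 1)
            # Only non-final H*'s spread; after the last accent the pitch falls.
            level = 0 if seen == n_accents else levels[-1]
        else:
            # Unaccented syllables continue the level of the current terrace.
            levels.append(level)
    return levels


# Three accents on syllables 2, 5 and 7 of an eight-syllable IP.
print(terrace_levels([False, True, False, False, True, False, True, False]))
# [0, 3, 3, 3, 2, 2, 1, 0]
```

Each new level begins on an accented syllable, which is the property described in the paragraph above.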

Don't expect to see perfectly terraced contours in naturally spoken utterances. Often, they just seem to slither down, even though the auditory impression is that of accents stepping down.

Play audioAudio example

Compare with the same utterance which is also downstepped, but without spreading:

Play audioAudio example

Downstepped contours with spreading can also end with % or H%, as in the following examples:

Play audioAudio example
Play audioAudio example

Here are some more examples, with varying numbers of accents. The first has three accents. Notice that the last one does not fall to fully low pitch, and we therefore transcribe it as a half-completed, downstepped H*L. The second example has two accents, and ends with fully low pitch.

Play audioAudio example
Play audioAudio example

Low pitch on !H*

The pitch of the final downstepped H* is usually very low, and may vary between just above fully low and fully low, without any difference in effect. In all such cases, we transcribe !H*, even though the final accented syllable may be pronounced with fully low pitch.

This is the case in the next example, for instance. The word "huis" has fully low pitch; equivalently, it might have had a fall from mid to low, but all such contours have the same transcription.

Play audioAudio example

Missing the last !H*L

When the last accent falls on the phrase-final word, it is often overlooked and misheard as low, unaccented pitch. This is "frequent error #2". In the following two examples, the final syllables "pot" and "uit" have downstepped !H*L's.

Play audioAudio example
Play audioAudio example

In particular when the accents occur on the last syllable, as in the above examples, the !H*L's are mistaken for lack of accent because they are low-pitched. When we lengthen the utterances, it is in fact easier to hear the accents.

Play audioAudio example
Play audioAudio example

Here are three examples: "tegen", "dier" and "uit" are accented.

Play audioAudio example
Play audioAudio example
Play audioAudio example

Exercise 1C

Pre-nuclear H*LH

The nuclear contour H*L H% has a pre-nuclear version in which the boundary H% no longer functions as a boundary tone, but occurs just before the next pitch accent. Compare the pronunciation of the following example

Play audioAudio example

in which "In Amerika" and "doen ze dat anders" are two IPs, with a pronunciation in which only a single IP occurs:

Play audioAudio example

The medial %L has disappeared, but these two contours are really equivalent, except for the absence of the medial IP-boundary in the second example, which causes H% to be timed like the L of H*L: it moves to the right, causing a gradual rise. The L now occurs immediately after H*.

The distance between the accents is larger in the following example, showing the gradual rise quite clearly.

Play audioAudio example

When the first accent is pronounced with H*L, we get a gradual fall, as usual. Notice that the intonation pattern is quite different.

Play audioAudio example

Here is another example of an early pronunciation of the L in a pre-nuclear H*LH.

Play audioAudio example

Exercise 1D

Exercise 1E

Transcription of pauses

It is sometimes thought that pauses always indicate IP boundaries. This is not the case. Speakers frequently interrupt their production in the middle of the IP, pause to think - apparently about what to say next - and then continue as if there had been no break at all. You may also want to reread our discussion of phrasing in the introduction section of the course.

Examples of hesitation pauses, pauses that do not indicate phrase boundaries, are given in the following two utterances.

In such cases you may use the P symbol, aligned with the beginning of the pause. Pause fillers, like ehhh, are marked E.

Audio En dan zien ze een hele kudde wilde zwijnen eeh staan
%L H*L H*L !H*L !H*L !H*L P-E-P L%

Audio Hoe ze van die zwijnen af kunnen komen
%L H*L H* !H*L P L%
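
That a pause is not an IP boundary can be checked mechanically on a transcription string. The sketch below assumes that symbols are separated by spaces (or by hyphens, as in P-E-P) and that only the final boundary tones L%, H% and % close an IP. The function name `summarise` is hypothetical, and the first string is a spaced-out reading of the first transcription above.

```python
FINAL_BOUNDARIES = {"L%", "H%", "%"}   # tones that close an IP
HESITATION_MARKS = {"P", "E"}          # P = pause, E = pause filler


def summarise(transcription):
    """Return (number of IPs, number of hesitation marks) in a transcription.

    Assumes symbols are separated by spaces or hyphens.
    """
    tokens = transcription.replace("-", " ").split()
    n_ips = sum(1 for t in tokens if t in FINAL_BOUNDARIES)
    n_hesitations = sum(1 for t in tokens if t in HESITATION_MARKS)
    return n_ips, n_hesitations


# One IP containing two pauses and a filler.
print(summarise("%L H*L H*L !H*L !H*L !H*L P-E-P L%"))  # (1, 3)
# Two IPs without any hesitation.
print(summarise("%L H*L H% %L H*L L%"))                  # (2, 0)
```

The P and E symbols add nothing to the IP count, which mirrors the way they are used in the transcription.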

Here is another example. Listen to the whole utterance first:

Play audio

This utterance has a hesitation pause during which the speaker takes a breath:

Play audioAudio example
Play audioAudio example

Exercise 2

Low preheads

The part of the utterance before the first accent, known as the prehead, is usually low in pitch, but may also be falling from mid to low if the prehead is particularly long. These preheads are transcribed %L.

Play audioAudio example
Play audioAudio example

When the first syllable of the IP is accented, there is no space for the low prehead, although frequently the pitch can be seen to rise in the first half of the accented syllable, when pronounced with H*(L). Even when this rising section is not present, which typically happens when the syllable begins with a voiceless consonant, the syllable sounds just like an accented syllable that is preceded by a low prehead which is physically there. We therefore transcribe them the same, with %L, which could thus be seen as the 'neutral' way to begin an IP.

Play audioAudio example
Play audioAudio example

High preheads

The prehead may also be high-pitched. This type of prehead sounds quite different from the low prehead. It is usually high level, although it may also be slightly falling. This prehead is transcribed %H. It may sometimes be difficult to distinguish %H from an accent H*.

Play audioAudio example
Play audioAudio example

Downstep after %H

Initial %H may be followed by !H*. The following example illustrates the way the title of a story may be read by someone reading a story to a group of children.

Play audioAudio example
Play audioAudio example

Falling preheads

A falling prehead is much rarer. It usually only occurs in a particular reading style, and makes the utterance sound lively. It is transcribed %HL.

%HL occurs in the following examples. They start with a fall and the accented syllable has H*L:

Play audioAudio example
Play audioAudio example

Exercise 3A

Exercise 3B

Types of level contours

The most common nuclear level contour is simply referred to as the level tone, and transcribed H* %. In addition to this high or mid level tone, we discuss two less frequent tones: the vocative chant, H* !H %, which consists of two or more level pitches, and the low level ('scathing', L* %). In section 5.5 we continue the discussion of level tones with the prenuclear uses of H* and L*.

Level tone

Many utterance-final pitch accents are variants of H*L. However, there are also a number of non-falling final pitch accents.

One of these consists of a level tone. A level pitch accent is transcribed H*, followed by % if IP-final.

Physically, these level tones are not purely monotonous: the speaker may waver somewhat, but will avoid the impression of either a fall or a rise. Here is an example of a H* on a non-final syllable, followed by %.

Play audioAudio example

These level stretches after an accented syllable can be very short. In the following utterance, we have the H* % contour twice, once on a final syllable and once on a penultimate syllable. The utterance is closed by %L H* !H*L L%.

Play audioAudio example
Play audioAudio example
Play audioAudio example

Here are two more examples.

Play audioAudio example
Play audioAudio example

This example illustrates the use of H* % to mark non-finality. Notice how the speaker continues with %H, an easy thing to do, as the pitch was high already.

Play audioAudio example

Another example.

Play audioAudio example

Vocative chant

A very characteristic contour is sometimes used to call somebody by name. It consists of a high level and a mid level tone. The H* of course begins on the accented syllable, and the !H, the mid level, is realised on the last syllable, or on a stressed syllable, if there is one between the accented syllable and the last syllable. Often, the accented syllable and the syllable on which the !H begins are lengthened. When the accented syllable is the last syllable, the two levels are both pronounced on that syllable, which as a result is broken into two sections.

Transcription of the vocative chant (also "chanted call") is H* !H %. The pitch accent H* !H is a single pitch accent, even though the !H may occur a few syllables after H*.

Play audioAudio example

The lengthening does not occur when the level pitch of H* is spread over more than two syllables ("loenen te"):

Play audioAudio example

No lengthening is used when the contour is used to mark continuation, which sometimes occurs in instructions which the speaker wishes to sound unproblematic.

Example:

Play audioAudio example

The vocative chant is also used on utterances with an early accent, in which case each of the following unaccented words is likely to have a pitch level. These levels descend in the same way that series of downstepped accents descend. Here is an example.

Play audioAudio example

As is clear from these examples, the last pitch level has mid pitch, rather than low pitch. For this reason, we transcribe the right-hand boundary as toneless: %. Vocative chants with L% and H% do exist, but are rare. The one with L% has a fully low-pitched final pitch level, and may make the speaker sound exasperated. Here is an example.

Play audioAudio example

The vocative chant with H% is like H* !H % with an added rise in the last part of the last pitch level. It may give an impression of wheedling. Here is an example.

Play audioAudio example

'Scathing' intonation

Nuclear low level pitch is sometimes used by speakers when repeating a statement made by the listener. This contour is used to indicate that the statement is not to be taken seriously, and hence may be termed 'scathing' intonation. If there are prenuclear accents, these too are low pitched, and if there are three accents, the transcription would be L* L* L* %. Understandably, the accents in this type of contour are very hard to distinguish, since the entire contour is low pitched, but since it is always a word-by-word repetition of another utterance, the accent positions are given by that model utterance. Currently, we have no example of this contour.

Exercise 4A

Exercise 4B

Types of rises

We distinguish two classes of rising pitch accents, based on the pitch of the beginning of the rise in the accented syllable. On this basis, we distinguish high rises and low rises.

Low rises come in a number of kinds. The high rise is introduced in the next section.

Example:

Play audioAudio example

High rise

Auditorily, an IP-final high rise begins at mid pitch, which continues until a rise at the IP-boundary. Usually, the mid pitch is reached late in the accented syllable, while the beginning is low, as is commonly the case for H*. It is mostly used in questions.

We transcribe: H* H%.

Play audioAudio example
Play audioAudio example
Play audioAudio example

High rises are frequently used utterance-internally, to indicate that another IP follows.
The following example has an accent on "praten". Listen to the whole utterance first.

Audio

Play audioAudio example
Play audioAudio example

In the next example, the first phrase is realised with H* H%, followed by a single H* in the second phrase, i.e., a level tone:

Play audioAudio example
Play audioAudio example

The following utterance illustrates a high rise in nonfinal position in IP, realised on a monosyllable.

Play audioAudio example

Low rises

Low rises begin with L*. There are three subtypes: low rise with H%, low rise without H%, and the low low rise. The first two differ only at the end of the IP, where the low rise with H% has an extra rise: L*H % vs L*H H%. These two subtypes will be compared with the level tone (H* %) and the high rise (H* H%). Distinguishing among these four contours is sometimes difficult, in particular when the accented syllable is IP-final. Remember that one way of determining contours is to add unaccented syllables to the final syllable, something people can do subvocally.

After an exercise dealing with the four contours mentioned above, the course continues with the pre-nuclear occurrence of the low rise, and with the low low rise, the third subtype. This type, transcribed L* H%, begins low and stays low, until the end of the IP, where a rise occurs.
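
The contours compared in this part of the course can be collected in a small lookup table keyed on the last pitch accent and the final boundary tone. This is a sketch for keeping the labels apart, assuming space-separated transcriptions. The names `NUCLEAR_CONTOURS` and `name_contour` are hypothetical, and contours outside the table simply return None.

```python
# (nuclear pitch accent, final boundary tone) -> name used in this course
NUCLEAR_CONTOURS = {
    ("H*", "%"): "level tone",
    ("H*", "H%"): "high rise",
    ("L*H", "H%"): "low rise with H%",
    ("L*H", "%"): "low rise without H%",
    ("L*", "H%"): "low low rise",
}


def name_contour(transcription):
    """Name the nuclear contour of one IP, or return None if it is not listed.

    Assumes symbols are separated by spaces and the IP ends in a boundary tone.
    """
    tokens = transcription.split()
    if len(tokens) < 2:
        return None
    # The nuclear contour is the last pitch accent plus the boundary tone.
    return NUCLEAR_CONTOURS.get((tokens[-2], tokens[-1]))


print(name_contour("%L H* %"))         # level tone
print(name_contour("%L H*L L*H H%"))   # low rise with H%
print(name_contour("%H L* H%"))        # low low rise
```

Contours built on H*L, and the vocative chant H* !H %, are deliberately left out of the table.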

Low rises with H%

If the accented syllable with a low rise is IP-internal, its pitch is low throughout. An IP-final accented syllable with a low rise has rising pitch.

The first type of low rise rises to mid pitch (in the next syllable, if there is one), and is followed by H%, a further rise at the end of the IP.

We transcribe: L*H H%

Play audioAudio example
Play audioAudio example

The next example illustrates an IP with a high rise, followed by an IP with L*H H%. Listen to the whole utterance first.

Audio

Play audioAudio example
Play audioAudio example

And here is an example of L*H H% preceded by pre-nuclear H*L.

Play audioAudio example

Low rises without H%

The L*H pitch accent also occurs without H%. This contour, L*H %, is known as the half-completed rise.

Play audioAudio example

The L*H % contour is distinct from the level tone (H* %), the high rise (H* H%) and the low rise with H% (L*H H%). Like H* %, it may suggest continuation, and like L*H H%, it has a fully low syllable. Compare the previous example with the following three examples.

Play audioAudio example
Play audioAudio example
Play audioAudio example

The L*H % is one of the many ways in which speakers indicate that another IP follows in the same utterance (continuation).

Play audioAudio example
Play audioAudio example

Exercise 5A

Pre-nuclear L*H

L*H also occurs non-finally in the IP. The accented syllable is low, and the climb to H may be faster or slower, depending on how far away the next accent is. We transcribe L*H.

The first example shows a non-final L*H with a slow rise followed by a final L*H H%.

Play audioAudio example

Before H*(L), the same pre-final pitch pattern can be used: the accented syllable has L*, then the rise follows to H*. We transcribe this pre-nuclear rise as L*H. The presence of the H after L* describes the rising pitch up to the following H*. In some pronunciations of this contour, there is a step-up from the end of the pre-H* rise to the target of H*, which is the realisation of H before H*.

The prenuclear L*H contrasts with L*, which describes a low accent, followed by level low pitch until the next accent. This low level prenuclear accent is discussed in section 5.5.2.

We show this contour in the following examples. Those who are familiar with the television programme "Glamourland" may recognise this contour as one that Gert-Jan Dröge, its presenter, frequently uses. The falling prehead, introduced in section 3.3, is typical of this contour.

Play audioAudio example
Play audioAudio example

In the following example, the prehead is low.

Play audioAudio example

As said above, the H after L* can sometimes be heard separately. In the next example, we can hear high pitch on "niet" (the target of H), lower pitch on "te", which lies between the two high targets, and high pitch on "voer" again (H*).

Play audioAudio example

Downstep after L*H

Pre-nuclear L*H may also appear before a downstepped !H*. Again, the rise may be faster or slower, depending on the distance between the accents. In this contour, the H of L*H is higher than the immediately following !H*, which has the usual mid or low pitch, usually followed by L%.

Play audioAudio example

Compare !H*L with H*L in this context, as in one of our previous examples, repeated here, where the pitch on "voer" is fully high.

Play audioAudio example

Low low rise

The last type of low rise is one which continues the low pitch of the accented syllable until the end of the IP, where H% is used. We accordingly transcribe L* H%. Frequently, the pitch before the accented syllable is high, usually due to %H.

Play audioAudio example
Play audioAudio example

Even though there is no obvious pitch movement in or immediately after the syllable "loe", this syllable is nevertheless accented, and the pitch movement at the end is a boundary tone. It is of course possible to say the same sentence with a rising contour, say L*H H%, on the final word (and syllable) "veel", with no accent on "loe". But this contour, shown in the next example, sounds different.

Play audioAudio example

Some rises as markers of non-final IPs

We have come across quite a number of ways in which speakers can indicate that another IP follows. They are illustrated in the following examples, in part repeated from earlier sections. Notice that the IP-boundary need not have a pause, and that high pitch at the end of the first IP often spills over into the next.

Play audioAudio example
Play audioAudio example
Play audioAudio example
Play audioAudio example

Exercise 5B part 1

Pre-nuclear singleton H* and L*

In this section, we will look at contours in which the pre-nuclear accent may not stand out much from its environment, because the pitch continues unchanged from the accented syllable onwards: singleton H* and singleton L*.

Singleton H*

We have already come across singleton pre-nuclear H* in contours with downstep-plus-spreading. Singleton pre-nuclear H* can also appear in other contours. Here is an example of H* before a non-downstepped H*L.

Play audioAudio example
Play audioAudio example

Now compare this non-downstepped H* H*L contour with a contour that has downstepping and spreading, i.e., H* !H*L. Observe that the pitch fall for the final accent is earlier in the case of the downstepped contour. Both contours are informally known as the 'flat hat', but notice the difference between them.

Play audioAudio example
Play audioAudio example

In H* H*L contours, the second H* may be considerably higher than the first: after a mid level plateau, the pitch goes up to the final H*, and then falls. We consider the difference between this contour and one without a raised final peak to be a pitch range difference, and do not transcribe them differently. Compare the following two realisations of the same contour:

Play audioAudio example
Play audioAudio example

These two pronunciations of the same contour may be compared with H* !H*L, where the second accent has a downstepped !H*L.

Play audioAudio example

Here is an example of a 'flat hat' with a late, raised fall in combination with a H%.

Play audioAudio example

Singleton H* also occurs before H* % and H* H%. In such cases the pre-final H* may be difficult to hear (and you might argue it is not there), unless the level pitch for the second H* is a little lower than the level pitch of the preceding H*. Here is an example.

Play audioAudio example

In the next example, the IP is immediately followed by a %L H* !H*L L% contour.

Play audioAudio example

Singleton L*

A pre-nuclear accented syllable can be low, after which the pitch continues low to a following (nuclear) accented syllable with low pitch. In spite of the low salience of the pitch, the impression of accent can be unmistakable.

Play audioAudio example

In the next example, a singleton L* can be heard on "kijken", and some may hear one on "in" as well. Notice that the contour would not be much different if we left the L*'s out and considered "kijken" and "in" unaccented.

Play audioAudio example

In other cases, the low pitched stretch may sound as if there are no accents. Not surprisingly, you may be in doubt if a L* is really there. In the following example, we think there is no L* on "Marietje", but you may think there is. In such cases, just make a decision, and go on to the next item.

Play audioAudio example

Exercise 5C

Prefix L* or "delay"

In a particular style of speech, which is sometimes used by speakers addressing children, H*L may be prefixed with L*. This means that the accented syllable has low pitch, which is then quickly followed by the pitch peak of H*L. Because together, L* and H*L mark only a single accented syllable, we transcribe L*HL. Of course, when IP-final, this pitch accent may be followed by %, H% or L%, depending on the pitch of the final boundary.

Examples:

Play audioAudio example
Play audioAudio example
Play audioAudio example

Exercise 6

Appended constructions

Sentences may be followed by unaccented words that do not belong to the sentence proper. There are two kinds of such appended constructions. In one case, the words are simply included in the IP, and there is no pause between them and the last word of the sentence proper. They are treated in section 7.1, where they are referred to as 'tagged constructions'. In a second type, the appended construction forms a separate IP, even though it has no accent. In this case, there is a clear prosodic break separating it from the preceding word. They are treated in section 7.2, as Accentless IPs.

Tagged constructions

The classic tag is a short word appended to a sentence to appeal to the listener in various ways. They include hè, hoor, nietwaar, toch and zeg. Their meaning broadly corresponds to English tags like is it, wasn't he, etc. Unlike these English tags, the Dutch tags are unaccented, and included in the preceding IP. They are intonationally idiomatic, and cannot therefore freely occur with all contours. For instance, the tag always occurs with H*L H% (or its delayed form L*HL H%) on the preceding accented word, as in the following example.

Play audioAudio example

The other tags, too, may show a bias towards H*L H%. However, here is an example with H*L L% and nietwaar.

Play audioAudio example

Vocatives are similarly tagged onto the sentence, and kept in the same IP. A combination of a word and a vocative, such as (Wil je) koffie, Dik? therefore sounds the same as a compound, Koffiedik in this case. The vocative need not be a name, but could be any description of the addressee used as a vocative, including expletives. Here are some examples.

Play audioAudio example
Play audioAudio example
Play audioAudio example

An entire clause may be appended in this way, by way of a cohesion marker. In the following example, the clause als je begrijpt wat ik bedoel is similar in meaning to of course.

Play audioAudio example

Referents of pronouns may be made explicit as unaccented, appended items, as in the next example.

Play audioAudio example

Various approximative items, like ongeveer, of zoiets, en dergelijke are similarly treated.

Play audioAudio example

These unaccented tags differ from accented words that appear in similar positions, but represent different constructions, as shown by the following examples.

Play audioAudio example
Play audioAudio example

Exercise 7A

Unaccented IPs

In some utterances with more than one IP, there may be a non-initial IP which has no accent. Such unaccented IPs may consist of reporting clauses ("zei Jan"), or of a rewording of the contents of a preceding IP.

Here are some examples. The first has a rewording of "wij" as "wij tweeën", the second and third have a reporting clause. Unaccented IPs usually have the same post-accentual tones as the IP they are attached to. For example, if the preceding IP ends in H*L L%, the unaccented IP will have L L%, and similarly H*L H% will be followed by L H%. To indicate the close connection between the unaccented IP and the preceding IP we do not transcribe an initial %.
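
The copying of post-accentual tones can be sketched as a small string operation. It covers only the usual case described above and assumes a space-separated transcription whose last two symbols are the nuclear pitch accent and the final boundary tone. The function name `unaccented_ip_tones` is hypothetical.

```python
def unaccented_ip_tones(preceding_ip):
    """Derive the usual tones of an unaccented IP from the IP it attaches to.

    preceding_ip: transcription ending in a nuclear accent and a boundary
    tone, e.g. "%L H*L L%". No initial % is produced for the unaccented IP.
    """
    tokens = preceding_ip.split()
    if len(tokens) < 2 or "*" not in tokens[-2]:
        raise ValueError("expected a pitch accent followed by a boundary tone")
    boundary = tokens[-1]
    # Keep only what follows the starred tone, e.g. the L of H*L.
    trailing = tokens[-2].split("*", 1)[1]
    return " ".join(t for t in (trailing, boundary) if t)


print(unaccented_ip_tones("%L H*L L%"))  # L L%
print(unaccented_ip_tones("%L H*L H%"))  # L H%
```

Because the text says "usually", the result is best read as a default to check against what you hear, not as a rule without exceptions.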

Play audioAudio example
Play audioAudio example

In the next two examples, the unaccented IP is not utterance-final; the first IP has a half-completed fall in the first example, and a fully low L% in the second one.

Play audioAudio example
Play audioAudio example
Play audioAudio example

Exercise 7B

Exercise 8A

Exercise 8B

Exercise 8C