Gathercole, S., & Baddeley, A. (1993). Working memory and language (pp. 1-12). Hillsdale, NJ: Lawrence Erlbaum.
Working memory, here, is “the short-term memory system, which is involved in the temporary processing and storage of information.” Baddeley’s model is a “resources” model (as opposed to a discrete slots model, or a decay model, or an interference model) of short-term memory.
The working memory model proposed is tripartite. The first component, a central executive, monitors two slave systems: a phonological loop and a visuo-spatial sketchpad. (This latter is little involved in speech perception, so I will ignore it here.)
The central executive is involved in selective control of action, planning, coordination of tasks, and possibly consciousness. This executive might be a unitary process, or it might be several cooperating subprocesses. Tasks that require inhibiting prepotent responses in favor of more novel responses would seem to involve the central executive.
The authors propose that the phonological loop has two components. The first is a passive buffer of sorts, the phonological store, that takes in phonological information from the environment. This store is subject to phonological similarity effects: when target words share phonological characteristics, they are more difficult to remember. It is subject to irrelevant speech effects, too: when non-target words share similar phonology (independent of whether they might share semantic or lexical similarity), they interfere more readily with retention of target words.
The second component is an articulatory rehearsal process. This process is subject to word length effects: the more syllables in a set of target words, the more difficult those targets are to remember, possibly because the ribbon of the loop is too short to capture them all. It is also subject to articulatory suppression effects: when we are prevented from subvocally rehearsing, retention suffers.
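One way to make the word length effect concrete is to assume the loop holds only what can be articulated within a fixed time window. The sketch below is a toy illustration under that assumption, not the authors' model: the function name `predicted_span`, the 2-second default window, and the per-word articulation times are all hypothetical values chosen for illustration.

```python
import math


def predicted_span(seconds_per_word: float, window_seconds: float = 2.0) -> int:
    """Whole words that fit in one pass of a fixed-length rehearsal window.

    Assumption: capacity is limited by articulation time, not by item count.
    The 2.0 s default is an illustrative value, not a figure from the chapter.
    """
    if seconds_per_word <= 0:
        raise ValueError("seconds_per_word must be positive")
    # Small epsilon guards against floating-point results like 4.999999.
    return math.floor(window_seconds / seconds_per_word + 1e-9)


# Hypothetical articulation times: short words vs. long words.
short_word_span = predicted_span(0.25)
long_word_span = predicted_span(0.5)
print(short_word_span, long_word_span)
```

Under this assumption, doubling the time needed to say each word halves the predicted span, and a faster talker (smaller `seconds_per_word`) gets a larger one.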
Some of my questions
- Do fast talkers routinely test better on working memory than slow talkers do?
- The authors offer up as evidence of capacity limits for the phonological loop that longer words (more syllables and/or more time necessary to articulate) result in lower recall scores. Is word length confounded with corpus frequency? Do the effects remain when controlling for distributional differences?
- In visual attention/working memory research, we see that what is sometimes touted as a resource-driven, qualitative effect (e.g. lower memory span for complex objects than for simple objects, as in Alvarez & Cavanagh, 2004) can be explained instead by a simple, relatively high-fidelity “slot” model, where the object is stored well (complex or not), but where participants are just really bad at making comparisons (Awh, Barton, & Vogel, 2007). I wonder if something similar might occur in the phonological loop?
- “The probability of losing a phonological feature which discriminates the item from other members of the memory set will be greatest when the number of discriminating features is smallest.” Two questions about this:
  - If we’re considering counts of features, this would seem to make sense. However, what about a different dimension, like the temporal extent of features? Would we expect a length-limited “tape” in the phonological loop to record features of relatively short duration with high fidelity, and perhaps to suffer when important information spans more time than the tape can hold?
  - In aggregate, it might be true that items in a set discriminated by n features are less likely to be remembered than items discriminated by n + 1. However, shouldn’t we expect that features carry different weights, and that this cannot be a simple, linear, additive model? For some reason, I’m thinking of the Family Guy use of the utterance “Cool Whip” with the initial consonant in “whip” oddly aspirated. (Here’s where my ignorance of linguistics starts to show.) In English, aspiration is not generally contrastive (right?), but might we sometimes expect violations of the expectations laid down by the distribution of our experience to be quite marked, and notable even if only a single feature is involved? Similarly, if one lives in a society where post-vocalic /r/ is a marker of group membership, or status, or whatever, might we expect this feature to carry more weight than some other, more culturally neutral, single feature?
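The contrast between counting features and weighting them can be sketched in a few lines. This is only an illustration of the question, not a model from the chapter: the feature names, the binary coding, and the weights are all hypothetical. Note too that the weighted version is still additive; a genuinely non-additive account would need interaction terms, which this sketch does not attempt.

```python
from typing import Dict, Optional

Item = Dict[str, int]  # hypothetical binary phonological features


def discriminability(a: Item, b: Item,
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Sum the weights of the features on which two items differ.

    With weights=None every differing feature counts as 1.0, which
    reduces to a plain count of discriminating features.
    """
    total = 0.0
    for feature in set(a) | set(b):
        if a.get(feature) != b.get(feature):
            total += 1.0 if weights is None else weights.get(feature, 1.0)
    return total


# Hypothetical items: one differs from the baseline on a single feature,
# the other differs on two.
baseline = {"voiced": 0, "aspirated": 0, "nasal": 0}
one_feature_away = {"voiced": 0, "aspirated": 1, "nasal": 0}
two_features_away = {"voiced": 1, "aspirated": 0, "nasal": 1}

# Illustrative weights: the single unexpected feature is made highly salient.
salience = {"voiced": 1.0, "aspirated": 3.0, "nasal": 1.0}

counted = (discriminability(baseline, one_feature_away),
           discriminability(baseline, two_features_away))
weighted = (discriminability(baseline, one_feature_away, salience),
            discriminability(baseline, two_features_away, salience))
print(counted, weighted)
```

By count, the two-feature item is the more distinct one; with these illustrative weights the ordering flips, which is the scenario the last question raises.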