69. How Computers Get Grammar Wrong (2)

.

Grammar-checking software sometimes suggests correct grammar is wrong, especially passive verbs and acceptable uses of normally-wrong structures

THE FAILINGS OF COMPUTERISED GRAMMAR CHECKING

Most people who write with a computer word processor are likely to have used a grammar-checking facility to find and correct their grammar mistakes. This is generally a useful thing to do because awareness of the mistakes we make can be the first step towards eradicating them (see 202. Some Strategies for Successful English Learning).

However, grammar-checkers are not as reliable as spellchecking software. They often miss grammar mistakes altogether (see 138. Test your Command of Grammar 1), or suggest good grammar is bad. These weaknesses can have some fairly serious consequences for writers. A program’s failure to highlight an erroneous structure can reinforce the writer’s belief that it is possible or correct. Unjustified highlighting of correct structures leads many writers who cannot see why their grammar has been questioned to accept the computer’s incorrect alternative, quite reasonably trusting that their own poor understanding of grammar is the cause of their confusion. Over time, this can create uncertainty where previously there was none, thus undermining confidence.

My belief is that the way to enjoy the benefits of computerised grammar checking whilst minimising its weaknesses is through developing a more critical (but not totally negative) attitude to the grammatical suggestions, and that this can be achieved by means of extensive practice in analysing questioned wording and suggested alterations. In this post, I wish to analyse a number of examples of computer software wrongly questioning English grammar usage, and to seek some general features in them that might explain the computer struggles and also help writers to recognise similar examples as deserving of scepticism. For further examples, see 68. How Computers Get Grammar Wrong 1 and 275. How Computers Get Grammar Wrong 3.

.

COMPUTER CRITICISMS OF PASSIVE VERBS

Sometimes sentences are underlined because they are quite long and have their main verb in the passive voice. The advice is usually to make the verb active. The “problem” that the programmers see here is not so much one of grammar as of style: the criticised passive verbs usually break no grammar rules but are perceived instead to be over-complicated, “clumsy” or “unnatural” (see 100. What is a Grammar Error? and 142. Grammar Errors with Passive Verbs).

This advice is questionable because it is based on opinion rather than scientific fact. I have argued elsewhere (see 27. How to Avoid Passive Verbs) that the very existence of the passive voice in English is taken by linguistics experts to be evidence of its value. Much work has been done to establish exactly what this value is (see, for example, my own article within these pages entitled Active-Passive Paraphrases in English and What They Mean for Teaching). For a specific example of a desirable passive, see 265. Grammar Tools for Better Writing, #5.

It is true that the active voice is more common than the passive in English, but the mere fact that a computer word processor has underlined a passive voice verb does not necessarily mean that the verb should be replaced by an active.

.

STRUCTURES THAT ARE USUALLY WRONG BUT SOMETIMES RIGHT

Computers seem to have been programmed with a list of English grammar combinations that they must highlight as wrong. The problem is that a surprisingly large number of usually-wrong grammar combinations can become correct in particular circumstances (see 124. Structures with a Double Meaning 1), a possibility that the programmers seem not to have given enough recognition to.

An example of a usually-wrong combination is the attachment of BE…-ing (normally a marker of the present continuous tense in English) to a verb like KNOW, SEEM or OWN, which cannot usually be in this tense. Here is a case where this combination is definitely wrong:

(a) *Even the youngest of the children is knowing complicated algebra.

The verb here should, of course, have the present simple tense form knows, not the present continuous is knowing.

Yet combinations like is knowing are of a kind that can become correct in the right circumstances:

(b) The key to sounding formal in writing is knowing which words to avoid.

This is a perfectly acceptable sentence, with the message “the key = knowing which…”. In grammatical terms, is is not an auxiliary verb combining with a participle to make the present continuous tense of KNOW, but the verb BE by itself (meaning “equals”) combining with a noun-like use of -ing (i.e. a “gerund” rather than a “participle” – see 71. Gerund and Participle Uses of “-ing”), making it into a “complement” (see 220. Features of Complements, #3). Most verbs are usable in this alternative way after BE. And knowing in particular can also be an adjective rather than a participle – another possibility after BE (see 254. Tricky Word Contrasts 10, #5).

My word processor, however, does not recognise such alternatives, instead highlighting is knowing and suggesting that I “correct” it to knows. It seems to have been programmed to make no exceptions in singling out instances of BE -ing around verbs like KNOW, ignoring the alternative ways of using BE, despite the fact that small words with alternative uses are common in English (see 3. Multi-Use Words).
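
The behaviour described above can be imitated with a very small sketch. It is only an illustration of the kind of exception-free rule that seems to be at work, not a description of how any real word processor is programmed: the verb list, the pattern and the function name naive_flags are all assumptions made for the example.

```python
import re

# Hypothetical list of verbs that cannot usually be in a continuous tense.
STATIVE_VERBS = ("know", "seem", "own")

# Naive rule (an assumption): a form of BE directly followed by one of
# these verbs with -ing is always highlighted, whatever BE is doing.
PATTERN = re.compile(
    r"\b(am|is|are|was|were)\s+(" + "|".join(STATIVE_VERBS) + r")ing\b",
    re.IGNORECASE,
)


def naive_flags(sentence):
    """Return every BE + -ing combination the naive rule would highlight."""
    return [match.group(0) for match in PATTERN.finditer(sentence)]


wrong = "Even the youngest of the children is knowing complicated algebra."
right = "The key to sounding formal in writing is knowing which words to avoid."

print(naive_flags(wrong))  # ['is knowing'] - a genuine error
print(naive_flags(right))  # ['is knowing'] - a false alarm
```

Because the rule looks only at the word sequence, it cannot tell the auxiliary is of sentence (a) from the "equals" is of sentence (b). Telling them apart needs some analysis of meaning or sentence structure, which is exactly what the rule lacks.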

Various other English grammar structures are likewise questioned by my computer when in fact they are right but are following a rather unusual grammar rule. Examples are:

(c) Not all active verbs with no following noun are a problem.

(d) Both words can act as conjunctions, which means having a following subject + verb.

(e) Also useful to know is the fact that share prices can fall.

My computer’s suggested correction of (c) was to change no into a, on the grounds that only one negative (not) is necessary. The common error that the computer thought was present was the use of two negative words to express the meaning of just one, like this:

(f) *The refugees did not have no money (= The refugees had no money).

Such sentences are correct in some non-standard English dialects but rarely acceptable in writing. The indicators of this non-standard usage are not and no used together. The computer did not seem to “know” that they occasionally go correctly together in formal writing, cancelling each other out in sentences like (c) (see 9. Double Negatives). The way a human writer knows when not … no is correct is by thinking about the resultant meaning – something that computers seem very weak at.

In sentence (d), the computer wanted the singular-showing -s of the verb means to be removed, on the grounds that its subject which had been given plural meaning by the noun immediately before it (conjunctions). The reality is that which does not have this meaning, but rather stands for the whole of the preceding statement can act as conjunctions, a singular idea. The computer seemed not to have been programmed to bear in mind this less common possibility of a whole statement before which determining its meaning rather than a single word (see 200. Special Uses of Relative Clauses).

In (e), the computer called for a comma after the starting also. The “rule” that it appeared to be following was that starting adverbs need a following comma – a rule that certainly does apply often but has exceptions. The rule is more accurately that starting adverbs relating to a whole sentence rather than to any particular part of one (usually) need a following comma (see 121. Sentence-Spanning Adverbs). Although also does very often relate to a whole sentence (functioning as a “connector”: see 40. Conjunctions versus Connectors), in (e) it just relates to useful to know.

In sentence (e), also is not being used as a “sentence” adverb. Instead, it is closely linked to the word after it, the adjective useful. It is rare for an adjective to be placed near the start of a sentence before the subject of the verb, but very possible when it has the role of complement (cf. is), the purpose being to highlight its meaning as the main message of the sentence (see 220. Features of Complements, #6). Also cannot have a following comma when used like this.

.

SPELLINGS THAT USUALLY MEAN ONE THING BUT SOMETIMES MEAN SOMETHING ELSE

English spellings representing two or more different words are examined in detail in this blog in 11. Homonyms and Homographs. I suspect that, when one of the words is rare or unconventional, they can be as problematic for grammar-checking programs as they are for English learners reading. Consider the following extract, which my computer wrongly said contained a “fragment” (“fragments” tend to be mentioned a lot in mistaken criticisms by computers):

(g) Each new birth confronted my parents with a major childcare problem, since my father never seemed able to take time off from his very demanding work. Leo’s arrival, some months after our change of address, particularly sticks in my memory.

The computer thought something vital was absent from the second of the two sentences here. Typically, no indication was given of what exactly this vital element was, but knowing that it usually turns out to be either the subject or the verb of the sentence, I followed the useful first step of looking carefully at the main verb of the sentence (sticks), and was soon able to hypothesise that the computer analysed this word as a noun, since it often is one and is slightly informal as a verb. This meant the sentence seemed to lack a main verb, and hence had to be labelled a “fragment”.

A different kind of alternative spelling is involved in the following:

(h) Schools start the new year in September.

The common error that the computer thought it recognised here was the use of new year without starting capital letters. It seemed unaware that this phrase can sometimes have small letters, and that the choice once again depends on meaning: if the new year in question is the calendar event of 1st January, or its celebration, then capitals are normal; but otherwise they are not. For more about this variability in the use of some capital letters, see 62. Choices with Capital Letters.

68. How Computers Get Grammar Wrong (1)

.

Grammar-checking software sometimes suggests correct grammar is wrong, especially when interrupted structures or ellipsis are involved

THE INFLUENCE OF COMPUTERISED GRAMMAR CHECKING

Most people who write with a computer word processor are likely to have used a grammar-checking facility to find and correct their grammar mistakes. This is generally a useful thing to do because awareness of the mistakes we make can be the first step towards eradicating them (see 202. Some Strategies for Successful English Learning).

However, grammar-checkers are not as reliable as spellchecking software. They often miss grammar mistakes altogether (see 138. Test your Command of Grammar 1), or suggest good grammar is bad. These weaknesses can have some fairly serious consequences for writers. A program’s failure to highlight an erroneous structure can reinforce the writer’s belief that it is possible or correct. Unjustified highlighting of correct structures leads many writers who cannot see why their grammar has been questioned to accept the computer’s incorrect alternative, quite reasonably trusting that their own poor understanding of grammar is the cause of their confusion. Over time, this can create uncertainty where previously there was none, thus undermining confidence.

My belief is that the way to enjoy the benefits of computerised grammar checking whilst minimising its weaknesses is through developing a more critical (but not totally negative) attitude to the grammatical suggestions, and that this can be achieved by means of extensive practice in analysing questioned wording and suggested alterations. In this post, I wish to analyse a number of examples of computer software wrongly questioning English grammar usage, and to seek some general features in them that might explain the computer struggles and also help writers to recognise similar examples as deserving of scepticism. For further examples, see 69. How Computers Get Grammar Wrong 2 and 275. How Computers Get Grammar Wrong 3.

.

TWO IMPORTANT REASONS FOR INCORRECT GRAMMAR CRITICISM

There seem to be a variety of reasons why computer word processors incorrectly criticise written grammar. Two that may be especially important involve interrupted structures and ellipsis.

.

1. Interrupted Structures

Interrupted structures (my own term – technically they are called “discontinuity”) are the focus of the Guinlist post 2. Interrupted Structures. They are partner words split by other words that do not belong to the partnership. For example, in the phrase the industrialization/ food security conundrum in China, the word the is a partner of conundrum, and not of any other noun in the phrase, the reason being the English grammar rule that the before two or more nouns placed directly next to each other “goes with” the last of them (see 38. Nouns Used Like Adjectives). Hence, the “structure” the … conundrum is “interrupted” by the words industrialization/ food security.

This particular interrupted structure is one of many that are not prone to underlining by my word processor. However, for some reason, certain others are. Consider this:

(a) Learner motivation may occur because of the possibility mentioned above that learners can enjoy reading aloud.

The word sometimes given coloured underlining here is learners, the proposed “correction” being to remove -s. The explanation is that “a noun and the words that modify that noun must agree in number”. This means the computer thinks that is partnering (“modifying”) the noun learners – that it is the singular of adjectival those – and hence that learners should mirror its singular form.

The true partner of that is the earlier noun possibility. Not all nouns can be followed by a phrase starting with the conjunction that, but possibility is one that can (see the end of 153. Conjunction Uses of “that”). That… is needed here rather than of + -ing because its verb and the main one in the sentence have different subjects (see 181. Expressing Possibility). This is a completely different kind of that from the singular of those, even being pronounced differently (with /ә/ instead of /æ/). Combining with a preceding noun is just one of its various uses.

The interrupting words here – the probable cause of the computer’s error – are the participle phrase mentioned above. They too are combining with possibility – they are one of various types of grammatical construction that can make a noun into a longer “noun phrase” (see 252. Descriptive Wording after Nouns 1). They seem to have made the computer program “forget” possibility by the time it comes to that, so that the link with learners is made instead.

Other examples of interrupted structures confusing the computer are:

(b) Catholic worshippers in those pre-Vatican II days had to answer in Latin.

(c) Prepositions have to be used in sentences with a noun, often called their “object”, that is usually positioned straight after them.

In (b), the computer wants those to be that because Vatican II is singular. It appears to miss the fact that Vatican II is a noun used like an adjective and hence not the one that that goes with (that/those goes with the last noun in a group – days – just like the). Perhaps the presence of the Roman numeral II is a factor in this computer error, but the interruptedness of the structure may contribute too.

In (c), the computer points out that the relative pronoun that cannot have a comma before it. This is generally true (see 34. Relative Pronouns and Commas), but not here because of the interrupting words often called their object. These words form a parenthesis of the kind that needs to be surrounded by commas, separating that from the word it really partners (noun not object). For other examples of a parenthesis making a comma necessary where one would not normally go, see 50. Right and Wrong Comma Places.

.

2. Ellipsis

Ellipsis is leaving “obvious” words unsaid without breaking any grammar rule. It is considered in some detail within this blog in the post 36. Words Left Out to Avoid Repetition. It appears to give problems to computers just as it can to readers of English. The following ellipsis-containing sentences were all wrongly questioned by the word processor on my computer:

(d) One of these housed the chapel, another a library.

(e) Such an argument is of course highly subjective and thus open to dispute.

(f) Initially, “sweetshop” was a pile of sweets that my father would sometimes buy as a treat and invite each of us to choose from in turn.

The suggested correction of sentence (d) was to remove either another or a. The computer failed to recognise the unmentioned repetition of the verb housed between them, of which another (the pronoun) is the subject and a library is the object. It thought that the writer was trying to use both another (the adjective/determiner) and a with the same noun library – a grammatically impossible combination (see 110. Nouns without “the” or “a”).

The advice for sentence (e) was to add -s to open. The computer thought open was a verb when in fact it is here an adjective – a kind of word that cannot have -s. Most verbs cannot also be used as an adjective, but a significant number can (see 66. Types of Passive Verb Meaning).

The cause of the computer’s error, assuming it “knew” open can be an adjective as well as a verb, was a failure to recognise ellipsis of is, the clue to open’s adjective status. Human readers would know from the way and is used here that is before it must be understood to apply after it as well (see 192. When BE can be Omitted). They would then recognise from the lack of -ing or -ed on open (endings that any verb must have after is) that it was an adjective. This sort of impact of conjunctions often seems lost on computers.

In sentence (f), the computer also wanted -s to be added. Here, although the word in question, invite, is a third-person singular verb, ellipsis explains why it still cannot have -s. The ellipted word is would, a “modal” verb that needs a verb without -s after it (see 148. Infinitive Verbs without “to”). It is recognizable from the earlier verb would … buy, the link again being the conjunction and.

Interrupted structures and ellipsis appear to be two especially important reasons why word processors wrongly analyse grammar. For some others, see 69. How Computers Get Grammar Wrong (2).

67. Numbers in Spoken English

.

Spoken numbers often differ from written ones and are frequently said incorrectly as a result

THE PROBLEM OF SPOKEN NUMBERS

Numbers are fairly common in academic and professional communication, for example in dates, percentages and statistics. They are not usually a problem in writing (though see percent in 201. Words with Complicated Grammar), whereas in speaking and reading aloud there are some characteristic errors that people with a different mother tongue from English often make. The reason in most cases is that full information about saying the number is absent when the number is written in numerical form instead of words.

In this post I wish to set out a number of differences between written and spoken numbers in English so as to highlight the speaking errors that they cause. For information about other common speaking errors, see 91. Pronunciation in Reading Aloud and 243. Pronunciation Secrets.

.

PROBLEMS IN SAYING AND WRITING ENGLISH NUMBERS

1. Alternatives to “zero”

English does not always represent the idea of “nothing” with the word zero. Other possibilities are nought, nothing, none, oh, nil, love and a duck. The choice usually depends on the kind of English or the kind of “nothing” involved. American English would probably use zero for all of the following preferences in British English:
.
ADJECTIVE USE (TEMPERATURE): zero or nought (zero/nought degrees Celsius).
ADJECTIVE USE WITH OTHER NOUNS: zero (Their efforts had zero success/Zero marks were given).
NOUN USE (QUANTITIES): zero or nought or nothing (The answer is zero/nought/nothing).
THE NAME OF THE SYMBOL: a nought or a zero (Write a nought/a zero).
A DIGIT IN A LARGER NUMBER: oh (602 = six-oh-two) or (increasingly) zero.
A SCORE IN FOOTBALL, RUGBY & HOCKEY: nil.
A SCORE IN CRICKET: nought or (occasionally) zero or (informally) a duck.
.
In tennis both American and British English use the word love except in tie-breaks, where zero is preferred.

.

2. Differences between Spoken and Written Dates

Dates – common in newspapers and history descriptions (see 282. Features of History Writing, #15) – are quite often different in speech and writing, and there are also differences between American and British practice.

In American English, the month is given first. Writers may use its name (often abbreviated) or just its number, the punctuation differing in each case, e.g. Dec(ember) 10, 2013 (note the comma) versus 12-10-2013. The rest of a written date is usually just numbers. In American speech, the name of the month is usually preferred to the number, and is combined with number words for the day and year: “December ten, twenty thirteen”. Sometimes the day number has (the) -st, -nd, -rd or -th, e.g. (the) tenth instead of ten.

British English dates begin with the day number. In writing, the month that follows will be either another number, separated by a slash, e.g. 10/12/2013, or a word without any punctuation, e.g. 10(th) Dec(ember) 2013. In speaking, it is customary to put the before the starting day number, -th or similar after it, and then of before the name of the month, with the year last (e.g. “the tenth of December twenty thirteen”).

These differences between speaking and writing are maintained even during reading aloud. In other words, if you are reading aloud and you encounter a date, you should ignore the way it is written and say it according to the appropriate rule for spoken English dates (see 91. Pronunciation in Reading Aloud).
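
The two spoken patterns can be summarised in a short sketch. It is a minimal illustration under stated assumptions rather than a complete date reader: the function names are invented for the example, the American version uses the plain day number (not the optional -th form), and the year is simply read as two pairs of digits, so years such as 2005 are outside its scope.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}


def cardinal(n):
    """Number words for 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def ordinal(n):
    """Ordinal words for 1-99, built by changing the last word."""
    head, hyphen, last = cardinal(n).rpartition("-")
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"   # twenty -> twentieth
    else:
        last += "th"
    return head + hyphen + last


def year_in_pairs(year):
    """Read a year as two pairs of digits (assumption: e.g. 1999, 2013)."""
    first, last = divmod(year, 100)
    if not (10 <= first <= 99 and 10 <= last <= 99):
        raise ValueError("this sketch only reads years like 1999 or 2013")
    return cardinal(first) + " " + cardinal(last)


def spoken_date(day, month, year, british=True):
    """Say a date the British way (day first) or the American way."""
    if not (1 <= day <= 31 and 1 <= month <= 12):
        raise ValueError("day or month out of range")
    if british:
        return f"the {ordinal(day)} of {MONTHS[month - 1]} {year_in_pairs(year)}"
    return f"{MONTHS[month - 1]} {cardinal(day)}, {year_in_pairs(year)}"


print(spoken_date(10, 12, 2013))                 # the tenth of December twenty thirteen
print(spoken_date(10, 12, 2013, british=False))  # December ten, twenty thirteen
```

The function takes the day, month and year separately, because a written form like 10/12/2013 cannot be interpreted without first knowing whether it follows the British or the American order.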

.

3. Grammar of “dozen”, “hundred”, “thousand” and “million”

The main grammatical questions regarding these words are their uses with a, one and -s. In many situations, a is preferred to one, like this:

(a) There were at least a dozen/a hundred/a thousand/a million people present.

One is used instead of a in the following situations:

i.   For emphasis, for example to mean “one not two” (see 263. Uses of “One” and “Ones”)
.
ii.   For stylistic formality (see 46. How to Avoid “I”, “We” and “You”), for example in laboratory instructions:
.
(b) Add one gram of powder.
.
iii. Inside numbers beginning thousand or higher. For example, in 5132 it would be normal to say “one (not a) hundred”, and 3,125,109 would be three million, one hundred and twenty-five thousand, one hundred and nine.
.
iv. At the start of numbers beginning thousand or higher, provided the next digit is not zero. Thus, 1101 would usually be one thousand, one hundred and one, whereas 1001 could start with either a or one.
.
The plural of thousand etc. sometimes has -s and sometimes does not. The choice depends on whether or not there is another number (or a) in front. If there is (e.g. “two dozen” or “five thousand”), -s is absent; but otherwise -s must be used, along with a following of (e.g. “millions of people”).

.

4. Use of “and” in Spoken Numbers

Spoken numbers in American English do not have to have and, but in British English they mostly do when they are greater than 100. Thus, American speakers can say a hundred twenty two, but British speakers would add and after hundred. The British English rule is that and is usually necessary after every mention of the word hundred. But note:
.
(i) It is dropped when the next two digits are 00 (e.g. 6,200 = “six thousand, two hundred”).
.
(ii) It is said after “thousand” when the last three digits of the number are less than 100 (e.g. 5,026 = “five thousand and twenty-six”). If there are no thousands either (e.g. 5,000,047), and will follow million (“five million and forty-seven”).
.
Here are some more numbers to practise with (answers in brackets):
.
1. 7,916 (Seven thousand, nine hundred and sixteen)
2. 56,018 (Fifty-six thousand and eighteen)
3. 4,202,881 (Four million, two hundred and two thousand, eight hundred and eighty-one)
4. 1,020,693 (One million, twenty thousand, six hundred and ninety-three)
5. 26,300,805 (Twenty-six million, three hundred thousand, eight hundred and five)
6. 100,600 (One hundred thousand, six hundred)
7. 300,403,001 (Three hundred million, four hundred and three thousand and one)
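
For readers who like to see a rule written out step by step, the British use of and can be sketched as a short program. This is one possible way of expressing the rules above, not an official algorithm: the function names are invented for the example, numbers are limited to those below one thousand million, and the output is in lower case.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", " thousand", " million"]


def below_hundred(n):
    """Number words for 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def below_thousand(n):
    """Number words for 1-999: 'and' follows 'hundred' unless 00 comes next."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest:
        parts.append(below_hundred(rest))
    return " and ".join(parts)


def british_number(n):
    """Say a whole number below one thousand million the British way."""
    if not 0 <= n < 10**9:
        raise ValueError("this sketch covers 0 to 999,999,999 only")
    if n == 0:
        return "zero"
    text = ""
    for index in (2, 1, 0):                 # millions, thousands, the rest
        value = n // 1000**index % 1000
        if not value:
            continue
        chunk = below_thousand(value) + SCALES[index]
        if not text:
            text = chunk
        elif index == 0 and value < 100:
            text += " and " + chunk         # rule (ii): last three digits < 100
        else:
            text += ", " + chunk
    return text


print(british_number(56018))      # fifty-six thousand and eighteen
print(british_number(300403001))  # three hundred million, four hundred and three thousand and one
```

Applied to the practice numbers above, the sketch gives the same wording as the answers in brackets, apart from the starting capital letter.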

.

5. Pronunciation of “and”

When and is spoken, it should be fast and weak (i.e. without stress – see 125. Stress and Emphasis). In other words, the “a” vowel must be pronounced not /æ/ but /ə/, and might even disappear altogether. The final /d/ is often not said either. Possible pronunciations of and are thus /ənd/ or /nd/ or /ən/ or even /n/. For more, see 144. Words that are Often Heard Wrongly.

.

6. Spoken Telephone Numbers

British English speakers say telephone number digits individually – not grouped into pairs. Thus, 0801 849 1765 is said “oh eight oh one, eight four nine, one seven six five”. The spaces/commas indicate major divisions within the number (e.g. between country and area codes), where there should be a pause in speaking. The only numbers that need not be pronounced individually are -00 at the end of the first division. For example, 0800… is likely to be pronounced “oh eight hundred” rather than “oh eight oh oh”.

Quite often today, oh in British telephone numbers is replaced by zero, probably out of a fear that oh might be mistaken as a letter.
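
The British way of saying telephone numbers can be sketched as follows. The function name and the zero_word option are assumptions made for the illustration; the written number is expected to have its divisions separated by spaces, and each comma in the output stands for a spoken pause.

```python
DIGIT_NAMES = {"0": "oh", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight",
               "9": "nine"}


def spoken_phone(number, zero_word="oh"):
    """Say a space-divided telephone number digit by digit."""
    names = dict(DIGIT_NAMES)
    names["0"] = zero_word                  # "oh" or, increasingly, "zero"
    spoken_groups = []
    for position, group in enumerate(number.split()):
        if not all(ch in names for ch in group):
            raise ValueError("only digits and spaces are expected")
        if position == 0 and len(group) > 2 and group.endswith("00"):
            # -00 at the end of the first division can be said "hundred"
            words = [names[ch] for ch in group[:-2]] + ["hundred"]
        else:
            words = [names[ch] for ch in group]
        spoken_groups.append(" ".join(words))
    return ", ".join(spoken_groups)         # comma = pause between divisions


print(spoken_phone("0801 849 1765"))
# oh eight oh one, eight four nine, one seven six five
print(spoken_phone("0800 849 1765"))
# oh eight hundred, eight four nine, one seven six five
```

Calling spoken_phone("0801 849 1765", zero_word="zero") gives the alternative with zero in place of oh.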
.

7. Decimal Numbers

Where some languages use a comma to separate a whole number from a decimal fraction, English uses a full stop, e.g. 39.256, but gives it the special name “decimal point”.

When a decimal number is spoken, the decimal point is indicated by the word “point”. Before it, one can say the number either in the normal way (“thirty-nine …”) or digit by digit (“three nine …”). From the decimal point onwards, however, only digit-by-digit enunciation is possible (“… point two five six”). Countable nouns used after a decimal number usually need the plural form, for example “nought/zero point five litres” (see 204. Grammatical Agreement, #2).
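
The two ways of saying a decimal number can be sketched like this. The function name and its options are invented for the example, the "normal way" is only covered for whole parts up to 99, and using the same word (nought by default) for every zero, including zeros after the decimal point, is a simplifying assumption.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def _all_digits(text):
    return bool(text) and all(ch in "0123456789" for ch in text)


def spoken_decimal(text, digit_by_digit=False, zero_word="nought"):
    """Say a decimal number written with a decimal point, e.g. '39.256'."""
    whole, point, fraction = text.partition(".")
    if not (point and _all_digits(whole) and _all_digits(fraction)):
        raise ValueError("expected digits, a decimal point, then digits")

    def digit_name(ch):
        return zero_word if ch == "0" else ONES[int(ch)]

    if digit_by_digit:
        before = " ".join(digit_name(ch) for ch in whole)
    else:
        n = int(whole)
        if n >= 100:
            raise ValueError("this sketch says whole parts up to 99 only")
        if n == 0:
            before = zero_word
        elif n < 20:
            before = ONES[n]
        else:
            tens, ones = divmod(n, 10)
            before = TENS[tens] + ("-" + ONES[ones] if ones else "")

    # After the point, only digit-by-digit reading is possible.
    after = " ".join(digit_name(ch) for ch in fraction)
    return f"{before} point {after}"


print(spoken_decimal("39.256"))                       # thirty-nine point two five six
print(spoken_decimal("39.256", digit_by_digit=True))  # three nine point two five six
print(spoken_decimal("0.5"))                          # nought point five
```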

.

8. Distinguishing between “-teen” and “-ty”

It is a common misconception that the main difference between pairs like 15 and 50 is the /n/ sound at the end of the former. In fact, the preceding vowel is probably the main differentiator, since the pronunciation of “-n” is often weakened.

A common cause of the /n/ sound being unreliable is the influence of a following consonant. For example, before /b/ and /p/ (e.g. 15 people), the “n” of -teen is likely to be pronounced as a barely noticeable /m/ (see 243. Pronunciation Secrets, #1). The reason why English speakers find the preceding /i:/ sound more helpful is that to their ears the difference between it and the weaker /ɪ/ of 50 is very strong and noticeable. For a further important contrast that depends on /i:/ versus /ɪ/, see 144. Words that are Often Heard Wrongly, #8.

.

9. Arithmetical Symbols

The basic symbols + (plus), − (minus), x (multiplied by or times), ÷ (divided by), = (equal/s) and / (slash) are the most likely to be encountered. Note that the first syllable of minus is pronounced like my, not me (see the end of 86. The Pronunciation of “e” and “i”, #G). For more about plus, see 236. Tricky Word Contrasts 9, #7. For more about equal/s, see 231. Confusions of Similar Structures 3, #4.

Fractions are describable in two different ways. Say the top number and then either over + the bottom number, or just the bottom number + -ths. Thus, 3/7 is either three over seven or three sevenths. The -th option applies to all fractions except ½ , ¼ and ¾, which have half or quarter(s) instead.
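
Both ways of saying a fraction can be produced by a short sketch. It is an illustration only: the function names are invented, the numbers are limited to 1-99, a top number of 1 is said as one, and it is assumed that the final -s is dropped when the top number is 1 and that denominators ending in 1, 2 or 3 take their usual ordinal forms (e.g. thirds) rather than a literal -ths.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}


def cardinal(n):
    """Number words for 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def ordinal(n):
    """Ordinal words for 1-99, built by changing the last word."""
    head, hyphen, last = cardinal(n).rpartition("-")
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return head + hyphen + last


def spoken_fraction(top, bottom):
    """Return the 'over' form and the '-ths' form of a fraction."""
    if not (1 <= top <= 99 and 2 <= bottom <= 99):
        raise ValueError("this sketch covers tops 1-99 and bottoms 2-99")
    over_form = f"{cardinal(top)} over {cardinal(bottom)}"
    if bottom == 2:
        part = "half" if top == 1 else "halves"
    else:
        part = "quarter" if bottom == 4 else ordinal(bottom)
        if top != 1:
            part += "s"
    return over_form, f"{cardinal(top)} {part}"


print(spoken_fraction(3, 7))  # ('three over seven', 'three sevenths')
print(spoken_fraction(3, 4))  # ('three over four', 'three quarters')
print(spoken_fraction(1, 2))  # ('one over two', 'one half')
```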