When m is m

A common OCR-induced error is to misread in as m, as happened in the original OCR text of Brougham_Carnot_Defense_ER_25_1815, line 347:

who would have preferred death to any place m Robespierre's Committee,—and, for

It would be nice to think that we could autocorrect every isolated, lowercase m to in. However, there are genuine uses of a lone m, such as this one taken from Barrow_Humboldt_American_Researches_QR_15_30_1816, describing the sounds and letters of Portuguese (or Portugueze, as Barrow spells it), line 319:

alphabet, is, in ours, equivalent to sh, and that of m to ng ; so that,

Or this example, on a humming sound emanating from a waterfall as described in Southey_Lewis-Clarke_American_Travels_QR_12_24_1815, line 1728:

upon the letter m; for otherwise Timm looks as little like the

Simple, blanket autocorrection is not the answer here…

Rather, we need to consider context. Building up a library of suitable examples is beyond the scope of our current work, though this project does contribute to a library of curated contextual examples of the use of an isolated m in texts.

Even then, it is questionable how accurate such an automatic correction system could ever be, given the range of contexts in which an isolated m could be valid. Therefore, a semi-automated approach, in which the change is highlighted for manual review rather than made automatically, may well be preferable.

This approach is still beneficial, for the dumb computer will relentlessly identify every isolated m in a text, whereas the intelligent human reviewer, reading the text, may well ‘autocorrect’ the m into in because that is what they expect to see, and so miss the need to correct the mistake.
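As a rough illustration of the highlight-for-review idea, here is a minimal Python sketch. It is not the tooling used in this project: the function name flag_isolated_m, the [[m]] marker and the rule for skipping contractions such as I'm are all assumptions made for the example. It reports each isolated, lowercase m with its line number and column and leaves the decision to a human.

```python
import re

# A lowercase m standing alone: not adjacent to a letter, digit, underscore
# or apostrophe. The apostrophe check is an assumption of this sketch, added
# so that contractions such as "I'm" are not flagged.
ISOLATED_M = re.compile(r"(?<![\w'’])m(?![\w'’])")


def flag_isolated_m(lines, start=1):
    """Yield (line_number, column, marked_line) for every isolated m.

    Nothing is corrected: the m is only wrapped in [[ ]] so that a
    reviewer can decide whether it should become "in".
    """
    for number, line in enumerate(lines, start=start):
        for match in ISOLATED_M.finditer(line):
            marked = line[:match.start()] + "[[m]]" + line[match.end():]
            yield number, match.start(), marked


# Excerpts from the three lines quoted in this post (the first is truncated).
sample = [
    "who would have preferred death to any place m Robespierre's Committee,",
    "alphabet, is, in ours, equivalent to sh, and that of m to ng ; so that,",
    "upon the letter m; for otherwise Timm looks as little like the",
]

for number, column, marked in flag_isolated_m(sample):
    print(f"{number}:{column}: {marked}")
```

All three excerpts are flagged, including the two where the m is legitimate, which is exactly why the output is a review list rather than a set of corrections. Words that merely contain the letter, such as Timm, are left alone.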
