Raw text corpus

To supplement our curated corpus of 85 articles drawn from the Edinburgh Review and the Quarterly Review, we have published the raw texts from which that corpus was prepared.

OCR is typically imperfect, especially on older texts. This collection provides the uncorrected raw text, which can be set
against the project’s curated corpus; together, the two can be used to develop and evaluate new programmatic correction
techniques.
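As one illustration of such an evaluation, a correction technique could be scored by how far its output remains from the curated text. The sketch below computes a character error rate (CER): the Levenshtein edit distance between a candidate text and the curated reference, divided by the length of the reference. CER is a common OCR metric but is not prescribed by the project, and the function names are hypothetical. It also assumes the raw and curated versions of an article have already been loaded and aligned as plain-text strings; the file layout of the published data is not described here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming)."""
    # Keep only the previous row of the DP table to save memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(candidate: str, reference: str) -> float:
    """CER of a candidate text (e.g. raw or machine-corrected OCR),
    treating the curated text as the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(candidate, reference) / len(reference)


if __name__ == "__main__":
    # Hypothetical snippet: "Tbe" and "Revievv" are typical OCR misreadings.
    raw = "Tbe Edinburgh Revievv"
    curated = "The Edinburgh Review"
    print(character_error_rate(raw, curated))
```

Here the raw snippet needs three edits (b→h, v→w, and deleting one v) to match the 20-character reference, giving a CER of 0.15; a correction technique that lowers this figure across the corpus would be an improvement. The pure-Python distance is quadratic in text length, so for full-length articles it may be more practical to compare page by page or to use an optimised edit-distance library.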

The raw text corpus’ DOI is 10.21954/ou.rd.7176377.
The curated corpus’ DOI is 10.21954/ou.rd.6850865.

Both corpora may be downloaded from the project’s online data site and are freely available for reuse under a CC BY-SA 4.0 licence.