Digitisation

Platform and Interface(s)

The BLT19 Project website was built using WordPress and a variety of free plugins. While the platform has its limitations in terms of asset size and management, its flexibility, cheapness and ease of use made it suitable for a project such as this one, inspired by the “poor art” of the 1970s – we want to show what can be done with really cheap resources and give as much bang for the buck as possible.

One of the key things we want to do is to show not quite that “the medium is the message” (as Marshall McLuhan famously wrote) but, in a very clear way, how strongly the medium influences the message. This is why we give users more than one way to interact with and download the materials, including embedded and downloadable PDFs, downloadable single-page images, and, not least, our own words that contextualise and thereby alter the meanings and uses of the original paper copies that circulated every week or every month. In many cases we also offer an alternative interface (Metabotnik, a project funded by the Netherlands Organisation for Scientific Research), which allows users to zoom and browse a single, high-quality image of the entire periodical run (more on all these below).

By offering different kinds of remediated versions of paper periodicals, we wanted to show in a very practical, simple, iterable way how easy it is to create different stories with different technologies. The stories we are inclined to tell using a single image, which we can copy and paste into a document along with others we choose, are very different from those suggested by the PDFs of single issues, where the sequence of pages and images is fixed unless we make a concerted effort to change it, and different again from those suggested by the huge sequence of pages we can see in Metabotnik.

Scanning the periodicals

The periodical images on the BLT19 site were initially scanned by Hollingworth and Moss, a commercial print and digitisation provider. They used non-destructive scanning technology that automatically turns pages and does not require de-binding of volumes. A variety of file formats was produced, including JPEG, TIFF, and PDF. Since 2021, we have invested in a domestic desktop scanner, a Fujitsu ScanSnap SV600, and generated our scans with that. This means we have to spend our own time scanning twice (once to PDF and once for JPEGs) but the costs are much reduced and we can manually adjust files to demonstrate the points we want to make. The hardware came with software which allows the editing of PDFs and JPEGs as well as updated Abbyy FineReader OCR software (see below).

Images on the screen

Obviously, paper texts are very different from electronic ones. You can’t do the kind of thing shown in the video so easily with an electronic text – and what would be the equivalent of a fold-out?

Paper Periodicals: dynamic, fragile, easy to browse, harder to search, laid out in a hierarchy to try to control our bodies so that we’ll look at some things more than others.

Listen to the sound the paper makes and imagine the pressure of the paper on fingers and hands, its textures and smell (maybe you will now pay attention to the resistance of the computer keyboard and to its clicks – very different from paper). And look how very fragile, how tearable the copies are – at least of some of the cheap periodicals. Electronic copies don’t last for ever either, but unless you hack into sites it’s harder to destroy them (unless they are your own and you forget to save them, which we have done all too often).

What’s even less clear from the screen is the size of the pages of each periodical. Who would guess from the website alone how big the British Workman is compared to other periodicals? Seeing the images online just doesn’t generate the haptic shock of the paper.

The *British Workman* is 41cm tall; the *Swan Lane Gazette* and the *Caterer* are 25cm, while the *Teachers’ Assistant* is just 19cm.

The different sizes of the different periodicals suggest different kinds of use and users: the huge British Workman front-page print begs to be displayed on the wall; the humble Teachers’ Assistant can be put away into a bag after a quick swot up before a lesson, while the other two are designed for longer periods of perusal – for a reader who wants to find out the latest ways of making a profit by titillating customers’ appetites in new ways for example. It’s not really possible to understand these distinctions as quickly in digital reproductions as we do when we have a paper copy in our hands.

The different qualities of paper also suggest to the reader a different attitude to the contents – cheap paper = low status; expensive paper implies the opposite. Many trade periodicals were printed on cheap paper that disintegrates all too easily and quickly. Very often that means that they do not survive. You can see examples in the video above (which contrasts a journal for hairdressers with one directed to railway engineers) but even in the two images below, one of a journal for women homeworkers desperate to make a penny and the other for pharmacists, the difference paper quality makes is very obvious:

It’s not hard to tell even from these electronic images how one is very delicate, frayed at the edges and torn, while the other is still robust after 150 years.

While the paper quality can tell us a lot about how long the information it carries was supposed to last and how wealthy its purchasers were, it is perhaps even more important to think about not just the quality of paper but the very general question of what kind of knowledge and what stories paper enables us to tell that other technologies don’t. At its most basic, the stories we tell with our mouths are not the same as what we write. There is a lot of research on this kind of thing and this is not the place to explore it, though we have more material about the topic here.

One contrast between paper and digital platforms that we do want to address here, though, is the difference between how we search for information in paper documents and how we search in electronic ones.

Online it seems very easy to find a word – we are so used to telling our digital assistants or search engines to “find BLT19” and so on. By contrast, it’s a lot of boring hard work to do a word search in paper texts. That said, if we are talking not about films or recorded sound (music or recordings of speech) but only about written words, we seem to prefer (and take in better) long-form narratives in paper format rather than online – that applies to those brought up as “digital natives” as much as those of us who were brought up on analogue media.

So how do we search in a paper publication? The most obvious thing to do is to use the Index and the Contents list. These really can help searches – but they aren’t word searches as such: they are based on what the indexer – a real person – thinks is important. There are several indexes and contents lists in the periodicals and it’s interesting to reflect on the choices made by the human beings who constructed them compared to what a computer might do with the same material. The indexes and contents lists may seem simple and not worthy of thought, but actually they result from unspoken stories and are intended to create more stories by highlighting certain elements and downplaying (or omitting) others. For example, where in the contents lists on this site will you find any concern for ecological sustainability? Nowhere, we think. Today, though, a journal devoted to the trades would very likely have a section on it.
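The contrast between a human-made index and a word search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three "pages" and the index headings are invented for the example and are not taken from any periodical on this site, and the function names `word_search` and `index_lookup` are our own.

```python
import re

# Invented toy "periodical": page number -> text. Not from any real issue.
pages = {
    1: "Advice to butchers on the price of beef and mutton this season.",
    2: "A letter on wages, and a notice of the annual trade dinner.",
    3: "Prices of mutton in the London markets; a note on apprentices.",
}

# A human-made index: the indexer decides which headings exist at all.
human_index = {
    "Markets": [3],
    "Wages": [2],
}

def word_search(pages, word):
    """Return the sorted page numbers whose text contains `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sorted(number for number, text in pages.items() if pattern.search(text))

def index_lookup(index, heading):
    """Return the pages listed under `heading`, or an empty list if there is no such entry."""
    return index.get(heading, [])

# The word search finds every literal occurrence of "mutton".
print(word_search(pages, "mutton"))
# The index has no "Mutton" heading, because the (imaginary) indexer did not think it mattered.
print(index_lookup(human_index, "Mutton"))
# The index does send us to the page the indexer filed under "Markets".
print(index_lookup(human_index, "Markets"))
```

The word search is exhaustive but has no sense of what matters; the index is selective, and what it leaves out is invisible unless we go looking for it another way.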

With all the above in mind, therefore, it won’t surprise you to know that we have been very careful indeed in how we have digitally reproduced paper periodicals on the site.

We have used four main kinds of image reproduction on this site with the express intention of helping the user to reflect on the advantages and disadvantages of each and above all, to think about what kinds of knowledge – what kinds of conclusions – we tend to draw when the software changes, even if the digitised materials themselves don’t.

various stages of zooming in the *British Workwoman* metabotnik.

1. One of the most spectacular ways we chose to display the periodicals is via the ZOOM & BROWSE INTERFACE Metabotnik, software developed by researchers at the University of Amsterdam and the publishers Brill, and funded by the Netherlands Organisation for Scientific Research. We got in touch with them early in the process of developing this site (it was quite early in their lifetime too). Besides the real pleasures metabotniks can give (we won’t pretend we don’t love playing with the zoom), we are also concerned to reflect on what we can learn from this kind of reproduction that we can’t from the others. What kind of “searches” can we perform? It is also technically much less work to generate a metabotnik than the forms that follow, and in a few cases we have decided to make a metabotnik available before uploading the rest.

the PDF followed by JPEGs of individual pages of the *Meat Trades Journal*

2. In reproductions of separate or single numbers we have – where feasible – provided JPEG images of individual pages as well as…

3. PDFs of the whole issue. PDFs connect the pages together and give a different experience from disconnected individual pages. They also suggest ready-made contexts for individual images and texts. We wanted to give both JPEGs of individual pages and PDFs of whole issues – where possible – because they serve different purposes. For a start, it’s easier to copy and paste a single JPEG image of a page into another website, article or story than to do the same from a PDF, where you have to copy the page, the segment of text or the image. The quality of the image is much sharper too. Yet what are the implications of reading a single page or a single article isolated from the rest of the periodical? It might be that readers of a single page get completely the wrong idea (I’m afraid a lot of people who just do word searches in documents without looking at the wider context do just that).

4. The 4th form we use consists of uncorrected OCR of the PDFs: that is complicated so we have a separate section below on it.

5. There is also a 5th method used just for issue 1 of the British Workman: a kind of “tagging” of sections that many digital projects use. It proved too time-consuming to continue given its likely usefulness: however interesting it was, its cost-benefit ratio was just too poor for the kind of low-cost-maximum-return enterprise we are.

OCR

OCR (optical character recognition) is an automated process whereby a computer translates squiggles (letters, numbers, words etc) in one text into versions of those squiggles in another that a computer can recognise when we search for them.

A lot of the time the OCR is “invisible” under the image we see on the screen but – with a lot of luck – a Google search might pick it up. We have given an example of what the OCRd text of a Victorian periodical looks like here in our digitisation of the first issue of the British Workman. It’s immediately obvious how very inaccurate it is. While the accuracy of OCR is constantly being improved (and has certainly improved since the National Library of Australia’s 2009 report), the claims made by commercial companies to 100% accuracy only apply to clean, crisply-printed recent texts: certainly not nineteenth-century newspapers and periodicals!
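One common way of putting a number on OCR accuracy is to count how many single-character edits separate the OCR output from a hand-corrected transcription. The sketch below is a minimal illustration of that idea, not the method used on this site: the function names are our own, and the sample strings are invented to mimic typical OCR confusions (h read as b, t as l, m as rn) rather than copied from our actual OCR output.

```python
def edit_distance(a, b):
    """Levenshtein distance: the fewest single-character insertions,
    deletions or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

def character_accuracy(truth, ocr):
    """Share of the true text recovered: 1 minus (edits / length of truth), floored at 0."""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1 - edit_distance(truth, ocr) / len(truth))

# Invented example strings, for illustration only.
truth = "The British Workman"
ocr = "Tbe Brilish Workrnan"

print(edit_distance(truth, ocr))
print(round(character_accuracy(truth, ocr), 2))
```

Even a handful of wrong characters per line is enough to stop a title or a name from being found by an exact search, which is why a figure that sounds high can still mean a text that is hard to search.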

The OCRd text on some of the issues of the British Workman on the BLT19 site was produced using Abbyy FineReader Pro for Mac software. While FineReader can run automatic scans, it also allows manual scan ordering, which was essential when it came to the British Workman’s inconsistent layout. Because of the pilot project’s limited timeframe and very limited budgets, the OCRd text has either been only lightly corrected to minimise inconsistent line breaks or not corrected at all. In more recent uploads (since 2021) we have used updated versions of Abbyy FineReader, with the OCRd text encoded into the PDF where this was possible.

Leaving the incorrect text visible, as we have done in some issues, shows very clearly how inaccurate OCR generally is when applied to nineteenth-century periodicals, even in 2021. That, of course, has huge implications for what we can find when we search online: we can only tell stories about the world based on what the software can find for us. We do not necessarily need wilful interference by someone to miss or misinterpret important information: it may well be that the software has simply not recognised what we really needed to know and has given us a wrong answer instead.
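To see how a search can silently miss a word that is plainly there on the page, here is a small sketch comparing an exact match with an approximate ("fuzzy") match against garbled OCR text. It uses Python’s standard `difflib` module; the garbled line is invented for the example, and the 0.75 similarity cutoff is an arbitrary assumption, not a recommended setting.

```python
import difflib

def exact_hits(words, query):
    """Words identical to the query, ignoring upper and lower case."""
    return [word for word in words if word.lower() == query.lower()]

def fuzzy_hits(words, query, cutoff=0.75):
    """Lower-cased words that are merely similar to the query (similarity >= cutoff)."""
    lowered = [word.lower() for word in words]
    return difflib.get_close_matches(query.lower(), lowered, n=5, cutoff=cutoff)

# Invented line of uncorrected OCR, for illustration only.
ocr_words = "Tbe Brilish Workrnan for January".split()

# The exact search finds nothing, although "Workman" is printed on the page.
print(exact_hits(ocr_words, "Workman"))
# The fuzzy search recovers the garbled form.
print(fuzzy_hits(ocr_words, "Workman"))
```

Fuzzy matching is no cure: lower the cutoff and it starts returning words that have nothing to do with the query, so it swaps one kind of wrong answer for another.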

That means we need to be very careful with our digital searches, even before we start to think about whether the answers and stories we are given are deliberately skewed in some way!

We must never forget that the stories we make about our histories and ourselves are only as accurate as the information and stories we can find – or think to look for.

BLT19: AK with huge thanks to AMH
