Family Tree Magazine

January 2006

British Data Archive, now the largest publisher of census material on CD, recognised early on that few of us were lucky enough to have the census material we needed available locally, which greatly hampered our research. Buying a reader and microfilm from The National Archives (TNA) is too costly for most people, especially when they already own a computer, so publishing on CD was an obvious move.

British Data Archive embarked on a major project to digitise all the English and Welsh census returns and make them widely available to researchers around the world, for a one-off cost equivalent to a couple of research trips.

Digitising the images onto CD to sell at an affordable price was a large undertaking for a small company. It required not only the purchase of microfilm (TNA doesn’t loan film; you have to buy it) and specialist equipment, but also additional staff. British Data Archive committed themselves to completing the entire project, covering all areas and all years, although it was initially decided to omit the 1881 Census, which had long been available in transcript form. The images for the London 1881 Census have now been digitised and the CD will be available shortly.

Coverage

Money raised from each CD release has been reinvested in the business to buy more film records and pay the extra staff needed to complete the digitisation of all years in a sensible timescale. With the 1861, 1871 and 1891 Censuses already completed, all counties now have a minimum of three census years available. In fact half the country has four years completed, and all the years are available for the largest areas (London, Lancashire and Yorkshire), plus seven other counties. Presently about 70 percent of the 1851 and 1841 Censuses have been completed, and only the smaller counties remain to be done. They should appear fairly rapidly, completing the digitising project.

Image quality

Much care is taken to ensure the highest image quality possible. First the original TNA films are scanned, then processed to optimise the image quality by de-skewing, despeckling, resizing and generally cleaning up the images. The images are then converted into portable document format (PDF) files, usually containing one census piece per PDF file. These are bookmarked and TNA place name and street indexes are added. Every image is then checked for readability and a report made of any poor ones. These are individually re-scanned to obtain the best results, which often requires scanning in greyscale (rather than black-and-white) to capture the maximum amount of detail. If the images are still poor, new copies of the pages are ordered from TNA and re-scanned. The bad images are then replaced in the PDF files, which are then ready for burning onto CD.
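For readers curious about what a clean-up step such as despeckling involves, here is a minimal sketch in Python. It is not British Data Archive's software, whose workings are not described here. It assumes a black-and-white page held as rows of 1s (ink) and 0s (paper), and it treats any ink pixel with no ink in the eight surrounding pixels as a speck. The function name `despeckle` and the tiny sample page are invented for illustration.

```python
def despeckle(image):
    """Remove isolated black pixels from a 1-bit image.

    `image` is a list of rows; 1 = black (ink), 0 = white (paper).
    A black pixel with no black pixel among its eight neighbours
    is treated as a speck and cleared. The input is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # work on a copy
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Look at the surrounding pixels, staying inside the page edges.
            has_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


# Hypothetical 4 x 5 pixel "page" for demonstration only.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # a lone speck
    [0, 0, 0, 1, 1],  # part of a pen stroke
    [0, 0, 0, 1, 0],
]
cleaned_page = despeckle(page)
```

In this example the lone pixel in the second row is cleared while the connected pen stroke is left alone. A rule this simple would also erase genuine single-pixel marks such as faint full stops, which is one reason poor pages still need checking by eye and re-scanning in greyscale.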

The filmed image quality of the later census years generally doesn’t present problems, but the 1841 Census books were written in pencil and the paper has discoloured. This makes the entire census difficult to digitise, not just the odd page, so all the census films are obtained on silver halide film, rather than the cheaper diazo films, which don’t retain as much detail. The silver halide films are then scanned as greyscale images, for the best possible image quality.

Indexes

Finding your ancestors in the enumerators’ books can be a challenge, even if you have the entire county set. Name indexes make the task easier, but except for the 1881 Census transcript and some census name indexes produced by family history societies (mainly for the 1851 Census), these weren’t available when the CD sets first started being produced. The initial answer was to create a volunteer indexing project, with owners of the CD sets each transcribing a portion. This was successful and is ongoing, but progress is fairly slow. The decision to provide census images and transcripts online offered a way to speed up the process. The initial transcription work is done in India, potential problems are then verified by volunteers, and much in-house checking and correction is done to the resulting transcripts before they are ready for use.

In order to create name indexes to use with census CDs, custom-written software tools check transcripts for high and low occurrences of unusual names. These entries are flagged for checking and correction by the experienced in-house indexing team. Once corrected, they are ready for use in the census name indexes. This is only the first stage in the process of ensuring the accuracy of the full transcripts, which are later published online. They have to go through several more stages of software checking, and at least two stages of manual checking of the anomalies that these reveal.
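The idea of flagging unusual names by how often they occur can be illustrated with a short Python sketch. This is an assumption about how such a check might work, not a description of the in-house tools. It covers only the low-occurrence case: names that appear no more than `max_count` times in a batch are listed for a person to review. The function name `flag_rare_names`, the threshold and the sample forenames are all hypothetical.

```python
from collections import Counter


def flag_rare_names(names, max_count=1):
    """Return the names that occur at most `max_count` times.

    Names are compared after trimming spaces and normalising case,
    so "john" and "John" count as the same name. Rare names are not
    necessarily wrong; they are simply candidates for manual checking.
    """
    counts = Counter(name.strip().title() for name in names)
    return sorted(name for name, n in counts.items() if n <= max_count)


# Hypothetical forenames from a transcribed batch.
forenames = ["John", "Mary", "john", "Wm", "Mary", "Jhon", "John"]
flagged = flag_rare_names(forenames)
```

Here "Jhon" (a likely mistyping of John) and "Wm" (a genuine abbreviation of William) are both flagged, which shows why the flagged entries still need an experienced indexer to decide which are errors.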

www.thegenealogist.co.uk

Once the name indexes have been created, they are made available as online subscriptions or on CD. The CD indexes are fully searchable using a combination of surname, forename and age, with wildcard search options available. Online subscription indexes available on www.thegenealogist.co.uk offer further search facilities including nickname and variant name searching.
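As a rough illustration of how a combined surname, forename and age search with wildcards can work, the Python sketch below uses the standard library's `fnmatch` module, in which `*` stands for any run of characters and `?` for a single character. It is not the search engine used on the CDs or the website. The `search` function, the record layout and the sample entries are invented for the example.

```python
from fnmatch import fnmatchcase


def search(index, surname="*", forename="*", age=None):
    """Return index entries matching the surname and forename patterns.

    Patterns may contain the wildcards * and ?. Matching ignores case.
    If `age` is given, only entries with exactly that age are returned.
    """
    results = []
    for entry in index:
        if not fnmatchcase(entry["surname"].lower(), surname.lower()):
            continue
        if not fnmatchcase(entry["forename"].lower(), forename.lower()):
            continue
        if age is not None and entry["age"] != age:
            continue
        results.append(entry)
    return results


# Made-up index entries for demonstration only.
index = [
    {"surname": "Thompson", "forename": "William", "age": 34},
    {"surname": "Thomson", "forename": "Wilfred", "age": 34},
    {"surname": "Thomas", "forename": "Ann", "age": 60},
]
matches = search(index, surname="Thom*son", forename="Wil*", age=34)
```

The pattern "Thom*son" picks up both the Thompson and Thomson spellings, while Thomas is excluded. Nickname and variant-name searching, as offered online, would need a lookup table of equivalent names in addition to wildcards.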

British Data Archive have succeeded in providing users with good quality census images on readily available and affordable CDs. They are now following this achievement with a project to provide census indexes and transcripts of the highest accuracy available to complement them.