The Format Puzzle: Overcoming micro- and macrostructural variations in place-name editions in place-name databases

Bo Nissen Knudsen

The first volume in the printed place-name series Danmarks Stednavne (Place-names of Denmark) was published in 1922, 12 years after the establishment of Stednavneudvalget (the Place-Name Commission) in 1910. Volume 26 is due to be published in 2013, and still only about two thirds of the area of Denmark is covered by the printed edition. In light of this, establishing a comprehensive web-database of Danish place-names in three years seems an almost cocky ambition.

The web-database now published at www.danmarksstednavne.dk obviously draws heavily on the printed edition, which has been digitised through scanning and human-assisted character recognition. But the century-long effort of publishing in printed form has created a series of challenges for strict database integration, first of all variations in microstructure that make parsing the entries into information categories (i.e. database fields) quite difficult.
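To illustrate why microstructural variation complicates parsing, the following is a minimal sketch, not the project's actual parser. It assumes two invented layouts for the same information (a dated source form written either as "1450 Testby" or as "Testby (1450)"); the layouts, the name `Testby`, and the `Attestation` fields are hypothetical and are not taken from the printed edition.

```python
import re
from typing import NamedTuple, Optional


class Attestation(NamedTuple):
    """One dated source form; the field names are hypothetical."""
    year: int
    form: str


# Two invented layouts carrying the same information.
# Real volumes may vary in quite different ways.
PATTERNS = [
    re.compile(r"^(?P<year>\d{4})\s+(?P<form>.+)$"),       # "1450 Testby"
    re.compile(r"^(?P<form>.+?)\s+\((?P<year>\d{4})\)$"),  # "Testby (1450)"
]


def parse_attestation(text: str) -> Optional[Attestation]:
    """Try each known layout in turn; return None if none fits."""
    text = text.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return Attestation(int(match.group("year")),
                               match.group("form").strip())
    return None  # left for manual inspection
```

Each additional layout in the printed volumes would need its own pattern in such a scheme, and strings that match none of them have to be handled by hand.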

A challenge which might seem even greater, though, is the macrostructural variation. As mentioned, about one third of the country is not covered by the printed edition at all, and the areas that are covered follow shifting principles for the selection of names. To overcome this challenge, the printed series has been supplemented by an official cadastral name database as well as a database of medieval settlement names.

In order to obtain some consistency, new information categories such as name generics have been added. Geo-coding from the cadastral database has been applied to the names from the printed series where possible, and a web-tool for manual geo-coding of the remaining names has been developed.
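The transfer of geo-coding from the cadastral database to the printed names could be sketched roughly as follows. This is an illustration under stated assumptions, not the method actually used: it assumes that names are matched on a normalised (name, parish) key, and the function names, record shapes, and sample values are all hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]   # (latitude, longitude)
Key = Tuple[str, str]         # (name, parish)


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace so trivial variants match."""
    return " ".join(text.casefold().split())


def geocode(printed: List[Key],
            cadastral: Dict[Key, Coord]) -> Tuple[Dict[Key, Coord], List[Key]]:
    """Copy coordinates to printed names that have a cadastral match.

    Returns the matched names with coordinates and the list of names
    that remain for manual geo-coding.
    """
    # Index the cadastral records by normalised key (assumed matching rule).
    index = {(normalise(n), normalise(p)): c
             for (n, p), c in cadastral.items()}
    matched: Dict[Key, Coord] = {}
    manual: List[Key] = []
    for name, parish in printed:
        coord = index.get((normalise(name), normalise(parish)))
        if coord is None:
            manual.append((name, parish))
        else:
            matched[(name, parish)] = coord
    return matched, manual
```

In this sketch, names that find no cadastral counterpart end up in the `manual` list, which corresponds to the remainder handled by the manual geo-coding tool.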