Germanic Lexicon Project
Message Board


Author: Sean Crist (Swarthmore College)
Email: kurisuto at unagi dot cis dot upenn dot edu
Date: 2004-10-15 16:10:27
Subject: Re: Bold I & II in BT Entries

> The Roman numerals that designate sections in BT (under entries with more than one section) are rarely marked with bold tags in the txt, although they appear to be bold. Is it only my imagination, or do you agree? In the scanned pages, of course, these sections have the added advantage of clear spacing, but in the txt they will have only a single HTML space (unless nbsp were added).
>
> -Matthew

Matthew,

This is a very good question. You've raised an issue that I wasn't aware of. I actually hadn't noticed before that those numerals were bold, but I just went and looked at the pages, and you're definitely right. Thanks for pointing this out.

I see that BT use capital Roman numerals to number multiple definitions under one headword. They use lowercase Roman numerals in abbreviations such as Lchdm. iii. to mean "the third volume of the Leechdoms". It looks like only the upper-case ones are bold. Is that what you observed also?

So the question is what to do about it. I see three choices:

1. Mark them as bold now. If we do this, then I should do a global replacement soon to add bold tags to capital Roman numerals, to cut down on hand-correction work. However, this particular global change would be tricky. For example, in the abbreviation L. I. P., the I. should not be made bold, because it's not a Roman numeral. So I could write a program that makes I. bold unless it occurs between L. and P. But I'm sure there must be other special cases like this, and it could be really hard to identify all of them (or even most of them) so that the program doesn't wrongly bold a lot of things that shouldn't be bold. Also, we'd have to go back and fix all the pages that are already done, which is a little annoying but probably shouldn't be the deciding factor.

2. Don't mark them as bold now. This might mean less work during the hand-corrections now. (What do you guys think? Would it be?) On the other hand, it might also confuse other folks who are doing the hand-corrections, because the numerals are in fact bold in the original text.

3. Don't set a policy. Globally normalize this later. (I don't like this idea too much because it is messy, even if the mess is temporary; I think there should be a right way even if we're not pushing it too hard.)
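To make the difficulty with option 1 concrete, here is a rough sketch of what such a global replacement might look like. It is only an illustration, not the project's actual program. It assumes that bold is marked with <b>...</b> tags and that sense numerals only use the letters I, V and X; the exception list is hypothetical and contains just the one case mentioned above.

```python
import re

# Assumption: bold is marked with <b>...</b> in the corrected text files.
# Hypothetical exception list: abbreviations containing a capital letter
# that looks like a Roman numeral but must not be made bold.
EXCEPTIONS = ["L. I. P."]

# A capital Roman numeral followed by a period, e.g. "I." or "XIV.".
# Assumption: only I, V and X are needed. The lookbehind skips numerals
# that already sit directly after an opening bold tag.
NUMERAL = re.compile(r"(?<!<b>)\b[IVX]+\.")


def bold_numerals(line: str) -> str:
    """Wrap capital Roman numerals in bold tags, skipping known exceptions."""
    # Record the character spans covered by the known exceptions.
    protected = []
    for abbr in EXCEPTIONS:
        for m in re.finditer(re.escape(abbr), line):
            protected.append((m.start(), m.end()))

    def replace(m: re.Match) -> str:
        # Leave the match alone if it starts inside a protected abbreviation.
        for start, end in protected:
            if start <= m.start() < end:
                return m.group(0)
        return "<b>" + m.group(0) + "</b>"

    return NUMERAL.sub(replace, line)
```

For instance, bold_numerals("I. first sense. II. second sense; L. I. P. 5") bolds the I. and II. at the start of each sense but leaves L. I. P. alone. Every further special case would need its own entry in the exception list, which is exactly the part that is hard to get complete.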

When we're all done with the hand corrections, a complicated program will be written to parse the text and mark it up as accurately as possible using TEI tags to mark the headword, etymology, etc. Having the Roman numerals all in bold would make them less ambiguous and therefore somewhat easier to parse. On the other hand, it's not an insurmountable problem to parse the text even if the Roman numerals are not bold. So if doing it one way is much easier in terms of the hand-corrections, we might want to go that direction (though it's not clear to me which is easier).
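As a small illustration of why bold numerals would help the later parse, the sketch below splits one entry into its numbered senses by looking only for bold capital numerals. It assumes <b>...</b> tags again; the function name and the sample entry in the usage note are made up for the example and are not real BT text.

```python
import re

# Assumption: sense numerals appear as <b>I.</b>, <b>II.</b>, and so on.
SENSE_MARK = re.compile(r"<b>([IVX]+)\.</b>")


def split_senses(entry: str):
    """Return (preamble, [(numeral, sense_text), ...]) for one entry."""
    # With one capture group, re.split returns:
    # [text before first numeral, numeral 1, text 1, numeral 2, text 2, ...]
    parts = SENSE_MARK.split(entry)
    preamble = parts[0].strip()
    senses = [
        (parts[i], parts[i + 1].strip())
        for i in range(1, len(parts), 2)
    ]
    return preamble, senses
```

Given the made-up entry "headword <b>I.</b> first sense. Lchdm. iii. 4. <b>II.</b> second sense.", this yields the preamble "headword" and two senses, and the lowercase iii. in the citation is never mistaken for a sense number. Without the bold tags, the parser would have to tell sense numerals apart from initials and abbreviations by context alone.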

I'm wavering here, but I think I'm leaning toward the first option, because it's simpler if we just make the online text match the printed text, so that new volunteers don't have to learn a bunch of exceptions. Comments?

--Sean

Messages in this thread (subject, name, college/university, date):
Bold I & II in BT Entries, Matthew Carver, 2004-10-14 14:53:28
Re: Bold I & II in BT Entries, Sean Crist, Swarthmore College, 2004-10-15 16:10:27