Germanic Lexicon Project
Message Board
Author: Sean Crist (Nuance Communications)
Email: kurisuto at panix dot com
Date: 2008-04-29 16:36:38
Subject: Re: Error report: Cleasby/Vigfusson, pages b0001 thru b0099 - bold-face missing
> I would like to correct some of these pages, but since they are already marked
> as "corrected", they are not available for further corrections.
This is a real shortcoming of the current system, and there's not any simple fix. Others have brought this up with me. I think the design of the current system made sense in terms of what I could develop with very limited resources, at a time when the text hadn't been corrected at all; it was better than nothing. Now that the first round of correction is mostly done, there's obviously a problem, because there's no mechanism for correcting remaining errors.
In my spare time, I'm working on a totally new version of the system which will allow corrections to be made on an ongoing basis, Wiki-style, along with many other features not found in any existing system that I know of. However, the system is so large and complex that I estimate that it will be a minimum of a year before it is deployed (if I ever do manage to finish it).
The biggest problem is a near-total lack of resources. While I was still in academia, I submitted six major grant proposals over a period of several years to funding agencies such as the NEH and the NSF. While the project always received favorable reviews, it never made the final cut. Historical linguistics is just not the priority of the field at present. Now I have gone to industry, so the project is something I work on in my spare time. (So, I hope everyone appreciates that the project is even still on the air at all!)
Ondrej Tichy and his team in the Czech Republic have largely taken over the ongoing work on Bosworth/Toller. For the other texts, and for the programming infrastructure, there is really no institutional support or resources, so progress is very slow.
If you yourself are interested in taking over management of Cleasby/Vigfusson on a major scale, comparable to what Ondrej is doing for Bosworth/Toller, we can discuss how this could be handled. However, I just can't take on the work of accepting second-stage corrections to individual pages, because I'd have to go into the guts of the system and add each re-corrected page by hand. Under the circumstances, I think it's more important that I spend my limited time on the long-term solution.
> Another question:
> When are you changing the encoding of the texts to Unicode? It seems that
> most operating systems and editors now support Unicode in one form or
> another, and going Unicode shouldn't be a problem anymore:)
The actual conversion of the texts to Unicode is trivial, because there is already a clearly defined mapping (click on "About" and then "Character encoding standards" for the table). Back around 2001, when I set up the entities, there wasn't yet widespread support for Unicode, but it was clear that we'd be moving to Unicode in the long run, so every entity was set up with a clearly defined Unicode equivalent (except for the handful of characters, such as thorn-bar, which are in the Unicode approval pipeline but haven't yet been accepted into the Unicode standard).
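To illustrate why a table-driven conversion of this kind is mechanical, here is a minimal Python sketch. It assumes the entities are written in the `&name;` form, and the few entity names in the table are hypothetical stand-ins rather than the project's actual list (the real table is the one under "About" and "Character encoding standards"). Entities with no entry in the table, which would include any character not yet in Unicode, are left untouched.

```python
import re

# Hypothetical sample of an entity-to-Unicode table. The names below are
# assumptions for illustration; the project's real mapping is published
# on its "Character encoding standards" page.
ENTITY_TO_UNICODE = {
    "thorn": "\u00fe",   # LATIN SMALL LETTER THORN
    "eth": "\u00f0",     # LATIN SMALL LETTER ETH
    "aelig": "\u00e6",   # LATIN SMALL LETTER AE
    "oacute": "\u00f3",  # LATIN SMALL LETTER O WITH ACUTE
}

# Assumed entity syntax: an ampersand, a name, and a semicolon.
ENTITY_PATTERN = re.compile(r"&([A-Za-z0-9_-]+);")


def entities_to_unicode(text, mapping=ENTITY_TO_UNICODE):
    """Replace each known &name; entity with its Unicode character.

    Entities missing from the mapping are returned unchanged, so
    characters without a Unicode equivalent survive the conversion.
    """
    def replace(match):
        return mapping.get(match.group(1), match.group(0))

    return ENTITY_PATTERN.sub(replace, text)


def to_utf8(text):
    """Convert entity-encoded text to UTF-8 bytes."""
    return entities_to_unicode(text).encode("utf-8")
```

For example, `entities_to_unicode("&thorn;&oacute;r")` yields the three-character string "þór", while an entity that is not in the table passes through as written.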
The current system isn't set up to handle Unicode. If you gave it Unicode text, some things would not work correctly. So, I can't currently serve up Unicode text.
If and when the new system is finished, it will be a fully Unicode-based system (specifically, UTF-8).
--Sean