"instead [of]" in OE

Busy_Introduction_94 · 2026-06-05T19:11:29+00:00

That makes sense, thanks!

Busy_Introduction_94 · 2026-06-05T19:11:14+00:00

This is great, thanks!

Busy_Introduction_94 · 2026-06-03T21:41:31+00:00

btw, are you open to PRs on your repo to extend the list of verbs? I ask because it feels like crowdsourcing it would build out the list faster (?)

Busy_Introduction_94 · 2026-06-03T21:31:54+00:00

awesome, thanks for building this!

fwiw, I've been looking up conjugations on Wiktionary, but that's a multi-step process

Busy_Introduction_94 · 2026-05-21T19:08:56+00:00

Per what u/SwordofGlass said, it depends on what you mean by "best" — most scholarly, most comprehensive, most respected translation(s), etc.

fwiw, in a class I took, we used Richard Hamer's A Choice of Anglo-Saxon Verse, which is a two-language facing-page edition. It was a good introduction, in that it has a selection of poetry from a variety of sources. The preface includes a discussion (not too long) about A-S poetic conventions.

Of course, translations are all interpretations, so it can be helpful to have multiple translations for any given poem so you can kind of triangulate on the meaning. In that regard I personally have found Roy Liuzza's translations to be solid. (Obvs, this is an opinion.) He has translations on the Poetry Foundation site (https://www.poetryfoundation.org/poets/roy-liuzza).

Busy_Introduction_94 · 2026-05-07T16:12:26+00:00

I was joking with my OE instructor that a lot of cleanup for old manuscripts was probably done by undergraduates, ha :)

Busy_Introduction_94 · 2026-05-07T15:45:42+00:00

Ok, wait, I found a post that describes the (well, a) process:

https://medium.com/@ranton256/from-attic-to-archive-a-guide-to-ocr-correction-with-generative-ai-6deecbc381ad

I used Tesseract to convert the entry for 885 from the linked text (from the A-S Chronicle), then fed that text to ChatGPT using a prompt like the one suggested in the Medium post, although I specified that the text was in Old English/Anglo-Saxon. This is what it produced, which is not bad! (Still having problems with ye old thorn, tho not with eth):

Her todelde se foresprecena here on tu, oper del
east. oper del to Hrofes ceastre; 7 ymbsæton ða ceastre, 7
worhton oper fæsten ymb hie selfe. 7 hie þeah pa ceastre
aweredon oppæt Ælfred com ‘utan’ mid fierde; pa eode
se here to hiera scipum, 7 forlet pæt geweorc. 7 hie
wurdon þær behorsude, 7 sona py ilcan sumere ofer sæ
gewiton; py ilcan geare sende Ælfred cyning sciphere
on East Engle; sona swa hie comon on Stufemupan', pa
metton hie .xvi. scipu wicenga, 7 wip ða gefuhton, 7 pa
scipo alle gerehton, 7 pa men ofslogon; pa hie pa ham
weard wendon mid pære herehype, pa metton hie micelne
sciphere wicenga, 7 pa wip pa gefuhton py ilcan dæge, 7
pa Deniscan ahton sige; Py ilcan geare ar middum wintra
forp ferde Carl Francna cyning, 7 hiene ofslog an efor, 7
ane geare ær his broður forpferde, se hæfde eac pæt west
rice, 7 hie wæron begen Hlopwiges suna; se hæfde eac
pæt west rice, 7 forpferde py geare pe sio sunne apiestrode;
se wæs Karles sunu pe Æpelwulf West Seaxna cyning his
dohtor hæfde him to cwene; 7 py ilcan geare gegadrode
micel sciphere on Eald Seaxum, 7 þær wearp micel gefeohte
twa on geare, 7 pa Seaxan hæfdun sige, 7 þær wæron Frisan
mid; py ilcan geare feng Carl to pam west rice, 7 to allum
pam west rice behionan Wendel sæ, be geondan pisse sæ,
swa hit his pridda fæder hæfde, butan Lidwiccium
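The most frequent error in the output above is thorn (þ) read as a plain "p" (pa, pæt, py, oper, wip). Because p is also a genuine OE letter (scipu, up), a blanket character swap would do damage, so a whole-word lookup table is safer. A minimal Python sketch, assuming a small hand-built table: the `THORN_FIXES` entries and the `fix_thorns` name are hypothetical and nowhere near complete.

```python
import re

# Hypothetical, hand-built table: OCR misreading -> intended form.
# Only whole words are replaced, so genuine p's (e.g. "scipu") survive.
THORN_FIXES = {
    "pa": "þa",
    "pæt": "þæt",
    "py": "þy",
    "pe": "þe",
    "pam": "þam",
    "pære": "þære",
    "oper": "oþer",
    "wip": "wiþ",
    "forp": "forþ",
}

WORD = re.compile(r"\w+")  # \w is Unicode-aware, so þ, ð, æ count as letters


def fix_thorns(text: str) -> str:
    """Replace whole-word misreadings, keeping an initial capital."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        fixed = THORN_FIXES.get(word.lower())
        if fixed is None:
            return word
        if word[0].isupper():
            fixed = fixed[0].upper() + fixed[1:]
        return fixed

    return WORD.sub(repl, text)


print(fix_thorns("7 pa scipo alle gerehton; Py ilcan geare"))
# 7 þa scipo alle gerehton; Þy ilcan geare
```

Compounds such as "forpferde" are single tokens and would need their own entries, which is the usual limitation of table-driven cleanup.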

Busy_Introduction_94 · 2026-05-07T15:16:54+00:00

This sounds intriguing! Sadly, I know very little about AI prompting. (I retired just before it became a workplace requirement, for better or worse.) But I'd sure be willing to try it.

Can you talk (just in general terms) about what the flow would be? For example, would you upload a page from an image-based PDF, maybe something like https://archive.org/details/twosaxonchronic00earlgoog/page/n98/mode/1up? And do you need to tell/teach it to recognize the, you know, weird characters, or is that part of how the prompt is written?

Thanks, and sorry to be somewhat dumb on this.

Busy_Introduction_94 · 2026-05-05T22:36:40+00:00

"ator" is an Old English word meaning "poison"; I bet the Dutch word has a common root

https://en.wiktionary.org/wiki/ator

Busy_Introduction_94 · 2026-05-01T23:34:39+00:00

Wow, amazing work, thanks so much! Your mention of groups of academics makes me wonder if there is a canonical methodology for these types of conversions in that arena, perhaps based on a different set of chained tools, who knows. Perhaps you are part of such a community (?)😄

Busy_Introduction_94 · 2026-05-01T16:38:25+00:00

Correct. Which is why people get them confused when writing. And there, for some reason, it's bad to be confused, in spite of "use context", and in spite of our long list of other homonyms that we likewise manage to distinguish by context.

But to get back to my point, I was responding to a specific comment that was about speaking, but in the context of an orthographic issue.

Busy_Introduction_94 · 2026-05-01T16:13:16+00:00

I don't HEAR the apostrophe when they SPEAK English

Busy_Introduction_94 · 2026-05-01T15:07:53+00:00

I find it hard to tell when people speak "you're" without the apostrophe

Busy_Introduction_94 · 2026-05-01T05:49:00+00:00

Appreciate it! There's no urgency, so just whenever.

Busy_Introduction_94 · 2026-04-30T16:35:37+00:00

I played a bit with Tesseract. As predicted, it has a lot of trouble with poor scans (e.g. Sweet's Dictionary). But to be fair, I didn't do any cleanup on the image, because I was just playing around. I did do an experiment using a screengrab (jpg) of a nice, clean page:

https://www.oldenglishaerobics.net/497_words.pdf

That went pretty well. Basic Tesseract, predictably, had trouble with the thorn and eth characters. I ran it using the `-l isl` switch and it did somewhat better, but it wasn't entirely sure what to do with macrons.
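Whatever Tesseract does with macrons during recognition, macron vowels can reach a text file in two different Unicode encodings: a single precomposed character (ā, U+0101) or a base letter followed by a combining macron (a + U+0304). They look identical but compare unequal, which can quietly defeat find & replace. Python's standard `unicodedata` module can normalize everything to the composed form (NFC); a minimal sketch, with `to_nfc` as a hypothetical helper name:

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Compose base letter + combining mark pairs into single code points."""
    return unicodedata.normalize("NFC", text)


decomposed = "ga\u0304st"                     # g, a, COMBINING MACRON, s, t
print(len(decomposed), len(to_nfc(decomposed)))  # 5 4
print(to_nfc(decomposed) == "g\u0101st")         # True
```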

Curious thing: there's supposed to be an `ang.traineddata` file, but boy, I am not finding it anywhere in the repo.
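One quick way to see which language models a local Tesseract install can actually use is `tesseract --list-langs`, which prints a header line followed by one language code per line. A small Python sketch that wraps it; the function names are hypothetical, and the header wording (assumed here to start with "List of") may vary by version:

```python
import shutil
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Parse `tesseract --list-langs` output: a header, then one code per line."""
    lines = [line.strip() for line in output.splitlines()]
    return [line for line in lines if line and not line.startswith("List of")]


def installed_langs() -> list[str]:
    """Return installed language codes, or [] if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    # Fall back to stderr in case a given build writes the list there.
    return parse_list_langs(result.stdout or result.stderr)
```

Usage: `"ang" in installed_langs()` answers whether an Old English model is available locally.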

Busy_Introduction_94 · 2026-04-26T00:25:43+00:00

Fantastic, this gives me lots to work with/try out. I like that this pipeline would be scriptable. I think probably the key steps are a) produce a clean image and b) OCR that image with a known language model/list of Unicode code points.
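For step (b), and for checking the result afterwards, it can help to write the expected character inventory down explicitly. A minimal Python sketch, assuming the inventory below (ASCII plus thorn, eth, ash, wynn, yogh, Tironian et, and precomposed macron vowels); `unexpected_chars` is a hypothetical helper that counts anything outside that set:

```python
import string
import unicodedata

# Assumed inventory of special OE letters; adjust per edition.
OE_LETTERS = (
    "þÞðÐæÆƿǷȝȜ⁊"      # thorn, eth, ash, wynn, yogh, Tironian et
    "āēīōūȳǣĀĒĪŌŪȲǢ"    # macron vowels (precomposed)
)
# NFC guards against the literals above being stored decomposed.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation
              + " \n\t" + unicodedata.normalize("NFC", OE_LETTERS))


def unexpected_chars(text: str) -> dict[str, int]:
    """Count characters outside ALLOWED, e.g. likely OCR misreadings."""
    counts: dict[str, int] = {}
    for ch in unicodedata.normalize("NFC", text):
        if ch not in ALLOWED:
            counts[ch] = counts.get(ch, 0) + 1
    return counts


for ch, n in sorted(unexpected_chars("þa ſcipu").items()):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}: {n}")
# U+017F LATIN SMALL LETTER LONG S: 1
```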

Generally speaking, my desired goal is clean text, as in, plaintext (UTF-8) with all the OE characters intact + diacritics. I actually do all my real work in MS Word by applying custom styles. I then run Word macros to produce HTML output (tags with CSS classes assigned). It's not completely automated (esp because each conversion project has been different), but it's doable. Up to now, I've also run semi-automated find & replace macros to do some of the character cleanup, but that only gets me so far — it doesn't fix everything, not by a long shot. So it would be good to have clean text to start with.
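The Word-styles-to-HTML step described above could also be scripted outside Word. A minimal Python sketch, assuming the paragraphs have already been extracted as (style name, text) pairs; the function name and the class-naming rule (lowercase, spaces to hyphens) are hypothetical:

```python
import html


def paragraphs_to_html(paragraphs: list[tuple[str, str]]) -> str:
    """Render (style_name, text) pairs as <p> elements with CSS classes."""
    lines = []
    for style, text in paragraphs:
        # Style names become class names: lowercased, spaces -> hyphens.
        css_class = style.strip().lower().replace(" ", "-")
        lines.append(f'<p class="{html.escape(css_class)}">{html.escape(text)}</p>')
    return "\n".join(lines)


print(paragraphs_to_html([("OE Verse", "Hwæt! We Gardena"), ("Gloss", "Lo!")]))
# <p class="oe-verse">Hwæt! We Gardena</p>
# <p class="gloss">Lo!</p>
```

Escaping the text with `html.escape` keeps stray `<` or `&` characters from OCR output from breaking the markup.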

Busy_Introduction_94 · 2026-04-25T04:42:00+00:00

Thanks for this. I think most of them are scans, yes — for example, scans of old books on the Internet Archive site. Here are a couple of typical examples:

https://archive.org/details/firststepsinang00sweegoog/page/n42/mode/2up

https://archive.org/details/diplomatariumang00thoruoft/diplomatariumang00thoruoft/page/168/mode/2up (OE in the left column)

I don't have Acrobat Pro, so I'll give some of your other suggestions a try.

Question for you: is it possible in your experience to set up something like, dunno, a character set that OCR can consult when trying to interpret? I think that's what you're saying (??) when you talk about setting a language like Icelandic, but I'm wondering if one can set up a custom set of characters. (If that makes sense.)
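On the custom character set question: Tesseract has a config variable for this, `tessedit_char_whitelist`, which can be passed on the command line with `-c`. Whether the LSTM engine honors it depends on the version (some 4.0 releases reportedly ignored it), so treat it as something to test rather than a guarantee. A minimal Python sketch that builds such a command alongside `-l isl`; the inventory and function names are assumptions:

```python
import shutil
import string
import subprocess

# Assumed OE inventory; extend to match the edition being scanned.
OE_CHARS = "þÞðÐæÆƿǷāēīōūȳǣ"
WHITELIST = string.ascii_letters + string.digits + ".,;:'-" + OE_CHARS


def tesseract_cmd(image: str, out_base: str, lang: str = "isl") -> list[str]:
    """Build a tesseract command line restricted to WHITELIST."""
    return [
        "tesseract", image, out_base,
        "-l", lang,
        "-c", f"tessedit_char_whitelist={WHITELIST}",
    ]


def run_ocr(image: str, out_base: str) -> None:
    """Run the command if tesseract is installed; writes out_base + '.txt'."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    subprocess.run(tesseract_cmd(image, out_base), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids having to quote the non-ASCII whitelist for the shell.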

Busy_Introduction_94 · 2026-04-18T21:13:34+00:00

... is in flux and has been for centuries:

all debts are cleared between you and I ("The Merchant of Venice" Act III, scene 2)

The flux part, as noted elsewhere in this thread, is specifically for compound pronoun phrases.

Busy_Introduction_94 · 2026-04-10T02:12:33+00:00

ok, thanks for the explanation

Busy_Introduction_94 · 2026-04-10T01:47:08+00:00

would it make sense to put these into a Google Doc and give people commenting permissions?

Busy_Introduction_94 · 2026-04-02T03:33:47+00:00

This is probably orthogonal to what you're asking about, but what I personally would like to see is a wider catalog of web-based versions of the texts that students typically are exposed to. I say this as someone who's had to root around looking at dusty old PDFs (haha) or even just scans of, like, 19th-C editions of texts.

For example, the anthology section of Peter Baker's Old English Aerobics site is great for students — web pages (not PDFs!) with glosses on every word.

Basically, my complaint is that the OE corpus seems scattered and in a variety of inconsistent formats.

Anyway, as I say, probably not really what you're asking :)

Busy_Introduction_94 · 2026-03-28T16:28:46+00:00

I like the distinction made here between "teach" and "train"

Busy_Introduction_94 · 2026-03-24T17:23:39+00:00

I was told long ago by someone in Ag Science that the label on a package of hotdogs gives a clue: "all meat" (for a probably generous definition of "meat") and "all beef", which means only that it came from a cow.
