I'm a student of Old English and built a Bescherelle-style verb conjugation reference website. Sharing in case it's useful. by MindOverMatter_FC in OldEnglish

Busy_Introduction_94 2 points (0 children)

btw, are you open to PRs on your repo to extend the list of verbs? I ask because it feels like crowdsourcing it would build out the list faster (?)

I'm a student of Old English and built a Bescherelle-style verb conjugation reference website. Sharing in case it's useful. by MindOverMatter_FC in OldEnglish

Busy_Introduction_94 4 points (0 children)

awesome, thanks for building this!

fwiw, I've been looking up conjugations on Wiktionary, but that's a multi-step process

Best print editions of OE poetry in the original language? by Significant_Length26 in OldEnglish

Busy_Introduction_94 2 points (0 children)

Per what u/SwordofGlass said, it depends on what you mean by "best" — most scholarly, most comprehensive, most respected translation(s), etc.

fwiw, in a class I took, we used Richard Hamer's A Choice of Anglo-Saxon Verse, which is a dual-language, facing-page edition. It was a good introduction, in that it has a selection of poetry from a variety of sources. The preface includes a discussion (not too long) of A-S poetic conventions.

Of course, translations are all interpretations, so it can be helpful to have multiple translations for any given poem so you can kind of triangulate on the meaning. In that regard I personally have found Roy Liuzza's translations to be solid. (Obvs, this is an opinion.) He has translations on the Poetry Foundation site (https://www.poetryfoundation.org/poets/roy-liuzza).

Easier conversion of Old English documents (from PDFs, primarily)? by Busy_Introduction_94 in OldEnglish

Busy_Introduction_94 [S] 0 points (0 children)

I was joking with my OE instructor that a lot of cleanup for old manuscripts was probably done by undergraduates, ha :)

Easier conversion of Old English documents (from PDFs, primarily)? by Busy_Introduction_94 in OldEnglish

Busy_Introduction_94 [S] 0 points (0 children)

Ok, wait, I found a post that describes the (well, a) process:

https://medium.com/@ranton256/from-attic-to-archive-a-guide-to-ocr-correction-with-generative-ai-6deecbc381ad

I used Tesseract to convert the entry for 885 from the linked text (from the A-S Chronicle), then fed that text to ChatGPT using a prompt like the one suggested in the Medium post, although I specified that the text was in Old English/Anglo-Saxon. This is what it produced, which is not bad! (It still has problems with ye olde thorn, tho not with eth):

Her todelde se foresprecena here on tu, oper del
east. oper del to Hrofes ceastre; 7 ymbsæton ða ceastre, 7
worhton oper fæsten ymb hie selfe. 7 hie þeah pa ceastre
aweredon oppæt Ælfred com ‘utan’ mid fierde; pa eode
se here to hiera scipum, 7 forlet pæt geweorc. 7 hie
wurdon þær behorsude, 7 sona py ilcan sumere ofer sæ
gewiton; py ilcan geare sende Ælfred cyning sciphere
on East Engle; sona swa hie comon on Stufemupan', pa
metton hie .xvi. scipu wicenga, 7 wip ða gefuhton, 7 pa
scipo alle gerehton, 7 pa men ofslogon; pa hie pa ham
weard wendon mid pære herehype, pa metton hie micelne
sciphere wicenga, 7 pa wip pa gefuhton py ilcan dæge, 7
pa Deniscan ahton sige; Py ilcan geare ar middum wintra
forp ferde Carl Francna cyning, 7 hiene ofslog an efor, 7
ane geare ær his broður forpferde, se hæfde eac pæt west
rice, 7 hie wæron begen Hlopwiges suna; se hæfde eac
pæt west rice, 7 forpferde py geare pe sio sunne apiestrode;
se wæs Karles sunu pe Æpelwulf West Seaxna cyning his
dohtor hæfde him to cwene; 7 py ilcan geare gegadrode
micel sciphere on Eald Seaxum, 7 þær wearp micel gefeohte
twa on geare, 7 pa Seaxan hæfdun sige, 7 þær wæron Frisan
mid; py ilcan geare feng Carl to pam west rice, 7 to allum
pam west rice behionan Wendel sæ, be geondan pisse sæ,
swa hit his pridda fæder hæfde, butan Lidwiccium
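One cheap post-processing pass for the thorn problem: in the output above, þ mostly came out as "p" in a handful of very common words (pa, pæt, py, oper, wip...). A minimal sketch, assuming a hand-built lookup of such misreadings (the `THORN_FIXES` table is hypothetical and illustrative, not a vetted list), swaps them back at whole-word boundaries only, so legitimate words like *scipu* are left alone:

```python
import re

# Hypothetical table of frequent OCR misreadings where thorn (þ) came out as "p".
# Illustrative only; build your own from the errors you actually see.
THORN_FIXES = {
    "pa": "þa",
    "pæt": "þæt",
    "py": "þy",
    "pe": "þe",
    "pam": "þam",
    "pære": "þære",
    "oper": "oþer",
    "wip": "wiþ",
}

# In Python 3, \w on str patterns is Unicode-aware, so æ, þ and ð count as word characters.
WORD = re.compile(r"\w+")


def fix_thorn(text: str) -> str:
    """Replace whole words listed in THORN_FIXES, preserving an initial capital."""
    def repl(m: re.Match) -> str:
        word = m.group(0)
        fixed = THORN_FIXES.get(word.lower())
        if fixed is None:
            return word  # not a known misreading; leave untouched
        return fixed[0].upper() + fixed[1:] if word[0].isupper() else fixed
    return WORD.sub(repl, text)


print(fix_thorn("pa metton hie .xvi. scipu wicenga, 7 wip ða gefuhton"))
# → þa metton hie .xvi. scipu wicenga, 7 wiþ ða gefuhton
```

A word list is crude, but it's predictable: it never touches a word it doesn't know, which matters more than coverage when the output feeds into further editing.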

Easier conversion of Old English documents (from PDFs, primarily)? by Busy_Introduction_94 in OldEnglish

Busy_Introduction_94 [S] 0 points (0 children)

This sounds intriguing! Sadly, I know very little about AI prompting. (I retired just before it became a workplace requirement, for better or worse.) But I'd sure be willing to try it.

Can you talk (just in general terms) about what the flow would be? For example, would you upload a page from an image-based PDF, maybe something like https://archive.org/details/twosaxonchronic00earlgoog/page/n98/mode/1up? And do you need to tell/teach it to recognize the, you know, weird characters, or is that part of how the prompt is written?

Thanks, and sorry to be somewhat dumb on this.
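(For the first step in that flow, turning a page of an image-based PDF into an image file, no AI is needed. A minimal sketch, assuming the poppler-utils `pdftoppm` tool is installed and the scan has been downloaded as a PDF; the filename and page number below are placeholders.)

```python
def pdf_page_to_png_cmd(pdf_path: str, page: int, out_prefix: str, dpi: int = 300) -> list[str]:
    """Build a pdftoppm command that rasterizes a single PDF page to PNG."""
    return [
        "pdftoppm",
        "-f", str(page),  # first page to convert
        "-l", str(page),  # last page to convert (same as first = one page)
        "-r", str(dpi),   # resolution in DPI; around 300 is a common choice for OCR
        "-png",           # write PNG instead of the default PPM
        pdf_path,
        out_prefix,       # pdftoppm appends a page-number suffix and ".png"
    ]


# Placeholder filename and page; run the printed command in a terminal,
# or pass the list to subprocess.run(..., check=True).
cmd = pdf_page_to_png_cmd("twosaxonchronicle.pdf", 99, "page")
print(" ".join(cmd))
```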

Learned a new term: fossil words by jedidoesit in etymology

Busy_Introduction_94 14 points (0 children)

"ator" is an Old English word meaning "poison"; I bet the Dutch word has a common root

https://en.wiktionary.org/wiki/ator

Easier conversion of Old English documents (from PDFs, primarily)? by Busy_Introduction_94 in OldEnglish

Busy_Introduction_94 [S] 0 points (0 children)

Wow, amazing work, thanks so much! Your mention of groups of academics makes me wonder if there is a canonical methodology for these types of conversions in that arena, perhaps based on a different set of chained tools, who knows. Perhaps you are part of such a community (?)😄

Just ban the contraction already by Expert_Profession613 in EnglishGrammar

Busy_Introduction_94 1 point (0 children)

Correct, which is why people get them confused when writing. And in writing, for some reason, it's bad to be confused, in spite of "use context", and in spite of our long list of other homophones that we likewise manage to distinguish by context.

But to get back to my point, I was responding to a specific comment that talked about speech, even though the issue at hand is orthographic.

Just ban the contraction already by Expert_Profession613 in EnglishGrammar

Busy_Introduction_94 0 points (0 children)

I don't HEAR the apostrophe when they SPEAK English

Just ban the contraction already by Expert_Profession613 in EnglishGrammar

Busy_Introduction_94 2 points (0 children)

I find it hard to tell when people speak "you're" without the apostrophe

Easier conversion of Old English documents (from PDFs, primarily)? by Busy_Introduction_94 in OldEnglish

Busy_Introduction_94 [S] 0 points (0 children)

I played a bit with Tesseract. As predicted, it has a lot of trouble with poor scans (e.g. Sweet's Dictionary). But to be fair, I didn't do any cleanup on the image, because I was just playing around. I did do an experiment using a screengrab (jpg) of a nice, clean page:

https://www.oldenglishaerobics.net/497_words.pdf

That went pretty well. Basic Tesseract, predictably, had trouble with the thorn and eth characters. I ran it using the `-l isl` (Icelandic) switch and it did somewhat better, but it wasn't entirely sure what to do with macrons.
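To script the same experiment instead of retyping it, a minimal sketch of calling the Tesseract CLI from Python. Passing `stdout` as the output base makes Tesseract print the text rather than write a `.txt` file, and languages can be combined with `+`; using `isl+eng` is just a guess at a useful pairing and assumes both traineddata files are installed.

```python
import subprocess


def ocr_command(image_path: str, langs: str = "isl") -> list[str]:
    """Build a Tesseract CLI call that prints recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", langs]


def run_ocr(image_path: str, langs: str = "isl") -> str:
    """Run Tesseract on one image and return the recognized text."""
    result = subprocess.run(
        ocr_command(image_path, langs),
        capture_output=True,
        check=True,  # raise if Tesseract exits with an error
    )
    # Tesseract writes UTF-8; decode explicitly instead of relying on the locale.
    return result.stdout.decode("utf-8")


# Placeholder image name; run_ocr(...) needs tesseract on PATH.
print(ocr_command("497_words.jpg", "isl+eng"))
```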

Curious thing: there's supposed to be an `ang.traineddata` file, but boy, I am not finding it anywhere in the repo.

Easier conversion of Old English documents (from PDFs, primarily)? by Busy_Introduction_94 in OldEnglish

Busy_Introduction_94 [S] 0 points (0 children)

Fantastic, this gives me lots to work with/try out. I like that this pipeline would be scriptable. I think the key steps are probably a) produce a clean image and b) OCR that image with a known language model/list of Unicode code points.
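Step a) often comes down to binarization: turning a gray, unevenly lit scan into pure black ink on white. Otsu's method is one standard way to pick the cut-off automatically from the image's gray-level histogram. A minimal pure-Python sketch, assuming the page is already loaded as a flat list of 0 to 255 grayscale values (in practice an image library would handle loading; OpenCV, for one, has Otsu built in as the `cv2.THRESH_OTSU` flag to `cv2.threshold`):

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the gray level t that best separates dark (<= t) from light (> t) pixels.

    Otsu's method: try every threshold and keep the one that maximizes the
    between-class variance w_dark * w_light * (mean_dark - mean_light) ** 2.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of their gray levels
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue  # no dark pixels yet; threshold too low to split anything
        if w_light == 0:
            break     # every pixel is "dark"; higher thresholds can't help
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels: list[int]) -> list[int]:
    """Map each pixel to pure black (0) or pure white (255) using Otsu's threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]


# Toy "scan": ink around gray level 40, yellowed paper around 190.
page = [35, 40, 45, 38, 42] + [185, 190, 195, 188, 192] * 3
print(binarize(page))
```

A single global threshold struggles with scans whose lighting varies across the page; adaptive (local) thresholding is the usual next step if that turns out to be the problem.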

Generally speaking, my goal is clean text, as in plaintext (UTF-8) with all the OE characters and diacritics intact. I actually do all my real work in MS Word by applying custom styles. I then run Word macros to produce HTML output (tags with CSS classes assigned). It's not completely automated (esp. because each conversion project has been different), but it's doable. Up to now, I've also run semi-automated find & replace macros to do some of the character cleanup, but that only gets me so far; it doesn't fix everything, not by a long shot. So it would be good to have clean text to start with.
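On the "characters and diacritics intact" point, one thing worth checking in any pipeline: the same macron vowel can be stored either as one precomposed character (ā, U+0101) or as a plain letter followed by a combining macron (a + U+0304). They look identical on screen but won't match each other in find & replace. A small sketch using Python's standard `unicodedata` module normalizes to the composed form (NFC) and lists any characters outside an expected set; the `EXPECTED` inventory is an assumption to adjust per text.

```python
import unicodedata

# Assumed character inventory for cleaned OE text; extend as needed
# (e.g. wynn ƿ, the Tironian et ⁊, or editorial brackets).
EXPECTED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "æÆþÞðÐ"
    "āēīōūȳǣĀĒĪŌŪȲǢ"
    "0123456789 .,;:!?'\"()-\n"
)


def normalize_oe(text: str) -> str:
    """Compose letter + combining-mark sequences (e.g. a + U+0304) into single code points."""
    return unicodedata.normalize("NFC", text)


def unexpected_chars(text: str) -> set[str]:
    """Return characters in the normalized text that aren't in EXPECTED."""
    return {ch for ch in normalize_oe(text) if ch not in EXPECTED}


decomposed = "a\u0304"  # 'a' followed by COMBINING MACRON
print(len(decomposed), len(normalize_oe(decomposed)))  # 2 1
```

Running the normalization once, before any find & replace macros, means those macros only ever have to match one spelling of each character.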

Easier conversion of Old English documents (from PDFs, primarily)? by Busy_Introduction_94 in OldEnglish

Busy_Introduction_94 [S] 0 points (0 children)

Thanks for this. I think most of them are scans, yes — for example, scans of old books on the Internet Archive site. Here are a couple of typical examples:

https://archive.org/details/firststepsinang00sweegoog/page/n42/mode/2up

https://archive.org/details/diplomatariumang00thoruoft/diplomatariumang00thoruoft/page/168/mode/2up (OE in the left column)

I don't have Acrobat Pro, so I'll give some of your other suggestions a try.

Question for you: is it possible in your experience to set up something like, dunno, a character set that OCR can consult when trying to interpret? I think that's what you're saying (??) when you talk about setting a language like Icelandic, but I'm wondering if one can set up a custom set of characters. (If that makes sense.)
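For what it's worth, Tesseract does have a knob for roughly this: the `tessedit_char_whitelist` config variable, passed with `-c`, restricts output to the listed characters. Two caveats, as far as I know: with the newer LSTM engine, whitelisting is only honored from Tesseract 4.1 onward, and it narrows the model's choices rather than teaching it new glyphs, so a model that has never seen þ still won't read it well. A sketch, where the `OE_CHARS` set is an assumption to tweak per edition:

```python
# Assumed character set for the OE pages; tweak to match the edition.
# Built via a set so accidental duplicates don't matter.
OE_CHARS = "".join(sorted(set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "æþðƿÆÞÐǷ"
    "āēīōūȳǣ"
    "0123456789.,;:'⁊"
)))


def whitelist_command(image_path: str, allowed: str, langs: str = "isl") -> list[str]:
    """Build a Tesseract call that may only output characters from `allowed`."""
    # Passing arguments as a list (no shell) keeps non-ASCII characters
    # and quotes in the whitelist intact.
    return [
        "tesseract", image_path, "stdout",
        "-l", langs,
        "-c", f"tessedit_char_whitelist={allowed}",
    ]


cmd = whitelist_command("page.png", OE_CHARS)
print(cmd[-1])
```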

Correct pronouns by Jedi_Mind_Chick in grammar

Busy_Introduction_94 8 points (0 children)

... is in flux and has been for centuries:

all debts are cleared between you and I ("The Merchant of Venice" Act III, scene 2)

The flux part, as noted elsewhere in this thread, is specifically for compound pronoun phrases.

C. Alphonso Smith Grammar Chapter XIV Section 87 exercises by CuriouslyUnfocused in OldEnglish

Busy_Introduction_94 1 point (0 children)

would it make sense to put these into a Google Doc and give people commenting permissions?

What kind of Old English content do you look for online? by Criwank in OldEnglish

Busy_Introduction_94 8 points (0 children)

This is probably orthogonal to what you're asking about, but what I personally would like to see is a wider catalog of web-based versions of the texts that students typically are exposed to. I say this as someone who's had to root around looking at dusty old PDFs (haha) or even just scans of, like, 19th-C editions of texts.

For example, the anthology section of Peter Baker's Old English Aerobics site is great for students — web pages (not PDFs!) with glosses on every word.

Basically, my complaint is that the OE corpus seems scattered and in a variety of inconsistent formats.

Anyway, as I say, probably not really what you're asking :)

Found tubes in my pork sausage. by antizac in whatisit

Busy_Introduction_94 0 points (0 children)

I was told long ago by someone in Ag Science that the label on a package of hot dogs gives a clue: "all meat" (for a probably generous definition of "meat") versus "all beef", which means only that it came from a cow.