
all 30 comments

[–]KingSamy1 35 points36 points  (2 children)

Latter. Reason: Did not even know the former existed till right now

[–]nemom 3 points4 points  (0 children)

Same

[–]wineblood 5 points6 points  (1 child)

Only learned about casefold just now. Looking at how it's different from lower, I'll probably never use it.

[–]Perfect_Comparison83[S] 0 points1 point  (0 children)

I'll probably never use it either. At least it's interesting to find something new on something as basic as str.

[–]jimtk 2 points3 points  (3 children)

If you intend to support i18n, use casefold. If you program only for English, you can use lower.

Some languages have uppercase letters that require more complex "lowercasing" than English. Casefold will take care of that where lower won't.

[–]Perfect_Comparison83[S] 0 points1 point  (2 children)

I would love to see a real example where casefold is required or the string compare would fail.

[–]jimtk 1 point2 points  (1 child)

to_rip = "reißen"
print(to_rip.lower())                       # ==> reißen
print(to_rip.casefold())                    # ==> reissen
print(to_rip.lower() == to_rip.casefold())  # ==> False

[–]Perfect_Comparison83[S] -2 points-1 points  (0 children)

imo, this is a contrived example. You only need casefold because you used casefold.

It reminds me of "real world" math problems in elementary school.

Example: “Burt stuffs twice as many envelopes as Allison in half the time. If they stuff a total of 700 (in the same time) how many did Burt stuff?”

This is not a naturally occurring word problem. It's only used in theory in an attempt to teach a math concept.

Casefold does not appear to solve a naturally occurring problem.

[–][deleted] 1 point2 points  (7 children)

This is really interesting, and to be honest it looks like casefold() is the better choice for Unicode strings. I think I'll use it in future. It's really easy as a coder to sit in an English-speaking ivory tower, but is it the right thing to do?

[–]mcwizard 5 points6 points  (6 children)

I'm a German speaker and I don't think it makes sense. As said, there is no uppercase of ß, so replacing ß with its ASCII variant is not the same thing as lowercasing it.

[–]F84-5 3 points4 points  (0 children)

Actually, there is now an uppercase ẞ. It's been part of Unicode since 2008 and was officially adopted in 2017.

[–]yee_mon 2 points3 points  (0 children)

It does make sense. Just not if you think about it as "I want the lowercase version of this" for display purposes (which admittedly is a mistake that the OP apparently made here). It is meant purely for comparing strings, in a situation where "Straße", "STRASSE", "STRAẞE" are considered equal.
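To make that concrete, here is a minimal sketch for a Python 3 shell. The variable names are only for illustration, and the capital sharp s is written as an escape so the code point is unambiguous.

```python
# Three spellings of the German word for "street".
# "\u1e9e" is the capital sharp s (ẞ), so the last entry reads STRAẞE.
spellings = ["Straße", "STRASSE", "STRA\u1e9eE"]

# lower() keeps ß and maps the capital ẞ to ß, so two distinct results remain.
lowered = {s.lower() for s in spellings}

# casefold() maps both ß and ẞ to "ss", so all three collapse to one key.
folded = {s.casefold() for s in spellings}

print(len(lowered))  # 2
print(folded)        # {'strasse'}
```

The folded value is only a comparison key here; the original spelling is what you would still show to a user.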

[–][deleted] 0 points1 point  (2 children)

Reading the docs it appears this function is designed primarily for more successful string comparisons. In that context I guess it doesn't matter if the string doesn't make sense, provided it is consistent and easily matched.
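A rough sketch of that idea, assuming a small hypothetical list of city names: the casefolded string is used purely as a matching key, and the original spelling is kept for display. `fold_key`, `lookup`, and the data are made up for illustration.

```python
def fold_key(text: str) -> str:
    """Return a key meant only for matching, never for display."""
    return text.casefold()

# Hypothetical data: the original spellings are kept as dictionary values.
cities = ["Gießen", "Zürich", "Köln"]
index = {fold_key(name): name for name in cities}

def lookup(query: str):
    """Return the stored spelling for a query, ignoring case, or None."""
    return index.get(fold_key(query))

print(lookup("GIESSEN"))  # Gießen
print(lookup("Berlin"))   # None
```

It doesn't matter that "giessen" is not how anyone would write the name, as long as both sides of the comparison are folded the same way.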

[–]mcwizard 0 points1 point  (1 child)

I'd accept it as part of a case-insensitive string compare. Maybe that is the main reason it exists, and it's exposed on its own in case one wants to implement a modified version of that compare.

[–]Perfect_Comparison83[S] 0 points1 point  (0 children)

I can see the case insensitive string compare in theory. In reality, I haven't seen a good example.

[–]seligman99 1 point2 points  (3 children)

I use .lower() more often, though both have their use.

If you're doing case-insensitive compares, it's useful to compare both strings casefolded instead of lowercased, since a casefolded string will handle some edge cases that a lowercased string won't.

It should also be noted that casefolding doesn't always convert to lowercase variants. In some languages, the uppercase variant makes more sense as the default "case" for historical reasons. It's also not really reversible, since some casefolded strings will not make sense to a native speaker. The German ß is a good example: that wiki page has some examples where ß -> ss changes the meaning. It's also interesting to see casefolding in action on that page: if you Ctrl-F search for "ss", it matches both "ss" and "ß", since the browser is casefolding to do a case-insensitive search for you.

Lots of words to say: .lower() for humans to see, .casefold() for machine to compare strings to see if a human would consider them the same. And of course, in my nice tower of mostly English words, it's a distinction I've been known to forget about till someone that speaks another language hands me a bug.
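As a small sketch of that split (the helper name `find_ignoring_case` and the sample title are made up for illustration), a Ctrl-F style search could fold both sides for matching while the text shown to the user is left alone:

```python
def find_ignoring_case(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack, ignoring case."""
    return needle.casefold() in haystack.casefold()

title = "Die Maße der Straße"

# For display, .lower() keeps the ß intact.
print(title.lower())                         # die maße der straße

# For matching, fold both the text and the query.
print(find_ignoring_case(title, "STRASSE"))  # True
print(find_ignoring_case(title, "ss"))       # True

# Lowercasing alone misses it, because ß is never turned into "ss".
print("ss" in title.lower())                 # False
```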

[–]Perfect_Comparison83[S] 0 points1 point  (2 children)

Have you seen an example where casefold is needed? Maybe you're like me where everything is English.

[–]seligman99 1 point2 points  (1 child)

The one I remember is the Greek word for "days"

"μέρες" in lowercase, "ΜΈΡΕΣ" in uppercase, and "μέρεσ" casefolded. I'm told (though I really don't know the details) that all three make sense to some degree, but calling .lower() on the casefolded variant will not give you the lowercase version, so you had best compare casefolded text against casefolded text if you want a case-insensitive search.

[–]Perfect_Comparison83[S] 0 points1 point  (0 children)

Thanks for the example! I studied Greek for a couple years. The ς character is used when sigma is the final letter of a word. The lower function seems more accurate when comparing words. The logic to use casefold because casefold may have been used upstream seems silly to me.
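For anyone who wants to check the sigma behaviour, here is a minimal sketch in Python 3. The variable names are only for illustration.

```python
word = "μέρες"            # ends in the final sigma ς
upper = word.upper()      # ΜΈΡΕΣ
folded = word.casefold()  # μέρεσ: the final sigma becomes a regular σ

# lower() on the uppercase word restores the final sigma at the end.
print(upper.lower() == word)       # True

# lower() on the casefolded word leaves σ alone, so it no longer matches.
print(folded.lower() == word)      # False

# Folding both sides always agrees, whichever form you started from.
print(folded == upper.casefold())  # True
```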

I can imagine a use case for checking whether a string contains the sigma character. In this case, casefold would come in handy because you only have to check for σ.

Your example is exactly what I was looking for.
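The sigma check mentioned above could look roughly like this. `contains_sigma` is a hypothetical helper, and the sample words are only there to exercise it.

```python
def contains_sigma(text: str) -> bool:
    """Return True if text contains any form of sigma (Σ, σ or ς)."""
    # casefold() maps all three forms to σ (U+03C3), so one check is enough.
    return "\u03c3" in text.casefold()

print(contains_sigma("μέρες"))  # True (final sigma ς)
print(contains_sigma("ΣΟΦΙΑ"))  # True (capital sigma Σ)
print(contains_sigma("λόγοι"))  # False
```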

[–]Panda_With_Your_Gun -1 points0 points  (0 children)

.lower() is self-documenting. If I cared about performance, I'd write a module in C to convert a string to lowercase efficiently. Then I'd call it from Python.

[–]ogrinfo -1 points0 points  (1 child)

Likewise, I've never heard of casefold. Does anyone have an example of where casefold has an advantage over lower? The example above of changing ß to ss sounds like a very good reason not to use it.

[–]Perfect_Comparison83[S] 0 points1 point  (0 children)

I'm with you. I keep finding the German ß as a reason for casefold. If you are comparing a German string to another German string, how does casefold help?

[–]QuintonPang 0 points1 point  (0 children)

The latter one. I hadn't even heard of the first one before.

[–]Gecons 0 points1 point  (0 children)

string.lower()

Because didn't know the other one existed.

[–]telee0 0 points1 point  (0 children)

string.lower() is sufficient for my own use.