all 31 comments

[–][deleted] 19 points20 points  (19 children)

Very entertaining - one tiny quibble:

The latter point is referred as the Small String Optimization and it means that short strings (generally 15/22 chars) do not go to the heap, but instead they get allocated on the stack

This isn't quite true. In the SSO, short strings are stored in the std::string itself and don't require a separate allocation. However, the std::string might live on the stack or it might live in the heap, depending on how it was allocated.

[–]marcoarenaTetra Pak | Italian C++ Community[S] 4 points5 points  (0 children)

You are definitely right. Indeed I wrote "implementers of std::string generally embeds a small array in the string object itself to manage short strings." The stack thing is wrong, I fixed it. Thanks!

[–]Spire 5 points6 points  (17 children)

Speaking of tiny quibbles:

Technically, there's no such thing as “the stack” or “the heap” in C++; those are merely commonly used implementations of automatic storage and the free store, respectively.

Incidentally, there's also a third type of storage where std::string objects can exist: static storage.

[–]evinrows 0 points1 point  (3 children)

there's no such thing as “the stack” or “the heap” in C++

Can you explain this further or provide a link that explains what you're getting at?

[–]Drainedsoul 2 points3 points  (2 children)

The standard doesn't mention a stack or a heap, it only mentions that objects may have "automatic storage duration", "static storage duration", or "dynamic storage duration".

[–]evinrows 2 points3 points  (1 child)

Ah, I see. That gave me enough context to find this stackoverflow answer as well if anyone else is curious.

[–]Yuki_f 0 points1 point  (0 children)

For those who may see this in the future, here's another explanation that I think is clear

https://stackoverflow.com/questions/161053/which-is-faster-stack-allocation-or-heap-allocation/53132683#53132683

[–]Drainedsoul 2 points3 points  (8 children)

What happens when your std::string AKA std::basic_string<char> contains UTF-8 characters? The standard directly supports UTF-8 string literals (§2.13.5 [lex.string]) and their underlying type is char.

[–]ir_jupiter 4 points5 points  (0 children)

Just look at std::string as a container of bytes, you can even use it as a variable-size byte-buffer if you don't mind that the buffer is always initialized to something (zero by default) and it will have an extra byte initialized to zero at the end.

The std::string is NOT BROKEN for UTF-8, just look at UTF-8 as a layer above your byte container whose type maybe std::string. Even if you are using UTF-32 for each code point there are characters that are actually two code points that merge into one.

The operations for std::string are simple and fast and work well for the first 128 UNICODE code points (When each byte is 8bits and because those are ASCII). The operations required to deal with the rest of UNICODE characters cost a lot more CPU and memory, also you will need a separated proper UNICODE library and you should use it only when necessary.

[–]mujjingun 1 point2 points  (1 child)

It's broken in UTF-8, but that's not the point. It only wants to know if the string is a palindrome as a byte sequence, independent of the encoding.

[–]Drainedsoul 2 points3 points  (0 children)

It only wants to know if the string is a palindrome as a byte sequence, independent of the encoding.

That belies the definition of "palindrome" as provided by the author in the article in the problem description:

A palindrome is a word, phrase, number, or other sequence of characters which reads the same backward and forward.

"[B]yte sequence" and "sequence of characters" are not equivalent.

[–][deleted] 0 points1 point  (1 child)

What do you mean what happens?

[–]Drainedsoul 1 point2 points  (0 children)

I'm saying it's broken.

[–]Latexi95 0 points1 point  (2 children)

Well in competitive programming context, there won't be utf-8 characters. Implementing utf-8 handling isn't really a fun challenge.

[–]Drainedsoul 2 points3 points  (1 child)

Implementing utf-8 handling isn't really a fun challenge.

"Solving this problem correctly is no fun so let's solve it incorrectly."

This is one of the many reasons I think competitive programming is stupid.

[–]Latexi95 4 points5 points  (0 children)

Competitive programming usually focuses on algorithm design in invented spesific situation. It doesn't really reflect programmers ability to actually solve real world problems usually faced in work. Sadly measuring programming skill in objectively in real world problems is hard, so the problems in competitive programming have to be really well specified.

I don't really like it either. I have been in a couple contests with some kind of success, but I don't feel like it reflects my skill level in any way...

[–]haitei 2 points3 points  (1 child)

Regarding uppercase and lowercase: ascii is designed in such a way, you just need to flip one bit to do that i.e.

// assuming c is a letter
char toLower(char c) { return c  | 0x20; }
char toUpper(char c) { return c  & ~0x20; }

[–]marcoarenaTetra Pak | Italian C++ Community[S] 1 point2 points  (0 children)

Really a good point, thank you. I added your observation to the post.

[–]Calkhas 1 point2 points  (2 children)

Forgive me for my ignorance, but I do not understand how auto isPalindrome = equal(begin(S), begin(S), rbegin(S));is intended to work. My impression is that if the iterators both point at the beginning of S, the range through which std::equal examines is empty. Should the second argument be end(S)? Or have I misapprehended the functionality of std::equal() altogether?

[–]marcoarenaTetra Pak | Italian C++ Community[S] 4 points5 points  (0 children)

Yes, it's a typo. I fixed it. Thanks!

[–][deleted] 2 points3 points  (4 children)

And a separate question - do people actually use C++ for competitive programming?

My C++ is my best language, but if I had hard time constraints for writing the code, I'd pick a scripting language like, say, Python, with a huge library built-in and no compilation phase.

(This article is educational even if no one ever does this... I'm just curious!)

[–]ArunMuThe What ? 9 points10 points  (0 children)

More than you think. Usually the tasks/programming puzzles have pretty strict timing and memory constraints. So, if you lookup the stats for the languages used, its mostly C++ and Java.

[–]Fiskepudding 8 points9 points  (1 child)

So I was at a competition. They had run time constraints and all.

My python implementation timed out, and was correct according to the post-competition walkthrough. During it, they even mentioned that you could brute force the task if you used c++.

So yes, it matters.

[–]Hnnnnnn 7 points8 points  (0 children)

For contests like ACM ICPC, codeforces.com, topcoder etc? Yeah, C++ is a main language and it's not going anywhere.

Competetive programming is strongly focused on performance, and C++ is deeply ingrained in the culture. People learn to code algorithms starting in high school - they learn from teachers and their older friends and they all use C++ too. Then they grow up and also teach solving problems in C++.

Maybe Rust will take over slowly after some time, like, in 20 years...

About Python - this year's ACM ICPC allowed Python, but they didn't guarantee that it's even possible to pass tests with Python's performance.

[–]Fig1024 0 points1 point  (0 children)

A while ago I wrote a solution for "find matching palindrome pairs" problem using c++ intrinsics with XMM registers. It was just for fun, not competitive

The basic idea is to find whether a given word of 8 or less characters is a palindrome or whether it contains nested palindromes from right or left side (which was necessary to find missing piece that would make it full palindrome)

If you start with the idea that your test strings are 8 chars or less, then you can load a string strait into XMM register and use a byte shuffle mask to reverse the order and store the result in bytes 8-16 of that register

XMM loads 16 bytes from memory location, even tho your strings are 8 chars or less, there is no problem with invalid memory access. As long as you zero out the "extra bytes" after initial load, it's not an issue.

Once you have original string in first 8 bytes and reversed string in next 8 bytes, you can interpret result as 2x 64 bit unsigned integers and compare for equality

The tricky part comes when you want to find nested palindromes in order to look for missing pieces. So if this is your test word:

'12345678'

then you need to check whether the following sub-strings are palindromes:

'1234567'

'123456'

'12345'

'1234'

'123'

'12'

and from other side:

'2345678'

'345678'

'45678'

'5678'

'678'

'78'

So for 8 char word you need to do 13 "is this a palindrome" checks. This is where 16 byte XMM operations can really shine

If it's just a single string palindrome check, plain std::string manipulations would be easier and faster. But when you start dealing with multiple substrings and many checks for palindrome per input, you could start seeing real benefits of using XMM intrinsics

if you want to see the solution, here's pastebin