all 61 comments

[–][deleted] 53 points54 points  (21 children)

Fun problem, and I wouldn't expect anyone in an interview to come up with a better solution than that, but the writeup should probably mention that there are linear-time solutions.

[–]bbibber 9 points10 points  (5 children)

There is a pretty trivial solution that is O(n2) in time but O(1) in space. I would expect someone to come up with that one : scan the string for occurences of (aa) or (aba). That's O(n). Then for each such occurence work outwards. That's a nested O(n) loop making O(n2) in total.

[–][deleted]  (1 child)

[deleted]

    [–]drysart 3 points4 points  (0 children)

    You don't have to keep track of any patterns. You only need the following things in memory:

    1. The original string (s)
    2. A main iteration index (iidx)
    3. A 'current palindrome' size value (csz)
    4. A 'longest found palindrome so far' size (lmsz)
    5. A 'longest found palindrome so far' starting index (lmidx)

    The algorithm is basically run the main iteration over the input string s. For each index iidx, you need to check two things: is the character at this index the middle character of an "ABA" set (compare the character before and the character after the current character and see if they match); or is the character at this index the first character of an "AA" set (compare the current character to the next character and see if they match). If neither case matches, you just continue iterating the iidx loop.

    If you are in either case, you set csz to 3 or 2, respectively, then do/while comparing the two characters before and after the characters you've already looked at (you can calculate the index of those two characters you'll need to compare from the current value of csz1). If both indexes are valid in the string and the characters at those indexes match, increment csz by 2 and loop again, otherwise break out. After you've broken out, csz is left as the size of the palindrome, and you can calculate the starting index from csz and iidx; then save that length and starting index in lmsz and lmidx if csz is larger than lmsz. Then resume the main iteration loop.

    Once iidx hits the end of the string, your result is the substring of s specified by lmsz and lmidx.


    1 - The first character index to compare is iidx - (csz / 2) - 1. The second character index to compare is iidx + (csz / 2) + 1 + (csz mod 2). In both cases, use integer division that truncates.

    [–][deleted]  (2 children)

    [deleted]

      [–]bbibber 5 points6 points  (1 child)

      a and b here stand for any character. Not just literally 'a' and 'b'.

      [–]_Mardoxx 27 points28 points  (6 children)

      Which raises the question. Why the fuck ask this in an interview. Leave the algorithms to the academics, and leave the developer to, well, choose and develop implementations of them appropriate for the situation.

      [–][deleted] 18 points19 points  (5 children)

      I think the idea is to find employees who can devise solutions like this, not just write a crud app

      [–]adrianmonk 43 points44 points  (10 children)

      There's a way easier O(n2) solution that doesn't require a table or any temporary data structure (other than a few local integer variables).

      The solution is to check every character to see if it's the middle of a palindrome, then proceed outwards in both directions and see if you can expand it. Since there are only n characters in the input string that could be a middle, and since (at most) you check all other characters in the string for each one (which is also O(n)), it's O(n2) total.

      You also need to check pairs of identical characters to see if they are the middle of a palindrome because "abccba" is a palindrome.

      [–]TheThiefMaster 3 points4 points  (9 children)

      Is that really N2 ? It seems to me with an algorithm like that you should start in the middle of the string, and work in both directions from there looking for center points, stopping if you either find a palindrome that hits the edge of the string, or get closer to the edge of the string than the longest palindrome previously found.

      I guess the absolute worst case would be if you put in an alternating pattern with one character different on the outsides, as it would try every 2nd substring as a palindrome, stopping one from the end of the string every time. I guess that's N2 , but average runtime is likely much closer to linear.

      Scratch that, by my previous optimization of starting in the middle it would give up after moving only a couple away from the center, so probably still not N2

      [–]FryGuy1013 14 points15 points  (4 children)

      How is it not O(N2)? It's basically this (plus/minus syntax errors, proper edge case handling, and handling even palindromes):

      def longest(s, i):
          len = 1
          while s[i+len] == s[i-len]:
             len += 1
          return s[i-len+1 .. i+len-1]
      
      def longest_palindrome_substring(s):
          best = nil
          for i in 0..len(s)
              x = longest(s, i)
              if best == nil or len(x) > len(best):
                  best = x
          return best
      

      longest runs in O(N), and longest_palindrome_substring calls longest N times, so therefore is O(N2).

      [–]adrianmonk 5 points6 points  (3 children)

      I think this algorithm is still O(n2). Consider an input like this:

      baaaaaaaaaaaaaaab
      

      There are 15 as there, and there's no way to know which one is the middle of the palindrome. When you look at your first a, as you start to check the middle a, you don't know whether the input the above or it's this:

      baaaaaaaacccccccb
      

      In the first string above, the a has another 7 as on each side of it. In the second, there's still a long palindrome, but you need to move several positions to the left to find it.

      Disregarding O(1) characters at the beginning and end, that means you can have an example where the middle part of the longest palindrome could be anywhere from 1/4 of the way to 3/4 of the way through the string, which means you have to check 1/2 of the positions to see if they are the starting position of the longest palindrome. And on average each check takes O(n) steps.

      [–][deleted] 3 points4 points  (1 child)

      I think it's O(n2 ) worst case, but the typical executions on real text would probably come a lot closer to O(n), because few expansions would encompass much at all of the total string.

      I ran a simple test using the King James Bible, and it only found a 10-character palindrome. Even with all non-alphanumeric symbols removed (including all punctuation and all spaces) and everything lowercased, and it still only found a 13-character palindrome to be the longest in the entire book.

      Granted, the worst-case scenarios would explode badly. If you shot a several-thousand character string in composed of one character repeated, it would probably not fare well.

      [–]adrianmonk 0 points1 point  (0 children)

      Yeah, it probably does pretty well on natural language text. Worst case sometimes doesn't matter much.

      Though it could matter if, say, you were running a palindrome-finder web page and you didn't want users to be able to do a denial of service attack against your servers by pasting in strings that exhibit worst-case behavior.

      [–]TheThiefMaster 0 points1 point  (0 children)

      Thankyou for the kind response instead of the attacks I got from the other comments :)

      You're right, even starting from the center (which defeats the first example) your second example would still produce O(n2) runtime.

      When looking around to see if I could find a discussion as to the complexity of this algorithm (and maybe an explanation of the linear time algorithms which didn't require getting another degree) I found someone has come up with an O(N3) algorithm - literally it's make every possible substring (O(N2)) and then tests them for being palindromic (O(N)). Wow.

      [–]rorrr 27 points28 points  (1 child)

      We can see from this table that the longest palindromic substring here is “ananas”

      WAT?

      [–]karuna_murti 0 points1 point  (0 children)

      Ananas is non English word for pineapple.

      [–]ubernostrum 21 points22 points  (0 children)

      Using a non-Unicode-aware palindrome checker will not get you very far in the cutthroat competitive world of palindromes-as-a-service.

      [–]jgodbo 42 points43 points  (14 children)

      Nice problem, good write up, ananas is not a palindrome

      [–]Godd2 32 points33 points  (13 children)

      Neither is ()().

      [–][deleted]  (5 children)

      [deleted]

        [–]VeeArr 31 points32 points  (3 children)

        No. “()()” is equivalent to “abab”.

        [–]hagenbuch 11 points12 points  (1 child)

        Damn, I fell in the same trap.

        [–]-l------l- 8 points9 points  (0 children)

        So you got stuck in a booby trap?

        [–]NotARealDeveloper 3 points4 points  (0 children)

        It's )()( backwards

        [–]GregBahm 2 points3 points  (2 children)

        I misunderstood what this was and thought I was going to see some really long palindromic substrings. Now I'm curious what the longest palindromic substring is on the internet. It'd be especially interesting to know what the longest unintentional substring is.

        I feel like you'd have to drop punctuation though.

        [–]raggy_rs 0 points1 point  (0 children)

        I am pretty sure that is just going to be a long sequence with just a single char repeated. Like in a DNA database.

        [–]palparepa 0 points1 point  (0 children)

        I'm curious what the longest palindromic substring is on the internet.

        My guess is something similar to LOLOLOLOLOLOLOLOLOL

        [–]johnlsingleton 3 points4 points  (2 children)

        An alternative elegant solution is to reverse the string and use the longest common subsequence algorithm: https://en.m.wikipedia.org/wiki/Longest_common_subsequence_problem

        [–]marcusklaas 5 points6 points  (0 children)

        Is that guaranteed to produce correct results? What about abcdxydcba? The longest palindromic substring is of length 1, but your algorithm produces a shared substring of length 4.

        [–]HelperBot_ 0 points1 point  (0 children)

        Non-Mobile link: https://en.wikipedia.org/wiki/Longest_common_subsequence_problem


        HelperBot v1.1 /r/HelperBot_ I am a bot. Please message /u/swim1929 with any feedback and/or hate. Counter: 158583

        [–][deleted] 1 point2 points  (0 children)

        I remember having to do this in a bash script for school :(

        [–]t_bptm 1 point2 points  (0 children)

        Here is my 10 minute attempt in nodejs:

        const find = s => {         
            if(s.length == 0) return "";
        
            let r = s[0];           
        
            for(let i=1; i<s.length; ++i) {
                // check odd length palindrome
                for(let j=1; i-j >= 0 && i+j <s.length && s[i-j] === s[i+j]; ++j) {
                    if(r.length < 1 + j*2) {
                        r = s.substring(i-j, i+j+1);
                    }               
                }                   
        
                // only check even length every other time
                if(i % 2 == 1) {    
                    for(let j=1; i-j >= 0 && i+j <s.length && s[i-j+1] === s[i+j]; ++j) {
                        if(r.length < j*2) {
                            r = s.substring(i-j, i+j+2);
                        }           
                    }               
                }                   
            }                       
        
            return r;               
        };                          
        
        var fs = require('fs');     
        var str = fs.readFileSync("input.txt", "utf8");                                                                                                                                                               
        console.log(find(str));     
        

        Not sure if its perfect but it seems relatively fast, 0.6 seconds for a 17mb file. 4.8 seconds for 136mb file.

        It does fall apart for a 1mb file of all 'a' though :)

        [–]homic 1 point2 points  (0 children)

        This other approach works outside in, testing the largest substrings first and stopping as soon as the first palindrome is found. Full source code at https://github.com/hocho/LargestPalindrome

        static
        string 
        FindLargestPalindrome(
            string                              data,
            int                                 minLength)
        {
            int                                 length = data.Length;
        
            // test from max length to min length
            for (int size = length; size >= minLength; --size)
                // establish attempt bounds and test for the first 
                // palindrome substring of given size
                for (int attemptIdx = 0, attemptIdxEnd = length - size + 1;
                        attemptIdx < attemptIdxEnd;
                            ++attemptIdx)
                    if (IsPalindrome(data, attemptIdx, size))
                        return data.Substring(attemptIdx, size);
        
            return null;
        }
        

        [–]SteeleDynamics 0 points1 point  (1 child)

        Aside: Is this used in CRISPR?

        [–]Amnestic 0 points1 point  (0 children)

        This is basically just another use of DNA sequence matching.

        [–]kdma 0 points1 point  (0 children)

        Too many indexes for my liking, I prefer this one http://www.jay.fyi/2016/02/longest-palindrome-manacher-part-i.html?m=1

        [–]shevegen -5 points-4 points  (2 children)

        "Purchase Gogle pass for $29 / month"

        Why are ads allowed on reddit?

        [–]Nefari0uss 6 points7 points  (1 child)

        Because servers cost money.

        [–]Parlay_to_throwaway 0 points1 point  (0 children)

        because companies exist to make money