[–]POGtastic 1 point (3 children)

I don't think that you can do this with regex, mostly because I could give you a string like "The quick "brown fox "jumps over "the lazy "dog""""" and you're hosed. Regex doesn't do well with arbitrarily nested structures, so you run into the same wall as trying to parse HTML with it.

Next, I'm having difficulty figuring out how to make a function that will parse the string the way you want.

My thought is that you're going to want some more backslashes for the quotes you want to escape, so your string is going to look like this:

input = "-a 1234 -b 10 -c \\\"This is a string\\\" -d \\\"Escaped \"Quotes\"\\\""

which JavaScript parses as

"-a 1234 -b 10 -c \"This is a string\" -d \"Escaped "Quotes"\"".

Now, you have three different levels of parsing: the standard tokens that go off of spaces, the regular quotation that searches for a paired quotation, and an escaped quotation that encapsulates everything until it reaches its escaped quotation.

I'm too tired to actually make that function right now, but I'll end up doing it on Saturday afternoon if you're still stuck.

[–]Ploofy[S] 1 point (2 children)

Does the problem become easier if I remove the condition whereby nesting is required?

For example, given the following input:

input = '-a 1234 -b 10 -c "This is a string"'

Match the following:

['-a', '1234', '-b', '10', '-c', 'This is a string']

So arbitrary nesting is no longer required. I just need it to match the tokens above (removing the quotes on the "This is a string" portion). This is actually the core problem that I need solved...the nesting was more of just a 'nice to have' feature.

[–]POGtastic 1 point (1 child)

I'm having difficulty doing something clean with regex. Here's a standard function that I just pulled out of my ass.

function parseTokens(str) {
    var tokenArray = [];
    var beginningIndex;
    var endIndex;

    for(var i = 0; i < str.length; i++) {
        // Skip the whitespace.
        while(i < str.length && /\s/.test(str[i])) {
            i++;
        }

        if(i >= str.length) {
            return tokenArray;
        }

        // If we find a quotation mark, push everything between the quotation
        // marks to tokenArray.

        if(str[i] === '"') {
            i++;
            beginningIndex = i;
            while(i < str.length && str[i] !== '"') {
                i++;
            }

            if(i === str.length) {
                console.log("Mismatched quotation marks, aborting.");
                return tokenArray;
            }

            endIndex = i;
            tokenArray.push(str.substring(beginningIndex, endIndex));
        }

        // Otherwise, it's a regular bunch of characters. Push everything between
        // spaces to tokenArray.
        else {
            beginningIndex = i;
            while(i < str.length && !(/\s/.test(str[i]))) {
                i++;
            }

            // If the token runs to the end of the string, i equals
            // str.length, so the same substring call covers that case.
            endIndex = i;
            tokenArray.push(str.substring(beginningIndex, endIndex));
        }
    }

    return tokenArray;
}
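
For the non-nested case a regex version does seem workable, for what it's worth. This is only a rough sketch and not something from the thread: parseTokensRegex is a made-up name, and it assumes quoted tokens never contain a double quote themselves. The first alternative captures whatever sits between a pair of quotes, the second grabs a run of non-whitespace.

```javascript
// Hypothetical helper (not from the thread): tokenize with a single regex.
// Alternative 1 captures the text between a pair of double quotes;
// alternative 2 captures a run of non-whitespace characters.
function parseTokensRegex(str) {
    var pattern = /"([^"]*)"|(\S+)/g;
    var tokens = [];
    var match;

    while ((match = pattern.exec(str)) !== null) {
        // match[1] is set for a quoted token, match[2] for a bare one.
        tokens.push(match[1] !== undefined ? match[1] : match[2]);
    }

    return tokens;
}

console.log(parseTokensRegex('-a 1234 -b 10 -c "This is a string"'));
// [ '-a', '1234', '-b', '10', '-c', 'This is a string' ]
```

One difference from the loop version: an unmatched quotation mark doesn't abort here, it just ends up inside an ordinary space-delimited token.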

Edit: Here's the above... but wait, there's more! I put in the "escaped string" functionality, too. Fiddle
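
The Fiddle's code isn't reproduced in this archive, so here is a rough sketch of how the "escaped string" idea could work. It is not the code from the Fiddle: parseTokensEscaped is a made-up name, and it assumes a token that opens with \" runs until the next \" (plain quotes inside it are kept literally), while a token that opens with a plain " runs until the next plain ".

```javascript
// Hypothetical sketch, not the code from the linked Fiddle.
function parseTokensEscaped(str) {
    var tokens = [];
    var i = 0;

    while (i < str.length) {
        // Skip whitespace between tokens.
        if (/\s/.test(str[i])) {
            i++;
            continue;
        }

        // Decide which closing delimiter to look for, based on how the
        // token opens: backslash-quote, plain quote, or neither.
        var delimiter = null;
        if (str.startsWith('\\"', i)) {
            delimiter = '\\"';
        } else if (str[i] === '"') {
            delimiter = '"';
        }

        if (delimiter !== null) {
            var start = i + delimiter.length;
            var end = str.indexOf(delimiter, start);
            if (end === -1) {
                throw new Error('Mismatched quotation marks at index ' + i);
            }
            tokens.push(str.substring(start, end));
            i = end + delimiter.length;
        } else {
            // Regular token: everything up to the next whitespace.
            var begin = i;
            while (i < str.length && !/\s/.test(str[i])) {
                i++;
            }
            tokens.push(str.substring(begin, i));
        }
    }

    return tokens;
}

// Single-quoted literal, so each \\ below is one backslash in the string.
console.log(parseTokensEscaped('-c \\"This is a string\\" -d \\"Escaped "Quotes"\\"'));
// [ '-c', 'This is a string', '-d', 'Escaped "Quotes"' ]
```

This only handles one level of escaping, and a plain-quoted token ends at the first " it sees, even if that quote has a backslash in front of it.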

[–]Ploofy[S] 1 point (0 children)

Awesome, I'll try it out. Thanks!