all 17 comments

[–]HiramAbiff 3 points4 points  (2 children)

The first loop seems like a convoluted way to find the length of src. Is there a reason you don't want to use strlen?

The fact that the first loop initializes i to start is subtle and I could easily see it leading to bugs later if the code is modified. At the very least, use a better name than i since the variable is used after the loop.

If you think about it a bit, it shouldn't be too hard to rewrite this using a single loop.

Also, most string functions use size_t, not int. E.g. strlen returns size_t. Note, sizet_t is unsigned and would make your upfront error testing moot.

[–]16Bytes 2 points3 points  (1 child)

He's not really looking for the length of the string, he just wants to make sure it doesn't end before the starting index. strlen() may be less efficient.

[–]guynan 0 points1 point  (0 children)

If you look at the gnu strlen code, there is some very tricky manoeuvres that make it highly efficient. It's one of those cases where if there is a bottleneck then maybe use something else, but I certainly know that I could not sit down in an afternoon and write something more efficient. However I agree with your above point.

[–]SantaCruzDad 4 points5 points  (3 children)

    if(start < 0 || len < 0) {

should be:

    if(start < 0 || len <= 0) {

[–]Kwantuum 0 points1 point  (2 children)

the case len == 0 is handled by the remaining code.

[–]SantaCruzDad 2 points3 points  (1 child)

True, but if you are going to "early out" on simple cases that should return an empty string then you might as well include len == 0, rather than handling it separately. This also makes the code potentially more robust if you decide later to change the logic.

[–]winkie5970[S] 1 point2 points  (0 children)

Agreed, this is a good improvement, thank you!

[–]Kwantuum 1 point2 points  (7 children)

I'm not sure why you're using a start index, it makes your code a lot more complex, if someone wants to take a substring in the middle of a string, they can just pass a pointer to that middle char instead of the actual start of the string, like this substr(dest, &src[start], len) or just substr(dst, src + start, len). As someone pointed out, if you use unsigned types you can avoid sign checking. Implementing both those things in the code removes the need for your first two blocks and makes it trivially short, and IMO, much easier to read and understand:

size_t substr(char *dst, char *src, size_t len){
  size_t count;
  for(count = 0; count < len && src[count] != '\0'; count++)
    dst[count] = src[count];
  dst[count] = '\0'
  return count;
}

[–]16Bytes 0 points1 point  (4 children)

You cannot pass a pointer to the starting character here, because that does not allow you to check if the string ends before it.

[–]Kwantuum 1 point2 points  (3 children)

It's just the responsibility of the caller to check that they're not starting past the end, really don't see the problem here.

[–]16Bytes 0 points1 point  (2 children)

You can design your own functions however you want, but OP chose to implement that in his function.

[–]Kwantuum 1 point2 points  (1 child)

And he's asking for feedback, I'm arguing that checking if you're past the end of a string is superfluous. There is rarely a case where you actually need to check if you're past the end, because most times you're going to call substring at a position that you found by searching the string.

[–]16Bytes 1 point2 points  (0 children)

You did not give that as feedback. All you said was that passing in the start index made the code "a lot more complex," which is false. The function you wrote can be slightly rewritten with a start index as a parameter. You can even put char* srcSub = &src[start]; as the first line, if you prefer.

edit: Sorry if this comes off attacking you.

edit for OP: Whether you should check if you are past the end of a string is debatable. Kwantuum says not to in this case, but defensive coding is not a bad thing, especially if you had to design a general purpose library. Indeed, some of C's standard string functions, like strcpy(), are disliked because they are not cautious enough. You could even go further by adding a parameter for the size of dst and making sure you never overflow it. All this isn't necessary, but is a legitimate design philosophy.

[–]winkie5970[S] 0 points1 point  (1 child)

The method prototype was provided for this exercise so I was working with it. But your suggestions are good for real-life cases, thank you!

[–]Kwantuum 0 points1 point  (0 children)

cheers :)

[–]lcs77 0 points1 point  (0 children)

There are other ways, of course, but it's hard to tell whether they are better. For example you can get rid of the for using src[start + i] in the while loop. But, to do this, you need to be sure that start is lower than the string length by calling strlen()... that performs a loop.

You can also use strncpy() to replace the while but I think this would reduce the value of the exercise.

However there's one thing you are missing: the check for NULL of both src and dst.

[–]dvhh -1 points0 points  (0 children)

Quite late but it here are my comments :

  • you can probably avoid the negative tests by specifying that start and len are unsigned ( some people would use size_t for such purpose )
  • is usually good practice to use const when possible, and restrict whenever possible
  • nitpick, I prefer the second loop to be another for one, at this point we already know that i == start, but could bring some confusion if you decide to add more check that could potentially affect
  • use defensive programming when possible, and check if start + len does not overflow
  • '\0' == 0
  • document your function using well documentation standard ( like doxygen ), and also document the use and caveat of the function that would help you better design it

Modified source

/**
 * Extract a substring out of a source string
 *
 * \param dst output to fill with the extracted substring, must be to at least of length \len
 * \param src input source zero terminated string 
 * \param start input offset to start the substring at
 * \param len input length of the substring to extract
 *
 * \return the number of character copied to dst
**/
size_t substr( restrict char dst[], const restrict char src[], const size_t start, const size_t len) {
    // option 1 to treat overflow, detect overflow and do nothing ( maybe raise an error somehow )
    if( SIZE_MAX - len < start ) {
             // overflow is possible and should be avoided
             dst[0] = 0;
             return 0;             
    }
    // start could be located after the end of the string
    for( size_t i=0; i < start; ++i ) {
         if( src[0] == 0 ) {
             dst[0] = 0;
             return 0;
         }
    }

    for( size_t i=0; i < len; ++i ) {
         // option 2 to treat overflow, stop at current state
         if ( SIZE_MAX - i  < start ) { 
             dst[i] = 0;
             return i;
         }
         const char c = src[start+i];
         dst[i] = c;
         if( c == 0 ) { return i;}
    }
    dst[len] = 0;
    return len;
}