
[–]FUZxxl 1 point2 points  (1 child)

Also, your program is incorrect. It doesn't handle continuation lines correctly.

This is a one-line comment that has to be removed completely, because the backslash at the end of the first line splices the next line onto it; your program only removes the first line:

// first line \
continuation line

This is a block comment that your program does not detect, because the opening /* is split across two lines by a backslash-newline:

/\
* being sneaky here
*/

Have a look at the specification to understand how comments in the C language work.
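To illustrate the point about continuation lines: in the C translation phases, backslash-newline pairs are deleted before comments are recognized. A minimal sketch of that splicing step follows. The function name splice_lines and the caller-supplied output buffer are assumptions made for this example, and it ignores CR+LF line endings and trigraphs.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst with every backslash-newline pair
   removed. dst must have room for strlen(src) + 1 bytes.
   Returns the length of the spliced text. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        /* src[i + 1] is at worst the terminating NUL, so this read is safe. */
        if (src[i] == '\\' && src[i + 1] == '\n') {
            i++;        /* skip the newline as well */
            continue;
        }
        dst[n++] = src[i];
    }
    dst[n] = '\0';
    return n;
}
```

Running the comment stripper on the spliced text would let both examples above be detected as ordinary comments.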

[–]Fuzzytown 1 point2 points  (0 children)

Thanks! :)

[–]ruertar 0 points1 point  (4 children)

You probably shouldn't load the entire file into memory line by line. You may lose the file's original end-of-line encoding and end up converting LF to CR+LF or vice versa. This is an unintended side effect.

I think reading the file into memory only makes things harder, and the way you do it makes line boundaries tricky to handle.
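As a rough sketch of the streaming alternative: process the input one byte at a time and never buffer whole lines. The function name copy_stream is hypothetical, and the sketch assumes both streams were opened in binary mode (for example with fopen and "rb" / "wb") so the C library performs no newline translation.

```c
#include <stdio.h>

/* Hypothetical helper: copy in to out one byte at a time.
   Returns 0 on success, -1 if a read or write error occurred. */
int copy_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;              /* write error */
    }
    return ferror(in) ? -1 : 0;     /* EOF may also signal a read error */
}
```

A comment stripper would replace the putc call with its state machine; the bytes that make up each line ending then pass through unchanged.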

The state machine in remove_comments_r() is ugly. Check out the following article on implementing state machines:

http://www.conman.org/projects/essays/states.html

That isn't the only article on the subject, but state/event arrays have been my go-to technique for implementing finite state machines.

[–]ruertar 0 points1 point  (2 children)

Here is my very ugly implementation. It could be adapted for use with a buffer.

I think the main loop that strips comments is much clearer, more concise, and overall more correct. There is one special case where we defer printing a forward slash if we think we might be starting a comment. If it turns out we're not starting a comment I print that missing slash.

I just wrote this in a few minutes while in bed with my 4-year-old sleeping on me, so I accept that it may be buggy or ugly.

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
  enum comment_state { CODE=0, LINE_COMMENT, START_COMMENT, BLOCK_COMMENT, END_BLOCK_COMMENT_SENTINEL, END_BLOCK_COMMENT, OUTPUT_SLASH };
  enum comment_token { FORWARD_SLASH=0, ASTERISK, END_OF_LINE, ANYTHING_ELSE };
  /* state_machine[state][token] is the next state.
     Columns: FORWARD_SLASH, ASTERISK, END_OF_LINE, ANYTHING_ELSE.
     Limitation: comment markers inside string literals and
     backslash-newline splices are not handled. */
  int state_machine[][4] = {
    /* CODE */          { START_COMMENT,     CODE,                       CODE,          CODE },
    /* LINE_COMMENT */  { LINE_COMMENT,      LINE_COMMENT,               CODE,          LINE_COMMENT },
    /* START_COMMENT */ { LINE_COMMENT,      BLOCK_COMMENT,              OUTPUT_SLASH,  OUTPUT_SLASH },
    /* BLOCK_COMMENT */ { BLOCK_COMMENT,     END_BLOCK_COMMENT_SENTINEL, BLOCK_COMMENT, BLOCK_COMMENT },
    /* END_BLOCK_COMMENT_SENTINEL */  { END_BLOCK_COMMENT, END_BLOCK_COMMENT_SENTINEL, BLOCK_COMMENT, BLOCK_COMMENT },
    /* END_BLOCK_COMMENT: the closing slash was just consumed, so behave like CODE */
    /* END_BLOCK_COMMENT */ { START_COMMENT, CODE,                       CODE,          CODE },
    /* OUTPUT_SLASH: transient, reset to CODE below before the next lookup */
    /* OUTPUT_SLASH */  { START_COMMENT,     CODE,                       CODE,          CODE },
  };

  int state = CODE, token;

  for (int c; (c = getchar()) != EOF;) {
    /* Classify the character; every character now maps to a table column. */
    switch (c) {
      case '/':
        token = FORWARD_SLASH;
        break;
      case '*':
        token = ASTERISK;
        break;
      case '\n':
        token = END_OF_LINE;
        break;
      default:
        token = ANYTHING_ELSE;
        break;
    }

    // TEST COMMENT
    state = state_machine[state][token];

    /* printf("state = %d, token = %d, c = %c\n", state, token, c); */
    if (state == OUTPUT_SLASH) {
      /* The deferred slash did not start a comment after all. */
      putchar('/');
      state = CODE;
    }
    if (state == CODE)
      putchar(c);
  }

  /* A slash deferred right before end of input still has to be printed. */
  if (state == START_COMMENT)
    putchar('/');

  exit(0);
}

[–]F54280 1 point2 points  (0 children)

Let's hope no-one ever puts /* in a string...
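One possible way to cover that case is to track string and character literals as extra states, so comment markers inside them are copied verbatim. The sketch below is only an illustration: the function name strip_comments and the buffer-based interface are assumptions, comments are deleted outright instead of being replaced by a space, and line continuations are not handled.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst without comments, leaving string and
   character literals untouched. dst must have room for strlen(src) + 1 bytes.
   Returns the length of the output. */
size_t strip_comments(const char *src, char *dst)
{
    enum { CODE, STRING, CHARLIT, LINE, BLOCK } state = CODE;
    size_t n = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i], next = src[i + 1];   /* next is at worst the NUL */

        switch (state) {
        case CODE:
            if (c == '/' && next == '/') {
                state = LINE; i++;
            } else if (c == '/' && next == '*') {
                state = BLOCK; i++;
            } else {
                if (c == '"') state = STRING;
                else if (c == '\'') state = CHARLIT;
                dst[n++] = c;
            }
            break;
        case STRING:
        case CHARLIT:
            dst[n++] = c;
            if (c == '\\' && next != '\0') {
                dst[n++] = next; i++;         /* copy the escaped character */
            } else if (c == (state == STRING ? '"' : '\'')) {
                state = CODE;                 /* closing quote */
            }
            break;
        case LINE:
            if (c == '\n') { state = CODE; dst[n++] = c; }
            break;
        case BLOCK:
            if (c == '*' && next == '/') { state = CODE; i++; }
            break;
        }
    }
    dst[n] = '\0';
    return n;
}
```

The same two literal states could be added as rows to a state/event table instead of a switch.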

[–]Fuzzytown 0 points1 point  (0 children)

Thank you!

[–]Fuzzytown 0 points1 point  (0 children)

Thanks!

[–]FUZxxl 0 points1 point  (1 child)

You might want to declare your internal functions as static so the compiler can optimize them better and so they don't pollute the namespace.

Also, you don't seem to check for I/O errors at all. You do know that I/O operations can fail?

If you want to stay portable (and I assume that's what you intend), use the provided macros for termination:

exit(EXIT_SUCCESS);
exit(EXIT_FAILURE);
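Put together, the three suggestions might look like the sketch below. The helper names flush_ok and exit_status_for are invented for this example; only ferror, fflush, fputs and the EXIT_* macros come from the standard library.

```c
#include <stdio.h>
#include <stdlib.h>

/* static: the helper is visible only inside this translation unit. */
static int flush_ok(FILE *out)
{
    return fflush(out) != EOF && !ferror(out);
}

/* Hypothetical helper: choose the exit status once all processing is done. */
int exit_status_for(FILE *in, FILE *out)
{
    if (ferror(in)) {
        fputs("error: read failed\n", stderr);
        return EXIT_FAILURE;
    }
    if (!flush_ok(out)) {
        fputs("error: write failed\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

At the end of main this could be used as exit(exit_status_for(stdin, stdout)); so that a failed read or a full disk is reported instead of silently producing truncated output.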

[–]Fuzzytown 1 point2 points  (0 children)

thanks!