First, a classic bug: `char cur_char; ... (cur_char = getchar()) != EOF`. The variable must be `int`, not `char`, or you can't reliably compare it with `EOF`. Yeah, it's really stupid that `getchar` returns an `int`, not a `char`, but that's how it is.
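To make the `int` requirement concrete, here is a minimal sketch of a read loop. `count_bytes` is a made-up helper, and it uses `getc` on a `FILE *` instead of `getchar` purely so it can be tried on any stream; the same rule applies to both functions.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes readable from `in` until end-of-file. */
long count_bytes(FILE *in)
{
    long count = 0;
    int ch; /* int, not char: it must hold every unsigned char value plus EOF */

    while ((ch = getc(in)) != EOF) {
        count++;
    }
    return count;
}
```

If `ch` were a `char`, a byte with value 0xFF would typically compare equal to `EOF` on platforms where `char` is signed and end the loop early, while on platforms where `char` is unsigned the comparison would never be true and the loop would never end.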
I know this is just a simple program where performance, maintainability, etc. aren't important. If it were a real production-quality program, though, it would preferably be written differently. For the sake of learning, let's pretend it is:
Overall, you could check each character against a look-up table rather than using a complex chain of if-else if statements. Such chains are hard to read: the behavior triggered by each comment character ends up scattered across various nested branches. The compiler is also less likely to turn an if-else if chain into a table look-up; more likely it will generate a series of branches, which can hurt performance inside a loop.
A look-up table followed by centralized "take action depending on the result" code, for example a `switch`, would improve both execution speed and readability/maintainability.
The simplest form of such a table look-up would be to call `strchr("\"\\\n/*\'", input)` and then take different actions depending on whether `strchr` returned `NULL` or not.
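As a rough sketch of that simplest form, the check can be wrapped in a small predicate. The name `is_special_char` is made up for illustration. One thing to watch: `strchr` also matches the string's terminating `'\0'`, so the sketch rules that value out first.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical predicate: true if c is one of the characters of interest. */
bool is_special_char(int c)
{
    /* strchr would also find the terminating null, so exclude it up front. */
    return c != '\0' && strchr("\"\\\n/*\'", c) != NULL;
}
```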
So rather than defining all comment characters with macros, you'd have a `typedef enum { DOUBLE_QUOTE_CHAR, BACK_SLASH, ... NO_COMMENT } comment_t;` whose enumerators correspond to the indices of the string literal passed to `strchr`. Then you can do:
    const char comment_characters[] = "\"\\\n/*\'";
    const char* comment_found = strchr(comment_characters, input);
    comment_t comm;

    if(comment_found)
      comm = (comment_t) (comment_found - comment_characters); // pointer diff arithmetic
    else // strchr returned NULL
      comm = NO_COMMENT;

    switch(comm)
    {
      case DOUBLE_QUOTE_CHAR: /* do double quote stuff */ break;
      case BACK_SLASH:        /* do backslash stuff */ break;
      default:                /* NO_COMMENT etc, do nothing */ break;
    }
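For reference, here is one way the whole thing could look as a compilable unit. Treat it as a sketch under assumptions: the enumerator names between `BACK_SLASH` and `NO_COMMENT` are my own guesses at the elided part, and `classify` and `describe` are hypothetical helpers that are not part of the reviewed program. The only requirement is that the enum order matches the character order in the string.

```c
#include <string.h>

/* Order must match comment_characters below.
   Names other than DOUBLE_QUOTE_CHAR, BACK_SLASH and NO_COMMENT are assumed. */
typedef enum {
    DOUBLE_QUOTE_CHAR,  /* index 0 */
    BACK_SLASH,         /* index 1 */
    NEWLINE_CHAR,       /* index 2 */
    SLASH_CHAR,         /* index 3 */
    STAR_CHAR,          /* index 4 */
    SINGLE_QUOTE_CHAR,  /* index 5 */
    NO_COMMENT          /* index 6: character is not in the table */
} comment_t;

static const char comment_characters[] = "\"\\\n/*\'";

/* Maps a character to its table index, or NO_COMMENT if it is not listed. */
comment_t classify(int input)
{
    /* strchr also matches the terminating null, so handle that explicitly. */
    const char *found = (input != '\0') ? strchr(comment_characters, input) : NULL;

    return found ? (comment_t)(found - comment_characters) : NO_COMMENT;
}

/* Centralized "take action depending on result" step, reduced to a label. */
const char *describe(int input)
{
    switch (classify(input)) {
    case DOUBLE_QUOTE_CHAR: return "double quote";
    case BACK_SLASH:        return "backslash";
    case NEWLINE_CHAR:      return "newline";
    case SLASH_CHAR:        return "slash";
    case STAR_CHAR:         return "star";
    case SINGLE_QUOTE_CHAR: return "single quote";
    default:                return "ordinary character";
    }
}
```

In the real parser each `case` would update the parsing state instead of returning a label; the point is only that all the per-character behavior sits in one place.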
You can even take readability/maintainability a bit further, almost to extremes, by doing this instead:
    const char comment_characters [N] = // where N is some size "large enough"
    {
      [DOUBLE_QUOTE_CHAR] = '\"',
      [BACK_SLASH]        = '\\',
      [N-1]               = '\0', // strictly speaking not necessary but being explicit is nice
    };
This guarantees integrity between the string and the enum indices.
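A filled-in sketch of that table might look like the following. It makes a few assumptions of mine: the enumerator names in the middle are invented, the size is chosen as `NO_COMMENT + 1` rather than an arbitrary "large enough" value, and `table_is_complete` is a hypothetical sanity check. The check relies on the fact that any slot you forget to initialize is zero-initialized, which cuts the string short.

```c
#include <string.h>

/* Names other than DOUBLE_QUOTE_CHAR, BACK_SLASH and NO_COMMENT are assumed. */
typedef enum {
    DOUBLE_QUOTE_CHAR,
    BACK_SLASH,
    NEWLINE_CHAR,
    SLASH_CHAR,
    STAR_CHAR,
    SINGLE_QUOTE_CHAR,
    NO_COMMENT
} comment_t;

/* Each character is tied to its enumerator, so reordering cannot desync them. */
static const char comment_characters[NO_COMMENT + 1] = {
    [DOUBLE_QUOTE_CHAR] = '\"',
    [BACK_SLASH]        = '\\',
    [NEWLINE_CHAR]      = '\n',
    [SLASH_CHAR]        = '/',
    [STAR_CHAR]         = '*',
    [SINGLE_QUOTE_CHAR] = '\'',
    [NO_COMMENT]        = '\0', /* terminator, so strchr can still be used */
};

/* Hypothetical check: nonzero when every enumerator before NO_COMMENT
   has a character. A forgotten entry would be '\0' and shorten the string. */
int table_is_complete(void)
{
    return strlen(comment_characters) == (size_t)NO_COMMENT;
}
```

Sizing the array from the enum avoids unused zero slots in the middle of the table, which would otherwise make `strchr` stop early.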