d2jsp
Log InRegister
d2jsp Forums > Off-Topic > Computers & IT > Programming & Development > Need Critique On Small C Program
Add Reply New Topic New Poll
Member
Posts: 4,841
Joined: Jan 16 2008
Gold: 0.00
Sep 10 2013 12:27pm
I'm relatively new to programming and just did an exercise from the K&R book. The task is in the comment at the beginning of the file. I would like some critique on whether this is an acceptable solution and what could be improved.

image version for highlighted syntax:



copy/paste version:

Code
/* Write a program to "fold" long input lines into two or more shorter
* lines after the last non-blank character that occurs before the N-th column of input.
* Make sure your program does something intelligent with very long lines, and if
* there are no blanks or tabs before the specified column. */

/* case 1: found tab or blank at column <= N, subcases to consider:
*      1.1 found blank after last tab
*      1.2 tab after last blank, subcases:
*          1.2.1 tab before N
*          1.2.2 tab to N
*          1.2.3 tab exceeding N */

/* case 2: no blank or tab before at column <= N: break line at next blank/tab */

#include <stdio.h>
#define MAX 1000
#define N 20 // soft breakpoint
#define TAB 8

int foldline(char[], int, int, int);

int main() {
   char line[MAX + 1]; // +1 for '\0' termination

   while (foldline(line, MAX, N, TAB) > 0) {
       printf("%s", line);
   }
   return 0;
}

int foldline(char line[], int max, int bp, int tabsize) {
   /* folds a long input line and returns length of original line. bp: breakpoint */
   int chr, col, i, lastblank = -1, lasttab = -1, tmp, nextstop; // nextstop: next tab stop

   for (i = 0, col = 0; i < max && (chr = getchar()) != EOF && chr != '\n'; i++, col++) {
       line[i] = chr;
       nextstop = (col/tabsize)*tabsize + tabsize - 1;
       if ((chr == ' ' || chr == '\t') && col <= bp) { // found whitespace early enough  
           chr == ' '? (lastblank = col) : (lasttab = col);
           tmp = i; // save index for possible '\n' in line array
       }
       if (chr == '\t') col = nextstop;
       /* deal with case 1 */
       if (col >= bp && lastblank != lasttab) { // if lastblank == lasttab <=> found none
           lastblank > lasttab? (col = bp - lastblank) : (col -= nextstop);    
           line[tmp] = '\n';
           lastblank = -1;
           lasttab = -1;
       /* deal with case 2 */
       } else if (col > bp && (chr == ' ' || chr == '\t')) {
           line[i] = '\n';
           col = 0;
       }
   }
   if (chr == '\n') line[i++] = '\n';
   line[i] = '\0';
   return i;
}
Member
Posts: 237
Joined: Aug 6 2011
Gold: 6,026.00
Sep 10 2013 06:52pm
The code is functional but it feels very hard to read. From a compiler stand point writing one instruction per line won't result in a difference, so it wouldn't slow down your code, and make it all together more readable.

There's a lot of redundant if statements, checking for the same states over and over. The condition checks could be unified to avoid that.
Member
Posts: 4,841
Joined: Jan 16 2008
Gold: 0.00
Sep 11 2013 10:45am
edit: got a bug, brb

This post was edited by tt_toby on Sep 11 2013 10:47am
Member
Posts: 4,841
Joined: Jan 16 2008
Gold: 0.00
Sep 11 2013 11:21am
alright i coded a cleaner version. not worth posting. thx for reply.
Member
Posts: 4,841
Joined: Jan 16 2008
Gold: 0.00
Sep 13 2013 03:02pm
Okay here's another one, no need for another thread imo. All input is welcome. In my testing it seems to work well, but maybe I overlooked some case.
Since this a bit to read expect small donations for good answers. ;)

Code
/* Write a program to check a C program for rudimentary syntax errors like unmatched
* parentheses, brackets and braces. Don't forget about quotes, both single and double,
* escape sequences, and comments. (This program is hard if you do it in full generality.) */

#include <stdio.h>
#include <stdbool.h>
#define ON true
#define OFF false
#define MAX 5000 // max number of parens/braces/brackets in source file

int main(int numargs, char *argv[]) {
   /* check (), [] or {} depending on command line argument
    * example program call: krex1_24 ( < sourcefile.c */
   char input = *argv[1], open, close;
   
   if (input == '(' || input == ')') {
       open = '(';
       close = ')';
   } else if (input == '[' || input == ']') {
       open = '[';
       close = ']';  
   } else if (input == '{' || input == '}') {
       open = '{';
       close = '}';
   } else {
       printf("invalid input\n");
       return 0;
   }
 
   /* process input */
   char last = getchar(), c, olist[MAX], clist[MAX];
   int oindex = 0, cindex = 0, atline = 1, lastlast = '\0', i;
   bool coma = OFF, comb = OFF, dquote = OFF, squote = OFF, rightbefore = false;

   while (last != EOF && oindex < MAX && cindex < MAX) {
       if (last == '\n') {
           atline++;
       }
       c = getchar();

       /* check for comments or quotes */
       if (last == '/' && c == '*' && dquote == OFF && squote == OFF && comb == OFF) {
           coma = ON;
       } else if (coma == ON && last == '*' && c == '/') {
           coma = OFF;
       } else if (last == '/' && c == '/' && dquote == OFF && squote == OFF && coma == OFF) {
           comb = ON;
       } else if (comb == ON && c == '\n') {
           comb = OFF;
           if (last == open || last == close) {
               last = '\0'; // prevent counting ( for //comment(\n
           }
       } else if (c == '"' && dquote == OFF && squote == OFF && coma == OFF && comb == OFF) {
           dquote = ON;
           if (last == open || last == close) {
               rightbefore = true; // prevent not counting ( for ("quote"        
           }
       } else if (c == '"' && dquote == ON && (last != '\\' || lastlast == '\\')) {
           dquote = OFF;
           if (last == open || last == close) {
               last = '\0'; // prevent counting ( for "quote("
           }
       } else if (c == '\'' && dquote == OFF && squote == OFF && coma == OFF && comb == OFF) {
           squote = ON;
           if (last == open || last == close) {
               rightbefore = true; // prevent not counting ( for ('quote'        
           }
       } else if (c == '\'' && squote == ON && (last != '\\' || lastlast == '\\')) {
           squote = OFF;
           if (last == open || last == close) {
               last = '\0'; // prevent counting ( for 'quote('
           }
       }

       /* update paren/brace/bracket status, discard matching pairs */
       if ((coma == OFF && comb == OFF && dquote == OFF && squote == OFF) || rightbefore) {
           if (last == open) {
               olist[oindex] = atline;
               oindex++;
           } else if (last == close) {
               if (oindex == 0) {
                   clist[cindex] = atline;
                   cindex++;
               } else {
                   oindex--;
               }
           }
       }
       
       /* update temp vars */
       rightbefore = false;
       lastlast = last;
       last = c;
   }
   
   /* output */
   if (oindex == MAX || cindex == MAX) {
       printf("maximum number of parens/brackets/braces exceeded\n");
   } else if (oindex == 0 && cindex == 0) {
       printf("found no unmatched %c%c\n", open, close);
   } else {
       for (i = 0; i < cindex; i++) {
           printf("unmatched %c at line %d\n", close, clist[i]);
       }
       for (i = 0; i < oindex; i++) {
           printf("unmatched %c at line %d\n", open, olist[i]);
       }
   }
   return 0;
}
Member
Posts: 16,144
Joined: Mar 27 2008
Gold: 14,618.00
Sep 13 2013 05:57pm
i dont like "else if..."

i prefer to use the switch statement if possible. i guess you find out which part of your code i therefore would replace with this:
Code
switch (input) {
case '(':
case ')':
 open = '(';
 close = ')';
break;
case '[':
case ']':
 open = '[';
 close = ']';  
break;
case '{':
case '}':
 open = '{';
 close = '}';
break;
deault:
 printf("invalid input\n");
break;
}

note that i wrote no "break" by intention. so in the case '(', the stuff in case ')' will be executed ;)

edit: maybe you can take the "if" statements and write the result in a (or more) variable with a meaningful name, then you can switch through this variable/cases?
maybe you should use functions too?

This post was edited by Richter on Sep 13 2013 06:09pm
Member
Posts: 237
Joined: Aug 6 2011
Gold: 6,026.00
Sep 13 2013 07:02pm
I'd rather skip MAX and the 2 arrays to do something that can be done with a single array dynamically allocated

This is my take on that:

Quote
/* Write a program to check a C program for rudimentary syntax errors like unmatched
* parentheses, brackets and braces. Don't forget about quotes, both single and double,
* escape sequences, and comments. (This program is hard if you do it in full generality.) */

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
//#include <stdbool.h>
#define ON true
#define OFF false
 
unsigned int *list=0, nlist, listsize=0;
char open, close;
void PushList(unsigned int in, char last);


int main(int numargs, char *argv[]) {
  /* check (), [] or {} depending on command line argument
    * example program call: krex1_24 ( < sourcefile.c */
  char input = *argv[1];
 
  if (input == '(' || input == ')') {
      open = '(';
      close = ')';
  } else if (input == '[' || input == ']') {
      open = '[';
      close = ']'; 
  } else if (input == '{' || input == '}') {
      open = '{';
      close = '}';
  } else {
      printf("invalid input\n");
      return 0;
  }

  /* process input */
  char last = getchar(), c;

  int oindex = 0, cindex = 0, lastlast = '\0';
  unsigned int atline = 1, i;
  bool coma = OFF, comb = OFF, dquote = OFF, squote = OFF, rightbefore = false;

  while (last != EOF) {
      if (last == '\n') {
          atline++;
      }
      c = getchar();

      /* check for comments or quotes */
      if (last == '/' && c == '*' && dquote == OFF && squote == OFF && comb == OFF) {
          coma = ON;
      } else if (coma == ON && last == '*' && c == '/') {
          coma = OFF;
      } else if (last == '/' && c == '/' && dquote == OFF && squote == OFF && coma == OFF) {
          comb = ON;
      } else if (comb == ON && c == '\n') {
          comb = OFF;
          if (last == open || last == close) {
              last = '\0'; // prevent counting ( for //comment(\n
          }
      } else if (c == '"' && dquote == OFF && squote == OFF && coma == OFF && comb == OFF) {
          dquote = ON;
          if (last == open || last == close) {
              rightbefore = true; // prevent not counting ( for ("quote"       
          }
      } else if (c == '"' && dquote == ON && (last != '\\' || lastlast == '\\')) {
          dquote = OFF;
          if (last == open || last == close) {
              last = '\0'; // prevent counting ( for "quote("
          }
      } else if (c == '\'' && dquote == OFF && squote == OFF && coma == OFF && comb == OFF) {
          squote = ON;
          if (last == open || last == close) {
              rightbefore = true; // prevent not counting ( for ('quote'       
          }
      } else if (c == '\'' && squote == ON && (last != '\\' || lastlast == '\\')) {
          squote = OFF;
          if (last == open || last == close) {
              last = '\0'; // prevent counting ( for 'quote('
          }
      }

      /* update paren/brace/bracket status, discard matching pairs */
      if ((coma == OFF && comb == OFF && dquote == OFF && squote == OFF) || rightbefore) PushList(atline, last);


      }
     
      /* update temp vars */
      rightbefore = false;
      lastlast = last;
      last = c;
  }
 
  /* output */
if (!nlist)
{
      printf("found no unmatched %c%c\n", open, close);
}
else
{
      for (i = 0; i < nlist; i++)
    {
    if(list[i] & 0x80000000) printf("unmatched %c at line %d\n", open, list[i] & 0x0FFFFFFF);
    else if(list[i] & 0x40000000) printf("unmatched %c at line %d\n", close, list[i] & 0x0FFFFFFF);
    }
  }

  free(list);


  return 0;
}

void PushList(unsigned int in, char last)
{
if(nlist>=listsize) //increment buffersize by steps of 100
{
  listsize+=100;
  list = (unsigned int*)realloc(list, sizeof(int)*listsize);
}

if(last==open) list[nlist++] = in | 0x8000000; //flag last bit to mark as opened
else if(last==close)
{
  if(nlist)
  {
  if(list[nlist-1] && 0x80000000) nlist--; //closed tags
  }
  else list[nlist++] = in | 0x40000000; //flag second to last bit to mark as closed
}
}


jsp messes up the tabs ='( dunno how to keep my code alignement T__T

Lastly, your code only processes for either parenthesis, brackets or braces at any given time. If this code is intented to parse source files as I expect it is, then priority rules can't be respected, i.e. braces>parenthesis&brackets. Your code won't react if braces are closed before nested parenthesis are. This would require some more work however. As a quick half fix, you could run the while loop and getchar() from your main then call the parsing function with for {}, () and [] for each iteration.

This post was edited by flyinggoat on Sep 13 2013 07:11pm
Member
Posts: 4,841
Joined: Jan 16 2008
Gold: 0.00
Sep 14 2013 01:44am
Ty. I'm not at the point of dynmaic allocation yet because this exercise is from the first chapter of the book. Once I have I'll go back and have a look at it.
Yeah, jsp messes up tabs, best thing to do is write the code in an editor where tabs are set to be replaced by spaces automatically and then paste it into jsp.

About priority rules: Can my program be adjusted to respect them or do I pretty much have to start over?
Member
Posts: 237
Joined: Aug 6 2011
Gold: 6,026.00
Sep 14 2013 05:51am
Quote (tt_toby @ Sep 14 2013 09:44am)
Ty. I'm not at the point of dynmaic allocation yet because this exercise is from the first chapter of the book. Once I have I'll go back  and have a look at it.
Yeah, jsp messes up tabs, best thing to do is write the code in an editor where tabs are set to be replaced by spaces automatically and then paste it into jsp.

About priority rules: Can my program be adjusted to respect them or do I pretty much have to start over?



thx for the tip =P

There actually wouldn't be much to do to adjust your code for priority rules. The first part of your loop is fine. When it's about to push to the list, you first need to verify for existing delimiters in the priority order: braces > parenthesis > brackets. Opening delimiters can be accounted for as usual. However it's preferable to maintain them all within the same list for the priority check. This is why I pointed you toward a single list array to begin with. You can save the nature of the delimiter the same way open/closed status is saved: 0x08000000 for braces, 0x04000000 for parenthesis, 0x02000000 for brackets.

Once you hit a closing delimiter things change a little. First of all you now have to keep track of properly closed couples. Dynamic allocation comes in play as you may now hit your max much sooner, since you're tracking 3 different delimiters and saving matched ones as well. You then simply stick to the priority rules:

1) With a closed brace, backtrack the list from the top until you hit an opened brace. If you hit it, mark the opened brace as matched, flagging it with 0x20000000. Don't modify any parenthesis or brackets you ran into backtraking, and make sure you don't save the closed brace in the list. If you didn't hit anything, though, save the closed brace and move on.
2) With parenthesis, backtrack again. Ignore all brackets. If you hit any type of brace (opened, closed or matched couple), this parenthesis can't be matched, move on. If you hit an opened parenthesis before any braces, mark the opened parenthesis as matches with 0x20000000. Same as before, if you hit a match, don't save the closed parenthesis, otherwise account for it.
3) With brackets, if the previous list index isn't a bracket, it can't be matched, save it. If the last item is an opened bracket, flag it, drop the closed one.

At the output, only mention elements in the list that have bit 0x20000000 unflagged. For encoding purposes, since you use the last 7 bits of the list integers for extra data, make sure atline <= 0x01FFFFFF, which is the maximum amount of lines this code can process with a 32 bits integer list.

Note that this code only renders the simpliest version of the rule: braces > parenthesis > brackets. In most languages, () pairs can be nested within []. In javascript you can see {} nested within (). If this code is intended to parse languages like php, you'll have to pay attention to <> as well. If you intent to take care of all of these and implement full nesting rules, you'll have to use your current code to maintain a full record of delimiters first, then post process, as you need to first identify all matched couples to then verify if each individual couple is nested properly. That'll be fun alright =P

Go Back To Programming & Development Topic List
Add Reply New Topic New Poll