I made simple scripting language (interpreter) in C and now, I want to make my code better.
Here is my code:
#include <stdio.h>#include <stdlib.h>#include <string.h>#define TRUE (1 == 1)#define FALSE (1 == 0)#define FUNCTION_NAME_COUNT 2#define FUNCTION_ARG_FUNCTION_NAME_COUNT 1#define MAX_FUNCTION_NAME_LENGTH 247#define MAX_FUNCTION_ARG_LENGTH 127#define MAX_STRING_LENGTH 2048char *itoa(int value, char *str, int base){ char *output; char *ptr; char *low; if (base < 2 | base > 36) { *str = '\0'; return str; } output = ptr = str; if (value < 0 & base == 10) { *ptr++ = '-'; } low = ptr; do { *ptr++ = "zyxwvutsrqponmlkjihgfedcba9876543210123456789abcdefghijklmnopqrstuvwxyz"[35 + value % base]; value /= base; } while (value); *ptr-- = '\0'; while (low < ptr) { char tmp = *low; *low++ = *ptr; *ptr-- = tmp; } return output;}char *strr(char *str, char *output, int x, int y){ int i; for (i = x + 1; i < y; i++) { output[i - (x + 1)] = str[i]; } return output;}char *equstr(char *str, char *output){ int i = 0; while (str[i]) { output[i] = str[i]; i++; } return output;}void clear_str(char *str){ int i = 0; while (str[i]) { str[i] = '\0'; i++; }}int is_string(char *str){ if (str[0] == '"' & str[strlen(str) - 1] == '"') { return TRUE; } else { return FALSE; }}int is_sint(char *str){ int i = 0; while (str[i]) { if (str[i] >= '0' & str[i] <= '9') { return TRUE; } else { return FALSE; break; } }}void lang_prints(char *s){ char output[MAX_STRING_LENGTH] = "\0"; if (is_string(s)) { strr(s, output, 0, strlen(s) - 1); printf("%s\n", output); }}void lang_printi(int i){ printf("%d\n", i);}char *lang_arg_lens(char *s){ char *output = malloc(MAX_STRING_LENGTH); if (is_string(s)) { itoa(strlen(s) - 2, output, 10); return output; }}int main(){ int i, j; char function_names[FUNCTION_NAME_COUNT][MAX_FUNCTION_NAME_LENGTH] = { "prints", "printi" }; char function_arg_function_names[FUNCTION_ARG_FUNCTION_NAME_COUNT][MAX_FUNCTION_NAME_LENGTH] = { "lens" }; char input[MAX_STRING_LENGTH]; char function_name[MAX_FUNCTION_NAME_LENGTH] = "\0"; char function_arg[MAX_FUNCTION_ARG_LENGTH] = "\0"; char tmp_function_arg[MAX_FUNCTION_ARG_LENGTH] = "\0"; char function_arg_function_name[MAX_FUNCTION_NAME_LENGTH] = "\0"; char old_function_arg[MAX_FUNCTION_ARG_LENGTH] = "\0"; printf("> "); scanf("%s", input); for (i = 0; i < FUNCTION_NAME_COUNT; i++) { clear_str(function_name); strr(input, function_name, -1, strlen(function_names[i])); strr(input, function_arg, strlen(function_names[i]), strlen(input) - 1); if (strcmp(function_name, function_names[i]) == FALSE) { if (input[strlen(function_name)] == '(' & input[strlen(function_name) + strlen(function_arg) + 1] == ')') { for (j = 0; j < FUNCTION_ARG_FUNCTION_NAME_COUNT; j++) { if (strcmp(old_function_arg, "\0") == FALSE) { equstr(function_arg, old_function_arg); } clear_str(function_arg); clear_str(function_arg_function_name); strr(function_arg, function_arg_function_name, -1, strlen(function_arg_function_names[j])); strr(old_function_arg, function_arg, strlen(function_arg_function_names[j]), strlen(old_function_arg) - 1); strr(old_function_arg, function_arg_function_name, -1, strlen(function_arg_function_names[j])); if (strcmp(function_arg_function_name, function_arg_function_names[j]) == FALSE) { if (old_function_arg[strlen(function_arg_function_name)] == '(' & old_function_arg[strlen(function_arg_function_name) + strlen(function_arg) + 1] == ')') { switch (j) { case 0: equstr(function_arg, tmp_function_arg); clear_str(function_arg); equstr(lang_arg_lens(tmp_function_arg), function_arg); break; } } } else { clear_str(function_arg); equstr(old_function_arg, function_arg); } } switch (i) { case 0: lang_prints(function_arg); break; case 1: if (is_sint(function_arg)) { lang_printi(atoi(function_arg)); } break; } } } }}Here is theGitHub repository of my scripting language.
- 2\$\begingroup\$Curious, why
#define TRUE (1 == 1)instead oftruefrom<stdbool.h>?\$\endgroup\$chux– chux2021-06-23 18:30:31 +00:00CommentedJun 23, 2021 at 18:30 - 1\$\begingroup\$@chux-ReinstateMonica Well, the latter is tied to a true boolean type. Whether that is good (proper type), bad (ABI), or doesn't matter...\$\endgroup\$Deduplicator– Deduplicator2021-06-23 18:52:00 +00:00CommentedJun 23, 2021 at 18:52
- 8\$\begingroup\$A sample of the language being parsed would greatly improve this question.\$\endgroup\$Edward– Edward2021-06-23 18:54:18 +00:00CommentedJun 23, 2021 at 18:54
- \$\begingroup\$It would be helpful if you mention which C standard is targeted by your code. This matters stuff like stdbool.h.\$\endgroup\$Limina102– Limina1022021-06-24 12:21:12 +00:00CommentedJun 24, 2021 at 12:21
4 Answers4
Potential bug
str[strlen(str) - 1] isundefined behavior whenstr[0] == 0. Instead use alogical AND.
// if (str[0] == '"' & str[strlen(str) - 1] == '"') if (str[0] == '"' && str[strlen(str) - 1] == '"')Noteis_string("\"") returns true.
Potential bug
int is_sint(char *str) has no return value whenstr[0] == 0.
Code is strange as it has awhile loop but never iterates a second time.
is_sint("-123") returns false.
Design asymmetry
if (value < 0 & base == 10) makes for base 10 aspecial case. Suggestif (value < 0) instead and then make a companionuitoa(unsigned value, char *str, int base) for a sign-less version.
Considerconst andrestrict
Bothconst (source does not changed) andrestrict (referenced data only affected via this pointer) can make for faster code and wider application.
Example:
// char *equstr(char *str, char *output)char *equstr(const char * restrict str, char * restrict output)Usestdbool.h
Rather than make upTRUE/FASLE, usebool, true, false.
Tighten up array initializations
\0 not needed.
// char output[MAX_STRING_LENGTH] = "\0";char output[MAX_STRING_LENGTH] = "";Avoid buffer overflow
scanf("%s", input); is worse thangets(). Use awidth or researchfgets().
Use size
char *itoa(int value, char *str, int base) belongs to the era when memory was expensive. Make more robust and safe string functions and pass in asize to avoid buffer overrun.
I'd re-order with destination parameters on the left and use another name asitoa() is likely to collide with other code bases.
// char *itoa(int value, char *str, int base)char *my_itoa(size_t str_size, char *str, int value, int base)Pedantic: Large string support
strr(char *str, char *output, int x, int y) relies on the length of a string not exceedingINT_MAX. The length of a string may approach the potentially much largerSIZE_MAX.
Does not apply with thismain(), but consider code re-use.
| vs.||
if (base < 2 || base > 36) is more idiomatic thanif (base < 2 | base > 36), yet makes no functional difference here.
Perhaps a shorter 0-9a-z string?
Include the usual suspects.
// Enough room INT_MIN as a binary string.#define MY_ITOA_BUF_SIZE (sizeof(int) * CHAR_BIT + 2)#define MY_ITOA_BASE_MIN 2#define MY_ITOA_BASE_MAX 36// Negative values result in a string beginning with a '-' for all bases//char* my_itoa(size_t n, char *s, int i, int base) { char buf[MY_ITOA_BUF_SIZE]; char *p = &buf[sizeof buf - 1]; assert(base >= MY_ITOA_BASE_MIN && base <= MY_ITOA_BASE_MAX); *p = '\0'; int value = i < 0 ? i : -i; // negative absolute value do { div_t qr = div(value, base); *(--p) = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[-qr.rem]; value = qr.quot; } while (value); if (i < 0) { *(--p) = '-'; } size_t n_used = (size_t) (&buf[sizeof buf] - p); // If not enough room if (n_used > n) { if (n > 0) { *s = '\0'; } return NULL; } return memcpy(s, p, n_used);}- \$\begingroup\$Perhaps you could mention
sscanfandsscanf_s(C11 and up) explicitly in your point aboutscanf?\$\endgroup\$jaskij– jaskij2021-06-24 16:18:10 +00:00CommentedJun 24, 2021 at 16:18
Boolean type
If you really want / need a boolean type, use<stdbool.h>. If you don't, just use anint as much prior code enshrines.
But for the love of all, don't define your ownTRUE. Remember that only null and zero are falsey, all else being truthy. Defining a constant invites comparison against it, which violates this convention imbued into the language itself.
DefiningFALSE while also worrying at least won't lead to compiling code which is logically wrong. But that is all it has going for itself.
Boolean logic
If you have a boolean value to return, return it directly, don't branch on it and return literals. Unless you are paid by LoC, it's a pointless obfuscation.
Unless you have a good reason, use boolean operators for boolean logic. Avoiding the short-circuiting is rarely important enough to use bitwise operators instead. And if you use it, remember it doesn't short-curcuit.
Dead code
Eliminate dead code to avoid confusing or irritating readers.
Afterbreak,return, and the like, additional code in the block without intervening jump-label (in general,goto is to be avoided, as you do) orcase-label is unreachable.
Initialize on declaration
Unless you want to run under strict C89, (just about) all compilers support mixing declarations and statements, and if you ask for any optimization, the result is generally the same.Enhancing readability and just about eliminating the chance for a whole class of bugs for no cost is a bargain, right?
See "Is declaration of variables expensive?".
Help finding bugs
Ifbase toitoa() is out-of-range, that is a programming-error which should be found and fixed in development, never to be seen in release-mode.assert() is designed for that.
If you really want, you might kill the program in release too, trading efficiency for defense in depth, but don't ever silently swallow the error. Doing so makes debugging a nightmare.
Beconst-correct
Doing so enables the compiler to help you find bugs, and makes functions more self-documenting.
Userestrict where appropriate
Doing so allows the compiler to assume memory is not aliased, eliminating potential dependencies and increasing efficiency, in addition to making functions more self-documenting.
Line-length
Long lines are hard to read, especially if they lead to wrapping by the editor or horizontal scrolling.
172 is really a bit much, below 80 would be best.
String-literals are easily wrapped, as consecutive ones are merged by the compiler.
What is a string
A string is a sequence of non-nul elements including and terminated by the first nul element.str@@-functions are expected to handle strings, though there are infamous exceptions.
Single source of truth
The outer dimension of an array is sized according to the initializer if omitted. And you can get it back using(sizeof *array / sizeof array).
Unless there is a good reason to do otherwise, instead of an array of arrays of fixed length for storing string-literals, consider an array of pointers. It might be more efficient, and eliminates the forced fixed length.
switch and single label
An if-statements seems much more appropriate, as it is simpler.
Appropriate types
size_t is the designated type for object-sizes and array-indices. Unless you have a compelling reason, use it.
String length
Don't blindly assume strings are a minimum number of characters long. Empty strings exist, and out-of-bounds access is generallyUndefined Behavior.
Additional comments for specific code
Why should
itoa()only supply the minus-sign if the base is 10?
That doesn't seem to make any sense.strr()is just bad.- It looks more like a
mem-function, as it doesn't treat the input as a string, nor outputs a string. rforreverseis very cryptic. Just spell it out.- Copying elements x to y both exclusive is a very surprising contract. If you don't use a single length-parameter as expected (the caller can adjust the source-pointer appropriately), I would expect the first to be inclusive.
- It looks more like a
equstr()is also an acquired taste.- It is not quite a string-function as it doesn't terminate the target.
- It really threw me for a loop that it modifies one argument, instead of comparing them, with the semantics of
!strcmp(a, b).
clear_str()- nulling out all of a strings elements is rarely needed. Generally, just truncating it by setting the first element to zero is sufficient.
is_string()Are you sure anything starting with and ending with
"is a string according to your grammar?- What about embedded
"? - What about the length one string
"\""? - What about escape-sequences?
- What about embedded
is_sint()- Ramp up the warning level. Your compiler should warn about the path without
return-statement. - Are you sure only the first character should be tested?
- Are you sure a signed integer must not be negative?
When fixing that, don't screw up the empty string.
- Ramp up the warning level. Your compiler should warn about the path without
lang_arg_lens()- Where is the dynamic memory allocated freed again?
MAX_STRING_LENGTHseems wasteful.
is_string
int is_string(char *str){ return str[0] == '"' && str[strlen(str) - 1] == '"';}Instead of returningTRUE orFALSE (as mentioned in another answer, usetrue andfalse), simply return the expression as it resolves to a boolean value anyway. It makes your code shorter and easier to scan through.
is_sint
int is_sint(char *str){ return str[0] >= '0' && str[0] <= '9';}Since thewhile loop doesn't iterate a second time, andi is never incremented, so we only need to evaluate the expression once, and not in the loop. Uses the same logic as above.
Include<stdbool.h>:
#define TRUE (1 == 1)#define FALSE (1 == 0)No comments. Never thought I'd find things like these.
In C99 and above,#include <stdbool.h> forbool,true, andfalse. In C23 and above,<stdbool.h> is deprecated andbool,true, andfalse are keywords.
Reserved identifier:
char *strr(char *str, char *output, int x, int y)Names beginning withstr* are reserved for the future by<string.h>. So technically, your code invokes undefined behavior.
And what even doesstrr mean? Reverse? It is not clear.
Use standard library functions:
void clear_str(char *str){ int i = 0; while (str[i]) { str[i] = '\0'; i++; }}This is simplymemset(str, '\0', strlen(str));
Avoid re-inventing the wheel:
char *equstr(char *str, char *output){ int i = 0; while (str[i]) { output[i] = str[i]; i++; } return output;}This one seems to be a simple copy.memcpy()?strcpy()?memccpy()?strncpy()?strlcpy()?strscpy()?stpcpy()?stpncpy()?C does not need a new copy function.
int is_string(char *str){ if (str[0] == '"' & str[strlen(str) - 1] == '"') { return TRUE; } else { return FALSE; }}This can be simplified to:
int is_string(char *str){ return str[0] == '"' & str[strlen(str) - 1] == '"';}Though the return type should bebool, and the argument should beconst-qualified. It would be better to make astruct str containing achar *s and asize_t len. That would make all these operations O(1); no need to iterate through the whole string looking for the null-terminator.
if (str[i] >= '0' & str[i] <= '9')This isisdigit((unsigned char)str[i]) from `<ctype.h>
Missingreturn statement:
char *lang_arg_lens(char *s){ char *output = malloc(MAX_STRING_LENGTH); if (is_string(s)) { itoa(strlen(s) - 2, output, 10); return output; }}§6.9.1 ¶12 of N1256 (C99) states:
If the
}that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.
How did the compiler not complain about this? I believe you did not enable warnings.
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.


