| C Programming String manipulation | Further math |
A string in C is merely an array of characters. The length of a string is determined by a terminating null character: '\0'. So, a string with the contents, say, "abc" has four characters: 'a', 'b', 'c', and the terminating null ('\0') character.
The terminating null character has the value zero.
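The difference between the length of a string and the storage it occupies can be sketched in a few lines. The helper name `storage_size` is hypothetical and used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes a string occupies in memory,
   counting the terminating null character that strlen() leaves out. */
size_t storage_size(const char *s)
{
    return strlen(s) + 1;
}
```

For "abc", strlen() reports 3 while storage_size() reports 4, which matches what sizeof gives for the literal.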
In C, string constants (literals) are surrounded by double quotes ("), e.g. "Hello world!", and are compiled to an array of the specified char values with an additional null terminating character (0-valued) code to mark the end of the string. The type of a string constant is char [].
String literals may not directly contain embedded newlines, other control characters, or certain characters that have a special meaning inside a string (such as the double quote and the backslash).
To include such characters in a string, the backslash escapes may be used, like this:
| Escape | Meaning |
|---|---|
| \\ | Literal backslash |
| \" | Double quote |
| \' | Single quote |
| \n | Newline (line feed) |
| \r | Carriage return |
| \b | Backspace |
| \t | Horizontal tab |
| \f | Form feed |
| \a | Alert (bell) |
| \v | Vertical tab |
| \? | Question mark (used to escape trigraphs) |
| \nnn | Character with octal value nnn |
| \xhh | Character with hexadecimal value hh |
C supports wide character strings, defined as arrays of the type wchar_t, whose width is implementation-defined (commonly 16 or 32 bits). They are written with an L before the string, like this: L"Hello world!".
This feature allows strings where more than 256 different possible characters are needed (although variable-length char strings can also be used). They end with a zero-valued wchar_t. These strings are not supported by the <string.h> functions. Instead they have their own functions, declared in <wchar.h>.
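As a small illustration of the <wchar.h> counterparts, the sketch below measures a wide string with wcslen(), the wide analogue of strlen(). The helper name `wide_length` is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: length of a wide string in wchar_t units,
   not counting the terminating zero-valued wchar_t. */
size_t wide_length(const wchar_t *ws)
{
    return wcslen(ws);
}
```

The result counts wchar_t elements, not bytes; the byte size of the array is that count plus one, multiplied by sizeof(wchar_t).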
Which character encoding char and wchar_t represent is not specified by the C standard, except that the values 0x00 and 0x0000 mark the end of the string and are not characters. The encoding directly affects input and output code; other code should be largely unaffected. The editor must also be able to handle the encoding if such strings are to be written directly in the source code.
There are three major types of encodings:

The <string.h> standard header

Because programmers find raw strings cumbersome to deal with, they wrote the code in the <string.h> library. It represents not a concerted design effort but rather the accretion of contributions made by various authors over a span of years.
First, three types of functions exist in the string library:
- mem functions manipulate sequences of arbitrary characters without regard to the null character;
- str functions manipulate null-terminated sequences of characters;
- strn functions manipulate sequences of non-null characters.

The nine most commonly used functions in the string library are:

- strcat - concatenate two strings
- strchr - string scanning operation
- strcmp - compare two strings
- strcpy - copy a string
- strlen - get string length
- strncat - concatenate one string with part of another
- strncmp - compare parts of two strings
- strncpy - copy part of a string
- strrchr - string scanning operation

Other functions, such as strlwr (convert to lower case), strrev (return the string reversed), and strupr (convert to upper case) may be popular; however, they are specified by neither the C Standard nor the Single Unix Standard. It is also unspecified whether these functions return copies of the original strings or convert the strings in place.
strcat function

    char *strcat(char *restrict s1, const char *restrict s2);

Some people recommend using strncat() or strlcat() instead of strcat, in order to avoid buffer overflow.

The strcat() function shall append a copy of the string pointed to by s2 (including the terminating null byte) to the end of the string pointed to by s1. The initial byte of s2 overwrites the null byte at the end of s1. If copying takes place between objects that overlap, the behavior is undefined. The function returns s1.
This function is used to attach one string to the end of another string. It is imperative that the first string (s1) have the space needed to store both strings.
Example:

    #include <stdio.h>
    #include <string.h>
    /* ... */
    static const char *colors[] = {"Red", "Orange", "Yellow", "Green", "Blue", "Purple"};
    static const char *widths[] = {"Thin", "Medium", "Thick", "Bold"};
    /* ... */
    char penText[20];
    /* ... */
    int penColor = 3, penThickness = 2;
    strcpy(penText, colors[penColor]);
    strcat(penText, widths[penThickness]);
    printf("My pen is %s\n", penText); /* prints 'My pen is GreenThick' */

Before calling strcat(), the destination must currently contain a null-terminated string or the first character must have been initialized with the null character (e.g. penText[0] = '\0';).
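Both requirements on the destination (it already holds a string, and it has room for the result) can be checked before calling strcat(). The following is a minimal sketch; `append_checked` and its `dst_size` parameter are hypothetical names, not part of <string.h>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to the string already stored in dst,
   where dst_size is the total capacity of the dst array in bytes.
   Returns 1 on success, or 0 (leaving dst unchanged) if the result
   would not fit. dst must already contain a null-terminated string. */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);      /* characters already in dst */
    size_t need = strlen(src) + 1;  /* bytes to write, including the null */

    if (used >= dst_size || need > dst_size - used)
        return 0;
    strcat(dst, src);
    return 1;
}
```

A caller would typically start with an empty destination (buf[0] = '\0';) and then append pieces one at a time, checking each return value.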
The following is a public-domain implementation of strcat:

    #include <string.h>
    /* strcat */
    char *(strcat)(char *restrict s1, const char *restrict s2)
    {
        char *s = s1;
        /* Move s so that it points to the end of s1. */
        while (*s != '\0')
            s++;
        /* Copy the contents of s2 into the space at the end of s1. */
        strcpy(s, s2);
        return s1;
    }

strchr function

    char *strchr(const char *s, int c);

The strchr() function shall locate the first occurrence of c (converted to a char) in the string pointed to by s. The terminating null byte is considered to be part of the string. The function returns the location of the found character, or a null pointer if the character was not found.
This function is used to find certain characters in strings.
At one point in history, this function was named index. The strchr name, however cryptic, fits the general pattern for naming.
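Because strchr() returns a pointer into the string, the search can be resumed just past each match. The sketch below uses that to count occurrences; `count_char` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many times c occurs in s.
   The terminating null is deliberately not counted, so asking
   for '\0' yields 0. */
size_t count_char(const char *s, int c)
{
    size_t count = 0;

    if ((char)c == '\0')
        return 0;
    while ((s = strchr(s, c)) != NULL) {
        count++;
        s++;    /* resume the search just past this match */
    }
    return count;
}
```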
The following is a public-domain implementation of strchr:

    #include <string.h>
    /* strchr */
    char *(strchr)(const char *s, int c)
    {
        char ch = c;
        /* Scan s for the character.  When this loop is finished,
           s will either point to the end of the string or the
           character we were looking for. */
        while (*s != '\0' && *s != ch)
            s++;
        return (*s == ch) ? (char *) s : NULL;
    }

strcmp function

    int strcmp(const char *s1, const char *s2);

A rudimentary form of string comparison is done with the strcmp() function. It takes two strings as arguments and returns a value less than zero if the first is lexicographically less than the second, a value greater than zero if the first is lexicographically greater than the second, or zero if the two strings are equal. The comparison is done by comparing the coded values of the characters (their ASCII codes, on most systems), character by character.
This simple type of string comparison is nowadays generally considered unacceptable when sorting lists of strings. More advanced algorithms exist that are capable of producing lists in dictionary sorted order. They can also fix problems such as strcmp() considering the string "Alpha2" greater than "Alpha12". (In that example, "Alpha2" compares greater than "Alpha12" because '2' comes after '1' in the character set.) In short, don't use strcmp() alone for general string sorting in any commercial or professional code.
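One way to address the "Alpha2" versus "Alpha12" problem is to compare runs of digits by numeric value. The following is a minimal sketch of that idea, not a full dictionary-order algorithm: `natural_compare` is a hypothetical name, it ignores locale, and it treats numbers that differ only in leading zeros as equal.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: like strcmp(), but runs of decimal digits are
   ordered by numeric value. Returns a negative, zero, or positive int. */
int natural_compare(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            size_t la, lb;
            int r;

            /* Leading zeros do not change the numeric value. */
            while (*a == '0') a++;
            while (*b == '0') b++;
            la = strspn(a, "0123456789");
            lb = strspn(b, "0123456789");
            if (la != lb)                 /* more digits: larger number */
                return (la < lb) ? -1 : 1;
            r = strncmp(a, b, la);        /* same width: compare digits */
            if (r != 0)
                return (r < 0) ? -1 : 1;
            a += la;
            b += lb;
        } else {
            unsigned char ca = (unsigned char)*a;
            unsigned char cb = (unsigned char)*b;

            if (ca != cb)
                return (ca < cb) ? -1 : 1;
            a++;
            b++;
        }
    }
    /* Whichever string ended first sorts first. */
    return (*a != '\0') - (*b != '\0');
}
```

With this sketch, "Alpha2" sorts before "Alpha12", whereas strcmp() puts it after.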
The strcmp() function shall compare the string pointed to by s1 to the string pointed to by s2. The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared. Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.
Since comparing pointers by themselves is not practically useful unless one is comparing pointers within the same array, this function lexically compares the strings that two pointers point to.
This function is useful in comparisons, e.g.

    if (strcmp(s, "whatever") == 0) /* do something */
        ;

The collating sequence used by strcmp() is equivalent to the machine's native character set. The only guarantee about the order is that the digits from '0' to '9' are in consecutive order.

The following is a public-domain implementation of strcmp:

    #include <string.h>
    /* strcmp */
    int (strcmp)(const char *s1, const char *s2)
    {
        unsigned char uc1, uc2;
        /* Move s1 and s2 to the first differing characters
           in each string, or the ends of the strings if they
           are identical. */
        while (*s1 != '\0' && *s1 == *s2) {
            s1++;
            s2++;
        }
        /* Compare the characters as unsigned char and
           return the difference. */
        uc1 = (*(unsigned char *) s1);
        uc2 = (*(unsigned char *) s2);
        return ((uc1 < uc2) ? -1 : (uc1 > uc2));
    }

strcpy function

    char *strcpy(char *restrict s1, const char *restrict s2);

Some people recommend always using strncpy() instead of strcpy, to avoid buffer overflow.

The strcpy() function shall copy the C string pointed to by s2 (including the terminating null byte) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. The function returns s1. There is no value used to indicate an error: if the arguments to strcpy() are correct, and the destination buffer is large enough, the function will never fail.
Example:

    #include <stdio.h>
    #include <string.h>
    /* ... */
    static const char *penType = "round";
    /* ... */
    char penText[20];
    /* ... */
    strcpy(penText, penType);
Important: You must ensure that the destination buffer (s1) is able to contain all the characters in the source array, including the terminating null byte. Otherwise,strcpy() will overwrite memory past the end of the buffer, causing a buffer overflow, which can cause the program to crash, or can be exploited by hackers to compromise the security of the computer.
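The size requirement can be enforced with a length check before the copy. The following is a sketch under the assumption that the caller knows the capacity of the destination; `copy_checked` and `dst_size` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if the whole string,
   terminating null included, fits in dst_size bytes.
   Returns 1 on success, 0 if nothing was copied. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (strlen(src) >= dst_size)
        return 0;           /* would overflow: refuse to copy */
    strcpy(dst, src);
    return 1;
}
```

Refusing the copy, rather than truncating, leaves the decision about what to do with an oversized string to the caller.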
The following is a public-domain implementation of strcpy:

    #include <string.h>
    /* strcpy */
    char *(strcpy)(char *restrict s1, const char *restrict s2)
    {
        char *dst = s1;
        const char *src = s2;
        /* Do the copying in a loop. */
        while ((*dst++ = *src++) != '\0')
            ; /* The body of this loop is left empty. */
        /* Return the destination string. */
        return s1;
    }

strlen function

    size_t strlen(const char *s);

The strlen() function shall compute the number of bytes in the string to which s points, not including the terminating null byte. It returns the number of bytes in the string. No value is used to indicate an error.

The following is a public-domain implementation of strlen:

    #include <string.h>
    /* strlen */
    size_t (strlen)(const char *s)
    {
        const char *p = s; /* pointer to character constant */
        /* Loop over the data in s. */
        while (*p != '\0')
            p++;
        return (size_t)(p - s);
    }
Note how the line

    const char *p = s;

declares and initializes a pointer p to a constant character, i.e. p cannot be used to change the value it points to.
strncat function

    char *strncat(char *restrict s1, const char *restrict s2, size_t n);

The strncat() function shall append not more than n bytes (a null byte and bytes that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1. The initial byte of s2 overwrites the null byte at the end of s1. A terminating null byte is always appended to the result. If copying takes place between objects that overlap, the behavior is undefined. The function returns s1.
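Since strncat() always writes a terminating null after the bytes it appends, the count passed as n has to leave one byte of the destination free. A common pattern, sketched here with the hypothetical names `append_truncating` and `dst_size`, is to pass the remaining capacity minus one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append as much of src as fits into dst, whose
   total capacity is dst_size bytes. dst must already contain a string
   shorter than dst_size. Excess characters of src are dropped.
   Returns dst. */
char *append_truncating(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);

    if (used + 1 < dst_size)    /* room for at least one more character */
        strncat(dst, src, dst_size - used - 1);
    return dst;
}
```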
The following is a public-domain implementation of strncat:

    #include <string.h>
    /* strncat */
    char *(strncat)(char *restrict s1, const char *restrict s2, size_t n)
    {
        char *s = s1;
        /* Loop over the data in s1. */
        while (*s != '\0')
            s++;
        /* s now points to s1's trailing null character, now copy
           up to n bytes from s2 into s stopping if a null character
           is encountered in s2.
           It is not safe to use strncpy here since it copies EXACTLY n
           characters, NULL padding if necessary. */
        while (n != 0 && (*s = *s2++) != '\0') {
            n--;
            s++;
        }
        if (*s != '\0')
            *s = '\0';
        return s1;
    }

strncmp function

    int strncmp(const char *s1, const char *s2, size_t n);

The strncmp() function shall compare not more than n bytes (bytes that follow a null byte are not compared) from the array pointed to by s1 to the array pointed to by s2. The sign of a non-zero return value is determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared. See strcmp for an explanation of the return value.

This function is useful in comparisons, as the strcmp function is.
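One typical comparison is a prefix test: limiting the comparison to the length of the prefix makes strncmp() ignore whatever follows it. The helper name `starts_with` below is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: return 1 if s begins with prefix, else 0.
   An empty prefix matches every string. */
int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```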
The following is a public-domain implementation of strncmp:

    #include <string.h>
    /* strncmp */
    int (strncmp)(const char *s1, const char *s2, size_t n)
    {
        unsigned char uc1, uc2;
        /* Nothing to compare?  Return zero. */
        if (n == 0)
            return 0;
        /* Loop, comparing bytes. */
        while (n-- > 0 && *s1 == *s2) {
            /* If we've run out of bytes or hit a null, return zero
               since we already know *s1 == *s2. */
            if (n == 0 || *s1 == '\0')
                return 0;
            s1++;
            s2++;
        }
        uc1 = (*(unsigned char *) s1);
        uc2 = (*(unsigned char *) s2);
        return ((uc1 < uc2) ? -1 : (uc1 > uc2));
    }

strncpy function

    char *strncpy(char *restrict s1, const char *restrict s2, size_t n);

The strncpy() function shall copy not more than n bytes (bytes that follow a null byte are not copied) from the array pointed to by s2 to the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. If the array pointed to by s2 is a string that is shorter than n bytes, null bytes shall be appended to the copy in the array pointed to by s1, until n bytes in all are written. The function shall return s1; no return value is reserved to indicate an error.

It is possible that the function will not produce a null-terminated string, which happens if the s2 string is n bytes or longer.
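A common way to guard against the missing terminator is to copy at most one byte less than the capacity and then write the null byte explicitly. The following is a sketch of that pattern; `copy_truncating` and `dst_size` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, truncating if necessary, so
   that dst always ends up null-terminated. dst_size is the capacity of
   dst in bytes. Returns dst. */
char *copy_truncating(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return dst;                  /* no room even for the null */
    strncpy(dst, src, dst_size - 1); /* may leave dst unterminated... */
    dst[dst_size - 1] = '\0';        /* ...so terminate it explicitly */
    return dst;
}
```

Note that strncpy() still pads the rest of the destination with null bytes when the source is short, which can matter for large buffers.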
The following is a public-domain version of strncpy:

    #include <string.h>
    /* strncpy */
    char *(strncpy)(char *restrict s1, const char *restrict s2, size_t n)
    {
        char *dst = s1;
        const char *src = s2;
        /* Copy bytes, one at a time. */
        while (n > 0) {
            n--;
            if ((*dst++ = *src++) == '\0') {
                /* If we get here, we found a null character at the end
                   of s2, so use memset to put null bytes at the end of
                   s1. */
                memset(dst, '\0', n);
                break;
            }
        }
        return s1;
    }

strrchr function

    char *strrchr(const char *s, int c);

The strrchr function is similar to the strchr function, except that strrchr returns a pointer to the last occurrence of c within s instead of the first.

The strrchr() function shall locate the last occurrence of c (converted to a char) in the string pointed to by s. The terminating null byte is considered to be part of the string. Its return value is similar to strchr's return value.

At one point in history, this function was named rindex. The strrchr name, however cryptic, fits the general pattern for naming.

The following is a public-domain implementation of strrchr:

    #include <string.h>
    /* strrchr */
    char *(strrchr)(const char *s, int c)
    {
        const char *last = NULL;
        /* If the character we're looking for is the terminating null,
           we just need to look for that character as there's only one
           of them in the string. */
        if (c == '\0')
            return strchr(s, c);
        /* Loop through, finding the last match before hitting NULL. */
        while ((s = strchr(s, c)) != NULL) {
            last = s;
            s++;
        }
        return (char *) last;
    }
The less-used functions are:
- memchr - Find a byte in memory
- memcmp - Compare bytes in memory
- memcpy - Copy bytes in memory
- memmove - Copy bytes in memory with overlapping areas
- memset - Set bytes in memory
- strcoll - Compare bytes according to a locale-specific collating sequence
- strcspn - Get the length of a complementary substring
- strerror - Get error message
- strpbrk - Scan a string for a byte
- strspn - Get the length of a substring
- strstr - Find a substring
- strtok - Split a string into tokens
- strxfrm - Transform string

memcpy function

    void *memcpy(void *restrict s1, const void *restrict s2, size_t n);

The memcpy() function shall copy n bytes from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. The function returns s1.
Because the function does not have to worry about overlap, it can do the simplest copy it can.
The following is a public-domain implementation of memcpy:

    #include <string.h>
    /* memcpy */
    void *(memcpy)(void *restrict s1, const void *restrict s2, size_t n)
    {
        char *dst = s1;
        const char *src = s2;
        /* Loop and copy. */
        while (n-- != 0)
            *dst++ = *src++;
        return s1;
    }

memmove function

    void *memmove(void *s1, const void *s2, size_t n);

The memmove() function shall copy n bytes from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n bytes from the object pointed to by s2 are first copied into a temporary array of n bytes that does not overlap the objects pointed to by s1 and s2, and then the n bytes from the temporary array are copied into the object pointed to by s1. The function returns the value of s1.
The easy way to implement this without using a temporary array is to check for a condition that would prevent an ascending copy, and if found, do a descending copy.
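Overlapping copies arise naturally when text is shifted within a single buffer. The sketch below shows one shift in each direction; `drop_prefix` and `open_gap` are hypothetical helper names, and the caller is assumed to guarantee that the buffer is large enough for open_gap().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delete the first n characters of s in place by
   sliding the rest of the string, terminator included, to the front.
   Source and destination overlap, so memmove() is required. */
char *drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);

    if (n > len)
        n = len;
    memmove(s, s + n, len - n + 1);  /* +1 moves the terminating null */
    return s;
}

/* Hypothetical helper: slide the whole string n bytes to the right,
   leaving the first n bytes unchanged for the caller to overwrite.
   The buffer must have room for strlen(s) + n + 1 bytes. */
char *open_gap(char *s, size_t n)
{
    memmove(s + n, s, strlen(s) + 1);
    return s;
}
```

The second helper is the case in which an ascending byte-by-byte copy would overwrite source bytes before reading them.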
The following is a public-domain, though not completely portable, implementation of memmove:

    #include <string.h>
    /* memmove */
    void *(memmove)(void *s1, const void *s2, size_t n)
    {
        /* note: these don't have to point to unsigned chars */
        char *p1 = s1;
        const char *p2 = s2;
        /* test for overlap that prevents an ascending copy */
        if (p2 < p1 && p1 < p2 + n) {
            /* do a descending copy */
            p2 += n;
            p1 += n;
            while (n-- != 0)
                *--p1 = *--p2;
        } else
            while (n-- != 0)
                *p1++ = *p2++;
        return s1;
    }

memcmp function

    int memcmp(const void *s1, const void *s2, size_t n);

The memcmp() function shall compare the first n bytes (each interpreted as unsigned char) of the object pointed to by s1 to the first n bytes of the object pointed to by s2. The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the objects being compared.

The following is a public-domain implementation of memcmp:

    #include <string.h>
    /* memcmp */
    int (memcmp)(const void *s1, const void *s2, size_t n)
    {
        const unsigned char *us1 = (const unsigned char *) s1;
        const unsigned char *us2 = (const unsigned char *) s2;
        while (n-- != 0) {
            if (*us1 != *us2)
                return (*us1 < *us2) ? -1 : +1;
            us1++;
            us2++;
        }
        return 0;
    }

strcoll and strxfrm functions

    int strcoll(const char *s1, const char *s2);
    size_t strxfrm(char *s1, const char *s2, size_t n);
The ANSI C Standard specifies two locale-specific comparison functions.
The strcoll function compares the string pointed to by s1 to the string pointed to by s2, both interpreted as appropriate to the LC_COLLATE category of the current locale. The return value is similar to strcmp.
The strxfrm function transforms the string pointed to by s2 and places the resulting string into the array pointed to by s1. The transformation is such that if the strcmp function is applied to the two transformed strings, it returns a value greater than, equal to, or less than zero, corresponding to the result of the strcoll function applied to the same two original strings. No more than n characters are placed into the resulting array pointed to by s1, including the terminating null character. If n is zero, s1 is permitted to be a null pointer. If copying takes place between objects that overlap, the behavior is undefined. The function returns the length of the transformed string.
These functions are rarely used and nontrivial to code, so there is no code for this section.
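Although no implementation is given, calling strxfrm() can still be illustrated. The sketch below uses the documented case of n equal to zero with a null s1 to ask for the required size first, then builds the transformed key. The helper name `make_sort_key` is hypothetical, and the result depends on the current locale.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated sort key for s under the
   current LC_COLLATE locale, or NULL if allocation fails. Keys can be
   compared with strcmp(); the caller must free() the result. */
char *make_sort_key(const char *s)
{
    /* With n == 0, strxfrm() writes nothing and returns the length
       the transformed string would have. */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *key = malloc(need);

    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}
```

Precomputing keys like this can pay off when the same strings are compared many times, for example during a sort.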
memchr function

    void *memchr(const void *s, int c, size_t n);

The memchr() function shall locate the first occurrence of c (converted to an unsigned char) in the initial n bytes (each interpreted as unsigned char) of the object pointed to by s. If c is not found, memchr returns a null pointer.
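Unlike strchr(), memchr() does not stop at a null byte, so it can search data that contains embedded zeros. The following sketch turns the returned pointer into an offset; `find_byte` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte equal to c within the
   first n bytes of buf, or -1 if there is none. */
long find_byte(const void *buf, int c, size_t n)
{
    const unsigned char *start = buf;
    const unsigned char *hit = memchr(buf, c, n);

    return (hit != NULL) ? (long)(hit - start) : -1L;
}
```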
The following is a public-domain implementation of memchr:

    #include <string.h>
    /* memchr */
    void *(memchr)(const void *s, int c, size_t n)
    {
        const unsigned char *src = s;
        unsigned char uc = c;
        while (n-- != 0) {
            if (*src == uc)
                return (void *) src;
            src++;
        }
        return NULL;
    }

strcspn, strpbrk, and strspn functions

    size_t strcspn(const char *s1, const char *s2);
    char *strpbrk(const char *s1, const char *s2);
    size_t strspn(const char *s1, const char *s2);

The strcspn function computes the length of the maximum initial segment of the string pointed to by s1 which consists entirely of characters not from the string pointed to by s2.
The strpbrk function locates the first occurrence in the string pointed to by s1 of any character from the string pointed to by s2, returning a pointer to that character or a null pointer if not found.
The strspn function computes the length of the maximum initial segment of the string pointed to by s1 which consists entirely of characters from the string pointed to by s2.
All of these functions are similar except in the test and the return value.
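Used together, strspn() and strcspn() can pick a word out of a line: the first skips a run of separators, the second measures the run of non-separators that follows. The helper name `first_word` and the choice of blank characters are assumptions made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first word in s, where words are
   separated by spaces, tabs, or newlines. Stores a pointer to the start
   of the word in *word and returns the word's length (0 if none). */
size_t first_word(const char *s, const char **word)
{
    const char *blanks = " \t\n";

    s += strspn(s, blanks);      /* skip leading blanks */
    *word = s;
    return strcspn(s, blanks);   /* length up to the next blank or end */
}
```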
The following are public-domain implementations of strcspn, strpbrk, and strspn:

    #include <string.h>
    /* strcspn */
    size_t (strcspn)(const char *s1, const char *s2)
    {
        const char *sc1;
        for (sc1 = s1; *sc1 != '\0'; sc1++)
            if (strchr(s2, *sc1) != NULL)
                return (sc1 - s1);
        return sc1 - s1;            /* terminating nulls match */
    }

    #include <string.h>
    /* strpbrk */
    char *(strpbrk)(const char *s1, const char *s2)
    {
        const char *sc1;
        for (sc1 = s1; *sc1 != '\0'; sc1++)
            if (strchr(s2, *sc1) != NULL)
                return (char *) sc1;
        return NULL;                /* no character from s2 was found */
    }

    #include <string.h>
    /* strspn */
    size_t (strspn)(const char *s1, const char *s2)
    {
        const char *sc1;
        for (sc1 = s1; *sc1 != '\0'; sc1++)
            if (strchr(s2, *sc1) == NULL)
                return (sc1 - s1);
        return sc1 - s1;            /* terminating nulls don't match */
    }

strstr function

    char *strstr(const char *haystack, const char *needle);

The strstr() function shall locate the first occurrence in the string pointed to by haystack of the sequence of bytes (excluding the terminating null byte) in the string pointed to by needle. The function returns the pointer to the matching string in haystack or a null pointer if a match is not found. If needle is an empty string, the function returns haystack.
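As with strchr(), the pointer returned by strstr() can be used to continue the search. The sketch below counts non-overlapping occurrences of a substring; `count_substring` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of needle in
   haystack. An empty needle is defined here to occur zero times. */
size_t count_substring(const char *haystack, const char *needle)
{
    size_t count = 0;
    size_t step = strlen(needle);

    if (step == 0)
        return 0;
    while ((haystack = strstr(haystack, needle)) != NULL) {
        count++;
        haystack += step;   /* continue after the end of this match */
    }
    return count;
}
```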
The following is a public-domain implementation of strstr:

    #include <string.h>
    /* strstr */
    char *(strstr)(const char *haystack, const char *needle)
    {
        size_t needlelen;
        /* Check for the null needle case. */
        if (*needle == '\0')
            return (char *) haystack;
        needlelen = strlen(needle);
        /* strncmp is used rather than memcmp so that the comparison
           never reads past the terminating null of haystack. */
        for (; (haystack = strchr(haystack, *needle)) != NULL; haystack++)
            if (strncmp(haystack, needle, needlelen) == 0)
                return (char *) haystack;
        return NULL;
    }

strtok function

    char *strtok(char *restrict s1, const char *restrict delimiters);

A sequence of calls to strtok() breaks the string pointed to by s1 into a sequence of tokens, each of which is delimited by a byte from the string pointed to by delimiters. The first call in the sequence has s1 as its first argument, and is followed by calls with a null pointer as their first argument. The separator string pointed to by delimiters may be different from call to call.

The first call in the sequence searches the string pointed to by s1 for the first byte that is not contained in the current separator string pointed to by delimiters. If no such byte is found, then there are no tokens in the string pointed to by s1 and strtok() shall return a null pointer. If such a byte is found, it is the start of the first token.

The strtok() function then searches from there for a byte (or multiple, consecutive bytes) that is contained in the current separator string. If no such byte is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token shall return a null pointer. If such a byte is found, it is overwritten by a null byte, which terminates the current token. The strtok() function saves a pointer to the following byte, from which the next search for a token shall start.

Each subsequent call, with a null pointer as the value of the first argument, starts searching from the saved pointer and behaves as described above.

The strtok() function need not be reentrant. A function that is not required to be reentrant is not required to be thread-safe.
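The calling sequence described above (one call with the string, then repeated calls with a null pointer) can be sketched as a loop. The helper name `split_tokens` and its parameters are hypothetical; the input must be a modifiable array, because strtok() overwrites delimiters with null bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place into at most max tokens,
   storing a pointer to each token in tokens[]. Returns the number of
   tokens stored. Not reentrant, because strtok() keeps hidden state. */
size_t split_tokens(char *line, const char *delimiters,
                    char *tokens[], size_t max)
{
    size_t count = 0;
    char *tok = strtok(line, delimiters);   /* first call: pass the string */

    while (tok != NULL && count < max) {
        tokens[count++] = tok;
        tok = strtok(NULL, delimiters);     /* later calls: pass NULL */
    }
    return count;
}
```

Consecutive delimiters are treated as a single separator, so no empty tokens are produced.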
Because the strtok() function must save state between calls, and you could not have two tokenizers going at the same time, the Single Unix Standard defined a similar function, strtok_r(), that does not need to save state internally; the caller supplies the storage instead. Its prototype is this:

    char *strtok_r(char *s, const char *delimiters, char **lasts);

The strtok_r() function considers the null-terminated string s as a sequence of zero or more text tokens separated by spans of one or more characters from the separator string delimiters. The argument lasts points to a user-provided pointer which points to stored information necessary for strtok_r() to continue scanning the same string.

In the first call to strtok_r(), s points to a null-terminated string, delimiters to a null-terminated string of separator characters, and the value pointed to by lasts is ignored. The strtok_r() function shall return a pointer to the first character of the first token, write a null character into s immediately following the returned token, and update the pointer to which lasts points.

In subsequent calls, s is a null pointer and lasts shall be unchanged from the previous call so that subsequent calls shall move through the string s, returning successive tokens until no tokens remain. The separator string delimiters may be different from call to call. When no token remains in s, a NULL pointer shall be returned.

The following public-domain code for strtok and strtok_r codes the former as a special case of the latter:

    #include <string.h>
    /* strtok_r */
    char *(strtok_r)(char *s, const char *delimiters, char **lasts)
    {
        char *sbegin, *send;
        sbegin = s ? s : *lasts;
        sbegin += strspn(sbegin, delimiters);
        if (*sbegin == '\0') {
            *lasts = "";
            return NULL;
        }
        send = sbegin + strcspn(sbegin, delimiters);
        if (*send != '\0')
            *send++ = '\0';
        *lasts = send;
        return sbegin;
    }
    /* strtok */
    char *(strtok)(char *restrict s1, const char *restrict delimiters)
    {
        static char *ssave = "";
        return strtok_r(s1, delimiters, &ssave);
    }
These functions do not fit into one of the above categories.
memset function

    void *memset(void *s, int c, size_t n);

The memset() function converts c into unsigned char, then stores the character into the first n bytes of memory pointed to by s.

The following is a public-domain implementation of memset:

    #include <string.h>
    /* memset */
    void *(memset)(void *s, int c, size_t n)
    {
        unsigned char *us = s;
        unsigned char uc = c;
        while (n-- != 0)
            *us++ = uc;
        return s;
    }

strerror function

    char *strerror(int errorcode);
This function returns a locale-specific error message corresponding to the parameter. Depending on the circumstances, this function could be trivial to implement, but this author will not do that as it varies.
The Single UNIX Specification, Version 3 has a variant, strerror_r, with this prototype:

    int strerror_r(int errcode, char *buf, size_t buflen);

This function stores the message in buf, which is buflen bytes long.

To determine the number of characters in a string, the strlen() function is used:

    #include <stdio.h>
    #include <stdlib.h> /* malloc, free */
    #include <string.h>
    /* ... */
    size_t length, length2; /* strlen() returns size_t */
    char *turkey;
    static char *flower = "begonia";
    static char *gemstone = "ruby ";

    length = strlen(flower);
    printf("Length = %zu\n", length); // prints 'Length = 7'
    length2 = strlen(gemstone);

    turkey = malloc(length + length2 + 1);
    if (turkey) {
        strcpy(turkey, gemstone);
        strcat(turkey, flower);
        printf("%s\n", turkey); // prints 'ruby begonia'
        free(turkey);
    }
Note that the amount of memory allocated for 'turkey' is one plus the sum of the lengths of the strings to be concatenated. This is for the terminating null character, which is not counted in the lengths of the strings.
Because many functions in the standard string.h library are vulnerable to buffer overflow errors, some people recommend avoiding the string.h library and "C style strings" and instead using a dynamic string API, such as the ones listed in the String library comparison.