The data type `string(…)` is used to store a finite sequence of `char` values. It is a special case of an `array`, but unlike an `array[…] of char`, the data type `string(…)` has some advantages facilitating its effective usage.
The data type `string(…)` as presented here is an Extended Pascal extension, as defined in the ISO standard 10206. Due to its high relevance in practice, this topic has been put into the Standard Pascal part of this Wikibook, right after the chapter on arrays.
Many compilers have a different conception of what constitutes a `string`. Consult their manual for their idiosyncratic differences. Rest assured, the GPC supports `string(…)` as explained here.
The declaration of a `string` data type always entails a maximum capacity:

```pascal
program stringDemo(output);
type
	address = string(60);
var
	houseAndStreet: address;
begin
	houseAndStreet := '742 Evergreen Trc.';
	writeLn('Send complaints to:');
	writeLn(houseAndStreet);
end.
```
The word `string` is followed by a positive integer number surrounded by parentheses. This is not a function call.[fn 1]

Variables of the data type `address` as defined above will only be able to store up to 60 independent `char` values. Of course it is possible to store fewer, or even 0, but once this limit is set it cannot be expanded.
String variables “know” about their own maximum capacity: If you use `writeLn(houseAndStreet.capacity)`, this will print `60`. Every `string` variable automatically has a “field” called `capacity`. This field is accessed by writing the respective `string` variable’s name and the word `capacity` joined by a dot (`.`). This field is read-only: You cannot assign values to it. It can only appear in expressions.
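As a small illustration, the following sketch prints the `capacity` of two variables. The program name `capacityDemo` and the variable `zip` are made up for this purpose, and an ISO 10206 compiler such as the GPC is assumed. Integers are written with `:1` so that no implementation-defined default field width adds leading blanks. Note how emptying `houseAndStreet` leaves its capacity untouched:

```pascal
program capacityDemo(output);
{ hypothetical demo: capacity is fixed by the type, not by the contents }
type
	address = string(60);
var
	houseAndStreet: address;
	zip: string(5);
begin
	houseAndStreet := '742 Evergreen Trc.';
	writeLn(houseAndStreet.capacity:1);
	{ assigning the empty string changes the length, not the capacity }
	houseAndStreet := '';
	writeLn(houseAndStreet.capacity:1);
	{ the capacity is known even before any value has been assigned }
	writeLn(zip.capacity:1);
	{ houseAndStreet.capacity := 80; would be rejected: read-only field }
end.
```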
All `string` variables have a current length. This is the total number of legit `char` values every `string` variable currently contains. To query this number, the EP standard defines a new function called `length`:

```pascal
program lengthDemo(output);
type
	domain = string(42);
var
	alphabet: domain;
begin
	alphabet := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
	writeLn(length(alphabet));
end.
```
The `length` function returns a non-negative `integer` value denoting the supplied string’s length. It also accepts `char` values.[fn 2] A `char` value has by definition a length of 1.

It is guaranteed that the `length` of a `string` variable will always be less than or equal to its corresponding `capacity`.
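The sketch below contrasts `length` with `capacity` and also passes a `char` value to `length`, as permitted above. The program name `lengthCapacityDemo` is hypothetical, and an ISO 10206 compiler such as the GPC is assumed:

```pascal
program lengthCapacityDemo(output);
{ hypothetical demo: length varies with the contents, capacity does not }
var
	s: string(10);
begin
	s := '';
	writeLn(length(s):1);        { nothing stored: length is 0 }
	writeLn(length('x'):1);      { a char value always has length 1 }
	s := 'Pascal';
	{ length(s) <= s.capacity holds at all times }
	writeLn(length(s):1, ' <= ', s.capacity:1);
end.
```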
You can copy entire string values using the `:=` operator, provided the variable on the LHS has the same or a greater `capacity` than the RHS string expression. This is different from a regular `array`’s behavior, which would require dimensions and size to match exactly.

```pascal
program stringAssignmentDemo;
type
	zipcode = string(5);
	stateCode = string(2);
var
	zip: zipcode;
	state: stateCode;
begin
	zip := '12345';
	state := 'QQ';
	
	// ↯ state := zip; ✘ here: length(zip) = 5 > state.capacity
	zip := state; // ✔ zip.capacity > state.capacity
end.
```
As long as no clipping occurs, i. e. the omission of values because of a too short capacity, the assignment is fine.
It is worth noting that otherwise strings are internally regarded as arrays.[fn 3] Like a character array, you can access (and alter) every array element independently by specifying a valid index surrounded by brackets. However, there is a big difference with respect to the validity of an index. At any time, you are only allowed to specify indices that are within the range `1..length`. This range may be empty, specifically if `length` is currently 0.

It is not possible to change the current length by manipulating individual string components:

```pascal
program stringAccessDemo;
type
	bar = string(8);
var
	foo: bar;
begin
	foo := 'AA';   { ✔ length ≔ 2 }
	foo[2] := 'B'; { ✔ }
	foo[3] := 'C'; { ↯: 3 > length }
end.
```
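Since assigning to an individual component cannot extend a string, one way to make index 3 valid is to grow the string first, for example by concatenation with the `+` operator covered further down. A minimal sketch, assuming an ISO 10206 compiler; the program name `growDemo` is hypothetical:

```pascal
program growDemo(output);
{ hypothetical demo: growing a string before writing to a new index }
type
	bar = string(8);
var
	foo: bar;
begin
	foo := 'AB';              { length is 2 }
	{ foo[3] := 'D'; here would be an error: 3 > length(foo) }
	foo := foo + 'C';         { concatenation extends the length to 3 }
	foo[3] := 'D';            { index 3 is valid now }
	writeLn(foo, ' ', length(foo):1);
end.
```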
In addition to the `length` function, EP also defines a few other standard functions operating on strings.
The following functions return strings.
In order to obtain just a part of a `string` (or `char`) expression, the function `subStr(stringOrCharacter, firstCharacter, count)` returns a sub-string of `stringOrCharacter` having the non-negative length `count`, starting at the positive index `firstCharacter`. It is important that `firstCharacter + count - 1` is a valid character index in `stringOrCharacter`, otherwise the function causes an error.[fn 4]

```pascal
program substringDemo(output);
begin
	writeLn(subStr('GCUACGGAGCUUCGGAGUUAG', 7, 3));
	{ char index:  1  4  7 … }
end.
```

```
GAG
```
`firstCharacter` index. Here we wanted to extract the third codon. However, `firstCharacter` is not simply `2 * 3` but `2 * 3 + 1`, because indexing characters in a `string` variable starts at 1. Note, a sophisticated implementation for encoding codons would not make use of `string`, but define a custom enumeration data type. For `string` variables, the `subStr` function is the same as specifying `myString[firstCharacter..firstCharacter + count - 1]`.[fn 5] Evidently, if the `firstCharacter` value is some complicated expression, the `subStr` function should be preferred to prevent any programming mistakes.
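Building on the codon example, a loop can compute `firstCharacter` as `(n - 1) * 3 + 1` for the n-th codon and extract every codon in turn. This is only a sketch: the program name `codonDemo` and the constant `rna` are made up, and it assumes the sequence length is a multiple of three (leftover characters would simply be ignored by the `div`):

```pascal
program codonDemo(output);
{ hypothetical demo: splitting an RNA sequence into codons with subStr }
const
	rna = 'GCUACGGAGCUUCGGAGUUAG';
var
	n: integer;
begin
	{ the n-th codon (counting from 1) starts at character (n - 1) * 3 + 1 }
	for n := 1 to length(rna) div 3 do
	begin
		writeLn(subStr(rna, (n - 1) * 3 + 1, 3));
	end;
end.
```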
string.

```pascal
program substringOverwriteDemo(output);
var
	m: string(35);
begin
	m := 'supercalifragilisticexpialidocious ';
	m[21..35] := '-yadi-yada-yada';
	writeLn(m);
end.
```

```
supercalifragilistic-yadi-yada-yada
```
string. Furthermore, the third parameter to `subStr` can be omitted: This will simply return the rest of the given `string` starting at the position indicated by the second parameter.[fn 6]
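A short sketch, assuming an ISO 10206 compiler; the program name `restDemo` is made up. Both calls below should print the same text, because `'Hello, world'` has 12 characters and the rest starting at position 8 therefore consists of 5 characters:

```pascal
program restDemo(output);
{ hypothetical demo: omitting the count parameter of subStr }
var
	greeting: string(20);
begin
	greeting := 'Hello, world';
	{ everything from character 8 to the end }
	writeLn(subStr(greeting, 8));
	{ the same, with the count spelled out: 12 - 8 + 1 = 5 }
	writeLn(subStr(greeting, 8, 5));
end.
```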
The `trim(source)` function returns a copy of `source` without any trailing space characters, i. e. `' '`. In LTR scripts any blanks to the right are considered insignificant, yet in computing they take up (memory) space. It is advisable to prune strings before writing them, for example, to a disk or other long-term storage media, or before transmitting them via networks. Admittedly, memory requirements were a more relevant issue prior to the 21st century.

The function `index(source, pattern)` finds the first occurrence of `pattern` in `source` and returns the starting index. All characters from `pattern` match the characters in `source` at the returned offset:
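The sketch below assumes the ISO 10206 behavior described above, where only trailing blanks are removed; a compiler following a different convention may strip leading blanks as well, so check its documentation. The program name `trimDemo` is hypothetical, and the brackets merely make the remaining blanks visible:

```pascal
program trimDemo(output);
{ hypothetical demo: trim removes trailing blanks only }
var
	padded: string(20);
begin
	padded := '  Springfield   ';
	writeLn(length(padded):1);          { 2 + 11 + 3 = 16 }
	writeLn(length(trim(padded)):1);    { trailing blanks gone: 13 }
	writeLn('[', trim(padded), ']');    { leading blanks are kept }
end.
```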
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | match |
|---|---|---|---|---|---|---|---|---|
| `source` | Z | Y | X | Y | X | Y | X | |
| `pattern` at 1 | X | Y | X | | | | | ✘ |
| `pattern` at 2 | | X | Y | X | | | | ✘ |
| `pattern` at 3 | | | X | Y | X | | | ✔ |
Note, to obtain the second or any subsequent occurrence, you need to use a proper substring of the `source`.

Because the “empty string” is, mathematically speaking, present everywhere, `index(characterOrString, '')` always returns 1. Conversely, because no non-empty string can occur in an empty string, `index('', nonEmptyStringOrCharacter)` always returns 0, in the context of strings an otherwise invalid index. The value zero is returned if `pattern` does not occur in `source`. This will always be the case if `pattern` is longer than `source`.
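Following the advice to search a proper substring, this sketch lists every starting position of a pattern by repeatedly applying `index` to the part after the previous match start. It assumes an ISO 10206 compiler; the names `occurrencesDemo`, `haystack`, `offset` and `found` are hypothetical. Because the search resumes one character after each match start, overlapping occurrences are reported too:

```pascal
program occurrencesDemo(output);
{ hypothetical demo: finding all occurrences with index and subStr }
const
	pattern = 'XYX';
var
	haystack: string(20);
	{ number of leading characters of haystack already searched past }
	offset: integer;
	{ result of index on the remaining part; 0 means "not found" }
	found: integer;
begin
	haystack := 'ZYXYXYX';
	offset := 0;
	found := 1;
	{ offset < length(haystack) keeps subStr's start index valid }
	while (found > 0) and (offset < length(haystack)) do
	begin
		found := index(subStr(haystack, offset + 1), pattern);
		if found > 0 then
		begin
			writeLn(offset + found:1);
			offset := offset + found;
		end;
	end;
end.
```

To skip overlapping matches instead, `offset` could be advanced by `found + length(pattern) - 1`.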
The `+` operator concatenates strings and `char` values:

| expression | result |
|---|---|
| `'Foo' + 'bar'` | `'Foobar'` |
| `'' + ''` | `''` |
| `'9' + chr(ord('0') + 9) + ' Luftballons'` | `'99 Luftballons'` |
Concatenation is useful if you intend to save the data somewhere. Supplying concatenated strings to routines such as `write`/`writeLn`, however, may be disadvantageous: The concatenation, especially of long strings, first requires allocating enough memory to accommodate the entire resulting string. Then, all the operands are copied to their respective locations. This takes time. Hence, in the case of `write`/`writeLn` it is advisable (for very long strings) to use their capability of accepting an arbitrary number of (comma-separated) parameters.
Note, the common LOC

```pascal
stringVariable := 'xyz' + someStringOrCharacter + …;
```

is equivalent to

```pascal
writeStr(stringVariable, 'xyz', someStringOrCharacter, …);
```
The latter is particularly useful if you also want to pad the result or need some conversion. Writing `foo:20` (minimum width of 20 characters, possibly padded with spaces `' '` to the left) is only acceptable using `write`/`writeLn`/`writeStr`. `WriteStr` is an EP extension.

The GPC, the FPC and Delphi are also shipped with a function `concat` performing the very same task. Read the respective compiler’s documentation before using it, because there are some differences, or just stick to the standardized `+` operator.
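A minimal sketch of formatted output into a string variable, assuming an ISO 10206 compiler; the names `writeStrDemo`, `line` and `count` are made up. The string `'Foo'` is right-aligned in a field of 6 characters and the number in a field of 4:

```pascal
program writeStrDemo(output);
{ hypothetical demo: formatting into a string variable with writeStr }
var
	line: string(40);
	count: integer;
begin
	count := 42;
	{ 'Foo' right-aligned in 6 columns, count right-aligned in 4 columns }
	writeStr(line, 'Foo':6, count:4, '|');
	writeLn(line);
	writeLn(length(line):1);   { 6 + 4 + 1 = 11 }
end.
```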
All functions presented in this subsection return a `Boolean` value.

Since every character in a string has an ordinal value, we can think of a method to sort them. There are two flavors of comparing strings:

- operators, such as `=`, `>` or `<=`, and
- functions, such as `LT` or `GT`.

The difference lies in their treatment of strings that vary in length. While the former bring both strings to the same length by padding them with space characters (`' '`), the latter simply clip them to the shorter length, but take into account which one was longer (if necessary).
| function name | meaning | operator |
|---|---|---|
| `EQ` | equal | `=` |
| `NE` | not equal | `<>` |
| `LT` | less than | `<` |
| `LE` | less than or equal to | `<=` |
| `GT` | greater than | `>` |
| `GE` | greater than or equal to | `>=` |
All these functions and operators are binary, that means they expect and accept exactly two parameters or operands respectively. A function and its corresponding operator can produce different results if supplied with the same input, as you will see in the next two sub-subsections.
Let’s start with equality.
`EQ` function if both operands are of the same length and the values, i. e. the character sequences that actually make up the strings, are the same. The `=`‑comparison, on the other hand, augments any “missing” characters in the shorter string by using the padding character space (`' '`).[fn 7]

```pascal
program equalDemo(output);
const
	emptyString = '';
	blankString = ' ';
begin
	writeLn(emptyString = blankString);
	writeLn(EQ(emptyString, blankString));
end.
```

```
True
False
```
`emptyString` got padded to match the length of `blankString` before the actual character-by-character `=`‑expression took place. To put this relationship in other words, in Pascal terms you already know:

```pascal
(foo = bar) = EQ(trim(foo), trim(bar))
```

The actual implementation is usually different, because `trim` can be, especially for long strings, quite resource-consuming (time, as well as memory).
As a consequence, an `=`‑comparison is usually used if trailing spaces are insignificant, but are still there for technical reasons (e. g. because you are using an `array[1..8] of char`). Only `EQ` ensures both strings are identical, i. e. of equal length and consisting of the same characters. Note that the `capacity` of either string is irrelevant. The function `NE`, short for not equal, behaves accordingly.

A string is determined to be “less than” another one by sequentially reading both strings simultaneously from left to right and comparing corresponding characters. If all characters match, the strings are said to be equal to each other. However, if we encounter a differing character pair, processing is aborted and the relation of the current characters determines the overall strings’ relation.
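To check the relationship `(foo = bar) = EQ(trim(foo), trim(bar))` on a few inputs, here is a sketch with a hypothetical helper procedure `compare` (the names `equalityIdentityDemo` and `shortText` are made up as well). It assumes the padding behavior of Extended Pascal, e.g. the GPC invoked with `--extended-pascal`; without padding the `=` results may differ:

```pascal
program equalityIdentityDemo(output);
{ hypothetical demo: "=" pads with blanks, EQ does not }
type
	shortText = string(8);

{ hypothetical helper: reports how "=" and EQ judge a and b,
  and whether (a = b) = EQ(trim(a), trim(b)) holds }
procedure compare(a, b: shortText);
begin
	write('''', a, ''' vs ''', b, ''': ');
	if a = b then
		write('= yes')
	else
		write('= no');
	if EQ(a, b) then
		write(', EQ yes')
	else
		write(', EQ no');
	if (a = b) = EQ(trim(a), trim(b)) then
		writeLn(', identity holds')
	else
		writeLn(', identity violated');
end;

begin
	compare('ab', 'ab');
	compare('ab', 'ab  ');   { trailing blanks: "=" yes, EQ no }
	compare('ab', ' ab');    { leading blank: neither considers them equal }
end.
```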
| first operand | 'A' | 'B' | 'C' | 'D' |
|---|---|---|---|---|
| second operand | 'A' | 'B' | 'E' | 'A' |
| determined relation | = | = | < | ⨯ |
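The procedure illustrated by the table can be written out by hand. The following sketch implements a hypothetical function `lexLess`: character pairs are compared from left to right, the first differing pair decides, and if one string is a prefix of the other, the shorter one is less. This mirrors how the `LT` function is described in this section, without any padding. It assumes an ISO 10206 compiler such as the GPC and, for comparison, prints whether the built-in `LT` agrees; `lexicographicDemo` and `show` are made-up names as well:

```pascal
program lexicographicDemo(output);
{ hypothetical demo: a hand-written lexicographic "less than" }

{ compares character pairs from left to right; the first differing
  pair decides; if all compared pairs match, the shorter string is less }
function lexLess(a, b: string): Boolean;
var
	i, shorter: integer;
	decided, isLess: Boolean;
begin
	shorter := length(a);
	if length(b) < shorter then
	begin
		shorter := length(b);
	end;
	
	decided := false;
	isLess := false;
	i := 1;
	while (not decided) and (i <= shorter) do
	begin
		if a[i] <> b[i] then
		begin
			{ the first differing pair determines the overall relation }
			isLess := a[i] < b[i];
			decided := true;
		end;
		i := i + 1;
	end;
	
	if not decided then
	begin
		{ one string is a prefix of the other (or both are equal) }
		isLess := length(a) < length(b);
	end;
	
	lexLess := isLess;
end;

{ prints the verdict of lexLess and whether the built-in LT agrees }
procedure show(a, b: string);
begin
	write(a, ' vs ', b, ': ');
	if lexLess(a, b) then
		write('less')
	else
		write('not less');
	if lexLess(a, b) = LT(a, b) then
		writeLn(' (agrees with LT)')
	else
		writeLn(' (differs from LT)');
end;

begin
	show('ABCD', 'ABEA');
	show('123', '1234');
	show('ABEA', 'ABCD');
	show('AB', 'AB');
end.
```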
If both strings are of equal length, the `LT` function and the `<`‑operator behave the same. `LT` actually even builds on top of `<`. Things get interesting if the supplied strings differ in length.

The `LT` function first cuts both strings to the same (shorter) length (substring). The `<`‑comparison, on the other hand, compares all remaining “missing” characters to `' '`, the space character. This can lead to differing results:

```pascal
program lessThanDemo(output);
var
	hogwash, malarky: string(8);
begin
	{ ensure ' ' is not chr(0) or maxChar }
	if not (' ' in [chr(1)..pred(maxChar)]) then
	begin
		writeLn('Character set presumptions not met.');
		halt; { EP procedure immediately terminating the program }
	end;
	
	hogwash := '123';
	malarky := hogwash + chr(0);
	writeLn(hogwash < malarky, LT(hogwash, malarky));
	
	malarky := hogwash + '4';
	writeLn(hogwash < malarky, LT(hogwash, malarky));
	
	malarky := hogwash + maxChar;
	writeLn(hogwash < malarky, LT(hogwash, malarky));
end.
```

```
False True
True True
True True
```
`<`‑comparison, the “missing” fourth character in `hogwash` is presumed to be `' '`. The fourth character in `malarky` is compared against `' '`. The situation above has been provoked artificially for demonstration purposes, but this can still become an issue if you frequently use characters that are “smaller” than the regular space character, for instance if you are programming on a 1980s 8‑bit Atari computer using ATASCII. The `LE`, `GT`, and `GE` functions act accordingly.
String literals

In Pascal, string literals start with and are terminated by the same character. Usually this is a straight (typewriter’s) apostrophe (`'`). Troubles arise if you want to actually include that character in a string literal, because the character you want to include in your string is already understood as the terminating delimiter. Conventionally, two straight typewriter’s apostrophes back-to-back are regarded as an apostrophe image. In the produced computer program, they are replaced by a single apostrophe.

```pascal
program apostropheDemo(output);
var
	c: char;
begin
	for c := '0' to '9' do
	begin
		writeLn('ord(''', c, ''') = ', ord(c));
	end;
end.
```
Each double-apostrophe is replaced by a single apostrophe. The string still needs delimiting apostrophes, so you might end up with three consecutive apostrophes like in the example above, or even four consecutive apostrophes (`''''`) if you want a `char` value consisting of a single apostrophe.

A string is a linear sequence of characters, i. e. along a single dimension.

As such, the only illegal “character” in strings is the one marking line breaks (new lines). The string literal in the following piece of code is unacceptable, because it spans multiple (source code) lines:

```pascal
welcomeMessage := 'Hello!
All your base are belong to us.';
```
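Two more cases, sketched under the assumption of an ISO 10206 compiler (the names `quoteDemo`, `apostrophe` and `contraction` are hypothetical): a `char` constant made of four apostrophes, and a string containing a single apostrophe image. The apostrophe image counts as one character:

```pascal
program quoteDemo(output);
{ hypothetical demo: apostrophe images in char and string literals }
const
	apostrophe = '''';      { four apostrophes: one char value }
	contraction = 'It''s';  { one apostrophe image inside a string }
begin
	writeLn(apostrophe, contraction, apostrophe);
	writeLn(length(contraction):1);  { I, t, apostrophe, s: 4 characters }
end.
```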
You are nevertheless allowed to use the OS-specific code indicating EOLs, yet the only cross-platform (i. e. guaranteed to work regardless of the OS used) procedure is `writeLn`. Although not standardized, many compilers provide a constant representing the environment’s character/string necessary to produce line breaks. In FPC it is called `lineEnding`. Delphi has `sLineBreak`, which is also understood by the FPC for compatibility reasons. The GPC’s standard module `GPC` supplies the constant `lineBreak`. You will first need to `import` this module before you can use that identifier.
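Since a literal must not span source lines, a long text can instead be assembled from several literals joined by `+` across lines, and separate lines of output can be produced with separate `writeLn` calls. A minimal sketch, assuming an ISO 10206 compiler; the program name `welcomeDemo` is hypothetical:

```pascal
program welcomeDemo(output);
{ hypothetical demo: long text without a multi-line literal }
var
	welcomeMessage: string(80);
begin
	{ each literal ends on its own source line; + joins them }
	welcomeMessage := 'All your base ' +
		'are belong to us.';
	writeLn('Hello!');        { writeLn ends the line portably }
	writeLn(welcomeMessage);
end.
```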
See also the chapter Integer Division and Modulus in Programming Fundamentals.

The final Standard Pascal arithmetic operator you are introduced to, after learning to divide, is the remainder operator `mod` (short for modulo). Every integer division (`div`) may yield a remainder. This operator evaluates to this value.
| `i` | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|
| `i mod 2` | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| `i mod 3` | 0 | 1 | 2 | 0 | 1 | 2 | 0 |
Similar to all other division operations, the `mod` operator does not accept a zero value as the second operand. Moreover, the second operand to `mod` must be positive. Computer scientists and mathematicians use many different definitions for the result if the divisor is negative. Pascal avoids any confusion by simply declaring negative divisors as illegal. For a positive divisor `j`, the result of `i mod j` always lies within `0..j - 1`, even if `i` is negative, as the table above shows.

The `mod` operator is frequently used to ensure a certain value remains in a specific range starting at zero (`0..n`). Furthermore, you will find modulo in number theory. For example, the definition of prime numbers says “not divisible by any number other than 1 and itself”. Divisibility can be expressed in Pascal like this:
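The table above can be reproduced with a short program. This sketch (program name `modDemo` made up) assumes an ISO-conforming `mod`, as in the GPC; some compilers, FPC for example, return a negative remainder for a negative first operand and would print different values for negative `i`:

```pascal
program modDemo(output);
{ hypothetical demo: i mod 2 and i mod 3 for i in -3..3 }
var
	i: integer;
begin
	for i := -3 to 3 do
	begin
		write(i mod 2:3);   { each value right-aligned in 3 columns }
	end;
	writeLn;
	for i := -3 to 3 do
	begin
		write(i mod 3:3);
	end;
	writeLn;
end.
```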
| expression | `x` is divisible by `d` |
|---|---|
| mathematical notation | $d \mid x$ |
| Pascal expression | `x mod d = 0` |
`odd(x)` is shorthand for `x mod 2 <> 0`.[fn 8]
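Combining the divisibility test with a loop gives a simple primality check by trial division. In this sketch the names `primeDemo` and `isPrime` are hypothetical, and only divisors `d` with `d * d <= x` are tested, since any composite `x` has a factor no larger than its square root:

```pascal
program primeDemo(output);
{ hypothetical demo: primality by trial division using mod }

{ x is prime if x >= 2 and no d with 2 <= d, d * d <= x divides it }
function isPrime(x: integer): Boolean;
var
	d: integer;
	divisorFound: Boolean;
begin
	divisorFound := false;
	d := 2;
	while (not divisorFound) and (d * d <= x) do
	begin
		{ "d divides x" written as a Pascal expression }
		divisorFound := x mod d = 0;
		d := d + 1;
	end;
	isPrime := (x >= 2) and not divisorFound;
end;

var
	n: integer;
begin
	for n := 1 to 30 do
	begin
		if isPrime(n) then
		begin
			write(n:3);
		end;
	end;
	writeLn;
end.
```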
`array[n..m] of string(c)`?

`string(…)` is basically a special case of an `array` (namely one consisting of `char` values), you can access a single character from it just like usual: `v[i, p]` where `i` is a valid index in the range `n..m` and `p` refers to the character index within `1..length(v[i])`.

`true` if, and only if a given `string(…)` contains non-blank characters (i. e. characters other than `' '`).

```pascal
program spaceTest(input, output);
type
	info = string(20);

{**
	\brief determines whether a string contains non-space characters
	\param s the string to inspect
	\return true if there are any characters other than ' '
*}
function containsNonBlanks(s: info): Boolean;
begin
	containsNonBlanks := length(trim(s)) > 0;
end;

// … remaining code for testing purposes only …
```
Note that this function (correctly) returns `false` if supplied with an empty string (`''`). Alternatively, you could have written:
```pascal
containsNonBlanks := '' <> s;
```
`string(…)` data type to work properly. Remember, in these exercises there is no “best” solution.

`program` that reads a `string(…)` and transposes every letter in it by 13 positions with respect to the original character’s place in the English alphabet, and then outputs the modified version. This algorithm is a special case of the “Caesar cipher”, known as ROT13. For simplicity assume all input is lower-case.

```pascal
program rotate13(input, output);
const
	// we will only operate ("rotate") on these characters
	alphabet = 'abcdefghijklmnopqrstuvwxyz';
	offset = 13;
type
	integerNonNegative = 0..maxInt;
	sentence = string(80);
var
	secret: sentence;
	i, p: integerNonNegative;
begin
	readLn(secret);
	
	for i := 1 to length(secret) do
	begin
		// is current character in alphabet?
		p := index(alphabet, secret[i]);
		// if so, rotate
		if p > 0 then
		begin
			// The `+ 1` in the end ensures that p
			// in the following expression `alphabet[p]`
			// is indeed always a valid index (i.e. not zero).
			p := (p - 1 + offset) mod length(alphabet) + 1;
			secret[i] := alphabet[p];
		end;
	end;
	
	writeLn(secret);
end.
```
`array[chr(0)..maxChar] of char`) would have been acceptable, too, but care must be taken in properly populating it. Note, it is not guaranteed that expressions such as `succ('A', 13)` will yield the expected result. The range `'A'..'Z'` is not necessarily contiguous, so you should not make any assumptions about it. If your solution makes use of that, you must document it (e. g. “This program only runs properly on computers using the ASCII character set.”).
`string` is a palindrome, that means it can be read forward and backward producing the same meaning/sound, provided word gaps (spaces) are adjusted accordingly. For simplicity assume all characters are lower-case and there are no punctuation characters (other than whitespace).

```pascal
program palindromes(input, output);
type
	sentence = string(80);

{
	\brief determines whether a lower-case sentence is a palindrome
	\param original the sentence to inspect
	\return true iff \param original can be read forward and backward
}
function isPalindrome(original: sentence): Boolean;
var
	readIndex, writeIndex: integer;
	derivative: sentence;
	check: Boolean;
begin
	check := true;
	// “sentences” that have a length of one, or even zero characters
	// are always palindromes
	if length(original) > 1 then
	begin
		// ensure `derivative` has the same length as `original`
		derivative := original;
		// the contents are irrelevant, alternatively [in EP] you could’ve used
		//writeStr(derivative, '':length(original));
		// which would’ve saved us the “fill the rest with blanks” step below
		
		writeIndex := 1;
		// strip blanks
		for readIndex := 1 to length(original) do
		begin
			// only copy significant characters
			if not (original[readIndex] in [' ']) then
			begin
				derivative[writeIndex] := original[readIndex];
				writeIndex := writeIndex + 1;
			end;
		end;
		
		// fill the rest with blanks
		for writeIndex := writeIndex to length(derivative) do
		begin
			derivative[writeIndex] := ' ';
		end;
		// remove trailing blanks and thus shorten length
		derivative := trim(derivative);
		
		for readIndex := 1 to length(derivative) div 2 do
		begin
			check := check and (derivative[readIndex] =
				derivative[length(derivative) - readIndex + 1]);
		end;
	end;
	
	isPalindrome := check;
end;

var
	mystery: sentence;
begin
	writeLn('Enter a sentence that is possibly a palindrome (no caps):');
	readLn(mystery);
	writeLn('The sentence you have entered is a palindrome: ',
		isPalindrome(mystery));
end.
```
`original` string. For demonstration purposes the example shows `if not (original[readIndex] in [' ']) then`. In fact an explicit set list would have been more adequate, i. e. `if original[readIndex] in ['a', 'b', 'c', …, 'z'] then`. Do not worry if you simply wrote something to the effect of `if original[readIndex] <> ' ' then`; this is just as fine given the task’s requirements.
`LT('', '')`?

`function` that determines whether a year in the Gregorian calendar is a leap year. Every fourth year is a leap year, but every hundredth year is not, unless it is the fourth century in a row.

`mod` operator you just saw:

```pascal
{
	\brief determines whether a year is a leap year in Gregorian calendar
	\param x the year to inspect
	\return true, if and only if \param x meets leap year conditions
}
function leapYear(x: integer): Boolean;
begin
	leapYear := (x mod 4 = 0) and (x mod 100 <> 0) or (x mod 400 = 0);
end;
```
function `isLeapYear` in Delphi’s and the FPC’s `sysUtils` unit or in GPC’s `GPC` module. Whenever possible, reuse code already available.
`function` returning the leap year property of a year, write a binary `function` returning the number of days in a given month and year.

`case`-statement. Recall that every path through the function must assign a value to the result variable:

```pascal
type
	{ a valid day number in Gregorian calendar month }
	day = 1..31;
	{ a valid month number in Gregorian calendar year }
	month = 1..12;

{
	\brief determines the number of days in a given month of a Gregorian year
	\param m the month whose day number count is requested
	\param y the year (relevant for leap years)
	\return the number of days in a given month and year
}
function daysInMonth(m: month; y: integer): day;
begin
	case m of
		1, 3, 5, 7, 8, 10, 12:
		begin
			daysInMonth := 31;
		end;
		4, 6, 9, 11:
		begin
			daysInMonth := 30;
		end;
		2:
		begin
			daysInMonth := 28 + ord(leapYear(y));
		end;
	end;
end;
```
`dateUtils` unit provide a `function` called `daysInAMonth`. You are strongly encouraged to reuse it instead of your own code.
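To try the two exercise solutions, a small test driver could print the length of February for a few years. This is a sketch: the program name `februaryDemo` is made up, and the two functions are copied from the solutions above (with the `begin`/`end` pairs in the `case` branches dropped, which does not change their meaning) so that the program is self-contained:

```pascal
program februaryDemo(output);
{ hypothetical test driver for the leap year and days-in-month solutions }
type
	day = 1..31;
	month = 1..12;

{ leap year rule, as in the exercise solution }
function leapYear(x: integer): Boolean;
begin
	leapYear := (x mod 4 = 0) and (x mod 100 <> 0) or (x mod 400 = 0);
end;

{ days per month, as in the exercise solution }
function daysInMonth(m: month; y: integer): day;
begin
	case m of
		1, 3, 5, 7, 8, 10, 12:
			daysInMonth := 31;
		4, 6, 9, 11:
			daysInMonth := 30;
		2:
			daysInMonth := 28 + ord(leapYear(y));
	end;
end;

begin
	{ 1900: divisible by 100 but not by 400, hence no leap year }
	writeLn(1900:1, ': ', daysInMonth(2, 1900):1);
	{ 2000: divisible by 400, hence a leap year }
	writeLn(2000:1, ': ', daysInMonth(2, 2000):1);
	writeLn(2023:1, ': ', daysInMonth(2, 2023):1);
	writeLn(2024:1, ': ', daysInMonth(2, 2024):1);
end.
```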
More exercises can be found in:
Notes:
- [fn 2] `' '` is a `char` value, whereas `''` (“null-string”) or `'42'` are string literals. In order to write generic code, `length` accepts all kinds of values that could denote a finite sequence of `char` values.
- [fn 3] `packed array[1..capacity] of char`.
- [fn 4] `subStr('', 1, 0)`. It goes without saying that such a function call is pointless.
- [fn 5] `bindable` when using this notation.
- [fn 6] `subStr('', 1)` is illegal, because there is no “character 1” in an empty string. Also, `subStr('Z', 1)` is not allowed, because `'Z'` is a `char` expression and as such always has a length of 1, rendering any need for a “give me the rest of/subsequent characters of” function obsolete.
- [fn 7] `--extended-pascal` on the command line. Otherwise, no padding occurs. The Standard (unextended) Pascal, as per ISO standard 7185, does not define any padding algorithm.
- [fn 8] `odd` may be different. On many processor architectures it is usually something to the effect of the x86 instruction `and x, 1`.