Python Programming/Strings

From Wikibooks, open books for an open world

Overview

Strings in Python at a glance:

str1 = "Hello"                  # A new string using double quotes
str2 = 'Hello'                  # Single quotes do the same
str3 = "Hello\tworld\n"         # One with a tab and a newline
str4 = str1 + " world"          # Concatenation
str5 = str1 + str(4)            # Concatenation with a number
str6 = str1[2]                  # 3rd character
str6a = str1[-1]                # Last character
#str1[0] = "M"                  # No way; strings are immutable
for char in str1: print(char)   # For each character
str7 = str1[1:]                 # Without the 1st character
str8 = str1[:-1]                # Without the last character
str9 = str1[1:4]                # Substring: 2nd to 4th character
str10 = str1 * 3                # Repetition
str11 = str1.lower()            # Lowercase
str12 = str1.upper()            # Uppercase
str13 = str1.rstrip()           # Strip right (trailing) whitespace
str14 = str1.replace('l', 'h')  # Replacement
list15 = str1.split('l')        # Splitting
if str1 == str2: print("Equ")   # Equality test
if "el" in str1: print("In")    # Substring test
length = len(str1)              # Length
pos1 = str1.find('llo')         # Index of substring or -1
pos2 = str1.rfind('l')          # Index of substring, from the right
count = str1.count('l')         # Number of occurrences of a substring
print(str1, str2, str3, str4, str5, str6, str7, str8, str9, str10)
print(str11, str12, str13, str14, list15)
print(length, pos1, pos2, count)

See also the chapter Regular Expression for advanced pattern matching on strings in Python.

String operations

Equality

Two strings are equal if they have exactly the same contents, meaning that they are the same length and each character matches the character at the same position in the other string. Some other languages compare strings by identity instead; that is, two strings are considered equal only if they occupy the same space in memory. Python uses the is operator to test the identity of strings, and of any two objects in general.

Examples:

>>> a = 'hello'; b = 'hello'  # Assign 'hello' to a and b.
>>> a == b                    # check for equality
True
>>> a == 'hello'              #
True
>>> a == "hello"              # (choice of delimiter is unimportant)
True
>>> a == 'hello '             # (extra space)
False
>>> a == 'Hello'              # (wrong case)
False
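
The difference between equality (==) and identity (is) can be sketched as follows. The variable names are only illustrative. Whether two separately created but equal strings share one object is an implementation detail of the interpreter, so this sketch relies on identity only for a name that is explicitly bound to the same object.

```python
a = "hello"
b = "".join(["hel", "lo"])  # built at run time; same contents as a
c = a                       # c is bound to the very same object as a

print(a == b)  # True: same length and same characters
print(a == c)  # True
print(c is a)  # True: both names refer to one object

# Whether `a is b` is True depends on the interpreter, so compare
# string contents with ==, never with is.
```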

Numerical

Two quasi-numerical operations can be performed on strings: addition and multiplication. String addition is just another name for concatenation, which simply sticks the strings together. String multiplication is repeated addition, that is, repeated concatenation. So:

>>> c = 'a'
>>> c + 'b'
'ab'
>>> c * 5
'aaaaa'
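
A few edge cases of these two operators, as a small sketch: the multiplier may appear on either side, multiplying by zero gives the empty string, and adding a number to a string raises a TypeError unless the number is converted with str() first.

```python
c = "a"

print(c + "b")      # ab
print(c * 3)        # aaa
print(3 * c)        # aaa (the order of the operands does not matter)
print(c * 0 == "")  # True: repeating zero times gives the empty string

try:
    c + 5           # a string and a number cannot be added directly
except TypeError as err:
    print("TypeError:", err)

print(c + str(5))   # a5: convert the number to a string first
```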

Containment

There is a simple operator 'in' that returns True if the first operand is contained in the second. This also works on substrings:

>>> x = 'hello'
>>> y = 'ell'
>>> x in y
False
>>> y in x
True

Note that print(x in y) would print the same value.
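
Some further properties of the in operator, shown as a short sketch: the test is case-sensitive, the empty string is contained in every string, and not in gives the negated test. Lowercasing both operands is one simple way (an assumption here, not the only option) to make the test ignore case.

```python
x = "hello"

print("ell" in x)                  # True
print("" in x)                     # True: the empty string is in every string
print("ELL" in x)                  # False: the test is case-sensitive
print("ELL".lower() in x.lower())  # True: compare in a common case
print("z" not in x)                # True
```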

Indexing and Slicing

Much like the elements of arrays in other languages, the individual characters of a string can be accessed by an integer giving their position in the string. The first character of a string s is s[0], and the nth character is s[n-1].

>>> s = "Xanadu"
>>> s[1]
'a'

Unlike arrays in many other languages, Python strings can also be indexed backwards, using negative numbers. The last character has index -1, the second to last character has index -2, and so on.

>>> s[-4]
'n'

We can also use "slices" to access a substring of s. s[a:b] will give us a string starting with s[a] and ending with s[b-1].

>>> s[1:4]
'ana'

None of these are assignable, because strings are immutable.

>>> print(s)
>>> s[0] = 'J'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> s[1:3] = "up"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> print(s)

Outputs (assuming the errors were suppressed):

Xanadu
Xanadu

Another feature of slices is that if the beginning or end is left empty, it will default to the first or last index, depending on context:

>>> s[2:]
'nadu'
>>> s[:3]
'Xan'
>>> s[:]
'Xanadu'

You can also use negative numbers in slices:

>>> print(s[-2:])
du

To understand slices, it's easiest not to count the elements themselves. It is a bit like counting not on your fingers, but in the spaces between them. A sequence of four elements is indexed like this:

Element:     1     2     3     4
Index:    0     1     2     3     4
         -4    -3    -2    -1

So, when we ask for the [1:3] slice, we start at position 1 and end at position 3 in this numbering, and take everything in between them: the second and third elements, which are the elements s[1] and s[2]. If you are used to indexes in C or Java, this can be a bit disconcerting until you get used to it.
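
The "spaces between the elements" view can be checked with a short sketch: a slice s[a:b] has length b - a when both positions lie inside the string, splitting a string at any position and rejoining the parts gives back the original, and positions outside the string are clipped rather than raising an error.

```python
s = "Xanadu"

print(s[1:3])         # an
print(len(s[1:4]))    # 3, i.e. 4 - 1
print(s[:2] + s[2:])  # Xanadu: the two parts meet at position 2
print(s[-3:-1])       # ad
print(s[2:100])       # nadu: an end past the string is clipped
print(repr(s[4:2]))   # '': an empty slice, not an error
```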

String constants

String constants can be found in the standard string module. An example is string.digits, which equals '0123456789'.

Links:

Python documentation of "string" module -- python.org
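
As a small sketch of how these constants can be used, the following checks whether a string consists only of hexadecimal digits. The helper name is_hex is hypothetical and exists only for this illustration; string.digits, string.ascii_lowercase and string.hexdigits are constants of the standard string module.

```python
import string

print(string.digits)           # 0123456789
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz


def is_hex(text):
    """Return True if text is non-empty and made only of hex digits."""
    return len(text) > 0 and all(ch in string.hexdigits for ch in text)


print(is_hex("1aF3"))  # True
print(is_hex("xyz"))   # False
```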

String methods

There are a number of methods or built-in string functions:

capitalize
center
count
decode (Python 2 only; in Python 3 it is a method of bytes, not of str)
encode
endswith
expandtabs
find
index
isalnum
isalpha
isdigit
islower
isspace
istitle
isupper
join
ljust
lower
lstrip
replace
rfind
rindex
rjust
rstrip
split
splitlines
startswith
strip
swapcase
title
translate
upper
zfill

Only some of these methods are covered in the sections that follow.

is*

isalnum(), isalpha(), isdigit(), islower(), isupper(), isspace(), and istitle() fit into this category.

The string being tested must contain at least one character, or the is* methods will return False. In other words, an empty string (len(string) == 0) never passes any of these tests.

isalnum returns True if the string is entirely composed of alphabetic and/or numeric characters (i.e. no punctuation).
isalpha and isdigit work similarly for alphabetic characters or numeric characters only.
isspace returns True if the string is composed entirely of whitespace.
islower, isupper, and istitle return True if the string is in lowercase, uppercase, or titlecase respectively. Uncased characters are "allowed", such as digits, but there must be at least one cased character in the string object in order to return True. Titlecase means the first cased character of each word is uppercase, and any immediately following cased characters are lowercase. Curiously, 'Y2K'.istitle() returns True. That is because uppercase characters can only follow uncased characters. Likewise, lowercase characters can only follow uppercase or lowercase characters. Hint: whitespace is uncased.

Example:

>>> '2YK'.istitle()
False
>>> 'Y2K'.istitle()
True
>>> '2Y K'.istitle()
True
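
The rules above can be tried out with a handful of sample strings. This is only a sketch; the samples are chosen to show the empty-string case and the "at least one cased character" requirement.

```python
print("".isalnum())             # False: the empty string fails every is* test
print("abc123".isalnum())       # True
print("abc 123".isalnum())      # False: the space is not alphanumeric
print("abc".isalpha())          # True
print("123".isdigit())          # True
print("-123".isdigit())         # False: the minus sign is not a digit
print(" \t\n".isspace())        # True
print("abc123".islower())       # True: digits are uncased and are ignored
print("123".islower())          # False: there is no cased character at all
print("Hello World".istitle())  # True
```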

Title, Upper, Lower, Swapcase, Capitalize

These methods return a copy of the string converted to title case, converted to upper case, converted to lower case, with the case of every letter inverted, or capitalized, respectively.

The title method capitalizes the first letter of each word in the string (and makes the rest lower case). Words are identified as substrings of alphabetic characters that are separated by non-alphabetic characters, such as digits, or whitespace. This can lead to some unexpected behavior. For example, the string "x1x" will be converted to "X1X" instead of "X1x".

The swapcase method makes all uppercase letters lowercase and vice versa.

The capitalize method is like title except that it considers the entire string to be a word (i.e. it makes the first character upper case and the rest lower case).

Example:

s = 'Hello, wOrLD'
print(s)               # 'Hello, wOrLD'
print(s.title())       # 'Hello, World'
print(s.swapcase())    # 'hELLO, WoRld'
print(s.upper())       # 'HELLO, WORLD'
print(s.lower())       # 'hello, world'
print(s.capitalize())  # 'Hello, world'

Keywords: to lower case, to upper case, lcase, ucase, downcase, upcase.
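
The word-splitting rule of title can produce surprises beyond the "x1x" case. In the sketch below, an apostrophe also counts as a non-alphabetic separator, so the letter after it is capitalized.

```python
print("x1x".title())               # X1X
print("they're here".title())      # They'Re Here
print("hello WORLD".capitalize())  # Hello world
print("Hello".swapcase())          # hELLO
```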

count

Returns the number of non-overlapping occurrences of the specified substring in the string, e.g.:

>>> s = 'Hello, world'
>>> s.count('o')  # print the number of 'o's in 'Hello, World' (2)
2

Hint: .count() is case-sensitive, so this example will only count the number of lowercase letter 'o's. For example, if you ran:

>>> s = 'HELLO, WORLD'
>>> s.count('o')  # print the number of lowercase 'o's in 'HELLO, WORLD' (0)
0
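
A short sketch of three further points: counting without regard to case (here by lowercasing first, which is one simple approach), the fact that occurrences are counted without overlap, and the optional start argument that limits where the search begins.

```python
s = "Hello, World"

print(s.count("o"))          # 2
print(s.lower().count("l"))  # 3
print(s.count("W"))          # 1
print(s.lower().count("w"))  # 1: lowercasing first ignores the case
print("aaaa".count("aa"))    # 2: occurrences are counted without overlap
print(s.count("o", 5))       # 1: only the part from index 5 on is searched
```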

strip, rstrip, lstrip

Returns a copy of the string with the leading (lstrip) and trailing (rstrip) whitespace removed. strip removes both.

>>> s = '\t Hello, world\n\t '
>>> print(s)
         Hello, world
         
>>> print(s.strip())
Hello, world
>>> print(s.lstrip())
Hello, world
        # ends here
>>> print(s.rstrip())
         Hello, world

Note the leading and trailing tabs and newlines.

Strip methods can also be used to remove other types of characters.

import string
s = 'www.wikibooks.org'
print(s)
print(s.strip('w'))                     # Removes all w's from outside
print(s.strip(string.ascii_lowercase))  # Removes all lowercase letters from outside
print(s.strip(string.printable))        # Removes all printable characters

Outputs:

www.wikibooks.org
.wikibooks.org
.wikibooks.

The last call prints an empty line, because every character of s is printable.

Note that string.ascii_lowercase and string.printable require an import string statement. The name string.lowercase exists only in Python 2.
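
The argument to the strip methods is treated as a set of characters, not as a prefix or suffix to match. The sketch below uses a made-up host name to show that characters are removed from the ends until one is found that is not in the set.

```python
s = "www.example.com"

print(s.strip("w."))         # example.com
print(s.strip("cmowz."))     # example
print(s.lstrip("w."))        # example.com
print("xxhixx".rstrip("x"))  # xxhi: only the right-hand end is stripped
```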

ljust, rjust, center

These methods left-justify, right-justify, or center a string within a field of a given width (the rest is padded with spaces).

>>> s = 'foo'
>>> s
'foo'
>>> s.ljust(7)
'foo    '
>>> s.rjust(7)
'    foo'
>>> s.center(7)
'  foo  '
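
All three methods also accept an optional second argument, a single fill character to use instead of a space, and they never truncate a string that is longer than the field. The following sketch shows both points and then lines up a small table; the rows are made-up sample data.

```python
s = "foo"

print(s.center(7, "*"))  # **foo**
print(s.ljust(7, "."))   # foo....
print(s.rjust(7, "0"))   # 0000foo
print(s.ljust(2))        # foo: a string longer than the field is not cut

# Hypothetical sample data, aligned into two columns.
rows = [("apple", 3), ("kiwi", 12)]
lines = [name.ljust(8) + str(qty).rjust(4) for name, qty in rows]
for line in lines:
    print(line)
```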

join

Joins together the given sequence with the string as separator:

>>> seq = ['1', '2', '3', '4', '5']
>>> ' '.join(seq)
'1 2 3 4 5'
>>> '+'.join(seq)
'1+2+3+4+5'

map may be helpful here; it converts the numbers in seq into strings:

>>> seq = [1, 2, 3, 4, 5]
>>> ' '.join(map(str, seq))
'1 2 3 4 5'

Now arbitrary objects may be in seq instead of just strings.
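
The reason the conversion is needed can be seen in a short sketch: join raises a TypeError when an item is not a string. A generator expression is an alternative to map, and because a string is itself a sequence of characters, join can also be applied to one.

```python
seq = [1, 2, 3]

try:
    " ".join(seq)  # the items are ints, not strings
except TypeError as err:
    print("TypeError:", err)

print(", ".join(str(n) for n in seq))  # 1, 2, 3
print("".join(["a", "b", "c"]))        # abc
print("-".join("abc"))                 # a-b-c
```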

find, index, rfind, rindex

The find and index methods return the index of the first found occurrence of the given subsequence. If it is not found, find returns -1 but index raises a ValueError. rfind and rindex are the same as find and index except that they search through the string from right to left (i.e. they find the last occurrence).

>>> s = 'Hello, world'
>>> s.find('l')
2
>>> s[s.index('l'):]
'llo, world'
>>> s.rfind('l')
10
>>> s[:s.rindex('l')]
'Hello, wor'
>>> s[s.index('l'):s.rindex('l')]
'llo, wor'

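Because Python strings accept negative subscripts, index is probably better used in situations like the one shown: if the substring were missing, find would return -1, which is itself a valid subscript, so the slice would silently yield an unintended value instead of raising an error.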
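
The pitfall can be made concrete with a substring that does not occur. In this sketch, find returns -1 and the slice quietly produces the last character, whereas index stops with a ValueError. If find is used, its result should be compared with -1 before it is used as a subscript.

```python
s = "Hello, world"

print(s.find("z"))      # -1
print(s[s.find("z"):])  # d: this is s[-1:], a silently wrong result

try:
    s[s.index("z"):]
except ValueError as err:
    print("ValueError:", err)

pos = s.find("z")
if pos == -1:
    print("not found")  # check for -1 before slicing
else:
    print(s[pos:])
```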

replace

Replace works just like it sounds. It returns a copy of the string with all occurrences of the first parameter replaced with the second parameter.

>>> 'Hello, world'.replace('o', 'X')
'HellX, wXrld'

Or, using variable assignment:

string = 'Hello, world'
newString = string.replace('o', 'X')
print(string)
print(newString)

Outputs:

Hello, world
HellX, wXrld

Notice, the original variable (string) remains unchanged after the call to replace.
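
replace also takes an optional third argument that limits the number of replacements. The sketch below shows that argument, deletion by replacing with the empty string, and chaining of several calls, which works because each call returns a new string.

```python
s = "Hello, world"

print(s.replace("o", "X", 1))                 # HellX, world
print(s.replace("l", ""))                     # Heo, word
print(s.replace("z", "X") == s)               # True: nothing to replace
print(s.replace("o", "0").replace("l", "1"))  # He110, w0r1d
print(s)                                      # Hello, world (unchanged)
```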

expandtabs

Replaces tabs with the appropriate number of spaces (default number of spaces per tab = 8; this can be changed by passing the tab size as an argument).

s = 'abcdefg\tabc\ta'
print(s)
print(len(s))
t = s.expandtabs()
print(t)
print(len(t))

Outputs:

abcdefg abc     a
13
abcdefg abc     a
17

Notice how (although these both look the same) the second string (t) has a different length because each tab is represented by spaces not tab characters.

To use a tab size of 4 instead of 8:

v = s.expandtabs(4)
print(v)
print(len(v))

Outputs:

abcdefg abc a
13

Please note each tab is not always counted as eight spaces. Rather a tab "pushes" the count to the next multiple of eight. For example:

s = '\t\t'
print(s.expandtabs().replace(' ', '*'))
print(len(s.expandtabs()))

Output:

****************
16

s = 'abc\tabc\tabc'
print(s.expandtabs().replace(' ', '*'))
print(len(s.expandtabs()))

Outputs:

abc*****abc*****abc
19
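
The "push to the next multiple" rule can be written out by hand. The function expand_tabs below is a hypothetical re-implementation for illustration only, not the library's code; it assumes single-line text and a positive tab size, and it tracks the current column to work out how many spaces each tab stands for.

```python
def expand_tabs(text, tabsize=8):
    """Illustrative sketch of str.expandtabs for single-line text."""
    out = []
    column = 0
    for ch in text:
        if ch == "\t":
            # Pad with spaces up to the next multiple of tabsize.
            pad = tabsize - (column % tabsize)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)


sample = "abc\tabc\tabc"
print(expand_tabs(sample).replace(" ", "*"))   # abc*****abc*****abc
print(expand_tabs(sample) == sample.expandtabs())  # True
```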

split, splitlines

The split method returns a list of the words in the string. It can take a separator argument to use instead of whitespace.

>>> s = 'Hello, world'
>>> s.split()
['Hello,', 'world']
>>> s.split('l')
['He', '', 'o, wor', 'd']

Note that in neither case is the separator included in the split strings, but empty strings are allowed.

The splitlines method breaks a multiline string into many single line strings. It is analogous to split('\n') (but accepts '\r' and '\r\n' as delimiters as well) except that if the string ends in a newline character, splitlines ignores that final character (see example).

>>> s = """
... One line
... Two lines
... Red lines
... Blue lines
... Green lines
... """
>>> s.split('\n')
['', 'One line', 'Two lines', 'Red lines', 'Blue lines', 'Green lines', '']
>>> s.splitlines()
['', 'One line', 'Two lines', 'Red lines', 'Blue lines', 'Green lines']

The split method also accepts multi-character separators:

txt = 'May the force be with you'
spl = txt.split('the')
print(spl)  # ['May ', ' force be with you']
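
A few details of split and splitlines, as a sketch: splitting on whitespace (no argument) collapses runs of whitespace, while an explicit ' ' separator does not; an optional second argument limits the number of splits; rsplit splits from the right; and splitlines(True) keeps the line endings.

```python
s = "a  b c"  # two spaces between a and b

print(s.split())                        # ['a', 'b', 'c']
print(s.split(" "))                     # ['a', '', 'b', 'c']
print("k=v=w".split("=", 1))            # ['k', 'v=w']
print("a,b,c".rsplit(",", 1))           # ['a,b', 'c']
print("one\r\ntwo\n".splitlines())      # ['one', 'two']
print("one\r\ntwo\n".splitlines(True))  # ['one\r\n', 'two\n']
print(" ".join(s.split()))              # a b c
```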

Unicode

In Python 3.x, all strings (the type str) contain Unicode by default.

In Python 2.x, there is a dedicated unicode type in addition to the str type: u = u"Hello"; type(u) is unicode.

The topic name in the internal help is UNICODE.

Examples for Python 3.x:

v = "Hello Günther"
- Uses a Unicode code point directly in the source code; that has to be in UTF-8 encoding.
v = "Hello G\xfcnther"
- Specifies 8-bit Unicode code point using \xfc.
v = "Hello G\u00fcnther"
- Specifies 16-bit Unicode code point using \u00fc.
v = "Hello G\U000000fcnther"
- Specifies 32-bit Unicode code point using \U000000fc, the U being capitalized.
v = "Hello G\N{LATIN SMALL LETTER U WITH DIAERESIS}nther"
- Specifies a Unicode code point using \N followed by the Unicode code point name.
v = "Hello G\N{latin small letter u with diaeresis}nther"
- The code point name can be in lowercase.
n = unicodedata.name(chr(252))
- Obtains Unicode code point name given a Unicode character, here of ü.
v = "Hello G" + chr(252) + "nther"
- chr() accepts Unicode code points and returns a string having one Unicode character.
c = ord("ü")
- Yields the code point number.
b = "Hello Günther".encode("UTF-8")
- Creates a byte sequence (bytes) out of a Unicode string.
b = "Hello Günther".encode("UTF-8"); u = b.decode("UTF-8")
- Decodes bytes into a Unicode string via decode() method.
v = b"Hello " + "G\u00fcnther"
- Raises a TypeError, because bytes and str cannot be concatenated.
v = b"Hello".decode("ASCII") + "G\u00fcnther"
- Now it works.
f = open("File.txt", encoding="UTF-8"); lines = f.readlines(); f.close()
- Opens a file for reading with a specific encoding and reads from it. If no encoding is specified, the one of locale.getpreferredencoding() is used.
f = open("File.txt", "w", encoding="UTF-8"); f.write("Hello G\u00fcnther"); f.close()
- Writes to a file in a specified encoding.
f = open("File.txt", encoding="UTF-8-sig"); lines = f.readlines(); f.close()
- The -sig encoding means that any leading byte order mark (BOM) is automatically stripped.
f = tokenize.open("File.txt"); lines = f.readlines(); f.close()
- Automatically detects encoding based on an encoding marker present in the file, such as BOM, stripping the marker.
f = open("File.txt", "w", encoding="UTF-8-sig"); f.write("Hello G\u00fcnther"); f.close()
- Writes to a file in UTF-8, writing BOM at the beginning.
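
Several of the Python 3 items above can be combined into one runnable sketch. It uses the \u00fc escape so that it does not depend on the encoding of the source file, and it writes to a file in a temporary directory; that path is an assumption made for this illustration only.

```python
import os
import tempfile
import unicodedata

text = "Hello G\u00fcnther"
data = text.encode("UTF-8")

print(len(text))                     # 13 characters
print(len(data))                     # 14 bytes: the u-umlaut takes two bytes
print(data.decode("UTF-8") == text)  # True: decoding reverses encoding
print(unicodedata.name("\u00fc"))    # LATIN SMALL LETTER U WITH DIAERESIS
print(ord("\u00fc"))                 # 252

try:
    text.encode("ASCII")             # ASCII cannot represent the u-umlaut
except UnicodeEncodeError as err:
    print("UnicodeEncodeError:", err)

print(text.encode("ASCII", errors="replace"))  # b'Hello G?nther'

# Write with a byte order mark, then inspect the raw bytes.
path = os.path.join(tempfile.mkdtemp(), "File.txt")
with open(path, "w", encoding="UTF-8-sig") as f:
    f.write(text)
with open(path, "rb") as f:
    raw = f.read()
print(raw[:3])                       # b'\xef\xbb\xbf', the UTF-8 BOM
with open(path, encoding="UTF-8-sig") as f:
    restored = f.read()
print(restored == text)              # True: the BOM is stripped on reading
```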

Examples for Python 2.x:

v = u"Hello G\u00fcnther"
- Specifies 16-bit Unicode code point using \u00fc.
v = u"Hello G\U000000fcnther"
- Specifies 32-bit Unicode code point using \U000000fc, the U being capitalized.
v = u"Hello G\N{LATIN SMALL LETTER U WITH DIAERESIS}nther"
- Specifies a Unicode code point using \N followed by the Unicode code point name.
v = u"Hello G\N{latin small letter u with diaeresis}nther"
- The code point name can be in lowercase.
unicodedata.name(unichr(252))
- Obtains Unicode code point name given a Unicode character, here of ü.
v = "Hello G" + unichr(252) + "nther"
- unichr() accepts Unicode code points and returns a Unicode string having one Unicode character.
c = ord(u"ü")
- Yields the code point number.
b = u"Hello Günther".encode("UTF-8")
- Creates a byte sequence (str) out of a Unicode string. type(b) is str.
b = u"Hello Günther".encode("UTF-8"); u = b.decode("UTF-8")
- Decodes bytes (type str) into a Unicode string via decode() method.
v = "Hello" + u"Hello G\u00fcnther"
- Concatenates str (bytes) and Unicode string without an error.
f = codecs.open("File.txt", encoding="UTF-8"); lines = f.readlines(); f.close()
- Opens a file for reading with a specific encoding and reads from it. If no encoding is specified, codecs.open returns an ordinary file object that reads undecoded byte strings (type str).
f = codecs.open("File.txt", "w", encoding="UTF-8"); f.write(u"Hello G\u00fcnther"); f.close()
- Writes to a file in a specified encoding.
- Unlike the Python 3 variant, if told to write newline via \n, does not write operating system specific newline but rather literal \n; this makes a difference e.g. on Windows.
- To ensure text mode like operation one can write os.linesep.
f = codecs.open("File.txt", encoding="UTF-8-sig"); lines = f.readlines(); f.close()
- The -sig encoding means that any leading byte order mark (BOM) is automatically stripped.

Links:

Unicode HOWTO for Python 3, docs.python.org
Unicode HOWTO for Python 2, docs.python.org
Processing Text Files in Python 3, curiousefficiency.org
PEP 263 – Defining Python Source Code Encodings, python.org
unicodedata — Unicode Database in Python Library Reference, docs.python.org
Get a list of all the encodings Python can encode to, stackoverflow.com

External links

"String Methods" chapter -- python.org
Python documentation of "string" module -- python.org

Retrieved from "https://en.wikibooks.org/w/index.php?title=Python_Programming/Strings&oldid=4357767"