10. String Handling#
10.1. String Processing#
Strings are among the most frequently used data types in Python, alongside the numeric types. They are essential when working with text input, filenames, comments, text searches, and text files.
At the start of the lecture, we encountered some basic string operations:
print("this" + "that")
print(10 * "?")
Strings are sequences in Python, so the usual sequence operations work on them, such as membership tests with in and not in:
print("that" in "thatsame") # => True
print("x" not in "abcdefg") # => True
Slicing works on strings as well:
print("Yes No"[:3]) # => Yes
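As a brief illustration of how slice boundaries behave, here is a minimal sketch. The variable names are chosen only for this example; the start index is included, the stop index is excluded, and negative indices count from the end of the string.

```python
word = "Yes No"

first_three = word[:3]   # characters at index 0, 1, 2
last_two = word[-2:]     # negative indices count from the end
middle = word[1:5]       # start is included, stop is excluded

print(first_three)  # => Yes
print(last_two)     # => No
print(middle)       # => es N
```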
MINI QUIZ:
Quick review: What does the following command output?
s = "all good"
print(len(s))
a) ValueError
b) 8
c) 9
d) all good
And this one?
s = "nothing is good!"
print(s[-8:])
a) is good!
b) sthcin
c) ValueError
10.2. Escape Characters - Fix the String#
Some characters or character combinations are interpreted specially by Python. For example, the following would raise an error:
print("This says "Stop"!") # => SyntaxError: invalid syntax
This can be fixed using different quotation marks:
print('This says "Stop"!') # => This says "Stop"!
Alternatively, escape sequences using a backslash (\) can resolve such issues:
print("This says \"Stop\"!") # => This says "Stop"!
print("A backslash looks like this: \\")
Paths on Windows, for example, often require double backslashes for this reason:
path = "C:\\User\\Desktop\\"
filename = path + "testfile.txt"
Escape sequences can also insert special characters, such as \n for a newline:
print("With this, we can write long \n texts across lines.")
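To make the effect of escape sequences on paths more concrete, the following sketch compares a single-backslash and a double-backslash version of the same text. The path is hypothetical and chosen only because it happens to contain \n and \t.

```python
# Hypothetical path, used only to show how backslashes are interpreted.
wrong = "C:\new\table"     # \n and \t are read as newline and tab
right = "C:\\new\\table"   # each \\ produces one literal backslash

print(len(wrong))   # => 10
print(len(right))   # => 12
print(right)        # => C:\new\table
```

The shorter length of the first string shows that each escape sequence counts as a single character.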
10.3. String Methods#
Python provides a wide range of string methods. Let’s cover the most important ones.
When unsure of the appropriate method, refer to resources like the Python Documentation or w3schools.com.
10.3.1. Table of String Methods#
Method | Description |
---|---|
.count() | Counts occurrences of a specified value in the string |
.encode() | Returns an encoded version of the string |
.endswith() | Returns True if the string ends with the specified value |
.index() | Returns the position of the first occurrence of a specified value |
.islower() | Returns True if all cased characters are lowercase |
.isupper() | Returns True if all cased characters are uppercase |
.join() | Joins the elements of an iterable, using the string as the separator |
.lower() | Converts the string to lowercase |
.strip() | Removes leading and trailing whitespace (or specified characters) |
.replace() | Replaces a specified value with another value in the string |
.split() | Splits the string at a specified separator and returns a list |
.startswith() | Returns True if the string starts with the specified value |
.upper() | Converts the string to uppercase |
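The methods for joining strings and for checking case are not demonstrated in the sections that follow, so here is a minimal sketch of them. The example values are made up for illustration.

```python
parts = ["Muster", "Markus"]

# .join() is called on the separator and receives the iterable of strings.
full_name = ", ".join(parts)
print(full_name)            # => Muster, Markus

print("hello".islower())    # => True
print("Hello".islower())    # => False
print("HELLO".isupper())    # => True
```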
10.3.2. .replace()#
To replace specific characters, words, or substrings, use .replace():
s = "Sometimes we dislike some characters"
s.replace("i", "!")
print(s) # => Nothing happens?
String methods do not modify the original string but return a new one:
s_new = s.replace("i", "!")
print(s_new)
You can chain multiple method calls:
print(s.replace("i", "!").replace("e", "3"))
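In practice, the result is often assigned back to the same name. The sketch below shows this, along with the optional third argument of .replace(), which limits the number of replacements. The replacement words are chosen only for illustration.

```python
s = "Sometimes we dislike some characters"

# Reassign the result to keep the change under the same name.
s = s.replace("dislike", "like")
print(s)  # => Sometimes we like some characters

# An optional third argument limits how many occurrences are replaced.
limited = s.replace("e", "3", 2)
print(limited)  # => Som3tim3s we like some characters
```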
Mini Quiz!
What does the following command output?
print("abc".replace("ab", "cc").replace("c", "x"))
a) ccx
b) ab
c) xxx
d) abx
10.3.3. .upper() and .lower()#
These methods handle case transformations. For example:
tweet = "this is unfair"
tweet_trumpified = tweet.upper() + "!"
print(tweet_trumpified) # => THIS IS UNFAIR!
And back to lowercase with .lower():
tweet_moderated = tweet_trumpified.lower()
print(tweet_moderated)
User input often needs to be standardized before further processing:
entry1 = " Muster, Markus. " # Contains whitespace and a period
entry2 = "Test, Trude " # Extra whitespace
Use .strip() to remove leading and trailing whitespace (or other characters):
print(entry1.strip())
print(entry2.strip())
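The "other characters" case can be sketched as follows: passing a string to .strip() removes any of its characters from both ends. The variable name cleaned is introduced only for this example.

```python
entry1 = " Muster, Markus. "

# With an argument, .strip() removes any of the listed characters
# (here: space and period) from both ends, but never from the middle.
cleaned = entry1.strip(" .")
print(cleaned)             # => Muster, Markus

# Combining with .lower() gives a normalized form for comparisons.
print(cleaned.lower())     # => muster, markus
```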
10.3.4. Splitting Strings#
To split strings into smaller, meaningful parts, use .split(). For example:
name = "Muster, Markus"
pieces = name.split(",") # Splits the string at each ','
print(pieces)
Or:
sentence = "Many sentences have many words, sometimes even punctuation!"
words = sentence.split(" ")
print(words)
print(f"This sentence has {len(words)} words.")
You can combine .replace() with .split() for further processing:
sentence = "Many sentences have many words, sometimes even punctuation!"
sentence_cleaned = sentence.replace(",", "").replace("!", "").replace(".", "")
words = sentence_cleaned.split(" ")
print(words)
print(f"This sentence has {len(words)} words.")
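Two details of .split() are easy to overlook: the pieces keep any surrounding whitespace, and splitting at a single space behaves differently from splitting with no argument. The following is a small sketch of both, using made-up example strings.

```python
name = "Muster, Markus"

# Splitting at "," leaves the space in front of the first name ...
pieces = name.split(",")
print(pieces)                       # => ['Muster', ' Markus']

# ... which .strip() can remove from each piece.
pieces_clean = [p.strip() for p in pieces]
print(pieces_clean)                 # => ['Muster', 'Markus']

# Without an argument, .split() splits at any run of whitespace.
print("a  b   c".split(" "))        # => ['a', '', 'b', '', '', 'c']
print("a  b   c".split())           # => ['a', 'b', 'c']
```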
10.3.5. String Queries#
Python provides several ways to query strings:
10.3.5.1. .startswith() and .endswith()#
s = "name: Markus"
s.startswith("name:") # => True
s.endswith(".") # => False
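These two queries are typically used in conditions. The sketch below filters a list of hypothetical filenames by extension and extracts the value after a known prefix; the names filenames, txt_files, and value are assumptions made for this example.

```python
# Hypothetical filenames for illustration.
filenames = ["report.txt", "image.png", "notes.txt"]

txt_files = []
for filename in filenames:
    if filename.endswith(".txt"):
        txt_files.append(filename)
print(txt_files)  # => ['report.txt', 'notes.txt']

s = "name: Markus"
if s.startswith("name:"):
    # Slice off the prefix, then strip the remaining whitespace.
    value = s[len("name:"):].strip()
    print(value)  # => Markus
```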
10.3.5.2. .count(): counts occurrences of a substring#
s = "Most texts have some 'e's."
print(s.count("s")) # => 4
print(s.count("te")) # => 1
print(s.count("Te")) # => 0 (the search is case-sensitive)
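Because .count() distinguishes between uppercase and lowercase, one possible way to count regardless of case is to convert the string with .lower() first. A minimal sketch:

```python
s = "Most texts have some 'e's."

print(s.count("m"))          # => 1  (only the lowercase "m" in "some")
print(s.count("M"))          # => 1  (only the uppercase "M" in "Most")
print(s.lower().count("m"))  # => 2  (case-insensitive count)
```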
10.3.5.3. .index(): finds the position of the first occurrence#
index_te = s.index("te")
print(f"Position in string: {index_te}")
print(s[index_te])
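If the substring does not occur at all, .index() raises a ValueError. The related method .find() returns -1 instead. The sketch below shows both behaviors on the same example string.

```python
s = "Most texts have some 'e's."

print(s.index("te"))   # => 5
print(s.find("te"))    # => 5
print(s.find("Te"))    # => -1 (not found)

try:
    s.index("Te")
except ValueError:
    print("'Te' does not occur in the string")
```

Which of the two to use depends on whether a missing substring should be treated as an error or as a normal case.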
10.3.6. Encodings (Character Encodings)#
Computers store characters as bytes according to a character encoding. The most widely used standard today is Unicode, usually stored in the UTF-8 encoding. ASCII is an older and much smaller standard.
For now, this is primarily informational. If you encounter a UnicodeDecodeError while working with strings or text files, remember this topic as a starting point for troubleshooting.
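To give a rough idea of what an encoding does, the following sketch converts a string to bytes and back using .encode() and .decode(). The example text is arbitrary; it was chosen because it contains characters outside ASCII.

```python
text = "Grüße"

data = text.encode("utf-8")    # str -> bytes
print(len(text))               # => 5 (characters)
print(len(data))               # => 7 (bytes: ü and ß take two bytes each)
print(data.decode("utf-8"))    # => Grüße

try:
    data.decode("ascii")       # these bytes are not valid ASCII
except UnicodeDecodeError as err:
    print("Decoding failed:", err)
```

Decoding bytes with the wrong encoding is a typical cause of a UnicodeDecodeError.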