Intermediate String
As discussed here, the Python programming language provides abstractions for several data types, and strings are among them. Strings allow for storing and manipulating textual data. Almost every usable program processes textual data, some more than others. For example, a word processor might handle enormous amounts of text, whereas a photo viewing application might handle only a little. Nonetheless, text processing is a significant part of most applications. Having an abstraction provided by the language takes the strain of handling this type of data off the software developer.
From now on, every reference to string(s) in the entire tutorial will refer to textual data.
Let us take a real-world example to understand the usage of strings. Suppose you are working on a Payroll project; can you think about what information is required to manage and process payments?
Try to think of as many details as possible, and see how many are accurate.
Now, try to classify the data types, specifically textual data.
If you got at least 50% correct, congratulations! As you can see, 71% of the details are strings. In my experience of software development, most software relies on strings for the majority of its functionality. Notable examples include real estate management software that stores tenants' information and leases, plagiarism detectors that check the uniqueness of content, social media websites that display posts from friends and family, and the word processor holding your homework. This is because we humans communicate with each other mostly through language. Hence, strings are essential to software development because they enable interaction with humans.
Strings allow the representation of information understood by humans, thus enabling meaningful interaction.
Strings are immutable, meaning that once a string is created, its contents cannot be changed in place.
Python is a high-level programming language in which every value, including a string, is an object. For simplicity, think of an object as data coupled with the methods that operate on it. It can be perplexing at first, but things will become clearer once we cover classes and objects.
As you can see from the diagram above, Python represents strings very differently from lower-level programming languages such as C.
Strings in Python can be defined using enclosing single or double quotations.
Examples
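A minimal sketch of both quoting styles follows. The variable names are illustrative and not taken from the original examples.

```python
# Single and double quotes both create an ordinary str object.
first_name = 'James'
last_name = "Bond"

print(first_name)
print(last_name)
print(type(first_name) == type(last_name))  # True
```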
You may come across situations where you need to store strings spanning multiple lines, such as built-in documentation in software, contact information, etc. To achieve that elegantly, Python provides multi-line strings, which are enclosed in triple quotes: three single or three double quotation marks.
Examples:
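A short sketch of multi-line strings written with triple quotes (three single or three double quotation marks). The sample text and variable names are made up for illustration.

```python
# Triple quotes let a string literal span several lines.
contact_details = """James Bond
Universal Exports
London"""

usage_notes = '''Run the program.
Then read the report.'''

print(contact_details)
print(usage_notes)
```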
Since Python is an object-oriented language, it also provides the built-in str() constructor, which creates a string from another value.
Example:
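One way this could look, assuming an integer variable named age (a hypothetical name chosen for this sketch):

```python
age = 24
age_as_string = str(age)  # build the textual form of the integer

print(age_as_string)
print(type(age), type(age_as_string))
```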
24
<class 'int'> <class 'str'>
We will discuss str() in depth when covering Typecasting.
Python provides the built-in len() function to fetch the length of a string.
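A minimal sketch, assuming a made-up ten-digit phone number stored as a string:

```python
phone_number = "0123456789"  # hypothetical value used only for illustration

# len() counts the characters in the string.
print(len(phone_number))
```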
10
My phone number has the following no. of digits: 1
Where can this be useful?
You can use it to find out the no. of digits in a phone number.
To join two or more strings, Python overloads the "+" operator allowing concatenation.
Yes, this is the same operator, explained in operators, that adds two or more numbers, but Python understands the context in which it is being used and performs the relevant operation.
Example
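A sketch of concatenation, with illustrative variable names:

```python
first_name = "James"
last_name = "Bond"

# "+" joins the strings in order; the space has to be added explicitly.
full_name = first_name + " " + last_name
print(full_name)
```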
James Bond
Where can this be useful?
Concatenation is extremely useful for combining different pieces of information, as illustrated in the above example.
As discussed, strings in Python are objects and provide a high level of abstraction. One feature of this abstraction is slicing, which gives easy access to the part of a string that lies within a given index range.
To access a substring, use the following syntax:
str_object_name[starting_index:ending_index]
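String indexing starts at 0, not 1, and the character at ending_index is not included in the result.
Many programmers mistakenly use the wrong index to access elements. In Python, indexing a single character out of range raises an IndexError, whereas an out-of-range slice is silently shortened to the characters that are available.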
To avoid errors, always use the len() method to verify the length of the string.
Example 1
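A minimal slicing sketch. The source string numbers is an assumption made for this illustration.

```python
numbers = "0123456789"

# Characters at indexes 0, 1, 2, 3 and 4; index 5 is excluded.
first_five_numbers = numbers[0:5]
print(first_five_numbers)
```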
01234
Where can this be used?
We can use slicing to extract relevant data, such as accessing phone numbers, last names, account details, etc.
The variable first_five_numbers holds a reference to the resulting string object; it does not store the characters itself.
There can be situations where you need to ascertain the presence of a string within another string. For example, we can use this to find the country code in a phone number, a specific pattern within a string, the street name in an address, etc.
There are two things we can find out: whether the substring is present, and at which position it occurs.
In Python, we can verify the presence of the string by using the in and not in operators that we covered in operators.
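A sketch of the membership operators, using a made-up phone number:

```python
phone_number = "+1-202-555-0143"  # hypothetical value for illustration

print("+1" in phone_number)       # the country code is present
print("+44" in phone_number)      # this country code is absent
print("+44" not in phone_number)  # so "not in" reports True
print("555" in phone_number)      # position does not matter
```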
True
False
True
True
It can be helpful when we perform operations based on the presence of the string. Its position is not relevant, just its presence.
To find the exact position of the substring, Python provides two methods.
The find() method searches for the substring within a string. If found, it returns the index of the first occurrence, which is zero or greater (0 when the match is at the very start). If not, it returns -1 to indicate its absence.
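A minimal sketch of find(), using a short sample string:

```python
text = "ABCDE"

position = text.find("C")
print(position)

# A missing substring yields -1 rather than an error.
missing = text.find("Q")
```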
2
Remember, indexes start from 0 and not 1.
The index() method also looks up the position of the substring, but it assumes that the substring is present. If found, it returns the index of the first occurrence, which is zero or greater. If not, it raises a "ValueError" exception.
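A sketch of index() on the same kind of sample string. The failing lookup is wrapped in try/except here so that the snippet runs to completion; without that handling the program stops with a traceback.

```python
text = "ABCDE"

position = text.index("C")
print(position)

# A missing substring raises ValueError instead of returning -1.
try:
    text.index("Q")
except ValueError as error:
    error_message = str(error)  # the text shown at the end of a traceback
```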
2
Traceback (most recent call last):
File "test.py", line 5, in <module>
p = "ABCDE".index("Q")
ValueError: substring not found
Before using the index() method, use the membership operators in and not in to check for string presence.
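One way to apply this advice is sketched below. Using None as the "not found" marker is an assumption made for this example.

```python
text = "ABCDE"
target = "Q"

# Only call index() once the membership test has confirmed presence.
if target in text:
    result = text.index(target)
else:
    result = None  # hypothetical marker meaning "not found"

print(result)
```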
We have covered various topics about strings, but what if a string must follow a specific format, for example, when generating a report or sending an email? It would be very tedious for the programmer to concatenate every piece, and doing so could easily lead to bugs. Fortunately, the developers of Python have thought about this issue and implemented built-in methods to format a string.
Python's string object has a format() method that replaces each '{}' with the corresponding argument passed to it.
Let me illustrate with an example to help you understand.
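A minimal sketch of format(), assuming two variables called name and age:

```python
name = "James Bond"
age = 46

# Each {} is filled by the matching positional argument, in order.
sentence = "My name is {}, and I am {} years old.".format(name, age)
print(sentence)
```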
My name is James Bond, and I am 46 years old.
As you can see, the format() method accepted name and age, substituted those values for the corresponding '{}' placeholders, and returned a formatted string. The format() method can take a value of any data type and replace '{}' with its string representation.
Python provides another way of interpolating values in a string. Here is an example.
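A sketch of the same sentence written as an f-string, again assuming variables called name and age:

```python
name = "James Bond"
age = 46

# The expressions inside {} are evaluated when this line runs.
sentence = f"My name is {name}, and I am {age} years old."
print(sentence)
```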
My name is James Bond, and I am 46 years old.
In the above example, the string must be immediately prefixed by f. The variable/value that must be substituted must be within curly braces {}.
The main difference between the two is that an f-string is evaluated immediately, so its variables must already exist when it runs, whereas with the format() method the template can be defined first and the values interpolated later, when required.
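The difference can be sketched as follows. The template text and the guest names are made up for illustration.

```python
# A template can be stored first and filled in later with format().
greeting_template = "Hello, {}!"
for guest in ["Alice", "Bob"]:
    print(greeting_template.format(guest))

# An f-string is evaluated where it appears, so the name must already exist.
guest = "Carol"
print(f"Hello, {guest}!")
```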
Escape characters are invisible to the viewer but perform a particular task, such as formatting the text. For example, pressing the "Tab" key on the left of your keyboard inserts a tab character, written as "\t", into the text. Similarly, pressing "Enter" inserts the new-line character, written as "\n". These characters instruct the text processor to format the string in a certain way.
Escape characters begin with a backslash followed by the corresponding code.
Below is a list of commonly used escape characters.
Escape characters | Description | Example |
---|---|---|
\n | The New Line | print("A\nB\nC") |
\t | The Horizontal Tab character | print("A\tB\tC") |
\' | The single quote escape character | print("\'My Favourite sentence is this itself.\'") |
\" | The double quote escape character | print("\"My Favourite sentence is this itself.\"") |
\\ | The Backslash escape character | print("A\\B\\C") |
\r | The Carriage-return character | print("A\rB\rC") |
\b | The Backspace character | print("A\bB\bC") |
\f | The Form feed character | N/A |
\ooo | The Octal value character | print("\160\171\164\150\157\156") |
\xhh | The Hex value character | print("\xDE\xAD\xBE\xEF") |
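A few of the entries above, sketched in runnable form. The sample text is illustrative.

```python
tabbed = "Name:\tJames Bond"            # \t inserts a horizontal tab
two_lines = "Line one\nLine two"        # \n starts a new line
backslashes = "A\\B\\C"                 # each \\ is one literal backslash
from_octal = "\160\171\164\150\157\156"  # octal codes for p, y, t, h, o, n

print(tabbed)
print(two_lines)
print(backslashes)
print(from_octal)
```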
As discussed, you can create a string by using either single or double quotes. However, this creates a problem: what happens when the string itself contains the enclosing character, as in a quotation? Let's understand using the example below.
File "test.py", line 2
a_string_with_error = "John said, "His name is James. James Bond""
^
SyntaxError: invalid syntax
The error occurs because the quote character (") inside the string is the same as the enclosing quote character. This confuses the interpreter, which thinks the string has ended when it hasn't. To resolve this issue, enclose the string in triple quotes, or use single quotes around a string that contains double quotes (and vice versa).
Additionally, you can write \" to escape the quote so that it is treated as part of the string rather than as its end.
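Both fixes can be sketched as follows. The variable names are chosen for this illustration.

```python
# Option 1: enclose the text in single quotes so double quotes are literal.
with_single_quotes = 'John said, "His name is James. James Bond"'

# Option 2: keep double quotes and escape the inner ones with a backslash.
with_escapes = "John said, \"His name is James. James Bond\""

print(with_single_quotes)
print(with_single_quotes == with_escapes)  # True
```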
Where can it be used?
It can be used to include characters that have a special meaning, such as quotation marks or the backslash character.
As we have learned, special characters in strings result in specific formatting. To turn off escape character processing, Python provides raw strings: add the prefix "r" before the opening quote.
Here is an example.
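A minimal sketch of a raw string holding a Windows-style path. The variable name file_path is an assumption.

```python
# The r prefix keeps \n and the other backslash sequences as literal text.
file_path = r"C:\Users\tut\Desktop\n\new_file.txt"
print(file_path)
```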
C:\Users\tut\Desktop\n\new_file.txt
When can it be used?
It can be used when mentioning file paths, as illustrated in the above example.
In Python, strings are represented as immutable objects. There are instances when access to underlying raw bytes is required. To convert a string object to a sequence of bytes, Python provides the encode method.
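A sketch of encode(), assuming the UTF-8 encoding and an illustrative variable name:

```python
name = "James Bond"

# encode() returns a bytes object, shown with a leading b when printed.
encoded_name = name.encode("utf-8")
print(encoded_name)
```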
b'James Bond'
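Passing UTF-8 as the encoding instructs Python to encode the string using that character encoding and return the raw bytes. Each element of the returned sequence is exactly 1 byte. ASCII characters, such as those in this example, take 1 byte each, while other characters take between 2 and 4 bytes in UTF-8.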
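The byte counts can be checked with a short sketch. The two sample characters are chosen for illustration.

```python
ascii_text = "A"
accented_text = "\u0147"  # the letter N with a caron, outside ASCII

ascii_size = len(ascii_text.encode("utf-8"))
accented_size = len(accented_text.encode("utf-8"))

print(ascii_size)     # 1
print(accented_size)  # 2
```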
Where can it be used?
The encode() method is required whenever byte representation of the string is necessary. It becomes useful when performing cryptographic or network-related functions.
Since strings are immutable in Python, once initialized, you cannot change their value. To manipulate strings, use slicing to extract relevant parts and create a new string.
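A minimal sketch of building a new string from a slice. The starting value is an assumption made for this example.

```python
letters = "ABCDEFGHIJ"

# Keep everything from index 1 onwards and put a new first character in front.
new_letters = "Z" + letters[1:]
print(new_letters)
```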
ZBCDEFGHIJ
Notice that the end index can be omitted when the rest of the string has to be accessed.
You cannot directly alter the value of strings by referring to their index.
There is also an alternative way to alter strings: convert the string into a list, perform the changes, and finally convert the list back into a string. This will be revisited when we go through lists.
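The list-based approach can be sketched as follows, using the built-in list() and the string method join(). The variable names are illustrative.

```python
letters = "ABCDEFGHIJ"

characters = list(letters)         # one list element per character
characters[0] = "Z"                # lists are mutable, so this is allowed
new_letters = "".join(characters)  # glue the characters back together

print(new_letters)  # ZBCDEFGHIJ
```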
Unicode is a character set that extends ASCII, and UTF-8 is one way of encoding Unicode characters as bytes that remains compatible with ASCII. Unicode gives access to a far larger range of characters and symbols.
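As a small illustration, a string literal can contain a non-ASCII character typed directly. The variable name is an assumption.

```python
letter = "Ň"  # a character outside the ASCII range
print(letter)
```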
Ň
To use Unicode characters, copy and paste the characters or symbols directly, or refer to them by their code point using an escape such as "\u2665". Here is an example.
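A sketch comparing a pasted symbol with its escape. The variable names are illustrative.

```python
heart_pasted = "♥"        # symbol pasted directly into the source
heart_escaped = "\u2665"  # the same symbol written as a Unicode escape

print(heart_pasted == heart_escaped)
```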
True
Notation:
\u4_digit_hex_character
\U8_digit_hex_character
Use a lowercase 'u' when giving a Unicode code point with four hexadecimal digits, and an uppercase 'U' when giving eight hexadecimal digits.
If the code point has fewer than 4 or 8 hexadecimal digits, pad it with leading zeros.
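Both notations can be compared in a short sketch, using the same code point written with four and with eight digits:

```python
four_digit = "\u2665"       # lowercase u, exactly four hex digits
eight_digit = "\U00002665"  # uppercase U, padded to eight hex digits

print(four_digit == eight_digit)
```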
True
You can see the complete list of Unicode characters here.
Where can this be used?
Unicode strings are instrumental when you want to develop a Multilingual User Interface.
In this chapter, we learned what strings are in programming, why they matter, how they are used, and that they are immutable. We also covered the memory model Python uses to store strings compared with other programming languages.
Additionally, we learned the different syntaxes for defining strings in Python, how to find the length of a string, how to join strings using the concatenation operator, and how to find and access substrings using several string methods.
Furthermore, we discussed string interpolation using the format() method and f-strings, along with their advantages and differences, as well as escape characters, raw strings, converting strings into bytes, modifying strings, and Unicode strings.