STRINGS IN PYTHON

Intermediate String



What are Strings, and why are they used?

As discussed here, Python provides abstractions for several data types, and strings are one of them. Strings allow you to store and manipulate textual data. Almost every useful program processes text, some more than others. For example, a word processor might handle enormous amounts of text, whereas a photo viewing application might handle only a little. Nonetheless, text processing is a significant part of most applications. Having this abstraction built into the language takes the strain of handling such data off the software developer.

From now on, every reference to string(s) in the entire tutorial will refer to textual data.

Let us take a real-world example to understand the usage of strings. Suppose you are working on a Payroll project; can you think about what information is required to manage and process payments?

Try to think of as many details as possible, and see how many are accurate.

Now, try to classify the data types, specifically textual data.

If you got at least 50% correct, congratulations! As you can see, 71% of the details are strings. From my experience in software development, most software relies on strings for the majority of its functionality. Notable examples include real estate management software storing tenants' information and leases, plagiarism detectors checking the uniqueness of content, social media websites showing posts from friends and family, and a word processor holding your homework. This is because we humans mostly use language to communicate with each other. Hence, strings are essential to software development, enabling interaction between humans.

Strings allow the representation of information understood by humans, thus enabling meaningful interaction.

Strings are immutable, meaning that once a string is created, its contents cannot be changed in place.


How are strings stored in Python vs. other languages?
Objects belong to a topic we have not covered yet, classes, and will be explained later.

Almost everything in Python is an object. Put simply, an object is data coupled with the methods that operate on it. This can be perplexing at first, but things will become clearer once we cover classes and objects.


As you can see from the diagram above, Python handles the representation of strings very differently from lower-level programming languages such as C.



How to define a string?

Strings in Python can be defined by enclosing text in single or double quotation marks.

Examples

'This is a string'             # single-line string
"This is also a string"        # single-line string
'''Even this is a string'''    # triple-quoted string (can span multiple lines)
"""Yes, this too!"""           # triple-quoted string (can span multiple lines)

You may come across situations where you need to store strings spanning multiple lines, such as built-in documentation embedded in software, contact information, etc. To achieve that elegantly, Python provides multi-line strings, which are enclosed in triple single quotes or triple double quotes.

Examples:

'''
A multi-line string
spanning
six
lines
'''


"""
Also, a multi-line string
spanning
six
lines
"""

Python also provides the built-in str() function, which creates a string from another value.

Example:

# Converting integer to string
my_age = 24
my_age_in_str = str(my_age)
print(my_age_in_str)       # 24

# Printing the types using the built-in type() function.
print(type(my_age), type(my_age_in_str))    # <class 'int'> <class 'str'>

24
<class 'int'> <class 'str'>

We will discuss str() in depth when covering Typecasting.


Finding out the length of the string

Python provides the built-in len() function to fetch the length of a string.

# Example 1
my_name = "James Bond"
length = len(my_name)
print(length)   # 10

# Example 2
phone_number = "+12722031305"
no_of_digits = len(phone_number) - 1    # omit the '+' at the beginning
print("My phone number has the following no. of digits:", no_of_digits)

10
My phone number has the following no. of digits: 11

Where can this be useful?

For example, you can use it to count the digits in a phone number.


Joining Strings

To join two or more strings, Python overloads the "+" operator to perform concatenation.

Yes, this is the same operator, explained in operators, that adds two or more numbers, but Python looks at the types of the values it is applied to and performs the relevant operation.

Example

my_first_name = "James"
my_last_name = "Bond"
my_full_name = my_first_name + " " + my_last_name
print(my_full_name)    # James Bond

James Bond

Where can this be useful?

Concatenation is extremely useful for combining separate pieces of information, as illustrated in the above example.
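
One detail worth seeing in code: the "+" operator joins strings only with other strings. The sketch below uses made-up values (first_name and age are purely illustrative) to show a number being converted with str() before it is joined, and what happens when the conversion is skipped.

```python
# Illustrative values only
first_name = "James"
age = 46

# "+" joins strings with strings, so convert the number first
summary = first_name + " is " + str(age) + " years old"
print(summary)    # James is 46 years old

# Without str(), Python refuses to mix a string and an integer
try:
    broken = first_name + age
except TypeError as error:
    print("TypeError:", error)
```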


Accessing parts of Strings

As discussed, strings in Python are objects and provide a high level of abstraction. This abstraction gives easy access to parts of a string through a feature known as slicing, which returns the characters within a given index range.

To access a substring, use the following syntax:
str_object_name[starting_index:ending_index]

String indexing starts at 0, not 1. A slice includes the character at starting_index and stops just before ending_index.

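Many programmers mistakenly use the wrong index to access elements. In Python, indexing a single character beyond the end of a string raises an IndexError exception, while an out-of-range slice is silently clipped to the available characters.

To avoid errors, use the len() function to verify the length of the string first.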

Example 1

# Access the first five characters in the string
natural_numbers = "0123456789"
first_five_numbers = natural_numbers[0:5]
print(first_five_numbers)        # 01234

01234

Where can this be used?

We can use slicing to extract relevant data, such as accessing phone numbers, last names, account details, etc.
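
The following is a small sketch of how single-character indexing and slicing behave at the edges of a string. The variable names are illustrative, and the string mirrors the one used in Example 1.

```python
digits = "0123456789"

first = digits[0]        # a single character, counted from 0
middle = digits[2:5]     # characters at indexes 2, 3 and 4
clipped = digits[7:100]  # a slice past the end is clipped, not an error

print(first)      # 0
print(middle)     # 234
print(clipped)    # 789

# A single index past the end is an error, so check len() first
position = 10
if position < len(digits):
    print(digits[position])
else:
    print("Index", position, "is out of range")

try:
    digits[position]
except IndexError as error:
    print("IndexError:", error)
```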

The slice creates a new string object, and the variable first_five_numbers holds a reference to it; the original string is left unchanged.


Finding a Substring

There can be situations where you might need to ascertain the presence of a string within another string. For example, we can use this to find country code in a phone number, a specific pattern within a string, street name in an address, etc.

There are two things we can find out:

  1. The presence of a substring within a string. E.g., does this string exist?
  2. If this string is present, return its starting position.


Checking for the presence of the string.

In Python, we can verify the presence of the string by using the in and not in operators that we covered in operators.

# Example
vowels = "AEIOUaeiou"
print('A' in vowels)       # True
print('Z' in vowels)       # False
print('u' in vowels)       # True
print("V" not in vowels)   # True

True
False
True
True

Where can this be useful?

It can be helpful when we perform operations based on the presence of the string. Its position is not relevant, just its presence.


Finding the exact position of the substring

To find the exact position of the substring, Python provides two methods.

find()

This method will try to find the substring within a string. If found, it will return a non-negative value (0 or greater) signifying the index where the substring starts. If not, it will return -1 to indicate its absence.

# Example
p = "ABCD".find("C")
print(p)  # 2

2

Remember, indexes start from 0 and not 1.
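
Because indexes start from 0, a match at the very beginning of the string makes find() return 0. The short sketch below (variable names are illustrative) shows why the result should be compared against -1 rather than treated as true or false.

```python
text = "ABCD"

at_start = text.find("A")    # match at the very first position
missing = text.find("Z")     # no match at all

print(at_start)   # 0
print(missing)    # -1

# Compare with -1; a plain "if text.find(...)" would treat index 0 as False
if text.find("A") != -1:
    print("'A' is present")
```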

index()

This method will try to locate the substring and assumes that it is present. If found, it will return a non-negative value (0 or greater) signifying the index where the substring starts. If not, it will raise a "ValueError" exception.

# Example
p = "ABCD".index("C")
print(p)                             # 2

p = "ABCDE".index("Q")    # raises ValueError: substring not found
print(p)                  # never reached

2
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    p = "ABCDE".index("Q")
ValueError: substring not found

Before using the index() method, use the membership operators in and not in to check for string presence.
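
As a minimal sketch of that advice, the hypothetical helper below (position_of is not a built-in, just a name chosen for illustration) checks for membership before calling index(). A second variant catches the ValueError instead; exceptions are assumed here and are not otherwise covered in this chapter.

```python
def position_of(text, part):
    """Hypothetical helper: return the index of part in text, or None."""
    if part in text:
        return text.index(part)
    return None


def position_of_or_none(text, part):
    """Same idea, but handling the ValueError raised by index()."""
    try:
        return text.index(part)
    except ValueError:
        return None


print(position_of("ABCDE", "C"))           # 2
print(position_of("ABCDE", "Q"))           # None
print(position_of_or_none("ABCDE", "Q"))   # None
```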


String interpolation

We have covered various topics about strings, but what if a string must follow a specific format, for example, when generating a report or sending an email? It would be very tedious for the programmer to concatenate every piece, and doing so could easily lead to bugs. Fortunately, the developers of Python have thought about this issue and implemented built-in methods and syntax to format a string.

The format() method

Python's string object has a format() method that replaces each '{}' with the corresponding argument passed to it.

Let me illustrate an example to help you understand.

# Example
template = "My name is {}, and I am {} years old."
name = "James Bond"
age = 46
print(template.format(name, age))

My name is James Bond, and I am 46 years old.

As you can see, the format() method accepted name and age, substituted those values for the corresponding '{}' placeholders, and returned a formatted string. The format() method can take values of any data type and replace '{}' with their string representation.

f-strings

Python provides another way of interpolating values in a string. Here is an example.

name = "James Bond"
age = 46
result = f"My name is {name}, and I am {age} years old."
print(result)

My name is James Bond, and I am 46 years old.

In the above example, the string literal must be immediately prefixed by f, and each variable or expression to substitute must be placed within curly braces {}.


Difference between f-string and format().

The main difference between the two is that an f-string is evaluated as soon as it is executed, so its variables must already exist at that point. With the format() method, the template can be defined first and the values interpolated later, when required.
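
To make the difference concrete, here is a small sketch. The second person's details are invented for illustration; the template is the one used in the format() example earlier.

```python
# The template can be written before any values exist
template = "My name is {}, and I am {} years old."

# ...and filled in later, as many times as needed
first = template.format("James Bond", 46)
second = template.format("Eve", 35)    # invented values
print(first)
print(second)

# An f-string is evaluated immediately, so name and age must exist first
name = "James Bond"
age = 46
immediate = f"My name is {name}, and I am {age} years old."
print(first == immediate)    # True
```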


Escape characters

Escape characters are not displayed literally but perform a particular task, such as formatting the text. For example, pressing the "Tab" key on the left of your keyboard inserts a tab character, which is written as "\t" in a Python string. Similarly, pressing "Enter" inserts the new-line character, written as "\n". These characters instruct the text processor to format the string in a certain way.

Escape sequences begin with a backslash, followed by the corresponding code.

Below is a list of commonly used escape sequences.

Escape sequence   Description                  Example
\n                New line                     print("A\nB\nC")
\t                Horizontal tab               print("A\tB\tC")
\'                Single quote                 print("\'My Favourite sentence is this itself.\'")
\"                Double quote                 print("\"My Favourite sentence is this itself.\"")
\\                Backslash                    print("A\\B\\C")
\r                Carriage return              print("A\rB\rC")
\b                Backspace                    print("A\bB\bC")
\f                Form feed                    N/A
\ooo              Character with octal value   print("\160\171\164\150\157\156")
\xhh              Character with hex value     print("\xDE\xAD\xBE\xEF")

As discussed, you can create a string by using either single or double quotes. However, this creates a problem: what happens when the string contains the enclosing character as part of its text, as in a quotation? Let's understand using the example below.

# Example
a_string_with_error = "John said, "His name is James. James Bond""
print(a_string_with_error)

# SyntaxError: invalid syntax

File "test.py", line 2
    a_string_with_error = "John said, "His name is James. James Bond""
                                       ^
SyntaxError: invalid syntax

This happens because the quotation marks inside the text are the same character (") that encloses the string. The interpreter assumes the string has ended at the second double quote, although it hasn't. To resolve this issue, enclose the string in single quotes when the text contains double quotes (or vice versa), or use a triple-quoted string.

Additionally, you can write \" to escape the quote so that it is treated as part of the string.
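
The options for the failing example can be sketched as follows. The variable names are illustrative; all three lines produce the same text.

```python
# Option 1: escape the inner double quotes
escaped = "John said, \"His name is James. James Bond\""

# Option 2: enclose the string in single quotes instead
single_quoted = 'John said, "His name is James. James Bond"'

# Option 3: use a triple-quoted string
triple_quoted = '''John said, "His name is James. James Bond"'''

print(escaped)
print(escaped == single_quoted == triple_quoted)    # True
```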

Where can it be used?

It can be used to contain characters that have special meaning such as the backspace character.


Raw Strings

As we have learned, using special characters in strings results in specific formatting. To disable escape character processing, Python provides raw strings: add the prefix "r" before the string.

Here is an example.

file_location = r"C:\Users\tut\Desktop\n\new_file.txt"
print(file_location)

C:\Users\tut\Desktop\n\new_file.txt

When can it be used?

It can be used when mentioning file paths, as illustrated in the above example.
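
For comparison, the sketch below writes the same path twice: once as a raw string and once as a normal string with every backslash doubled. The second variable name is illustrative. Writing the path as a normal string with single backslashes would not work as intended, since sequences such as "\n" would be read as escape characters.

```python
raw_path = r"C:\Users\tut\Desktop\n\new_file.txt"
escaped_path = "C:\\Users\\tut\\Desktop\\n\\new_file.txt"

print(raw_path == escaped_path)    # True

# In a raw string, \n stays as two characters: a backslash and the letter n
print(len(r"\n"))    # 2
print(len("\n"))     # 1
```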


Extracting Raw Bytes

In Python, strings are represented as immutable objects. There are instances when access to underlying raw bytes is required. To convert a string object to a sequence of bytes, Python provides the encode method.

# Here is an example.

my_name = "James Bond"
my_name_in_bytes = my_name.encode("UTF-8")
print(my_name_in_bytes)

b'James Bond'

Passing "UTF-8" instructs Python to encode the string using the UTF-8 character encoding and return the raw bytes. Each ASCII character, as in this example, is encoded as exactly 1 byte; other characters take between 2 and 4 bytes.
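
A quick way to see how many bytes a character needs is to compare the length of the string with the length of its encoded form. This is a minimal sketch with illustrative variable names; it also uses decode(), the counterpart of encode(), which is not otherwise covered in this chapter.

```python
ascii_text = "James"
accented = "Ň"
heart = "♥"

# len() of a string counts characters; len() of bytes counts bytes
print(len(ascii_text), len(ascii_text.encode("UTF-8")))    # 5 5
print(len(accented), len(accented.encode("UTF-8")))        # 1 2
print(len(heart), len(heart.encode("UTF-8")))              # 1 3

# decode() turns the bytes back into the original string
round_trip = heart.encode("UTF-8").decode("UTF-8")
print(round_trip == heart)    # True
```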


Where can it be used?

The encode() method is required whenever byte representation of the string is necessary. It becomes useful when performing cryptographic or network-related functions.


Modifying strings

Since strings are immutable in Python, once initialized, you cannot change their value. To manipulate strings, use slicing to extract relevant parts and create a new string.

# Example
# Change the first character to 'Z'
letters = 'ABCDEFGHIJ'
letters = 'Z' + letters[1:]
print(letters)

ZBCDEFGHIJ

Notice that the end index can be omitted when the rest of the string has to be accessed.

You cannot directly alter a string by assigning to one of its indexes; doing so raises a TypeError exception.

There is also an alternative way to alter strings: convert them into a list, perform the changes, and finally convert the list back into a string. This will be revisited when we go through lists.
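
As a preview, here is a minimal sketch of both points: assigning to an index fails, while the list round trip works. It relies on list() and the join() method, which are assumed here and explained properly in later chapters.

```python
letters = "ABCDEFGHIJ"

# Assigning to an index is not allowed for strings
try:
    letters[0] = "Z"
except TypeError as error:
    print("TypeError:", error)

# Convert to a list, change it, and join it back into a new string
characters = list(letters)
characters[0] = "Z"
changed = "".join(characters)

print(changed)    # ZBCDEFGHIJ
print(letters)    # ABCDEFGHIJ (the original string is untouched)
```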


Unicode strings

Unicode is a character set that extends ASCII to cover the characters and symbols of most of the world's writing systems. UTF-8 is one way of encoding Unicode characters as bytes, and it is backward compatible with ASCII.

print("Ň")

Ň

To use Unicode characters, copy-paste the characters/symbols directly or refer to them by their code point in an escape sequence, such as "\u2665". Here is an example.

y = '♥'
x = "\u2665"
print(y == x)     # True

True

Notation:
\u4_digit_hex_character
\U8_digit_hex_character

When specifying a Unicode character with four hexadecimal digits, use lowercase 'u'; when specifying eight hexadecimal digits, use uppercase 'U'.

If the code point has fewer than 4 or 8 hexadecimal digits, prepend zeros to pad it.

# Example
# U+0394 is Δ
print('\u0394')                   # lower case 'u'
print('\U00000394')               # upper case 'U'
print('\u0394' == '\U00000394')   # True

Δ
Δ
True

You can see the complete list of Unicode characters here.

Where can this be used?

Unicode strings are instrumental when you want to develop a Multilingual User Interface.


Conclusion

In this chapter, we learned what strings are in programming, why they are important, how they are used, and that they are immutable. We also covered the memory model Python uses to store strings compared with other programming languages.

Additionally, we learned about the different syntaxes used to define a string in Python, finding the length of a string, joining strings using the concatenation operator, finding and accessing substrings, and several string methods.

Furthermore, we discussed string interpolation using the format() method and f-strings, their advantages and differences, escape characters, raw strings, converting strings into bytes, modifying strings, and Unicode strings.

