Advanced File Handling
Files are documents that are stored on a long-term basis on devices such as hard drives, SSDs, and pen drives. A few examples of files are the text files on your desktop, images, videos, those pesky .dll files that are invariably missing, Word documents, Excel sheets, web pages, and executable software. To put it simply, any information that is accessible on a long-term basis is a file.
There are two types of files, namely: text and binary files.
Examples of text files:
Examples of binary files:
File handling is an integral part of almost any real-world application. Consider a website that stores user documents, such as a cloud backup service; desktop applications such as image and video editors that save and read media files; or a mobile application that stores user photos, contacts, and documents. File handling allows an application to store details for later use.
Variables, which we have been using to develop programs, are stored in memory (RAM) only for the duration of the program. Files, on the flip side, are stored on long-term storage devices such as hard drives and SSDs.
Python provides several methods for reading, creating, updating, and removing files.
Python has several file modes to perform various tasks such as reading, writing, appending, or creating files.
Modes are case-sensitive. Meaning "r" is not the same as "R"; passing "R" to open() raises a ValueError.
Mode | Description |
---|---|
r | This mode will open a file in reading mode only. This is the default mode. |
rb | This mode will open a file in reading mode only in binary mode. |
r+ | This mode will open a file for both reading and writing operations. |
rb+ | This mode will open a file for both reading and writing operations in binary mode. |
w | This mode will open a file for writing operations. This mode will overwrite any existing file. |
wb | This mode will open a file for writing operations in binary mode. This mode will overwrite any existing file. |
w+ | This mode will open a file for writing and reading operations. This mode will overwrite any existing file. |
wb+ | This mode will open a file for writing and reading operations in binary mode. This mode will overwrite any existing file. |
a | This mode will open a file for appending operations. If the concerned file does not exist, it will create a new file. |
ab | This mode will open a file for appending operations in binary mode. If the concerned file does not exist, it will create a new file. |
a+ | This mode will open a file for appending and reading operations. If the concerned file does not exist, it will create a new file. |
ab+ | This mode will open a file for appending and reading operations in binary mode. If the concerned file does not exist, it will create a new file. |
To establish a connection with a file, Python provides the built-in open() function. When a file is opened for reading, open() raises a FileNotFoundError exception if no file exists at the specified path.
The open() function is most often called with two parameters: the path of the file and, optionally, the mode.
By default, the mode is "r".
All the available file modes are described above.
Example:
Create a file named "my_text_document.txt" in the current directory.
The above two statements can also be combined into one as follows:
If the file is successfully found, the open() function returns the filehandle object of class "_io.TextIOWrapper". Let us verify this as shown in the below example:
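A minimal sketch of this check follows. It assumes a temporary folder (so that none of your real files are touched) and creates the file first; the folder and the variable names are illustrative only.

```python
import os
import tempfile

# A temporary directory keeps this sketch away from your real files.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_text_document.txt")

    # Create the file first so that opening it for reading succeeds.
    file_handle = open(path, "w")
    file_handle.close()

    # "r" is the default mode, so open(path) would behave the same way.
    file_handle = open(path, "r")
    handle_type = type(file_handle)
    print(handle_type)
    file_handle.close()

    # Opening a missing file for reading raises FileNotFoundError.
    try:
        open(os.path.join(folder, "missing.txt"), "r")
        missing_error = None
    except FileNotFoundError as error:
        missing_error = error
        print("Could not open the file:", error)
```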
<class '_io.TextIOWrapper'>
Whenever we open a file resource, the operating system acknowledges this. It marks the file as being in use until the program informs the operating system that the file is no longer needed and is available for other programs to access. To notify the OS (operating system) that the file resource is no longer required, Python provides a method eloquently named close().
Here is the syntax:
<file_handle_instance>.close()
Example:
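As a small sketch, close() is called on the file object itself, and the object's closed attribute tells us whether the handle is still open. The temporary folder is an assumption made only to keep the example self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_text_document.txt")

    file_handle = open(path, "w")
    state_while_open = file_handle.closed   # the handle is still in use here

    file_handle.close()                     # tell the OS we are done
    state_after_close = file_handle.closed

    print("Closed before close():", state_while_open)
    print("Closed after close():", state_after_close)
```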
Good question! The advantage of using an exception handling mechanism is that it lets a program recover from unexpected situations such as errors and still follow the correct procedure. The finally block is a good illustration of this for code that deals with external resources, such as files.
If you have followed this tutorial, we've discussed that the finally block executes irrespective of an exception. It allows used resources such as file handles to be closed properly even when errors or exceptions occur. The below example demonstrates this:
How many characters do you want to read?
20
Python is a high-lev
Now let us break down every statement in the above example.
#1:
#2:
#3:
#4:
However, statement #2 tries to accept user input. What if the user entered invalid characters that cannot be interpreted as integers? Python would raise a ValueError exception. As a result, the program execution would halt, and the statement responsible for closing the file handle would not execute. Our file would be opened but never closed, which goes against best programming practices.
Now let us rewrite the same program using a finally block as demonstrated below:
How many characters do you want to read?
23
Python is a high-level
Let us break down the code in the finally block:
#1:
Remember that calling close() on a file handle that was never created (for example, because open() failed) will result in an exception.
Leaving files open can lead to corrupted files, as multiple programs may try to overwrite the same file data.
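The pattern described above can be sketched as follows. The function name read_characters is hypothetical, and the user's input is passed in as a string argument instead of being read with input() so that the sketch runs unattended.

```python
import os
import tempfile


def read_characters(path, requested):
    """Return the first `requested` characters of the file at `path`."""
    file_handle = None
    try:
        file_handle = open(path, "r")
        count = int(requested)  # raises ValueError for input such as "abc"
        return file_handle.read(count)
    except ValueError:
        print("Please enter a whole number.")
        return None
    finally:
        # Runs whether or not an exception occurred.
        # Only close the handle if open() actually succeeded.
        if file_handle is not None:
            file_handle.close()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_text_document.txt")
    setup_handle = open(path, "w")
    setup_handle.write("Python is a high-level programming Language.\n")
    setup_handle.close()

    first_twenty = read_characters(path, "20")
    invalid_request = read_characters(path, "abc")

print(first_twenty)
```

Initialising file_handle to None before the try block is one way to avoid closing a handle that was never created.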
Reading of files involves a structure as follows:
As mentioned earlier, file handling is an integral part of most practical applications. One of the most common operations a program performs is reading from and writing to files. In this section, we will cover the reading of files.
Since Python is a high-level language, it provides several methods that abstract away such complicated operations. It allows developers to focus on solving the business problem rather than dealing with the way computers work.
In general, most programs manipulate textual data. To read the entire text file into memory at once, Python provides the read() method.
The read() method reads until the end of the file (EOF) is reached.
Here is an example:
Create a file named "my_text_file.txt" in the current directory and place the following text in it.
Posuere. volutpat ullamcorper nunc quam. fringilla sed, neque blandit volutpat, orci et suspendisse porttitor. nisi tempus quis luctus, metus a sem maximus etiam pharetra. eu felis dignissim venenatis suspendisse arcu. varius efficitur et, sapien ac commodo orci, mauris donec nisi. ultricies in, urna porttitor sodales, lectus non integer tortor. vulputate quis ligula, vehicula eget nullam dapibus. convallis elit in nibh hendrerit vestibulum convallis. diam iaculis nec aliquam, nibh ultrices diam varius etiam imperdiet. elementum turpis id nunc at duis nisl. et vestibulum consequat, fermentum vitae porttitor dui, lectus nam ex. eu ut luctus interdum nulla in elit ut aenean amet. sit dolor elit, adipiscing consectetur ipsum lorem.
Copy the below code in a file named: "read.py" in the current directory.
Please do not read huge files as they can cause the memory to run out, possibly leading to a system crash.
As observed in the above example, we can use the read() method to read the entire content of a file. Let us revisit the above example and see how Python is managing the file data.
As the above example demonstrates, the read() method reads the file's entire contents and returns it as a single string. Python does not split the content at newlines; the newline characters are simply part of that one string. It means that even if we have paragraphs, they will still arrive as one string instead of as separate lines.
We could write our own code to split paragraphs into different lines, as the below example demonstrates:
Total line(s) are 4
Python is a high-level programming Language.
I love to program in Python.
Bye! :)
The above example shows four lines instead of three because of the newline character at the end of the file: splitting on it leaves an empty string as the last item. To omit such items, consider writing the following:
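One possible way to do this, sketched under the assumption that the file holds the three lines shown in the output above, is to use the string method splitlines() instead of split("\n"). The temporary folder is used only to keep the example self-contained.

```python
import os
import tempfile

text_to_save = (
    "Python is a high-level programming Language.\n"
    "I love to program in Python.\n"
    "Bye! :)\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_text_document.txt")
    file_handle = open(path, "w")
    file_handle.write(text_to_save)
    file_handle.close()

    file_handle = open(path, "r")
    content = file_handle.read()   # one string, newlines included
    file_handle.close()

# split("\n") leaves an empty string after the final newline.
naive_lines = content.split("\n")

# splitlines() drops the line endings without producing that extra item.
clean_lines = content.splitlines()

print("Total line(s) with split:", len(naive_lines))
print("Total line(s) with splitlines:", len(clean_lines))
```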
The read() method returns the entire content of the file as a String object.
In the above example, we implemented our code to split the string into multiple lines. However, implementing our code to read individual lines would lead to verbosity and code duplication.
In programming, we do not want to reinvent the wheel.
Python provides a method to precisely read individual lines from a file, the readlines() method. The readlines() method returns a list. In this list, every element is a separate line. Below is an example:
There are a total of 3 lines in the file. The returned type is <class 'list'>.
Python is a high-level programming Language.
I love to program in Python.
Bye! :)
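A short sketch of readlines() follows, again assuming the same three-line file created in a temporary folder. Note that each returned item keeps its trailing newline, which is why rstrip("\n") is used before printing.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_text_document.txt")
    file_handle = open(path, "w")
    file_handle.write("Python is a high-level programming Language.\n")
    file_handle.write("I love to program in Python.\n")
    file_handle.write("Bye! :)\n")
    file_handle.close()

    file_handle = open(path, "r")
    lines = file_handle.readlines()   # a list with one item per line
    file_handle.close()

print("Lines in the file:", len(lines), "Returned type:", type(lines))
for line in lines:
    # Each item still ends with "\n"; strip it to avoid blank lines in output.
    print(line.rstrip("\n"))
```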
As stated above, reading huge files can result in the program running out of memory. To avoid such scenarios, Python can read a file a specific number of characters at a time: pass that number to the read() method. Here is an example:
10 Python is
Good question! Even when I was learning to program, I asked myself this question several times. Fortunately, you won't have to go through the same loop again. If you try to read more characters than the file contains, Python will read only as many characters as are available and return them.
In more technical terms, the read() method will read the specified number of characters or until the end of the file (EOF), whichever comes first.
Here is an example to demonstrate this:
82 Python is a high-level programming Language.
I love to program in Python.
Bye! :)
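Both behaviours can be sketched together: reading a fixed number of characters, then asking for more than what is left. The file content and the temporary folder are assumptions that mirror the earlier examples.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_text_document.txt")
    file_handle = open(path, "w")
    file_handle.write("Python is a high-level programming Language.\n")
    file_handle.write("I love to program in Python.\n")
    file_handle.write("Bye! :)\n")
    file_handle.close()

    file_handle = open(path, "r")
    first_ten = file_handle.read(10)     # exactly ten characters
    remainder = file_handle.read(1000)   # more than remains: returns what is left
    at_end = file_handle.read(10)        # nothing left: returns an empty string
    file_handle.close()

print(len(first_ten), first_ten)
print(len(first_ten) + len(remainder))
```

Each call to read() continues from where the previous one stopped, which is what makes reading in fixed-size pieces possible.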
As mentioned earlier, Python allows for accessing files in binary mode. To open a file in binary mode, append "b" to the selected mode. Here is an example:
5289384
Your answer will probably be different, depending on the file chosen.
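A self-contained sketch of a binary read follows. Instead of an existing image or video, it assumes a small file of 256 bytes that is created first; in binary mode, read() returns a bytes object and len() gives the size in bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.bin")   # hypothetical file name

    # Create a small binary file holding every byte value from 0 to 255.
    file_handle = open(path, "wb")
    file_handle.write(bytes(range(256)))
    file_handle.close()

    # "rb" = read + binary: no decoding and no newline translation.
    file_handle = open(path, "rb")
    data = file_handle.read()
    file_handle.close()

print(type(data))
print(len(data))
```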
In the prior sections, we have covered the reading of files and some of the pitfalls. In this section, we will learn how to write files in Python. Writing to a file is a highly complex task when operating at a low level. However, Python provides abstractions that handle such overwhelming tasks.
Some examples of writing to a file are:
Python provides several methods to handle writing files. Writing to a file involves almost the same structure that we have to follow when reading a file.
Here is the structure we have to follow:
To perform write operations, Python provides the write() method. In text mode, the write() method accepts a string as an argument; in binary mode, it accepts a bytes-like object. Below is an example:
If everything was successful, you should now have a file "my_contacts.txt" with contacts in it.
Let's break down each statement in the above example. Shall we?
Whenever your program deals with external factors such as networks, files, databases, or other things out of your control, always implement the relevant code inside an exception handling mechanism.
In the concerned example, we are implementing our logic inside a try-except-finally block.
#1:
#2:
This statement defines a tuple containing strings as items. The strings have information in the format of name, contact_number.
#3:
This statement iterates over the tuple containing the contact info, appends a newline to each string, and writes the result to the file.
#4:
This block will catch any exception derived from the Exception class, which is the base class of almost all built-in exceptions. If such an exception is raised, it will display the exception information.
#5:
This block will execute irrespective of the exception status. If the file handle exists, meaning a successful connection was established, it will close the file handle.
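The structure walked through above can be sketched as follows. The contact names and numbers are made up for illustration, and the file is placed in a temporary folder instead of the current directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_contacts.txt")
    file_handle = None
    try:
        # Open (and create) the file for writing.
        file_handle = open(path, "w")

        # Hypothetical contacts in the format: name, contact_number
        contacts = ("Alice, 555-0100", "Bob, 555-0101")

        # write() does not add a newline, so append one to each entry.
        for contact in contacts:
            file_handle.write(contact + "\n")
    except Exception as error:
        print("Something went wrong:", error)
    finally:
        # Close the handle only if open() succeeded.
        if file_handle is not None:
            file_handle.close()

    # Read the file back to confirm what was written.
    check_handle = open(path, "r")
    saved_contacts = check_handle.read()
    check_handle.close()

print(saved_contacts)
```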
Python also provides a writelines() method to write. The writelines() method accepts any iterable, such as a list or a tuple, as an argument. However, all the items must be of type String. Below are some examples:
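A brief sketch with a list and a tuple follows; the item values are arbitrary. Notice that writelines() writes the items exactly as given, so the tuple's items run together because they carry no newline.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_lines.txt")   # hypothetical file name

    file_handle = open(path, "w")
    # A list of strings, each carrying its own newline.
    file_handle.writelines(["first line\n", "second line\n"])
    # A tuple of strings without newlines: they end up on the same line.
    file_handle.writelines(("third", "fourth"))
    file_handle.close()

    file_handle = open(path, "r")
    written_text = file_handle.read()
    file_handle.close()

print(written_text)
```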
The previous section discussed that the writelines() method accepts iterable types, such as lists and tuples, which must contain items of only String type. However, this is particularly restricting when creating real-world applications, as we deal with values of several data types such as numerical data, dictionaries, and sets.
Suppose we have a multi-dimensional list that contains a name and cell phone number, and we want to write that information to the disk. How are we going to do that? I'll give you some time to ponder. When you're done click to reveal.
Here is an example to do precisely that:
Please copy the above code, execute it and observe the output.
It did not go as expected. The formatting is all screwed up. The writelines() method writes each item exactly as it is and does not append a newline after it. So the $1 question is, how do we fix the formatting?
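Well, it's straightforward. Iterate over the list and either format each entry into a string or build the string using concatenation, adding a newline at the end of each entry.
Below is the fix using string formatting: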
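A possible version of that fix is sketched here. The nested list of names and numbers is invented for illustration, and an f-string is assumed as the formatting technique.

```python
import os
import tempfile

# Hypothetical data: each inner list holds a name and a phone number.
contacts = [["Alice", 5550100], ["Bob", 5550101]]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_contacts.txt")

    file_handle = open(path, "w")
    for name, number in contacts:
        # The f-string converts the number to text and adds the newline.
        file_handle.write(f"{name}, {number}\n")
    file_handle.close()

    file_handle = open(path, "r")
    formatted_contacts = file_handle.read()
    file_handle.close()

print(formatted_contacts)
```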
Please re-write the above example using string concatenation.
We have covered writing textual data to files in the above section. However, knowing how to write binary data is just as essential. In this section, we will discuss writing binary files. Below are some examples of applications that write binary data:
To write the file in binary mode, append "b" to the mode. Here is an example:
Each byte value must always be between 0 and 255, because a byte stores only 8 bits. Hence, the minimum and maximum values that a byte can store are 0 and 255, respectively. Below is the binary representation of 0 and 255.
BitPos: 7 6 5 4 3 2 1 0
0: 0 0 0 0 0 0 0 0
255: 1 1 1 1 1 1 1 1
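The following sketch writes three byte values in binary mode and then shows what happens with a value outside the 0 to 255 range. The file name and the values are assumptions chosen for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_bytes.bin")   # hypothetical file name

    payload = bytes([0, 65, 255])        # every value fits in 8 bits

    file_handle = open(path, "wb")       # "wb" = write + binary
    bytes_written = file_handle.write(payload)
    file_handle.close()

    file_handle = open(path, "rb")
    stored = file_handle.read()
    file_handle.close()

# A value of 256 needs 9 bits, so it cannot be stored in a single byte.
try:
    bytes([256])
    range_error = None
except ValueError as error:
    range_error = error
    print("Rejected:", error)

print(bytes_written, list(stored))
```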
In the next segment, we will discuss appending to files.
Before we delve into appending data, let's understand the difference between append mode and write mode. Whenever we open a file in write ("w") mode, a new file gets created; if a file of the same name already exists, it is overwritten with a blank file. However, when we open a file in append ("a") mode, the existing file is not overwritten; instead, data is written starting from the very end of the file.
Here is an example:
Here is an example:
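To make the contrast concrete, here is a small sketch that opens the same file twice in write mode and once in append mode. The file name and the text are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_log.txt")   # hypothetical file name

    # First write: creates the file.
    file_handle = open(path, "w")
    file_handle.write("first\n")
    file_handle.close()

    # Second write: "w" starts from a blank file, so "first" is lost.
    file_handle = open(path, "w")
    file_handle.write("second\n")
    file_handle.close()

    # Append: existing content is kept and new data goes to the end.
    file_handle = open(path, "a")
    file_handle.write("third\n")
    file_handle.close()

    file_handle = open(path, "r")
    final_content = file_handle.read()
    file_handle.close()

print(final_content)
```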
Here are some useful miscellaneous methods. You can refer to the documentation of Python's standard library for in-depth coverage of methods for diverse use cases.
Checking whether a file or folder exists using the exists() method in the os.path module.
To delete a file, Python provides the remove() method in the os module. Here is an example:
To delete an empty folder, Python provides the rmdir() method in the os module. Here is an example:
To delete a non-empty folder, Python provides the rmtree() method in the shutil module. Here is an example:
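These helpers can be sketched together in one short script. It uses os.path.exists(), os.remove() (the standard-library call for deleting a file), os.rmdir(), and shutil.rmtree(); all file and folder names are made up, and everything happens inside a temporary folder that is removed at the end.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()   # scratch folder for this sketch

# Check for a file, then delete it with os.remove().
file_path = os.path.join(base, "notes.txt")
open(file_path, "w").close()
existed_before = os.path.exists(file_path)
os.remove(file_path)
exists_after = os.path.exists(file_path)

# os.rmdir() deletes a folder only when it is empty.
empty_folder = os.path.join(base, "empty_folder")
os.mkdir(empty_folder)
os.rmdir(empty_folder)

# A folder with content makes os.rmdir() fail with OSError.
full_folder = os.path.join(base, "full_folder")
os.mkdir(full_folder)
open(os.path.join(full_folder, "a.txt"), "w").close()
try:
    os.rmdir(full_folder)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# shutil.rmtree() deletes the folder together with everything inside it.
shutil.rmtree(full_folder)
full_folder_exists = os.path.exists(full_folder)

shutil.rmtree(base)   # clean up the scratch folder

print(existed_before, exists_after, rmdir_failed, full_folder_exists)
```

Be careful with shutil.rmtree(): it deletes the whole folder tree without asking for confirmation.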
The with statement in Python is used for resource management and makes the code concise, easier to maintain and debug, and faster to develop. It was introduced in Python 2.5 (PEP 343) to simplify the management of resources such as files. Below is an example to demonstrate the difference:
Management of the file using the with statement
The version using the with statement is 30% smaller than the try-except-finally method. The with statement automatically handles proper resource management, such as closing the file once the block ends, even if an exception is raised inside it. As a software developer, you should always follow the best methodology to write clear, concise code to ensure faster development, fewer bugs, and a maintainable codebase.
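A compact sketch of the with statement follows. The file name and text are placeholders; the closed attribute is checked afterwards to show that the file was closed automatically, including when an exception occurs inside the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_text_document.txt")

    # The file is closed automatically when the block ends.
    with open(path, "w") as writer:
        writer.write("Python is a high-level programming Language.\n")
    closed_after_block = writer.closed

    # The file is also closed when an exception escapes the block.
    try:
        with open(path, "r") as reader:
            int("abc")   # deliberately raises ValueError
    except ValueError:
        print("Invalid number, but the file was still closed.")
    closed_after_error = reader.closed

print(closed_after_block, closed_after_error)
```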
We have been discussing the usage of text and binary file modes. What we haven't covered yet is the actual difference of operation between the two. In my experience, many tutorials/books often overlook and move on, just like your ex. However, I won't.
To understand the differences between binary and text modes, we must know how they will be used and interpreted. First, we have to understand how text files are stored. Most text files are stored in either ASCII or UTF-8 encoding. UTF-8 is a superset of ASCII: the first 128 characters are encoded identically in both, so any ASCII file is also a valid UTF-8 file.
UTF-8 goes much further than ASCII and, like UTF-16 and UTF-32, can represent every Unicode character; the three differ in how many bytes they use per character. In this tutorial, we will focus on UTF-8 encoding.
Now, let us delve a bit deeper into binary data. Let us take the binary string 1000001. If we convert this into a number, it would be 65. If we interpret it as text, it would become the uppercase character 'A'. How do we know when the binary sequence 1000001 should be considered the numerical value 65 or the character 'A'? This is where encoding comes into the picture. When we save a text file in our text editor, it is encoded as UTF-8 or ASCII and read back as such. We could even create our own encoding with our own character set, where the value 65 represents 'Z'. It is up to the encoding and the software how to interpret it.
However, when we perform file handling in binary mode, the numerical value 65 gets written as its exact binary value. It is up to the software that reads the file to interpret it and perform the necessary actions. The value 65 could mean an instruction for the CPU, a pixel value, or audio data. Additionally, special characters such as newlines, tabs, spaces, carriage returns, and line feeds do not get processed and are read as they are, i.e., as a sequence of bits.
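The difference can be sketched by writing three raw bytes and reading them back in both modes. The byte values are chosen for illustration: 65 is 'A' in ASCII and UTF-8, while 195 and 169 together form the UTF-8 encoding of 'é'.

```python
import os
import tempfile

value = 0b1000001                 # the binary string 1000001
as_number = value                 # interpreted as a number
as_character = chr(value)         # interpreted as a character

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.dat")   # hypothetical file name

    # Binary mode writes the exact byte values, with no encoding step.
    with open(path, "wb") as writer:
        writer.write(bytes([65, 195, 169]))

    # Text mode decodes the bytes using the given encoding.
    with open(path, "r", encoding="utf-8") as reader:
        text_view = reader.read()

    # Binary mode hands back the raw bytes untouched.
    with open(path, "rb") as reader:
        binary_view = reader.read()

print(as_number, as_character)
print(text_view, list(binary_view))
```

The same three bytes become two characters in text mode, because the last two bytes are decoded as a single character.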
In programming, Streams are the flow of data. They could be an incoming stream(reading a file) or an outgoing stream(writing to the file).
The above illustration shows that the Python program accepts input streams by reading user keystrokes from the keyboard, reading files from the disk, and receiving packets from the network. It then processes the input data and outputs the result to the screen, writes it to the disk, or sends it as the response to a network request.
We have covered reading, writing, and appending file data. However, we have been processing in streams, i.e., continuous data flow, whether reading or writing. What if we want to access data in buffers for efficient processing, which is crucial in applications such as video players, video editors, and large database files?
To achieve such tasks, Python provides the seek() method. Before learning about the seek() method, we must understand more about streams and file pointers.
The above diagram illustrates how the data flows in a stream and how the file pointer points to the current byte offset and returns that value.
In the illustration, the file pointer is pointing to offset 4, accessing the value E. When the file pointer is incremented or decremented, it will move to the corresponding byte offset and read the respective value at the offset.
Armed with this knowledge, let us dive into the seek() method.
The seek() method accepts the following definition:
seek(offset, whence)
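offset: This is the number of positions to move the file pointer, counted from the reference point given by whence.
whence: This is optional and can have three values: 0 (the start of the file, which is the default), 1 (the current position of the file pointer), or 2 (the end of the file). In text mode, only seeking relative to the start of the file is generally supported; use binary mode for the other two.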
Now, let us write a program to read the first two characters from the middle of the file.
Create a text file called "my_test_file.txt" and save the following content in it:
0123456789
Create a file named seek.py and save the following content in it:
As you can observe from the results, we printed two characters from the middle.
Now let us understand the statement responsible.
The following statement
The statement
Finally, the statement
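One way to sketch this kind of program is shown below. It is not the original seek.py: it assumes the file is opened in binary mode, so that all three whence values can be used, and it takes "the middle" to mean half of the file size. The constants os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END are the named equivalents of 0, 1, and 2.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_test_file.txt")
    with open(path, "wb") as writer:
        writer.write(b"0123456789")

    with open(path, "rb") as reader:
        # Jump to the end; seek() returns the new position, i.e. the file size.
        size = reader.seek(0, os.SEEK_END)

        # Move to the middle, counted from the start of the file.
        reader.seek(size // 2, os.SEEK_SET)
        middle_two = reader.read(2)

        # A negative offset from the end reads the tail of the file.
        reader.seek(-3, os.SEEK_END)
        last_three = reader.read()

        # Relative move: read one byte, then skip two bytes ahead.
        reader.seek(0, os.SEEK_SET)
        first_byte = reader.read(1)
        reader.seek(2, os.SEEK_CUR)
        after_skip = reader.read(1)

print(size, middle_two, last_three, first_byte, after_skip)
```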
In this chapter, we learned what files are, how to establish a connection to a file in text or binary mode, the pitfalls involved, how to close a file successfully, the benefits of using the with statement, and the differences between text mode and binary mode. We also covered the consumption of file data in streams and moving the file pointer using the seek() method.