Tuesday, April 22, 2025

Reading from and writing into files

 

Files and file paths

Programs often need to perform operations on files stored in the computer system. The operating system connects your program to the file system, and Python gives access to it through the os module. The os module has many functions and submodules (such as os.path), the most useful of which are illustrated in this section.

Some os module functions

The following os functions are useful for moving around the file system.

getcwd(), chdir() and makedirs() functions

The getcwd() function returns the current working directory (cwd), whereas chdir() changes the current working directory to the specified directory. The following code lines illustrate both.

import os
current_dir = os.getcwd()    # returns the current working directory as str 
print(current_dir)

os.chdir('C:\\windows\\my_dir')        # change directory
print(os.getcwd())            # prints C:\windows\my_dir

Python raises FileNotFoundError if you try to change the cwd to a directory that does not exist in your system. The makedirs() function creates a directory, together with any missing parent directories in the given path, as the following code line shows.

os.makedirs('C:\\Users\\hp\\Desktop\\Create_dir')

Look at your desktop for the newly created directory.
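The same steps can be tried without touching the desktop. The following is a minimal sketch that works inside a temporary folder; the folder names 'parent', 'child' and 'missing' are made up for illustration, and exist_ok=True is an optional argument of os.makedirs() that suppresses the error raised when the directory already exists.

```python
import os
import tempfile

# Work inside a throwaway folder so nothing on the real system is touched.
with tempfile.TemporaryDirectory() as base:
    new_dir = os.path.join(base, 'parent', 'child')   # hypothetical names

    os.makedirs(new_dir)                  # creates 'parent' and 'child' in one call
    created = os.path.isdir(new_dir)      # True if the directory now exists

    os.makedirs(new_dir, exist_ok=True)   # no error although it already exists

    try:
        os.chdir(os.path.join(base, 'missing'))   # this directory was never created
        chdir_failed = False
    except FileNotFoundError:
        chdir_failed = True

print(created, chdir_failed)
```

Wrapping chdir() in try/except like this lets a program create the directory or report the problem instead of stopping.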

File path

The file path is the address of the file in the computer's file system. There are two ways of specifying the path of a file: absolute path and relative path.

Absolute path: The absolute path specifies the complete route to a file starting from the root directory (which may be C:\ or D:\ in Windows).
For example, C:\windows\my_dir\my_file.py
Relative path: The relative path is the path of the file relative to the current working directory. It uses a dot (.) to represent the current directory and double dots (..) to represent the parent directory of the current working directory. Consider the following scenario to understand the relative path.

The cwd is C:\windows\my_folder.
The folder has my_file.py

Now the relative path for 
--    the file my_file.py is                        .\my_file.py
--    the cwd my_folder is                        .\
--    parent directory windows is              ..\

The os module has many more functions for interacting with the operating system. Consult the Python documentation if necessary.

os.path module

This module has functions for converting between relative and absolute paths, splitting a path into its parts, finding file sizes, checking whether a path exists, etc.
abspath() and isabs() functions: The abspath() function returns the absolute path for the relative path passed as its argument. Both the argument and the return value are of type str.

import os.path
abs_path = os.path.abspath('.')        # dot represents cwd (assume cwd is my_folder)
print(abs_path)                        #  prints C:\windows\my_folder

os.path.isabs('.')                                # returns False; 
os.path.isabs(abs_path)                    # returns True

The os.path.isabs() function checks whether its argument (a str) is an absolute path. It returns True if it is and False otherwise.
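To see how '.' and '..' are resolved against the cwd, here is a small sketch that can be run on any system. It assumes a temporary folder named my_folder standing in for the one used in the scenario above; my_file.py is never created, because abspath() only computes the path and does not check that the file exists.

```python
import os
import tempfile

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(os.path.realpath(base), 'my_folder')
    os.makedirs(folder)
    os.chdir(folder)                      # my_folder is now the cwd
    try:
        here = os.path.abspath('.')       # the cwd itself
        parent = os.path.abspath('..')    # the directory that contains my_folder
        file_path = os.path.abspath('my_file.py')   # the file need not exist
    finally:
        os.chdir(original_cwd)            # step out before the folder is deleted

print(os.path.isabs(here), os.path.isabs('my_file.py'))
```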

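The os.path.relpath() function returns the path of its argument relative to the current working directory (or relative to an optional start directory passed as the second argument).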

os.path.relpath('C:\\Users\\hp\\Desktop\\Create_dir')

The above code line returns '.' if the current directory is Create_dir.
The above code line returns 'Create_dir' if the cwd is Desktop.
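The two cases can be checked without changing the cwd by passing the optional start argument of relpath(). This is only a sketch: the paths are built with os.path.join() so that it runs on any operating system, and the directories do not need to exist.

```python
import os

# Hypothetical locations; relpath() only computes, it does not touch the disk.
desktop = os.path.join(os.sep, 'Users', 'hp', 'Desktop')
target = os.path.join(desktop, 'Create_dir')

same = os.path.relpath(target, start=target)      # as if the cwd were Create_dir
child = os.path.relpath(target, start=desktop)    # as if the cwd were Desktop
up = os.path.relpath(desktop, start=target)       # going from Create_dir to its parent

print(same, child, up)    # . Create_dir ..
```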

basename(), split() and dirname() functions: The basename() function returns the filename part of the path passed as its argument (str), whereas dirname() returns the directory part of that path. The split() function returns a tuple of the directory name and the filename. The following code lines illustrate all three.

>>> import os.path
>>> os.path.basename('C:\\Users\\hp\\Desktop\\Create_dir\\sample.py')
'sample.py'                                          # file name
>>> os.path.dirname('C:\\Users\\hp\\Desktop\\Create_dir\\sample.py')
'C:\\Users\\hp\\Desktop\\Create_dir'                 # directory and its path
>>> os.path.split('C:\\Users\\hp\\Desktop\\Create_dir\\sample.py')
('C:\\Users\\hp\\Desktop\\Create_dir', 'sample.py')  # tuple
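The relationship between the three functions can be summed up in a short sketch. It uses os.path.join(), which is not covered above; it joins path parts with the separator of the operating system in use, so the example is not tied to Windows.

```python
import os.path

# Build the path from its parts so the example works on any operating system.
path = os.path.join('Users', 'hp', 'Desktop', 'Create_dir', 'sample.py')

directory, filename = os.path.split(path)        # unpack the tuple

# split() gives exactly what dirname() and basename() give separately.
same_parts = (directory, filename) == (os.path.dirname(path), os.path.basename(path))

# join() puts the two parts back together.
rebuilt = os.path.join(directory, filename)

print(filename, same_parts, rebuilt == path)     # sample.py True True
```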

Next, you can learn about finding the file size and directory contents.

getsize() and listdir() functions: The os.path.getsize() function returns the size in bytes of the file passed as an argument. The os.listdir() function, which belongs to os rather than os.path, returns a list of the names of the files and subdirectories in the directory passed as an argument. The following code lines illustrate both.
import os
print(os.path.getsize('C:\\Users\\hp\\Desktop\\PythonScripts'))
print(os.path.getsize('C:\\Users\\hp\\Desktop\\PythonScripts\\prettypython.py'))
print(os.listdir('C:\\Users\\hp\\Desktop\\PythonScripts'))

Observe the getsize() and listdir() functions inside the print functions. They display the following on the screen:
4096                # size reported for the directory entry itself, not the total of the files in it
575                    # size of prettypython.py file in bytes
['autoML_decision trees.zip', 'New folder', 'new.pdf', 'prettypython.py']

Checking the existence (validity) of files and directories: The functions exists(), isdir(), and isfile() in os.path are used to check the existence of files and directories in your computer system.

print(os.path.exists('C:\\Users\\hp\\Desktop\\PythonScripts'))
print(os.path.isdir('C:\\Users\\hp\\Desktop\\PythonScripts'))
print(os.path.isfile('C:\\Users\\hp\\Desktop\\PythonScripts'))

Each print function displays the result of the os.path function inside it. The result is:
True                # The path exists
True                # yes, the argument is a directory
False                # no, the argument is not a file
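Since getsize() on a directory does not add up the files inside it, the total has to be computed by the program. The following is one possible sketch that combines listdir(), isfile(), isdir() and getsize(); the function name total_size and the sample files are made up for illustration, and the files are created in a temporary folder using open(), which is explained later in this post.

```python
import os
import tempfile

def total_size(directory):
    """Return the combined size in bytes of all files below directory."""
    total = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            total += os.path.getsize(path)
        elif os.path.isdir(path):
            total += total_size(path)     # descend into the subdirectory
    return total

with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, 'sub'))
    with open(os.path.join(base, 'a.txt'), 'wb') as f:
        f.write(b'12345')                 # 5 bytes
    with open(os.path.join(base, 'sub', 'b.txt'), 'wb') as f:
        f.write(b'1234567')               # 7 bytes

    names = sorted(os.listdir(base))      # listdir() gives no guaranteed order
    size = total_size(base)

print(names, size)    # ['a.txt', 'sub'] 12
```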

Operations on files

Files can be opened and used in two different modes: text mode (also known as string mode) or binary mode. Python offers several functions for performing various operations on files. This post discusses them first in text mode and then in binary mode.

File operations in text mode

File open and close

The function open() is used to open a file in various modes: read mode, write mode, read and write mode, and append mode. The function has many parameters, but it is shown next with its three most important ones.

The function signature is as follows.
open(file, mode, encoding)        # all three arguments are passed as strings (str) here

file: path of the file to be opened, as a str
mode: the mode in which the file is to be opened ('r', 'w', 'a', 'r+'); the default is 'r'
encoding: the text encoding used to read and write the file. The default depends on the system, so it is safer to pass it explicitly. The de facto standard is 'utf-8'.

Some examples of opening the file in the current working directory are as follows.

f1 = open('file_name1', 'r', encoding = 'utf-8')        # file opened in read mode
f2 = open('file_name2', 'w', encoding = 'utf-8')        # file opened in write mode
f3 = open('file_name3', 'r+', encoding = 'utf-8')        # file opened in read and write mode
f4 = open('file_name4', 'a', encoding = 'utf-8')        # file opened in append mode

The open() function returns a file object (e.g., f1) on which functions such as read() and write() are invoked to perform operations. Once the work with the file is complete, the opened file must be closed by calling close() on the file object. The file file_name1, opened to read its content, can be closed using the following line of code.

f1.close()    # it closes the file file_name1. 
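The difference between the modes is easiest to see by trying them on one small file. The following sketch assumes a temporary folder and the made-up file names modes.txt and missing.txt; it uses write() and read(), which are described in detail further below.

```python
import os
import tempfile

base = tempfile.mkdtemp()                 # throwaway folder for the demo
path = os.path.join(base, 'modes.txt')    # hypothetical file name

f = open(path, 'w', encoding='utf-8')     # 'w' creates the file (or erases it)
f.write('one\n')
f.close()

f = open(path, 'a', encoding='utf-8')     # 'a' adds to the end of the file
f.write('two\n')
f.close()

f = open(path, 'r+', encoding='utf-8')    # 'r+' reads and writes without erasing
f.write('ONE')                            # overwrites the first three characters
f.close()

f = open(path, 'r', encoding='utf-8')
after_r_plus = f.read()                   # 'ONE\ntwo\n'
f.close()

f = open(path, 'w', encoding='utf-8')     # reopening with 'w' empties the file
f.close()

f = open(path, 'r', encoding='utf-8')
after_w = f.read()                        # ''
f.close()

try:
    open(os.path.join(base, 'missing.txt'), 'r', encoding='utf-8')
    missing_error = False
except FileNotFoundError:                 # 'r' needs an existing file
    missing_error = True

os.remove(path)                           # clean up the demo file and folder
os.rmdir(base)
```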

File read()

It is better to use file functions and file objects along with the 'with' keyword. It automatically closes the file after the completion of the listed file operations. The following code snippet opens the file 'file_name1' using the 'with' keyword to read its contents.

with open('file_name1', 'r', encoding='utf-8') as f:
    file_content = f.read()            

The file read() function reads the entire file and returns it as a string. The file_content variable holds the whole content of the file as a string value. The 'with' keyword automatically closes the file 'file_name1'. This can be verified in the Python interpreter by checking the closed attribute of the file object (closed is an attribute, not a function, so it takes no parentheses).

>>> f.closed
True

A file, once closed, must be opened again before any further operation on it. The following line of code raises a ValueError because f is closed.

>>> f.read()  # raises ValueError because f is closed.
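The behaviour described above can be put together in one runnable sketch. It assumes a temporary folder and first creates file_name1 with a single line of text so that there is something to read.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'file_name1')

    with open(path, 'w', encoding='utf-8') as f:   # create sample content
        f.write('hello\n')

    with open(path, 'r', encoding='utf-8') as f:
        file_content = f.read()
        open_inside = f.closed          # False: still inside the with block

    closed_after = f.closed             # True: the with block closed the file

    try:
        f.read()                        # reading a closed file is not allowed
        error = None
    except ValueError as exc:
        error = exc

print(repr(file_content), open_inside, closed_after)
```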

readline()  and readlines() functions

The file contents can be read line by line using the readline() function as follows. Assuming the file is already open and f is the file object,

>>> f.readline()
'first line is this \n'            # returns the read line as a string
>>> f.readline()
'second line is this \n'       # each line ends with \n, except possibly the last line of the file
....
>>> f.readline()
''                                # empty string indicates no more lines to read

However, there is a simpler and more efficient way to read text line by line in Python: iterate over the file object.

for text_line in f:
    print(text_line, end='')            # prints the read text_line

Each line already ends with its own newline, so end='' stops the print function from adding a second one.

The readlines() function reads all remaining lines from the file and returns them as a list in which each line is a string item.

>>> lt =  f.readlines()
>>> print(lt)

The file contents are printed as list items in string form.
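The three ways of reading lines can be compared on a small sample file. This is a sketch under the assumption of a temporary file named lines.txt whose last line has no trailing newline; rstrip('\n') is used to drop the newline from each line.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'lines.txt')          # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\nlast line')   # no newline at the very end

    with open(path, 'r', encoding='utf-8') as f:
        first = f.readline()        # one line, including its '\n'
        rest = f.readlines()        # all remaining lines as a list
        at_end = f.readline()       # '' because nothing is left

    with open(path, 'r', encoding='utf-8') as f:
        stripped = [line.rstrip('\n') for line in f]    # iterate over the file object

print(repr(first), rest, repr(at_end), stripped)
```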

file write() function

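The write() function takes a string argument and writes it to the designated file through the file object. It does not add a newline, so include '\n' wherever a line should end. Note that opening an existing file in 'w' mode erases its previous content. The following code lines show all these.

f = open('file_name1', 'w', encoding='utf-8')         # file_name1 opened in write mode

f.write('This is first line\n')            # string is passed directly
st = 'This is second line.\n'
f.write(st)                                    # string variable is passed as an argument
f.close()                                      # saves the data and closes the file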

Copy contents from one file to another file:

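f1 = open('file_name1', 'r', encoding='utf-8')
f2 = open('file_name2', 'w', encoding='utf-8')

file_content = f1.read()
f2.write(file_content)

f1.close()
f2.close()        # the copy is guaranteed to be on disk only after closing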
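The same copy can be written with the 'with' keyword, which opens both files in one statement and closes them automatically. The sketch below assumes a temporary folder, creates file_name1 with two sample lines first, and then appends one more line to the copy to show append mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    src = os.path.join(base, 'file_name1')
    dst = os.path.join(base, 'file_name2')

    with open(src, 'w', encoding='utf-8') as f:     # sample content to copy
        f.write('This is first line\n')
        f.write('This is second line.\n')

    # Both files are opened together and closed together.
    with open(src, 'r', encoding='utf-8') as f1, open(dst, 'w', encoding='utf-8') as f2:
        f2.write(f1.read())

    with open(dst, 'a', encoding='utf-8') as f2:    # 'a' keeps the copied lines
        f2.write('appended line\n')

    with open(dst, 'r', encoding='utf-8') as f:
        copied = f.read()

print(copied)
```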

seek() and tell() functions

The tell() function returns the current position of the cursor, which is where the next read or write takes place. In text mode the returned value is best treated as a bookmark that can be passed back to seek(); for plain ASCII text it usually equals the number of bytes read so far. The following code lines help you to understand the tell() function.

>>> f = open('file_name1', 'r')
>>> f.tell()
0                                    # because nothing is read from the file
>>> f.readline()
>>> f.tell()                    # position after the first line
15
>>> f.readline()            # read another line
>>> f.tell()
35                   # position after the 1st and 2nd lines

The seek() function moves the cursor (reference) to the specified position, counted from the beginning of the file by default.

>>> f = open('file_name1', 'r')
>>> f.seek(14)        # move the cursor to offset 14; the next read starts at the 15th character
>>> f.tell()                # validation for seek() function
14
>>> f.seek(0,2)           # moves the reference to the end of the file

These functions are useful to update the file content.
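A common pattern is to remember a position with tell() and return to it later with seek(). The following sketch assumes a temporary file named positions.txt with two sample lines; it only seeks to 0, to the end, or to a value obtained from tell(), which are the positions that are safe to use in text mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'positions.txt')      # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')

    with open(path, 'r', encoding='utf-8') as f:
        start = f.tell()               # 0: nothing has been read yet
        first = f.readline()
        bookmark = f.tell()            # remember where the second line begins
        second = f.readline()

        f.seek(bookmark)               # jump back to the bookmark
        second_again = f.readline()

        f.seek(0)                      # back to the beginning of the file
        first_again = f.readline()

        f.seek(0, 2)                   # jump to the end of the file
        at_end = f.read()              # '' because nothing follows the end

print(start, repr(second_again), repr(first_again), repr(at_end))
```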

File operations in binary mode

The set of functions used in text mode can be used in binary mode as well. The character 'b' must be appended to the mode, as in 'rb' or 'wb'. The data in the file is read and written as bytes objects. You cannot specify an encoding when the file is opened in binary mode.

f1 = open('file_name1', 'rb')        # file opened in read binary mode
f2 = open('file_name2', 'rb+')     # file opened in read and write binary mode
f3 = open('file_name3', 'wb+')    # file opened in write and read binary mode (existing content is erased)

Binary mode is useful for copying files that do not contain plain text, such as PDF files and image files of various formats. The following code lines show reading bytes from pdffile.pdf (using the f1 object) and writing them into new.pdf (using the f2 object).

f1 = open('pdffile.pdf', 'rb')
f2 = open('new.pdf', 'wb')
f2.write(f1.read())
f1.close()
f2.close()
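Reading the whole file with read() keeps all of its bytes in memory at once, which may be a problem for very large files. One possible alternative, sketched below, copies the file in chunks; the function name copy_binary, the chunk size and the sample file names are assumptions made for illustration.

```python
import os
import tempfile

def copy_binary(src_path, dst_path, chunk_size=64 * 1024):
    """Copy a file in binary mode, chunk_size bytes at a time; return bytes copied."""
    copied = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)    # returns b'' at the end of the file
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied

with tempfile.TemporaryDirectory() as base:
    src = os.path.join(base, 'source.bin')
    dst = os.path.join(base, 'copy.bin')
    data = bytes(range(256)) * 1000         # 256,000 bytes of sample data
    with open(src, 'wb') as f:
        f.write(data)

    copied = copy_binary(src, dst, chunk_size=1000)
    with open(dst, 'rb') as f:
        result = f.read()

print(copied, result == data)    # 256000 True
```

The standard library also provides shutil.copyfile() for plain file copies; the loop above is only meant to show how read() and write() work together in binary mode.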

There are many less frequently used functions for file objects. Consult the Python doc if necessary.

Saving variable values in files

You may want to reopen an app and continue from where you left off. To achieve this, the state of the app (program) must be saved to the disk before exiting, and your program is responsible for doing so. This usually means storing the values of some variables in a file. Python has modules that help with this, such as the shelve module and the pprint module.

shelve module

The shelve module stores values in a file through an object that behaves like a dictionary. The file is opened using the open() function of the shelve module and closed using the close() function of the object it returns. Variable values are stored in the file under keys, and the same keys are used to read them back. The following code snippet helps you to understand the usage of the shelve module for this task.

import shelve
f1 = shelve.open('var_value')            # creation of shelve object
PL = ['C', 'C++', 'Java']                      # variable to save
st = 'programming'
f1['languages'] = PL                        # save variable PL in the file using 'languages' as key
f1['string'] = st
f1.close()

The file var_value now has two variable values stored in it and is closed. When it is reopened, the values can be read as follows using the keys 'languages' and 'string'.

import shelve
f1 = shelve.open('var_value')
PL = f1['languages']
st = f1['string']
print(PL)                        # verify storage and retrieval of variables
print(st)

Also, if necessary, get all keys and values in list form for further processing using the keys() and values() functions of the shelve object, respectively.
keys = list(f1.keys())                        # get keys of variables in list form
values = list(f1.values())                    # get values of variables in list form
print(keys)
print(values)                                    # verify all keys and values

f1.close()
After usage, close the file using the close() function of the shelve object.
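The object returned by shelve.open() can also be used with the 'with' keyword, so that it is closed automatically. The following sketch stores and reloads the same two values inside a temporary folder; the variable name db and the key 'not_saved' are made up for illustration, and get() is used to supply a default for a key that was never stored.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'var_value')

    with shelve.open(path) as db:             # closed automatically at the end
        db['languages'] = ['C', 'C++', 'Java']
        db['string'] = 'programming'

    with shelve.open(path) as db:             # reopen and read the values back
        languages = db['languages']
        keys = sorted(db.keys())              # sorted: key order is not guaranteed
        has_string = 'string' in db           # membership test, as with a dict
        missing = db.get('not_saved', 'default')   # default for an absent key

print(languages, keys, has_string, missing)
```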

pprint module

The pprint module can also be used for saving variable values and then retrieving them whenever necessary. The following code lines show how a variable value is stored in the file pretty.py using this module. The function pprint.pformat() returns its argument as a neatly formatted string; for basic types such as lists and dictionaries, this string is also valid Python code. The list PL is passed as an argument, and the file object's write() function completes saving PL into the file pretty.py.

import pprint
f1 = open('pretty.py', 'w')            # pretty.py is used as module
PL = ['C', 'C++', 'Java']
PL_str = pprint.pformat(PL)
f1.write('PL = ' + PL_str + '\n')        # the line PL = ['C', 'C++', 'Java'] is saved in pretty.py
f1.close()                        # file closed

Import the file pretty.py (it is a module) and access the saved variable value using its name (key).
import pretty
lt = pretty.PL
print(lt)
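The complete round trip can be tried in one script. This is only a sketch: it writes a module with the made-up name pretty_demo.py into a temporary folder, adds that folder to sys.path so that Python can find the module, and imports it with importlib.import_module().

```python
import importlib
import os
import pprint
import sys
import tempfile

PL = ['C', 'C++', 'Java']

with tempfile.TemporaryDirectory() as base:
    module_path = os.path.join(base, 'pretty_demo.py')   # hypothetical module name
    with open(module_path, 'w', encoding='utf-8') as f:
        f.write('PL = ' + pprint.pformat(PL) + '\n')     # saved as Python code

    sys.path.insert(0, base)              # make the temporary folder importable
    try:
        importlib.invalidate_caches()     # make sure the new file is noticed
        pretty_demo = importlib.import_module('pretty_demo')
        restored = pretty_demo.PL         # the saved value, read back
    finally:
        sys.path.remove(base)             # undo the change to sys.path

print(restored)    # ['C', 'C++', 'Java']
```

Because the saved file is ordinary Python code that runs when imported, this approach is suitable only for files written by your own program.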

There are other modules and packages for saving variables into files. Check the Python documentation.





