How to Encrypt and Decrypt PDF Files in Python

Learn how to add and remove passwords on PDF files using the PyPDF4 library, and how to use pyAesCrypt to encrypt and decrypt PDF files in Python.
9 min read · Updated Jun 2023 · PDF File Handling


There are many reasons to encrypt a PDF file, one of which is making sure that a copy of the file is usable only with a decryption key. With an encrypted PDF file, you can prevent unwanted parties from viewing personal information or credentials within it.

In this tutorial, you will learn how to encrypt PDF files by applying two protection levels:

  • Level 1: Limiting access to the PDF file by adding a Document Open Password. A Document Open Password (also known as a user password) requires a user to type a password in order to open the PDF.
  • Level 2: Encrypting the file using the pyAesCrypt library and the AES256-CBC encryption algorithm.

The purpose of this tutorial is to develop a lightweight command-line utility that secures PDF files using only Python modules, without relying on external utilities outside the Python ecosystem (e.g. qpdf).


Before getting started, let's install the required libraries:

$ pip install PyPDF4==1.27.0 pyAesCrypt==6.0.0

Let's import the necessary libraries in our Python file:

# Import Libraries
from PyPDF4 import PdfFileReader, PdfFileWriter, utils
import os
import argparse
import getpass
from io import BytesIO
import pyAesCrypt

First, let's define a function that checks whether the PDF file is encrypted:

# Size of chunk
BUFFER_SIZE = 64*1024

def is_encrypted(input_file: str) -> bool:
    """Checks if the inputted file is encrypted using PyPDF4 library"""
    with open(input_file, 'rb') as pdf_file:
        pdf_reader = PdfFileReader(pdf_file, strict=False)
        return pdf_reader.isEncrypted
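As a rough cross-check that does not need PyPDF4, a password-protected PDF declares an /Encrypt entry in its trailer. The sketch below only scans the raw bytes for that entry. The helper name looks_encrypted is hypothetical, and the check is a heuristic (it assumes the file starts with the %PDF- header and can give a false positive if the same bytes occur elsewhere), not a replacement for is_encrypted():

```python
from pathlib import Path


def looks_encrypted(path: str) -> bool:
    """Heuristic: True if the file is a PDF whose raw bytes mention /Encrypt.

    Assumption: the /Encrypt entry is stored as plain text in the trailer
    (or in the cross-reference stream dictionary). This does not parse the
    PDF, so treat the result as a hint only.
    """
    data = Path(path).read_bytes()
    return data.startswith(b"%PDF-") and b"/Encrypt" in data
```

For anything beyond a quick sanity check, the PyPDF4-based is_encrypted() above remains the more reliable choice, since it actually parses the document.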

Second, let's make the core function, which is encrypting the PDF file:

def encrypt_pdf(input_file: str, password: str):
    """
    Encrypts a file using PyPDF4 library.
    Precondition: File is not encrypted.
    """
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(open(input_file, 'rb'), strict=False)
    if pdf_reader.isEncrypted:
        print(f"PDF File {input_file} already encrypted")
        return False, None, None
    try:
        # To encrypt all the pages of the input file, you need to loop over all of them
        # and to add them to the writer.
        for page_number in range(pdf_reader.numPages):
            pdf_writer.addPage(pdf_reader.getPage(page_number))
    except utils.PdfReadError as e:
        print(f"Error reading PDF File {input_file} = {e}")
        return False, None, None
    # The default is 128 bit encryption (if false then 40 bit encryption).
    pdf_writer.encrypt(user_pwd=password, owner_pwd=None, use_128bit=True)
    return True, pdf_reader, pdf_writer

The encrypt_pdf() function performs the following:

  • It validates that the input PDF file is not encrypted, using the PyPDF4 library.
  • It iterates over its pages and adds them to a pdf_writer object.
  • It encrypts the pdf_writer object using the given password.

Now that we have the function responsible for encryption, let's write its opposite, decryption:

def decrypt_pdf(input_file: str, password: str):
    """
    Decrypts a file using PyPDF4 library.
    Precondition: A file is already encrypted
    """
    pdf_reader = PdfFileReader(open(input_file, 'rb'), strict=False)
    if not pdf_reader.isEncrypted:
        print(f"PDF File {input_file} not encrypted")
        return False, None, None
    # decrypt() returns 0 when the password matches neither the user nor the owner password
    if pdf_reader.decrypt(password=password) == 0:
        print(f"Wrong password for PDF File {input_file}")
        pdf_reader.stream.close()
        return False, None, None
    pdf_writer = PdfFileWriter()
    try:
        for page_number in range(pdf_reader.numPages):
            pdf_writer.addPage(pdf_reader.getPage(page_number))
    except utils.PdfReadError as e:
        print(f"Error reading PDF File {input_file} = {e}")
        return False, None, None
    return True, pdf_reader, pdf_writer

This function performs the following:

  • It validates that the input PDF file is encrypted, using the PyPDF4 library.
  • It decrypts the pdf_reader object using the password (must be the correct one).
  • It iterates over its pages and adds them to a pdf_writer object.

Let's head to level 2, encrypting the actual file:

def cipher_stream(inp_buffer: BytesIO, password: str):
    """Ciphers an input memory buffer and returns a ciphered output memory buffer"""
    # Initialize output ciphered binary stream
    out_buffer = BytesIO()
    inp_buffer.seek(0)
    # Encrypt Stream
    pyAesCrypt.encryptStream(inp_buffer, out_buffer, password, BUFFER_SIZE)
    out_buffer.seek(0)
    return out_buffer

Using the pyAesCrypt library, the above function encrypts an input memory buffer and returns an encrypted memory buffer as output.
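The two seek(0) calls in cipher_stream() matter because a BytesIO buffer keeps its position at the end after being written to, so a reader that starts from there sees no data. The sketch below illustrates this with a plain chunked copy standing in for the encryption call. copy_in_chunks is a hypothetical helper used only for illustration; it is not part of pyAesCrypt:

```python
from io import BytesIO

BUFFER_SIZE = 64 * 1024  # same chunk size as in the tutorial


def copy_in_chunks(src: BytesIO, dst: BytesIO, chunk_size: int = BUFFER_SIZE) -> int:
    """Copy src to dst chunk by chunk and return the number of bytes copied."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
    return copied


source = BytesIO()
source.write(b"x" * 100_000)  # the position is now at the end of the buffer

# Reading from the current position finds nothing left to read.
without_rewind = copy_in_chunks(source, BytesIO())

source.seek(0)  # rewind, as cipher_stream() does before encrypting
with_rewind = copy_in_chunks(source, BytesIO())

print(without_rewind, with_rewind)  # 0 100000
```

The same reasoning applies to the returned buffer: rewinding out_buffer lets the caller read the ciphered bytes from the start.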


Let's make the file decryption function now:

def decipher_file(input_file: str, output_file: str, password: str):
    """
    Deciphers an input file and writes the deciphered content to an output file.
    Returns True on success, False otherwise.
    """
    inpFileSize = os.stat(input_file).st_size
    out_buffer = BytesIO()
    with open(input_file, mode='rb') as inp_buffer:
        try:
            # Decrypt Stream
            pyAesCrypt.decryptStream(
                inp_buffer, out_buffer, password, BUFFER_SIZE, inpFileSize)
        except Exception as e:
            print("Exception", str(e))
            return False
    # The input file is closed at this point, so output_file may be the same path
    with open(output_file, mode='wb') as f:
        f.write(out_buffer.getbuffer())
    return True

In decipher_file(), we use the decryptStream() function from the pyAesCrypt module, which accepts the input and output buffers, the password, the buffer size, and the input file size as parameters, and writes the decrypted stream to the output buffer.

For more convenient encryption and decryption of files, I suggest you read this tutorial, which uses the cryptography module that is more friendly to Python developers.

Now let's combine our functions into a single one:

def encrypt_decrypt_file(**kwargs):
    """Encrypts or decrypts a file"""
    input_file = kwargs.get('input_file')
    password = kwargs.get('password')
    output_file = kwargs.get('output_file')
    action = kwargs.get('action')
    # Protection Level
    # Level 1 --> Encryption / Decryption using PyPDF4
    # Level 2 --> Encryption and Ciphering / Deciphering and Decryption
    level = kwargs.get('level')
    if not output_file:
        output_file = input_file
    if action == "encrypt":
        result, pdf_reader, pdf_writer = encrypt_pdf(
            input_file=input_file, password=password)
        # Encryption completed successfully
        if result:
            output_buffer = BytesIO()
            pdf_writer.write(output_buffer)
            pdf_reader.stream.close()
            if level == 2:
                output_buffer = cipher_stream(output_buffer, password=password)
            with open(output_file, mode='wb') as f:
                f.write(output_buffer.getbuffer())
    elif action == "decrypt":
        if level == 2:
            # Remove the AES layer first; the deciphered PDF is written to output_file
            if not decipher_file(input_file=input_file,
                                 output_file=output_file, password=password):
                return
            # Continue from the deciphered file to remove the PDF password
            input_file = output_file
        result, pdf_reader, pdf_writer = decrypt_pdf(
            input_file=input_file, password=password)
        # Decryption completed successfully
        if result:
            output_buffer = BytesIO()
            pdf_writer.write(output_buffer)
            pdf_reader.stream.close()
            with open(output_file, mode='wb') as f:
                f.write(output_buffer.getbuffer())

The above function accepts 5 keyword arguments:

  • input_file: The input PDF file.
  • output_file: The output PDF file.
  • password: The password string you want to encrypt with.
  • action: Accepts "encrypt" or "decrypt" as a string.
  • level: Which level of encryption you want to use. Setting it to 1 only adds a password that is required when opening the PDF file; 2 adds file encryption as another layer of security.
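Since encrypt_decrypt_file() writes over input_file when no output_file is given, an interrupted write could leave the only copy of the document damaged. One common way to reduce that risk is to write to a temporary file and swap it into place afterwards. The following is a minimal sketch of that idea; write_atomically is a hypothetical helper and not part of the tutorial's code:

```python
import os
import tempfile


def write_atomically(output_file: str, data: bytes) -> None:
    """Write data to a temporary file in the target folder, then swap it in."""
    directory = os.path.dirname(os.path.abspath(output_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace overwrites the destination in a single step
        os.replace(tmp_path, output_file)
    except BaseException:
        # Clean up the temporary file if anything went wrong
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Under that assumption, the final write in encrypt_decrypt_file() could become write_atomically(output_file, output_buffer.getvalue()).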

Now, let's create a new class that inherits from argparse.Action to enter a password securely:

class Password(argparse.Action):
    """
    Hides the password entry
    """
    def __call__(self, parser, namespace, values, option_string=None):
        if values is None:
            values = getpass.getpass()
        setattr(namespace, self.dest, values)

It overrides the __call__() method and sets the dest attribute of the namespace object to the password that the user enters using the getpass module.
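To see how such an action behaves, here is a small self-contained sketch that wires it into a parser with nargs='?', so that -p may be given with or without a value. The build_parser helper and the sample password are illustrative assumptions:

```python
import argparse
import getpass


class Password(argparse.Action):
    """Prompt with getpass when -p is given without a value."""

    def __call__(self, parser, namespace, values, option_string=None):
        if values is None:
            # No value on the command line: ask for it without echoing
            values = getpass.getpass()
        setattr(namespace, self.dest, values)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--password", dest="password", action=Password,
                        nargs="?", type=str, required=True)
    return parser


# Value given inline: no prompt is shown.
args = build_parser().parse_args(["-p", "s3cret"])
print(args.password)  # s3cret
```

Passing only -p triggers the hidden prompt instead, which keeps the password out of the shell history.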

Next, let's define functions for parsing command-line arguments:

def is_valid_path(path):
    """Validates the path inputted and checks whether it is a file path or a folder path"""
    if not path:
        raise ValueError("Invalid Path")
    if os.path.isfile(path):
        return path
    elif os.path.isdir(path):
        return path
    else:
        raise ValueError(f"Invalid Path {path}")

def parse_args():
    """Get user command line parameters"""
    parser = argparse.ArgumentParser(description="These options are available")
    parser.add_argument("file", help="Input PDF file you want to encrypt", type=is_valid_path)
    # parser.add_argument('-i', '--input_path', dest='input_path', type=is_valid_path,
    #                     required=True, help="Enter the path of the file or the folder to process")
    parser.add_argument('-a', '--action', dest='action', choices=[
                        'encrypt', 'decrypt'], type=str, default='encrypt', help="Choose whether to encrypt or to decrypt")
    parser.add_argument('-l', '--level', dest='level', choices=[
                        1, 2], type=int, default=1, help="Choose which protection level to apply")
    parser.add_argument('-p', '--password', dest='password', action=Password,
                        nargs='?', type=str, required=True, help="Enter a valid password")
    parser.add_argument('-o', '--output_file', dest='output_file',
                        type=str, help="Enter a valid output file")
    args = vars(parser.parse_args())
    # To Display Command Arguments Except Password
    print("## Command Arguments #################################################")
    print("\n".join("{}:{}".format(i, j)
          for i, j in args.items() if i != 'password'))
    print("######################################################################")
    return args
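When a type callable raises ValueError, argparse reports only a generic "invalid is_valid_path value" message. Raising argparse.ArgumentTypeError instead lets the validator supply its own message. The sketch below shows this variant; existing_path is a hypothetical name, and it accepts any existing file or folder, as is_valid_path() does:

```python
import argparse
import os


def existing_path(path: str) -> str:
    """argparse type callable: accept only an existing file or folder."""
    if not path or not os.path.exists(path):
        # The message of ArgumentTypeError is shown to the user as-is
        raise argparse.ArgumentTypeError(f"path does not exist: {path!r}")
    return path


parser = argparse.ArgumentParser()
parser.add_argument("file", type=existing_path)

args = parser.parse_args(["."])  # the current folder exists
print(args.file)  # .
```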

Finally, writing the main code:

if __name__ == '__main__':
    # Parsing command line arguments entered by user
    args = parse_args()
    # Encrypting or Decrypting File
    encrypt_decrypt_file(
        input_file=args['file'], password=args['password'],
        action=args['action'], level=args['level'], output_file=args['output_file']
    )

Alright, let's test our program. First, let's pass --help to see the arguments:

$ python encrypt_pdf.py --help

Output:

usage: encrypt_pdf.py [-h] [-a {encrypt,decrypt}] [-l {1,2}] -p [PASSWORD] [-o OUTPUT_FILE] file

These options are available

positional arguments:
  file                  Input PDF file you want to encrypt

optional arguments:
  -h, --help            show this help message and exit
  -a {encrypt,decrypt}, --action {encrypt,decrypt}
                        Choose whether to encrypt or to decrypt
  -l {1,2}, --level {1,2}
                        Choose which protection level to apply
  -p [PASSWORD], --password [PASSWORD]
                        Enter a valid password
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Enter a valid output file

Awesome, let's encrypt an example PDF file (get it here):

$ python encrypt_pdf.py bert-paper.pdf -a encrypt -l 1 -p -o bert-paper-encrypted1.pdf

This will prompt for a password twice:

Password:
Password:
## Command Arguments #################################################
file:bert-paper.pdf
action:encrypt
level:1
output_file:bert-paper-encrypted1.pdf
######################################################################

A new PDF file that is secured with a password will appear in the current working directory. If you try to open it with any PDF reader program, you'll be prompted for a password, as shown in the image below:

Example encrypted PDF file with Password using Python

Obviously, if you enter a wrong password, you won't be able to access the PDF file.

Next, let's decrypt it:

$ python encrypt_pdf.py bert-paper-encrypted1.pdf -a decrypt -p -l 1 -o bert-paper-decrypted1.pdf

Output:

Password:
## Command Arguments #################################################
file:bert-paper-encrypted1.pdf
action:decrypt
level:1
output_file:bert-paper-decrypted1.pdf
######################################################################

Awesome, you'll notice that bert-paper-decrypted1.pdf appears in your directory, equivalent to the original (unencrypted) file.

Conclusion

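Notice that level 2 stacks two layers: the PDF is first protected with a password, and then the entire file is encrypted with AES. Decryption therefore has to remove both layers in reverse order: the AES layer first (level 2), then the PDF password (level 1).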
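If you are unsure which layer a given file currently carries, its first bytes usually tell. The sketch below is a heuristic based on file signatures: it assumes that files written by pyAesCrypt follow the AES Crypt container format, which starts with the bytes "AES", and that PDF files start with "%PDF-". The detect_layer name is hypothetical:

```python
def detect_layer(path: str) -> str:
    """Guess the outermost layer of a file from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(5)
    if head.startswith(b"AES"):
        return "aes"      # assumed AES Crypt container: decrypt with level 2
    if head.startswith(b"%PDF-"):
        return "pdf"      # a PDF, possibly still password-protected
    return "unknown"
```

A result of "aes" suggests running the tool with -l 2, while "pdf" means at most the Document Open Password is left to remove.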

Be aware that a Document Open Password can be bypassed using a variety of methods, one of which is cracking the PDF password; check this tutorial for how to do it.

You can check the full code of this tutorial here.


Happy coding ♥
