Manipulating an Excel file with Python

It's one of those things that sound easy but hardly are. There are a lot of libraries and solutions out there that enable you to manipulate Excel files via code. Some of them are expensive, other are open source.

A lot of languages have a library to help you manipulate Excel files. An obvious language to choose is C#. It has a lot of options, some open source, some paying. Instead of using C#, I opted for Python. Unlike C#, Python is very light weight to start with and does not need a big IDE like Visual Studio to create a small application.

In Python there are multiple options to use. Pandas and xlsxwriter can both write to an Excel file but they both have their limitations as well. Xlsxwriter can only create new Excel files and Pandas does not have the functionality to export a sheet of the Excel file to PDF. So I opted in using the Excel COM Object that ships with Excel.

This comes with some prerequisites, the code will only work on Windows with Excel installed. There is a package to use COM objects called pywin32. It's a bit confusing, there are actually two packages, pywin32 and pypiwin32. But pypiwin32 is abandoned an the pywin32 package is the way to go. It's installed via pip:

pip install pywin32

After installing the package, the code is fairly easy. First let's create an application, this is the main object which allows to manipulate the Excel file.

from win32com import client

excelApp = client.Dispatch("Excel.Application")

Next let's load an excel file to manipulate.

books = excelApp.Workbooks.Open(r"C:\\folder\\test.xlsx")

Writing a value to Excel

Now everything is in place to read or write to the Excel file, to discover the possibilities you can always check the COM object documentation. Let's write a value to the Excel file and save it.

from win32com import client

excelApp = client.Dispatch("Excel.Application")
book = excelApp.Workbooks.open(r"C:\\folder\\test.xlsx")

workSheet = book.Worksheets('Sheet1')
workSheet.Cells(1, 1).Value = "test"

book.Save()
book.Close()

This code will write the value test to the cell A1 in the Excel file. Notice that the Cells property has a 1-based index for both the row and the column.

Generating a PDF from the Excel file

There are not a lot of libraries out there that foresee in this functionality. Using Python and the Excel COM object is one of the easiest ways to generate a PDF from an Excel file. I've used this on big Excel files and achieved very good results with it.

from win32com import client

excelApp = client.Dispatch("Excel.Application")
book = excelApp.Workbooks.open(r"C:\\folder\\test.xlsx")

sheet = book.Worksheets('Sheet1')
sheet.ExportAsFixedFormat(0, r"C:\\folder\\test.pdf")

book.Close()

ExportAsFixedFormat is the method that does all the work. The first parameter is the type of format and is based on the XlFixedFormatType enum. The PDF will be created on the destination that is specified as second parameter.

Performance

When the use case gets more complex you often bump into performance problems. Here are a few tips that can help you. The first and most obvious tip is to get rid of all the unused functions in your Excel file, make your Excel file as simple as possible and make sure it has no unused formulas.

Disable display alerts

To avoid Excel being blocked by an alert that could be shown to the user we can disable display alerts. If the Excel manipulation happens in the background the alerts won't be visible anyway.

import win32com.client

excelApp = win32com.client.Dispatch("Excel.Application")
excelApp.DisplayAlerts = false

XLSB file type

Saving the Excel as the XSLB binary file type instead of the XLSX xml file type will give a small performance boost and make the file smaller.

Save As XSLB

Compress images

To make the Excel file as small as possible, the images that it contains can be compressed. Go to File > Save As and open the Compress Pictures option via Tools. Choosing a higher compression will result in a smaller file.

Save As Compress Pictures Dialog

Save As Compress Pictures Dialog

Reading and writing values using Range

Instead of using the Cells property to write values one by one, it's also possible to write an entire range to the Excel file. This will be more performant than using Cells. The value that is set to the Range is always an array of arrays. Let's start with a simple case

workSheet.Range('A1:A5').Value = [ [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ] ]

This will write values to the range A1:A5.

Write To Excel Using Range

To write to multiple columns a value can be added to the arrays for each column.

workSheet.Range('A1:C5').Value = [ [ 1, 10, 100 ], [ 2, 20, 200 ], [ 3, 30, 300 ], [ 4, 40, 400 ], [ 5, 50, 500 ] ]

The result:

Write To Excel Using Range

Disable automatic calculation

If the Excel file you are using has a lot of formulas this might slow down the manipulation considerably. Because the formulas need to be recalculated every time a value changes in the Excel. This automatic recalculating can be disabled. This can be handy if you have a lot of values to fill out and if you are only interested in the end result. First you disable automatic recalculation, you fill out all the values and then you reenable automatic recalculation. This way the recalculation will only happen once even if a lot of values are filled out.

This code will set the calculation to manual, the possible values of the Calculation property can be found in the XlCalculation enumeration.

import win32com.client

excelApp = win32com.client.Dispatch("Excel.Application")
excelApp.Calculation = -4135

To reenable the calculation set the Calculation property to automatic:

excelApp.Calculation = -4105

Turn off screen updating

Turning off screen updating can also speed up the Excel manipulation.

import win32com.client

excelApp = win32com.client.Dispatch("Excel.Application")
excelApp.ScreenUpdating = false

Call rejected by callee

This error can pop up sometimes, it usually happens when Excel is being called consecutive times and it is not fast enough to respond to the calls. Then it will return the error Call rejected by callee. The solution is to retry the call whenever this error occurs, most of the time Excel will have finished with the current call and it will be able to handle the next one.

To easily implement this I use a ComWrapper class, the inspiration I got for this class comes from several Stackoverflow questions.

import time

from pywintypes import com_error

class ComWrapper:

    @staticmethod
    def wrap(func, *func_args, **kwargs):
        try:
            return func(*func_args, **kwargs)
        except com_error as e:

            if e.strerror == 'Call was rejected by callee.':
                print('com_error retrying ', e)
                time.sleep(0.5)
                self.wrap(func, *func_args, **kwargs)

            raise

This class will wrap an existing method and when the pywin com_error with the message Call was rejected by callee. occurs it will wait for half a second and retry the same function with the same arguments.

If we want to wrap writing a value to a cell, this is how it is done:

def writeValue(workSheet, rowNumber, columnNumber, value):
    workSheet.Cells(rowNumber, columnNumber).Value = value

ComWrapper.wrap(writeValue, workSheet, 1, 1, "test")

First the property to set a certain value is wrapped in the writeValue method. This way the method can be passed to the ComWrapper. The first argument for the wrapper class is the method itself. Note that we are not calling that method, we pass it to the wrapper which will call it. Then all arguments that are normally passed to the method itself are passed to wrapper as well. The wrapper will retry the method call as long as the Call was rejected by callee. error is thrown by Excel.