Manipulating an Excel file with Python
It's one of those things that sound easy but hardly are. There are a lot of libraries and solutions out there that enable you to manipulate Excel files via code. Some of them are expensive, other are open source.
A lot of languages have a library to help you manipulate Excel files. An obvious language to choose is C#. It has a lot of options, some open source, some paying. Instead of using C#, I opted for Python. Unlike C#, Python is very light weight to start with and does not need a big IDE like Visual Studio to create a small application.
In Python there are multiple options to use. Pandas and xlsxwriter can both write to an Excel file but they both have their limitations as well. Xlsxwriter can only create new Excel files and Pandas does not have the functionality to export a sheet of the Excel file to PDF. So I opted in using the Excel COM Object that ships with Excel.
This comes with some prerequisites, the code will only work on Windows with Excel installed. There is a package to use COM objects called pywin32
. It's a bit confusing, there are actually two packages, pywin32
and pypiwin32
. But pypiwin32
is abandoned an the pywin32
package is the way to go. It's installed via pip
:
pip install pywin32
After installing the package, the code is fairly easy. First let's create an application, this is the main object which allows to manipulate the Excel file.
from win32com import client
excelApp = client.Dispatch("Excel.Application")
Next let's load an excel file to manipulate.
books = excelApp.Workbooks.Open(r"C:\\folder\\test.xlsx")
Writing a value to Excel
Now everything is in place to read or write to the Excel file, to discover the possibilities you can always check the COM object documentation. Let's write a value to the Excel file and save it.
from win32com import client
excelApp = client.Dispatch("Excel.Application")
book = excelApp.Workbooks.open(r"C:\\folder\\test.xlsx")
workSheet = book.Worksheets('Sheet1')
workSheet.Cells(1, 1).Value = "test"
book.Save()
book.Close()
This code will write the value test
to the cell A1 in the Excel file. Notice that the Cells
property has a 1-based index for both the row and the column.
Generating a PDF from the Excel file
There are not a lot of libraries out there that foresee in this functionality. Using Python and the Excel COM object is one of the easiest ways to generate a PDF from an Excel file. I've used this on big Excel files and achieved very good results with it.
from win32com import client
excelApp = client.Dispatch("Excel.Application")
book = excelApp.Workbooks.open(r"C:\\folder\\test.xlsx")
sheet = book.Worksheets('Sheet1')
sheet.ExportAsFixedFormat(0, r"C:\\folder\\test.pdf")
book.Close()
ExportAsFixedFormat
is the method that does all the work. The first parameter is the type of format and is based on the XlFixedFormatType
enum. The PDF will be created on the destination that is specified as second parameter.
Performance
When the use case gets more complex you often bump into performance problems. Here are a few tips that can help you. The first and most obvious tip is to get rid of all the unused functions in your Excel file, make your Excel file as simple as possible and make sure it has no unused formulas.
Disable display alerts
To avoid Excel being blocked by an alert that could be shown to the user we can disable display alerts. If the Excel manipulation happens in the background the alerts won't be visible anyway.
import win32com.client
excelApp = win32com.client.Dispatch("Excel.Application")
excelApp.DisplayAlerts = false
XLSB file type
Saving the Excel as the XSLB binary file type instead of the XLSX xml file type will give a small performance boost and make the file smaller.
Compress images
To make the Excel file as small as possible, the images that it contains can be compressed. Go to File > Save As
and open the Compress Pictures
option via Tools
. Choosing a higher compression will result in a smaller file.
Reading and writing values using Range
Instead of using the Cells
property to write values one by one, it's also possible to write an entire range to the Excel file. This will be more performant than using Cells
. The value that is set to the Range is always an array of arrays. Let's start with a simple case
workSheet.Range('A1:A5').Value = [ [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ] ]
This will write values to the range A1:A5
.
To write to multiple columns a value can be added to the arrays for each column.
workSheet.Range('A1:C5').Value = [ [ 1, 10, 100 ], [ 2, 20, 200 ], [ 3, 30, 300 ], [ 4, 40, 400 ], [ 5, 50, 500 ] ]
The result:
Disable automatic calculation
If the Excel file you are using has a lot of formulas this might slow down the manipulation considerably. Because the formulas need to be recalculated every time a value changes in the Excel. This automatic recalculating can be disabled. This can be handy if you have a lot of values to fill out and if you are only interested in the end result. First you disable automatic recalculation, you fill out all the values and then you reenable automatic recalculation. This way the recalculation will only happen once even if a lot of values are filled out.
This code will set the calculation to manual, the possible values of the Calculation
property can be found in the XlCalculation enumeration.
import win32com.client
excelApp = win32com.client.Dispatch("Excel.Application")
excelApp.Calculation = -4135
To reenable the calculation set the Calculation
property to automatic:
excelApp.Calculation = -4105
Turn off screen updating
Turning off screen updating can also speed up the Excel manipulation.
import win32com.client
excelApp = win32com.client.Dispatch("Excel.Application")
excelApp.ScreenUpdating = false
Call rejected by callee
This error can pop up sometimes, it usually happens when Excel is being called consecutive times and it is not fast enough to respond to the calls. Then it will return the error Call rejected by callee
. The solution is to retry the call whenever this error occurs, most of the time Excel will have finished with the current call and it will be able to handle the next one.
To easily implement this I use a ComWrapper
class, the inspiration I got for this class comes from several Stackoverflow questions.
import time
from pywintypes import com_error
class ComWrapper:
@staticmethod
def wrap(func, *func_args, **kwargs):
try:
return func(*func_args, **kwargs)
except com_error as e:
if e.strerror == 'Call was rejected by callee.':
print('com_error retrying ', e)
time.sleep(0.5)
self.wrap(func, *func_args, **kwargs)
raise
This class will wrap an existing method and when the pywin com_error
with the message Call was rejected by callee.
occurs it will wait for half a second and retry the same function with the same arguments.
If we want to wrap writing a value to a cell, this is how it is done:
def writeValue(workSheet, rowNumber, columnNumber, value):
workSheet.Cells(rowNumber, columnNumber).Value = value
ComWrapper.wrap(writeValue, workSheet, 1, 1, "test")
First the property to set a certain value is wrapped in the writeValue
method. This way the method can be passed to the ComWrapper
. The first argument for the wrapper class is the method itself. Note that we are not calling that method, we pass it to the wrapper which will call it. Then all arguments that are normally passed to the method itself are passed to wrapper as well. The wrapper will retry the method call as long as the Call was rejected by callee.
error is thrown by Excel.