It's one of those things that sound easy but hardly are. There are a lot of libraries and solutions out there that enable you to manipulate Excel files via code. Some of them are expensive, other are open source.
A lot of languages have a library to help you manipulate Excel files. An obvious language to choose is C#. It has a lot of options, some open source, some paying. Instead of using C#, I opted for Python. Unlike C#, Python is very light weight to start with and does not need a big IDE like Visual Studio to create a small application.
In Python there are multiple options to use. Pandas and xlsxwriter can both write to an Excel file but they both have their limitations as well. Xlsxwriter can only create new Excel files and Pandas does not have the functionality to export a sheet of the Excel file to PDF. So I opted in using the Excel COM Object that ships with Excel.
This comes with some prerequisites, the code will only work on Windows with Excel installed. There is a package to use COM objects called
pywin32. It's a bit confusing, there are actually two packages,
pypiwin32 is abandoned an the
pywin32 package is the way to go. It's installed via
pip install pywin32
After installing the package, the code is fairly easy. First let's create an application, this is the main object which allows to manipulate the Excel file.
from win32com import client excelApp = client.Dispatch("Excel.Application")
Next let's load an excel file to manipulate.
books = excelApp.Workbooks.Open(r"C:\\folder\\test.xlsx")
Writing a value to Excel
Now everything is in place to read or write to the Excel file, to discover the possibilities you can always check the COM object documentation](<https://docs.microsoft.com/en-us/visualstudio/vsto/excel-object-model-overview?redirectedfrom=MSDN>). Let's write a value to the Excel file and save it.
from win32com import client excelApp = client.Dispatch("Excel.Application") book = excelApp.Workbooks.open(r"C:\\folder\\test.xlsx") workSheet = book.Worksheets('Sheet1') workSheet.Cells(1, 1).Value = "test" book.Save() book.Close()
This code will write the value
test to the cell A1 in the Excel file. Notice that the
Cells property has a 1-based index for both the row and the column.
Generating a PDF from the Excel file
There are not a lot of libraries out there that foresee in this functionality. Using Python and the Excel COM object is one of the easiest ways to generate a PDF from an Excel file. I've used this on big Excel files and achieved very good results with it.
from win32com import client excelApp = client.Dispatch("Excel.Application") book = excelApp.Workbooks.open(r"C:\\folder\\test.xlsx") sheet = book.Worksheets('Sheet1') sheet.ExportAsFixedFormat(0, r"C:\\folder\\test.pdf") book.Close()
ExportAsFixedFormat is the method that does all the work. The first parameter is the type of format and is based on the
XlFixedFormatType enum. The PDF will be created on the destination that is specified as second parameter.
When the use case gets more complex you often bump into performance problems. Here are a few tips that can help you. The first and most obvious tip is to get rid of all the unused functions in your Excel file, make your Excel file as simple as possible and make sure it has no unused formulas.
Disable display alerts
To avoid Excel being blocked by an alert that could be shown to the user we can disable display alerts. If the Excel manipulation happens in the background the alerts won't be visible anyway.
import win32com.client excelApp = win32com.client.Dispatch("Excel.Application") excelApp.DisplayAlerts = false
XLSB file type
Saving the Excel as the XSLB binary file type instead of the XLSX xml file type will give a small performance boost and make the file smaller.
To make the Excel file as small as possible, the images that it contains can be compressed. Go to
File > Save As and open the
Compress Pictures option via
Tools. Choosing a higher compression will result in a smaller file.
Reading and writing values using
Instead of using the
Cells property to write values one by one, it's also possible to write an entire range to the Excel file. This will be more performant than using
Cells. The value that is set to the Range is always an array of arrays. Let's start with a simple case
workSheet.Range('A1:A5').Value = [ [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ] ]
This will write values to the range
To write to multiple columns a value can be added to the arrays for each column.
workSheet.Range('A1:C5').Value = [ [ 1, 10, 100 ], [ 2, 20, 200 ], [ 3, 30, 300 ], [ 4, 40, 400 ], [ 5, 50, 500 ] ]
Disable automatic calculation
If the Excel file you are using has a lot of formulas this might slow down the manipulation considerably. Because the formulas need to be recalculated every time a value changes in the Excel. This automatic recalculating can be disabled. This can be handy if you have a lot of values to fill out and if you are only interested in the end result. First you disable automatic recalculation, you fill out all the values and then you reenable automatic recalculation. This way the recalculation will only happen once even if a lot of values are filled out.
This code will set the calculation to manual, the possible values of the
Calculation property can be found in the XlCalculation enumeration.
import win32com.client excelApp = win32com.client.Dispatch("Excel.Application") excelApp.Calculation = -4135
To reenable the calculation set the
Calculation property to automatic:
excelApp.Calculation = -4105
Turn off screen updating
Turning off screen updating can also speed up the Excel manipulation.
import win32com.client excelApp = win32com.client.Dispatch("Excel.Application") excelApp.ScreenUpdating = false
Call rejected by callee
This error can pop up sometimes, it usually happens when Excel is being called consecutive times and it is not fast enough to respond to the calls. Then it will return the error
Call rejected by callee. The solution is to retry the call whenever this error occurs, most of the time Excel will have finished with the current call and it will be able to handle the next one.
import time from pywintypes import com_error class ComWrapper: @staticmethod def wrap(func, *func_args, **kwargs): try: return func(*func_args, **kwargs) except com_error as e: if e.strerror == 'Call was rejected by callee.': print('com_error retrying ', e) time.sleep(0.5) self.wrap(func, *func_args, **kwargs) raise
This class will wrap an existing method and when the pywin
com_error with the message
Call was rejected by callee. occurs it will wait for half a second and retry the same function with the same arguments.
If we want to wrap writing a value to a cell, this is how it is done:
def writeValue(workSheet, rowNumber, columnNumber, value): workSheet.Cells(rowNumber, columnNumber).Value = value ComWrapper.wrap(writeValue, workSheet, 1, 1, "test")
First the property to set a certain value is wrapped in the
writeValue method. This way the method can be passed to the
ComWrapper. The first argument for the wrapper class is the method itself. Note that we are not calling that method, we pass it to the wrapper which will call it. Then all arguments that are normally passed to the method itself are passed to wrapper as well. The wrapper will retry the method call as long as the
Call was rejected by callee. error is thrown by Excel.