Using PyXLL you can return a DataFrame to Excel as a range of data by writing a worksheet function. This works really well for getting data into Excel, but because it's a worksheet function the data cannot be edited after it has been returned.
This makes a lot of sense when you think about it. If you could edit the data it would cause all sorts of problems: what would happen when the function was called again? How could you tell, just by looking at the spreadsheet, which values were returned from the function and which were entered manually?
Some add-ins solve this problem by writing the values back to Excel using a macro. If a macro copies the values back to Excel instead of a worksheet function returning them, those values can then be overridden in Excel.
You can do this using PyXLL, but it doesn't solve the underlying problems: what happens if the macro is called again and overwrites the manually entered values, and how can you tell which values are original and which are overrides? This approach can make for error-prone spreadsheets.
A better approach
A much better approach is to have your function return a DataFrame, and then have an extra function that takes that DataFrame and some overrides and produces a modified DataFrame. This way you can apply your own override values to the results of your function, but in a way that Excel can still track in its dependency graph, so if you recalculate the sheet you will get the same results.
For example, imagine you had the following Python functions:
@xl_func(": dataframe<index=True>")
def load_data():
df = load_dataframe(...)
return df
@xl_func("dataframe<index=True> df, var[][] updates: dataframe<index=True>")
def update_dataframe(df, updates):
# clone the dataframe and apply the updates
new_df = df.copy()
# updates is a table of (index, column, value)
for idx, col, value in updates:
new_df.loc[idx, col] = value
# return the updated dataframe
return new_df
Now in Excel you can call the first function "load_data", keep a separate table of overrides (a list of index, column and value), and use "update_dataframe" to apply the overrides to the original data.
A1: =load_data() // returns the original data
A10: =update_dataframe(A1#, ...) // returns the merged data
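To see what update_dataframe does with the overrides, here is a small example of calling it directly from Python. The DataFrame contents and the override values are made up purely for illustration:

import pandas as pd

# Example data, purely for illustration
df = pd.DataFrame({"price": [1.0, 2.0], "qty": [10, 20]}, index=["A", "B"])

# Each override is (index, column, value), just like the updates table in Excel
updates = [
    ["A", "price", 1.5],
    ["B", "qty", 25],
]

new_df = update_dataframe(df, updates)

print(new_df.loc["A", "price"])  # 1.5 (overridden)
print(df.loc["A", "price"])      # 1.0 (the original DataFrame is unchanged)

Because update_dataframe returns a new DataFrame rather than modifying the original, recalculating the sheet always gives the same result for the same inputs.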
It can be cumbersome to pass around large DataFrames between functions in Excel. PyXLL can also pass around Python objects (including pandas DataFrames) as object handles which only occupy a single cell but can still be passed to other Python functions. Changing the above code to work this way will reduce the amount of data shown in the sheet, and we can add additional functions to examine the DataFrame object handles when we need to.
This can be achieved by changing the return type of our two functions to "object" and adding a new function to expand a DataFrame object handle to an array:
@xl_func(": object")
def load_data():
# as above
@xl_func("dataframe<index=True> df, var[][] updates: object")
def update_dataframe(df, updates):
# as above
@xl_func("object: dataframe<index=True>", auto_expand=True)
def df_expand(df):
if not isinstance(df, pd.DataFrame):
raise Exception("DataFrame expected")
# PyXLL will handle the conversion to an Excel range for us
return df
We still call these functions in the same way, but now we need to call df_expand if we want to see all the values in either the original DataFrame or the modified DataFrame.
A1: =load_data() // returns the original data as an object
A2: =update_dataframe(A1, ...) // returns the merged data as an object
A3: =df_expand(A2) // shows the full updated DataFrame
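If expanding the whole DataFrame is more than you need, you can add further helper functions that take the object handle and return just part of it. For example, a hypothetical df_head function (not part of PyXLL, just a sketch following the same pattern as df_expand above) could show the first few rows:

from pyxll import xl_func

@xl_func("object df, int n: dataframe<index=True>", auto_resize=True)
def df_head(df, n):
    # Return the first n rows of a DataFrame object handle
    return df.head(n)

This would be called as, for example, =df_head(A2, 5) to see just the top of the updated DataFrame.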
The other way
If you prefer to write your data to Excel from a macro or ribbon function and allow the user to directly overwrite the written values, then you can of course do that using PyXLL.
This is covered in the FAQ article Writing arrays back to Excel.
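As a rough idea of what that looks like, the sketch below writes the result of load_data to the active sheet from a macro. It assumes the load_data function from above, writes starting at cell A1 with no error handling, and is only intended as an outline; the FAQ article covers this approach in more detail.

from pyxll import xl_macro, xl_app

@xl_macro
def write_data_to_sheet():
    # Load the data (using the load_data function from above as an example)
    df = load_data()

    # Build a 2d list of values: a header row followed by the data rows
    values = [[df.index.name or ""] + list(df.columns)]
    for idx, row in df.iterrows():
        values.append([idx] + row.tolist())

    # Write the values to the active sheet starting at A1
    xl = xl_app()
    target = xl.ActiveSheet.Range("A1").Resize(len(values), len(values[0]))
    target.Value = values

Once the values are written this way the user can edit them freely, with the caveats described at the start of this article about what happens when the macro is run again.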